Conference Agenda

Overview and details of this conference's sessions. Select a date or location to show only the sessions held on that day or at that location. Select a single session for a detailed view (with abstracts and downloads, if available).

Session Overview
Session 7B: SS3: Multimedia Computing for Intelligent Life
Time: Friday, 06/Jan/2017, 10:50am - 12:10pm

Session Chair: Wei Zhang
Location: V102
1st floor, 2nd room on the left.

Presentations
10:50am - 11:10am

Compact CNN Based Video Representation for Efficient Video Copy Detection

Ling Wang, Yu Bao, Haojie Li, Xin Fan, Zhongxuan Luo

Dalian University of Technology, China

Many content-based video copy detection (CCD) systems have been proposed to identify copies of a copyrighted video. Due to storage cost and retrieval response requirements, most CCD systems represent video content using sparsely sampled features, which tends to lose information to some extent and thus results in unsatisfactory performance. In this paper, we propose a compact video representation based on convolutional neural networks (CNN) and sparse coding (SC) for video copy detection. We first extract CNN features from densely sampled video frames and then encode them into a fixed-length vector via the SC method. The proposed representation has two advantages. First, it is compact and independent of the sampling frame rate. Second, it is discriminative for video copy detection because it encodes the CNN features of densely sampled frames. We evaluate the proposed representation for video copy detection on a real, complex video dataset, achieving a marginal performance improvement over state-of-the-art CCD systems.
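
As a rough illustration of the encoding step described above, the following Python sketch sparse-codes densely sampled per-frame features against a learned dictionary and max-pools the codes over time, yielding a fixed-length signature whose size does not depend on the number of frames. The dictionary size, the max-pooling choice, and the random vectors standing in for real CNN features are illustrative assumptions, not the authors' configuration.

    import numpy as np
    from sklearn.decomposition import MiniBatchDictionaryLearning, sparse_encode

    rng = np.random.default_rng(0)
    # Stand-in for CNN features of 300 densely sampled frames (512-D each).
    frame_feats = rng.standard_normal((300, 512))

    # Learn a sparse-coding dictionary from (training) frame features.
    dico = MiniBatchDictionaryLearning(n_components=128, alpha=1.0, random_state=0)
    dictionary = dico.fit(frame_feats).components_        # shape (128, 512)

    # Sparse-code every frame, then max-pool over time: the signature length
    # (128) is fixed regardless of how many frames were sampled.
    codes = sparse_encode(frame_feats, dictionary, algorithm="lasso_lars", alpha=1.0)
    video_signature = np.abs(codes).max(axis=0)           # shape (128,)
    print(video_signature.shape)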


11:10am - 11:30am

i-Stylist: Finding the Right Dress Through Your Social Networks

Jordi Sanchez-Riera1, Jun-Ming Lin2, Kai-Lung Hua2, Wen-Huang Cheng1, Arvin Wen Tsui3

1Academia Sinica, Taiwan, Republic of China; 2Dept. of CSIE, National Taiwan University of Science and Technology; 3Industrial Technology Research Institute

Searching the Web has become an everyday task for most people. However, the presence of too much information can cause information overload; when shopping online, for example, a user can easily be overwhelmed by too many choices. To this end, we propose a personalized clothing recommendation system, named i-Stylist, built on the analysis of personal images in social networks. From a user's available personal images, the i-Stylist system extracts a number of characteristics from each clothing item, such as CNN feature vectors and metadata describing the color, material, and pattern of the fabric. These clothing items are then organized as a fully connected graph, which is later used to infer the personalized probability distribution of how much the user will like each clothing item on a shopping website. The user is able to modify the graph structure, e.g., adding and deleting vertices, by giving feedback about the retrieved clothing items. The i-Stylist system is compared against two baselines and shown to perform better.
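
One plausible way to realize the graph inference described above is a personalized random walk over a fully connected item graph, sketched below. The cosine-similarity edge weights, the PageRank-style walk, and the random stand-in features are assumptions made for illustration; the paper's exact inference over the graph may differ.

    import numpy as np
    import networkx as nx

    rng = np.random.default_rng(1)
    user_items = rng.standard_normal((5, 64))    # stand-in features: user's own clothing
    shop_items = rng.standard_normal((20, 64))   # stand-in features: shop inventory
    feats = np.vstack([user_items, shop_items])
    feats /= np.linalg.norm(feats, axis=1, keepdims=True)

    # Fully connected graph over all items, weighted by feature similarity.
    G = nx.complete_graph(len(feats))
    for u, v in G.edges:
        G[u][v]["weight"] = float(max(feats[u] @ feats[v], 1e-6))

    # Bias the walk toward the user's own items; the stationary scores on the
    # shop vertices act as the personalized "how much will the user like this"
    # distribution.
    personalization = {i: (1.0 if i < len(user_items) else 0.0) for i in G}
    scores = nx.pagerank(G, personalization=personalization, weight="weight")
    ranked = sorted(range(len(user_items), len(feats)), key=scores.get, reverse=True)
    print("top recommended shop items:", ranked[:5])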


11:30am - 11:50am

Efficient Multi-Scale Plane Extraction Based RGBD Video Segmentation

Hong Liu, Jun Wang, Xiangdong Wang, Yueliang Qian

Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China

To improve the robustness and efficiency of RGBD video segmentation, we propose a novel video segmentation method that combines multi-scale plane extraction with hierarchical graph-based video segmentation. First, to reduce depth-data noise, we extract plane structures from 3D RGBD point clouds at three levels (voxel, pixel, and neighborhood) using geometry and color features. To address the uneven distribution of depth data and the object-occlusion problem, we further propose a multi-scale voxel-based plane fusion algorithm and use an amodal completion strategy to improve plane extraction performance. Hierarchical graph-based RGBD video segmentation is then used to segment the remaining non-plane pixels. Finally, we fuse the above plane extraction and video segmentation results to obtain the final RGBD video scene segmentation. Qualitative and quantitative results for both plane extraction and RGBD scene video segmentation show the effectiveness of the proposed methods.
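
The plane-extraction step alone can be prototyped with RANSAC plane fitting on a point cloud, as in the sketch below. The distance threshold, iteration count, and synthetic data are illustrative assumptions and do not reproduce the authors' multi-scale voxel fusion or amodal completion.

    import numpy as np

    def ransac_plane(points, iters=200, dist_thresh=0.02, rng=np.random.default_rng(2)):
        """Return (normal, d, inlier_mask) of the dominant plane n.x + d = 0."""
        best_inliers = np.zeros(len(points), dtype=bool)
        best_plane = None
        for _ in range(iters):
            sample = points[rng.choice(len(points), 3, replace=False)]
            n = np.cross(sample[1] - sample[0], sample[2] - sample[0])
            norm = np.linalg.norm(n)
            if norm < 1e-9:                      # degenerate (collinear) sample
                continue
            n /= norm
            d = -n @ sample[0]
            inliers = np.abs(points @ n + d) < dist_thresh
            if inliers.sum() > best_inliers.sum():
                best_inliers, best_plane = inliers, (n, d)
        return best_plane[0], best_plane[1], best_inliers

    # Synthetic cloud: a noisy z=0 plane plus clutter, standing in for RGBD points.
    rng = np.random.default_rng(2)
    plane_pts = np.c_[rng.uniform(-1, 1, (500, 2)), rng.normal(0, 0.005, 500)]
    clutter = rng.uniform(-1, 1, (200, 3))
    n, d, mask = ransac_plane(np.vstack([plane_pts, clutter]))
    print("plane normal:", np.round(n, 2), "inliers:", int(mask.sum()))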


11:50am - 12:10pm

Micro-expression Recognition by Aggregating Local Spatio-Temporal Patterns

Shiyu Zhang1,3, Bailan Feng2, Zhineng Chen3, Xiangsheng Huang3

1Beijing Institute of Technology, China; 2Shannon Cognitive Computing Laboratory, 2012Labs, Huawei Technologies Co., Ltd.; 3Institute of Automation, Chinese Academy of Sciences, China

A micro-expression is an extremely quick facial expression that reveals people's hidden emotions, and it has become one of the most important cues for lie detection as well as for many other applications. Current methods mostly address micro-expression recognition in simplified environments. This paper aims at developing a discriminative feature descriptor that is less sensitive to variations in pose, illumination, etc., and thus better supports the recognition task. Our novelty lies in the use of local statistical features from interest regions in which Action Units (AUs) indicate micro-expressions, and in the combination of these features for recognition. To this end, we first use a face alignment algorithm to locate the face landmarks in each video frame. The positioned face is then divided into several specific regions (facial cubes) based on the locations of the feature points. The movement tendency and intensity in each region are then extracted using an optical-flow orientation histogram and Local Binary Patterns from Three Orthogonal Planes (LBP-TOP) features, respectively. The two kinds of features are concatenated region by region to generate the proposed local statistical descriptor. We evaluate the local descriptor using state-of-the-art classifiers in our experiments. We observe that the proposed local statistical descriptor, which is localized according to the facial spatial distribution, captures more detailed and representative information than global features, and that fusing different local features reveals more characteristics of micro-expressions than any single feature, leading to better experimental results.
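
The descriptor assembly lends itself to a compact sketch: per-region histograms of optical-flow orientation, weighted by flow magnitude and concatenated region by region (real LBP-TOP histograms would be concatenated the same way). The 2x2 region grid, the 8 orientation bins, and the random flow field standing in for flow computed between frames are illustrative assumptions; the paper positions regions by facial landmarks instead.

    import numpy as np

    def flow_orientation_histogram(flow, bins=8):
        """Histogram of flow orientations, weighted by flow magnitude."""
        dx, dy = flow[..., 0], flow[..., 1]
        ang = np.arctan2(dy, dx)                       # orientations in [-pi, pi)
        mag = np.hypot(dx, dy)
        hist, _ = np.histogram(ang, bins=bins, range=(-np.pi, np.pi), weights=mag)
        return hist / (hist.sum() + 1e-9)

    rng = np.random.default_rng(3)
    flow = rng.standard_normal((64, 64, 2))            # stand-in for frame-to-frame optical flow

    # Split the face into a 2x2 grid of regions and concatenate the per-region
    # histograms into one local statistical descriptor.
    regions = [flow[r:r+32, c:c+32] for r in (0, 32) for c in (0, 32)]
    descriptor = np.concatenate([flow_orientation_histogram(reg) for reg in regions])
    print(descriptor.shape)                            # (32,) = 4 regions x 8 bins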



Conference: MMM2017