MMM2017 Conference Agenda

Overview and details of the sessions of this conference.

Session Overview
Session 3A: Multimedia Classification
Time: Thursday, 05/Jan/2017, 9:00am - 10:20am
Session Chair: Tat-Seng Chua
Location: V101 (1st floor, first room on the left)

Presentations
9:00am - 9:20am

Supervised Class Graph Preserving Hashing for Image Retrieval and Classification

Lu Feng, Xin-Shun Xu, Shanqing Guo, Xiao-Lin Wang

Shandong University, People's Republic of China

With the explosive growth of data, hashing-based techniques have attracted significant attention for their efficient retrieval and reduced storage cost. However, most hashing methods cannot predict labels directly. In this paper, we propose a novel supervised hashing approach, namely Class Graph Preserving Hashing (CGPH), which effectively incorporates label information into hash codes and classifies samples directly with their binary codes. Specifically, CGPH learns hash functions by simultaneously ensuring label consistency and preserving class graph similarity among hash codes. It then learns effective binary codes through an orthogonal transformation that minimizes the quantization error between the hash function outputs and the binary codes. In addition, an iterative method is proposed for the optimization problem in CGPH. Extensive experiments on two large-scale real-world image datasets show that CGPH outperforms or is comparable to state-of-the-art hashing methods in both image retrieval and classification tasks.
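
As an illustration of the quantization step mentioned in the abstract, the sketch below alternates between binarizing rotated projections and solving an orthogonal Procrustes problem to reduce the quantization error. It follows the well-known iterative-quantization pattern and is a hedged reconstruction, not the authors' CGPH code; the supervised class-graph objective is not reproduced, and all names are ours.

    # Hedged sketch: given real-valued projections V of the training images
    # (from already-learned hash functions), alternately quantize and re-fit
    # an orthogonal rotation R to minimize ||B - V R||_F.
    import numpy as np

    def learn_rotation(V, n_iter=50, seed=0):
        """V: (n_samples, n_bits) real-valued projections."""
        rng = np.random.default_rng(seed)
        # start from a random orthogonal rotation
        R, _ = np.linalg.qr(rng.standard_normal((V.shape[1], V.shape[1])))
        for _ in range(n_iter):
            B = np.sign(V @ R)                  # fix R: quantize to +/-1
            U, _, Wt = np.linalg.svd(V.T @ B)   # fix B: orthogonal Procrustes
            R = U @ Wt
        return np.sign(V @ R), R                # binary codes and rotation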


9:20am - 9:40am

Graph-Based Multimodal Music Mood Classification in Discriminative Latent Space

Feng Su, Hao Xue

Nanjing University, People's Republic of China

Automatic music mood classification is an important and challenging problem in music information retrieval (MIR) and has attracted growing attention from various research areas. In this paper, we propose a novel multimodal method for music mood classification that exploits the complementarity of the lyrics and audio information of music to enhance classification accuracy. We first extract descriptive sentence-level lyric and audio features from the music. Then, we project the paired low-level features of the two modalities into a learned common discriminative latent space, which not only eliminates between-modality heterogeneity but also increases the discriminability of the resulting descriptions. On the basis of this latent representation, we employ a graph-learning-based multimodal classification model for music mood, which takes the cross-modal similarity between local audio and lyric descriptions into account to effectively exploit correlations between the modalities. The predicted mood categories for every sentence of a piece of music are then aggregated by a simple voting scheme. The effectiveness of the proposed method is demonstrated in experiments on a real dataset comprising more than 3,000 minutes of music and the corresponding lyrics.
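
The shape of this pipeline can be sketched in a few lines of Python. In the snippet below, plain CCA stands in for the paper's learned discriminative latent space and a k-NN classifier stands in for its graph-based model, so only the project-classify-vote structure is shown; both substitutions and all names are our assumptions.

    # Illustrative project / classify / vote pipeline (stand-ins, not the
    # paper's method): CCA replaces the learned discriminative latent
    # space, k-NN replaces the graph-based classifier.
    import numpy as np
    from sklearn.cross_decomposition import CCA
    from sklearn.neighbors import KNeighborsClassifier

    def train(audio_feats, lyric_feats, sentence_moods, dim=32):
        # map paired sentence-level features into a common latent space
        cca = CCA(n_components=dim).fit(audio_feats, lyric_feats)
        A, L = cca.transform(audio_feats, lyric_feats)
        clf = KNeighborsClassifier(5).fit(np.hstack([A, L]), sentence_moods)
        return cca, clf

    def predict_song(cca, clf, song_audio, song_lyrics):
        A, L = cca.transform(song_audio, song_lyrics)
        votes = clf.predict(np.hstack([A, L]))    # one mood per sentence
        labels, counts = np.unique(votes, return_counts=True)
        return labels[counts.argmax()]            # simple majority vote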


9:40am - 10:00am

Robust Image Classification via Low-Rank Double Dictionary Learning

Yi Rong1,2, Shengwu Xiong1, Yongsheng Gao2

1Wuhan University of Technology, China; 2Griffith University, Australia

In recent years, dictionary learning has been widely used in various image classification applications. However, how to construct an effective dictionary for robust image classification, in which both the training and the testing image samples are corrupted, is still an open problem. To address this, we propose a novel low-rank double dictionary learning (LRD2L) method. Unlike traditional dictionary learning methods, LRD2L simultaneously learns three components from training data: 1) a low-rank class-specific sub-dictionary for each class that captures the most discriminative features of that class, 2) a low-rank class-shared dictionary that models the common patterns shared by different classes, and 3) a sparse error term that fits the noise in the data. As a result, the class-specific information, the class-shared information, and the noise contained in the data are separated from one another. Therefore, the dictionaries learned by LRD2L are noiseless, and the class-specific sub-dictionary of each class can be more discriminative. Moreover, since the common features across different classes, which are essential for reconstructing image samples, are preserved in the class-shared dictionary, LRD2L has a powerful reconstructive capability for unseen test samples. Experimental results on three publicly available datasets demonstrate the effectiveness and superiority of our approach compared to state-of-the-art dictionary learning methods.
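
The classification rule this decomposition implies can be sketched as follows: code a test sample over each class-specific sub-dictionary concatenated with the shared dictionary, and pick the class with the smallest reconstruction residual. Plain least squares stands in for the paper's sparse/low-rank coding, the dictionary learning itself is omitted, and all names are illustrative.

    # Hedged sketch of residual-based classification with learned
    # dictionaries; least squares replaces the paper's actual coding scheme.
    import numpy as np

    def classify(x, class_dicts, shared_dict):
        """x: (d,) test sample; class_dicts: list of (d, k_c) arrays D_c;
        shared_dict: (d, k_0) array D_0 of patterns common to all classes."""
        best_class, best_err = None, np.inf
        for c, D_c in enumerate(class_dicts):
            D = np.hstack([D_c, shared_dict])           # [D_c | D_0]
            a, *_ = np.linalg.lstsq(D, x, rcond=None)   # code x over class c
            err = np.linalg.norm(x - D @ a)             # reconstruction residual
            if err < best_err:
                best_class, best_err = c, err
        return best_class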


10:00am - 10:20am

Large-Scale Product Classification via Spatial Attention based CNN Learning and Multi-Class Regression

Shanshan Ai1, Caiyan Jia1, Zhineng Chen2

1School of Computer and Information Technology, Beijing Jiaotong University; 2Institute of Automation, Chinese Academy of Sciences

Large-scale product classification is an essential technique for better product understanding, and it can support online retailers in a number of ways. This paper discusses CNN-based product classification in the presence of a class hierarchy. A SaCNN-MCR method is developed to address this problem. It decomposes the classification into two stages. First, a spatial attention based CNN model that directly classifies an image into the leaf classes is proposed. Then, the output CNN scores, together with the class hierarchy clues, are jointly optimized by a multi-class regression (MCR) based refinement. The introduced spatial attention learning guides the network parameters to focus more on the product region rather than on the whole image, while the MCR-based refinement provides another kind of data fitting that further benefits the classification. Experiments on nearly one million real-world product images show that, thanks to these two innovations, SaCNN-MCR steadily improves the classification performance over CNN models without these modules. Moreover, the CNN feature characterizes product images much better than the traditional feature, outperforming it in classification by a large margin.
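
As a rough illustration of the kind of spatial attention the abstract refers to, the module below learns a per-location weight map from a convolutional feature map and re-weights the features so that later layers emphasize the product region. This is a generic formulation under our own assumptions, not the paper's exact SaCNN architecture, and the MCR refinement stage is not shown.

    # Generic spatial-attention block (a sketch, not the paper's code):
    # a 1x1 convolution scores each spatial location, a softmax turns the
    # scores into an attention map, and the map re-weights the features.
    import torch
    import torch.nn as nn

    class SpatialAttention(nn.Module):
        def __init__(self, channels):
            super().__init__()
            self.score = nn.Conv2d(channels, 1, kernel_size=1)

        def forward(self, feats):                  # feats: (N, C, H, W)
            n, _, h, w = feats.shape
            attn = self.score(feats).flatten(2)    # (N, 1, H*W) location scores
            attn = torch.softmax(attn, dim=-1).view(n, 1, h, w)
            return feats * attn                    # emphasize attended region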


