M3LH: Multi-Modal Multi-Label Hashing for Large Scale Data Search
Guan-Qun Yang, Xin-Shun Xu, Shanqing Guo, Xiao-Lin Wang
Shandong University, China, People's Republic of
Recently, hashing-based techniques have attracted much attention in the media search community. In many applications, data have multiple modalities and multiple labels. Many hashing methods have been proposed for multi-modal data; however, they seldom consider the scenario of multiple labels, or they only use such information to build a simple similarity matrix, e.g., the corresponding value is 1 when two samples share at least one label. Apparently, such methods cannot make full use of the information contained in multiple labels. Thus, a model can be expected to perform well if it makes full use of the information in multi-modal and multi-label data. Motivated by this, in this paper, we propose a new method, Multi-Modal Multi-Label Hashing (M3LH), which can not only work on multi-modal data but also make full use of the information contained in multiple labels. Specifically, in M3LH, we assume every label is associated with a binary code in Hamming space, and the binary code of a sample can be generated by combining the binary codes of its labels. While minimizing the Hamming distance between similar pairs and maximizing the Hamming distance between dissimilar pairs, we also learn a projection matrix that can be used to generate binary codes for out-of-sample data. Experimental results on three widely used data sets show that M3LH outperforms or is comparable to several state-of-the-art hashing methods.
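The label-code combination idea in the abstract can be illustrated with a minimal sketch. The toy codebook, the majority-vote combination rule, and the code length below are illustrative assumptions, not the paper's actual learned model:

```python
# Sketch of the M3LH-style idea: each label carries a {-1,+1} binary code,
# and a sample's code is derived by combining the codes of its labels.
# The majority-vote combination and the 4-bit toy codebook are assumptions
# made for illustration only.

def combine_label_codes(label_codes):
    """Combine several {-1,+1} label codes into one sample code by
    coordinate-wise majority vote (ties fall to +1)."""
    length = len(label_codes[0])
    combined = []
    for i in range(length):
        s = sum(code[i] for code in label_codes)
        combined.append(1 if s >= 0 else -1)
    return combined

def hamming_distance(a, b):
    """Number of coordinates where two codes disagree."""
    return sum(1 for x, y in zip(a, b) if x != y)

# Purely illustrative 4-bit label codebook.
labels = {
    "cat":    [ 1,  1, -1, -1],
    "indoor": [ 1, -1,  1, -1],
    "dog":    [-1,  1,  1,  1],
}

sample_a = combine_label_codes([labels["cat"], labels["indoor"]])
sample_b = combine_label_codes([labels["dog"]])
print(sample_a, sample_b, hamming_distance(sample_a, sample_b))
```

Similar multi-label samples share label codes and therefore end up close in Hamming space, which is the property the paper's pairwise objective optimizes.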
11:10am - 11:30am
Near-Duplicate Video Retrieval by Aggregating Intermediate CNN Layers
Giorgos Kordopatis-Zilos1, Symeon Papadopoulos1, Ioannis Patras2, Yiannis Kompatsiaris1
1Centre for Research and Technology Hellas, Greece; 2Queen Mary University of London
The problem of near-duplicate video retrieval has attracted increasing interest due to the exponential growth of video content on the Web, which exhibits considerable amounts of near duplication. Thus, it is essential to come up with efficient approaches to tackle this problem. Motivated by the outstanding performance of Convolutional Neural Networks (CNNs) over a wide variety of computer vision problems, we leverage intermediate CNN features in a novel global video representation by means of a layer-based feature aggregation scheme. We perform extensive experiments on the widely used CC_WEB_VIDEO dataset, evaluating three popular deep architectures (AlexNet, VGGNet, GoogLeNet) and demonstrating that the proposed approach exhibits superior performance over the state-of-the-art, achieving a mean Average Precision (mAP) score of 0.976, a 2.5% relative improvement over the best competing approach.
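Layer-based feature aggregation can be sketched as follows. The pooling operator (a per-channel maximum) and the toy layer shapes are illustrative assumptions; the paper's exact aggregation scheme may differ:

```python
# Sketch of layer-based CNN feature aggregation: reduce each intermediate
# layer's channel activations to one value per channel (max pooling here,
# an illustrative choice), then concatenate across layers into a single
# global descriptor for the frame/video.

def max_pool_channels(feature_maps):
    """feature_maps: list of 2-D channel maps (lists of lists).
    Returns the maximum activation of each channel."""
    return [max(max(row) for row in channel) for channel in feature_maps]

def aggregate_layers(layers):
    """Concatenate the per-channel maxima of every intermediate layer."""
    descriptor = []
    for feature_maps in layers:
        descriptor.extend(max_pool_channels(feature_maps))
    return descriptor

# Two toy "layers": layer 1 has two 2x2 channels, layer 2 has one.
layer1 = [[[0.1, 0.5], [0.3, 0.2]], [[0.9, 0.0], [0.4, 0.4]]]
layer2 = [[[0.7, 0.6], [0.2, 0.8]]]
print(aggregate_layers([layer1, layer2]))  # [0.5, 0.9, 0.8]
```

The resulting fixed-length descriptor can then be compared across videos with any standard similarity measure.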
11:30am - 11:50am
Fine-Grained Image Recognition from Click-Through Logs Using Deep Siamese Network
Wu Feng, Dong Liu
University of Science and Technology of China, China, People's Republic of
Image recognition using deep network models has achieved remarkable progress in recent years. However, fine-grained recognition such as dog breed classification remains a big challenge due to the lack of a large-scale, well-labeled dataset to train the network. In this paper, we study a deep-network-based method for dog breed recognition that utilizes click-through logs from search engines. We use both click counts and probability values to filter out the noise in the click-through logs. Furthermore, we propose a deep siamese network model to fine-tune the classifier, emphasizing the subtle differences between different breeds while tolerating the variation within the same breed. Our method is evaluated by training on the Bing Clickture-Dog dataset and testing on a well-labeled dog breed dataset. The results demonstrate a significant improvement achieved by our method compared with naive training.
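The siamese objective described above is commonly realized with a contrastive loss: pull same-breed embedding pairs together and push different-breed pairs at least a margin apart. The distance measure, margin value, and toy embeddings below are illustrative assumptions, not the paper's exact formulation:

```python
# Sketch of a contrastive loss for siamese training: same-class pairs are
# penalized by their squared distance, different-class pairs only when they
# fall inside the margin. Margin and embeddings are illustrative choices.
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_loss(emb1, emb2, same_class, margin=1.0):
    d = euclidean(emb1, emb2)
    if same_class:
        return d ** 2                    # similar pair: shrink the distance
    return max(0.0, margin - d) ** 2     # dissimilar pair: enforce the margin

# A close same-breed pair incurs a small loss; the same pair labeled as
# different breeds is penalized for sitting inside the margin.
print(contrastive_loss([0.0, 0.0], [0.1, 0.0], True))
print(contrastive_loss([0.0, 0.0], [0.1, 0.0], False))
```

Fine-tuning with such a pairwise loss shapes the embedding space so that within-breed variation is tolerated while between-breed differences are emphasized, matching the abstract's stated goal.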
11:50am - 12:10pm
No-Reference Image Quality Assessment based on Internal Generative Mechanism
Xinchun Qian, Wengang Zhou, Houqiang Li
University of Science and Technology of China, China, People's Republic of
No-reference (NR) image quality assessment (IQA) research aims to measure the visual quality of a distorted image without access to its non-distorted reference image. Recent neuroscience research indicates that the human visual system (HVS) perceives and understands perceptual signals with an internal generative mechanism (IGM). Based on the IGM, we propose a novel and effective no-reference IQA framework in this paper. First, we decompose an image into an orderly part and a disorderly part using a computational prediction model. Then we extract the joint statistics of two local contrast features from the orderly part and local binary pattern (LBP) based structural distributions from the disorderly part. Finally, the two groups of features extracted from the complementary parts are combined to train a regression model for image quality estimation. Extensive experiments on standard databases validate that the proposed IQA method is highly competitive with state-of-the-art NR-IQA methods. Moreover, the proposed metric also demonstrates its effectiveness on multiply-distorted images.
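The LBP structural feature mentioned in the abstract can be sketched in its basic 8-neighbour form. The neighbour ordering, the threshold convention, and the toy image are illustrative choices; the paper's LBP-based distributions may use a rotation-invariant or multi-scale variant:

```python
# Sketch of the basic 8-neighbour local binary pattern (LBP): each
# neighbour whose intensity is >= the centre pixel contributes one bit,
# read clockwise from the top-left neighbour. Interior pixels only; the
# bit ordering is an illustrative convention.

def lbp_code(img, r, c):
    """8-bit LBP code of the interior pixel at (r, c)."""
    centre = img[r][c]
    neighbours = [img[r-1][c-1], img[r-1][c], img[r-1][c+1], img[r][c+1],
                  img[r+1][c+1], img[r+1][c], img[r+1][c-1], img[r][c-1]]
    return sum(1 << i for i, n in enumerate(neighbours) if n >= centre)

img = [[10, 20, 30],
       [40, 50, 60],
       [70, 80, 90]]
print(lbp_code(img, 1, 1))  # 120: only the four brighter neighbours set bits
```

A histogram of such codes over a region gives the structural distribution that, combined with the contrast statistics from the orderly part, feeds the quality-regression model.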