Proceedings Article, DOI: 10.1109/CVPR.2017.340
HOPE: Hierarchical Object Prototype Encoding for Efficient Object Instance Search in Videos
Tan Yu, Yuwei Wu, Junsong Yuan +2 more
- 01 Jul 2017
- pp 3195-3204
TL;DR: This paper presents a simple yet effective hierarchical object prototype encoding (HOPE) model that accelerates object instance search without sacrificing accuracy by exploiting the spatial and temporal self-similarity of object proposals generated from video frames.
Abstract: This paper tackles the problem of efficient and effective object instance search in videos. To capture the relevance between a query and video frames and to localize the particular object precisely, we leverage object proposals to improve the quality of object instance search in videos. However, the hundreds of object proposals obtained from each frame can result in prohibitive memory and computational costs. To this end, we present a simple yet effective hierarchical object prototype encoding (HOPE) model that accelerates object instance search without sacrificing accuracy by exploiting the spatial and temporal self-similarity of object proposals generated from video frames. We design two types of sphere k-means methods, i.e., spatially-constrained sphere k-means and temporally-constrained sphere k-means, to learn frame-level object prototypes and dataset-level object prototypes, respectively. In this way, the object instance search problem is cast as a sparse matrix-vector multiplication problem. Thanks to the sparsity of the codes, both the memory and computational costs are significantly reduced. Experimental results on two video datasets demonstrate that our approach significantly improves the performance of video object instance search over other state-of-the-art fast search schemes.
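The abstract's core idea is to compare the query with a few prototypes instead of every proposal, then recover per-proposal scores with a sparse matrix-vector product. The minimal pure-Python sketch below illustrates this under loudly stated assumptions. It uses plain spherical k-means without the paper's spatial or temporal constraints. It uses a single level of prototypes rather than HOPE's two-level (frame-level and dataset-level) hierarchy. Its codes are one-hot, so each proposal maps to its nearest prototype with weight 1.0. The helper names (`sphere_kmeans`, `sparse_codes`, `sparse_matvec`, `search`) and the toy 2-D features are hypothetical and are not the authors' implementation.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit L2 norm; sphere k-means operates on the unit sphere."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sphere_kmeans(vectors, k, iters=20, seed=0):
    """Plain spherical k-means (assumption: no spatial/temporal constraints).

    Assigns each unit vector to the centroid with the highest cosine similarity,
    then re-estimates each centroid as the normalized mean of its members.
    """
    rng = random.Random(seed)
    data = [normalize(v) for v in vectors]
    centroids = [list(c) for c in rng.sample(data, k)]
    assign = [0] * len(data)
    for _ in range(iters):
        # Assignment step: cosine similarity equals the dot product of unit vectors.
        assign = [max(range(k), key=lambda j: dot(x, centroids[j])) for x in data]
        # Update step: mean of members, projected back onto the unit sphere.
        for j in range(k):
            members = [x for x, a in zip(data, assign) if a == j]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[j] = normalize(mean)
    return centroids, assign


def sparse_codes(vectors, centroids):
    """Hypothetical one-hot code: each row holds a single (prototype index, weight) pair."""
    codes = []
    for v in vectors:
        u = normalize(v)
        j = max(range(len(centroids)), key=lambda c: dot(u, centroids[c]))
        codes.append([(j, 1.0)])
    return codes


def sparse_matvec(rows, dense):
    """Multiply a sparse matrix (list of rows of (column, value)) by a dense vector."""
    return [sum(val * dense[col] for col, val in row) for row in rows]


def search(query, centroids, codes, frame_of):
    """Score each frame by its best proposal, using prototype similarities only."""
    q = normalize(query)
    proto_sims = [dot(q, c) for c in centroids]        # k dense dot products
    proposal_sims = sparse_matvec(codes, proto_sims)   # one sparse mat-vec
    frame_scores = {}
    for f, s in zip(frame_of, proposal_sims):
        frame_scores[f] = max(s, frame_scores.get(f, -math.inf))
    return frame_scores


# Toy usage: two frames with two proposals each (hypothetical 2-D features).
proposals = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.95]]
frame_of = [0, 0, 1, 1]
centroids, _ = sphere_kmeans(proposals, k=2)
codes = sparse_codes(proposals, centroids)
scores = search([1.0, 0.0], centroids, codes, frame_of)
best = max(scores, key=scores.get)  # frame 0 holds the proposals aligned with the query
```

In this sketch the index stores only k dense prototypes plus one (index, weight) pair per proposal, rather than one dense feature per proposal. Query time drops from one dot product per proposal to k dot products and a sparse lookup. The paper's hierarchical version would apply a similar encoding at a second level; that step is not reproduced here.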
Citations
Product Quantization Network for Fast Image Retrieval
Tan Yu, Junsong Yuan, Chen Fang, Hailin Jin +3 more
- 08 Sep 2018
TL;DR: Through the proposed product quantization network, the authors obtain a discriminative and compact image representation in an end-to-end manner, which in turn enables fast and accurate image retrieval.
GilBERT: Generative Vision-Language Pre-Training for Image-Text Retrieval
Weixiang Hong, Kaixiang Ji, Jiajia Liu, Wang Jian, Jingdong Chen, Wei Chu +5 more
- 11 Jul 2021
TL;DR: This paper proposes a generative visual-linguistic pre-training approach that simultaneously learns generic representations of image-text data and completes the missing modality for incomplete pairs.
Fried Binary Embedding for High-Dimensional Visual Features
Weixiang Hong, Junsong Yuan, Sreyasee Das Bhattacharjee +2 more
- 01 Jul 2017
TL;DR: This paper introduces a new binary embedding method, Fried Binary Embedding, which reduces the high computational and memory cost of projecting high-dimensional visual features into binary codes.
Data-Driven Lightweight Interest Point Selection for Large-Scale Visual Search
TL;DR: This paper proposes a data-driven lightweight interest point selection approach that significantly improves visual search performance while also making feature descriptor extraction more efficient.