Journal Article, DOI: 10.1109/TPAMI.2011.273
Cross-Domain Multicue Fusion for Concept-Based Video Indexing
Ming-Fang Weng, Yung-Yu Chuang
27
TL;DR: A framework is proposed that jointly exploits multiple cues across multiple video domains. Recursive algorithms learn both interconcept and intershot relationships from annotations, and the framework achieves significant improvements over popular baselines.
Abstract: The success of query-by-concept, proposed recently to cater to video retrieval needs, depends greatly on the accuracy of concept-based video indexing. Unfortunately, it remains a challenge to recognize the presence of concepts in a video segment or to extract an objective linguistic description from it because of the semantic gap, that is, the lack of correspondence between machine-extracted low-level features and human high-level conceptual interpretation. This paper studies three issues with the aim to reduce such a gap: 1) how to explore cues beyond low-level features, 2) how to combine diverse cues to improve performance, and 3) how to utilize the learned knowledge when applying it to a new domain. To solve these problems, we propose a framework that jointly exploits multiple cues across multiple video domains. First, recursive algorithms are proposed to learn both interconcept and intershot relationships from annotations. Second, all concept labels for all shots are simultaneously refined in a single fusion model. Additionally, unseen shots are assigned pseudolabels according to their initial prediction scores so that contextual and temporal relationships can be learned, thus requiring no additional human effort. Integration of cues embedded within training and testing video sets accommodates domain change. Experiments on popular benchmarks show that our framework is effective, achieving significant improvements over popular baselines.
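The pseudolabeling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the fixed threshold, the conditional co-occurrence estimate, and the adjacent-shot agreement measure are simplifying assumptions, and the function names (`assign_pseudolabels`, `interconcept_cooccurrence`, `intershot_agreement`) are hypothetical. The sketch only shows how initial prediction scores on unseen shots could yield labels from which contextual (interconcept) and temporal (intershot) statistics are estimated without extra annotation.

```python
from typing import List

Matrix = List[List[float]]


def assign_pseudolabels(scores: Matrix, threshold: float = 0.5) -> List[List[int]]:
    """Binarize initial per-concept scores for each shot (assumed fixed threshold)."""
    return [[1 if s >= threshold else 0 for s in shot] for shot in scores]


def interconcept_cooccurrence(labels: List[List[int]]) -> Matrix:
    """Estimate P(concept j present | concept i present) from binary labels."""
    n = len(labels[0])
    cond: Matrix = []
    for i in range(n):
        shots_with_i = [shot for shot in labels if shot[i]]
        if not shots_with_i:
            # Concept i never occurs, so no conditional estimate is available.
            cond.append([0.0] * n)
            continue
        cond.append(
            [sum(shot[j] for shot in shots_with_i) / len(shots_with_i) for j in range(n)]
        )
    return cond


def intershot_agreement(labels: List[List[int]]) -> List[float]:
    """Per concept, the fraction of adjacent shot pairs that share the same label."""
    n = len(labels[0])
    pairs = list(zip(labels, labels[1:]))
    if not pairs:
        return [0.0] * n
    return [sum(a[c] == b[c] for a, b in pairs) / len(pairs) for c in range(n)]


# Hypothetical initial scores: 4 consecutive shots, 3 concepts.
scores = [
    [0.9, 0.8, 0.1],
    [0.7, 0.6, 0.2],
    [0.2, 0.1, 0.9],
    [0.8, 0.3, 0.4],
]
pseudolabels = assign_pseudolabels(scores)
cooccurrence = interconcept_cooccurrence(pseudolabels)
agreement = intershot_agreement(pseudolabels)
```

In this toy data, concept 1 occurs only in shots where concept 0 is also present, so the estimate of P(concept 0 | concept 1) is 1.0. A fusion model could use such statistics to refine the initial scores.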
Citations
Exploiting Feature and Class Relationships in Video Categorization with Regularized Deep Neural Networks
TL;DR: By giving the DNN a better capability to harness both feature and class relationships, the proposed regularized DNN (rDNN) is more suitable for modeling video semantics.
466
A Survey of Content-Aware Video Analysis for Sports
TL;DR: This paper focuses on the video content analysis techniques applied in sportscasts over the past decade from the perspectives of fundamentals and general review, a content hierarchical model, and trends and challenges.
251
Multisensor Data Fusion
Hugh Durrant-Whyte, Thomas C. Henderson
- 01 Jan 2016
TL;DR: Multisensor data fusion is the process of combining observations from a number of different sensors to provide a robust and complete description of an environment or process of interest.
192
Exploring Inter-feature and Inter-class Relationships with Deep Neural Networks for Video Classification
Zuxuan Wu, Yu-Gang Jiang, Jun Wang, Jian Pu, Xiangyang Xue
- 03 Nov 2014
TL;DR: A novel unified framework is proposed that jointly learns feature relationships and exploits class relationships for improved video classification performance; the framework is shown to exhibit superior performance over several state-of-the-art approaches.
184
Cross-camera knowledge transfer for multiview people counting
TL;DR: A novel two-pass framework is introduced for counting the number of people in an environment where multiple cameras provide different views of the subjects, together with an algorithm that matches groups of pedestrians in images captured by different cameras.
71
References
Correlative multi-label video annotation
Guo-Jun Qi, Xian-Sheng Hua, Yong Rui, Jinhui Tang, Tao Mei, Hong-Jiang Zhang
- 29 Sep 2007
TL;DR: A third paradigm is proposed which simultaneously classifies concepts and models correlations between them in a single step by using a novel Correlative Multi-Label (CML) framework and is compared with the state-of-the-art approaches in the first and second paradigms on the widely used TRECVID data set.
Content-Based Multimedia Information Retrieval
黄丽娟
TL;DR: This paper introduces concepts and methods for content-based multimedia retrieval, analyzing characteristics, key technologies, system models, and various types of multimedia information retrieval, providing a comprehensive overview of the field.
Multimodal Video Indexing: A Review of the State-of-the-art
Cees G. M. Snoek, Marcel Worring
- 01 Jan 2001
TL;DR: In this paper, a unifying and multimodal framework is proposed to view a video document from the perspective of its author, which forms the guiding principle for identifying index types, for which automatic methods are found in literature.
VisualRank: Applying PageRank to Large-Scale Image Search
Yushi Jing, Shumeet Baluja
TL;DR: This work casts the image-ranking problem as the task of identifying "authority" nodes on an inferred visual similarity graph, proposes VisualRank to analyze the visual link structures among images, and describes the techniques required to make this system practical for large-scale deployment in commercial search engines.
Intelligent Multimedia Group of Tsinghua University at TRECVID 2006
Jie Cao, Yanxiang Lan, Jianmin Li, Qiang Li, Xirong Li, Fuzong Lin, Xiaobing Liu, Linjie Luo, Wanli Peng, Dong Wang, Huiyi Wang, Zhikun Wang, Zhen Xiang, Jinhui Yuan, Bo Zhang, Jun Zhang, Leigang Zhang, Xiao Zhang, Wujie Zheng
- 01 Jan 2006
TL;DR: The results indicate that the weight and select fusion algorithm works surprisingly well, better than all variations of the RankBoost and StackSVM fusion algorithms.