Journal Article, DOI: 10.1109/TPAMI.2011.273
Cross-Domain Multicue Fusion for Concept-Based Video Indexing
Ming-Fang Weng, Yung-Yu Chuang
27
TL;DR: The paper proposes a framework that jointly exploits multiple cues across multiple video domains, with recursive algorithms that learn both interconcept and intershot relationships from annotations; it achieves significant improvements over popular baselines.
Abstract: The success of query-by-concept, proposed recently to cater to video retrieval needs, depends greatly on the accuracy of concept-based video indexing. Unfortunately, it remains a challenge to recognize the presence of concepts in a video segment or to extract an objective linguistic description from it because of the semantic gap, that is, the lack of correspondence between machine-extracted low-level features and human high-level conceptual interpretation. This paper studies three issues with the aim to reduce such a gap: 1) how to explore cues beyond low-level features, 2) how to combine diverse cues to improve performance, and 3) how to utilize the learned knowledge when applying it to a new domain. To solve these problems, we propose a framework that jointly exploits multiple cues across multiple video domains. First, recursive algorithms are proposed to learn both interconcept and intershot relationships from annotations. Second, all concept labels for all shots are simultaneously refined in a single fusion model. Additionally, unseen shots are assigned pseudolabels according to their initial prediction scores so that contextual and temporal relationships can be learned, thus requiring no additional human effort. Integration of cues embedded within training and testing video sets accommodates domain change. Experiments on popular benchmarks show that our framework is effective, achieving significant improvements over popular baselines.
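The pseudolabeling and cue-fusion ideas in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm: the threshold rule, the co-occurrence estimate, the averaging scheme, and every name below (`pseudolabel`, `concept_cooccurrence`, `refine`, `threshold`, `alpha`) are assumptions made for illustration only. It shows initial detector scores for unseen shots being binarized into pseudolabels, interconcept relationships being estimated from those labels, and each score being blended with contextual (other concepts) and temporal (neighboring shots) evidence.

```python
def pseudolabel(scores, threshold=0.5):
    """Binarize initial detector scores (shots x concepts) into pseudolabels.

    The fixed threshold is an assumption; the paper only says pseudolabels
    follow from the initial prediction scores.
    """
    return [[1 if s >= threshold else 0 for s in row] for row in scores]


def concept_cooccurrence(labels):
    """Estimate P(concept j present | concept i present) from (pseudo)labels."""
    n_concepts = len(labels[0])
    cond = [[0.0] * n_concepts for _ in range(n_concepts)]
    for i in range(n_concepts):
        count_i = sum(row[i] for row in labels)
        if not count_i:
            continue  # concept i never observed: leave its row at zero
        for j in range(n_concepts):
            both = sum(1 for row in labels if row[i] and row[j])
            cond[i][j] = both / count_i
    return cond


def refine(scores, cond, alpha=0.5):
    """Blend each score with contextual and temporal evidence (toy fusion)."""
    n_shots, n_concepts = len(scores), len(scores[0])
    refined = []
    for t in range(n_shots):
        # Temporal cue: the immediately preceding and following shots.
        neighbors = [scores[u] for u in (t - 1, t + 1) if 0 <= u < n_shots]
        row = []
        for j in range(n_concepts):
            others = [i for i in range(n_concepts) if i != j]
            # Contextual cue: other concepts' scores weighted by co-occurrence.
            if others:
                context = sum(scores[t][i] * cond[i][j] for i in others) / len(others)
            else:
                context = scores[t][j]
            if neighbors:
                temporal = sum(n[j] for n in neighbors) / len(neighbors)
            else:
                temporal = scores[t][j]
            row.append((1 - alpha) * scores[t][j] + alpha * 0.5 * (context + temporal))
        refined.append(row)
    return refined


if __name__ == "__main__":
    initial = [[0.9, 0.8], [0.7, 0.6], [0.2, 0.1]]  # made-up scores: 3 shots, 2 concepts
    labels = pseudolabel(initial)
    print(refine(initial, concept_cooccurrence(labels)))
```

Because the relationships are estimated from pseudolabels rather than manual annotations, this toy pipeline needs no labeled data from the target videos, which mirrors the abstract's point about requiring no additional human effort.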
Citations
Exploiting Feature and Class Relationships in Video Categorization with Regularized Deep Neural Networks
TL;DR: Through arming the DNN with better capability of harnessing both the feature and the class relationships, the proposed regularized DNN (rDNN) is more suitable for modeling video semantics.
466
A Survey of Content-Aware Video Analysis for Sports
TL;DR: This paper focuses on the video content analysis techniques applied in sportscasts over the past decade from the perspectives of fundamentals and general review, a content hierarchical model, and trends and challenges.
251
Multisensor Data Fusion
Hugh Durrant-Whyte, Thomas C. Henderson
- 01 Jan 2016
TL;DR: Multisensor data fusion is the process of combining observations from a number of different sensors to provide a robust and complete description of an environment or process of interest.
192
Exploring Inter-feature and Inter-class Relationships with Deep Neural Networks for Video Classification
Zuxuan Wu, Yu-Gang Jiang, Jun Wang, Jian Pu, Xiangyang Xue
- 03 Nov 2014
TL;DR: A novel unified framework is proposed that jointly learns feature relationships and exploits class relationships for improved video classification; it exhibits superior performance over several state-of-the-art approaches.
184
Cross-camera knowledge transfer for multiview people counting
TL;DR: A novel two-pass framework for counting the number of people in an environment, where multiple cameras provide different views of the subjects, and an algorithm that matches groups of pedestrians in images captured by different cameras is introduced.
71
References
Correlative multilabel video annotation with temporal kernels
TL;DR: This article proposes another paradigm for video annotation, the Correlative Multilabel (CML) method, which annotates the concepts and models the correlations between them in a single step, benefiting from the complementary information that different labels provide.
Context-Based Concept Fusion with Boosted Conditional Random Fields
Wei Jiang, Shih-Fu Chang, Alexander C. Loui
- 15 Apr 2007
TL;DR: A new context-based concept fusion (CBCF) method for semantic concept detection by a conditional random field (CRF) that improves detection results from independent detectors by taking into account the inter-correlation among concepts.
Exploiting human actions and object context for recognition tasks
Darnell Moore, Irfan Essa, Monson H. Hayes
- 01 Jan 1999
TL;DR: This work introduces a framework for recognizing actions and objects by measuring image-, object- and action-based information from video, which is appropriate for locating and classifying objects under a variety of conditions including full occlusion.
Towards optimal bag-of-features for object categorization and semantic video retrieval
Yu-Gang Jiang, Chong-Wah Ngo, Jun Yang
- 09 Jul 2007
TL;DR: This paper evaluates various factors which govern the performance of Bag-of-features, and proposes a novel soft-weighting method to assess the significance of a visual word to an image and experimentally shows it can consistently offer better performance than other popular weighting methods.