Cross-Domain Multicue Fusion for Concept-Based Video Indexing

doi:10.1109/TPAMI.2011.273

Journal Article

Ming-Fang Weng, +1 more

- 01 Oct 2012

- IEEE Transactions on Pattern Analysis an...

- Vol. 34, Iss. 10, pp. 1927-1941

27

TL;DR: The paper proposes a framework that jointly exploits multiple cues across multiple video domains, using recursive algorithms to learn both interconcept and intershot relationships from annotations; it achieves significant improvements over popular baselines.

Abstract: The success of query-by-concept, proposed recently to cater to video retrieval needs, depends greatly on the accuracy of concept-based video indexing. Unfortunately, it remains a challenge to recognize the presence of concepts in a video segment or to extract an objective linguistic description from it because of the semantic gap, that is, the lack of correspondence between machine-extracted low-level features and human high-level conceptual interpretation. This paper studies three issues with the aim to reduce such a gap: 1) how to explore cues beyond low-level features, 2) how to combine diverse cues to improve performance, and 3) how to utilize the learned knowledge when applying it to a new domain. To solve these problems, we propose a framework that jointly exploits multiple cues across multiple video domains. First, recursive algorithms are proposed to learn both interconcept and intershot relationships from annotations. Second, all concept labels for all shots are simultaneously refined in a single fusion model. Additionally, unseen shots are assigned pseudolabels according to their initial prediction scores so that contextual and temporal relationships can be learned, thus requiring no additional human effort. Integration of cues embedded within training and testing video sets accommodates domain change. Experiments on popular benchmarks show that our framework is effective, achieving significant improvements over popular baselines.
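
The abstract gives no formulas, so the following is only a minimal illustrative sketch of the general idea it describes, not the authors' recursive algorithms or their fusion model. It assumes initial detector scores in [0, 1], a fixed pseudolabel threshold, conditional co-occurrence as the interconcept cue, adjacent-shot averaging as the intershot cue, and a fixed linear blend; the function names, the threshold, and the weights are all hypothetical.

```python
def pseudolabel(scores, threshold=0.5):
    """Binarize initial detector scores into pseudolabels (1 = concept present).

    The threshold is an assumed value, not one taken from the paper.
    """
    return [[1 if s >= threshold else 0 for s in shot] for shot in scores]


def concept_cooccurrence(labels):
    """Estimate P(concept j present | concept i present) from (pseudo)labels."""
    n_concepts = len(labels[0])
    cond = [[0.0] * n_concepts for _ in range(n_concepts)]
    for i in range(n_concepts):
        count_i = sum(shot[i] for shot in labels)
        if count_i == 0:
            continue  # concept i never observed; leave its row at zero
        for j in range(n_concepts):
            both = sum(shot[i] * shot[j] for shot in labels)
            cond[i][j] = both / count_i
    return cond


def refine(scores, cond, w_context=0.25, w_temporal=0.25):
    """Blend each score with a contextual cue and a temporal cue.

    This is a simple linear blend chosen for illustration only.
    """
    n_shots, n_concepts = len(scores), len(scores[0])
    w_self = 1.0 - w_context - w_temporal
    refined = []
    for t in range(n_shots):
        row = []
        for j in range(n_concepts):
            # Contextual cue: other concepts' scores weighted by co-occurrence.
            others = [i for i in range(n_concepts) if i != j]
            if others:
                context = sum(scores[t][i] * cond[i][j] for i in others) / len(others)
            else:
                context = scores[t][j]
            # Temporal cue: mean score of the same concept in adjacent shots.
            neighbors = [scores[u][j] for u in (t - 1, t + 1) if 0 <= u < n_shots]
            temporal = sum(neighbors) / len(neighbors) if neighbors else scores[t][j]
            row.append(w_self * scores[t][j] + w_context * context + w_temporal * temporal)
        refined.append(row)
    return refined


if __name__ == "__main__":
    # Three consecutive shots, two concepts; the scores are made up.
    initial = [[0.9, 0.8], [0.7, 0.4], [0.2, 0.1]]
    labels = pseudolabel(initial)
    cond = concept_cooccurrence(labels)
    for row in refine(initial, cond):
        print([round(v, 3) for v in row])
```

Because the pseudolabels come from the detector's own scores on unseen shots, the co-occurrence statistics can be re-estimated on a new video domain without extra annotation, which is the property the abstract emphasizes.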

Citations

•Journal Article•10.1109/TPAMI.2017.2670560

Exploiting Feature and Class Relationships in Video Categorization with Regularized Deep Neural Networks

Yu-Gang Jiang, +4 more

- 01 Feb 2018

- IEEE Transactions on Pattern Analysis an...

TL;DR: By equipping the DNN to better harness both the feature and the class relationships, the proposed regularized DNN (rDNN) becomes more suitable for modeling video semantics.

466

•Journal Article•10.1109/TCSVT.2017.2655624

A Survey of Content-Aware Video Analysis for Sports

Huang-Chia Shih

- 01 May 2018

- IEEE Transactions on Circuits and System...

TL;DR: This paper focuses on the video content analysis techniques applied in sportscasts over the past decade from the perspectives of fundamentals and general review, a content hierarchical model, and trends and challenges.

251

•Book Chapter•10.1007/978-3-319-32552-1_35

Multisensor Data Fusion

Hugh Durrant-Whyte, +1 more

- 01 Jan 2016

TL;DR: Multisensor data fusion is the process of combining observations from a number of different sensors to provide a robust and complete description of an environment or process of interest.

192

•Proceedings Article•10.1145/2647868.2654931

Exploring Inter-feature and Inter-class Relationships with Deep Neural Networks for Video Classification

Zuxuan Wu, +4 more

- 03 Nov 2014

TL;DR: A novel unified framework is proposed that jointly learns feature relationships and exploits class relationships for improved video classification performance; it is shown to outperform several state-of-the-art approaches.

184

•Journal Article•10.1109/TIP.2014.2363445

Cross-camera knowledge transfer for multiview people counting.

Nick C. Tang, +3 more

- 01 Jan 2015

- IEEE Transactions on Image Processing

TL;DR: This paper introduces a novel two-pass framework for counting the number of people in an environment where multiple cameras provide different views of the subjects, along with an algorithm that matches groups of pedestrians in images captured by different cameras.

71

References

•Book

Numerical recipes in C (2nd ed.): the art of scientific computing

William H. Press, +3 more

- 01 Dec 1992

•Journal Article•10.1145/1404880.1404883

Correlative multilabel video annotation with temporal kernels

Guo-Jun Qi, +6 more

- 30 Oct 2008

- ACM Transactions on Multimedia Computing...

TL;DR: This article proposes a different paradigm for video annotation, the Correlative Multilabel (CML) method, which annotates the concepts and models the correlations between them in a single step and benefits from the complementary information between different labels.

•Proceedings Article•10.1109/ICASSP.2007.366066

Context-Based Concept Fusion with Boosted Conditional Random Fields

Wei Jiang, +2 more

- 15 Apr 2007

TL;DR: A new context-based concept fusion (CBCF) method for semantic concept detection uses a conditional random field (CRF) to improve detection results from independent detectors by taking into account the intercorrelation among concepts.

•Proceedings Article•10.1109/ICCV.1999.791201

Exploiting human actions and object context for recognition tasks

Darnell Moore, +2 more

- 01 Jan 1999

TL;DR: This work introduces a framework for recognizing actions and objects by measuring image-, object- and action-based information from video, which is appropriate for locating and classifying objects under a variety of conditions including full occlusion.

•Proceedings Article•10.1145/1282280.1282352

Towards optimal bag-of-features for object categorization and semantic video retrieval

Yu-Gang Jiang, +2 more

- 09 Jul 2007

TL;DR: This paper evaluates various factors that govern the performance of bag-of-features and proposes a novel soft-weighting method to assess the significance of a visual word to an image, showing experimentally that it consistently offers better performance than other popular weighting methods.

Related Papers (5)

Distinctive Image Features from Scale-Invariant Keypoints

Early versus late fusion in semantic video analysis

Convex multi-task feature learning

Concept-Based Video Retrieval

Multiple Features But Few Labels?: A Symbiotic Solution Exemplified for Video Analysis