Cross-Domain Multicue Fusion for Concept-Based Video Indexing

doi:10.1109/TPAMI.2011.273

Journal Article•10.1109/TPAMI.2011.273

Ming-Fang Weng, +1 more

- 01 Oct 2012

- IEEE Transactions on Pattern Analysis an...

- Vol. 34, Iss: 10, pp 1927-1941

27

TL;DR: The paper proposes a framework that jointly exploits multiple cues across multiple video domains, with recursive algorithms that learn both interconcept and intershot relationships from annotations; the framework achieves significant improvements over popular baselines.

Abstract: The success of query-by-concept, proposed recently to cater to video retrieval needs, depends greatly on the accuracy of concept-based video indexing. Unfortunately, it remains a challenge to recognize the presence of concepts in a video segment or to extract an objective linguistic description from it because of the semantic gap, that is, the lack of correspondence between machine-extracted low-level features and human high-level conceptual interpretation. This paper studies three issues with the aim to reduce such a gap: 1) how to explore cues beyond low-level features, 2) how to combine diverse cues to improve performance, and 3) how to utilize the learned knowledge when applying it to a new domain. To solve these problems, we propose a framework that jointly exploits multiple cues across multiple video domains. First, recursive algorithms are proposed to learn both interconcept and intershot relationships from annotations. Second, all concept labels for all shots are simultaneously refined in a single fusion model. Additionally, unseen shots are assigned pseudolabels according to their initial prediction scores so that contextual and temporal relationships can be learned, thus requiring no additional human effort. Integration of cues embedded within training and testing video sets accommodates domain change. Experiments on popular benchmarks show that our framework is effective, achieving significant improvements over popular baselines.
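As a rough illustration of the pseudolabeling step mentioned in the abstract, the sketch below turns initial prediction scores for unseen shots into pseudolabels. The abstract does not state the actual assignment rule, so the function name `assign_pseudolabels`, the two thresholds, and the choice to leave uncertain shots unlabeled are all assumptions made for illustration, not the authors' method.

```python
from typing import List, Optional


def assign_pseudolabels(
    scores: List[float],
    pos_threshold: float = 0.8,  # hypothetical cutoff, not from the paper
    neg_threshold: float = 0.2,  # hypothetical cutoff, not from the paper
) -> List[Optional[int]]:
    """Map each shot's initial score for one concept to a pseudolabel.

    Returns 1 (concept present), 0 (concept absent), or None when the
    score is too uncertain to be used as a label.
    """
    labels: List[Optional[int]] = []
    for score in scores:
        if score >= pos_threshold:
            labels.append(1)
        elif score <= neg_threshold:
            labels.append(0)
        else:
            labels.append(None)
    return labels


if __name__ == "__main__":
    # Initial detector scores for three unseen shots (made-up values).
    print(assign_pseudolabels([0.9, 0.1, 0.5]))
```

Pseudolabels produced this way could then stand in for human annotations when estimating contextual and temporal relationships on the test set, which is the role the abstract describes for them.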

Citations

•Journal Article•10.1109/TPAMI.2017.2670560

Exploiting Feature and Class Relationships in Video Categorization with Regularized Deep Neural Networks

Yu-Gang Jiang, +4 more

- 01 Feb 2018

- IEEE Transactions on Pattern Analysis an...

TL;DR: By equipping the DNN with a better capability to harness both feature and class relationships, the proposed regularized DNN (rDNN) becomes more suitable for modeling video semantics.

466

•Journal Article•10.1109/TCSVT.2017.2655624

A Survey of Content-Aware Video Analysis for Sports

Huang-Chia Shih

- 01 May 2018

- IEEE Transactions on Circuits and System...

TL;DR: This paper focuses on the video content analysis techniques applied in sportscasts over the past decade from the perspectives of fundamentals and general review, a content hierarchical model, and trends and challenges.

251

Book Chapter•10.1007/978-3-319-32552-1_35

Multisensor Data Fusion

Hugh Durrant-Whyte, +1 more

- 01 Jan 2016

TL;DR: Multisensor data fusion is the process of combining observations from a number of different sensors to provide a robust and complete description of an environment or process of interest.

192

Proceedings Article•10.1145/2647868.2654931

Exploring Inter-feature and Inter-class Relationships with Deep Neural Networks for Video Classification

Zuxuan Wu, +4 more

- 03 Nov 2014

TL;DR: A novel unified framework is proposed that jointly learns feature relationships and exploits class relationships for improved video classification; the framework is shown to outperform several state-of-the-art approaches.

184

Journal Article•10.1109/TIP.2014.2363445

Cross-camera knowledge transfer for multiview people counting.

Nick C. Tang, +3 more

- 01 Jan 2015

- IEEE Transactions on Image Processing

TL;DR: A novel two-pass framework for counting the number of people in an environment, where multiple cameras provide different views of the subjects, and an algorithm that matches groups of pedestrians in images captured by different cameras is introduced.

71

References

•Proceedings Article•10.1145/1291233.1291245

Correlative multi-label video annotation

Guo-Jun Qi, +5 more

- 29 Sep 2007

TL;DR: A third paradigm is proposed which simultaneously classifies concepts and models correlations between them in a single step by using a novel Correlative Multi-Label (CML) framework and is compared with the state-of-the-art approaches in the first and second paradigms on the widely used TRECVID data set.

10.3969/j.issn.1002-1965.2000.05.011

Content-Based Multimedia Information Retrieval

黄丽娟

TL;DR: This paper introduces concepts and methods for content-based multimedia retrieval, analyzing characteristics, key technologies, system models, and various types of multimedia information retrieval, providing a comprehensive overview of the field.

Multimodal Video Indexing: A Review of the State-of-the-art

Cees G. M. Snoek, +1 more

- 01 Jan 2001

TL;DR: This paper proposes a unifying, multimodal framework that views a video document from the perspective of its author; this perspective serves as the guiding principle for identifying index types, for which automatic methods are found in the literature.

Journal Article•10.1109/TPAMI.2008.121

VisualRank: Applying PageRank to Large-Scale Image Search

Yushi Jing, +1 more

- 01 Nov 2008

- IEEE Transactions on Pattern Analysis an...

TL;DR: This work casts the image-ranking problem as the task of identifying "authority" nodes on an inferred visual similarity graph, proposes VisualRank to analyze the visual link structures among images, and describes the techniques required to make the system practical for large-scale deployment in commercial search engines.

References

Intelligent Multimedia Group of Tsinghua University at TRECVID 2006

Related Papers (5)

Distinctive Image Features from Scale-Invariant Keypoints

Early versus late fusion in semantic video analysis

Convex multi-task feature learning

Concept-Based Video Retrieval

Multiple Features But Few Labels?: A Symbiotic Solution Exemplified for Video Analysis