Multimodal Language Analysis in the Wild: CMU-MOSEI Dataset and Interpretable Dynamic Fusion Graph

doi:10.18653/V1/P18-1208

Open Access•Proceedings Article•10.18653/V1/P18-1208

AmirAli Bagher Zadeh, +4 more

- 01 Jul 2018

- Vol. 1, pp 2236-2246

1.1K

TL;DR: This paper introduces CMU Multimodal Opinion Sentiment and Emotion Intensity (CMU-MOSEI), the largest dataset for sentiment analysis and emotion recognition to date, and uses a novel multimodal fusion technique called the Dynamic Fusion Graph (DFG), which is highly interpretable and achieves competitive performance compared to the previous state of the art.

Citations

10.1109/access.2023.3280187

VideoAdviser: Video Knowledge Distillation for Multimodal Transfer Learning.

Yanan Wang, +3 more

TL;DR: This paper proposes VideoAdviser, a video knowledge distillation method for multimodal transfer learning, achieving high efficiency-performance by transferring multimodal knowledge from a CLIP-based teacher to a RoBERTa-based student, improving performance by up to 12.3% in sentiment analysis and 3.4% in audio-visual retrieval.

Journal Article•10.1016/J.PATREC.2021.03.025

Cross-modal context-gated convolution for multi-modal sentiment analysis

Huanglu Wen, +2 more

- 01 Jun 2021

- Pattern Recognition Letters

TL;DR: This work proposes cross-modal context-gated convolution for unaligned sequences, which captures local cross-modal interactions and handles misalignment while reducing the effect of unrelated information.

•Proceedings Article•10.18653/V1/2021.NAACL-MAIN.216

MUSER: MUltimodal Stress detection using Emotion Recognition as an Auxiliary Task

Yiqun Yao, +4 more

- 01 Jun 2021

TL;DR: This work proposes MUSER – a transformer-based model architecture and a novel multi-task learning algorithm with speed-based dynamic sampling strategy that is effective for stress detection with both internal and external auxiliary tasks, and achieves state-of-the-art results.

•Journal Article•10.7717/PEERJ-CS.246

Linking emotions to behaviors through deep transfer learning.

Haoqi Li, +2 more

- 06 Jan 2020

- PeerJ

TL;DR: Through the analysis, it is found that emotion-related information is an important cue for behavior recognition and the importance of emotional-context in the expression of behavior is investigated by constraining (or not) the neural networks’ contextual view of the data.

Proceedings Article•10.1109/icassp48485.2024.10446040

A Novel Multimodal Sentiment Analysis Model Based on Gated Fusion and Multi-Task Learning

Xin Sun, +2 more

- 14 Apr 2024

TL;DR: This work proposes a novel model for multimodal sentiment analysis based on gated fusion and multi-task learning that outperforms existing methods and achieves state-of-the-art performance.

References

•Journal Article•10.1023/A:1010933404324

Random Forests

Leo Breiman

- 01 Oct 2001

TL;DR: Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the forest, and are also applicable to regression.

113.1K

Journal Article•10.1162/NECO.1997.9.8.1735

Long short-term memory

Sepp Hochreiter, +1 more

- 01 Nov 1997

- Neural Computation

TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.

99K

•Journal Article•10.1023/A:1022627411411

Support-Vector Networks

Corinna Cortes, +1 more

- 15 Sep 1995

- Machine Learning

TL;DR: High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated and the performance of the support-vector network is compared to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.

42K

Proceedings Article•10.3115/V1/D14-1162

GloVe: Global Vectors for Word Representation

Jeffrey Pennington, +2 more

- 01 Oct 2014

TL;DR: This work introduces a new global log-bilinear regression model that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.

41.6K