Multimodal Language Analysis in the Wild: CMU-MOSEI Dataset and Interpretable Dynamic Fusion Graph

doi:10.18653/V1/P18-1208

Open Access • Proceedings Article


AmirAli Bagher Zadeh, +4 more

- 01 Jul 2018

- Vol. 1, pp 2236-2246

1.1K

TL;DR: This paper introduces CMU Multimodal Opinion Sentiment and Emotion Intensity (CMU-MOSEI), the largest dataset for sentiment analysis and emotion recognition to date, and uses a novel multimodal fusion technique called the Dynamic Fusion Graph (DFG), which is highly interpretable and achieves competitive performance compared to the previous state of the art.


Citations

Proceedings Article•10.23919/iccas59377.2023.10316831

Feature Refinement via Canonical Correlation Analysis for Multimodal Emotion Recognition

Sunyoung Cho, +2 more

- 17 Oct 2023

TL;DR: This paper presents a method for identifying inconsistent and noisy modality elements via Canonical Correlation Analysis for multimodal emotion recognition, by first computing correlation scores between modalities at the level of the individual elements of each modality's features.


Journal Article•10.23919/ccc63176.2024.10661855

CRGMR: A Contextualized RGAT and GraphTransformer Method for Multimodal Emotion Recognition

Chen Guoshun, +4 more

- 28 Jul 2024

TL;DR: This paper proposes CRGMR, a contextualized method for multimodal emotion recognition, leveraging local speaker dependencies and global dialogue context through a heterogeneous graph, achieving state-of-the-art results on IEMOCAP and MELD datasets.


•Posted Content

Unsupervised Multimodal Language Representations using Convolutional Autoencoders.

Panagiotis Koromilas, +1 more

- 06 Oct 2021

- arXiv: Computation and Language

TL;DR: In this paper, word-level aligned multimodal sequences are mapped to 2-D matrices and then CNNs are used to learn embeddings by combining multiple datasets.


•Posted Content•10.48550/arxiv.2210.14556

Multimodal Contrastive Learning via Uni-Modal Coding and Cross-Modal Prediction for Multimodal Sentiment Analysis

- 26 Oct 2022

TL;DR: In this paper, a multimodal contrastive learning (MMCL) framework is proposed to capture intra- and inter-modality dynamics simultaneously; these dynamics act as a foundation (the "pillar of the skyscraper") that helps the model extract the most important features contained in the multimodal data.


Journal Article•10.1109/tit.2022.3207420

Estimating Structurally Similar Graphical Models

- 01 Feb 2023

- IEEE Transactions on Information Theory

TL;DR: This article considers the problem of estimating the structure of structurally similar graphical models in high dimensions and characterizes sufficient conditions on the sample complexity for a bounded probability of error.


References

•Journal Article•10.1023/A:1010933404324

Random Forests

Leo Breiman

- 01 Oct 2001

TL;DR: Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the forest, and are also applicable to regression.


113.1K

Journal Article•10.1162/NECO.1997.9.8.1735

Long short-term memory

Sepp Hochreiter, +1 more

- 01 Nov 1997

- Neural Computation

TL;DR: A novel, efficient, gradient-based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.


99K

•Journal Article•10.1023/A:1022627411411

Support-Vector Networks

Corinna Cortes, +1 more

- 15 Sep 1995

- Machine Learning

TL;DR: High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated, and the performance of the support-vector network is compared to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.


42K

Proceedings Article•10.3115/V1/D14-1162

Glove: Global Vectors for Word Representation

Jeffrey Pennington, +2 more

- 01 Oct 2014

TL;DR: A new global log-bilinear regression model is proposed that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.


41.6K