Dimensional Emotion Prediction Based on Interactive Context in Conversation.

doi:10.21437/INTERSPEECH.2020-1820

Proceedings Article

Xiaohan Shi, +2 more

- 25 Oct 2020

- pp 4193-4197

8

TL;DR: This work investigates the effects of interactive information in four conversation situations on emotion prediction, in which emotional tendencies of interlocutors are consistent or inconsistent in both valence and arousal.

Citations

Proceedings Article•10.1145/3503161.3547831

Inferring Speaking Styles from Multi-modal Conversational Context by Multi-scale Relational Graph Convolutional Networks

Jingbei Li, +7 more

- 10 Oct 2022

TL;DR: A context modeling method is proposed that models the dependencies among the multi-modal information in context with a multi-scale relational graph convolutional network (MSRGCN) and uses the result to infer the global and local speaking styles of the current utterance for speech synthesis.

13

Journal Article•10.21437/interspeech.2023-1236

Emotion Awareness in Multi-utterance Turn for Improving Emotion Prediction in Multi-Speaker Conversation

Xiaohan Shi, +2 more

- 20 Aug 2023

TL;DR: Emotion awareness in multi-utterance turn significantly improves emotion prediction in multi-speaker conversation by modeling potential emotional interactive information within a speaker’s multi-utterance turn.

11

Journal Article•10.21437/interspeech.2024-2350

Multimodal Fusion of Music Theory-Inspired and Self-Supervised Representations for Improved Emotion Recognition

Xiaohan Shi, +2 more

- 01 Sep 2024

TL;DR: This study proposes a multimodal emotion recognition method combining self-supervised and music theory-inspired representations to capture emotional information, achieving state-of-the-art results with a 3.55% improvement over the baseline through a novel multimodal fusion approach.

4

Journal Article•10.48550/arxiv.2311.07093

On the Effectiveness of ASR Representations in Real-world Noisy Speech Emotion Recognition

Xiaohan Shi, +3 more

- 13 Nov 2023

- arXiv.org

TL;DR: A new method for noisy speech emotion recognition (NSER) is introduced that adopts an automatic speech recognition (ASR) model as a noise-robust feature extractor to eliminate non-vocal information in noisy speech.

4

Journal Article•10.15293/2658-6762.2306.09

The unity of subjective predictors of speech actions as an indicator of the subject’s agency in learning: An empirical study

T S Vershinina, +1 more

- 31 Dec 2023

- Science for Education Today

TL;DR: The unity of subjective predictors of speech actions is an indicator of students’ agency in learning. The study revealed four types of students’ agency in learning based on the empirical data analysis.

References

Posted Content

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, +3 more

- 11 Oct 2018

- arXiv: Computation and Language

TL;DR: BERT is a new language representation model designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; it can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

81.7K

Proceedings Article•10.3115/V1/D14-1179

Learning Phrase Representations using RNN Encoder--Decoder for Statistical Machine Translation

Kyunghyun Cho, +8 more

- 01 Jan 2014

TL;DR: In this paper, the encoder and decoder of the RNN Encoder-Decoder model are jointly trained to maximize the conditional probability of a target sequence given a source sequence.

28.6K

Proceedings Article•10.18653/V1/N19-1423

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, +3 more

- 11 Oct 2018

TL;DR: BERT pre-trains deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; the pre-trained model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

24.6K

Journal Article•10.1007/S10579-008-9076-6

IEMOCAP: interactive emotional dyadic motion capture database

Carlos Busso, +8 more

- 05 Nov 2008

TL;DR: This paper presents a new corpus named the “interactive emotional dyadic motion capture database” (IEMOCAP), collected by the Speech Analysis and Interpretation Laboratory at the University of Southern California (USC), which provides detailed information about the actors' facial expressions and hand movements during scripted and spontaneous spoken communication scenarios.

3.8K

Proceedings Article•10.1145/1873951.1874246

openSMILE: the Munich versatile and fast open-source audio feature extractor

Florian Eyben, +2 more

- 25 Oct 2010

TL;DR: The openSMILE feature extraction toolkit is introduced, which unites feature extraction algorithms from the speech processing and the Music Information Retrieval communities and has a modular, component-based architecture that makes extensions via plug-ins easy.

3K