Proceedings Article, DOI: 10.21437/INTERSPEECH.2020-1820
Dimensional Emotion Prediction Based on Interactive Context in Conversation
Xiaohan Shi, Sixia Li, Jianwu Dang +2 more
- 25 Oct 2020
- pp 4193-4197
TL;DR: This work investigates the effects of interactive information in four conversation situations on emotion prediction, in which emotional tendencies of interlocutors are consistent or inconsistent in both valence and arousal.
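The four conversation situations can be illustrated with a small sketch. The page does not define an "emotional tendency", so this assumes (hypothetically) that a speaker's tendency in a dimension is whether their mean rating lies above a neutral midpoint of 3 on a 1-5 scale, and that two interlocutors are "consistent" in a dimension when their tendencies fall on the same side. The names `NEUTRAL`, `tendency`, and `situation` are illustrative only and do not come from the paper.

```python
from statistics import mean

# Hypothetical neutral midpoint, assuming ratings on a 1-5 scale.
NEUTRAL = 3.0


def tendency(ratings, neutral=NEUTRAL):
    """Hypothetical tendency: +1 if the mean rating is above neutral, else -1."""
    return 1 if mean(ratings) > neutral else -1


def situation(a_val, a_aro, b_val, b_aro):
    """Label a dialogue by whether two interlocutors' tendencies agree.

    Returns a (valence, arousal) pair, each "consistent" or "inconsistent",
    giving the 2 x 2 = 4 conversation situations.
    """
    val_consistent = tendency(a_val) == tendency(b_val)
    aro_consistent = tendency(a_aro) == tendency(b_aro)
    return (
        "consistent" if val_consistent else "inconsistent",
        "consistent" if aro_consistent else "inconsistent",
    )


# Toy ratings (not data from the paper): both speakers lean positive in valence,
# but A is calm while B is aroused.
print(situation([4.0, 4.5], [2.0, 2.5], [3.5, 4.0], [4.0, 4.5]))
# ('consistent', 'inconsistent')
```

A sign-based split like this is the simplest way to get exactly four situations. A real study might instead use trends over time or annotator-specific thresholds.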
Abstract: Emotion prediction in conversation is important for humans to hold fluent conversations, yet it remains an underexplored research topic in affective computing. Previous studies predicted the upcoming emotion using context information from only one speaker. However, a conversation involves two interlocutors, a speaker and a listener, whose emotions influence one another as the conversation proceeds. We therefore propose a dimensional emotion prediction model based on interactive context information from both interlocutors. We investigate the effects of this interactive information on emotion prediction in four conversation situations, in which the interlocutors' emotional tendencies are consistent or inconsistent in valence and in arousal. The results show that the proposed method, which considers interactive context, performs better than methods that consider only one side. Prediction performance also depends on the conversation situation: valence is predicted best when the interlocutors' emotional tendencies are consistent in valence but inconsistent in arousal, and arousal is predicted best when their tendencies are inconsistent in both valence and arousal.
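To make the contrast between single-side and interactive context concrete, the sketch below assembles both kinds of context window from a toy dialogue. It is not the authors' model. The turn format `(speaker, valence, arousal)`, the window size `k`, and the `mean_baseline` predictor are hypothetical placeholders. A real system would feed such windows (e.g. acoustic or lexical features) to a trained sequence model rather than averaging them.

```python
from typing import List, Tuple

# One turn: (speaker_id, valence, arousal). This format is a hypothetical choice.
Turn = Tuple[str, float, float]


def single_side_context(dialogue: List[Turn], t: int, speaker: str, k: int) -> List[Turn]:
    """Last k turns before index t spoken by `speaker` only (one-sided context)."""
    history = [turn for turn in dialogue[:t] if turn[0] == speaker]
    return history[-k:]


def interactive_context(dialogue: List[Turn], t: int, k: int) -> List[Turn]:
    """Last k turns before index t from both interlocutors, in conversation order."""
    return dialogue[max(0, t - k):t]


def mean_baseline(context: List[Turn]) -> Tuple[float, float]:
    """Hypothetical placeholder predictor: average valence and arousal of the context."""
    if not context:
        raise ValueError("empty context")
    n = len(context)
    valence = sum(v for _, v, _ in context) / n
    arousal = sum(a for _, _, a in context) / n
    return valence, arousal


# Toy dialogue (not data from the paper); predict turn 4, spoken by A.
dialogue = [("A", 2.0, 3.0), ("B", 4.0, 4.0), ("A", 2.5, 3.5), ("B", 4.5, 4.5), ("A", 3.0, 4.0)]
print(mean_baseline(single_side_context(dialogue, 4, "A", 2)))  # (2.25, 3.25)
print(mean_baseline(interactive_context(dialogue, 4, 2)))       # (3.5, 4.0)
```

With `k = 2`, the single-side window holds only A's two previous turns, while the interactive window holds the two immediately preceding turns from both A and B, so the listener's state enters the prediction.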
Citations
Inferring Speaking Styles from Multi-modal Conversational Context by Multi-scale Relational Graph Convolutional Networks
Jingbei Li, Yi Meng, Xixin Wu, Zhiyong Wu, Jia Jia, Helen Meng, Qiao Tian, Yuping Wang +7 more
- 10 Oct 2022
TL;DR: A context modeling method is proposed which models the dependencies among the multi-modal information in context with multi-scale relational graph convolutional network (MSRGCN) and is utilized to infer the global and local speaking styles of the current utterance for speech synthesis.
Emotion Awareness in Multi-utterance Turn for Improving Emotion Prediction in Multi-Speaker Conversation
Xiaohan Shi, Xingfeng Li, Tomoki Toda +2 more
- 20 Aug 2023
TL;DR: Emotion awareness in multi-utterance turn significantly improves emotion prediction in multi-speaker conversation by modeling potential emotional interactive information within a speaker’s multi-utterance turn.
Multimodal Fusion of Music Theory-Inspired and Self-Supervised Representations for Improved Emotion Recognition
Xiaohan Shi, Xingfeng Li, Tomoki Toda +2 more
- 01 Sep 2024
TL;DR: This study proposes a multimodal emotion recognition method combining self-supervised and music theory-inspired representations to capture emotional information, achieving state-of-the-art results with a 3.55% improvement over the baseline through a novel multimodal fusion approach.
On the Effectiveness of ASR Representations in Real-world Noisy Speech Emotion Recognition
Xiaohan Shi, Jiajun He, Xingfeng Li, Tomoki Toda +3 more
TL;DR: A new method for noisy speech emotion recognition (NSER) is introduced, which adopts an automatic speech recognition (ASR) model as a noise-robust feature extractor to eliminate non-vocal information in noisy speech.
The unity of subjective predictors of speech actions as an indicator of the subject’s agency in learning: An empirical study
T S Vershinina, N V Zhukova +1 more
TL;DR: The unity of subjective predictors of speech actions is an indicator of students’ agency in learning. The study revealed four types of students’ agency in learning based on the empirical data analysis.