Discovering emotion and reasoning its flip in multi-party conversations using masked memory network and transformer
TL;DR: In this paper, a Transformer-based network is proposed to identify the past utterances that triggered a speaker's emotional state to flip at a certain time in a multi-party conversation.
Abstract: Efficient discovery of a speaker’s emotional states in a multi-party conversation is important for designing human-like conversational agents. During a conversation, the cognitive state of a speaker often changes in response to certain past utterances, which may lead to a flip in their emotional state. Therefore, discovering the reasons (triggers) behind the speaker’s emotion flip during a conversation is essential to explain the emotion labels of individual utterances. In this paper, along with addressing the task of emotion recognition in conversations (ERC), we introduce a novel task, Emotion-Flip Reasoning (EFR), which aims to identify the past utterances that triggered one’s emotional state to flip at a certain time. We propose a masked memory network to address the former and a Transformer-based network for the latter task. To this end, we consider MELD, a benchmark dataset for emotion recognition in multi-party conversations, for the ERC task and augment it with new ground-truth labels for EFR. An extensive comparison with five state-of-the-art models suggests improved performance of our models on both tasks. We further present anecdotal evidence and both qualitative and quantitative error analyses to support the superiority of our models over the baselines.
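To make the EFR task concrete, the following is a minimal sketch of how emotion flips (the events EFR tries to explain) might be located in a labeled conversation. It is not the paper's model. The `Utterance` type, the `find_emotion_flips` helper, and the sample dialogue are hypothetical, and two assumptions are made: a flip means that a speaker's emotion label differs from the label of that same speaker's previous utterance, and the candidate triggers for a flip are all utterances that precede it.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Utterance:
    speaker: str
    text: str
    emotion: str  # emotion label, e.g. as produced by an ERC model


def find_emotion_flips(dialogue: List[Utterance]) -> List[Tuple[int, str, str]]:
    """Return (index, previous_emotion, new_emotion) for each flip.

    Assumption: a flip occurs when a speaker's label differs from the
    label of that same speaker's most recent earlier utterance.
    """
    last_emotion: Dict[str, str] = {}
    flips: List[Tuple[int, str, str]] = []
    for i, utt in enumerate(dialogue):
        prev = last_emotion.get(utt.speaker)
        if prev is not None and prev != utt.emotion:
            flips.append((i, prev, utt.emotion))
        last_emotion[utt.speaker] = utt.emotion
    return flips


# Hypothetical multi-party dialogue with hand-assigned labels.
dialogue = [
    Utterance("A", "I got the job!", "joy"),
    Utterance("B", "That's great news.", "joy"),
    Utterance("C", "They already gave it to someone else.", "neutral"),
    Utterance("A", "What? Are you serious?", "surprise"),
    Utterance("B", "Oh no.", "sadness"),
]

flips = find_emotion_flips(dialogue)
for idx, old, new in flips:
    # Assumption: every earlier utterance is a candidate trigger.
    candidates = list(range(idx))
    print(f"utterance {idx}: {old} -> {new}, candidate triggers {candidates}")
```

Under these assumptions, an EFR model would then label each candidate utterance as a trigger or non-trigger for the corresponding flip.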
Citations
Multi-modal Sarcasm Detection and Humor Classification in Code-mixed Conversations
TL;DR: In this paper, a Hindi-English code-mixed dataset, MaSaC, was developed for sarcasm detection and humor classification in conversational dialog, which to our knowledge is the first dataset of its kind.
59
Muformer: A long sequence time-series forecasting model based on modified multi-head attention
TL;DR: Zhang et al. proposed an efficient Transformer-based predictive model called Muformer. It includes an input multiple perceptual domain (MPD) processing mechanism that turns a single input into N outputs in different perceptual domains, thereby enhancing features.
29
MEConformer: Highly representative embedding extractor for speaker verification via incorporating selective convolution into deep speaker encoder
Qiuyu Zheng, Zengzhao Chen, Zhifeng Wang, Hai Liu, Mengting Lin
TL;DR: This paper proposes MEConformer, a novel speaker verification model that combines CNN and transformer architectures to extract highly representative speaker embeddings, achieving state-of-the-art EER of 3.72% on VoxCeleb1 and 5.94% on VoxCeleb1-H datasets.
8
Context or Knowledge is Not Always Necessary: A Contrastive Learning Framework for Emotion Recognition in Conversations
Geng Tu, Bin Liang, Ruibin Mao, Min Yang, Ruifeng Xu
01 Jan 2023
TL;DR: A contrastive learning framework, CKCL, is proposed to improve emotion recognition in conversations by distinguishing utterances that require context and knowledge from those that do not.
Edge Computing with Complementary Capsule Networks for Mental State Detection in Underground Mining Industry
TL;DR: In this paper, an edge computing mental state framework for the Internet of Things in the underground mining industry is proposed, a filtering algorithm using a defined threshold function is developed, and a complemented capsule network model is constructed using two residual modules.
7
References
Real-Time Emotion Recognition via Attention Gated Hierarchical Memory Network.
Wenxiang Jiao, Michael R. Lyu, Irwin King
03 Apr 2020
TL;DR: An Attention Gated Hierarchical Memory Network (AGHMN) with a bidirectional GRU (BiGRU) as the utterance reader and a BiGRU fusion layer for the interaction between historical utterances to address the problems of prior work.
All-in-One: Emotion, Sentiment and Intensity Prediction using a Multi-task Ensemble Framework
TL;DR: A multi-task ensemble framework that jointly learns multiple related problems of emotion and sentiment analysis and outperforms the single-task frameworks in all experiments.
124
Proceedings Article
Contextualized Emotion Recognition in Conversation as Sequence Tagging.
Wang Yan, Zhang Jiayu, Ma Jun, Shaojun Wang, Jing Xiao
01 Jan 2020
TL;DR: A method to model ERC task as sequence tagging where a Conditional Random Field layer is leveraged to learn the emotional consistency in the conversation and outperforms the current state-of-the-art methods on multiple emotion classification datasets.
119
Speech emotion recognition using amplitude modulation parameters and a combined feature selection procedure
Arianna Mencattini, Eugenio Martinelli, Giovanni Costantini, Massimiliano Todisco, Barbara Basile, Marco Bozzali, Corrado Di Natale
TL;DR: This study proposes the use of a PLS regression model, optimized according to specific features selection procedures and trained on the Italian speech corpus EMOVO, suggesting a way to automatically label the corpus in terms of arousal and valence.
92
Fine-Grained Emotion Detection in Health-Related Online Posts
Hamed Khanpour, Cornelia Caragea
01 Nov 2018
TL;DR: This paper proposes to detect fine-grained emotion types from health-related posts and shows how high-level and abstract features derived from deep neural networks combined with lexicon-based features can be employed to detect emotions.