Image-Chat: Engaging Grounded Conversations
Kurt Shuster, Samuel Humeau, Antoine Bordes, Jason Weston +3 more
- 01 Jul 2020
- pp 2414-2429
TL;DR: Automatic metrics and human evaluations of engagingness show the efficacy of this approach: it obtains state-of-the-art performance on the existing IGC task, and the best-performing model is almost on par with humans on the Image-Chat test set.
Abstract: To achieve the long-term goal of machines being able to engage humans in conversation, our models should captivate the interest of their speaking partners. Communication grounded in images, whereby a dialogue is conducted based on a given photo, is a setup naturally appealing to humans (Hu et al., 2014). In this work we study large-scale architectures and datasets for this goal. We test a set of neural architectures using state-of-the-art image and text representations, considering various ways to fuse the components. To test such models, we collect a dataset of grounded human-human conversations, where speakers are asked to play roles given a provided emotional mood or style, as the use of such traits is also a key factor in engagingness (Guo et al., 2019). Our dataset, Image-Chat, consists of 202k dialogues over 202k images using 215 possible style traits. Automatic metrics and human evaluations of engagingness show the efficacy of our approach; in particular, we obtain state-of-the-art performance on the existing IGC task, and our best performing model is almost on par with humans on the Image-Chat test set (preferred 47.7% of the time).
Citations
Deep Learning for Text Style Transfer: A Survey
TL;DR: Text style transfer is an important task in natural language generation that aims to control certain attributes of the generated text, such as politeness, emotion, humor, and many others.
•Posted Content
Recent Advances in Deep Learning Based Dialogue Systems: A Systematic Survey.
TL;DR: This survey presents state-of-the-art research outcomes in dialogue systems, focusing mainly on deep learning-based systems, and comprehensively reviews the evaluation methods and datasets for dialogue systems.
135
Text is NOT Enough: Integrating Visual Impressions into Open-domain Dialogue Generation
TL;DR: In this paper, a co-attention encoder is used to generate a post representation with both visual and textual information, and then the response is generated based on the post and RVIs.
110
•Posted Content
The Adapter-Bot: All-In-One Controllable Conversational Model
TL;DR: The Adapter-Bot is proposed, a dialogue model that uses a fixed backbone conversational model such as DialGPT and triggers on-demand dialogue skills via different adapters, thus allowing a continual integration of skills without retraining the entire model.
73
•Posted Content
Open-Domain Conversational Agents: Current Progress, Open Problems, and Future Directions.
Stephen Roller, Y-Lan Boureau, Jason Weston, Antoine Bordes, Emily Dinan, Angela Fan, David Gunning, Da Ju, Margaret Li, Spencer Poff, Pratik Ringshia, Kurt Shuster, Eric Michael Smith, Arthur Szlam, Jack Urbanek, Mary Williamson +15 more
TL;DR: The paper discusses the properties of continual learning, providing engaging content, and being well-behaved, along with how to measure success in providing them, and offers recommendations to the community.
55
References
Deep Residual Learning for Image Recognition
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun +3 more
- 27 Jun 2016
TL;DR: The authors propose a residual learning framework to ease the training of networks that are substantially deeper than those used previously; the resulting networks won 1st place on the ILSVRC 2015 classification task.
•Proceedings Article
Attention is All you Need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin +7 more
- 12 Jun 2017
TL;DR: This paper proposes a simple network architecture based solely on attention mechanisms, dispensing with recurrence and convolutions entirely, and achieves state-of-the-art performance on English-to-French translation.
Attention Is All You Need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin +7 more
- 01 Jan 2017
Abstract: The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
51.8K
ImageNet Large Scale Visual Recognition Challenge
Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael S. Bernstein, Alexander C. Berg, Li Fei-Fei +11 more
TL;DR: The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is a benchmark in object category classification and detection on hundreds of object categories and millions of images. It has been run annually from 2010 to present, attracting participation from more than fifty institutions.
Aggregated Residual Transformations for Deep Neural Networks
Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, Kaiming He +4 more
- 21 Jul 2017
TL;DR: ResNeXt is a simple, highly modularized network architecture for image classification, constructed by repeating a building block that aggregates a set of transformations with the same topology.
11.2K