Incorporating Copying Mechanism in Sequence-to-Sequence Learning
Jiatao Gu, Zhengdong Lu, Hang Li, Victor O. K. Li
- 21 Mar 2016
- Vol. 1, pp 1631-1640
TL;DR: This paper incorporates copying into neural network-based Seq2Seq learning and proposes CopyNet, an encoder-decoder model that integrates regular word generation in the decoder with a copying mechanism that selects sub-sequences of the input sequence and places them at appropriate positions in the output sequence.
Abstract: We address an important problem in sequence-to-sequence (Seq2Seq) learning referred to as copying, in which certain segments in the input sequence are selectively replicated in the output sequence. A similar phenomenon is observable in human language communication. For example, humans tend to repeat entity names or even long phrases in conversation. The challenge with regard to copying in Seq2Seq is that new machinery is needed to decide when to perform the operation. In this paper, we incorporate copying into neural network-based Seq2Seq learning and propose a new model called CopyNet with encoder-decoder structure. CopyNet can nicely integrate the regular way of word generation in the decoder with the new copying mechanism which can choose sub-sequences in the input sequence and put them at proper places in the output sequence. Our empirical study on both synthetic data sets and real world data sets demonstrates the efficacy of CopyNet. For example, CopyNet can outperform regular RNN-based models by remarkable margins on text summarization tasks.
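The idea of combining word generation with copying can be illustrated with a small, simplified sketch. It is not the authors' implementation: the function name `copy_generate_distribution`, the toy vocabulary, and all scores below are hypothetical, and in a real model the scores would come from the decoder state and the encoder states. The sketch only shows one common way to mix the two modes: generation scores (over a fixed vocabulary) and copy scores (over source positions) share a single softmax normalizer, and the probability of a token is the sum of its generate-mode mass and the copy-mode mass of every source position holding that token.

```python
import math


def copy_generate_distribution(gen_scores, copy_scores, source_tokens):
    """Mix generate-mode and copy-mode scores into one token distribution.

    gen_scores:    dict mapping each vocabulary token to an unnormalized score
    copy_scores:   list with one unnormalized score per source position
    source_tokens: list of source tokens, aligned with copy_scores
    (All three inputs are illustrative assumptions, not the paper's API.)
    """
    if len(copy_scores) != len(source_tokens):
        raise ValueError("need exactly one copy score per source token")

    all_scores = list(gen_scores.values()) + list(copy_scores)
    max_score = max(all_scores)  # subtract the max for numerical stability
    normalizer = sum(math.exp(s - max_score) for s in all_scores)

    probs = {}
    # Generate mode: probability mass for tokens in the fixed vocabulary.
    for token, score in gen_scores.items():
        probs[token] = probs.get(token, 0.0) + math.exp(score - max_score) / normalizer
    # Copy mode: mass for source positions, accumulated per token so that
    # repeated or in-vocabulary source words add to the same entry.
    for token, score in zip(source_tokens, copy_scores):
        probs[token] = probs.get(token, 0.0) + math.exp(score - max_score) / normalizer
    return probs


if __name__ == "__main__":
    # Hypothetical toy example: "Tony" is not in the vocabulary, so it can
    # only receive probability through the copy mode.
    vocab_scores = {"hello": 1.0, "my": 0.5, "<unk>": 0.0}
    source = ["my", "name", "is", "Tony"]
    position_scores = [0.2, 0.1, 0.1, 2.0]
    for token, p in sorted(
        copy_generate_distribution(vocab_scores, position_scores, source).items(),
        key=lambda item: -item[1],
    ):
        print(f"{token}: {p:.3f}")
```

Sharing one normalizer lets the two modes compete directly, so an out-of-vocabulary entity name in the input can outrank every vocabulary word when its copy score is high enough.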
Citations
Grounding Language to Landmarks in Arbitrary Outdoor Environments
Matthew Berg, Deniz Bayazit, Rebecca Mathew, Ariel Rotter-Aboyoun, Ellie Pavlick, Stefanie Tellex
- 01 May 2020
TL;DR: This work presents a framework that parses references to landmarks, then assesses semantic similarities between the referring expression and landmarks in a predefined semantic map of the world, and ultimately translates natural language commands to motion plans for a drone.
Posterior Control of Blackbox Generation
Xiang Lisa Li, Alexander M. Rush
- 01 Jul 2020
TL;DR: The authors proposed a structured latent-variable approach to encode task-specific knowledge through a range of rich, posterior constraints that are effectively trained into the model, allowing users to ground internal model decisions based on prior knowledge, without sacrificing the representational power of neural generative models.
Semantic graph parsing with recurrent neural network DAG grammars
Federico Fancellu, Sorcha Gilroy, Adam Lopez, Mirella Lapata
- 04 Nov 2019
TL;DR: The authors proposed a graph-aware sequence model that generates well-formed graphs while sidestepping many difficulties in graph prediction, such as the difficulty of predicting linearized graphs in semantic parsing.
Investigating Math Word Problems using Pretrained Multilingual Language Models
Minghuan Tan, Lei Wang, Lingxiao Jiang, Jing Jiang
- 01 Jan 2022
TL;DR: MWP solvers may not transfer to a different language, but they generalize better if problem types exist in both the source and target languages.
A survey on semantic processing techniques
Rui Mao, Kai He, Xulang Zhang, Guanyi Chen, Jinjie Ni, Yang Zhou, Zhaoxia Wang
TL;DR: A survey on semantic processing techniques covering word sense disambiguation, anaphora resolution, named entity recognition, concept extraction, and subjectivity detection. The survey reviews theoretical research, advanced methods, and downstream applications. It also discusses technical and application trends, and future directions.
References
Deep Residual Learning for Image Recognition
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
- 27 Jun 2016
TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.
Long short-term memory
TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
Learning Phrase Representations using RNN Encoder--Decoder for Statistical Machine Translation
Kyunghyun Cho, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, Yoshua Bengio
- 01 Jan 2014
TL;DR: In this paper, the encoder and decoder of the RNN Encoder-Decoder model are jointly trained to maximize the conditional probability of a target sequence given a source sequence.
Neural Machine Translation by Jointly Learning to Align and Translate
Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio
- 01 Jan 2015
TL;DR: It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and it is proposed to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
Sequence to Sequence Learning with Neural Networks
Ilya Sutskever, Oriol Vinyals, Quoc V. Le
- 08 Dec 2014
TL;DR: The authors used a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector.