Open Access
Cut-and-paste text summarization
Kathleen R. McKeown, Hongyan Jing +1 more
- 01 Jan 2002
TL;DR: This thesis presents a cut-and-paste approach to the text generation problem in domain-independent, single-document summarization, and builds a large-scale, reusable lexicon by combining multiple, heterogeneous resources.
Abstract: Automatic text summarization provides a concise summary for a document. In this thesis, we present a cut-and-paste approach to addressing the text generation problem in domain-independent, single-document summarization.
We found that professional abstractors often reuse text from the original document when writing a summary. Rather than simply extracting the original text, as most existing automatic summarizers do, humans often edit the extracted sentences. We call such editing operations “revision operations”. Our summarizer simulates two revision operations that humans use frequently: sentence reduction and sentence combination. Sentence reduction removes inessential phrases from sentences, and sentence combination merges sentences and phrases together. The sentence reduction algorithm we propose relies on multiple sources of knowledge to decide when it is appropriate to delete a phrase from a sentence, including linguistic knowledge, probabilities trained from corpus examples, and context information. The sentence combination module relies on a set of rules to decide when and how to combine sentences and phrases. Sentence reduction aims to improve the conciseness of generated summaries, and sentence combination aims to improve their coherence. We call this approach “cut-and-paste” because it produces summaries by excerpting and combining sentences and phrases from the original documents, unlike extraction, which produces summaries by simply extracting sentences or passages.
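The deletion decision described above can be illustrated with a minimal Python sketch. It assumes each candidate phrase already carries three hypothetical, precomputed values: an `obligatory` flag standing in for linguistic knowledge, a corpus-trained removal probability `p_removed`, and a `context_score` for relevance to the rest of the document. The linear weighting and the 0.5 threshold are illustrative assumptions, not the model used in the thesis.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Phrase:
    text: str
    obligatory: bool      # hypothetical flag: grammatically required (stands in for linguistic knowledge)
    p_removed: float      # hypothetical corpus probability that humans delete this kind of phrase
    context_score: float  # hypothetical relevance of the phrase to the rest of the document, in [0, 1]


def reduce_sentence(phrases: List[Phrase], threshold: float = 0.5,
                    w_corpus: float = 0.5, w_context: float = 0.5) -> str:
    """Drop optional phrases whose combined deletion score reaches the threshold (assumed scoring)."""
    kept = []
    for ph in phrases:
        if ph.obligatory:
            kept.append(ph)  # never delete grammatically required material
            continue
        # Deletion is favored when humans often remove such phrases and the context does not need them.
        score = w_corpus * ph.p_removed + w_context * (1.0 - ph.context_score)
        if score < threshold:
            kept.append(ph)
    return " ".join(ph.text for ph in kept)


# Illustrative input with made-up scores.
phrases = [
    Phrase("The company", True, 0.0, 1.0),
    Phrase("announced", True, 0.0, 1.0),
    Phrase("a new product", True, 0.0, 1.0),
    Phrase("in Europe", False, 0.3, 0.9),
    Phrase("on Tuesday", False, 0.8, 0.1),
]
print(reduce_sentence(phrases))  # → The company announced a new product in Europe
```

Keeping obligatory phrases as a hard constraint, rather than folding them into the score, mirrors the idea that grammatical knowledge can veto a deletion regardless of what the corpus statistics suggest.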
Our work also includes a Hidden Markov Model (HMM)-based sentence decomposition program that analyzes human-written summaries. The decomposition program identifies where the phrases of a summary originate in the original document, producing an aligned corpus of summaries and articles that we use to train and evaluate the summarizer. We also built a large-scale, reusable lexicon by combining multiple, heterogeneous resources. The lexicon contains lexical, syntactic, and semantic knowledge and can be used in many applications.
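The decomposition step can be sketched as Viterbi decoding: the summary is the observed word sequence, and the hidden state for each word is the (sentence, word) position in the document where that word might have come from. Transitions favor positions that are adjacent in the original text. The transition scores below are hypothetical values chosen for illustration, and unlike a full decomposition program this sketch assumes every summary word appears somewhere in the document.

```python
import math


def positions(doc):
    """Map each word to every (sentence, word) position where it occurs in the document."""
    index = {}
    for s, sent in enumerate(doc):
        for w, word in enumerate(sent):
            index.setdefault(word.lower(), []).append((s, w))
    return index


def transition(prev, cur):
    """Heuristic score for moving between document positions (values are assumptions)."""
    (s1, w1), (s2, w2) = prev, cur
    if s1 == s2 and w2 == w1 + 1:
        return 1.0   # next word in the same sentence: most likely
    if s1 == s2:
        return 0.5   # same sentence, not adjacent
    if abs(s1 - s2) <= 2:
        return 0.3   # nearby sentence
    return 0.1       # anywhere else in the document


def decompose(summary_words, doc):
    """Viterbi search for the most likely document position of each summary word."""
    index = positions(doc)
    candidates = [index.get(word.lower(), []) for word in summary_words]
    if any(not c for c in candidates):
        raise ValueError("every summary word must occur in the document")
    # best[pos] = (log score, path) for the best path ending at pos
    best = {pos: (0.0, [pos]) for pos in candidates[0]}
    for cands in candidates[1:]:
        new_best = {}
        for cur in cands:
            score, path = max(
                ((prev_score + math.log(transition(prev, cur)), prev_path)
                 for prev, (prev_score, prev_path) in best.items()),
                key=lambda item: item[0],
            )
            new_best[cur] = (score, path + [cur])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]


doc = [
    "the police arrested the suspect on friday".split(),
    "the suspect was charged with fraud".split(),
]
print(decompose("the suspect was charged on friday".split(), doc))
# → [(1, 0), (1, 1), (1, 2), (1, 3), (0, 5), (0, 6)]
```

Here the decoded path attributes "the suspect was charged" to the second document sentence and "on friday" to the first, the kind of sentence combination the abstract describes.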
Citations
•Book
Automatic Summarization
Ani Nenkova, Sameer Maskey, Yang Liu +2 more
- 27 Jun 2011
TL;DR: The challenges that remain open, in particular the need for language generation and deeper semantic understanding of language that would be necessary for future advances in the field are discussed.
889
•Book
Machine Learning for Text
Charu C. Aggarwal
- 01 Feb 2019
TL;DR: This textbook covers machine learning topics for text in detail and targets graduate students in computer science, as well as researchers, professors, and industrial practitioners working in these related fields.
229
Dependency tree based sentence compression
Katja Filippova, Michael Strube +1 more
- 12 Jun 2008
TL;DR: A novel unsupervised method for sentence compression which relies on a dependency tree representation and shortens sentences by removing subtrees and it is demonstrated that the choice of the parser affects the performance of the system.
Sentence Fusion via Dependency Graph Compression
Katja Filippova, Michael Strube +1 more
- 25 Oct 2008
TL;DR: A novel unsupervised sentence fusion method which is applied to a corpus of biographies in German and outperforms the fusion approach of Barzilay & McKeown (2005) with respect to readability.
146
•Proceedings Article
Sentence Compression for Automated Subtitling: A Hybrid Approach
Vincent Vandeghinste, Yi Pan +1 more
- 01 Jan 2004
TL;DR: This paper describes how an input sentence is analysed using, among other tools, a tagger, a shallow parser, and a subordinate clause detector, and how, based on this analysis, several compressed versions of the sentence are generated, each with an associated estimated probability.
64
References
A tutorial on hidden Markov models and selected applications in speech recognition
Lawrence R. Rabiner
- 01 Feb 1989
TL;DR: In this paper, the authors provide an overview of the basic theory of hidden Markov models (HMMs) as originated by L.E. Baum and T. Petrie (1966) and give practical details on methods of implementation of the theory along with a description of selected applications of HMMs to distinct problems in speech recognition.
Error bounds for convolutional codes and an asymptotically optimum decoding algorithm
TL;DR: The upper bound is obtained for a specific probabilistic nonsequential decoding algorithm which is shown to be asymptotically optimum for rates above R₀ and whose performance bears certain similarities to that of sequential decoding algorithms.
7.6K
Introduction to WordNet: An On-line Lexical Database
TL;DR: Standard alphabetical procedures for organizing lexical information put together words that are spelled alike and scatter words with similar or related meanings haphazardly through the list.
Fast effective rule induction
William W. Cohen
- 09 Jul 1995
TL;DR: This paper evaluates the recently proposed rule learning algorithm IREP on a large and diverse collection of benchmark problems, and proposes a number of modifications resulting in an algorithm, RIPPERk, that is very competitive with C4.5 and C4.5rules with respect to error rates, but much more efficient on large samples.
4.5K