TL;DR: In the exploratory research described, the complete text of an article in machine-readable form is scanned by an IBM 704 data-processing machine and analyzed in accordance with a standard program.
Abstract: Excerpts of technical papers and magazine articles that serve the purposes of conventional abstracts have been created entirely by automatic means. In the exploratory research described, the complete text of an article in machine-readable form is scanned by an IBM 704 data-processing machine and analyzed in accordance with a standard program. Statistical information derived from word frequency and distribution is used by the machine to compute a relative measure of significance, first for individual words and then for sentences. Sentences scoring highest in significance are extracted and printed out to become the "auto-abstract."
TL;DR: New methods of automatically extracting documents for screening purposes, i.e. the computer selection of sentences having the greatest potential for conveying to the reader the substance of the document, indicate that the three newly proposed components dominate the frequency component in the production of better extracts.
Abstract: This paper describes new methods of automatically extracting documents for screening purposes, i.e. the computer selection of sentences having the greatest potential for conveying to the reader the substance of the document. While previous work has focused on one component of sentence significance, namely, the presence of high-frequency content words (key words), the methods described here also treat three additional components: pragmatic words (cue words); title and heading words; and structural indicators (sentence location).The research has resulted in an operating system and a research methodology. The extracting system is parameterized to control and vary the influence of the above four components. The research methodology includes procedures for the compilation of the required dictionaries, the setting of the control parameters, and the comparative evaluation of the automatic extracts with manually produced extracts. The results indicate that the three newly proposed components dominate the frequency component in the production of better extracts.
TL;DR: This paper presents an innovative unsupervised method for automatic sentence extraction using graph-based ranking algorithms and shows that the results obtained compare favorably with previously published results on established benchmarks.
Abstract: This paper presents an innovative unsupervised method for automatic sentence extraction using graph-based ranking algorithms. We evaluate the method in the context of a text summarization task, and show that the results obtained compare favorably with previously published results on established benchmarks.
TL;DR: This work focuses on automatic summarization of open-domain multiparty dialogues in diverse genres, and on the development of a robust practical text summarizer based on rhetorical structure extraction.
Abstract: generation based on rhetorical structure extraction. In Proceedings of the International Conference on Computational Linguistics, Kyoto, Japan, pages 344–348. Otterbacher, Jahna, Dragomir R. Radev, and Airong Luo. 2002. Revisions that improve cohesion in multi-document summaries: A preliminary study. In ACL Workshop on Text Summarization, Philadelphia. Papineni, K., S. Roukos, T. Ward, and W-J. Zhu. 2001. BLEU: A method for automatic evaluation of machine translation. Research Report RC22176, IBM. Radev, Dragomir, Simone Teufel, Horacio Saggion, Wai Lam, John Blitzer, Arda Celebi, Hong Qi, Elliott Drabek, and Danyu Liu. 2002. Evaluation of text summarization in a cross-lingual information retrieval framework. Technical Report, Center for Language and Speech Processing, Johns Hopkins University, Baltimore, June. Radev, Dragomir R., Hongyan Jing, and Malgorzata Budzikowska. 2000. Centroid-based summarization of multiple documents: Sentence extraction, utility-based evaluation, and user studies. In ANLP/NAACL Workshop on Summarization, Seattle, April. Radev, Dragomir R. and Kathleen R. McKeown. 1998. Generating natural language summaries from multiple on-line sources. Computational Linguistics, 24(3):469–500. Rau, Lisa and Paul Jacobs. 1991. Creating segmented databases from free text for text retrieval. In Proceedings of the 14th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval, New York, pages 337–346. Saggion, Horacio and Guy Lapalme. 2002. Generating indicative-informative summaries with SumUM. Computational Linguistics, 28(4), 497–526. Salton, G., A. Singhal, M. Mitra, and C. Buckley. 1997. Automatic text structuring and summarization. Information Processing & Management, 33(2):193–207. Silber, H. Gregory and Kathleen McCoy. 2002. Efficiently computed lexical chains as an intermediate representation for automatic text summarization. Computational Linguistics, 28(4), 487–496. Sparck Jones, Karen. 1999. Automatic summarizing: Factors and directions. In I. Mani and M. T. Maybury, editors, Advances in Automatic Text Summarization. MIT Press, Cambridge, pages 1–13. Strzalkowski, Tomek, Gees Stein, J. Wang, and Bowden Wise. 1999. A robust practical text summarizer. In I. Mani and M. T. Maybury, editors, Advances in Automatic Text Summarization. MIT Press, Cambridge, pages 137–154. Teufel, Simone and Marc Moens. 2002. Summarizing scientific articles: Experiments with relevance and rhetorical status. Computational Linguistics, 28(4), 409–445. White, Michael and Claire Cardie. 2002. Selecting sentences for multidocument summaries using randomized local search. In Proceedings of the Workshop on Automatic Summarization (including DUC 2002), Philadelphia, July. Association for Computational Linguistics, New Brunswick, NJ, pages 9–18. Witbrock, Michael and Vibhu Mittal. 1999. Ultra-summarization: A statistical approach to generating highly condensed non-extractive summaries. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Berkeley, pages 315–316. Zechner, Klaus. 2002. Automatic summarization of open-domain multiparty dialogues in diverse genres. Computational Linguistics, 28(4), 447–485.
TL;DR: A multi-document summarizer, called MEAD, is presented, which generates summaries using cluster centroids produced by a topic detection and tracking system and two new techniques, based on sentence utility and subsumption, are described.
Abstract: We present a multi-document summarizer, called MEAD, which generates summaries using cluster centroids produced by a topic detection and tracking system. We also describe two new techniques, based on sentence utility and subsumption, which we have applied to the evaluation of both single and multiple document summaries. Finally, we describe two user studies that test our models of multi-document summarization.