Top 67 papers published on the topic of multi-document summarization in 2004

Proceedings Article•

ROUGE: A Package for Automatic Evaluation of Summaries

Chin-Yew Lin¹•Institutions (1)

25 Jul 2004

TL;DR: Four different ROUGE measures are introduced: ROUGE-N, ROUGE-L, ROUGE-W, and ROUGE-S, which are included in the ROUGE summarization evaluation package, along with their evaluations.

Abstract: ROUGE stands for Recall-Oriented Understudy for Gisting Evaluation. It includes measures to automatically determine the quality of a summary by comparing it to other (ideal) summaries created by humans. The measures count the number of overlapping units such as n-gram, word sequences, and word pairs between the computer-generated summary to be evaluated and the ideal summaries created by humans. This paper introduces four different ROUGE measures: ROUGE-N, ROUGE-L, ROUGE-W, and ROUGE-S included in the ROUGE summarization evaluation package and their evaluations. Three of them have been used in the Document Understanding Conference (DUC) 2004, a large-scale summarization evaluation sponsored by NIST.

14,830 citations
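
The n-gram overlap counting described in the ROUGE abstract can be illustrated with a minimal sketch of an n-gram recall score in the spirit of ROUGE-N. This is not the ROUGE package itself: the function names, the whitespace tokenization, the lowercasing, and the lack of stemming or stopword handling are all assumptions made for illustration.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_recall(candidate, references, n=2):
    """N-gram recall of a candidate summary against human reference summaries.

    Matches are clipped: an n-gram is credited at most as many times as it
    occurs in the candidate. Tokenization is naive whitespace splitting
    (an assumption, not what the official package does).
    """
    cand_counts = ngrams(candidate.lower().split(), n)
    matched = 0
    total = 0
    for ref in references:
        ref_counts = ngrams(ref.lower().split(), n)
        total += sum(ref_counts.values())
        matched += sum(min(count, cand_counts[gram]) for gram, count in ref_counts.items())
    return matched / total if total else 0.0
```

For example, scoring the candidate "the cat sat on the mat" against the single reference "the cat was on the mat" with `n=2` matches three of the five reference bigrams.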

Journal Article•10.1613/JAIR.1523•

LexRank: graph-based lexical centrality as salience in text summarization

Gunes Erkan¹, Dragomir R. Radev¹•Institutions (1)

University of Michigan¹

01 Jul 2004-Journal of Artificial Intelligence Research

TL;DR: LexRank is a stochastic graph-based method for computing the relative importance of textual units in Natural Language Processing (NLP), based on the concept of eigenvector centrality.

Abstract: We introduce a stochastic graph-based method for computing relative importance of textual units for Natural Language Processing. We test the technique on the problem of Text Summarization (TS). Extractive TS relies on the concept of sentence salience to identify the most important sentences in a document or set of documents. Salience is typically defined in terms of the presence of particular important words or in terms of similarity to a centroid pseudo-sentence. We consider a new approach, LexRank, for computing sentence importance based on the concept of eigenvector centrality in a graph representation of sentences. In this model, a connectivity matrix based on intra-sentence cosine similarity is used as the adjacency matrix of the graph representation of sentences. Our system, based on LexRank, ranked in first place in more than one task in the recent DUC 2004 evaluation. In this paper we present a detailed analysis of our approach and apply it to a larger data set including data from earlier DUC evaluations. We discuss several methods to compute centrality using the similarity graph. The results show that degree-based methods (including LexRank) outperform both centroid-based methods and other systems participating in DUC in most of the cases. Furthermore, the LexRank with threshold method outperforms the other degree-based techniques including continuous LexRank. We also show that our approach is quite insensitive to the noise in the data that may result from an imperfect topical clustering of documents.

2,367 citations
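
The idea in the LexRank abstract (a cosine-similarity graph over sentences, scored by eigenvector centrality) can be sketched roughly as follows. This is an illustrative approximation, not the authors' implementation: raw term-frequency vectors stand in for whatever weighting the paper uses, and the `threshold`, `damping`, and `iterations` values are assumed defaults chosen for the example.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def lexrank(sentences, threshold=0.1, damping=0.85, iterations=100):
    """Score sentences by centrality in a thresholded cosine-similarity graph."""
    vecs = [Counter(s.lower().split()) for s in sentences]
    n = len(vecs)
    if n == 0:
        return []
    # Binary adjacency matrix: 1 where similarity exceeds the threshold
    # (every non-empty sentence is linked to itself).
    adj = [[1.0 if cosine(vecs[i], vecs[j]) > threshold else 0.0 for j in range(n)]
           for i in range(n)]
    degree = [sum(row) or 1.0 for row in adj]  # guard against empty sentences
    scores = [1.0 / n] * n
    # Power iteration on the damped, row-normalized adjacency matrix.
    for _ in range(iterations):
        scores = [(1 - damping) / n
                  + damping * sum(adj[j][i] / degree[j] * scores[j] for j in range(n))
                  for i in range(n)]
    return scores
```

Sentences that are similar to many other sentences accumulate a higher score, while an isolated sentence keeps only the uniform share of 1/n.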

Journal Article•10.1016/J.IPM.2003.10.006•

Centroid-based summarization of multiple documents

Dragomir R. Radev¹, Hongyan Jing², Małgorzata Styś², Daniel Tam¹•Institutions (2)

University of Michigan¹, IBM²

01 Nov 2004-Information Processing and Management

TL;DR: A multi-document summarizer, MEAD, is presented, which generates summaries using cluster centroids produced by a topic detection and tracking system and an evaluation scheme based on sentence utility and subsumption is applied.

Abstract: We present a multi-document summarizer, MEAD, which generates summaries using cluster centroids produced by a topic detection and tracking system. We describe two new techniques, a centroid-based summarizer, and an evaluation scheme based on sentence utility and subsumption. We have applied this evaluation to both single and multiple document summaries. Finally, we describe two user studies that test our models of multi-document summarization.

1,248 citations
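
A centroid-based sentence score of the kind the MEAD abstract refers to might look like the sketch below. The abstract gives no formulas, so everything here is an assumption: the centroid is taken to be the average raw term frequency over the cluster (no IDF weighting or stopword removal), `top_k` is a hypothetical cut-off, and a sentence's score is the sum of the centroid weights of its distinct words.

```python
from collections import Counter

def centroid(documents, top_k=10):
    """Average term frequency across a document cluster, keeping the top_k heaviest terms."""
    totals = Counter()
    for doc in documents:
        totals.update(doc.lower().split())
    avg = {term: count / len(documents) for term, count in totals.items()}
    keep = sorted(avg, key=avg.get, reverse=True)[:top_k]
    return {term: avg[term] for term in keep}

def centroid_score(sentence, cent):
    """Sum of the centroid weights of the distinct words in a sentence."""
    return sum(cent.get(term, 0.0) for term in set(sentence.lower().split()))
```

Sentences would then be ranked by `centroid_score` and the highest-scoring ones extracted. A practical system would need stopword handling, since raw frequencies favor function words.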

Proceedings Article•10.7916/D80R9XVD•

Evaluating Content Selection in Summarization: The Pyramid Method

Ani Nenkova, Rebecca J. Passonneau

1 Jan 2004

TL;DR: It is argued that the method presented is reliable, predictive and diagnostic, thus improves considerably over the shortcomings of the human evaluation method currently used in the Document Understanding Conference.

Abstract: We present an empirically grounded method for evaluating content selection in summarization. It incorporates the idea that no single best model summary for a collection of documents exists. Our method quantifies the relative importance of facts to be conveyed. We argue that it is reliable, predictive and diagnostic, thus improves considerably over the shortcomings of the human evaluation method currently used in the Document Understanding Conference.

727 citations
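
The abstract says the method quantifies the relative importance of facts but does not give the scoring formula. One plausible reading, sketched here under stated assumptions, is that each content unit is weighted by the number of human model summaries that express it, and a candidate's score is its total weight divided by the best total achievable with the same number of content units. The function name, the dictionary representation, and the unit labels are hypothetical.

```python
def pyramid_score(candidate_units, unit_weights):
    """Ratio of the weight a candidate captures to the best weight achievable
    with the same number of content units.

    unit_weights maps each content unit to the number of model summaries
    that express it (assumed weighting); units absent from the map weigh 0.
    """
    units = set(candidate_units)
    observed = sum(unit_weights.get(u, 0) for u in units)
    ideal = sum(sorted(unit_weights.values(), reverse=True)[:len(units)])
    return observed / ideal if ideal else 0.0
```

With weights `{"a": 4, "b": 3, "c": 1, "d": 1}`, a candidate expressing units `a` and `c` captures weight 5, while the best two-unit summary would capture 7.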

Proceedings Article•10.7916/D8MG7XZT•

MEAD - A Platform for Multidocument Multilingual Text Summarization

Dragomir R. Radev¹, Timothy Allison, Sasha Blair-Goldensohn, John Blitzer, Arda Çelebi, Stanko Dimitrov, Elliott F. Drabek, Ali Hakim, Wai Lam, Danyu Liu, Jahna Otterbacher, Hong Qi, Horacio Saggion, Simone Teufel, Michael Topper, Adam Winkel, Zhu Zhang•Institutions (1)

University of Michigan¹

1 May 2004

TL;DR: The functionality of MEAD is described, a comprehensive, public domain, open source, multidocument multilingual summarization environment that has been thus far downloaded by more than 500 organizations.

Abstract: This paper describes the functionality of MEAD, a comprehensive, public domain, open source, multidocument multilingual summarization environment that has been thus far downloaded by more than 500 organizations. MEAD has been used in a variety of summarization applications ranging from summarization for mobile devices to Web page summarization within a search engine and to novelty detection.

416 citations

Proceedings Article•10.1145/1008992.1009035•

Web-page classification through summarization

Dou Shen¹, Zheng Chen², Qiang Yang³, Hua-Jun Zeng², Benyu Zhang², Yuchang Lu¹, Wei-Ying Ma²•Institutions (3)

Tsinghua University¹, Microsoft², Hong Kong University of Science and Technology³

25 Jul 2004

TL;DR: This paper gives empirical evidence that ideal Web-page summaries generated by human editors can indeed improve the performance of Web-page classification algorithms and proposes a new Web summarization-based classification algorithm that achieves an approximately 8.8% improvement over pure-text based methods.

Abstract: Web-page classification is much more difficult than pure-text classification due to a large variety of noisy information embedded in Web pages. In this paper, we propose a new Web-page classification algorithm based on Web summarization for improving the accuracy. We first give empirical evidence that ideal Web-page summaries generated by human editors can indeed improve the performance of Web-page classification algorithms. We then propose a new Web summarization-based classification algorithm and evaluate it along with several other state-of-the-art text summarization algorithms on the LookSmart Web directory. Experimental results show that our proposed summarization-based classification algorithm achieves an approximately 8.8% improvement as compared to pure-text-based classification algorithm. We further introduce an ensemble classifier using the improved summarization algorithm and show that it achieves about 12.9% improvement over pure-text based methods.

212 citations

Task-Focused Summarization of Email

Simon Corston-Oliver, Eric K. Ringger, Michael Gamon, Richard John Campbell¹•Institutions (1)

Microsoft¹

1 Jul 2004

TL;DR: SmartMail, a prototype system for automatically identifying action items (tasks) in email messages, presents the user with a task-focused summary of a message that contains a list of action items extracted from the message.

Abstract: We describe SmartMail, a prototype system for automatically identifying action items (tasks) in email messages. SmartMail presents the user with a task-focused summary of a message. The summary consists of a list of action items extracted from the message. The user can add these action items to their “to do” list.

129 citations

Proceedings Article•10.3115/1220355.1220484•

Syntactic simplification for improving content selection in multi-document summarization

Advaith Siddharthan¹, Ani Nenkova¹, Kathleen R. McKeown¹•Institutions (1)

Columbia University¹

23 Aug 2004

TL;DR: It is shown how simplifying parentheticals by removing relative clauses and appositives results in improved sentence clustering, by forcing clustering based on central rather than background information.

Abstract: In this paper, we explore the use of automatic syntactic simplification for improving content selection in multi-document summarization. In particular, we show how simplifying parentheticals by removing relative clauses and appositives results in improved sentence clustering, by forcing clustering based on central rather than background information. We argue that the inclusion of parenthetical information in a summary is a reference-generation task rather than a content-selection one, and implement a baseline reference rewriting module. We perform our evaluations on the test sets from the 2003 and 2004 Document Understanding Conference and report that simplifying parentheticals results in significant improvement on the automated evaluation metric Rouge.

112 citations

LetSum, an automatic Legal Text Summarizing system

Atefeh Farzindar¹, Guy Lapalme¹•Institutions (1)

Université de Montréal¹

1 Jan 2004

TL;DR: LetSum (Legal text Summarizer), a prototype system, is described. It divides the thematic structure of a judgment into four themes (INTRODUCTION, CONTEXT, JURIDICAL ANALYSIS and CONCLUSION) and identifies the relevant sentences for each theme.

Abstract: This paper presents our work on the development of a new methodology for automatic summarization of justice decisions. We describe LetSum (Legal text Summarizer), a prototype system, which determines the thematic structure of a judgment in four themes: INTRODUCTION, CONTEXT, JURIDICAL ANALYSIS and CONCLUSION. Then it identifies the relevant sentences for each theme. We discuss the evaluation of the produced summaries with a statistical method and also a human evaluation based on jurist judgment. The results so far indicate good performance of the system when compared with other summarization technologies.

78 citations

Lakhas, an Arabic summarization system

Fouad Soufiane Douzidia¹, Guy Lapalme•Institutions (1)

Université de Montréal¹

1 Jan 2004

TL;DR: The structure of the system and the various compaction techniques developed to produce 10-word summaries of news articles are described, and the scores obtained using two different machine translation systems are presented.

Abstract: This paper describes the Arabic summarization system that we have developed and evaluated on the very short summary of noisy text task of DUC2004. We describe the structure of the system and the various compaction techniques we developed in order to produce 10-word summaries of news articles. We also present the scores we obtained using two different machine translation systems.

65 citations

Journal Article•10.1145/986278.986284•

Text Summarization Challenge 2: text summarization evaluation at NTCIR workshop 3

Manabu Okumura¹, Takahiro Fukusima², Hidetsugu Nanba³, Tsutomu Hirao⁴•Institutions (4)

Tokyo Institute of Technology¹, Otemon Gakuin University², Hiroshima City University³, Nippon Telegraph and Telephone⁴

1 Jul 2004

TL;DR: The outline of Text Summarization Challenge 2 (TSC2 hereafter), a sequel text summarization evaluation conducted as one of the tasks at the NTCIR Workshop 3, is reported.

Abstract: We report the outline of Text Summarization Challenge 2 (TSC2 hereafter), a sequel text summarization evaluation conducted as one of the tasks at the NTCIR Workshop 3. First, we describe briefly the previous evaluation, Text Summarization Challenge (TSC1) as introduction to TSC2. Then we explain TSC2 including the participants, the two tasks in TSC2, data used, evaluation methods for each task, and brief report on the results. Lastly we describe plans for the next evaluation, TSC3.

Evaluation of Automatic Text Summarization

Martin Hassel

1 Jan 2004

Multi-document summarization by cluster/profile relevance and redundancy removal

Horacio Saggion¹, Robert Gaizauskas•Institutions (1)

University of Sheffield¹

1 Jan 2004

TL;DR: A sentence extraction system that produces two sorts of multi-document summaries: the first is a general-purpose summary of a cluster of related documents while the second is an entity-based summary of documents related to a particular person.

Abstract: We describe a sentence extraction system that produces two sorts of multi-document summaries: the first is a general-purpose summary of a cluster of related documents while the second is an entity-based summary of documents related to a particular person. The general-purpose summary is generated by a process that ranks sentences based on their document and cluster “worthiness”. The personality-based summary is constructed by a process that ranks sentences according to a metric that uses coreference and lexical information in a person profile. In both cases, a process of redundancy removal is applied to exclude repeated information.

Proceedings Article•10.3115/1613984.1613985•

Using N-Grams to understand the nature of summaries

Michele Banko¹, Lucy Vanderwende¹•Institutions (1)

Microsoft¹

2 May 2004

TL;DR: Empirically characterize human-written summaries provided in a widely used summarization corpus and suggest that extraction-based techniques which have been successful for single-document summarization may not be sufficient when summarizing multiple documents.

Abstract: Although single-document summarization is a well-studied task, the nature of multi-document summarization is only beginning to be studied in detail. While close attention has been paid to what technologies are necessary when moving from single to multi-document summarization, the properties of human-written multi-document summaries have not been quantified. In this paper, we empirically characterize human-written summaries provided in a widely used summarization corpus by attempting to answer the questions: Can multi-document summaries that are written by humans be characterized as extractive or generative? Are multi-document summaries less extractive than single-document summaries? Our results suggest that extraction-based techniques which have been successful for single-document summarization may not be sufficient when summarizing multiple documents.
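
One simple way to probe how extractive a human-written summary is, in the spirit of the n-gram analysis named in this paper's title, is to measure what fraction of the summary's n-grams also occur in the source documents. The abstract does not describe the exact procedure, so the sketch below is an assumption: the function names are hypothetical and tokenization is plain whitespace splitting.

```python
def ngram_list(text, n):
    """List the n-grams (as tuples) of a whitespace-tokenized, lowercased text."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def extractive_fraction(summary, sources, n=3):
    """Fraction of the summary's n-grams that appear in at least one source document."""
    summary_grams = ngram_list(summary, n)
    if not summary_grams:
        return 0.0
    source_grams = set()
    for doc in sources:
        source_grams.update(ngram_list(doc, n))
    found = sum(1 for gram in summary_grams if gram in source_grams)
    return found / len(summary_grams)
```

A value near 1 for long n-grams would suggest text copied from the sources, while low values suggest rewording or generation.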

Book Chapter•10.1007/978-3-540-30498-2_31•

Automatic Text Summarization with Genetic Algorithm-Based Attribute Selection

Carlos N. Silla¹, Gisele L. Pappa², Alex A. Freitas², Celso A. A. Kaestner¹•Institutions (2)

Pontifícia Universidade Católica do Paraná¹, University of Kent²

22 Nov 2004

TL;DR: The goal of the paper is to investigate the effectiveness of Genetic Algorithm (GA)-based attribute selection in improving the performance of classification algorithms solving the automatic text summarization task.

Abstract: The task of automatic text summarization consists of generating a summary of the original text that allows the user to obtain the main pieces of information available in that text, but with a much shorter reading time. This is an increasingly important task in the current era of information overload, given the huge amount of text available in documents. In this paper the automatic text summarization is cast as a classification (supervised learning) problem, so that machine learning-oriented classification methods are used to produce summaries for documents based on a set of attributes describing those documents. The goal of the paper is to investigate the effectiveness of Genetic Algorithm (GA)-based attribute selection in improving the performance of classification algorithms solving the automatic text summarization task. Computational results are reported for experiments with a document base formed by news extracted from The Wall Street Journal of the TIPSTER collection–a collection that is often used as a benchmark in the text summarization literature.

Evaluation of automatic text summarization: a practical implementation

Martin Hassel

1 Jan 2004

Proceedings Article•10.1109/CIT.2004.1357351•

A study of Chinese text summarization using adaptive clustering of paragraphs

Po Hu¹, Tingting He¹, Donghong Ji², Meng Wang¹•Institutions (2)

Central China Normal University¹, Institute for Infocomm Research Singapore²

14 Sep 2004

TL;DR: Preliminary experimental results show that the proposed method outperforms the conventional basic summarization method under the evaluation scheme when dealing with diverse genres of Chinese documents with free writing style and flexible topic distribution.

Abstract: Automatic summarization is an important research issue in natural language processing. This paper presents a special summarization method to generate a single-document summary with maximum topic completeness and minimum redundancy. It initially implements the semantic-class-based vector representations of various kinds of linguistic units in a document by means of HowNet (an existing ontology), which can improve the representation quality of the traditional term-based vector space model to a certain degree. Then, by adopting the K-means clustering algorithm as well as a clustering analysis algorithm, we can capture the number of different latent topic regions in a document adaptively. Finally, topic representative sentences are selected from each topic region to form the final summary. In order to evaluate the effectiveness of the proposed summarization method, a novel metric which is known as representation entropy is used for summarization redundancy evaluation. Preliminary experimental results show that the proposed method outperforms the conventional basic summarization method under the evaluation scheme when dealing with diverse genres of Chinese documents with free writing style and flexible topic distribution.

Proceedings Article•10.3115/1220355.1220432•

Corpus and evaluation measures for multiple document summarization with multiple sources

Tsutomu Hirao, Takahiro Fukusima¹, Manabu Okumura², Chikashi Nobata, Hidetsugu Nanba³•Institutions (3)

Otemon Gakuin University¹, Tokyo Institute of Technology², Hiroshima City University³

23 Aug 2004

TL;DR: A large-scale test collection for multiple document summarization, the Text Summarization Challenge 3 (TSC3) corpus, which annotates not only the important sentences in a document set, but also those among them that have the same content.

Abstract: In this paper, we introduce a large-scale test collection for multiple document summarization, the Text Summarization Challenge 3 (TSC3) corpus. We detail the corpus construction and evaluation measures. The significant feature of the corpus is that it annotates not only the important sentences in a document set, but also those among them that have the same content. Moreover, we define new evaluation metrics taking redundancy into account and discuss the effectiveness of redundancy minimization.

Book Chapter•10.1007/978-3-540-30542-2_31•

News video summarization based on spatial and motion feature analysis

Wen-Nung Lie¹, Chun-Ming Lai¹•Institutions (1)

National Chung Cheng University¹

30 Nov 2004

TL;DR: The Lagrangian multiplier approach was employed to build optimization in allocating time-lengths for all the segmented shots and getting the best perceived motion activity of the summarized video.

Abstract: In this paper, an efficient and effective summarization algorithm based on the extraction and analysis of spatial and motion features for MPEG news video is proposed. We focus on video feature analysis techniques based on the compressed domain (i.e., MVs and DCT coefficients), without the need of transformation back to the pixel domain. To give the viewers a quick and enough browse of the news content, we adopted a new strategy that the anchor audio is overlaid with the summarized news video. Hence, the detection of anchor shots and the summarization of news segment subject to a time-budget constraint constitute the two main works in this paper. In summarization of news segments, the Lagrangian multiplier approach was employed to build optimization in allocating time-lengths for all the segmented shots and getting the best perceived motion activity of the summarized video. Experiments show that our summarized news videos present an average MOS score of above 4.0 in a subjective test.

Journal Article•10.1145/1039621.1039624•

Usefulness of temporal information automatically extracted from news articles for topic tracking

Pyung Kim¹, Sung-Hyon Myaeng²•Institutions (2)

Chungnam National University¹, Information and Communications University²

01 Dec 2004-ACM Transactions on Asian Language Information Processing

TL;DR: A relatively simple NLP method for extracting temporal information from Korean news articles, with the goal of improving performance of TDT tasks and showing that time information extracted from the text does indeed help to significantly improve both precision and recall.

Abstract: Temporal information plays an important role in natural language processing (NLP) applications such as information extraction, discourse analysis, automatic summarization, and question-answering. In the topic detection and tracking (TDT) area, the temporal information often used is the publication date of a message, which is readily available but limited in its usefulness. We developed a relatively simple NLP method for extracting temporal information from Korean news articles, with the goal of improving performance of TDT tasks. To extract temporal information, we make use of finite state automata and a lexicon containing time-revealing vocabulary. Extracted information is converted into a canonicalized representation of a time point or a time duration. We first evaluated and investigated the extraction and canonicalization methods for their accuracy and the extent to which temporal information extracted as such can help TDT tasks. The experimental results show that time information extracted from the text does indeed help to significantly improve both precision and recall.

Multi-document summarization using document set type classification

Junichi Fukumoto¹•Institutions (1)

Ritsumeikan University¹

1 Jan 2004

TL;DR: A summarization system is proposed that automatically classifies the type of a document set and summarizes it with the appropriate summarization mechanism.

Abstract: In this paper, we propose a summarization system which automatically classifies the type of a document set and summarizes the document set with its appropriate summarization mechanism. This system will classify a document set into three types: (a) one-topic type, (b) multi-topic type, and (c) others. These types will be identified using information on high-frequency nouns and Named Entities. In our multi-document summarization system, unnecessary parts are deleted after summarizing each document, and then a multi-document summary is generated. In type (a), unnecessary parts are the similar parts between documents summarized by single-document summarization. In type (b), unnecessary parts are the dissimilar parts in documents. In type (c), unnecessary parts are identified by scores used for single-document summarization.

Summarization Experiments in DUC 2004

Kenneth C. Litkowski

1 Jan 2004

TL;DR: CL Research's participation in the Document Understanding Conference for 2004 was primarily intended to conduct further experiments in the use of XML-tagged documents containing increasingly richer characterizations of texts, and the Knowledge Management System was extended to include a refined capability for identifying multiword units for use in keyword generation.

Abstract: CL Research's participation in the Document Understanding Conference for 2004 was primarily intended to conduct further experiments in the use of XML-tagged documents containing increasingly richer characterizations of texts. We extended the Knowledge Management System to include (1) a refined capability for identifying multiword units (phrases) for use in keyword generation, (2) the incorporation of word-sense disambiguation to tag senses and identify semantic types, and (3) the integration of question-answering functionality into the summarization framework. We did not devote much effort in refining our system to create summaries for the five tasks, but achieved reasonable levels of performance. We viewed the length restrictions imposed on the tasks as not providing sufficient flexibility to investigate different modes of summarization. We viewed the tasks of summarizing machine translations of poor quality as not very interesting. We used Tasks 1 and 3 to develop and refine a keyword generation capability, achieving levels of fourth of 18 and fourth of 10 priority 1 systems. In the more general summarization tasks, our performance was near the bottom of participating systems, but still achieved acceptable levels of performance. We performed much better on quality measures with our extraction-based summaries, with an overall level of third of 14 systems for Task 5. For several quality measures, our performance was somewhat less; these levels identify specifically those areas of summarization analysis where the use of an XML representation are particularly amenable to improvement. While we will continue to improve our summarization capability within the general guidelines, we believe that summarization is only one part of document understanding and may not represent needs of users for document exploration at a much deeper level.

Extending Document Summarization to Information Graphics

Sandra Carberry, Stephanie Elzer, Nancy L. Green, Kathleen F. McCoy, Daniel L. Chester

1 Jul 2004

TL;DR: It is argued that the message that the graphic designer intended to convey must play a major role in determining the content of the summary, and the approach to identifying this intended message and using it to construct the summary is outlined.

Abstract: Information graphics (non-pictorial graphics such as bar charts or line graphs) are an important component of multimedia documents. Often such graphics convey information that is not contained elsewhere in the document. Thus document summarization must be extended to include summarization of information graphics. This paper addresses our work on graphic summarization. It argues that the message that the graphic designer intended to convey must play a major role in determining the content of the summary, and it outlines our approach to identifying this intended message and using it to construct the summary.

Book Chapter•10.1007/978-3-540-30480-7_31•

Temporal Web Page Summarization

Adam Jatowt¹, Mitsuru Ishizuka¹•Institutions (1)

University of Tokyo¹

22 Nov 2004

TL;DR: A new method for temporal web page summarization based on trend and variance analysis is presented, which can be also used for summarization of dynamic collections of topically related web pages.

Abstract: In recent years the Web has become an important medium for communication and information storage. As this trend is predicted to continue, it is necessary to provide efficient solutions for retrieving and processing information found on the WWW. In this paper we present a new method for temporal web page summarization based on trend and variance analysis. In temporal summarization, web documents are treated as dynamic objects that have changing contents and characteristics. The sequential versions of a single web page are retrieved during a predefined time interval for which the summary is to be constructed. The resulting summary should represent the most popular, evolving concepts which are found in web document versions. The proposed method can also be used for summarization of dynamic collections of topically related web pages.

Handling Figures in Document Summarization

Robert P. Futrelle¹•Institutions (1)

Northeastern University¹

1 Jul 2004

TL;DR: The focus is on diagrams (line drawings) because they allow parsing techniques to be used, in contrast to the difficulties of general image understanding, and the advances in raster image vectorization and parsing needed to produce corpora for diagram summarization.

Abstract: Some document genres contain a large number of figures. This position paper outlines approaches to diagram summarization that can augment the many well-developed techniques of text summarization. We discuss figures as surrogates for entire documents, thumbnails, extraction, the relations between text and figures as well as how automation might be achieved. The focus is on diagrams (line drawings) because they allow parsing techniques to be used, in contrast to the difficulties of general image understanding. We describe the advances in raster image vectorization and parsing needed to produce corpora for diagram summarization.

Proceedings Article•10.1109/ITCC.2004.1286634•

Information-content based sentence extraction for text summarization

D. Mallett, J. Elding, Mario A. Nascimento¹•Institutions (1)

University of Alberta¹

5 Apr 2004

TL;DR: The FULL-COVERAGE summarizer is proposed: an efficient, information retrieval oriented method to extract nonredundant sentences from text for summarization purposes that can produce sentence-based summaries that are up to 78% smaller than the original text with only 3% loss in retrieval performance.

Abstract: We propose the FULL-COVERAGE summarizer: an efficient, information retrieval oriented method to extract nonredundant sentences from text for summarization purposes. Our method leverages existing information retrieval technology by extracting key-sentences on the premise that the relevance of a sentence is proportional to its similarity to the whole document. We show that our method can produce sentence-based summaries that are up to 78% smaller than the original text with only 3% loss in retrieval performance.
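
The premise stated in this abstract, that a sentence's relevance is proportional to its similarity to the whole document, can be sketched as a simple ranking step. This is only that premise, not the FULL-COVERAGE method itself: the redundancy handling and the retrieval machinery are omitted, raw term-frequency cosine similarity is an assumed similarity measure, and the function names are hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank_by_document_similarity(sentences):
    """Return sentence indices ordered from most to least similar to the whole document."""
    vecs = [Counter(s.lower().split()) for s in sentences]
    document = Counter()
    for vec in vecs:
        document.update(vec)  # the document vector is the sum of its sentence vectors
    return sorted(range(len(sentences)),
                  key=lambda i: cosine(vecs[i], document),
                  reverse=True)
```

A summarizer built on this would take sentences from the front of the ranking and skip those that add no new information.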

Proceedings Article•10.1109/WI.2004.110•

Ontology-Based Structured Cosine Similarity in Speech Document Summarization

Soe-Tsyr Yuan¹, Jerry Sun²•Institutions (2)

National Chengchi University¹, Fu Jen Catholic University²

20 Sep 2004

TL;DR: A novel method named Structured Cosine Similarity is presented that furnishes document clustering with a new way of modeling on document summarization, considering the structure of terms in documents in order to improve the quality of speech document clusters.

Abstract: Development of algorithms for automated text categorization in massive text document sets is an important research area of data mining and knowledge discovery. Most of the text-clustering methods were grounded in the term-based measurement of distance or similarity, ignoring the structure of terms in documents. In this paper we present a novel method named Structured Cosine Similarity that furnishes document clustering with a new way of modeling on document summarization, considering the structure of terms in documents in order to improve the quality of speech document clustering.

Multi-Document Summarization Using Cross-Language Texts

Jung-Min Lim, In-Su Kang, Jong-Hyeok Lee¹•Institutions (1)

Pohang University of Science and Technology¹

1 Jan 2004

TL;DR: Lacking a summarization system in the source language, this work generates a source-language summary by machine-translating the documents and applying a summarization system in the target language, and shows the possibility of multi-document summarization using cross-language texts.

Abstract: Without a summarization system in the source language, we try to generate a summary in the source language using documents translated by a machine translator and a summarization system in the target language. To summarize multiple machine-translated documents, we extract important sentences and remove redundant sentences using an improved term-weighting method, which assigns weights to words using syntactic information. According to the scores of the extracted sentences, we choose sentences and map them to the Japanese sentences in the original documents. Finally, we arrange the Japanese sentences in chronological order and report them as the result of our system. We submitted both short and long summaries, and the evaluation results were not good. However, our approach shows the possibility of multi-document summarization using cross-language texts.

Query-based Multidocument Summarization for Information Retrieval

Toshihiko Sakurai¹, Akira Utsumi¹•Institutions (1)

University of Electro-Communications¹

1 Jan 2004

TL;DR: This paper presents a genre-independent method of generating a single summary from a set of retrieved documents for information retrieval, which generates the core part of the summary from the document most relevant to a query.

Abstract: This paper presents a genre-independent method of generating a single summary from a set of the retrieved documents for information retrieval. The proposed method generates the core part of the summary from the most relevant document to a query, and then the additional part of the summary, which elaborates on the topics, from the other documents. In order to evaluate the validity of the proposed method, we participated in TSC (Text Summarization Challenge) in the 4th NTCIR Workshop. The performance was not satisfactory for the specific task, but we believe that our method would be useful for a set of documents including various genres, such as one retrieved by Web search engines.

Proceedings Article•10.3115/1220355.1220418•

Multi-answer-focused multi-document summarization using a question-answering engine

Tatsunori Mori¹, Masanori Nozawa¹, Yoshiaki Asada¹•Institutions (1)

Yokohama National University¹

23 Aug 2004

TL;DR: A method to calculate sentence importance using scores produced by a Question-Answering engine in response to multiple questions is proposed, and its integration into a generic multi-document summarization system is described.

Abstract: In recent years, answer-focused summarization has received attention as a technology complementary to information retrieval and question answering. In order to realize multi-document summarization focused on multiple questions, we propose a method to calculate sentence importance using scores produced by a Question-Answering engine in response to multiple questions. We also describe its integration into a generic multi-document summarization system. The evaluation results show that the proposed method performs better than not only several baselines but also other participants' systems in the evaluation workshop NTCIR4 TSC3 Formal Run, although it should be noted that some of the other systems do not use the information in the questions.
