Showing papers on "Multi-document summarization published in 2004"
Proceedings Article•
ROUGE: A Package for Automatic Evaluation of Summaries

Chin-Yew Lin1•
Information Sciences Institute1
25 Jul 2004
TL;DR: Four different ROUGE measures are introduced: ROUGE-N, ROUGE-L, ROUGE-W, and ROUGE-S, all included in the ROUGE summarization evaluation package, together with their evaluations.
Abstract: ROUGE stands for Recall-Oriented Understudy for Gisting Evaluation. It includes measures to automatically determine the quality of a summary by comparing it to other (ideal) summaries created by humans. The measures count the number of overlapping units such as n-gram, word sequences, and word pairs between the computer-generated summary to be evaluated and the ideal summaries created by humans. This paper introduces four different ROUGE measures: ROUGE-N, ROUGE-L, ROUGE-W, and ROUGE-S included in the ROUGE summarization evaluation package and their evaluations. Three of them have been used in the Document Understanding Conference (DUC) 2004, a large-scale summarization evaluation sponsored by NIST.

14,830 citations
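
The n-gram overlap counting described in the ROUGE abstract above can be sketched as follows. This is a minimal illustration of a ROUGE-N style recall, not the ROUGE package itself: the function names `ngrams` and `rouge_n_recall` are hypothetical, tokenization is plain lowercased whitespace splitting, and stemming, stopword removal, and the jackknifing used for multiple references are all omitted.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, references, n=1):
    """Share of reference n-grams that also occur in the candidate.

    Matches are clipped: an n-gram counts at most as often as it
    appears in the candidate summary.
    """
    cand_counts = ngrams(candidate.lower().split(), n)
    matched = 0
    total = 0
    for ref in references:
        ref_counts = ngrams(ref.lower().split(), n)
        total += sum(ref_counts.values())
        matched += sum(min(count, cand_counts[gram])
                       for gram, count in ref_counts.items())
    return matched / total if total else 0.0


# 5 of the 6 reference unigrams also occur in the candidate.
score = rouge_n_recall("the cat sat on the mat", ["the cat was on the mat"], n=1)
```

The score is recall-oriented because the denominator counts n-grams in the human references rather than in the system summary.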

Journal Article•10.1613/JAIR.1523•
LexRank: graph-based lexical centrality as salience in text summarization

Gunes Erkan1, Dragomir R. Radev1•
University of Michigan1
01 Jul 2004-Journal of Artificial Intelligence Research
TL;DR: LexRank as discussed by the authors is a stochastic graph-based method for computing relative importance of textual units for Natural Language Processing (NLP), which is based on the concept of eigenvector centrality.
Abstract: We introduce a stochastic graph-based method for computing relative importance of textual units for Natural Language Processing. We test the technique on the problem of Text Summarization (TS). Extractive TS relies on the concept of sentence salience to identify the most important sentences in a document or set of documents. Salience is typically defined in terms of the presence of particular important words or in terms of similarity to a centroid pseudo-sentence. We consider a new approach, LexRank, for computing sentence importance based on the concept of eigenvector centrality in a graph representation of sentences. In this model, a connectivity matrix based on intra-sentence cosine similarity is used as the adjacency matrix of the graph representation of sentences. Our system, based on LexRank, ranked in first place in more than one task in the recent DUC 2004 evaluation. In this paper we present a detailed analysis of our approach and apply it to a larger data set including data from earlier DUC evaluations. We discuss several methods to compute centrality using the similarity graph. The results show that degree-based methods (including LexRank) outperform both centroid-based methods and other systems participating in DUC in most of the cases. Furthermore, the LexRank with threshold method outperforms the other degree-based techniques including continuous LexRank. We also show that our approach is quite insensitive to the noise in the data that may result from an imperfect topical clustering of documents.

2,367 citations
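
The idea in the LexRank abstract above (eigenvector centrality on a thresholded cosine-similarity graph of sentences) can be sketched roughly as below. This is an illustrative simplification, not the authors' implementation: it uses raw term-frequency vectors rather than tf-idf, and the function names, the `threshold` of 0.1, the `damping` of 0.85, and the fixed iteration count are all assumptions chosen for the example.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lexrank(sentences, threshold=0.1, damping=0.85, iters=50):
    """Score sentences by damped power iteration on the similarity graph."""
    vecs = [Counter(s.lower().split()) for s in sentences]
    n = len(vecs)
    if n == 0:
        return []
    # Binary adjacency: connect sentences whose similarity exceeds the threshold.
    adj = [[1.0 if cosine(vecs[i], vecs[j]) > threshold else 0.0
            for j in range(n)] for i in range(n)]
    # Row-normalize so each row is a probability distribution.
    for row in adj:
        total = sum(row)
        for j in range(n):
            row[j] = row[j] / total if total else 1.0 / n
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [(1.0 - damping) / n
                  + damping * sum(adj[j][i] * scores[j] for j in range(n))
                  for i in range(n)]
    return scores


# The first sentence shares a word with each of the others, so it is the most central.
scores = lexrank(["cats chase mice", "cats sleep often", "dogs chase balls"])
```

Sentences with the highest scores would be the extraction candidates; replacing the binary adjacency with the raw similarity values gives a continuous variant in the spirit of the one the abstract mentions.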

Journal Article•10.1016/J.IPM.2003.10.006•
Centroid-based summarization of multiple documents

Dragomir R. Radev1, Hongyan Jing2, Małgorzata Styś2, Daniel Tam1•
University of Michigan1, IBM2
01 Nov 2004-Information Processing and Management
TL;DR: A multi-document summarizer, MEAD, is presented, which generates summaries using cluster centroids produced by a topic detection and tracking system and an evaluation scheme based on sentence utility and subsumption is applied.
Abstract: We present a multi-document summarizer, MEAD, which generates summaries using cluster centroids produced by a topic detection and tracking system. We describe two new techniques, a centroid-based summarizer, and an evaluation scheme based on sentence utility and subsumption. We have applied this evaluation to both single and multiple document summaries. Finally, we describe two user studies that test our models of multi-document summarization.

1,248 citations

Proceedings Article•10.7916/D80R9XVD•
Evaluating Content Selection in Summarization: The Pyramid Method

Ani Nenkova, Rebecca J. Passonneau
1 Jan 2004
TL;DR: It is argued that the method presented is reliable, predictive and diagnostic, thus improves considerably over the shortcomings of the human evaluation method currently used in the Document Understanding Conference.
Abstract: We present an empirically grounded method for evaluating content selection in summarization. It incorporates the idea that no single best model summary for a collection of documents exists. Our method quantifies the relative importance of facts to be conveyed. We argue that it is reliable, predictive and diagnostic, thus improves considerably over the shortcomings of the human evaluation method currently used in the Document Understanding Conference.

727 citations
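
As a rough illustration of how weighting facts by importance can turn into a score, the sketch below assumes each content unit has already been hand-annotated with a weight equal to the number of human model summaries that express it. The function name `pyramid_score` and the unit labels are hypothetical, and the annotation step itself, which is the core of the method, is not modeled.

```python
def pyramid_score(summary_units, unit_weights):
    """Observed weight divided by the best weight achievable with as many units.

    summary_units: content-unit labels found in the system summary.
    unit_weights: label -> number of model summaries expressing that unit.
    Units missing from unit_weights contribute zero weight.
    """
    units = set(summary_units)
    observed = sum(unit_weights.get(u, 0) for u in units)
    # An ideal summary of the same size would pick the heaviest units.
    ranked = sorted(unit_weights.values(), reverse=True)
    best = sum(ranked[:len(units)])
    return observed / best if best else 0.0


weights = {"a": 4, "b": 3, "c": 1, "d": 1}
score = pyramid_score(["a", "c"], weights)  # observed 5 against an ideal 7
```

Because the denominator depends on how many units the summary expresses, a short summary that picks only widely shared facts can still score 1.0.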

Proceedings Article•10.7916/D8MG7XZT•
MEAD - A Platform for Multidocument Multilingual Text Summarization

Dragomir R. Radev1, Timothy Allison, Sasha Blair-Goldensohn, John Blitzer, Arda Çelebi, Stanko Dimitrov, Elliott F. Drabek, Ali Hakim, Wai Lam, Danyu Liu, Jahna Otterbacher, Hong Qi, Horacio Saggion, Simone Teufel, Michael Topper, Adam Winkel, Zhu Zhang •
University of Michigan1
1 May 2004
TL;DR: The functionality of MEAD is described, a comprehensive, public domain, open source, multidocument multilingual summarization environment that has been thus far downloaded by more than 500 organizations.
Abstract: This paper describes the functionality of MEAD, a comprehensive, public domain, open source, multidocument multilingual summarization environment that has been thus far downloaded by more than 500 organizations. MEAD has been used in a variety of summarization applications ranging from summarization for mobile devices to Web page summarization within a search engine and to novelty detection.

416 citations

Proceedings Article•10.1145/1008992.1009035•
Web-page classification through summarization

Dou Shen1, Zheng Chen2, Qiang Yang3, Hua-Jun Zeng2, Benyu Zhang2, Yuchang Lu1, Wei-Ying Ma2 •
Tsinghua University1, Microsoft2, Hong Kong University of Science and Technology3
25 Jul 2004
TL;DR: This paper gives empirical evidence that ideal Web-page summaries generated by human editors can indeed improve the performance of Web-page classification algorithms and proposes a new Web summarization-based classification algorithm that achieves an approximately 8.8% improvement over pure-text based methods.
Abstract: Web-page classification is much more difficult than pure-text classification due to a large variety of noisy information embedded in Web pages. In this paper, we propose a new Web-page classification algorithm based on Web summarization for improving the accuracy. We first give empirical evidence that ideal Web-page summaries generated by human editors can indeed improve the performance of Web-page classification algorithms. We then propose a new Web summarization-based classification algorithm and evaluate it along with several other state-of-the-art text summarization algorithms on the LookSmart Web directory. Experimental results show that our proposed summarization-based classification algorithm achieves an approximately 8.8% improvement as compared to pure-text-based classification algorithm. We further introduce an ensemble classifier using the improved summarization algorithm and show that it achieves about 12.9% improvement over pure-text based methods.

212 citations

Task-Focused Summarization of Email

Simon Corston-Oliver, Eric K. Ringger, Michael Gamon, Richard John Campbell1•
Microsoft1
1 Jul 2004
TL;DR: SmartMail, a prototype system for automatically identifying action items (tasks) in email messages, presents the user with a task-focused summary of a message that contains a list of action items extracted from the message.
Abstract: We describe SmartMail, a prototype system for automatically identifying action items (tasks) in email messages. SmartMail presents the user with a task-focused summary of a message. The summary consists of a list of action items extracted from the message. The user can add these action items to their “to do” list.

129 citations

Proceedings Article•10.3115/1220355.1220484•
Syntactic simplification for improving content selection in multi-document summarization

Advaith Siddharthan1, Ani Nenkova1, Kathleen R. McKeown1•
Columbia University1
23 Aug 2004
TL;DR: It is shown how simplifying parentheticals by removing relative clauses and appositives results in improved sentence clustering, by forcing clustering based on central rather than background information.
Abstract: In this paper, we explore the use of automatic syntactic simplification for improving content selection in multi-document summarization. In particular, we show how simplifying parentheticals by removing relative clauses and appositives results in improved sentence clustering, by forcing clustering based on central rather than background information. We argue that the inclusion of parenthetical information in a summary is a reference-generation task rather than a content-selection one, and implement a baseline reference rewriting module. We perform our evaluations on the test sets from the 2003 and 2004 Document Understanding Conference and report that simplifying parentheticals results in significant improvement on the automated evaluation metric Rouge.

112 citations

LetSum, an automatic Legal Text Summarizing system

Atefeh Farzindar1, Guy Lapalme1•
Université de Montréal1
1 Jan 2004
TL;DR: LetSum (Legal text Summarizer), a prototype system, is described; it determines the thematic structure of a judgment in four themes (INTRODUCTION, CONTEXT, JURIDICAL ANALYSIS and CONCLUSION) and identifies the relevant sentences for each theme.
Abstract: This paper presents our work on the development of a new methodology for automatic summarization of judicial decisions. We describe LetSum (Legal text Summarizer), a prototype system, which determines the thematic structure of a judgment in four themes: INTRODUCTION, CONTEXT, JURIDICAL ANALYSIS and CONCLUSION. Then it identifies the relevant sentences for each theme. We discuss the evaluation of produced summaries with a statistical method and also human evaluation based on jurist judgment. The results so far indicate good performance of the system when compared with other summarization technologies.

78 citations

Lakhas, an Arabic summarization system

Fouad Soufiane Douzidia1, Guy Lapalme•
Université de Montréal1
1 Jan 2004
TL;DR: The structure of the system and the various compaction techniques developed in order to produce 10-word summaries of news articles are described, and the scores obtained using two different machine translation systems are presented.
Abstract: This paper describes the Arabic summarization system that we have developed and evaluated on the very short summary of noisy text task of DUC 2004. We describe the structure of the system and the various compaction techniques we developed in order to produce 10-word summaries of news articles. We also present the scores we obtained using two different machine translation systems.

65 citations

Journal Article•10.1145/986278.986284•
Text Summarization Challenge 2 text summarization evaluation at NTCIR workshop 3

Manabu Okumura1, Takahiro Fukusima2, Hidetsugu Nanba3, Tsutomu Hirao4•
Tokyo Institute of Technology1, Otemon Gakuin University2, Hiroshima City University3, Nippon Telegraph and Telephone4
1 Jul 2004
TL;DR: The outline of Text Summarization Challenge 2 (TSC2 hereafter), a sequel text summarization evaluation conducted as one of the tasks at the NTCIR Workshop 3, is reported.
Abstract: We report the outline of Text Summarization Challenge 2 (TSC2 hereafter), a sequel text summarization evaluation conducted as one of the tasks at the NTCIR Workshop 3. First, we describe briefly the previous evaluation, Text Summarization Challenge (TSC1) as introduction to TSC2. Then we explain TSC2 including the participants, the two tasks in TSC2, data used, evaluation methods for each task, and brief report on the results. Lastly we describe plans for the next evaluation, TSC3.
Evaluation of Automatic Text Summarization

Martin Hassel
1 Jan 2004
Multi-document summarization by cluster/profile relevance and redundancy removal

Horacio Saggion1, Robert Gaizauskas•
University of Sheffield1
1 Jan 2004
TL;DR: A sentence extraction system that produces two sorts of multi-document summaries: the first is a general-purpose summary of a cluster of related documents while the second is an entity-based summary of documents related to a particular person.
Abstract: We describe a sentence extraction system that produces two sorts of multi-document summaries: the first is a general-purpose summary of a cluster of related documents while the second is an entity-based summary of documents related to a particular person. The general-purpose summary is generated by a process that ranks sentences based on their document and cluster "worthiness". The personality-based summary is constructed by a process that ranks sentences according to a metric that uses coreference and lexical information in a person profile. In both cases, a process of redundancy removal is applied to exclude repeated information.
Proceedings Article•10.3115/1613984.1613985•
Using N-Grams to understand the nature of summaries

Michele Banko1, Lucy Vanderwende1•
Microsoft1
2 May 2004
TL;DR: Empirically characterize human-written summaries provided in a widely used summarization corpus and suggest that extraction-based techniques which have been successful for single-document summarization may not be sufficient when summarizing multiple documents.
Abstract: Although single-document summarization is a well-studied task, the nature of multi-document summarization is only beginning to be studied in detail. While close attention has been paid to what technologies are necessary when moving from single to multi-document summarization, the properties of human-written multi-document summaries have not been quantified. In this paper, we empirically characterize human-written summaries provided in a widely used summarization corpus by attempting to answer the questions: Can multi-document summaries that are written by humans be characterized as extractive or generative? Are multi-document summaries less extractive than single-document summaries? Our results suggest that extraction-based techniques which have been successful for single-document summarization may not be sufficient when summarizing multiple documents.
Book Chapter•10.1007/978-3-540-30498-2_31•
Automatic Text Summarization with Genetic Algorithm-Based Attribute Selection

Carlos N. Silla1, Gisele L. Pappa2, Alex A. Freitas2, Celso A. A. Kaestner1•
Pontifícia Universidade Católica do Paraná1, University of Kent2
22 Nov 2004
TL;DR: The goal of the paper is to investigate the effectiveness of Genetic Algorithm (GA)-based attribute selection in improving the performance of classification algorithms solving the automatic text summarization task.
Abstract: The task of automatic text summarization consists of generating a summary of the original text that allows the user to obtain the main pieces of information available in that text, but with a much shorter reading time. This is an increasingly important task in the current era of information overload, given the huge amount of text available in documents. In this paper the automatic text summarization is cast as a classification (supervised learning) problem, so that machine learning-oriented classification methods are used to produce summaries for documents based on a set of attributes describing those documents. The goal of the paper is to investigate the effectiveness of Genetic Algorithm (GA)-based attribute selection in improving the performance of classification algorithms solving the automatic text summarization task. Computational results are reported for experiments with a document base formed by news extracted from The Wall Street Journal of the TIPSTER collection–a collection that is often used as a benchmark in the text summarization literature.
Evaluation of automatic text summarization: a practical implementation

Martin Hassel
1 Jan 2004
Proceedings Article•10.1109/CIT.2004.1357351•
A study of Chinese text summarization using adaptive clustering of paragraphs

Po Hu1, Tingting He1, Donghong Ji2, Meng Wang1•
Central China Normal University1, Institute for Infocomm Research Singapore2
14 Sep 2004
TL;DR: Preliminary experimental results show that the proposed method outperforms the conventional basic summarization method under the evaluation scheme when dealing with diverse genres of Chinese documents with free writing style and flexible topic distribution.
Abstract: Automatic summarization is an important research issue in natural language processing. This paper presents a special summarization method to generate a single-document summary with maximum topic completeness and minimum redundancy. It initially implements semantic-class-based vector representations of various kinds of linguistic units in a document by means of HowNet (an existing ontology), which can improve the representation quality of the traditional term-based vector space model to a certain degree. Then, by adopting the K-means clustering algorithm as well as a clustering analysis algorithm, we can capture the number of different latent topic regions in a document adaptively. Finally, topic representative sentences are selected from each topic region to form the final summary. In order to evaluate the effectiveness of the proposed summarization method, a novel metric known as representation entropy is used for summarization redundancy evaluation. Preliminary experimental results show that the proposed method outperforms the conventional basic summarization method under the evaluation scheme when dealing with diverse genres of Chinese documents with free writing style and flexible topic distribution.
Proceedings Article•10.3115/1220355.1220432•
Corpus and evaluation measures for multiple document summarization with multiple sources

Tsutomu Hirao, Takahiro Fukusima1, Manabu Okumura2, Chikashi Nobata, Hidetsugu Nanba3 •
Otemon Gakuin University1, Tokyo Institute of Technology2, Hiroshima City University3
23 Aug 2004
TL;DR: A large-scale test collection for multiple document summarization, the Text Summarization Challenge 3 (TSC3) corpus, which annotates not only the important sentences in a document set, but also those among them that have the same content.
Abstract: In this paper, we introduce a large-scale test collection for multiple document summarization, the Text Summarization Challenge 3 (TSC3) corpus. We detail the corpus construction and evaluation measures. The significant feature of the corpus is that it annotates not only the important sentences in a document set, but also those among them that have the same content. Moreover, we define new evaluation metrics taking redundancy into account and discuss the effectiveness of redundancy minimization.
Book Chapter•10.1007/978-3-540-30542-2_31•
News video summarization based on spatial and motion feature analysis

Wen-Nung Lie1, Chun-Ming Lai1•
National Chung Cheng University1
30 Nov 2004
TL;DR: The Lagrangian multiplier approach was employed to build optimization in allocating time-lengths for all the segmented shots and getting the best perceived motion activity of the summarized video.
Abstract: In this paper, an efficient and effective summarization algorithm based on the extraction and analysis of spatial and motion features for MPEG news video is proposed. We focus on video feature analysis techniques based on the compressed domain (i.e., MVs and DCT coefficients), without the need of transformation back to the pixel domain. To give the viewers a quick and enough browse of the news content, we adopted a new strategy that the anchor audio is overlaid with the summarized news video. Hence, the detection of anchor shots and the summarization of news segment subject to a time-budget constraint constitute the two main works in this paper. In summarization of news segments, the Lagrangian multiplier approach was employed to build optimization in allocating time-lengths for all the segmented shots and getting the best perceived motion activity of the summarized video. Experiments show that our summarized news videos present an average MOS score of above 4.0 in a subjective test.
Journal Article•10.1145/1039621.1039624•
Usefulness of temporal information automatically extracted from news articles for topic tracking

Pyung Kim1, Sung-Hyon Myaeng2•
Chungnam National University1, Information and Communications University2
01 Dec 2004-ACM Transactions on Asian Language Information Processing
TL;DR: A relatively simple NLP method for extracting temporal information from Korean news articles, with the goal of improving performance of TDT tasks and showing that time information extracted from the text does indeed help to significantly improve both precision and recall.
Abstract: Temporal information plays an important role in natural language processing (NLP) applications such as information extraction, discourse analysis, automatic summarization, and question-answering. In the topic detection and tracking (TDT) area, the temporal information often used is the publication date of a message, which is readily available but limited in its usefulness. We developed a relatively simple NLP method for extracting temporal information from Korean news articles, with the goal of improving performance of TDT tasks. To extract temporal information, we make use of finite state automata and a lexicon containing time-revealing vocabulary. Extracted information is converted into a canonicalized representation of a time point or a time duration. We first evaluated and investigated the extraction and canonicalization methods for their accuracy and the extent to which temporal information extracted as such can help TDT tasks. The experimental results show that time information extracted from the text does indeed help to significantly improve both precision and recall.
Multi-document summarization using document set type classification

Junichi Fukumoto1•
Ritsumeikan University1
1 Jan 2004
TL;DR: A summarization system is proposed which automatically classifies the type of a document set and summarizes the document set with the appropriate summarization mechanism.
Abstract: In this paper, we propose a summarization system which automatically classifies the type of a document set and summarizes the document set with its appropriate summarization mechanism. This system classifies a document set into three types: (a) one-topic type, (b) multi-topic type, and (c) others. These types are identified using information about high-frequency nouns and named entities. In our multi-document summarization system, unnecessary parts are deleted after summarizing each document, and then a multi-document summary is generated. In type (a), the unnecessary parts are the similar parts between documents summarized by single-document summarization. In type (b), the unnecessary parts are the dissimilar parts in the documents. In type (c), unnecessary parts are identified by the scores used for single-document summarization.
Summarization Experiments in DUC 2004

Kenneth C. Litkowski
1 Jan 2004
TL;DR: CL Research's participation in the Document Understanding Conference for 2004 was primarily intended to conduct further experiments in the use of XML-tagged documents containing increasingly richer characterizations of texts, and the Knowledge Management System was extended to include a refined capability for identifying multiword units for use in keyword generation.
Abstract: CL Research's participation in the Document Understanding Conference for 2004 was primarily intended to conduct further experiments in the use of XML-tagged documents containing increasingly richer characterizations of texts. We extended the Knowledge Management System to include (1) a refined capability for identifying multiword units (phrases) for use in keyword generation, (2) the incorporation of word-sense disambiguation to tag senses and identify semantic types, and (3) the integration of question-answering functionality into the summarization framework. We did not devote much effort in refining our system to create summaries for the five tasks, but achieved reasonable levels of performance. We viewed the length restrictions imposed on the tasks as not providing sufficient flexibility to investigate different modes of summarization. We viewed the tasks of summarizing machine translations of poor quality as not very interesting. We used Tasks 1 and 3 to develop and refine a keyword generation capability, achieving levels of fourth of 18 and fourth of 10 priority 1 systems. In the more general summarization tasks, our performance was near the bottom of participating systems, but still achieved acceptable levels of performance. We performed much better on quality measures with our extraction-based summaries, with an overall level of third of 14 systems for Task 5. For several quality measures, our performance was somewhat less; these levels identify specifically those areas of summarization analysis where the use of an XML representation is particularly amenable to improvement. While we will continue to improve our summarization capability within the general guidelines, we believe that summarization is only one part of document understanding and may not represent needs of users for document exploration at a much deeper level.
Extending Document Summarization to Information Graphics

Sandra Carberry, Stephanie Elzer, Nancy L. Green, Kathleen F. McCoy, Daniel L. Chester 
1 Jul 2004
TL;DR: It is argued that the message that the graphic designer intended to convey must play a major role in determining the content of the summary, and the approach to identifying this intended message and using it to construct the summary is outlined.
Abstract: Information graphics (non-pictorial graphics such as bar charts or line graphs) are an important component of multimedia documents. Often such graphics convey information that is not contained elsewhere in the document. Thus document summarization must be extended to include summarization of information graphics. This paper addresses our work on graphic summarization. It argues that the message that the graphic designer intended to convey must play a major role in determining the content of the summary, and it outlines our approach to identifying this intended message and using it to construct the summary.
Book Chapter•10.1007/978-3-540-30480-7_31•
Temporal Web Page Summarization

Adam Jatowt1, Mitsuru Ishizuka1•
University of Tokyo1
22 Nov 2004
TL;DR: A new method for temporal web page summarization based on trend and variance analysis is presented, which can be also used for summarization of dynamic collections of topically related web pages.
Abstract: In the recent years the Web has become an important medium for communication and information storage. As this trend is predicted to continue, it is necessary to provide efficient solutions for retrieving and processing information found in WWW. In this paper we present a new method for temporal web page summarization based on trend and variance analysis. In the temporal summarization web documents are treated as dynamic objects that have changing contents and characteristics. The sequential versions of a single web page are retrieved during predefined time interval for which the summary is to be constructed. The resulting summary should represent the most popular, evolving concepts which are found in web document versions. The proposed method can be also used for summarization of dynamic collections of topically related web pages.
Handling Figures in Document Summarization

Robert P. Futrelle1•
Northeastern University1
1 Jul 2004
TL;DR: The focus is on diagrams (line drawings) because they allow parsing techniques to be used, in contrast to the difficulties of general image understanding, and the advances in raster image vectorization and parsing needed to produce corpora for diagram summarization.
Abstract: Some document genres contain a large number of figures. This position paper outlines approaches to diagram summarization that can augment the many well-developed techniques of text summarization. We discuss figures as surrogates for entire documents, thumbnails, extraction, the relations between text and figures as well as how automation might be achieved. The focus is on diagrams (line drawings) because they allow parsing techniques to be used, in contrast to the difficulties of general image understanding. We describe the advances in raster image vectorization and parsing needed to produce corpora for diagram summarization.
Proceedings Article•10.1109/ITCC.2004.1286634•
Information-content based sentence extraction for text summarization

D. Mallett, J. Elding, Mario A. Nascimento1•
University of Alberta1
5 Apr 2004
TL;DR: The FULL-COVERAGE summarizer is proposed: an efficient, information retrieval oriented method to extract nonredundant sentences from text for summarization purposes that can produce sentence-based summaries that are up to 78% smaller than the original text with only 3% loss in retrieval performance.
Abstract: We propose the FULL-COVERAGE summarizer: an efficient, information retrieval oriented method to extract nonredundant sentences from text for summarization purposes. Our method leverages existing information retrieval technology by extracting key-sentences on the premise that the relevance of a sentence is proportional to its similarity to the whole document. We show that our method can produce sentence-based summaries that are up to 78% smaller than the original text with only 3% loss in retrieval performance.
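
The premise stated in the abstract above, that a sentence's relevance is proportional to its similarity to the whole document, can be illustrated with the sketch below. It is not the FULL-COVERAGE algorithm: the function names are hypothetical, similarity is plain cosine over raw term counts, and the non-redundancy step the abstract mentions is left out.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_by_document_similarity(sentences):
    """Return sentence indices ordered from most to least document-like."""
    vecs = [Counter(s.lower().split()) for s in sentences]
    doc = Counter()
    for vec in vecs:
        doc.update(vec)  # the document vector is the sum of its sentence vectors
    scored = [(cosine(vec, doc), i) for i, vec in enumerate(vecs)]
    # Sort by descending similarity; ties keep original document order.
    return [i for _, i in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]


order = rank_by_document_similarity([
    "solar power cuts energy costs",
    "solar power is clean energy",
    "lunch was late",
])
```

Taking the first few indices of `order` gives an extract; the off-topic third sentence ranks last because it shares no vocabulary with the rest of the text.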
Proceedings Article•10.1109/WI.2004.110•
Ontology-Based Structured Cosine Similarity in Speech Document Summarization

Soe-Tsyr Yuan1, Jerry Sun2•
National Chengchi University1, Fu Jen Catholic University2
20 Sep 2004
TL;DR: A novel method named Structured Cosine Similarity is presented that furnishes document clustering with a new way of modeling on document summarization, considering the structure of terms in documents in order to improve the quality of speech document clusters.
Abstract: Development of algorithms for automated text categorization in massive text document sets is an important research area of data mining and knowledge discovery. Most of the text-clustering methods were grounded in the term-based measurement of distance or similarity, ignoring the structure of terms in documents. In this paper we present a novel method named Structured Cosine Similarity that furnishes document clustering with a new way of modeling on document summarization, considering the structure of terms in documents in order to improve the quality of speech document clustering.
Multi-Document Summarization Using Cross-Language Texts

Jung-Min Lim, In-Su Kang, Jong-Hyeok Lee1•
Pohang University of Science and Technology1
1 Jan 2004
TL;DR: This work tries to generate a summary in the source language, using documents translated by a machine translator and a summarization system in the target language, and shows the possibility of multi-document summarization using cross-language texts.
Abstract: Without a summarization system in the source language, we try to generate a summary in the source language, using documents translated by a machine translator and a summarization system in the target language. To summarize multiple documents translated by a machine translator, we extract important sentences and remove redundant sentences using an improved term-weighting method, which assigns weights to words using syntactic information. According to the score of each extracted sentence, we choose sentences and map them to the Japanese sentences in the original documents. Finally, we arrange the Japanese sentences in chronological order and report them as the result of our system. We submitted both a short and a long type of summary, and the evaluation of our results is not good. However, our approach shows the possibility of multi-document summarization using cross-language texts.
Query-based Multidocument Summarization for Information Retrieval

Toshihiko Sakurai1, Akira Utsumi1•
University of Electro-Communications1
1 Jan 2004
TL;DR: This paper presents a genre-independent method of generating a single summary from a set of the retrieved documents for information retrieval, which generates the core part of the summary from the most relevant document to a query.
Abstract: This paper presents a genre-independent method of generating a single summary from a set of the retrieved documents for information retrieval. The proposed method generates the core part of the summary from the most relevant document to a query, and then the additional part of the summary, which elaborates on the topics, from the other documents. In order to evaluate the validity of the proposed method, we participated in TSC (Text Summarization Challenge) in the 4th NTCIR Workshop. The performance was not satisfactory for the specific task, but we believe that our method would be useful for a set of documents including various genres, such as one retrieved by Web search engines.
Proceedings Article•10.3115/1220355.1220418•
Multi-answer-focused multi-document summarization using a question-answering engine

Tatsunori Mori1, Masanori Nozawa1, Yoshiaki Asada1•
Yokohama National University1
23 Aug 2004
TL;DR: A method to calculate sentence importance using scores produced by a Question-Answering engine in response to multiple questions and an integration of it into a generic multi-document summarization system is described.
Abstract: In recent years, answer-focused summarization has attracted attention as a technology complementary to information retrieval and question answering. In order to realize multi-document summarization focused by multiple questions, we propose a method to calculate sentence importance using scores produced by a Question-Answering engine in response to multiple questions. We also describe an integration of it into a generic multi-document summarization system. The evaluation results show that the proposed method performs better than not only several baselines but also other participants' systems in the evaluation workshop NTCIR4 TSC3 Formal Run, although it should be noted that some of the other systems do not use the information of questions.
