Query-oriented text summarization based on hypergraph transversals
H. Van Lierde, Tommy W. S. Chow, et al.
TL;DR: A new method based on the theory of hypergraph transversals is proposed to select non-redundant sentences that jointly cover the main query-relevant topics of a corpus; it is also cheaper than existing hypergraph-based summarizers in terms of computational time complexity.
Abstract: The growth in the amount of textual resources available on the Internet has created the need for tools for automatic document summarization. The main challenges of query-oriented extractive summarization are (1) to identify the topics of the documents and (2) to recover query-relevant sentences of the documents that together cover these topics. Existing graph- or hypergraph-based summarizers use graph-based ranking algorithms to produce individual relevance scores for the sentences. Because each sentence is scored on its own, these systems cannot measure the topics jointly covered by the sentences forming the summary, which tends to produce redundant summaries. To select non-redundant sentences that jointly cover the main query-relevant topics of a corpus, we propose a new method based on the theory of hypergraph transversals. First, we introduce a new topic model based on the semantic clustering of terms in order to discover the topics present in a corpus. Second, these topics are modeled as the hyperedges of a hypergraph in which the nodes are the sentences. A summary is then produced by generating a transversal of nodes in the hypergraph. Algorithms based on the theory of submodular functions are proposed to generate the transversals and to build the summaries. The proposed summarizer outperforms existing graph- and hypergraph-based summarizers by at least 6% in ROUGE-SU4 F-measure on the DUC 2007 dataset. It is also cheaper than existing hypergraph-based summarizers in terms of computational time complexity.
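The selection step described in the abstract can be illustrated with a minimal sketch. It assumes the topics (hyperedges) are already given as sets of sentence indices, each with a hypothetical query-relevance weight, and that the summary must fit a length budget. The objective "total weight of the topics hit by the chosen sentences" is a weighted coverage function, which is monotone submodular, so the sketch uses the standard cost-scaled greedy rule for such functions. This is not the authors' algorithm; the function name `greedy_transversal` and its parameters are hypothetical.

```python
from typing import Dict, List, Optional, Set


def greedy_transversal(
    hyperedges: Dict[str, Set[int]],
    topic_weight: Dict[str, float],
    sentence_len: List[int],
    budget: int,
) -> List[int]:
    """Greedily pick sentences (nodes) that hit heavily weighted topics (hyperedges).

    f(S) = sum of weights of hyperedges intersected by S is monotone submodular;
    each step picks the affordable sentence with the best marginal gain per unit
    of length. Assumes positive sentence lengths and weights (hypothetical inputs).
    """
    selected: List[int] = []
    covered: Set[str] = set()
    used = 0
    candidates = set(range(len(sentence_len)))
    while candidates:
        best: Optional[int] = None
        best_ratio = 0.0
        for s in candidates:
            if used + sentence_len[s] > budget:
                continue  # adding s would exceed the summary length budget
            # Marginal gain: weight of topics that s hits and that are not yet covered.
            gain = sum(
                topic_weight[t]
                for t, nodes in hyperedges.items()
                if t not in covered and s in nodes
            )
            ratio = gain / sentence_len[s]
            if ratio > best_ratio:
                best, best_ratio = s, ratio
        if best is None:
            break  # no affordable sentence adds new coverage
        selected.append(best)
        used += sentence_len[best]
        covered |= {t for t, nodes in hyperedges.items() if best in nodes}
        candidates.discard(best)
    return selected


# Toy hypergraph: three topics over four sentences (made-up data).
hyperedges = {"t1": {0, 1}, "t2": {1, 2}, "t3": {3}}
topic_weight = {"t1": 1.0, "t2": 0.8, "t3": 0.5}
print(greedy_transversal(hyperedges, topic_weight, [10, 12, 8, 5], budget=20))
```

With a generous budget and positive weights, the loop stops only once every hyperedge is hit, so the result is a full transversal; a tight budget yields a partial one.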
Figures

Figure 1: Algorithm chart.
Figure 2: Example of a hypergraph and a minimal hypergraph transversal.
Table 2: Comparison with related graph- and hypergraph-based summarization systems.
Table 3: Comparison with DUC 2005, DUC 2006 and DUC 2007 systems.
Figure 3: ROUGE-2 and ROUGE-SU4 as a function of δ for λ = 0.4 and µ = 1.98.
Figure 4: ROUGE-2 and ROUGE-SU4 as a function of λ for δ = 0.85 and µ = 1.98.
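The notion illustrated in Figure 2 can be made concrete with a small sketch. A transversal is a set of nodes that intersects every hyperedge, and it is minimal when no proper subset is still a transversal. The hypergraph below is a made-up example, not the one shown in the figure, and the function names are hypothetical.

```python
from typing import Iterable, List, Set


def is_transversal(nodes: Set[int], hyperedges: Iterable[Set[int]]) -> bool:
    """True if `nodes` intersects every hyperedge."""
    return all(nodes & e for e in hyperedges)


def is_minimal_transversal(nodes: Set[int], hyperedges: List[Set[int]]) -> bool:
    """A transversal is minimal if removing any single node breaks it.

    Checking single-node removals suffices: any superset of a transversal is
    a transversal, so every proper subset lies inside some nodes - {v}.
    """
    if not is_transversal(nodes, hyperedges):
        return False
    return all(not is_transversal(nodes - {v}, hyperedges) for v in nodes)


def prune_to_minimal(nodes: Set[int], hyperedges: List[Set[int]]) -> Set[int]:
    """Drop redundant nodes one at a time; a single pass yields a minimal transversal."""
    result = set(nodes)
    for v in sorted(nodes):
        if is_transversal(result - {v}, hyperedges):
            result.discard(v)
    return result


# Made-up hypergraph with nodes 1..4 and three hyperedges.
edges = [{1, 2}, {2, 3}, {4}]
print(prune_to_minimal({1, 2, 3, 4}, edges))
```

Here both {2, 4} and {1, 3, 4} are minimal transversals, so "minimal" (no removable node) does not mean "smallest".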
Citations
A Divide-and-Conquer Approach to the Summarization of Long Documents
TL;DR: The authors proposed a divide-and-conquer method for the neural summarization of long documents, which exploits the discourse structure of the document and uses sentence similarity to split the problem into an ensemble of smaller summarization problems.
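The splitting idea in this TL;DR can be sketched as follows, assuming each sentence of a reference summary is paired with the document section it most resembles, so that every section with its assigned sentences forms a smaller summarization problem. Plain word overlap stands in for whatever similarity measure the authors actually use, and the functions `overlap` and `split_summary_by_section` are hypothetical.

```python
from typing import Dict, List


def overlap(a: str, b: str) -> float:
    """Fraction of the words of `a` that also occur in `b` (hypothetical similarity proxy)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa) if wa else 0.0


def split_summary_by_section(
    sections: Dict[str, str], summary_sentences: List[str]
) -> Dict[str, List[str]]:
    """Assign each summary sentence to its most similar section.

    Each (section, assigned sentences) pair becomes one smaller training
    example; sections that receive no sentence are left out.
    """
    targets: Dict[str, List[str]] = {}
    for sent in summary_sentences:
        best = max(sections, key=lambda name: overlap(sent, sections[name]))
        targets.setdefault(best, []).append(sent)
    return targets


# Made-up document sections and reference summary.
sections = {
    "intro": "we study long document summarization",
    "method": "we split documents into sections and summarize each",
    "results": "rouge scores improve on arxiv and pubmed",
}
summary = ["we split each document into sections", "rouge scores improve"]
print(split_summary_by_section(sections, summary))
```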
Extractive multi-document text summarization based on graph independent sets
Taner Uçkan, Ali Karci, et al.
TL;DR: The maximum independent set, not previously used in any summarization study, is applied in this study, and a text processing tool is proposed to preserve the semantic cohesion between sentences in the representation stage of introductory texts.
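As a rough illustration of the maximum-independent-set idea above, the sketch assumes (hypothetically) that two sentences are linked when their word-level Jaccard similarity reaches a threshold, so an independent set contains mutually dissimilar, non-redundant sentences. Because finding a maximum independent set is NP-hard, it uses a min-degree greedy heuristic that returns a maximal, not necessarily maximum, set. The cited study's graph construction and solver may differ, and all names here are hypothetical.

```python
from typing import Dict, List, Set


def similarity_graph(sentences: List[str], threshold: float) -> Dict[int, Set[int]]:
    """Link sentence pairs whose word-level Jaccard similarity is >= threshold."""
    words = [set(s.lower().split()) for s in sentences]
    adj: Dict[int, Set[int]] = {i: set() for i in range(len(sentences))}
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            union = words[i] | words[j]
            jaccard = len(words[i] & words[j]) / len(union) if union else 0.0
            if jaccard >= threshold:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def greedy_independent_set(adj: Dict[int, Set[int]]) -> List[int]:
    """Min-degree greedy heuristic for a maximal independent set.

    Repeatedly takes the remaining vertex with the fewest remaining neighbours
    (ties broken by smallest index), then deletes it and its neighbours.
    """
    remaining = {v: set(nbrs) for v, nbrs in adj.items()}
    chosen: List[int] = []
    while remaining:
        v = min(remaining, key=lambda u: (len(remaining[u]), u))
        chosen.append(v)
        removed = remaining[v] | {v}
        for u in removed:
            remaining.pop(u, None)
        for nbrs in remaining.values():
            nbrs -= removed  # drop edges to deleted vertices
    return chosen


sentences = ["the cat sat", "the cat sat down", "dogs bark loudly", "a dog barks"]
adj = similarity_graph(sentences, threshold=0.5)
print(greedy_independent_set(adj))
```

Only one of the two near-duplicate "cat" sentences can enter the set, which is how independence discourages redundancy.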
Posted Content
A Divide-and-Conquer Approach to the Summarization of Long Documents
TL;DR: This work exploits the discourse structure of the document and uses sentence similarity to split the problem into an ensemble of smaller summarization problems, which can decompose the problem of long document summarization into smaller and simpler problems, reducing computational complexity and creating more training examples.
EdgeSumm: Graph-based framework for automatic text summarization
TL;DR: A novel extractive graph-based framework, "EdgeSumm", relies on four proposed algorithms to enhance automatic text summarization (ATS) of single documents; it applies to any document genre and is unsupervised, so it requires no training data.
Automatic Text Summarization Methods: A Comprehensive Review
TL;DR: A detailed state-of-the-art analysis of text summarization concepts, such as summarization approaches, techniques used, standard datasets, evaluation metrics and future research directions, is provided in this paper.