Query-oriented text summarization based on hypergraph transversals

doi:10.1016/J.IPM.2019.03.003

Open Access · Journal Article

H. Van Lierde, +1 more

- 01 Jul 2019

- Information Processing and Management

- Vol. 56, Issue 4, pp. 1317-1338

61

TL;DR: A new method based on the theory of hypergraph transversals is proposed to select non-redundant sentences that jointly cover the main query-relevant topics of a corpus. The method is also cheaper than existing hypergraph-based summarizers in terms of computational time complexity.

Abstract: The rise in the amount of textual resources available on the Internet has created a need for automatic document summarization tools. The main challenges of query-oriented extractive summarization are (1) to identify the topics of the documents and (2) to recover query-relevant sentences of the documents that together cover these topics. Existing graph- or hypergraph-based summarizers use graph-based ranking algorithms to produce individual relevance scores for the sentences. Hence, these systems fail to measure the topics jointly covered by the sentences forming the summary, which tends to produce redundant summaries. To address the issue of selecting non-redundant sentences that jointly cover the main query-relevant topics of a corpus, we propose a new method using the theory of hypergraph transversals. First, we introduce a new topic model based on the semantic clustering of terms in order to discover the topics present in a corpus. Second, these topics are modeled as the hyperedges of a hypergraph in which the nodes are the sentences. A summary is then produced by generating a transversal of nodes in the hypergraph. Algorithms based on the theory of submodular functions are proposed to generate the transversals and to build the summaries. The proposed summarizer outperforms existing graph- or hypergraph-based summarizers by at least 6% in ROUGE-SU4 F-measure on the DUC 2007 dataset. Moreover, it is cheaper than existing hypergraph-based summarizers in terms of computational time complexity.
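The abstract does not spell out the transversal-generation algorithms. As a rough illustration only, and not the authors' method, the sketch below assumes each topic (hyperedge) is a set of sentence indices carrying a hypothetical query-relevance weight, and that the summary must fit a length budget. It uses the classic cost-benefit greedy for budgeted maximum coverage: the total weight of topics hit by the chosen sentences is a monotone submodular function, and the greedy repeatedly adds the sentence with the best gain-per-length ratio. The names `covered_weight` and `greedy_budgeted_transversal`, the weights, and the budget are all hypothetical.

```python
from typing import Dict, List, Optional, Set


def covered_weight(selected: Set[int], hyperedges: List[Set[int]], weights: List[float]) -> float:
    """Total weight of hyperedges (topics) hit by at least one selected node (sentence).

    This coverage function is monotone submodular in `selected`."""
    return sum(w for edge, w in zip(hyperedges, weights) if edge & selected)


def greedy_budgeted_transversal(
    hyperedges: List[Set[int]],
    weights: List[float],
    lengths: Dict[int, int],  # hypothetical sentence lengths, assumed strictly positive
    budget: int,
) -> Set[int]:
    """Cost-benefit greedy for budgeted weighted coverage (an illustrative sketch,
    not the paper's algorithm)."""
    selected: Set[int] = set()
    used = 0
    candidates = set(lengths)
    while candidates:
        base = covered_weight(selected, hyperedges, weights)
        best: Optional[int] = None
        best_ratio = 0.0
        for v in candidates:
            if used + lengths[v] > budget:
                continue  # sentence would overflow the length budget
            gain = covered_weight(selected | {v}, hyperedges, weights) - base
            ratio = gain / lengths[v]
            if ratio > best_ratio:
                best, best_ratio = v, ratio
        if best is None:  # nothing fits, or nothing adds new topic coverage
            break
        selected.add(best)
        used += lengths[best]
        candidates.remove(best)
    # Compare with the best single feasible sentence, as in the standard budgeted-coverage greedy.
    feasible = [v for v in lengths if lengths[v] <= budget]
    if feasible:
        single = max(feasible, key=lambda v: covered_weight({v}, hyperedges, weights))
        if covered_weight({single}, hyperedges, weights) > covered_weight(selected, hyperedges, weights):
            return {single}
    return selected
```

For example, with topics `[{0, 1}, {1, 2}, {3}]`, weights `[1.0, 1.0, 0.5]`, sentence lengths `{0: 10, 1: 12, 2: 8, 3: 5}` and a budget of 20, the sketch returns `{1, 3}`, which hits all three topics. Because the budget can stop the greedy early, the result is not guaranteed to be a full transversal in general.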

Figures

Figure 2: Example of a hypergraph and a minimal hypergraph transversal.

Table 2: Comparison with related graph- and hypergraph-based summarization systems.

Table 3: Comparison with DUC 2005, DUC 2006 and DUC 2007 systems.

Figure 3: ROUGE-2 and ROUGE-SU4 as a function of δ for λ = 0.4 and µ = 1.98.

Figure 4: ROUGE-2 and ROUGE-SU4 as a function of λ for δ = 0.85 and µ = 1.98.
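Figure 2 refers to a minimal hypergraph transversal. Under the standard definition, a transversal is a set of nodes that intersects every hyperedge, and it is minimal when no proper subset is also a transversal. With sentences as nodes and topics as hyperedges, as in the abstract, the brute-force sketch below checks and enumerates minimal transversals. The function names are hypothetical, and the enumeration is exponential, so it is meant only for toy examples like the one in the figure.

```python
from itertools import combinations
from typing import Iterable, List, Set


def is_transversal(nodes: Set[int], hyperedges: Iterable[Set[int]]) -> bool:
    """True if `nodes` intersects (hits) every hyperedge."""
    return all(edge & nodes for edge in hyperedges)


def is_minimal_transversal(nodes: Set[int], hyperedges: List[Set[int]]) -> bool:
    """True if `nodes` is a transversal and no proper subset is one.

    Any superset of a transversal is also a transversal, so it suffices to
    check that removing any single node breaks the transversal property."""
    if not is_transversal(nodes, hyperedges):
        return False
    return all(not is_transversal(nodes - {v}, hyperedges) for v in nodes)


def minimal_transversals(vertices: Set[int], hyperedges: List[Set[int]]) -> List[Set[int]]:
    """Enumerate all minimal transversals by brute force (exponential in |vertices|)."""
    found: List[Set[int]] = []
    for k in range(len(vertices) + 1):
        for combo in combinations(sorted(vertices), k):
            candidate = set(combo)
            if is_minimal_transversal(candidate, hyperedges):
                found.append(candidate)
    return found
```

For hyperedges `[{1, 2}, {2, 3}, {3, 4}]` over nodes `{1, 2, 3, 4}`, `minimal_transversals` returns `[{1, 3}, {2, 3}, {2, 4}]`. The set `{1, 2, 3}` is a transversal but not a minimal one, since dropping node 1 still leaves every hyperedge hit.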

Citations

Journal Article · doi:10.1109/TASLP.2020.3037401

A Divide-and-Conquer Approach to the Summarization of Long Documents

Alexios Gidiotis, +1 more

- 13 Apr 2020

- IEEE Transactions on Audio, Speech, and ...

TL;DR: The authors proposed a divide-and-conquer method for the neural summarization of long documents, which exploits the discourse structure of the document and uses sentence similarity to split the problem into an ensemble of smaller summarization problems.

96

Journal Article · doi:10.1016/J.EIJ.2019.12.002

Extractive multi-document text summarization based on graph independent sets

Taner Uçkan, +1 more

- 01 Sep 2020

- Egyptian Informatics Journal

TL;DR: The Maximum Independent Set, which has not been used previously in any summarization study, has been utilized within the context of this study and a text processing tool is suggested in order to preserve the semantic cohesion between sentences in the representation stage of introductory texts.

85

Posted Content

A Divide-and-Conquer Approach to the Summarization of Long Documents

Alexios Gidiotis, +1 more

- 13 Apr 2020

- arXiv: Computation and Language

TL;DR: This work exploits the discourse structure of the document and uses sentence similarity to split the problem into an ensemble of smaller summarization problems, which can decompose the problem of long document summarization into smaller and simpler problems, reducing computational complexity and creating more training examples.

77

Journal Article · doi:10.1016/J.IPM.2020.102264

EdgeSumm: Graph-based framework for automatic text summarization

Wafaa S. El-Kassas, +4 more

- 01 Nov 2020

- Information Processing and Management

TL;DR: EdgeSumm is a novel extractive graph-based framework that relies on four proposed algorithms to enhance automatic text summarization (ATS) of single documents. It applies to any document genre and is unsupervised, so it requires no training data.

73

Journal Article · doi:10.1007/s42979-022-01446-w

Automatic Text Summarization Methods: A Comprehensive Review

Divakar Yadav, +2 more

- 28 Oct 2022

- SN Computer Science

TL;DR: This paper provides a detailed state-of-the-art analysis of text summarization concepts, such as summarization approaches, the techniques used, standard datasets, evaluation metrics and future research directions.

59

...
