Unsupervised Multi-document Summarization with Holistic Inference

doi:10.48550/arxiv.2309.04087

Journal Article•10.48550/arxiv.2309.04087

Haopeng Zhang, +6 more

- 08 Sep 2023

- arXiv.org

- Vol. abs/2309.04087

2

TL;DR: This paper proposes a holistic framework for unsupervised multi-document extractive summarization that pairs a holistic beam search inference method with a holistic measure of summary quality, the Subset Representative Index (SRI).

Abstract: Multi-document summarization aims to obtain core information from a collection of documents written on the same topic. This paper proposes a new holistic framework for unsupervised multi-document extractive summarization. Our method incorporates the holistic beam search inference method associated with the holistic measurements, named Subset Representative Index (SRI). SRI balances the importance and diversity of a subset of sentences from the source documents and can be calculated in unsupervised and adaptive manners. To demonstrate the effectiveness of our method, we conduct extensive experiments on both small and large-scale multi-document summarization datasets under both unsupervised and adaptive settings. The proposed method outperforms strong baselines by a significant margin, as indicated by the resulting ROUGE scores and diversity measures. Our findings also suggest that diversity is essential for improving multi-document summary performance.
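The abstract describes scoring a whole subset of sentences for importance and diversity and searching over subsets with beam search, but it does not give the SRI formula. The sketch below only illustrates that general idea. The scoring function is a hypothetical stand-in (mean cosine similarity to the document centroid, minus a weighted mean pairwise similarity inside the subset), not the paper's SRI, and the names `subset_score`, `holistic_beam_search`, `lam`, and `beam_size` are all assumptions made for this example.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def subset_score(subset, vectors, centroid, lam=0.5):
    """Hypothetical stand-in for SRI (NOT the paper's formula).

    Importance: mean similarity of the chosen sentences to the centroid.
    Redundancy: mean pairwise similarity among the chosen sentences.
    """
    importance = sum(cosine(vectors[i], centroid) for i in subset) / len(subset)
    pairs = list(combinations(subset, 2))
    redundancy = (
        sum(cosine(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)
        if pairs
        else 0.0
    )
    return importance - lam * redundancy


def holistic_beam_search(vectors, k, beam_size=3, lam=0.5):
    """Pick up to k sentence indices by beam search over whole subsets."""
    n, dim = len(vectors), len(vectors[0])
    centroid = [sum(v[d] for v in vectors) / n for d in range(dim)]
    beam = [()]
    for _ in range(min(k, n)):
        # Extend every subset in the beam by one unused sentence.
        candidates = set()
        for subset in beam:
            for i in range(n):
                if i not in subset:
                    candidates.add(tuple(sorted(subset + (i,))))
        # Score each candidate subset as a whole and keep the best few.
        ranked = sorted(
            candidates,
            key=lambda s: (-subset_score(s, vectors, centroid, lam), s),
        )
        beam = ranked[:beam_size]
    return list(beam[0])


if __name__ == "__main__":
    # Toy sentence embeddings: sentences 0 and 1 are near-duplicates.
    toy_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    print(holistic_beam_search(toy_vectors, k=2, lam=0.5))
```

In this toy example, the redundancy penalty leads the search to select sentences 0 and 2 rather than the near-duplicate pair 0 and 1. Setting `lam=0.0` removes the penalty and the near-duplicate pair wins, which is the kind of trade-off between importance and diversity that the abstract attributes to SRI.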

Citations

Preprint•10.48550/arxiv.2406.11289

A Systematic Survey of Text Summarization: From Statistical Methods to Large Language Models

Haopeng Zhang, +2 more

- 17 Jun 2024

TL;DR: A comprehensive survey of text summarization research covering traditional methods, deep learning approaches, PLM fine-tuning, and recent advancements in LLMs. It provides an overview of datasets, evaluation metrics, summarization methods, and future research directions.

15

Journal Article•10.1109/icaccs60874.2024.10717223

Multi-Document Summarization Using LLAMA 2 Model with Transfer Learning

K. N. Sunilkumar, +1 more

- 14 Mar 2024

TL;DR: This research applies the LLaMA 2 model with transfer learning to multi-document summarization, condensing complex narratives into concise summaries, and reports superior performance.

References

•Posted Content

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, +3 more

- 11 Oct 2018

- arXiv: Computation and Language

TL;DR: A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

81.7K

•Proceedings Article

The PageRank Citation Ranking: Bringing Order to the Web

Lawrence Page, +3 more

- 11 Nov 1999

TL;DR: This paper describes PageRank, a method for rating Web pages objectively and mechanically, effectively measuring the human interest and attention devoted to them, and shows how to efficiently compute PageRank for large numbers of pages.

16.4K

•Proceedings Article

ROUGE: A Package for Automatic Evaluation of Summaries

Chin-Yew Lin

- 25 Jul 2004

TL;DR: Four different ROUGE measures included in the ROUGE summarization evaluation package are introduced, namely ROUGE-N, ROUGE-L, ROUGE-W, and ROUGE-S, along with their evaluations.

14.8K

•Proceedings Article•10.18653/V1/D19-1410

Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

Nils Reimers, +1 more

- 14 Aug 2019

TL;DR: This paper presents Sentence-BERT (SBERT), a modification of the pretrained BERT network that uses siamese and triplet network structures to derive semantically meaningful sentence embeddings that can be compared using cosine similarity.

12K

•Proceedings Article•10.18653/V1/2020.ACL-MAIN.703

BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension

Michael Lewis, +7 more

- 01 Jul 2020

TL;DR: BART is presented, a denoising autoencoder for pretraining sequence-to-sequence models, which matches the performance of RoBERTa on GLUE and SQuAD, and achieves new state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks.

11.5K

...