Evaluation of Text Generation: A Survey

Open Access • Posted Content

- 26 Jun 2020

371

TL;DR: This paper surveys evaluation methods of natural language generation (NLG) systems that have been developed in the last few years, with a focus on the evaluation of recently proposed NLG tasks and neural NLG models.

Citations

Journal Article•10.1109/ACCESS.2022.3219448

K-NCT: Korean Neural Grammatical Error Correction Gold-Standard Test Set Using Novel Error Type Classification Criteria

Seonmin Koo, +6 more

- IEEE Access

TL;DR: This paper proposed a gold-standard test set called the Korean Neural Grammatical Correction Test set (K-NCT) for Korean grammatical error correction using a new error type classification guideline.

4

Posted Content

Multitask Learning for Class-Imbalanced Discourse Classification

Alexander Spangher, +3 more

- 02 Jan 2021

- arXiv: Computation and Language

TL;DR: This article performs an extensive analysis of sentence-level classification approaches for the News Discourse dataset, one of the largest recently published high-level semantic discourse datasets, and shows that a multitask approach can improve Micro F1 by 7% over current state-of-the-art benchmarks, due in part to label corrections across tasks that improve performance on underrepresented classes.

4

Journal Article•10.1093/bib/bbae142

GLDM: hit molecule generation with constrained graph latent diffusion model

Conghao Wang, +3 more

- 27 Mar 2024

- Briefings in Bioinformatics

TL;DR: This paper proposes the Graph Latent Diffusion Model (GLDM), a latent diffusion model (DM) that combines the effectiveness of autoencoders at compressing complex chemical data with the DM's ability to generate novel molecules.

3

Journal Article•10.48550/arXiv.2305.14625

KNN-LM Does Not Improve Open-ended Text Generation

Shufan Wang, +5 more

- 24 May 2023

- arXiv.org

TL;DR: This paper studies the generation quality of interpolation-based retrieval-augmented language models (LMs) and finds that interpolating with a retrieval distribution actually increases perplexity compared to a baseline Transformer LM for the majority of tokens in the WikiText-103 test set.

3

Journal Article•10.18653/v1/2023.findings-emnlp.606

DeTiME: Diffusion-Enhanced Topic Modeling using Encoder-decoder based LLM

Weijie Xu, +3 more

- 01 Jan 2023

TL;DR: DeTiME is a novel framework for Diffusion-Enhanced Topic Modeling using an encoder-decoder-based LLM; it generates highly clusterable embeddings and enhances topic generation capabilities.

3

...

References

Journal Article•10.1162/NECO.1997.9.8.1735

Long short-term memory

Sepp Hochreiter, +1 more

- 01 Nov 1997

- Neural Computation

TL;DR: A novel, efficient, gradient-based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.

99K

Proceedings Article

Attention is All you Need

Ashish Vaswani, +7 more

- 12 Jun 2017

TL;DR: This paper proposed a simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely and achieved state-of-the-art performance on English-to-French translation.

94.2K

Proceedings Article•10.3115/V1/D14-1162

GloVe: Global Vectors for Word Representation

Jeffrey Pennington, +2 more

- 01 Oct 2014

TL;DR: This paper proposes a new global log-bilinear regression model that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.

41.6K

Journal Article•10.1177/001316446002000104

A Coefficient of Agreement for Nominal Scales

Jacob Cohen

- 01 Apr 1960

- Educational and Psychological Measuremen...

TL;DR: In this article, the author presents a procedure for determining the degree and significance of agreement when two or more judges independently categorize a sample of units on a nominal scale, that is, the extent to which such judgments are reproducible (reliable).

41.1K

Proceedings Article•10.3115/1073083.1073135

Bleu: a Method for Automatic Evaluation of Machine Translation

Kishore Papineni, +3 more

- 06 Jul 2002

TL;DR: This paper proposed a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.

28.9K

...

Related Papers (5)

Bleu: a Method for Automatic Evaluation of Machine Translation

ROUGE: A Package for Automatic Evaluation of Summaries

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments

RoBERTa: A Robustly Optimized BERT Pretraining Approach