Open Access • Posted Content
Evaluation of Text Generation: A Survey
TL;DR: This paper surveys evaluation methods of natural language generation (NLG) systems that have been developed in the last few years, with a focus on the evaluation of recently proposed NLG tasks and neural NLG models.
Abstract: The paper surveys evaluation methods of natural language generation (NLG) systems that have been developed in the last few years. We group NLG evaluation methods into three categories: (1) human-centric evaluation metrics, (2) automatic metrics that require no training, and (3) machine-learned metrics. For each category, we discuss the progress that has been made and the challenges still being faced, with a focus on the evaluation of recently proposed NLG tasks and neural NLG models. We then present two examples for task-specific NLG evaluations for automatic text summarization and long text generation, and conclude the paper by proposing future research directions.
Citations
K-NCT: Korean Neural Grammatical Error Correction Gold-Standard Test Set Using Novel Error Type Classification Criteria
TL;DR: This paper proposed a gold-standard test set called the Korean Neural Grammatical Correction Test set (K-NCT) for Korean grammatical error correction using a new error type classification guideline.
4
•Posted Content
Multitask Learning for Class-Imbalanced Discourse Classification.
TL;DR: This article performed an extensive analysis of sentence-level classification approaches for the News Discourse dataset, one of the largest high-level semantic discourse datasets recently published, and showed that a multitask approach can improve the Micro F1 score by 7% over current state-of-the-art benchmarks, due in part to label corrections across tasks, which improve performance for underrepresented classes.
4
GLDM: hit molecule generation with constrained graph latent diffusion model
Conghao Wang, Hiok Hian Ong, Shunsuke Chiba, Jagath C. Rajapakse
TL;DR: The Graph Latent Diffusion Model (GLDM) is proposed: a latent diffusion model (DM) that preserves both the effectiveness of autoencoders in compressing complex chemical data and the DM’s capability of generating novel molecules.
3
KNN-LM Does Not Improve Open-ended Text Generation
TL;DR: This paper studies the generation quality of interpolation-based retrieval-augmented language models (LMs) and finds that interpolating with a retrieval distribution actually increases perplexity compared to a baseline Transformer LM for the majority of tokens in the WikiText-103 test set.
DeTiME: Diffusion-Enhanced Topic Modeling using Encoder-decoder based LLM
Weijie Xu, Wenxiang Hu, Fanyou Wu, Srinivasan H. Sengamedu
- 01 Jan 2023
TL;DR: DeTiME is a novel framework for Diffusion-Enhanced Topic Modeling using Encoder-decoder based LLM that generates highly clusterable embeddings and enhances topic generation capabilities.
References
Long short-term memory
TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
99K
•Proceedings Article
Attention is All you Need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
- 12 Jun 2017
TL;DR: This paper proposed a simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely and achieved state-of-the-art performance on English-to-French translation.
GloVe: Global Vectors for Word Representation
Jeffrey Pennington, Richard Socher, Christopher D. Manning
- 01 Oct 2014
TL;DR: A new global log-bilinear regression model is proposed that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
A Coefficient of Agreement for Nominal Scales
TL;DR: In this article, the author presents a procedure for having two or more judges independently categorize a sample of units and for determining the degree, significance, and sampling stability of their agreement, i.e., the extent to which these judgments are reproducible (reliable).
Bleu: a Method for Automatic Evaluation of Machine Translation
Kishore Papineni, Salim Roukos, Todd Ward, Wei-Jing Zhu
- 06 Jul 2002
TL;DR: This paper proposed a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.