Journal Article
DOI: 10.48550/arxiv.2409.19247
Edit-Constrained Decoding for Sentence Simplification
Tatsuya Zetsu, Yuki Arase, Tomoyuki Kajiwara
- 28 Sep 2024
TL;DR: This study proposes edit-constrained decoding for sentence simplification. It introduces lexical constraints that replicate edit operations, with stricter satisfaction conditions, and outperforms previous methods on three English simplification corpora.
Abstract: We propose edit operation based lexically constrained decoding for sentence simplification. In sentence simplification, lexical paraphrasing is one of the primary procedures for rewriting complex sentences into simpler correspondences. While previous studies have confirmed the efficacy of lexically constrained decoding on this task, their constraints can be loose and may lead to sub-optimal generation. We address this problem by designing constraints that replicate the edit operations conducted in simplification and defining stricter satisfaction conditions. Our experiments indicate that the proposed method consistently outperforms the previous studies on three English simplification corpora commonly used in this task.
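As a rough illustration of what "constraints that replicate the edit operations" could look like in practice, the sketch below derives word-level insert, delete, and substitute operations from a complex sentence and a simpler reference. It is a minimal sketch under stated assumptions, not the authors' extraction procedure: the function name `extract_edit_ops`, the whitespace tokenization, the use of Python's `difflib` for alignment, and the example sentences are all hypothetical choices made here for illustration.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

# (operation name, words removed from the source, words added in the target)
EditOp = Tuple[str, List[str], List[str]]


def extract_edit_ops(complex_sent: str, simple_sent: str) -> List[EditOp]:
    """Align two sentences word by word and return the non-matching edits.

    Assumption: whitespace tokenization and difflib alignment stand in for
    whatever alignment the paper actually uses.
    """
    src = complex_sent.split()
    tgt = simple_sent.split()
    matcher = SequenceMatcher(a=src, b=tgt, autojunk=False)
    ops: List[EditOp] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            ops.append(("substitute", src[i1:i2], tgt[j1:j2]))
        elif tag == "delete":
            ops.append(("delete", src[i1:i2], []))
        elif tag == "insert":
            ops.append(("insert", [], tgt[j1:j2]))
        # "equal" spans are unchanged words and produce no constraint
    return ops


if __name__ == "__main__":
    # Hypothetical sentence pair built around the artisans/craftsmen example.
    for op in extract_edit_ops(
        "The artisans built the large cathedral",
        "The craftsmen built the cathedral",
    ):
        print(op)
```

Each returned tuple could then be handed to a constrained decoder as a candidate constraint; how such constraints are enforced during beam search, and when they count as satisfied, is defined in the paper rather than in this sketch.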
Figures

Figure 1: Our method constrains generation based on edit operations during beam search. Here, ‘artisans’ is replaced by ‘craftsmen’ by a substitution constraint. 
Table 2: Evaluation results with oracle constraints; the scores were measured on the single references from which the constraints were extracted (‘BS’ and ‘Len’ represent BERTScore and average output length, respectively). 
Table 5: Example outputs of simplification models (bold constraints are satisfied by the proposed method).
Table 3: Percentage of satisfied constraints; ‘Comp.’ represents (Zetsu et al., 2022). 
Table 9: Evaluation results with predicted constraints; the scores were measured using all of the multi-references (‘BS’ and ‘Len’ represent BERTScore and average output length, respectively). 
Table 6: Evaluation results with predicted constraints; the scores were measured on the same single references as in Table 2 (‘BS’ and ‘Len’ represent BERTScore and average output length, respectively).