Turkish Data-to-Text Generation Using Sequence-to-Sequence Neural Networks

doi:10.1145/3543826

Proceedings Article

Seniz Demir

- 08 Jul 2022

Vol. 22, Iss: 2, pp 1-27

Abstract: End-to-end data-driven approaches have led to rapid development of language generation and dialogue systems. Although they need large amounts of well-organized data, these approaches jointly learn multiple components of the traditional generation pipeline without requiring costly human intervention. End-to-end approaches also enable the use of loosely aligned parallel datasets in system development by relaxing the degree of semantic correspondence required between training data representations and text spans. However, their potential in Turkish language generation has not yet been fully exploited. In this work, we apply sequence-to-sequence (Seq2Seq) neural models to Turkish data-to-text generation, where input data given in the form of a meaning representation is verbalized. We explore encoder-decoder architectures with an attention mechanism in unidirectional, bidirectional, and stacked recurrent neural network (RNN) models. Our models generate one-sentence biographies and dining venue descriptions using a crowdsourced dataset in which all field-value pairs that appear in the meaning representations are fully captured in the reference sentences. To support this work, we also evaluate the performance of our models on a more challenging dataset, where the content of a meaning representation is too large to fit into a single sentence, and hence content selection and surface realization need to be learned jointly. This dataset is built by coupling the introductory sentences of person-related Turkish Wikipedia articles with the infobox tables those articles contain. Our empirical experiments on both datasets demonstrate that Seq2Seq models are capable of generating coherent and fluent biographies and venue descriptions from field-value pairs. We argue that the wealth of knowledge residing in our datasets and the insights obtained from this study hold the potential to give rise to the development of new end-to-end generation approaches for Turkish and other morphologically rich languages.
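To make the input side of this setup concrete, the following is a minimal sketch of how a meaning representation made of field-value pairs might be flattened into a single token sequence before it reaches a Seq2Seq encoder. The abstract does not describe the paper's actual preprocessing, so everything here is an assumption for illustration: the function name `linearize`, the `<field>` marker format, and the example venue are all hypothetical.

```python
from typing import Dict, List


def linearize(meaning_representation: Dict[str, str]) -> List[str]:
    """Flatten field-value pairs into one source token sequence.

    Hypothetical scheme: each field name becomes a marker token such as
    <name>, followed by the whitespace-separated tokens of its value.
    """
    tokens: List[str] = []
    for field, value in meaning_representation.items():
        # The marker tells the encoder which field the next tokens belong to.
        tokens.append(f"<{field}>")
        tokens.extend(value.split())
    return tokens


# Illustrative dining-venue record; the fields and values are made up.
venue = {"name": "Mavi Kafe", "food": "Italian", "area": "city centre"}
print(linearize(venue))
# ['<name>', 'Mavi', 'Kafe', '<food>', 'Italian', '<area>', 'city', 'centre']
```

Keeping the field markers as separate tokens is one common design choice: it lets an attention mechanism attend to the field name as well as the value tokens when the decoder produces each output word.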

Citations

Journal Article•10.1007/s40747-023-01322-x

DRL-based dependent task offloading with delay-energy tradeoff in medical image edge computing

Qi Liu, +3 more

- 29 Jan 2024

TL;DR: This study proposes DCDO-DRL, a distributed collaborative dependent task offloading strategy using deep reinforcement learning to maximize utility of radiomics-based medical image diagnosis tasks, outperforming other algorithms by up to 23.07% in execution utility.

Proceedings Article•10.18653/v1/2023.ijcnlp-main.13

Phylogeny-Inspired Soft Prompts For Data-to-Text Generation in Low-Resource Languages

William Soto Martinez, +2 more

TL;DR: This paper focuses on KG-to-Text generation where the output text is in Breton, Irish or Welsh and combines the strengths of a multilingual encoder-decoder model with denoising fine-tuning on monolingual data and soft prompt fine-tuning on a small quantity of KG/text data.

Journal Article•10.1109/asyu58738.2023.10296831

Sentence Detailing and Its Applications

Feyza Şahin, +1 more

- 11 Oct 2023

TL;DR: Sentence Detailing is a method of generating sentences by adding details to a set of words. It is a technique that utilizes the transformer model mT5 to learn commonsense knowledge from news articles and generate appropriate sentences based on the given words.

Related Papers (5)

Comparison of RNN Encoder-Decoder Models for Anomaly Detection.

Description of Turkish Paraphrase Corpus Structure and Generation Method

Clinical Named Entity Recognition Using Deep Learning Models.

Automatic keyword extraction from single-sentence natural language queries

Creation of a Corpus of Training Sentences Based on Automated Dialogue Analysis