Exploring Data Augmentation for Code Generation Tasks

doi:10.48550/arXiv.2302.03499

Journal Article•10.48550/arXiv.2302.03499

Pinzhen Chen, +1 more

- 05 Feb 2023

- Findings

- pp 1497-1505

4

TL;DR: The authors propose and adapt augmentation methods that yield consistent improvements in code translation and summarization of up to 6.9% and 7.5% respectively, and show that the methods work orthogonally and benefit output code style and numeric consistency.

Citations

Journal Article•10.48550/arXiv.2305.19915

Data Augmentation Approaches for Source Code Models: A Survey

Terry Yue Zhuo, +7 more

- 31 May 2023

- arXiv.org

TL;DR: This article presents a comprehensive and integrative survey of data augmentation for source code, systematically compiling and summarizing the existing literature to give an overview of the field.

1

Preprint•10.48550/arxiv.2405.00066

Research and application of artificial intelligence based webshell detection model: A literature review

Mingrui Ma, +2 more

- 28 Apr 2024

TL;DR: AI-based webshell detection research lacks a standardized methodology. This paper reviews the progress of AI-based webshell detection research, dividing it into three stages. The paper analyzes the main characteristics and core algorithms of each stage and identifies pain points and challenges. It also predicts future development trends.

1

Preprint•10.48550/arxiv.2405.20064

1st Place Solution to Odyssey Emotion Recognition Challenge Task1: Tackling Class Imbalance Problem

Mingjie Chen, +13 more

- 30 May 2024

TL;DR: The presented system tackles the class imbalance problem in speech emotion recognition by introducing focal loss and prior-based class weights. It achieved top-1 performance on the Odyssey 2024 Emotion Recognition Challenge Task-1, with a Macro-F1 score of 35.69% and an accuracy of 37.32%.

Journal Article•10.48550/arXiv.2305.13504

Neural Machine Translation for Code Generation

KC Dharma, +1 more

- 22 May 2023

- arXiv.org

TL;DR: This article surveys neural machine translation (NMT) for code generation across a variety of input scenarios: generating code from natural language descriptions, from lower-level representations such as binary or assembly (neural decompilation), from partial representations of source code (code completion and repair), and from source code in another language (code translation).

References

•Posted Content

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, +3 more

- 11 Oct 2018

- arXiv: Computation and Language

TL;DR: BERT is a new language representation model designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. It can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

81.7K

•Proceedings Article•10.3115/1073083.1073135

Bleu: a Method for Automatic Evaluation of Machine Translation

Kishore Papineni, +3 more

- 06 Jul 2002

TL;DR: This paper proposed a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.

28.9K

•Posted Content

RoBERTa: A Robustly Optimized BERT Pretraining Approach

Yinhan Liu, +9 more

- 26 Jul 2019

- arXiv: Computation and Language

TL;DR: The study finds that BERT was significantly undertrained and can match or exceed the performance of every model published after it; the best model achieves state-of-the-art results on GLUE, RACE and SQuAD.

26.2K

•Proceedings Article•10.18653/V1/P16-1009

Improving Neural Machine Translation Models with Monolingual Data

Rico Sennrich, +2 more

- 12 Aug 2016

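TL;DR: The authors exploit target-side monolingual data for NMT by pairing it with automatic back-translations and treating the result as additional parallel training data, obtaining substantial improvements on several translation tasks without changing the network architecture.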

3.2K

•Posted Content

CodeBERT: A Pre-Trained Model for Programming and Natural Languages

Zhangyin Feng, +10 more

- 19 Feb 2020

- arXiv: Computation and Language

TL;DR: This work develops CodeBERT with Transformer-based neural architecture, and trains it with a hybrid objective function that incorporates the pre-training task of replaced token detection, which is to detect plausible alternatives sampled from generators.

1.8K
