A Survey of Machine Learning for Big Code and Naturalness

doi:10.1145/3212695

Open Access•Journal Article•10.1145/3212695

Miltiadis Allamanis, +3 more

- 31 Jul 2018

- ACM Computing Surveys

- Vol. 51, Iss. 4, Article 81

772

TL;DR: Research at the intersection of machine learning, programming languages, and software engineering has recently taken important steps in proposing learnable probabilistic models of source code that exploit the abundance of patterns of code.

Citations

•Posted Content

CodeBLEU: a Method for Automatic Evaluation of Code Synthesis

Shuo Ren, +9 more

- 22 Sep 2020

- arXiv: Software Engineering

TL;DR: This work introduces a new automatic evaluation metric, dubbed CodeBLEU, which absorbs the strength of BLEU in n-gram matching and further injects code syntax via abstract syntax trees (AST) and code semantics via data-flow. It can achieve a better correlation with programmer-assigned scores than BLEU and accuracy.

427

Journal Article•10.1109/JPROC.2020.2993293

Software Vulnerability Detection Using Deep Neural Networks: A Survey

Guanjun Lin, +4 more

- 04 Jun 2020

TL;DR: This survey reviews the current literature adopting deep-learning-/neural-network-based approaches for detecting software vulnerabilities, aiming at investigating how the state-of-the-art research leverages neural techniques for learning and understanding code semantics to facilitate vulnerability discovery.

417

•Proceedings Article•10.1109/ICSE.2019.00087

A neural model for generating natural language summaries of program subroutines

Alexander LeClair, +2 more

- 25 May 2019

TL;DR: In this article, a neural model that combines words from code with code structure from an AST is presented, which allows the model to learn code structure independent of the text in code.

390

Proceedings Article•10.1145/3395363.3397369

CoCoNuT: combining context-aware neural translation models using ensemble for program repair

Thibaud Lutellier, +5 more

- 18 Jul 2020

TL;DR: CoCoNuT is a new generate-and-validate (G&V) technique that uses ensemble learning on the combination of convolutional neural networks (CNNs) and a new context-aware neural machine translation (NMT) architecture to automatically fix bugs in multiple programming languages.

376

References

Journal Article•10.1162/NECO.1997.9.8.1735

Long short-term memory

Sepp Hochreiter, +1 more

- 01 Nov 1997

- Neural Computation

TL;DR: A novel, efficient, gradient-based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.

99K

•Proceedings Article•10.3115/1073083.1073135

Bleu: a Method for Automatic Evaluation of Machine Translation

Kishore Papineni, +3 more

- 06 Jul 2002

TL;DR: This paper proposed a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.

28.9K

•Proceedings Article

Efficient Estimation of Word Representations in Vector Space

Tomas Mikolov, +3 more

- 16 Jan 2013

TL;DR: Two novel model architectures for computing continuous vector representations of words from very large data sets are proposed and it is shown that these vectors provide state-of-the-art performance on the authors' test set for measuring syntactic and semantic word similarities.

27.5K

•Proceedings Article

Neural Machine Translation by Jointly Learning to Align and Translate

Dzmitry Bahdanau, +2 more

- 01 Jan 2015

TL;DR: It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of the basic encoder-decoder architecture, and it is proposed to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.

25.7K

Journal Article•10.1145/1541880.1541882

Anomaly detection: A survey

Varun Chandola, +2 more

- 30 Jul 2009

- ACM Computing Surveys

TL;DR: This survey tries to provide a structured and comprehensive overview of the research on anomaly detection by grouping existing techniques into different categories based on the underlying approach adopted by each technique.

11.9K

Citations

CodeBLEU: a Method for Automatic Evaluation of Code Synthesis

Software Vulnerability Detection Using Deep Neural Networks: A Survey

A neural model for generating natural language summaries of program subroutines

CoCoNuT: combining context-aware neural translation models using ensemble for program repair

CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation

Related Papers (5)

code2vec: learning distributed representations of code

Summarizing Source Code using a Neural Attention Model

Long short-term memory

Attention is All you Need

Code completion with statistical language models