Clone-advisor: recommending code tokens and clone methods with deep learning and information retrieval
TL;DR: The authors build on DeepClone, their deep learning approach that models code clones along with non-cloned code to predict the next set of tokens (possibly a complete clone method body) from the code written so far, and apply information retrieval to its output to recommend real clone methods.
Abstract: Software developers frequently reuse source code from repositories because it saves development time and effort. Code clones (similar code fragments) accumulated in these repositories often represent repeated functionality and are candidates for reuse in exploratory or rapid development. To facilitate code clone reuse, we previously presented DeepClone, a novel deep learning approach for modeling code clones along with non-cloned code to predict the next set of tokens (possibly a complete clone method body) based on the code written so far. The probabilistic nature of language modeling, however, can lead to code output with minor syntax or logic errors. To resolve this, we propose a novel approach called Clone-Advisor. We apply an information retrieval technique on top of the DeepClone output to recommend real clone methods that closely match the predicted clone method, thus improving on the original DeepClone output. In this paper, we discuss and refine our previous work on DeepClone in much more detail. We also quantitatively evaluate the performance and effectiveness of Clone-Advisor in clone method recommendation.
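The abstract does not name the information retrieval technique, so the following is only a minimal sketch of the retrieval idea under stated assumptions: it ranks a small, hypothetical corpus of real clone methods by TF-IDF cosine similarity to a predicted method body that contains a syntax slip. The tokenizer, the corpus, and the names `tokenize`, `tf_idf_vectors`, `cosine`, and `recommend` are illustrative inventions, not Clone-Advisor's actual implementation.

```python
import math
import re
from collections import Counter


def tokenize(code: str) -> list[str]:
    """Split Java-like code into identifiers, numbers, and single symbols (assumed tokenizer)."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\sA-Za-z_\d]", code)


def tf_idf_vectors(docs: list[list[str]]) -> tuple[list[dict[str, float]], dict[str, float]]:
    """Return one TF-IDF weight vector per document plus the IDF table."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF so tokens present in every document still get a small weight
    idf = {tok: math.log((1 + n) / (1 + cnt)) + 1.0 for tok, cnt in df.items()}
    vecs = [{tok: tf * idf[tok] for tok, tf in Counter(doc).items()} for doc in docs]
    return vecs, idf


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse weight vectors."""
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def recommend(predicted: str, corpus: dict[str, str], k: int = 2) -> list[tuple[str, float]]:
    """Rank real clone methods by similarity to a (possibly flawed) predicted method body."""
    names = list(corpus)
    vecs, idf = tf_idf_vectors([tokenize(corpus[name]) for name in names])
    # Query tokens never seen in the corpus have no IDF weight and are dropped
    query = {tok: tf * idf[tok] for tok, tf in Counter(tokenize(predicted)).items() if tok in idf}
    scored = [(name, cosine(query, vec)) for name, vec in zip(names, vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical corpus of real clone methods
CLONE_CORPUS = {
    "sumArray": "int sumArray(int[] xs) { int total = 0; for (int x : xs) total += x; return total; }",
    "isBlank": "boolean isBlank(String s) { return s == null || s.trim().isEmpty(); }",
    "maxArray": "int maxArray(int[] xs) { int best = xs[0]; for (int x : xs) if (x > best) best = x; return best; }",
}

# Stand-in for a language-model prediction with a missing semicolon after "0"
predicted = "int sumArray(int[] xs) { int total = 0 for (int x : xs) total += x; return total; }"

print(recommend(predicted, CLONE_CORPUS)[0][0])  # sumArray
```

Returning the stored method rather than the generated text is what removes the minor syntax error: the developer receives `sumArray` exactly as it exists in the corpus. A production system would more likely use an indexed search engine and a ranking function such as BM25 rather than recomputing TF-IDF on every query.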
Citations
A survey on machine learning techniques applied to source code
Tushar Sharma, Maria Kechagia, Stefanos Georgiou, Rohit Tiwari, Indira Vats, Hadi Moazen, Federica Sarro, et al.
TL;DR: This paper surveys 494 studies on machine learning techniques applied to source code analysis, summarizing 12 software engineering tasks along with their tools and datasets; it highlights the increasing use of these techniques, open challenges, and a comprehensive list of available resources.
Code Clone Configuration as a Multi-Objective Search Problem
Denis Sousa, Matheus Paixão, Chaiyong Ragkhitwetsagul, Italo Uchoa, et al.
15 Oct 2024
Clone-Writer: An effective editor for developing code by using code clones
TL;DR: Clone-Writer is an automated software development tool that recommends code clones based on the code written so far; developers can also search for code clones with a query written either as source code terms or as natural language.
Clone-Seeker: Effective Code Clone Search Using Annotations
01 Jan 2022
TL;DR: The authors propose a novel approach called Clone-Seeker that uses clone class features to retrieve code clones, helping developers search for code clones with a query written either as source code terms or as natural language.
A Survey on Machine Learning Techniques for Source Code Analysis
Tushar Sharma, Maria Kechagia, Stefanos Georgiou, Rohit Tiwari, Indira Vats, Hadi Moazen, Federica Sarro, et al.
TL;DR: This survey of 479 studies (2011-2021) on machine learning for source code analysis identifies increasing adoption, synthesizes workflows and techniques, and highlights challenges in standard datasets, reproducibility, and hardware resources for software engineering tasks.