Open Access • Posted Content
Clone-Seeker: Effective Code Clone Search Using Annotations.
TL;DR: Clone-Seeker builds, for each code clone, a metadata document containing a pre-processed list of identifiers from the clone, augmented with keywords that indicate its semantics. The keywords are extracted either from a manually annotated general description of the clone class or automatically from the source code of the entire clone class.
Abstract: Source code search plays an important role in software development, e.g. for exploratory development or opportunistic reuse of existing code from a code base. Often, exploration of different implementations with the same functionality is needed for tasks like automated software transplantation, software diversification, and software repair. Code clones, which are syntactically or semantically similar code fragments, are perfect candidates for such tasks. Searching for code clones involves a given search query to retrieve the relevant code fragments. We propose a novel approach called Clone-Seeker that focuses on utilizing clone class features in retrieving code clones. For this purpose, we generate metadata for each code clone in the form of a natural language document. The metadata includes a pre-processed list of identifiers from the code clones augmented with a list of keywords indicating the semantics of the code clone. This keyword list can be extracted from a manually annotated general description of the clone class, or automatically generated from the source code of the entire clone class. This approach helps developers to perform code clone search based on a search query written either as source code terms, or as natural language. In our quantitative evaluation, we show that (1) Clone-Seeker has a higher recall when searching for semantic code clones (i.e., Type-4) in BigCloneBench than the state-of-the-art; and (2) Clone-Seeker can accurately search for relevant code clones by applying natural language queries.
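The metadata idea in the abstract can be illustrated with a minimal sketch. It assumes a plain TF-IDF cosine ranking as a stand-in retrieval model, since the abstract does not say which model Clone-Seeker uses. The names `split_identifier`, `build_metadata`, `rank`, the stop list, and the two toy clones are all hypothetical and not taken from the paper.

```python
import math
import re
from collections import Counter

# Hypothetical stop list: a few Java keywords that carry no search signal.
STOP_TERMS = {"void", "int", "new", "return", "for", "if", "else",
              "throws", "public", "static"}


def split_identifier(name):
    """Split a camelCase or snake_case identifier into lowercase terms."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def to_terms(text):
    """Turn source code or a natural-language query into search terms."""
    terms = []
    for ident in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text):
        terms.extend(t for t in split_identifier(ident) if t not in STOP_TERMS)
    return terms


def build_metadata(source, class_keywords):
    """Metadata document: identifier terms plus clone-class keywords."""
    return to_terms(source) + [k.lower() for k in class_keywords]


def rank(query, metadata):
    """Rank clone ids by TF-IDF cosine similarity to the query (assumed model)."""
    n = len(metadata)
    df = Counter(t for terms in metadata.values() for t in set(terms))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

    def vectorise(terms):
        # Terms unseen in the corpus are ignored.
        tf = Counter(t for t in terms if t in idf)
        return {t: f * idf[t] for t, f in tf.items()}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm = (math.sqrt(sum(w * w for w in a.values()))
                * math.sqrt(sum(w * w for w in b.values())))
        return dot / norm if norm else 0.0

    q = vectorise(to_terms(query))
    scored = [(cid, cosine(q, vectorise(terms))) for cid, terms in metadata.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy corpus: two clones, each with keywords describing its clone class.
metadata = {
    "clone_a": build_metadata(
        "void copyFile(File src, File dst) throws IOException "
        "{ InputStream in = new FileInputStream(src); }",
        ["copy", "file"],
    ),
    "clone_b": build_metadata(
        "int[] bubbleSort(int[] arr) { for (int i = 0; i < arr.length; i++) "
        "{ swap(arr, i); } return arr; }",
        ["sort", "array"],
    ),
}

natural_language_hits = rank("copy a file", metadata)
source_term_hits = rank("bubbleSort(arr)", metadata)
```

Because queries and clones pass through the same `to_terms` step, a natural-language query and a query written as source code terms are matched against the same metadata documents, which mirrors the two query styles the abstract describes.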
Citations
A systematic literature review on source code similarity measurement and clone detection: Techniques, applications, and challenges
TL;DR: A systematic literature review and meta-analysis on code similarity measurement and evaluation techniques is presented in this paper to shed light on the existing approaches and their characteristics in different applications, including code recommendation, duplicate code, plagiarism, malware, and smell detection.
Boosting Code Search with Structural Code Annotation
TL;DR: The experimental results show that CodeHunter obtains more effective results than Lucene and DeepCS, that its effectiveness comes from the rich features and search models, and that it works well with query descriptions of different sizes.
Clone-Writer: An effective editor for developing code by using code clones
TL;DR: Clone-Writer is an automated software development tool that recommends code clones based on the code written so far; developers can search for code clones with a query written either as source code terms or as natural language.
References
GloVe: Global Vectors for Word Representation
Jeffrey Pennington, Richard Socher, Christopher D. Manning
- 01 Oct 2014
TL;DR: This paper proposes a new global log-bilinear regression model that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
•Book
Introduction to Modern Information Retrieval
Gerard Salton, Michael J. McGill
- 01 Jan 1983
CCFinder: a multilinguistic token-based code clone detection system for large scale source code
TL;DR: This paper proposes a new clone detection technique that consists of a transformation of the input source text followed by a token-by-token comparison; the tool found clones effectively, and the metrics identified the characteristics of the systems.
The vocabulary problem in human-system communication
TL;DR: It is shown how this fundamental property of language limits the success of various design methodologies for vocabulary-driven interaction, and an optimal strategy, unlimited aliasing, is derived and shown to be capable of several-fold improvements.
DECKARD: Scalable and Accurate Tree-Based Detection of Code Clones
Lingxiao Jiang, Ghassan Misherghi, Zhendong Su, Stéphane Glondu
- 24 May 2007
TL;DR: This paper presents an efficient algorithm for identifying similar subtrees and applies it to tree representations of source code. The algorithm is implemented as a clone detection tool called DECKARD and evaluated on large code bases written in C and Java, including the Linux kernel and JDK.