Clone-Seeker: Effective Code Clone Search Using Annotations
TL;DR: This paper proposes a novel approach called Clone-Seeker that uses clone class features to retrieve code clones, letting developers search for clones with a query written either as source code terms or as natural language.
Abstract: Source code search plays an important role in software development, e.g., for exploratory development or opportunistic reuse of existing code from a code base. Often, exploration of different implementations with the same functionality is needed for tasks like automated software transplantation, software diversification, and software repair. Code clones, which are syntactically or semantically similar code fragments, are perfect candidates for such tasks. Searching for code clones means retrieving the relevant code fragments for a given search query. We propose a novel approach called Clone-Seeker that focuses on utilizing clone class features in retrieving code clones. For this purpose, we generate metadata for each code clone in the form of a natural language document. The metadata includes a pre-processed list of identifiers from the code clone, augmented with a list of keywords indicating the semantics of the code clone. This keyword list can be extracted from a manually annotated general description of the clone class, or automatically generated from the source code of the entire clone class. This approach helps developers to perform code clone search based on a search query written either as source code terms or as natural language. With various experiments, we show that (1) Clone-Seeker is effective in finding clones from the BigCloneBench dataset by applying code queries and natural language queries; (2) Clone-Seeker has a higher recall when searching for semantic code clones (i.e., Type-4) in BigCloneBench than the state-of-the-art; (3) Clone-Seeker is a generalized technique, as it is also effective in finding clones in the Project CodeNet dataset by applying code queries and natural language queries; and (4) Clone-Seeker with manual annotation outperforms other variants in finding clones on the basis of natural language queries.
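The metadata idea in the abstract can be illustrated with a minimal sketch. It assumes identifiers are split on camelCase and snake_case boundaries and that documents are ranked by TF-IDF cosine similarity; the abstract does not state which retrieval model Clone-Seeker uses, so the ranking step is an assumption. The names `split_identifier`, `build_metadata`, and `search`, and the two toy clones, are hypothetical and not taken from the paper.

```python
import math
import re
from collections import Counter


def split_identifier(identifier):
    """Split a camelCase / snake_case identifier into lowercase terms."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", identifier)
    return [p.lower() for p in parts]


def build_metadata(identifiers, keywords):
    """Metadata document: pre-processed identifier terms plus clone-class keywords."""
    terms = [t for ident in identifiers for t in split_identifier(ident)]
    return terms + [k.lower() for k in keywords]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector per document and the IDF table (assumed weighting)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, corpus, top_k=3):
    """Rank clones against a code or natural language query; drop zero-score hits."""
    names = list(corpus)
    vectors, idf = tfidf_vectors([corpus[name] for name in names])
    tokens = re.findall(r"[A-Za-z0-9_]+", query)
    q_terms = [t for tok in tokens for t in split_identifier(tok)]
    q_vec = {t: tf * idf[t] for t, tf in Counter(q_terms).items() if t in idf}
    scored = sorted(
        ((cosine(q_vec, vec), name) for name, vec in zip(names, vectors)),
        key=lambda pair: -pair[0],
    )
    return [(name, score) for score, name in scored[:top_k] if score > 0]


# Hypothetical toy corpus: two clones with hand-written keywords.
corpus = {
    "clone_copy_file": build_metadata(
        ["copyFile", "srcPath", "destPath", "FileInputStream"], ["copy", "file"]
    ),
    "clone_bubble_sort": build_metadata(
        ["bubbleSort", "arr", "swap", "tmp"], ["sort", "array"]
    ),
}

print(search("copy a file", corpus))   # natural language query
print(search("sortArray", corpus))     # source code term query
```

Because the query passes through the same identifier splitting as the metadata, a code-style query such as `sortArray` and a plain-language query such as "copy a file" are matched against the same term space.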
Citations
A systematic literature review on source code similarity measurement and clone detection: Techniques, applications, and challenges
TL;DR: A systematic literature review and meta-analysis on code similarity measurement and evaluation techniques is presented in this paper to shed light on the existing approaches and their characteristics in different applications, including code recommendation, duplicate code, plagiarism, malware, and smell detection.
Boosting Code Search with Structural Code Annotation
TL;DR: The experimental results show that CodeHunter obtains more effective results than Lucene and DeepCS, that its effectiveness comes from the rich features and search models, and that CodeHunter works well with query descriptions of different sizes.
Clone-Writer: An effective editor for developing code by using code clones
TL;DR: Clone-Writer is an automated software development tool that recommends code clones on the basis of the code written so far; developers can search for code clones with a query written either as source code terms or as natural language.
A Comprehensive Study on Code Clones in Automated Driving Software
Ran Mo, Yingjie Jiang, Wenjing Zhan, Dongyu Wang, Zengyang Li
- 11 Sep 2023
TL;DR: An analysis of Apollo and Autoware shows that code clones are prevalent in automated driving software and that cross-module clones propagate bugs and co-modifications across modules, which undermines the software's modularity.
Source Code Similarity Analysis: Comprehensive Review, Approaches, Applications, and Challenges in Clone Detection
Geetika, Navdeep Kaur, Amandeep Kaur
- 01 Oct 2025
References
GloVe: Global Vectors for Word Representation
Jeffrey Pennington, Richard Socher, Christopher D. Manning
- 01 Oct 2014
TL;DR: A new global log-bilinear regression model is proposed that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
•Proceedings Article
Efficient Estimation of Word Representations in Vector Space
Tomas Mikolov, Kai Chen, Greg S. Corrado, Jeffrey Dean
- 16 Jan 2013
TL;DR: Two novel model architectures for computing continuous vector representations of words from very large data sets are proposed and it is shown that these vectors provide state-of-the-art performance on the authors' test set for measuring syntactic and semantic word similarities.
•Book
Introduction to Information Retrieval
Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze
- 01 Jan 2008
TL;DR: This book presents an up-to-date treatment of all aspects of the design and implementation of systems for gathering, indexing, and searching documents; methods for evaluating systems; and an introduction to the use of machine learning methods on text collections.
•Book
Introduction to Modern Information Retrieval
Gerard Salton, Michael J. McGill
- 01 Jan 1983
CCFinder: a multilinguistic token-based code clone detection system for large scale source code
TL;DR: A new clone detection technique is proposed that consists of a transformation of the input source text followed by a token-by-token comparison; it effectively found clones, and the accompanying metrics effectively identified the characteristics of the systems studied.