Book Chapter, DOI 10.1007/978-981-16-1927-4_9
Source Code Clone Search.
Iman Keivanloo, Juergen Rilling +1 more
- 01 Jan 2021
- pp 121-134
TL;DR: In this chapter, the authors identify and define the major concepts of clone search and present a framework that summarizes the architecture of a clone search engine, giving a systematic view of the internals of such an engine.
Abstract: Identifying similarities in source code is the main challenge for reuse, plagiarism, and code clone detection. Code clone search has emerged as a new research branch in clone detection, aiming to provide similarity search functionality for code snippets. While clone search shares its fundamentals with clone detection, its objectives and requirements differ significantly. Clone search focuses on search engines that are designed to find clones of a single input code snippet (i.e., the query) in a large set of code snippets (i.e., the corpus). Scalability, short response time, and the ability to rank result sets are among the major challenges that a clone search engine has to deal with. In this chapter, we identify and define major concepts related to clone search. We then present a framework that summarizes the architecture of a clone search engine and enables us to provide a systematic view of the internals of such an engine. Finally, we discuss how to benchmark and evaluate the performance of clone search engines. The discussion includes a set of measures that are helpful in evaluating clone search engines.
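The query-against-corpus setting described in the abstract can be illustrated with a minimal sketch. It is not the framework presented in the chapter: it assumes a token n-gram inverted index with Jaccard ranking as a stand-in similarity model, and precision at k as one example of an evaluation measure. All names (`CloneSearchIndex`, `shingles`, `precision_at_k`) are hypothetical and chosen for illustration only.

```python
import re
from collections import defaultdict


def tokens(code):
    """Split a snippet into identifier, number, and single-symbol tokens."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def shingles(code, n=3):
    """Return the set of token n-grams of a snippet."""
    toks = tokens(code)
    return {tuple(toks[i:i + n]) for i in range(max(len(toks) - n + 1, 1))}


class CloneSearchIndex:
    """Toy clone search engine (assumed design, not the chapter's)."""

    def __init__(self, n=3):
        self.n = n
        self.postings = defaultdict(set)  # n-gram -> ids of snippets containing it
        self.sizes = {}                   # snippet id -> number of distinct n-grams

    def add(self, snippet_id, code):
        """Index one corpus snippet."""
        grams = shingles(code, self.n)
        self.sizes[snippet_id] = len(grams)
        for gram in grams:
            self.postings[gram].add(snippet_id)

    def search(self, query, k=10):
        """Return up to k (snippet id, Jaccard score) pairs, best first."""
        grams = shingles(query, self.n)
        overlap = defaultdict(int)
        # Only snippets sharing at least one n-gram with the query are
        # touched, which keeps lookups cheap on a large corpus.
        for gram in grams:
            for sid in self.postings.get(gram, ()):
                overlap[sid] += 1
        scored = [
            (shared / (len(grams) + self.sizes[sid] - shared), sid)
            for sid, shared in overlap.items()
        ]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [(sid, score) for score, sid in scored[:k]]


def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k results that are known true clones."""
    if k <= 0:
        return 0.0
    return sum(1 for sid in ranked_ids[:k] if sid in relevant_ids) / k
```

In this sketch the inverted index addresses scalability and response time, and the similarity score provides the ranking; a real engine would typically add normalization of identifiers and literals before indexing.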
Citations
A Novel Vulnerable Code Clone Detector Based on Context Enhancement and Patch Validation
Junjun Guo, Haonan Li, Zhengyuan Wang, Li Zhang, Chang-fei Wang +4 more
TL;DR: A vulnerable code clone detection method based on code fingerprints, named the Context-enhanced and Patch-validation-based Vulnerable code clone Detector (CPVDetector), achieves high accuracy with a fast detection speed; its FPR is as low as 2.35%, less than one-third of that of other existing methods.
Evaluating few shot and Contrastive learning Methods for Code Clone Detection
TL;DR: This research assesses the generalizability of state-of-the-art models for CCD in few-shot settings and employs Model-Agnostic Meta-Learning (MAML), in which the model learns a meta-learner capable of extracting transferable knowledge from the training set so that it can be fine-tuned using only a few samples.
Evaluating few-shot and contrastive learning methods for code clone detection
Mohamad Khajezade, Fatemeh H. Fard, Mohamed Shehata +2 more
Can CodeT5 embeddings be adapted for efficient Code Clone Detection and Retrieval?
Chinmayee Rane, M. Tech +1 more
TL;DR: In this article, the authors propose a method to solve the problem of "uniformity" and "uncertainty" in the context of education.
References
Introduction to Information Retrieval
Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze +2 more
- 01 Jan 2008
TL;DR: In this article, the authors present an up-to-date treatment of all aspects of the design and implementation of systems for gathering, indexing, and searching documents; methods for evaluating systems; and an introduction to the use of machine learning methods on text collections.
CCFinder: a multilinguistic token-based code clone detection system for large scale source code
TL;DR: A new clone detection technique is proposed that consists of a transformation of the input source text and a token-by-token comparison; it effectively found clones, and the metrics effectively identified the characteristics of the systems.
Towards a Big Data Curated Benchmark of Inter-project Code Clones
Jeffrey Svajlenko, Judith F. Islam, Iman Keivanloo, Chanchal K. Roy, Mohammad Mamun Mia +4 more
- 29 Sep 2014
TL;DR: A Big Data clone detection benchmark consisting of known true and false positive clones in a Big Data inter-project Java repository; the authors show how the benchmark can be used to measure the recall and precision of clone detection techniques.
Siamese: scalable and incremental code clone search via multiple code representations
TL;DR: Siamese’s incremental indexing capability dramatically decreases the index preparation time for large-scale data sets with multiple releases of software projects, and the applications of Siamese to facilitate software development and research are discussed.
A third generation Smalltalk-80 implementation
Patrick J. Caudill, Allen Wirfs-Brock +1 more
- 01 Jun 1986
TL;DR: A new, high-performance Smalltalk-80™ implementation is described which builds directly upon two previous implementation efforts and supports a large object space while retaining compatibility with previous Smalltalk-80™ images.