Clone (computing)

Papers published on a yearly basis

Papers

Journal Article•10.1109/TSE.2002.1019480•

CCFinder: a multilinguistic token-based code clone detection system for large scale source code

Toshihiro Kamiya¹, Shinji Kusumoto¹, Katsuro Inoue¹•Institutions (1)

Osaka University¹

01 Jul 2002-IEEE Transactions on Software Engineering

TL;DR: Proposes a clone detection technique that transforms the input source text and then compares it token by token. In case studies, the resulting tool, CCFinder, found clones effectively, and the accompanying clone metrics identified the characteristics of the systems studied.

Abstract: A code clone is a code portion in source files that is identical or similar to another. Since code clones are believed to reduce the maintainability of software, several code clone detection techniques and tools have been proposed. This paper proposes a new clone detection technique, which consists of the transformation of input source text and a token-by-token comparison. For its implementation with several useful optimization techniques, we have developed a tool, named CCFinder (Code Clone Finder), which extracts code clones in C, C++, Java, COBOL and other source files. In addition, metrics for the code clones have been developed. In order to evaluate the usefulness of CCFinder and metrics, we conducted several case studies where we applied the new tool to the source code of JDK, FreeBSD, NetBSD, Linux, and many other systems. As a result, CCFinder has effectively found clones and the metrics have been able to effectively identify the characteristics of the systems. In addition, we have compared the proposed technique with other clone detection techniques.
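The two steps named in the abstract above, transforming the source text and then comparing tokens, can be illustrated with a minimal Python sketch. It is not CCFinder's implementation: the keyword set, the `$id` and `$num` placeholders, the window size, and the function names `normalize` and `shared_windows` are all assumptions made for illustration, and the matching step uses a plain hash index of fixed-length token windows as a simplified stand-in for the tool's optimized comparison.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny keyword set; a real tool would use a full lexer.
KEYWORDS = {"if", "else", "for", "while", "return", "int", "float", "void"}
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+(?:\.\d+)?|\S")


def normalize(source):
    """Tokenize source text, replacing identifiers and numbers with placeholders."""
    tokens = []
    for tok in TOKEN_RE.findall(source):
        if tok in KEYWORDS:
            tokens.append(tok)          # keywords are kept as-is
        elif re.fullmatch(r"[A-Za-z_]\w*", tok):
            tokens.append("$id")        # any identifier
        elif tok[0].isdigit():
            tokens.append("$num")       # any numeric literal
        else:
            tokens.append(tok)          # punctuation and operators
    return tokens


def shared_windows(tokens_a, tokens_b, window=8):
    """Return (i, j) start positions where `window` consecutive tokens are identical."""
    index = defaultdict(list)
    for i in range(len(tokens_a) - window + 1):
        index[tuple(tokens_a[i:i + window])].append(i)
    matches = []
    for j in range(len(tokens_b) - window + 1):
        for i in index.get(tuple(tokens_b[j:j + window]), []):
            matches.append((i, j))
    return matches


a = "int total = 0; for (i = 0; i < n; i++) total = total + a[i];"
b = "int sum = 0; for (k = 0; k < m; k++) sum = sum + b[k];"
pairs = shared_windows(normalize(a), normalize(b))
```

Because renamed variables collapse to the same placeholder, the two fragments above produce identical token sequences even though their text differs, which is the property a token-based comparison relies on.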

1,928 citations

Proceedings Article•10.1145/1081706.1081737•

An empirical study of code clone genealogies

Miryung Kim¹, Vibha Sazawal¹, David Notkin¹, Gail C. Murphy²•Institutions (2)

University of Washington¹, University of British Columbia²

1 Sep 2005

TL;DR: This study developed a formal definition of clone evolution, built a clone genealogy tool that automatically extracts the history of code clones from a source code repository, and analyzed the evolution of those clones.

Abstract: It has been broadly assumed that code clones are inherently bad and that eliminating clones by refactoring would solve the problems of code clones. To investigate the validity of this assumption, we developed a formal definition of clone evolution and built a clone genealogy tool that automatically extracts the history of code clones from a source code repository. Using our tool we extracted clone genealogy information for two Java open source projects and analyzed their evolution. Our study contradicts some conventional wisdom about clones. In particular, refactoring may not always improve software with respect to clones for two reasons. First, many code clones exist in the system for only a short time; extensive refactoring of such short-lived clones may not be worthwhile if they are likely to diverge from one another very soon. Second, many clones, especially long-lived clones that have changed consistently with other elements in the same group, are not easily refactorable due to programming language limitations. These insights show that refactoring will not help in dealing with some types of clones and open up opportunities for complementary clone maintenance tools that target these other classes of clones.

657 citations

Journal Article•10.1016/J.INFSOF.2013.01.008•

Software clone detection: A systematic review

Dhavleesh Rattan¹, Rajesh Bhatia², Maninder Singh³•Institutions (3)

Baba Banda Singh Bahadur Engineering College¹, Deenbandhu Chhotu Ram University of Science and Technology², Thapar University³

01 Jul 2013-Information & Software Technology

TL;DR: An extensive systematic literature review of software clones in general and software clone detection in particular calls for an increased awareness of the potential benefits of software clone management, and identifies the need to develop semantic and model clone detection techniques.

Abstract: Context: Reusing software by means of copy and paste is a frequent activity in software development. The duplicated code is known as a software clone and the activity is known as code cloning. Software clones may lead to bug propagation and serious maintenance problems. Objective: This study reports an extensive systematic literature review of software clones in general and software clone detection in particular. Method: We used the standard systematic literature review method based on a comprehensive set of 213 articles from a total of 2039 articles published in 11 leading journals and 37 premier conferences and workshops. Results: Existing literature about software clones is classified broadly into different categories. The importance of semantic clone detection and model-based clone detection led to different classifications. Empirical evaluation of clone detection tools/techniques is presented. Clone management, its benefits and its cross-cutting nature are reported. The number of studies pertaining to nine different types of clones is reported. Thirteen intermediate representations and 24 match detection techniques are reported. Conclusion: We call for an increased awareness of the potential benefits of software clone management, and identify the need to develop semantic and model clone detection techniques. Recommendations are given for future research.

446 citations

Proceedings Article•10.1145/2884781.2884877•

SourcererCC: Scaling Code Clone Detection to Big Code

Hitesh Sajnani¹, Vaibhav Saini¹, Jeffrey Svajlenko², Chanchal K. Roy², Cristina V. Lopes¹•Institutions (2)

University of California, Irvine¹, University of Saskatchewan²

20 Dec 2015-arXiv: Software Engineering

TL;DR: This paper presents SourcererCC, a token-based clone detector that can detect both exact and near-miss clones in large inter-project repositories using a standard workstation. It evaluates the tool's scalability, execution time, recall, and precision, and compares it with four publicly available, state-of-the-art tools.

Abstract: Despite a decade of active research, there is a marked lack of clone detectors that scale to very large repositories of source code, in particular for detecting near-miss clones where significant editing activities may take place in the cloned code. We present SourcererCC, a token-based clone detector that targets three clone types, and exploits an index to achieve scalability to large inter-project repositories using a standard workstation. SourcererCC uses an optimized inverted index to quickly query the potential clones of a given code block. Filtering heuristics based on token ordering are used to significantly reduce the size of the index, the number of code-block comparisons needed to detect the clones, as well as the number of required token comparisons needed to judge a potential clone. We evaluate the scalability, execution time, recall and precision of SourcererCC, and compare it to four publicly available and state-of-the-art tools. To measure recall, we use two recent benchmarks, (1) a large benchmark of real clones, BigCloneBench, and (2) a Mutation/Injection-based framework of thousands of fine-grained artificial clones. We find SourcererCC has both high recall and precision, and is able to scale to a large inter-project repository (250MLOC) using a standard workstation.
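The idea of querying an inverted index for the potential clones of a code block can be sketched in Python as follows. This is an illustrative sketch only, not SourcererCC's code: the similarity measure (multiset token overlap compared against a threshold `theta` times the larger block size), the default threshold value, and the names `build_index`, `overlap`, and `find_clones` are assumptions, and the token-ordering filters mentioned in the abstract are not implemented.

```python
import math
from collections import Counter, defaultdict


def build_index(blocks):
    """Map each token to the set of block ids that contain it (an inverted index)."""
    index = defaultdict(set)
    for block_id, tokens in blocks.items():
        for tok in set(tokens):
            index[tok].add(block_id)
    return index


def overlap(tokens_a, tokens_b):
    """Size of the multiset intersection of two token lists."""
    return sum((Counter(tokens_a) & Counter(tokens_b)).values())


def find_clones(blocks, query_id, theta=0.8):
    """Return ids of blocks whose token overlap with the query meets the assumed threshold."""
    index = build_index(blocks)
    query = blocks[query_id]
    # Only blocks sharing at least one token with the query are ever compared.
    candidates = set()
    for tok in set(query):
        candidates |= index[tok]
    candidates.discard(query_id)
    clones = []
    for cid in sorted(candidates):
        needed = math.ceil(theta * max(len(query), len(blocks[cid])))
        if overlap(query, blocks[cid]) >= needed:
            clones.append(cid)
    return clones


blocks = {
    "a": "x = y + z ; return x".split(),
    "b": "x = y + w ; return x".split(),
    "c": "print ( hello )".split(),
}
result = find_clones(blocks, "a")
```

In this toy input, block `c` shares no token with block `a`, so it is never compared at all; reducing the number of comparisons in this way is the purpose of the index, and the filtering heuristics described in the abstract would shrink the candidate set further.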

430 citations

Proceedings Article•10.1109/SP.2017.62•

VUDDY: A Scalable Approach for Vulnerable Code Clone Discovery

Seulbae Kim¹, Seunghoon Woo¹, Heejo Lee¹, Hakjoo Oh¹•Institutions (1)

Korea University¹

1 May 2017

TL;DR: VUDDY outperformed four state-of-the-art code clone detection techniques in terms of both scalability and accuracy, and proved its effectiveness by detecting zero-day vulnerabilities in widely used software systems, such as Apache HTTPD and Ubuntu OS Distribution.

Abstract: The ecosystem of open source software (OSS) has been growing considerably in size. In addition, code clones - code fragments that are copied and pasted within or between software systems - are also proliferating. Although code cloning may expedite the process of software development, it often critically affects the security of software because vulnerabilities and bugs can easily be propagated through code clones. These vulnerable code clones are increasing in conjunction with the growth of OSS, potentially contaminating many systems. Although researchers have attempted to detect code clones for decades, most of these attempts fail to scale to the size of the ever-growing OSS code base. The lack of scalability prevents software developers from readily managing code clones and associated vulnerabilities. Moreover, most existing clone detection techniques focus overly on merely detecting clones and this impairs their ability to accurately find "vulnerable" clones. In this paper, we propose VUDDY, an approach for the scalable detection of vulnerable code clones, which is capable of detecting security vulnerabilities in large software programs efficiently and accurately. Its extreme scalability is achieved by leveraging function-level granularity and a length-filtering technique that reduces the number of signature comparisons. This efficient design enables VUDDY to preprocess a billion lines of code in 14 hours and 17 minutes, after which it requires a few seconds to identify code clones. In addition, we designed a security-aware abstraction technique that renders VUDDY resilient to common modifications in cloned code, while preserving the vulnerable conditions even after the abstraction is applied. This extends the scope of VUDDY to identifying variants of known vulnerabilities, with high accuracy. In this study, we describe its principles and evaluate its efficacy and effectiveness by comparing it with existing mechanisms and presenting the vulnerabilities it detected. VUDDY outperformed four state-of-the-art code clone detection techniques in terms of both scalability and accuracy, and proved its effectiveness by detecting zero-day vulnerabilities in widely used software systems, such as Apache HTTPD and Ubuntu OS Distribution.
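Function-level fingerprints combined with a length filter, as described in the abstract above, can be sketched in Python along these lines. This is a rough illustration under stated assumptions, not VUDDY's implementation: the normalization (dropping C-style comments and whitespace, lowercasing), the use of MD5 as the digest, and the names `fingerprint`, `build_database`, and `is_vulnerable_clone` are chosen for the example, and the security-aware abstraction step is not reproduced.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(body):
    """Return (length, digest) of a function body after a simple, assumed normalization."""
    text = re.sub(r"/\*.*?\*/|//[^\n]*", "", body, flags=re.S)  # drop C-style comments
    text = re.sub(r"\s+", "", text).lower()                     # drop whitespace, fold case
    return len(text), hashlib.md5(text.encode("utf-8")).hexdigest()


def build_database(vulnerable_functions):
    """Group digests of known-vulnerable function bodies by normalized length."""
    database = defaultdict(set)
    for body in vulnerable_functions:
        length, digest = fingerprint(body)
        database[length].add(digest)
    return database


def is_vulnerable_clone(database, body):
    """Length filter first: digests are compared only when the lengths match."""
    length, digest = fingerprint(body)
    if length not in database:
        return False
    return digest in database[length]


known = ["{ char buf[8]; strcpy(buf, src); }"]
db = build_database(known)
copied = "{\n  char buf[8];  // copy\n  strcpy(buf, src);\n}"
patched = "{ char buf[8]; strncpy(buf, src, 7); }"
```

Keying the database by length means a target function whose normalized length matches no known entry is rejected with a single dictionary lookup, so no digest comparison is needed for it.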

371 citations

...

Year	Papers
2022	1
2021	17
2020	16
2019	12
2018	20
2017	16

Performance Metrics