Proceedings Article, DOI: 10.1109/COMPSAC.2017.104
Detecting Java Code Clones with Multi-granularities Based on Bytecode
Dongjin Yu, Jie Wang, Qing Wu, Jiazha Yang, Jiaojiao Wang, Wei Yang, Wei Yan
- 01 Jul 2017
- Vol. 1, pp 317-326
TL;DR: A novel code clone detection method based on Java bytecode that can simultaneously detect code clones at both method level and block level and is shown to be more effective than the state-of-the-art methods.
Abstract: Sequences of duplicate code, either with or without modification, are known as code clones or just clones. Code clones are generally considered undesirable for a number of reasons, although they can offer some convenience to developers. Detecting code clones helps to improve the quality of source code through software re-engineering. Numerous methods have been proposed for code clone detection in Java code. However, the existing methods are mostly based on Java source code, and only a few focus on its bytecode, even though Java bytecode reflects more of the semantic nature of a program than the source code itself does. In this paper, we propose a novel code clone detection method based on Java bytecode. Using block-level code fragments extracted from bytecode, it can simultaneously detect code clones at both method level and block level. In addition, during code clone detection, the similarities of both method call sequences and instruction sequences are calculated in order to improve accuracy. We conduct two extensive experiments to evaluate the performance of our method. The results show that the proposed method can detect code clones more effectively than the state-of-the-art methods.
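The abstract does not say how the two similarities are computed or combined. As a rough illustration only, the sketch below assumes that each block has already been reduced to a list of opcode mnemonics and a list of called method names, scores each pair of lists with a normalized longest common subsequence (LCS), and mixes the two scores with a fixed weight. The function names, the LCS measure, and the `call_weight` parameter are all assumptions made for this example and are not taken from the paper.

```python
from typing import Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def sequence_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Normalized LCS score in [0, 1]; two empty sequences count as identical."""
    if not a and not b:
        return 1.0
    return 2.0 * lcs_length(a, b) / (len(a) + len(b))


def block_similarity(
    instr_a: Sequence[str],
    calls_a: Sequence[str],
    instr_b: Sequence[str],
    calls_b: Sequence[str],
    call_weight: float = 0.5,  # hypothetical weight, not from the paper
) -> float:
    """Weighted mix of method-call similarity and instruction similarity."""
    instr_sim = sequence_similarity(instr_a, instr_b)
    call_sim = sequence_similarity(calls_a, calls_b)
    return call_weight * call_sim + (1.0 - call_weight) * instr_sim


# Two hand-written blocks that differ in a single arithmetic opcode.
block_a_instr = ["iload", "iload", "iadd", "invokestatic", "ireturn"]
block_b_instr = ["iload", "iload", "isub", "invokestatic", "ireturn"]
block_a_calls = ["java/lang/Math.abs"]
block_b_calls = ["java/lang/Math.abs"]

score = block_similarity(block_a_instr, block_a_calls, block_b_instr, block_b_calls)
print(round(score, 3))
```

A pair of blocks would then be reported as a clone candidate when the score exceeds some threshold; the threshold, like the weight, would need to be tuned and is not specified in the abstract.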
Citations
A Systematic Review on Code Clone Detection
TL;DR: There is a need to develop novel approaches with complete tool support that detect all four types of clones together, and further approaches are needed to simplify the construction of a program dependency graph (PDG) when detecting Type-4 clones.
Detecting Java Code Clones Based on Bytecode Sequence Alignment
TL;DR: An approach based on Java bytecode is introduced, which mainly contains the steps of bytecode sequence alignment and similarity score comparison, and separately considers the similarities between instruction sequences and method call sequences, thus improving its effectiveness in detecting code clones.
Detecting Semantic Code Clones by Building AST-based Markov Chains Model
Yueming Wu, Siyue Feng, Deqing Zou, Hai Jin
- 10 Oct 2022
TL;DR: Amain transforms the original complex tree into simple Markov chains and measures the distance of all states in these chains, then feeds the distances into a machine learning classifier to train a code clone detector.
VULDEFF: Vulnerability detection method based on function fingerprints and code differences
Q. Zhao, Cheng Huang, Liuhu Dai
TL;DR: A lightweight function fingerprint method based on the Context Triggered Piecewise Hashing algorithm is designed, which can characterize the basic syntax features of function source code.
IBFET: Index‐based features extraction technique for scalable code clone detection at file level granularity
TL;DR: IBFET uses the divide-and-conquer model of MapReduce to detect code clones at file-level granularity at very large scale, up to billions of LOC, and performs preprocessing, indexing, and clone detection for more than 324 billion LOC in a Hadoop distributed environment.