Finding Similarities in Source Code Through Factorization
Michel Chilowicz,Étienne Duris,Gilles Roussel +2 more
- 01 Oct 2009
- Vol. 238, Iss: 5, pp 47-62
TL;DR: A new algorithm for detecting similarities in source code. It is based on code factorization, uses adapted pattern matching algorithms and structures such as suffix arrays, and focuses on obfuscation through inlining and outlining of functions.
Abstract: The high availability of a huge number of documents on the Web makes plagiarism very attractive and easy. This plagiarism concerns any kind of document, natural language texts as well as more structured information such as programs. In order to cope with this problem, many tools and algorithms have been proposed to find similarities. In this paper we present a new algorithm designed to detect similarities in source codes. Contrary to existing methods, this algorithm relies on the notion of function and focuses on obfuscation with inlining and outlining of functions. This method is also efficient against insertions, deletions and permutations of instruction blocks. It is based on code factorization and uses adapted pattern matching algorithms and structures such as suffix arrays.
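The abstract names suffix arrays as one of the underlying structures. As a rough illustration only, and not the authors' factorization algorithm, the sketch below builds a naive suffix array over two token sequences and reports their longest shared contiguous run. The names `suffix_array`, `longest_shared_run`, and `SENTINEL`, and the choice of pre-tokenized input, are assumptions made for this example.

```python
SENTINEL = "\x00"  # assumption: never occurs as a real token


def suffix_array(tokens):
    """Return suffix start positions in lexicographic order (naive sort)."""
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def common_prefix_len(x, y):
    """Length of the longest common prefix of two token lists."""
    n = 0
    for s, t in zip(x, y):
        if s != t:
            break
        n += 1
    return n


def longest_shared_run(a, b):
    """Longest contiguous token run that appears in both a and b."""
    text = a + [SENTINEL] + b
    sa = suffix_array(text)
    best = []
    # The longest common run shows up as the common prefix of two
    # neighbouring suffixes that start on opposite sides of the sentinel.
    for i, j in zip(sa, sa[1:]):
        if (i < len(a)) != (j < len(a)):
            k = common_prefix_len(text[i:], text[j:])
            if k > len(best):
                best = text[i:i + k]
    return best


if __name__ == "__main__":
    a = ["x", "=", "a", "+", "b", ";", "return", "x", ";"]
    b = ["tmp", "=", "a", "+", "b", ";", "print", "(", "tmp", ")", ";"]
    print(longest_shared_run(a, b))
```

The naive sort keeps the sketch short. A practical tool would use a linear or near-linear suffix array construction and would normalize identifiers before comparison, so that renamed variables still match.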
Citations
Software clone detection: A systematic review
TL;DR: An extensive systematic literature review of software clones in general and software clone detection in particular calls for an increased awareness of the potential benefits of software clone management, and identifies the need to develop semantic and model clone detection techniques.
446
Detecting source code plagiarism on introductory programming course assignments using a bytecode approach
Oscar Karnalim
- 12 Oct 2016
TL;DR: Based on the evaluation, the proposed source code plagiarism detection approach is more effective at detecting most plagiarism attack types than a raw source code approach in introductory programming courses.
50
Similarity Detection Techniques for Academic Source Code Plagiarism and Collusion: A Review
Oscar Karnalim,Simon,William J. Chivers +2 more
- 01 Dec 2019
TL;DR: The mechanisms by which each of these code similarity detection techniques works are summarized, compiled from publications listed by Google Scholar and one or more of the ACM digital library, IEEE Xplore digital library, ScienceDirect, Scopus, and the references of already listed publications.
32
Syntax tree fingerprinting: a foundation for source code similarity detection
Michel Chilowicz,Étienne Duris,Gilles Roussel +2 more
- 01 Jan 2009
TL;DR: The aim is to allow further processing of neighboring exact matches in order to identify the larger approximate matches, dealing with the common modification patterns seen in intra-project copy-pastes and in plagiarism cases.
26
Enhancing program dependency graph based clone detection using approximate subgraph matching
C. M. Kamalpriya,Paramvir Singh +1 more
- 21 Feb 2017
TL;DR: This work proposes further enhancement to current state of the art PDG-based detection to identify all possible (exact and approximate) clone relations from the obtained clone pair results using Approximate Subgraph Matching (ASM).
24