Source Code Author Attribution Using Author’s Programming Style and Code Smells
TL;DR: A machine-learning-based methodology is described that both addresses the question of whether code smells are useful for characterizing authors’ signatures and supports the design of a system that can improve authorship attribution.
Abstract: Source code is intellectual property, and using it without the author’s permission is a violation of property rights. Source code authorship attribution is vital for dealing with software theft, copyright issues and piracy. Characterizing an author’s signature to identify their footprints is the core task of authorship attribution. Different aspects of source code have been considered for characterizing signatures, including the author’s coding style and programming structure. The objective of this research is to explore another trait of authors’ coding behavior for characterizing their footprints. The main question that we want to address is: “Are code smells useful for characterizing authors’ signatures?” A machine-learning-based methodology is described not only to address this question but also to design a system. Two aspects of source code are considered for its representation as features: the author’s style and code smells. The feature representation based on the author’s style is used as the baseline. Results show that code smells can improve authorship attribution.
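The abstract does not list the concrete features or the classifier, so the following is only a minimal sketch of the general idea: represent each source file as a vector that concatenates style features with code-smell features, then attribute a query file to the author of the closest training file. Every name here (`style_features`, `smell_features`, `nearest_author`), the specific features, the thresholds, the assumption of C-like or Java-like source with `//` comments, and the 1-nearest-neighbor rule are illustrative assumptions, not the paper's method.

```python
import math
import re

# Hypothetical pattern: a name followed by "(...)" and "{", skipping control keywords.
_FUNC_HEAD = re.compile(r"\b(?!(?:if|for|while|switch|catch)\b)\w+\s*\(([^()]*)\)\s*\{")
# Hypothetical "magic number" proxy: standalone integer literals with 2+ digits.
_MAGIC = re.compile(r"(?<![\w.])\d{2,}(?![\w.])")


def style_features(source):
    """Layout-oriented style features (illustrative choices only)."""
    lines = source.splitlines() or [""]
    n = len(lines)
    avg_len = sum(len(line) for line in lines) / n
    blank_ratio = sum(1 for line in lines if not line.strip()) / n
    comment_ratio = sum(1 for line in lines if line.strip().startswith("//")) / n
    tab_ratio = sum(1 for line in lines if line.startswith("\t")) / n
    return [avg_len, blank_ratio, comment_ratio, tab_ratio]


def smell_features(source):
    """Crude code-smell proxies: long parameter lists, deep nesting, magic numbers."""
    long_param_lists = 0
    for params in _FUNC_HEAD.findall(source):
        count = len([p for p in params.split(",") if p.strip()])
        if count >= 4:  # assumed threshold for a "long parameter list"
            long_param_lists += 1

    depth = max_depth = 0
    for ch in source:  # naive brace counting; ignores braces in strings/comments
        if ch == "{":
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == "}":
            depth = max(0, depth - 1)

    magic_numbers = len(_MAGIC.findall(source))
    return [float(long_param_lists), float(max_depth), float(magic_numbers)]


def features(source):
    """Combined representation: style baseline plus code-smell features."""
    return style_features(source) + smell_features(source)


def nearest_author(train, query):
    """1-nearest-neighbor attribution; train is a list of (author, source) pairs."""
    query_vec = features(query)
    best = min(train, key=lambda pair: math.dist(features(pair[1]), query_vec))
    return best[0]
```

To mirror the comparison described in the abstract, one could run `nearest_author` once with `style_features` alone as the baseline and once with the combined `features`, and compare accuracy on held-out files. In practice the features would also need scaling, since raw line lengths and ratios live on different ranges.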
Citations
Code Authorship Attribution: Methods and Challenges
TL;DR: This article presents the first comprehensive review of research on code authorship attribution, summarizes various attribution methods, and highlights challenges in the field.
83
Source Code Authorship Identification Using Deep Neural Networks
TL;DR: The authors propose a technique based on a hybrid neural network and demonstrate its results both for simple cases of determining the authorship of code and for cases complicated by obfuscation and the use of coding standards, showing that the authors' technique successfully solves the essential problems of analogs and can be effective even in cases where there are no obvious signs indicating authorship.
21
Identification Author of Source Code by Machine Learning Methods
Anna Kurtukova,Alexander Romanov +1 more
- 04 Jun 2019
TL;DR: The authors suggest two new identification techniques based on machine learning: one using a support vector machine, a fast correlation filter and informative features, and one based on a hybrid convolutional recurrent neural network, which at present gives the best-known result.
Discovering software developer's coding expertise through deep learning
TL;DR: Criteria for novice and expert developers are formulated, and three different deep learning models are used to discover the level of coding expertise of software developers.
8
Feasibility of deception in code attribution
Alina Matyukhina
- 01 Jan 2019
TL;DR: This thesis investigates the feasibility of deception of source code attribution techniques by exploring how data characteristics and feature selection influence both the accuracy and performance of attribution methods.
References
•Proceedings Article
The Optimality of Naive Bayes.
Harry Zhang
- 01 Jan 2004
TL;DR: A sufficient condition for the optimality of naive Bayes is presented and proved for the case in which dependence between attributes does exist, and evidence is provided that dependences among attributes may cancel each other out.
A k-nearest neighbor based algorithm for multi-label classification
Min-Ling Zhang,Zhi-Hua Zhou +1 more
- 25 Jul 2005
TL;DR: Experiments on real-world multi-label bioinformatic data show that ML-kNN is highly comparable to existing multi-label learning algorithms.
505
Automatic detection of bad smells in code: An experimental assessment
TL;DR: The current panorama of the tools for automatic code smell detection is reviewed by analyzing the output of four representative code smell detectors applied to six different versions of GanttProject, an open source system written in Java.
Refereed paper: Authorship analysis: identifying the author of a program
Ivan Krsul,Eugene H. Spafford +1 more
TL;DR: The goal is to show that it is possible to identify the author of a program by examining programming style characteristics, and to find a set of characteristics that remain constant for a significant portion of the programs that this programmer might produce.
196
Investigating the Evolution of Bad Smells in Object-Oriented Code
Alexander Chatzigeorgiou,Anastasios Manakos +1 more
- 29 Sep 2010
TL;DR: This paper explores the presence and evolution of non-trivial bad smells by analyzing past versions of code and attempts to study the subject from the point of view of the problems themselves distinguishing deliberate maintenance activities from the removal of design problems as a side effect of software evolution.
155
Related Papers (4)
Swati Srivastava,Akshit Rai,Mahima Varshney +2 more
- 01 Jan 2021
Khin Khin Zaw,Nobuo Funabiki +1 more
- 12 Jul 2015