Source Code Author Attribution Using Author’s Programming Style and Code Smells
TL;DR: A machine learning based methodology is described both to answer the question of whether code smells are useful for characterizing authors’ signatures and to design a system that can improve authorship attribution.
Abstract: Source code is intellectual property, and using it without the author’s permission is a violation of property rights. Source code authorship attribution is vital for dealing with software theft, copyright issues, and piracy. Characterizing an author’s signature in order to identify their footprints is the core task of authorship attribution. Different aspects of source code have been considered for characterizing signatures, including the author’s coding style and programming structure. The objective of this research is to explore another trait of authors’ coding behavior for characterizing their footprints. The main question that we want to address is: “Are code smells useful for characterizing authors’ signatures?” A machine learning based methodology is described, not only to address this question but also to design an attribution system. Two aspects of source code are considered for its representation as features: the author’s style and code smells. The style-based feature representation is used as the baseline. Results show that code smells can improve authorship attribution.
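The two feature representations named in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the three style features, the two smell indicators (long parameter list, deep nesting), the thresholds `max_params` and `max_depth`, and the 1-nearest-neighbour rule standing in for the unspecified machine learning model. The abstract does not say which features or classifier the authors used.

```python
import re


def style_features(source: str) -> list[float]:
    """Layout-level style features (illustrative choice, not the paper's set)."""
    lines = source.splitlines() or [""]
    n = len(lines)
    avg_len = sum(len(line) for line in lines) / n
    tab_ratio = sum(line.startswith("\t") for line in lines) / n
    comment_ratio = sum(line.strip().startswith(("//", "#")) for line in lines) / n
    return [avg_len, tab_ratio, comment_ratio]


def smell_features(source: str, max_params: int = 4, max_depth: int = 3) -> list[float]:
    """Crude code-smell indicators for brace-delimited code (assumed thresholds)."""
    # Long Parameter List: parenthesised lists followed by "{" with too many items.
    param_lists = re.findall(r"\w+\s*\(([^()]*)\)\s*\{", source)
    long_params = sum(
        1 for p in param_lists if p.strip() and p.count(",") + 1 > max_params
    )
    # Deep nesting: track the maximum brace depth reached in the file.
    depth = deepest = 0
    for ch in source:
        if ch == "{":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == "}":
            depth = max(0, depth - 1)
    too_deep = 1.0 if deepest > max_depth else 0.0
    return [float(long_params), float(deepest), too_deep]


def features(source: str, use_smells: bool = True) -> list[float]:
    """Style-only vector (baseline) or style + smell vector."""
    vec = style_features(source)
    if use_smells:
        vec += smell_features(source)
    return vec


def attribute(sample: str, labelled: list[tuple[str, str]], use_smells: bool = True) -> str:
    """Return the author of the labelled source closest to `sample` (1-NN)."""
    target = features(sample, use_smells)

    def distance(item: tuple[str, str]) -> float:
        vec = features(item[1], use_smells)
        return sum((a - b) ** 2 for a, b in zip(target, vec))

    return min(labelled, key=distance)[0]
```

Calling `attribute(sample, labelled, use_smells=False)` gives the style-only baseline, and `use_smells=True` adds the smell indicators, which mirrors the comparison the abstract describes. A real study would scale the features and evaluate with held-out data rather than a single nearest neighbour.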
Citations
De-Anonymization of the Author of the Source Code Using Machine Learning Algorithms
Anna V. Kurtukova, Aleksandr Romanov, Anastasia Fedotova +2 more
- 01 Oct 2019
TL;DR: Methods of determining the author of the anonymous source code can be used as a test for plagiarism of student programming works, which makes the study valuable for educational organizations and may be useful in resolving intellectual property and copyright disputes in a commercial environment.
Dataset Characteristics for Reliable Code Authorship Attribution
TL;DR: In this paper, the authors explore the nature of the data used in previous studies, assess the factors that influence the attribution task, define a process for deriving a reduced set of features for accurate and predictable attribution, and make recommendations on dataset characteristics.
Authorship Identification of Binary and Disassembled Codes Using NLP Methods
Aleksandr Romanov,Anna V. Kurtukova,Anastasia Mikhailovna Fedotova,Alexander Alexandrovich Shelupanov +3 more
TL;DR: In this article, an ensemble of fastText, a support vector machine (SVM), and the authors' hybrid neural network developed in previous works was evaluated on a dataset of source code written in C and C++ collected from GitHub and Google Code Jam, achieving an average accuracy of 0.96 in simple cases and over 0.85 in complex cases.
Exploring the Role of Emojis in Tweets for Authorship Attribution
Kennedy Ellison
- 01 Jan 2019
TL;DR: This article used emoji rich features to perform authorship attribution of tweets and showed that targeting emojis in the feature set prompts a percent increase of at least 30% in the accuracy.