Source Code Author Attribution Using Author’s Programming Style and Code Smells

doi:10.5815/IJISA.2017.05.04

Open Access•Journal Article•10.5815/IJISA.2017.05.04

Muqaddas Gull, +2 more

- 08 May 2017

- International Journal of Intelligent Sys...

- Vol. 9, Iss: 5, pp 27-33

12

TL;DR: A machine learning based methodology is described, both to address the question of whether code smells are useful for characterizing authors’ signatures and to design a system that can improve authorship attribution.

Abstract: Source code is intellectual property, and using it without the author’s permission is a violation of property rights. Source code authorship attribution is vital for dealing with software theft, copyright issues, and piracy. Characterizing an author’s signature in order to identify their footprint is the core task of authorship attribution. Different aspects of source code have been considered for characterizing signatures, including the author’s coding style and programming structure. The objective of this research is to explore another trait of authors’ coding behavior for characterizing their footprints. The main question we address is: “Are code smells useful for characterizing authors’ signatures?” A machine learning based methodology is described, both to address this question and to design an attribution system. Two aspects of source code are considered for its representation as features: the author’s style and code smells. The feature representation based on the author’s style is used as the baseline. The results show that code smells can improve authorship attribution.
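The two feature views named in the abstract (a style baseline, and style plus code smells) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the specific style metrics, the two smell detectors (long parameter lists and magic numbers), the regular expressions, the C-like sample snippets, and the nearest-neighbour rule standing in for whatever classifier the authors actually trained.

```python
import math
import re

def style_features(code):
    """Layout-based style features (hypothetical choices, not the paper's)."""
    lines = code.splitlines() or [""]
    n = len(lines)
    return [
        sum(len(line) for line in lines) / n,               # mean line length
        sum(1 for line in lines if not line.strip()) / n,   # blank-line ratio
        sum(1 for line in lines if line.startswith("\t")) / n,  # tab-indent ratio
    ]

def smell_features(code, max_params=3):
    """Crude regex-based smell counts (assumed detectors, for illustration)."""
    # Parameter lists of anything shaped like `type name(params) {`.
    signatures = re.findall(r"\b\w+\s+\w+\s*\(([^()]*)\)\s*\{", code)
    long_param_lists = sum(1 for p in signatures if p.count(",") + 1 > max_params)
    # Numeric literals other than 0 and 1 are treated as magic numbers.
    literals = re.findall(r"(?<![\w.])\d+(?:\.\d+)?(?![\w.])", code)
    magic_numbers = sum(1 for lit in literals if lit not in ("0", "1"))
    return [float(long_param_lists), float(magic_numbers)]

def features(code, use_smells=True):
    """Baseline style vector, optionally extended with smell counts."""
    vec = style_features(code)
    return vec + smell_features(code) if use_smells else vec

def attribute(query, labelled, use_smells=True):
    """Return the author of the labelled sample closest to the query."""
    q = features(query, use_smells)
    nearest = min(
        labelled,
        key=lambda pair: math.dist(q, features(pair[0], use_smells)),
    )
    return nearest[1]

SAMPLE_A = "int area(int w, int h) {\n    return w * h;\n}\n"
SAMPLE_B = "int mix(int a, int b, int c, int d) {\n\treturn a * 37 + b * 11 + c + d;\n}\n"

if __name__ == "__main__":
    corpus = [(SAMPLE_A, "alice"), (SAMPLE_B, "bob")]
    print(smell_features(SAMPLE_B))
    print(attribute(SAMPLE_B, corpus, use_smells=True))
```

Calling `attribute` with `use_smells=False` gives the style-only baseline, so the same labelled corpus can be scored under both representations and the accuracies compared. A realistic setup would also scale the features, since raw line lengths otherwise dominate the distance.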

Citations

Journal Article•10.7717/peerj-cs.2142

Code stylometry vs formatting and minification

Stefano Balla, +2 more

- 06 Sep 2024

- PeerJ

TL;DR: This study investigates how code formatting and minification impact the accuracy of code stylometry, a technique for identifying code authors based on their programming styles, and finds that formatting reduces accuracy by 15% and minification by 3%.

Journal Article•10.21293/1818-0442-2023-26-4-53-60

Development of a methodology for identifying the authorship of binary and disassembled program codes based on an ensemble of modern natural language processing methods

Anna V. Kurtukova, +2 more

- Doklady Tomskogo gosudarstvennogo univer...

TL;DR: A methodology for identifying the authorship of binary and disassembled program codes based on an ensemble of modern natural language processing methods is proposed. The average accuracy of identifying the author of disassembled code using the proposed method was more than 0.9.

References

Journal Article•10.1145/1961189.1961199

LIBSVM: A library for support vector machines

Chih-Chung Chang, +1 more

- 06 May 2011

- ACM Transactions on Intelligent Systems ...

TL;DR: Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.

46.3K

Journal Article•10.1145/1656274.1656278

The WEKA data mining software: an update

Mark Hall, +5 more

- 16 Nov 2009

- Sigkdd Explorations

TL;DR: This paper provides an introduction to the WEKA workbench, reviews the history of the project, and, in light of the recent 3.6 stable release, briefly discusses what has been added since the last stable version (Weka 3.4) released in 2003.

21.2K

Libsvm : A library for support vector machines

V. Ferrari

- 01 Jan 2008

8K

•Book

Refactoring: Improving the Design of Existing Code

Martin Fowler

- 01 Jan 1999

TL;DR: Almost every expert in Object-Oriented Development stresses the importance of iterative development, but how do you add function to the existing code base while still preserving its design integrity?

5.7K

Proceedings Article•10.1109/CONCAPAN.2017.8278488

Refactoring improving the design of existing code

Mauricio A. Saca

- 01 Nov 2017

TL;DR: This document details how, why, and when to apply refactoring in poorly designed computer systems, in order to improve the performance and maintainability of their constituent components.

4.1K