Source Code Authorship Identification Using Deep Neural Networks

doi:10.3390/SYM12122044

Open Access, Journal Article

Anna V. Kurtukova and 2 co-authors

- 10 Dec 2020

- Symmetry

- Vol. 12, Issue 12, p. 2044

TL;DR: The authors propose a technique based on a hybrid neural network and demonstrate its results both for simple authorship-identification cases and for cases complicated by obfuscation and the use of coding standards. The results show that the technique addresses the main shortcomings of analogous methods and can be effective even when there are no obvious signs of authorship.

Abstract: Many open-source projects are developed by a community and share a common code base. The more source code is open, the more the project is open to contributors. At the same time, someone else's source code may be reused, accidentally or deliberately, as closed functionality in another project (even a commercial one), which can lead to copyright disputes. Adding a plagiarism check to the project lifecycle during software engineering addresses this problem. However, not all code samples needed for comparison are available in the public domain; in such cases, methods for identifying the author of the source code can be useful. Identifying the source code author is therefore an important problem in software engineering, and it is also a research area in symmetry. This article discusses the problem of source code authorship identification and modern methods for solving it. Drawing on the experience of researchers in natural language processing (NLP), the authors propose a technique based on a hybrid neural network and demonstrate its results both for simple authorship-identification cases and for cases complicated by obfuscation and the use of coding standards. The results show that the authors' technique addresses the main shortcomings of analogous methods and can be effective even when there are no obvious signs of authorship. The average accuracy across all programming languages was 95% in the simple case and exceeded 80% in the complicated ones.
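The abstract frames authorship identification as assigning an unlabeled code sample to one of several candidate authors. As a point of reference only, that task can be illustrated with a much simpler classical baseline than the authors' hybrid neural network: a character n-gram profile classifier. The following Python sketch is not the paper's method; the function names (`ngram_profile`, `train_profiles`, `attribute`) and the defaults `n=3` and `top_k=200` are hypothetical choices made for illustration.

```python
from __future__ import annotations

from collections import Counter


def ngram_profile(source: str, n: int = 3, top_k: int = 200) -> set[str]:
    """Return the top_k most frequent character n-grams of a code sample.

    Hypothetical baseline helper; not the hybrid network from the paper.
    """
    grams = Counter(source[i:i + n] for i in range(len(source) - n + 1))
    return {gram for gram, _ in grams.most_common(top_k)}


def train_profiles(samples: dict[str, list[str]], n: int = 3,
                   top_k: int = 200) -> dict[str, set[str]]:
    """Build one n-gram profile per author from all of that author's known samples."""
    return {
        author: ngram_profile("\n".join(codes), n, top_k)
        for author, codes in samples.items()
    }


def attribute(unknown: str, profiles: dict[str, set[str]], n: int = 3,
              top_k: int = 200) -> str:
    """Pick the author whose profile shares the most n-grams with the unknown sample."""
    target = ngram_profile(unknown, n, top_k)
    return max(profiles, key=lambda author: len(profiles[author] & target))


# Toy usage with two made-up authors whose coding styles differ sharply.
known = {
    "alice": ["def add(a, b):\n    return a + b\n",
              "def mul(a, b):\n    return a * b\n"],
    "bob": ["int add(int a,int b){return a+b;}",
            "int mul(int a,int b){return a*b;}"],
}
profiles = train_profiles(known)
print(attribute("def sub(a, b):\n    return a - b\n", profiles))  # alice
```

Because a profile of this kind depends on surface character patterns, it is likely to degrade under the obfuscation and coding-standard scenarios the abstract describes, where reformatting changes exactly those patterns; those harder cases are the ones the paper's hybrid neural network is evaluated on.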

Citations

Journal Article, doi:10.3390/software1010002

Automated Code Assessment for Education: Review, Classification and Perspectives on Techniques and Tools

Sébastien Combéfis

- 08 Feb 2022

- Software

TL;DR: The paper presents a systematic review of recent automated code assessment systems, covering the analyses they can perform and the associated techniques, the kinds of feedback they produce, and how they are integrated into the learning process.

Journal Article, doi:10.3390/INFO11120589

Determining the Age of the Author of the Text Based on Deep Neural Network Models

Aleksandr Romanov and 4 co-authors

- 21 Dec 2020

- Information-an International Interdiscip...

TL;DR: The authors analyze methods for determining the age of a text's author and approaches to estimating a user's age from a photo, and they use deep neural networks to solve the age-regression problem.

Journal Article, doi:10.3390/info15030131

Authorship Attribution Methods, Challenges, and Future Research Directions: A Comprehensive Survey

Xie He and 3 co-authors

- 28 Feb 2024

- Information

TL;DR: This comprehensive survey of authorship attribution methods explores state-of-the-art techniques, emerging methods, and challenges in software forensics, plagiarism detection, and security, providing insights for new researchers and future research directions in the field.

Journal Article, doi:10.21293/1818-0442-2021-25-1-79-85

Application of machine learning methods and feature selection based on a genetic algorithm in solving the problem of determining the authorship of a Russian-language text for cybersecurity

Anna V. Kurtukova and 3 co-authors

- 01 Jan 2022

- Doklady Tomskogo gosudarstvennogo univer...

TL;DR: In this article, author identification is carried out using classical machine learning algorithms and neural network architectures (including fastText, CNN, LSTM and their hybrids, and BERT), and the model's effectiveness is evaluated on a dataset of social media texts.

Journal Article, doi:10.3390/fi14030088

Neural Network-Based Price Tag Data Analysis

Pavel Laptev and 5 co-authors

- 13 Mar 2022

- Future Internet

TL;DR: The study found that the optimal segmentation network is YOLOv4-tiny, with a cross-validation accuracy of 96.92%; the measured EasyOCR accuracy was 95.22%.
