Source Code Authorship Identification Using Deep Neural Networks
TL;DR: The authors propose a technique based on a hybrid neural network and demonstrate its results both for simple cases of source code authorship identification and for cases complicated by obfuscation and the use of coding standards. The results show that the authors' technique successfully solves the essential problems of comparable methods and can be effective even when there are no obvious signs indicating authorship.
Abstract: Many open-source projects are developed by the community and have a common basis. The more source code is open, the more the project is open to contributors. The accidental or deliberate use of someone else's source code as closed functionality in another project (even a commercial one) cannot be ruled out. This situation could create copyright disputes. Adding a plagiarism check to the project lifecycle during software engineering solves this problem. However, not all code samples needed for comparison can be found in the public domain. In this case, methods of identifying the source code author can be useful. Therefore, identifying the source code author is an important problem in software engineering, and it is also a research area in symmetry. This article discusses the problem of identifying the source code author and modern methods of solving it. Based on the experience of researchers in the field of natural language processing (NLP), the authors propose a technique based on a hybrid neural network and demonstrate its results both for simple cases of determining the authorship of code and for cases complicated by obfuscation and the use of coding standards. The results show that the authors' technique successfully solves the essential problems of comparable methods and can be effective even in cases where there are no obvious signs indicating authorship. The average accuracy obtained for all programming languages was 95% in the simple case and exceeded 80% in the complicated ones.
Citations
Automated Code Assessment for Education: Review, Classification and Perspectives on Techniques and Tools
TL;DR: A systematic review of recent automated code assessment systems and the possible analyses they can perform with the associated techniques, the kinds of produced feedback and the ways they are integrated in the learning process is proposed.
Determining the Age of the Author of the Text Based on Deep Neural Network Models
Aleksandr Romanov, Anna V. Kurtukova, Artem Alexandrovich Sobolev, Alexander Alexandrovich Shelupanov, Anastasia Mikhailovna Fedotova
TL;DR: The authors present an analysis of methods for determining the age of the author of a text and of approaches to determining the age of a user from a photo, and use deep neural networks to solve the age regression problem.
Authorship Attribution Methods, Challenges, and Future Research Directions: A Comprehensive Survey
Xie He, Arash Habibi Lashkari, Nikhill Vombatkere, Dilli Prasad Sharma
TL;DR: This comprehensive survey of authorship attribution methods explores state-of-the-art techniques, emerging methods, and challenges in software forensics, plagiarism detection, and security, providing insights for new researchers and future research directions in the field.
Application of machine learning methods and feature selection based on a genetic algorithm in solving the problem of determining the authorship of a Russian-language text for cybersecurity
Anna V. Kurtukova, Alexander Romanov, Anastasia Mikhailovna Fedotova, Alexander Alexandrovich Shelupanov
TL;DR: In this article, author identification was carried out using classical machine learning algorithms and neural network architectures (including fastText, CNN, LSTM and their hybrids, and BERT), and the efficiency of the model was evaluated on a dataset of social media texts.
Neural Network-Based Price Tag Data Analysis
Pavel Laptev, Sergey Litovkin, Sergey Davydenko, Anton Konev, Evgeny Kostyuchenko, Alexander Alexandrovich Shelupanov
TL;DR: Research revealed that the optimal network for segmentation is YOLOv4-tiny, with a cross-validation accuracy of 96.92%; the calculated EasyOCR accuracy is 95.22%.