Journal Article · DOI: 10.1145/3292577
Code Authorship Attribution: Methods and Challenges
TL;DR: This article presents the first comprehensive review of research on code authorship attribution, summarizes the main methods of authorship attribution, and highlights open challenges in the field.
Abstract: Code authorship attribution is the process of identifying the author of a given piece of code. As the amount of malware grows and mutation techniques become more advanced, malware authors are producing large numbers of malware variants. To deal with this problem, methods for examining the authorship of malicious code are needed. Code authorship attribution techniques can be used to identify and categorize the authors of malware. This information can help predict the tools and techniques that the author of a specific piece of malware uses, as well as how the malware spreads and evolves. In this article, we present the first comprehensive review of research on code authorship attribution. The article summarizes various methods of authorship attribution and highlights challenges in the field.
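The abstract describes attribution only at a high level. As a rough illustration, one family of stylometric techniques builds a profile of the most frequent character n-grams in each candidate author's known code and assigns an unknown sample to the author whose profile overlaps it most. The sketch below is a minimal version under those assumptions; the function names, the n-gram length `n = 3`, and the profile size `top_k = 200` are hypothetical choices for illustration, not a method or parameters taken from the article.

```python
from collections import Counter


def ngram_profile(code: str, n: int = 3, top_k: int = 200) -> set[str]:
    """Return the top_k most frequent character n-grams of a code sample.

    n and top_k are hypothetical defaults chosen for illustration only.
    """
    grams = Counter(code[i:i + n] for i in range(len(code) - n + 1))
    return {gram for gram, _ in grams.most_common(top_k)}


def build_author_profiles(
    samples: dict[str, list[str]], n: int = 3, top_k: int = 200
) -> dict[str, set[str]]:
    """Concatenate each author's known samples and build one profile per author."""
    return {
        author: ngram_profile("\n".join(codes), n, top_k)
        for author, codes in samples.items()
    }


def attribute(
    unknown: str, profiles: dict[str, set[str]], n: int = 3, top_k: int = 200
) -> str:
    """Return the author whose profile shares the most n-grams with the unknown sample."""
    target = ngram_profile(unknown, n, top_k)
    return max(profiles, key=lambda author: len(profiles[author] & target))


if __name__ == "__main__":
    # Two toy "authors" with different layout habits (spacing, braces, indentation).
    training = {
        "alice": ["for (int i = 0; i < n; i++) {\n    total += xs[i];\n}\n"],
        "bob": ["for(int idx=0;idx<n;++idx){sum+=arr[idx];}\n"],
    }
    profiles = build_author_profiles(training)
    unknown = "for (int i = 0; i < m; i++) {\n    count += ys[i];\n}\n"
    print(attribute(unknown, profiles))  # prints "alice"
```

Character n-grams are used here because they capture layout habits such as spacing around operators and brace placement without needing a language-specific parser. Real malware attribution usually has to work on binaries or obfuscated code, where such surface features may be unreliable.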
Citations
Location- and Person-Independent Activity Recognition with WiFi, Deep Neural Networks, and Reinforcement Learning
Yongsen Ma, Sheheryar Arshad, Swetha Muniraju, Eric Torkildson, Enrico Rantala, Klaus Doppler, Gang Zhou
- 21 Jan 2021
TL;DR: In this article, the authors propose a deep learning design for location- and person-independent activity recognition with WiFi. It consists of three deep neural networks (DNNs): a 2D convolutional neural network (CNN) as the recognition algorithm, a 1D CNN as the state machine, and a reinforcement learning agent for neural architecture search.
SoK: Exploring the State of the Art and the Future Potential of Artificial Intelligence in Digital Forensic Investigation
Xiaoyu Du, Christopher Hargreaves, John Sheppard, Felix Anda, Asanka Sayakkara, Nhien-An Le-Khac, Mark Scanlon
TL;DR: This paper summarises existing artificial-intelligence-based tools and approaches in digital forensics, which show great promise for expediting the digital forensic analysis process while increasing case-processing capacity.
RoPGen: Towards Robust Code Authorship Attribution via Automatic Coding Style Transformation
Zhen Li, Guenevere Chen, Yayi Zou, Shouhuai Xu
- 12 Feb 2022
TL;DR: Experimental results show that RoPGen can significantly improve the robustness of DL-based code authorship attribution, reducing the success rate of targeted and untargeted attacks by 22.8% and 41.0% on average, respectively.
funcGNN: A Graph Neural Network Approach to Program Similarity
TL;DR: The proposed graph embedding of programs could be applied to several related software engineering problems (such as code plagiarism detection and clone identification), opening multiple research directions.
Authorship attribution of source code: a language-agnostic approach and applicability in software engineering
Egor Bogomolov, Vladimir Kovalenko, Yurii Rebryk, Alberto Bacchelli, Timofey Bryksin
- 20 Aug 2021
TL;DR: In this paper, the authors introduce a new language-agnostic approach to authorship attribution of source code, discuss limitations of existing synthetic datasets for authorship attribution, and propose a data collection approach that yields datasets better reflecting aspects important for practical use in software engineering.