Code Authorship Attribution: Methods and Challenges

doi:10.1145/3292577

Journal Article•10.1145/3292577

Vaibhavi Kalgutkar, +4 more

- 13 Feb 2019

- ACM Computing Surveys

- Vol. 52, Iss: 1, pp 3

83

TL;DR: This article presents the first comprehensive review of research on code authorship attribution, summarizes the various attribution methods, and highlights challenges in the field.

Citations

Journal Article•10.1145/3424739

Location- and Person-Independent Activity Recognition with WiFi, Deep Neural Networks, and Reinforcement Learning

Yongsen Ma, +6 more

- 21 Jan 2021

TL;DR: This article proposes a deep learning design for location- and person-independent activity recognition with WiFi. The design consists of three deep neural networks (DNNs): a 2D Convolutional Neural Network (CNN) as the recognition algorithm, a 1D CNN as the state machine, and a reinforcement learning agent for neural architecture search.

63

Proceedings Article•10.1145/3407023.3407068

SoK: Exploring the State of the Art and the Future Potential of Artificial Intelligence in Digital Forensic Investigation

Xiaoyu Du, +6 more

- 02 Dec 2020

- arXiv: Cryptography and Security

TL;DR: This paper summarises existing artificial intelligence based tools and approaches in digital forensics, which show great promise in expediting the digital forensic analysis process while increasing case processing capacities.

55

Proceedings Article•10.1145/3510003.3510181

RoPGen: Towards Robust Code Authorship Attribution via Automatic Coding Style Transformation

Zhen Li, +3 more

- 12 Feb 2022

TL;DR: Experimental results show that RoPGen can significantly improve the robustness of DL-based code authorship attribution, reducing the success rate of targeted and untargeted attacks by 22.8% and 41.0% on average, respectively.

40

Proceedings Article•10.1145/3382494.3410675

funcGNN: A Graph Neural Network Approach to Program Similarity

Aravind Nair, +2 more

- 26 Jul 2020

- arXiv: Learning

TL;DR: The graph embedding of a program proposed by the methodology could be applied to several related software engineering problems (such as code plagiarism and clone identification), thus opening multiple research directions.

27

Proceedings Article•10.1145/3468264.3468606

Authorship attribution of source code: a language-agnostic approach and applicability in software engineering

Egor Bogomolov, +4 more

- 20 Aug 2021

TL;DR: In this paper, the authors introduce a new language-agnostic approach to authorship attribution of source code, discuss limitations of existing synthetic datasets for authorship attribution, and propose a data collection approach that delivers datasets that better reflect aspects important for potential practical use in software engineering.

26

References

Journal Article•10.1109/TSE.1976.233837

A Complexity Measure

Thomas J. McCabe

- 01 Jul 1976

- IEEE Transactions on Software Engineerin...

TL;DR: Several properties of the graph-theoretic complexity are proved which show, for example, that complexity is independent of physical size and complexity depends only on the decision structure of a program.

6K

Book

A complexity measure

Thomas J. McCabe

- 04 Oct 1993

TL;DR: This paper presents a graph-theoretic complexity measure for managing and controlling program complexity. The complexity is independent of physical size and depends only on the decision structure of a program.

5.1K

Proceedings of the 11th USENIX Security Symposium

Dan Boneh

- 05 Aug 2002

TL;DR: It is shown that permitting user selection of passwords in two graphical password schemes can yield passwords with entropy far below the theoretical optimum and, in some cases, that are highly correlated with the race or gender of the user.

1.8K

Journal Issue•10.1002/ASI.V60:3

Comparative study on methods of detecting research fronts using different types of citation

Naoki Shibata, +3 more

- 01 Mar 2009

- Journal of the Association for Informati...

TL;DR: Direct citation, which could detect large and young emerging clusters earlier, shows the best performance in detecting a research front, and co-citation shows the worst, which suggests that the content similarity of papers connected by direct citations is the greatest and that direct citation networks have the least risk of missing emerging research domains.

1.7K

Journal Article•10.1109/TSE.1983.235271

Software Function, Source Lines of Code, and Development Effort Prediction: A Software Science Validation

A.J. Albrecht, +1 more

- 01 Nov 1983

- IEEE Transactions on Software Engineerin...

TL;DR: This paper demonstrates the equivalence between Albrecht's external input/output data flow representative of a program (the "function points" metric) and Halstead's [2] "software science" or "software linguistics" model of a program, as well as the "soft content" variation of Halstead's model suggested by Gaffney [7].

1.6K