Open Access
Source Code Authorship Analysis For Supporting the Cybercrime Investigation Process.
Georgia Frantzeskou,Stephen G. MacDonell,Efstathios Stamatatos +2 more
- 01 Jan 2010
pp 470-495
TL;DR: The paper presents the tools and techniques used for authorship identification, a review of the research efforts in the area, and a new taxonomy of source code authorship analysis.
Abstract: Cybercrime has increased in severity and frequency in recent years, and it has therefore become a major concern for companies, universities and organizations. The anonymity offered by the Internet has made the task of tracing criminal identity difficult. One field of study that has contributed to tracing criminals is authorship analysis of e-mails, messages and programs. This paper contains a study on source code authorship analysis. The aim of the research efforts in this area is to identify the author of a particular piece of code by examining its programming style characteristics. Borrowing extensively from the existing fields of linguistics and software metrics, this field attempts to investigate various aspects of computer program authorship. Source code authorship analysis could be applied in cases of cyber attacks, plagiarism and computer fraud. In this paper we present the set of tools and techniques used to achieve the goal of authorship identification, a review of the research efforts in the area and a new taxonomy of source code authorship analysis.
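As a rough illustration of what "examining programming style characteristics" can mean in practice, the sketch below compares character n-gram profiles of programs. It is loosely modeled on n-gram profile methods such as the SCAP method named in one of the citing papers listed on this page, but it is not the authors' implementation: the function names `ngram_profile` and `attribute`, the default values of `n` and `top`, and the sample programs are all assumptions made for this example.

```python
from collections import Counter


def ngram_profile(source: str, n: int = 3, top: int = 50) -> set[str]:
    """Return the `top` most frequent character n-grams of a program (hypothetical helper)."""
    counts = Counter(source[i:i + n] for i in range(len(source) - n + 1))
    return {gram for gram, _ in counts.most_common(top)}


def attribute(sample: str, known: dict[str, str], n: int = 3, top: int = 50) -> str:
    """Pick the candidate author whose profile shares the most n-grams with the sample."""
    sample_profile = ngram_profile(sample, n, top)
    return max(
        known,
        key=lambda author: len(sample_profile & ngram_profile(known[author], n, top)),
    )


# Illustrative data only: two candidate authors with different layout habits.
known = {
    "alice": "for (int i=0;i<n;i++){sum+=a[i];}",
    "bob": "for ( int i = 0 ; i < n ; i++ )\n{\n    sum += a[ i ];\n}",
}
sample = "for (int j=0;j<m;j++){total+=b[j];}"
print(attribute(sample, known))
```

The sample uses different identifiers from both candidates but the same compact layout as the first one, so the shared n-grams come mostly from spacing and punctuation habits rather than from naming. A real study would need far more code per author and a tuned choice of `n` and `top`.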
Citations
Behavioural biometrics: a survey and classification
TL;DR: A survey and classification of the state-of-the-art in behavioural biometrics which is based on skills, style, preference, knowledge, motor-skills or strategy used by people while accomplishing different everyday tasks.
Authorship Attribution for Social Media Forensics
Anderson Rocha,Walter J. Scheirer,Christopher W. Forstall,Thiago Cavalcante,Antonio Theophilo,Bingyu Shen,Ariadne R. B. Carvalho,Efstathios Stamatatos +7 more
TL;DR: It is argued that there is a significant need in forensics for new authorship attribution algorithms that can exploit context, can process multi-modal data, and are tolerant to incomplete knowledge of the space of all possible authors at training time.
Comparing techniques for authorship attribution of source code
TL;DR: Previous techniques for source code authorship attribution are summarized, feature sets that are motivated by the literature are implemented, and information retrieval ranking methods or machine classifiers are applied for each approach.
Examining the significance of high-level programming features in source code author classification
TL;DR: A means of identifying the high-level features that contribute to source code authorship identification is presented, using the SCAP method as a tool. The results show that, for these programs, comments, layout features and package-related naming influence classification accuracy, whereas user-defined naming does not appear to influence accuracy.
References
Estimating software project effort using analogies
Martin Shepperd,Chris Schofield +1 more
TL;DR: It is argued that estimation by analogy is a viable technique that, at the very least, can be used by project managers to complement current estimation techniques.
Inference and Disputed Authorship: The Federalist.
Abstract: The 1964 publication of "Inference and Disputed Authorship" made the cover of "Time" magazine and drew the attention of academics and the public alike for its use of statistical methodology to solve one of American history's most notorious questions: the disputed authorship of the "Federalist Papers". Back in print for a new generation of readers, this classic volume applies mathematics, including the once-controversial Bayesian analysis, to the heart of a literary and historical problem by studying frequently used words in the texts. The reissue of this landmark book will be welcomed by anyone interested in the juncture of history, political science, and authorship.
How variable may a constant be? Measures of lexical richness in perspective
TL;DR: The results suggest that the empirical trajectories tap into a considerable amount of authorial structure without, however, guaranteeing that spatial separation implies a difference in authorship.