Automatically profiling the author of an anonymous text
TL;DR: How much can the authors discern about the author of a text simply by analyzing the text itself?
Abstract: Imagine that you have been given an important text of unknown authorship, and wish to know as much as possible about the unknown author (demographics, personality, cultural background, among others), just by analyzing the given text. This authorship profiling problem is of growing importance in the current global information environment; applications abound in forensics, security, and commercial settings. For example, authorship profiling can help police identify characteristics of the perpetrator of a crime when there are too few (or too many) specific suspects to consider. Similarly, large corporations may be interested in knowing what types of people like or dislike their products, based on analysis of blogs and online product reviews. The question we therefore ask is: How much can we discern about the author of a text simply by analyzing the text itself? It turns out that, with varying degrees of accuracy, we can say a great deal indeed. Unlike the problem of authorship attribution (determining the author of a text from a given candidate set), discussed recently in these pages by Li, Zheng, and Chen, authorship profiling does not begin with a set of writing samples from known candidate authors. Instead, we exploit the sociolinguistic observation that different groups of people speaking or writing in a particular genre and in a particular language use that language differently. That is, they vary in how often they use certain words or syntactic constructions (in addition to variation in pronunciation or intonation, for example). The particular profile dimensions we consider here are author gender, age [8], native language [7], and personality [10].
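The observation that groups differ in how often they use certain words can be made concrete with a small sketch. The Python below computes the relative frequency of a few function words in a text, the kind of numeric profile a classifier could then be trained on. The word list, the tokenizer, and the name `function_word_profile` are illustrative assumptions and are not taken from the paper, which does not specify its feature set in this abstract.

```python
import re
from collections import Counter

# Hypothetical, tiny function-word list chosen for illustration only;
# it is not the feature set used in the paper.
FUNCTION_WORDS = ("the", "a", "of", "and", "to", "in", "i", "you", "she", "he")


def function_word_profile(text):
    """Return the relative frequency of each listed function word in `text`."""
    # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    if total == 0:
        # Empty input: every frequency is zero.
        return {word: 0.0 for word in FUNCTION_WORDS}
    return {word: counts[word] / total for word in FUNCTION_WORDS}


sample = "I think the cat and the dog like you"
profile = function_word_profile(sample)
for word, freq in profile.items():
    print(f"{word}: {freq:.3f}")
```

Vectors like `profile`, computed for many texts whose authors' gender, age, or native language are known, could serve as training input for a standard text classifier. That training step is omitted here.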
Citations
Gender identification on Twitter
Catherine Ikae, Jacques Savoy
TL;DR: This study analyzes the effectiveness of 10 different classifiers to determine whether the same model always achieves the best effectiveness on similar corpora under the same conditions.
21
•Posted Content
Authorship Verification - An Approach based on Random Forest
TL;DR: This work has used several word-based and style-based features to identify the differences between the known and unknown problems of one given set and label the unknown ones accordingly using a Random Forest based classifier.
Automatic Author Profiling Based on Linguistic and Stylistic Features Notebook for PAN at CLEF 2013
Braja Gopal Patra, Somnath Banerjee, Dipankar Das, Tanik Saikh, Sivaji Bandyopadhyay
- 01 Jan 2013
TL;DR: This work employed a decision tree classifier to classify the author profile and achieved accuracies of 56.83% and 28.95% for gender and age group classification, respectively.
Software-Based Approach towards Automated Authorship Acknowledgement—Chi-Square Test on One Consonant Group
TL;DR: The conducted experiments on the Java programming language have proved that the chi-square test is a powerful nonparametric statistical test that can be used for author identification on the level of English consonants with a test validity of 95%.
Deep Learning Network Models to Categorize Texts According to Author's Gender and to Identify Text Sentiment
Aleksandr Sboev, Tatiana Litvinova, Irina Voronina, Dmitry Gudovskikh, Roman Rybka
- 01 Dec 2016
TL;DR: A preexisting corpus of Russian-language texts, RusPersonality, labeled with information on the authors (gender, age, psychological testing and so on), is used for the gender task, along with the materials of the SentiRuEval competition for evaluating the sentiment of tweets.
19
References
Machine learning
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
•Book
An Introduction to Functional Grammar
Michael Halliday
- 01 Jan 1985
TL;DR: Part 1, The clause: constituency; towards a functional grammar; clause as message; clause as exchange; clause as representation. Above, below and beyond the clause: below the clause (groups and phrases); above the clause (the clause complex); additional.
14K
Machine learning in automated text categorization
TL;DR: This survey discusses the main approaches to text categorization that fall within the machine learning paradigm and discusses in detail issues pertaining to three different problems, namely, document representation, classifier construction, and classifier evaluation.
Psychological aspects of natural language use: our words, our selves.
TL;DR: Findings that point to the psychological value of studying particles (parts of speech that include pronouns, articles, prepositions, conjunctives, and auxiliary verbs) are summarized.
2.5K
The handbook of language variation and change
Jack Chambers, Peter Trudgill, Natalie Schilling-Estes
- 01 Jan 2003
1.1K