Automatically profiling the author of an anonymous text
TL;DR: How much can the authors discern about the author of a text simply by analyzing the text itself?
Abstract: Imagine that you have been given an important text of unknown authorship, and wish to know as much as possible about the unknown author (demographics, personality, cultural background, among others), just by analyzing the given text. This authorship profiling problem is of growing importance in the current global information environment: applications abound in forensics, security, and commercial settings. For example, authorship profiling can help police identify characteristics of the perpetrator of a crime when there are too few (or too many) specific suspects to consider. Similarly, large corporations may be interested in knowing what types of people like or dislike their products, based on analysis of blogs and online product reviews. The question we therefore ask is: How much can we discern about the author of a text simply by analyzing the text itself? It turns out that, with varying degrees of accuracy, we can say a great deal indeed. Unlike the problem of authorship attribution (determining the author of a text from a given candidate set), discussed recently in these pages by Li, Zheng, and Chen, authorship profiling does not begin with a set of writing samples from known candidate authors. Instead, we exploit the sociolinguistic observation that different groups of people speaking or writing in a particular genre and in a particular language use that language differently. That is, they vary in how often they use certain words or syntactic constructions (in addition to variation in pronunciation or intonation, for example). The particular profile dimensions we consider here are author gender, age [8], native language [7], and personality [10].
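As a rough illustration of the observation that groups differ in how often they use certain words, the sketch below turns a text into relative frequencies of a few function words. The word list and the name `function_word_profile` are assumptions made for illustration; they are not the feature set or method used in the paper.

```python
import re
from collections import Counter

# Hypothetical, illustrative word list; NOT the feature set used in the paper.
FUNCTION_WORDS = ("the", "a", "of", "and", "to", "i", "you", "for", "with", "my")


def function_word_profile(text, vocabulary=FUNCTION_WORDS):
    """Return the relative frequency of each vocabulary word in `text`."""
    # Lowercase the text and split it into simple alphabetic tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    if total == 0:
        # An empty text has no usage pattern; report zeros.
        return {word: 0.0 for word in vocabulary}
    counts = Counter(tokens)
    # Divide by the token count so texts of different lengths are comparable.
    return {word: counts[word] / total for word in vocabulary}


if __name__ == "__main__":
    profile = function_word_profile("The cat and the dog")
    for word, freq in profile.items():
        print(f"{word}: {freq:.2f}")
```

A frequency vector of this kind could, under these assumptions, serve as input to a standard text classifier trained on texts whose authors' gender, age group, or native language is known.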
Citations
Novel Text Analysis for Investigating Personality: Identifying the Dark Lady in Shakespeare’s Sonnets
TL;DR: This work uses an exploratory combinatorial data analysis technique called seriation in combination with RPAS, a multi-faceted text analysis approach that draws on a writer’s personality, or self, to visualize the 154 sonnets and finds that RPAS has the potential to discriminate subtle shifts in personality from texts as small as 90 words.
12
•Posted Content
How well can machine learning predict demographics of social media users
TL;DR: Knowing the demographics in a data sample can aid in addressing issues of bias and population representation, so that existing societal inequalities are not exacerbated.
12
Gender Prediction from Social Media Comments with Artificial Intelligence
Özer Çelik,Ahmet Faruk Aslan +1 more
TL;DR: Estimates the gender of commenters using machine learning techniques applied to the comments of companies posting on Facebook; the methods tested predicted with similar accuracy rates, with logistic regression achieving the highest accuracy rate.
ITALICA at PAN 2013: An Ensemble Learning Approach to Author Profiling. Notebook for PAN at CLEF 2013.
Fermín L. Cruz,R Rafa Haro,F. Javier Ortega +2 more
- 01 Jan 2013
TL;DR: This notebook discusses the approach to the Author Profiling task developed by the Italica group for PAN 2013, which implements two different sets of classifiers which are combined later in order to build a final classifier that takes into account the decisions of the previous ones.
References
Machine learning
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
•Book
An Introduction to Functional Grammar
Michael Halliday
- 01 Jan 1985
TL;DR: Part 1, The clause: constituency; towards a functional grammar; clause as message; clause as exchange; clause as representation. Above, below and beyond the clause: below the clause (groups and phrases); above the clause (the clause complex); additional.
14K
Machine learning in automated text categorization
TL;DR: This survey discusses the main approaches to text categorization that fall within the machine learning paradigm and discusses in detail issues pertaining to three different problems, namely, document representation, classifier construction, and classifier evaluation.
Psychological aspects of natural language use: our words, our selves.
TL;DR: Findings that point to the psychological value of studying particles (parts of speech that include pronouns, articles, prepositions, conjunctives, and auxiliary verbs) are summarized.
2.5K
The handbook of language variation and change
Jack Chambers,Peter Trudgill,Natalie Schilling-Estes +2 more
- 01 Jan 2003
1.1K