Automatically profiling the author of an anonymous text
TL;DR: This paper asks how much can be discerned about the author of a text simply by analyzing the text itself.
Abstract: Imagine that you have been given an important text of unknown authorship, and wish to know as much as possible about the unknown author (demographics, personality, cultural background, among others), just by analyzing the given text. This authorship profiling problem is of growing importance in the current global information environment: applications abound in forensics, security, and commercial settings. For example, authorship profiling can help police identify characteristics of the perpetrator of a crime when there are too few (or too many) specific suspects to consider. Similarly, large corporations may be interested in knowing what types of people like or dislike their products, based on analysis of blogs and online product reviews. The question we therefore ask is: How much can we discern about the author of a text simply by analyzing the text itself? It turns out that, with varying degrees of accuracy, we can say a great deal indeed. Unlike the problem of authorship attribution (determining the author of a text from a given candidate set), discussed recently in these pages by Li, Zheng, and Chen, authorship profiling does not begin with a set of writing samples from known candidate authors. Instead, we exploit the sociolinguistic observation that different groups of people speaking or writing in a particular genre and in a particular language use that language differently. That is, they vary in how often they use certain words or syntactic constructions (in addition to variation in pronunciation or intonation, for example). The particular profile dimensions we consider here are author gender, age [8], native language [7], and personality [10].
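The observation that groups differ in how often they use certain words can be illustrated with a minimal sketch. It assumes a small, hypothetical function-word list (`FUNCTION_WORDS`) and a hypothetical helper (`function_word_profile`); neither is taken from the paper, whose actual feature set is not given in the abstract.

```python
import re
from collections import Counter

# Hypothetical, illustrative word list; NOT the feature set used in the paper.
FUNCTION_WORDS = ("the", "a", "of", "and", "to", "in", "i", "you", "she", "he")


def function_word_profile(text, vocabulary=FUNCTION_WORDS):
    """Return the relative frequency of each vocabulary word in `text`."""
    # Lowercase the text and keep runs of letters and apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    if total == 0:
        # An empty text has no usage pattern; report zeros for every word.
        return {word: 0.0 for word in vocabulary}
    # Counter returns 0 for words that never occur, so every key is defined.
    return {word: counts[word] / total for word in vocabulary}


sample = "The cat and the dog went to the park, and I followed."
profile = function_word_profile(sample)
print(profile)
```

Frequency vectors of this kind, computed over texts whose author attributes are known, could then serve as input to a standard text classifier; the choice of features and learner here is an assumption for illustration only.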
Citations
Author Profiling: Gender Prediction from Tweets and Images: Notebook for PAN at CLEF 2018.
Yaakov HaCohen-Kerner,Yair Yigal,Elyashiv Shayovitz,Daniel Miller,Toby P. Breckon +4 more
- 01 Jan 2018
TL;DR: The participation of the teams in the PAN 2018 shared task on author profiling, identifying authors’ gender, and the pre-processing, feature sets, machine learning methods and accuracy results are described.
An Experimental Study on Authorship Identification for Cyber Forensics
Smita Nirkhi,Rajiv V. Dharaskar,Vilas M. Thakare +2 more
- 01 Jan 2015
TL;DR: This paper compares the performance of various classifiers in terms of accuracy for the authorship identification task on online messages and investigates the appropriate classifier for determining the authorship of anonymous online messages in the context of cyber forensics.
Social Gender Construction in Political Context: A Corpus-Based Study of Lexical Differences across Genders
Wang Ruonan,He Jun +1 more
TL;DR: This paper examined the gender differences in terms of lexical choice manifested by the selected 20 U.S. presidential candidates from the year 2012 to 2020 and presented the changes of each gender group in a male-dominated political context.
Distance-Based Approaches
Jacques Savoy
- 01 Jan 2020
TL;DR: In this paper, the authors focus on the problem of authorship attribution, with leading models proposed and discussed in the humanities community, and a step-by-step numerical example supports some of the needed computation.
Patent
Systems and methods for keyword spotting using alternating search algorithms
Yitshak Yishay
- 23 Jan 2015
TL;DR: In this article, a system and methods for spotting keywords in data packets are provided. In particular, input data is received to be searched for occurrences of a set of patterns, the input data being divided into multiple segments.
References
Machine learning
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Book
An Introduction to Functional Grammar
Michael Halliday
- 01 Jan 1985
TL;DR: Part 1 The clause: constituency; towards a functional grammar; clause as message; clause as exchange; clause as representation. Above, below and beyond the clause: below the clause (groups and phrases); above the clause (the clause complex); additional.
Machine learning in automated text categorization
TL;DR: This survey discusses the main approaches to text categorization that fall within the machine learning paradigm and discusses in detail issues pertaining to three different problems, namely, document representation, classifier construction, and classifier evaluation.
Psychological aspects of natural language use: our words, our selves.
TL;DR: Findings that point to the psychological value of studying particles (parts of speech that include pronouns, articles, prepositions, conjunctives, and auxiliary verbs) are summarized.
The handbook of language variation and change
Jack Chambers,Peter Trudgill,Natalie Schilling-Estes +2 more
- 01 Jan 2003