Automatically profiling the author of an anonymous text

doi:10.1145/1461928.1461959

Open Access•Journal Article•10.1145/1461928.1461959

Shlomo Argamon, +3 more

- 01 Feb 2009

- Communications of the ACM

- Vol. 52, Iss: 2, pp 119-123

479

TL;DR: How much can be discerned about the author of a text simply by analyzing the text itself?

Abstract: Imagine that you have been given an important text of unknown authorship, and wish to know as much as possible about the unknown author (demographics, personality, cultural background, among others), just by analyzing the given text. This authorship profiling problem is of growing importance in the current global information environment; applications abound in forensics, security, and commercial settings. For example, authorship profiling can help police identify characteristics of the perpetrator of a crime when there are too few (or too many) specific suspects to consider. Similarly, large corporations may be interested in knowing what types of people like or dislike their products, based on analysis of blogs and online product reviews. The question we therefore ask is: How much can we discern about the author of a text simply by analyzing the text itself? It turns out that, with varying degrees of accuracy, we can say a great deal indeed. Unlike the problem of authorship attribution (determining the author of a text from a given candidate set), discussed recently in these pages by Li, Zheng, and Chen, authorship profiling does not begin with a set of writing samples from known candidate authors. Instead, we exploit the sociolinguistic observation that different groups of people speaking or writing in a particular genre and in a particular language use that language differently. That is, they vary in how often they use certain words or syntactic constructions (in addition to variation in pronunciation or intonation, for example). The particular profile dimensions we consider here are author gender, age [8], native language [7], and personality [10].
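
The abstract's observation that groups "vary in how often they use certain words" can be illustrated with a minimal sketch that turns a text into relative word frequencies. This is not the authors' feature set or method: the word list `FUNCTION_WORDS` and the function name `function_word_profile` are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical, tiny word list for illustration only; a real profiling
# system would use a much larger, carefully chosen feature set.
FUNCTION_WORDS = ["the", "a", "of", "and", "to", "i", "you", "she", "he", "it"]


def function_word_profile(text):
    """Return each listed word's share of all word tokens in `text`."""
    # Lowercase the text and split it into simple alphabetic tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return {word: counts[word] / total for word in FUNCTION_WORDS}


profile = function_word_profile("I think the cat and the dog like you.")
```

Frequency vectors of this kind could then be fed to any standard text classifier trained on texts whose author attributes are known; which features and learner work best is an empirical question the sketch does not address.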

Citations

•Posted Content

Pitfalls in Machine Learning Research: Reexamining the Development Cycle

Stella Biderman, +1 more

- 04 Nov 2020

- arXiv: Learning

TL;DR: This work follows the machine learning process from algorithm design to data collection to model evaluation, drawing attention to common pitfalls and providing practical recommendations for improvements.

26

•Journal Article•10.1002/ASI.24582

Does double-blind peer review reduce bias? Evidence from a top computer science conference

Mengyi Sun, +3 more

- 12 Oct 2021

- Journal of the Association for Informati...

TL;DR: In this paper, the authors examined the effect of double-blind peer review on prestige bias by analyzing the peer review files of 5027 papers submitted to the International Conference on Learning Representations (ICLR).

26

Patent

System and method for efficient classification and processing of network traffic

Eithan Goldfarb, +3 more

- 25 Jan 2012

TL;DR: In this article, a front-end processor associates input packets with flows and forwards each flow to the appropriate unit, typically by querying a flow table that holds a respective classification for each active flow.

26

Book Chapter•10.1007/978-3-319-72926-8_28

Age and Gender Classification of Tweets Using Convolutional Neural Networks

Roy Khristopher Bayot, +1 more

- 14 Sep 2017

TL;DR: This work explores the use of convolutional neural networks together with word2vec word embeddings for determining age and gender from a series of texts in comparison to handcrafted features.

25

Book Chapter•10.1007/978-3-319-47955-2_13

Evaluating Topic-Based Representations for Author Profiling in Social Media

Miguel A. Álvarez-Carmona, +4 more

- 23 Nov 2016

TL;DR: A representation based on Latent Semantic Analysis (LSA), which automatically discovers the topics from a given document collection, and a simplified version of the Linguistic Inquiry and Word Count (LIWC), which consists of 41 features representing manually predefined thematic categories are considered.

25

References

Journal Article•10.1145/242224.242229

Machine learning

Thomas G. Dietterich

- 01 Dec 1996

- ACM Computing Surveys

TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.

14K

•Book

An Introduction to Functional Grammar

Michael Halliday

- 01 Jan 1985

TL;DR: Part 1, The clause: constituency; towards a functional grammar; clause as message; clause as exchange; clause as representation. Above, below and beyond the clause: below the clause (groups and phrases); above the clause (the clause complex); additional.

14K

•Journal Article•10.1145/505282.505283

Machine learning in automated text categorization

Fabrizio Sebastiani

- 01 Mar 2002

- ACM Computing Surveys

TL;DR: This survey discusses the main approaches to text categorization that fall within the machine learning paradigm and discusses in detail issues pertaining to three different problems, namely, document representation, classifier construction, and classifier evaluation.

8.5K

Journal Article•10.1146/ANNUREV.PSYCH.54.101601.145041

Psychological aspects of natural language use: our words, our selves.

James W. Pennebaker, +2 more

- 28 Nov 2003

- Annual Review of Psychology

TL;DR: Summarizes findings that point to the psychological value of studying particles (parts of speech that include pronouns, articles, prepositions, conjunctives, and auxiliary verbs).

2.5K

Reference Book•10.1111/B.9781405116923.2003.X