Automatically profiling the author of an anonymous text
TL;DR: The authors ask how much can be discerned about the author of a text simply by analyzing the text itself.
Abstract: Imagine that you have been given an important text of unknown authorship, and wish to know as much as possible about the unknown author (demographics, personality, cultural background, among others), just by analyzing the given text. This authorship profiling problem is of growing importance in the current global information environment: applications abound in forensics, security, and commercial settings. For example, authorship profiling can help police identify characteristics of the perpetrator of a crime when there are too few (or too many) specific suspects to consider. Similarly, large corporations may be interested in knowing what types of people like or dislike their products, based on analysis of blogs and online product reviews. The question we therefore ask is: How much can we discern about the author of a text simply by analyzing the text itself? It turns out that, with varying degrees of accuracy, we can say a great deal indeed. Unlike the problem of authorship attribution (determining the author of a text from a given candidate set), discussed recently in these pages by Li, Zheng, and Chen, authorship profiling does not begin with a set of writing samples from known candidate authors. Instead, we exploit the sociolinguistic observation that different groups of people speaking or writing in a particular genre and in a particular language use that language differently. That is, they vary in how often they use certain words or syntactic constructions (in addition to variation in pronunciation or intonation, for example). The particular profile dimensions we consider here are author gender, age [8], native language [7], and personality [10].
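The abstract's central observation, that groups of writers differ in how often they use certain words, can be illustrated with a small sketch. The code below is not the paper's method; it is a minimal illustration, assuming a short hand-picked word list (`FUNCTION_WORDS`) and a hypothetical helper (`function_word_profile`), of how a text might be turned into relative word frequencies that a classifier could later consume.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny word list chosen for illustration only;
# it is not the feature set used in the paper.
FUNCTION_WORDS = ("the", "a", "of", "and", "i", "you", "he", "she", "it", "to", "in", "for")


def function_word_profile(text, vocabulary=FUNCTION_WORDS):
    """Return the relative frequency of each vocabulary word in `text`."""
    # Lowercase the text and keep runs of letters and apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    if total == 0:
        # An empty text has no usable signal; return an all-zero profile.
        return {word: 0.0 for word in vocabulary}
    counts = Counter(tokens)
    # Normalize by text length so long and short texts are comparable.
    return {word: counts[word] / total for word in vocabulary}


sample = "I told you that the report was late, and you said it was fine."
profile = function_word_profile(sample)
```

Profiles of this kind, computed over many texts with known author attributes, are the sort of input a standard text classifier could be trained on; which features and learner work best is an empirical question that this sketch does not address.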
Citations
•Posted Content
Pitfalls in Machine Learning Research: Reexamining the Development Cycle
TL;DR: This work follows the machine learning process from algorithm design to data collection to model evaluation, drawing attention to common pitfalls and providing practical recommendations for improvements.
Does double-blind peer review reduce bias? Evidence from a top computer science conference
TL;DR: In this paper, the authors examined the effect of double-blind peer review on prestige bias by analyzing the peer review files of 5027 papers submitted to the International Conference on Learning Representations (ICLR).
26
Patent
System and method for efficient classification and processing of network traffic
Eithan Goldfarb,Yuval Altman,Naomi Frid,Gur Yaari +3 more
- 25 Jan 2012
TL;DR: In this article, a front-end processor associates input packets with flows and forwards each flow to the appropriate unit, typically by querying a flow table that holds a respective classification for each active flow.
26
Age and Gender Classification of Tweets Using Convolutional Neural Networks
Roy Khristopher Bayot,Teresa Gonçalves +1 more
- 14 Sep 2017
TL;DR: This work explores the use of convolutional neural networks together with word2vec word embeddings for determining age and gender from a series of texts in comparison to handcrafted features.
25
Evaluating Topic-Based Representations for Author Profiling in Social Media
Miguel A. Álvarez-Carmona,A. Pastor López-Monroy,Manuel Montes-y-Gómez,Luis Villaseñor-Pineda,Ivan Meza +4 more
- 23 Nov 2016
TL;DR: A representation based on Latent Semantic Analysis (LSA), which automatically discovers the topics from a given document collection, and a simplified version of the Linguistic Inquiry and Word Count (LIWC), which consists of 41 features representing manually predefined thematic categories are considered.
25
References
Machine learning
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
•Book
An Introduction to Functional Grammar
Michael Halliday
- 01 Jan 1985
TL;DR: Part 1, The clause: constituency; towards a functional grammar; clause as message; clause as exchange; clause as representation. Above, below and beyond the clause: below the clause (groups and phrases); above the clause (the clause complex); additional.
14K
Machine learning in automated text categorization
TL;DR: This survey discusses the main approaches to text categorization that fall within the machine learning paradigm and discusses in detail issues pertaining to three different problems, namely, document representation, classifier construction, and classifier evaluation.
Psychological aspects of natural language use: our words, our selves.
TL;DR: Findings that point to the psychological value of studying particles (parts of speech that include pronouns, articles, prepositions, conjunctives, and auxiliary verbs) are summarized.
2.5K
The handbook of language variation and change
Jack Chambers,Peter Trudgill,Natalie Schilling-Estes +2 more
- 01 Jan 2003
1.1K