Book Chapter, DOI: 10.1007/978-3-319-47955-2_13
Evaluating Topic-Based Representations for Author Profiling in Social Media
Miguel A. Álvarez-Carmona, A. Pastor López-Monroy, Manuel Montes-y-Gómez, Luis Villaseñor-Pineda, Ivan Meza +4 more
- 23 Nov 2016
- pp 151-162
24
TL;DR: The paper evaluates two topic-based representations for author profiling: one based on Latent Semantic Analysis (LSA), which automatically discovers the topics from a given document collection, and a simplified version of the Linguistic Inquiry and Word Count (LIWC), which consists of 41 features representing manually predefined thematic categories.
Abstract: The Author Profiling (AP) task aims to determine specific demographic characteristics, such as gender and age, by analyzing the language usage in groups of authors. Notwithstanding recent advances in AP, it remains an unsolved problem, especially in social media domains. According to the literature, most work has been devoted to the analysis of useful textual features, the most prominent being those related to content and style. Despite the success of using both kinds of features jointly, most authors agree that content features are much more relevant than style features, which suggests that some profiling aspects, such as age or gender, could be determined solely by observing thematic interests, concerns, moods, or other words related to events of daily life. Additionally, most research uses only traditional representations such as the bag of words (BoW), rather than more sophisticated representations, to harness content features. In this regard, this paper aims to evaluate the usefulness of some topic-based representations for the AP task. We mainly consider a representation based on Latent Semantic Analysis (LSA), which automatically discovers the topics from a given document collection, and a simplified version of the Linguistic Inquiry and Word Count (LIWC), which consists of 41 features representing manually predefined thematic categories. We report promising results on several corpora, showing the effectiveness of the evaluated topic-based representations for AP in social media.
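The two representations named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the category word lists are hypothetical placeholders (neither the LIWC dictionary nor the paper's 41 categories are reproduced here), all function names are illustrative, and LSA is approximated by power iteration with deflation on raw term counts rather than by a library SVD.

```python
import math
import random
from collections import Counter

# Hypothetical thematic categories, standing in for manually predefined ones.
CATEGORIES = {
    "family": {"mom", "dad", "kids", "sister"},
    "work": {"job", "boss", "office", "meeting"},
    "leisure": {"game", "movie", "party", "music"},
}

def tokenize(text):
    """Lowercase whitespace tokenization (an assumption, kept deliberately simple)."""
    return text.lower().split()

def category_features(text):
    """Fraction of tokens that fall into each predefined category."""
    tokens = tokenize(text)
    total = len(tokens) or 1  # avoid division by zero on empty text
    return [sum(t in words for t in tokens) / total for words in CATEGORIES.values()]

def term_document_matrix(docs):
    """Return the sorted vocabulary and a documents-by-terms count matrix."""
    vocab = sorted({t for d in docs for t in tokenize(d)})
    index = {t: i for i, t in enumerate(vocab)}
    rows = []
    for d in docs:
        row = [0.0] * len(vocab)
        for term, count in Counter(tokenize(d)).items():
            row[index[term]] = float(count)
        rows.append(row)
    return vocab, rows

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def lsa_topics(rows, k, iters=200, seed=0):
    """Approximate the top-k right singular vectors of the count matrix.

    Uses power iteration on A^T A, projecting out previously found
    vectors (deflation) so that each new topic is orthogonal to the others.
    """
    rng = random.Random(seed)
    n_terms = len(rows[0])
    topics = []
    for _ in range(k):
        v = [rng.random() for _ in range(n_terms)]
        for _ in range(iters):
            for u in topics:  # remove components along earlier topics
                proj = dot(v, u)
                v = [a - proj * b for a, b in zip(v, u)]
            av = [dot(row, v) for row in rows]  # A v
            w = [sum(rows[i][j] * av[i] for i in range(len(rows)))
                 for j in range(n_terms)]       # A^T (A v)
            norm = math.sqrt(dot(w, w)) or 1.0
            v = [x / norm for x in w]
        topics.append(v)
    return topics

def lsa_features(row, topics):
    """Project one document's term counts onto the discovered topics."""
    return [dot(row, u) for u in topics]

# Example: two small thematic clusters of documents.
example_docs = ["job boss office", "boss office meeting",
                "game movie party", "movie party music"]
example_vocab, example_rows = term_document_matrix(example_docs)
example_topics = lsa_topics(example_rows, k=2)
example_vectors = [lsa_features(r, example_topics) for r in example_rows]
```

In this sketch the category features depend only on the fixed word lists, while the LSA features depend on the collection from which the topics were extracted, which mirrors the contrast the abstract draws between predefined and automatically discovered topics. Either vector could then be passed to a standard classifier.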
Citations
Richer Document Embeddings for Author Profiling tasks based on a heuristic search
Roberto López-Santillán, Manuel Montes-y-Gómez, Luis Carlos González-Gurrola, Graciela Ramirez-Alonso, Olanda Prieto-Ordaz +4 more
TL;DR: A new numerical statistic feature called Relevance Topic Value (rtv) is introduced, which could be used to enhance the forecasting of characteristics of authors, by numerically describing the topic of a document and the personal use of words by users.
21
Fake News Spreader Detection on Twitter using Character N-Grams.
Inna Vogel, Meghana Meghana +1 more
- 01 Jan 2020
TL;DR: The aim of the task is to determine whether it is possible to discriminate authors that have shared fake news in the past from those that have never done so, and to show that it is difficult to reliably distinguish fake news spreaders on Twitter from users who share credible information.
Early author profiling on Twitter using profile features with multi-resolution
TL;DR: This work proposes a novel strategy that combines a state-of-the-art representation for early text classification with specialized word vectors for author profiling tasks, and builds prototypical features called Profile based Meta-Words, which allow AP information to be modeled at different levels of granularity.
16
Author Profiling in Social Media with Multimodal Information
Miguel Ángel Álvarez Carmona, Esaú Villatoro Tello, Manuel Montes y Gómez, Luis Villaseñor Pineda +3 more
- 29 Sep 2020
TL;DR: The results show that the textual descriptions of the images contain useful information for the author profiling task, and that the fusion of textual information with information extracted from the images increases the accuracy of this task.
13
References
Indexing by Latent Semantic Analysis
TL;DR: A new method for automatic indexing and retrieval that takes advantage of implicit higher-order structure in the association of terms with documents (“semantic structure”) to improve the detection of relevant documents on the basis of terms found in queries.
•Journal Article
LIBLINEAR: A Library for Large Linear Classification
TL;DR: LIBLINEAR is an open source library for large-scale linear classification that supports logistic regression and linear support vector machines and provides easy-to-use command-line tools and library calls for users and developers.
A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge.
TL;DR: A new general theory of acquired similarity and knowledge representation, latent semantic analysis (LSA), is presented and used to successfully simulate such learning and several other psycholinguistic phenomena.
The psychological meaning of words: LIWC and computerized text analysis methods
TL;DR: The Linguistic Inquiry and Word Count (LIWC) system is a text analysis system that counts words in psychologically meaningful categories; it can detect meaning in a wide variety of experimental settings, revealing attentional focus, emotionality, social relationships, thinking styles, and individual differences.
An introduction to latent semantic analysis
TL;DR: The adequacy of LSA's reflection of human knowledge has been established in a variety of ways, for example, its scores overlap those of humans on standard vocabulary and subject matter tests; it mimics human word sorting and category judgments; it simulates word‐word and passage‐word lexical priming data.