Book Chapter, DOI: 10.1007/978-1-4614-3223-4_5
Dimensionality Reduction and Topic Modeling: From Latent Semantic Indexing to Latent Dirichlet Allocation and Beyond
Steven P. Crain, Ke Zhou, Shuang-Hong Yang, Hongyuan Zha
- 01 Jan 2012
- pp 129-161
TL;DR: This chapter surveys two influential forms of dimension reduction, latent semantic indexing and topic modeling (including probabilistic latent semantic indexing and latent Dirichlet allocation), describing the basic technologies in detail and exposing the underlying mechanism.
Abstract: The bag-of-words representation commonly used in text analysis can be analyzed very efficiently and retains a great deal of useful information, but it is also troublesome because the same thought can be expressed using many different terms and one term can have very different meanings. Dimension reduction can collapse together terms that have the same semantics, identify and disambiguate terms with multiple meanings, and provide a lower-dimensional representation of documents that reflects concepts instead of raw terms. In this chapter, we survey two influential forms of dimension reduction. Latent semantic indexing uses spectral decomposition to identify a lower-dimensional representation that maintains semantic properties of the documents. Topic modeling, including probabilistic latent semantic indexing and latent Dirichlet allocation, is a form of dimension reduction that uses a probabilistic model to find the co-occurrence patterns of terms that correspond to semantic topics in a collection of documents. We describe the basic technologies in detail and expose the underlying mechanism. We also discuss recent advances that have made it possible to apply these techniques to very large and evolving text collections and to incorporate network structure or other contextual information.
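As a rough illustration of the spectral-decomposition idea the abstract attributes to latent semantic indexing, the sketch below applies a truncated singular value decomposition to a tiny term-document count matrix. It is a minimal example under stated assumptions, not the chapter's own procedure: the toy corpus, the helper names `lsi_embed` and `cosine`, and the choice of rank `k = 2` are all hypothetical.

```python
import numpy as np


def lsi_embed(term_doc, k):
    """Truncated SVD of a (n_terms, n_docs) matrix (hypothetical helper).

    Returns the k leading term directions and the k-dimensional
    coordinates of each document in the latent space.
    """
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    term_basis = u[:, :k]                      # shape (n_terms, k)
    doc_coords = (s[:k, None] * vt[:k, :]).T   # shape (n_docs, k)
    return term_basis, doc_coords


def cosine(a, b):
    """Cosine similarity between two nonzero vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy corpus (an assumption for illustration): rows are terms, columns are documents.
terms = ["car", "automobile", "engine", "flower", "petal"]
term_doc = np.array([
    [1, 0, 1, 0, 0],  # car
    [0, 1, 1, 0, 0],  # automobile
    [1, 1, 1, 0, 0],  # engine
    [0, 0, 0, 1, 1],  # flower
    [0, 0, 0, 1, 1],  # petal
], dtype=float)

term_basis, doc_coords = lsi_embed(term_doc, k=2)

# Document 0 uses "car", document 1 uses "automobile"; they share only "engine".
raw_01 = cosine(term_doc[:, 0], term_doc[:, 1])
latent_01 = cosine(doc_coords[0], doc_coords[1])
# Document 3 is about flowers and shares no terms with document 0.
latent_03 = cosine(doc_coords[0], doc_coords[3])

print("raw cosine(doc0, doc1):   ", round(raw_01, 3))
print("latent cosine(doc0, doc1):", round(latent_01, 3))
print("latent cosine(doc0, doc3):", round(latent_03, 3))
```

In this toy setting the two vehicle documents, which overlap in only one raw term, end up pointing in nearly the same direction in the rank-2 space, while the flower document stays essentially orthogonal to them. That is the "collapse together terms that have the same semantics" effect the abstract describes, shown here on invented data only.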
Citations
Deep transitions: Emergence, acceleration, stabilization and directionality
Johan Schot, Laur Kanger
TL;DR: In this article, the authors propose a new theoretical framework that aims to explain the emergence, acceleration, stabilization and directionality of Deep Transitions through the synthesis of two literatures that have attempted to explain large-scale and long-term socio-technical change: the Multi-level Perspective (MLP) on socio-technical transitions and the Techno-economic Paradigm (TEP) framework.
Deep Transitions: Emergence, Acceleration, Stabilization and Directionality
Johan Schot, Laur Kanger
TL;DR: In this paper, the authors propose a theoretical framework to explain the emergence, acceleration, stabilization and directionality of deep transitions in socio-technical systems through the synthesis of three strands of literature: individual socio-technologies, interconnected systems and industrialization-related macro-trends.
Twitter and Research: A Systematic Literature Review Through Text Mining
TL;DR: This study systematically mines a large number of Twitter-based studies to characterize the relevant literature efficiently and effectively, and finds that 23.7% of topics showed no significant trend and that the hot and cold topics represent three categories: application, methodology, and technology.
Topic Modeling: A Comprehensive Review
Pooja Kherwa, Poonam Bansal
- 13 Jul 2018
TL;DR: This paper presents a comprehensive survey of topic modeling, covering a classification hierarchy, topic modeling methods, posterior inference techniques, different evolution models of latent Dirichlet allocation (LDA), and its applications in different areas of technology, including scientific literature, bioinformatics, software engineering and social network analysis.
A Review of Best Practice Recommendations for Text Analysis in R (and a User-Friendly App)
TL;DR: This article compares quantitative and qualitative text analysis methods used across the social sciences, and provides a list of best practice recommendations for text analysis focused on hypothesis and question formation, design and data collection, data pre-processing, and topic modeling.
References
Latent Dirichlet allocation
TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
Proceedings Article
Latent Dirichlet Allocation
David M. Blei, Andrew Y. Ng, Michael I. Jordan
- 03 Jan 2001
TL;DR: This paper proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).
Indexing by Latent Semantic Analysis
TL;DR: A new method for automatic indexing and retrieval is described that takes advantage of implicit higher-order structure in the association of terms with documents (“semantic structure”) in order to improve the detection of relevant documents on the basis of terms found in queries.