Dimensionality Reduction and Topic Modeling: From Latent Semantic Indexing to Latent Dirichlet Allocation and Beyond

doi:10.1007/978-1-4614-3223-4_5

Book Chapter

Steven P. Crain, +3 more

- 01 Jan 2012

- pp 129-161

129

TL;DR: This chapter surveys two influential forms of dimension reduction, latent semantic indexing and topic modeling (including probabilistic latent semantic indexing and latent Dirichlet allocation); it describes the basic technologies in detail and exposes the underlying mechanism.

Abstract: The bag-of-words representation commonly used in text analysis can be analyzed very efficiently and retains a great deal of useful information, but it is also troublesome because the same thought can be expressed using many different terms, and one term can have very different meanings. Dimension reduction can collapse together terms that have the same semantics, identify and disambiguate terms with multiple meanings, and provide a lower-dimensional representation of documents that reflects concepts instead of raw terms. In this chapter, we survey two influential forms of dimension reduction. Latent semantic indexing uses spectral decomposition to identify a lower-dimensional representation that maintains semantic properties of the documents. Topic modeling, including probabilistic latent semantic indexing and latent Dirichlet allocation, is a form of dimension reduction that uses a probabilistic model to find the co-occurrence patterns of terms that correspond to semantic topics in a collection of documents. We describe the basic technologies in detail and expose the underlying mechanism. We also discuss recent advances that have made it possible to apply these techniques to very large and evolving text collections and to incorporate network structure or other contextual information.
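
The two families of methods named in the abstract can be illustrated with a minimal NumPy sketch. It is not code from the chapter: the six-document toy corpus, the helper names `lsi`, `fold_in` and `plsi`, and parameters such as `k=2`, `K=2` and `iters=30` are all hypothetical choices made for illustration. `lsi` keeps the top k singular triplets of the term-document matrix (the spectral decomposition behind latent semantic indexing), `fold_in` maps a query into that space with the commonly used projection diag(s_k)^-1 U_k^T q, and `plsi` fits the probabilistic latent semantic indexing (aspect) model P(w|d) = sum_z P(z|d) P(w|z) with a textbook EM loop.

```python
import numpy as np

# Hypothetical toy corpus: three documents about cars, three about baking.
docs = [
    "car automobile engine",
    "automobile engine wheel",
    "car wheel engine",
    "recipe oven bake",
    "bake oven flour",
    "recipe flour bake",
]

# Term-document count matrix A (rows = terms, columns = documents).
vocab = sorted({w for d in docs for w in d.split()})
index = {w: i for i, w in enumerate(vocab)}
A = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        A[index[w], j] += 1


def lsi(A, k):
    """Hypothetical helper: rank-k truncated SVD, A ~ U_k diag(s_k) V_k^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]


def fold_in(text, U_k, s_k):
    """Project a bag-of-words query into the latent space: diag(s_k)^-1 U_k^T q."""
    q = np.zeros(len(vocab))
    for w in text.split():
        q[index[w]] += 1
    return (U_k.T @ q) / s_k


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def plsi(N, K, iters=30, seed=0):
    """Hypothetical helper: fit P(w|d) = sum_z P(z|d) P(w|z) by EM.

    N is a document-term count matrix (D x W). Returns P(z|d) as a D x K
    array, P(w|z) as a K x W array, and the log-likelihood after each iteration.
    """
    rng = np.random.default_rng(seed)
    D, W = N.shape
    theta = rng.random((D, K))
    theta /= theta.sum(axis=1, keepdims=True)  # P(z|d)
    phi = rng.random((K, W))
    phi /= phi.sum(axis=1, keepdims=True)  # P(w|z)
    observed = N > 0
    history = []
    for _ in range(iters):
        # E-step: posterior P(z|d,w) proportional to P(z|d) P(w|z), shape (D, K, W).
        joint = theta[:, :, None] * phi[None, :, :]
        denom = joint.sum(axis=1, keepdims=True)
        # Guard against 0/0 for unobserved (d, w) pairs whose probability underflows.
        post = np.divide(joint, denom, out=np.zeros_like(joint), where=denom > 0)
        # M-step: re-estimate both distributions from expected counts n(d,w) P(z|d,w).
        expected = N[:, None, :] * post
        phi = expected.sum(axis=0)
        phi /= phi.sum(axis=1, keepdims=True)
        theta = expected.sum(axis=2)
        theta /= theta.sum(axis=1, keepdims=True)
        # Log-likelihood sum over observed pairs of n(d,w) log P(w|d).
        p_wd = theta @ phi
        history.append(float((N[observed] * np.log(p_wd[observed])).sum()))
    return theta, phi, history


U_k, s_k, Vt_k = lsi(A, k=2)
q = fold_in("automobile", U_k, s_k)
# Document j's latent coordinates are column j of Vt_k.
print(cosine(q, Vt_k[:, 2]), cosine(q, Vt_k[:, 3]))

theta, phi, history = plsi(A.T, K=2)
print(np.round(theta, 2))
```

On this corpus the folded-in query "automobile" should lie almost exactly on document 2 ("car wheel engine"), even though that document never uses the word, and be orthogonal to the baking documents: "car" and "automobile" co-occur with the same terms, so the truncated SVD collapses them together, which is the synonym effect the abstract describes. For the pLSI loop, a log-likelihood in `history` that never decreases is a cheap sanity check of the EM updates; which local optimum EM reaches depends on the random initialization.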

Citations

Journal Article • doi:10.1016/J.RESPOL.2018.03.009

Deep transitions: Emergence, acceleration, stabilization and directionality

Johan Schot, +1 more

- 01 Jul 2018

- Research Policy

TL;DR: In this article, the authors propose a new theoretical framework that aims to explain the emergence, acceleration, stabilization and directionality of Deep Transitions by synthesizing two literatures that have attempted to explain large-scale, long-term socio-technical change: the Multi-level Perspective (MLP) on socio-technical transitions and the Techno-economic Paradigm (TEP) framework.

359

Journal Article • doi:10.2139/SSRN.2834854

Deep Transitions: Emergence, Acceleration, Stabilization and Directionality

Johan Schot, +1 more

- 02 Sep 2016

- Social Science Research Network

TL;DR: In this paper, the authors propose a theoretical framework to explain the emergence, acceleration, stabilization and directionality of Deep Transitions in socio-technical systems, built on a synthesis of three strands of literature: individual socio-technologies, interconnected systems, and industrialization-related macro-trends.

311

Journal Article • doi:10.1109/ACCESS.2020.2983656

Twitter and Research: A Systematic Literature Review Through Text Mining

Amir Karami, +3 more

- 26 Mar 2020

- IEEE Access

TL;DR: This study uses text mining to systematically characterize a large number of Twitter-based studies. It finds that 23.7% of the topics show no significant trend, and that the hot and cold topics fall into three categories: application, methodology, and technology.

224

Journal Article • doi:10.4108/EAI.13-7-2018.159623

Topic Modeling: A Comprehensive Review

Pooja Kherwa, +1 more

- 13 Jul 2018

TL;DR: This paper presents a comprehensive survey of topic modeling, covering a classification hierarchy, topic modeling methods, posterior inference techniques, evolution models of latent Dirichlet allocation (LDA), and applications in areas including scientific literature, bioinformatics, software engineering, and social network analysis.

221

Journal Article • doi:10.1007/S10869-017-9528-3

A Review of Best Practice Recommendations for Text Analysis in R (and a User-Friendly App)

George C. Banks, +3 more

- 11 Jan 2018

- Journal of Business and Psychology

TL;DR: This article compares quantitative and qualitative text analysis methods used across the social sciences and provides best practice recommendations for text analysis focused on (1) hypothesis and question formation, (2) design and data collection, (3) data pre-processing, and (4) topic modeling.

153

...

References

Journal Article • doi:10.1111/J.2517-6161.1977.TB01600.X

Maximum likelihood from incomplete data via the EM algorithm

Arthur P. Dempster, +2 more

- 01 Sep 1977

- Journal of the royal statistical society...

55.2K

Journal Article • doi:10.5555/944919.944937

Latent Dirichlet Allocation

David M. Blei, +2 more

- 01 Mar 2003

- Journal of Machine Learning Research

TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.

36.2K

Proceedings Article

Latent Dirichlet Allocation

David M. Blei, +2 more

- 03 Jan 2001

TL;DR: This paper proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models, including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).

25.5K

Journal Article • doi:10.1214/AOMS/1177729694

On Information and Sufficiency

Solomon Kullback, +1 more

- 01 Mar 1951

- Annals of Mathematical Statistics

19.8K

Journal Article • doi:10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9

Indexing by Latent Semantic Analysis

Scott Deerwester, +4 more

- 01 Sep 1990

- Journal of the Association for Informati...

TL;DR: This paper describes a new method for automatic indexing and retrieval that takes advantage of implicit higher-order structure in the association of terms with documents (“semantic structure”) to improve the detection of relevant documents on the basis of terms found in queries.

13.5K

...

Related Papers (5)

Latent Dirichlet Allocation

Probabilistic topic models

Indexing by Latent Semantic Analysis

Finding scientific topics

Introduction to Information Retrieval