Open Access Proceedings Article
Efficient Tree-Based Topic Modeling
Yuening Hu, Jordan Boyd-Graber +1 more
- 08 Jul 2012
- pp 275-279
TL;DR: Extends the SPARSELDA inference scheme for latent Dirichlet allocation (LDA) to tree-based topic models; the resulting sampling scheme computes the exact conditional distribution for Gibbs sampling much more quickly than enumerating all possible latent variable assignments.
Abstract: Topic modeling with a tree-based prior has been used for a variety of applications because it can encode correlations between words that traditional topic modeling cannot. However, its expressive power comes at the cost of more complicated inference. We extend the SPARSELDA (Yao et al., 2009) inference scheme for latent Dirichlet allocation (LDA) to tree-based topic models. This sampling scheme computes the exact conditional distribution for Gibbs sampling much more quickly than enumerating all possible latent variable assignments. We further improve performance by iteratively refining the sampling distribution only when needed. Experiments show that the proposed techniques dramatically improve the computation time.
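As a rough illustration of the idea the abstract builds on, the sketch below shows the bucket decomposition that SparseLDA (Yao et al., 2009) applies to the collapsed Gibbs conditional of ordinary LDA. It is not the tree-based extension this paper proposes. It assumes symmetric scalar priors `alpha` and `beta`, and all function and variable names (`dense_weights`, `sparse_buckets`, `sample_topic`, `n_kd`, `n_wk`, `n_k`) are hypothetical, chosen for this example only.

```python
import random

def dense_weights(alpha, beta, V, n_kd, n_wk, n_k):
    """Unnormalized p(z = k | rest) for every topic, computed directly."""
    return [(alpha + n_kd[k]) * (beta + n_wk[k]) / (beta * V + n_k[k])
            for k in range(len(n_k))]

def sparse_buckets(alpha, beta, V, n_kd, n_wk, n_k):
    """Split the same mass into three buckets (assumed symmetric priors).

    s: smoothing-only term, nonzero for every topic
    r: document-topic term, nonzero only for topics present in the document
    q: topic-word term, nonzero only for topics that use this word
    """
    K = len(n_k)
    denom = [beta * V + n_k[k] for k in range(K)]
    s = {k: alpha * beta / denom[k] for k in range(K)}
    r = {k: n_kd[k] * beta / denom[k] for k in range(K) if n_kd[k] > 0}
    q = {k: (alpha + n_kd[k]) * n_wk[k] / denom[k]
         for k in range(K) if n_wk[k] > 0}
    return s, r, q

def sample_topic(s, r, q, rng=random):
    """Draw a topic from the exact conditional by walking q, then r, then s."""
    u = rng.random() * (sum(s.values()) + sum(r.values()) + sum(q.values()))
    for bucket in (q, r, s):
        for k, w in bucket.items():
            if u < w:
                return k
            u -= w
    return max(s)  # guard against floating-point round-off

# Toy counts for one token: 4 topics, vocabulary of 10 word types.
alpha, beta, V = 0.1, 0.01, 10
n_kd = [3, 0, 0, 1]      # tokens in this document assigned to each topic
n_wk = [2, 0, 5, 0]      # times this word type is assigned to each topic
n_k = [20, 15, 30, 10]   # total tokens assigned to each topic

s, r, q = sparse_buckets(alpha, beta, V, n_kd, n_wk, n_k)
topic = sample_topic(s, r, q, random.Random(0))
```

The three buckets sum, topic by topic, to the dense weights, so the draw is exact rather than approximate; the saving comes from `r` and `q` holding entries only for the few topics with nonzero counts. A full sampler would also cache the bucket totals and update them incrementally instead of recomputing them for every token.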
Citations
Interactive Topic Modeling
Yuening Hu, Jordan Boyd-Graber, Brianna Satinoff +2 more
- 19 Jun 2011
TL;DR: This paper presents a mechanism for giving users a voice by encoding their feedback as correlations between words in a topic model, and develops more efficient inference algorithms for tree-based topic models.
Efficient Methods for Incorporating Knowledge into Topic Models
Yi Yang, Doug Downey, Jordan Boyd-Graber +2 more
- 01 Sep 2015
TL;DR: This work proposes a factor graph framework, Sparse Constrained LDA (SC-LDA), for efficiently incorporating prior knowledge into LDA, and evaluates its ability to incorporate word correlation knowledge and document label knowledge on three benchmark datasets.
Polylingual Tree-Based Topic Models for Translation Domain Adaptation
Yuening Hu, Ke Zhai, Vladimir Eidelman, Jordan Boyd-Graber +3 more
- 01 Jun 2014
TL;DR: Proposes new polylingual tree-based topic models that extract domain knowledge from both source and target languages, and derives three different inference schemes.
Topic detection and tracking for conversational content by using conceptual dynamic latent Dirichlet allocation
TL;DR: A conceptual dynamic latent Dirichlet allocation (CDLDA) model for topic detection and tracking in conversational content is proposed; it captures temporal features by introducing dynamic concepts and outperforms traditional DLDA, LDA, and support vector machine models.
Posted Content
Graph-Sparse LDA: A Topic Model with Structured Sparsity
TL;DR: Graph-Sparse LDA is introduced, a hierarchical topic model that uses knowledge of relationships between words (e.g., as encoded by an ontology) to improve topic interpretability and recovers sparse, interpretable summaries on two real-world biomedical datasets while matching state-of-the-art prediction performance.
References
Latent Dirichlet allocation
TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
Proceedings Article
Latent Dirichlet Allocation
David M. Blei, Andrew Y. Ng, Michael I. Jordan +2 more
- 03 Jan 2001
TL;DR: This paper proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).
Parameter estimation for text analysis
Gregor Heinrich
- 01 Jan 2009
TL;DR: Presents parameter estimation methods common with discrete probability distributions, which are of particular interest in text modeling, and reviews central concepts like conjugate distributions and Bayesian networks.
An architecture for parallel topic models
Alexander J. Smola, Shravan Narayanamurthy +1 more
- 01 Sep 2010
TL;DR: This paper describes a high performance sampling architecture for inference of latent topic models on a cluster of workstations and shows that this architecture is entirely general and that it can be extended easily to more sophisticated latent variable models such as n-grams and hierarchies.
Efficient methods for topic model inference on streaming document collections
Limin Yao, David Mimno, Andrew McCallum +2 more
- 28 Jun 2009
TL;DR: Empirical results indicate that SparseLDA can be approximately 20 times faster than traditional LDA and provide twice the speedup of previously published fast sampling methods, while also using substantially less memory.