Open Access • Proceedings Article
On Estimation and Selection for Topic Models
Matt Taddy
- 21 Mar 2012
- pp 1184-1193
Abstract: This article describes posterior maximization for topic models, identifying computational and conceptual gains from inference under a non-standard parametrization. We then show that fitted parameters can be used as the basis for a novel approach to marginal likelihood estimation, via block-diagonal approximation to the information matrix, that facilitates choosing the number of latent topics. This likelihood-based model selection is complemented with a goodness-of-fit analysis built around estimated residual dispersion. Examples are provided to illustrate model selection as well as to compare our estimation against standard alternative techniques.
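The abstract does not spell out how the block-diagonal approximation enters the marginal likelihood estimate. A minimal sketch, assuming a Laplace-type approximation around the posterior mode (an assumption here, not a statement of the paper's exact estimator): if the negative Hessian of the log posterior is treated as block diagonal, its log-determinant is the sum of the per-block log-determinants. The function name `laplace_log_evidence` and its arguments are hypothetical and chosen only for illustration.

```python
import numpy as np


def laplace_log_evidence(log_joint_at_mode, hessian_blocks):
    """Laplace-style estimate of a log marginal likelihood.

    log_joint_at_mode: log p(data, theta_hat), the log joint density
        evaluated at the posterior mode theta_hat.
    hessian_blocks: iterable of symmetric positive-definite arrays, each
        one diagonal block of the NEGATIVE Hessian of the log joint at
        the mode. Cross-block terms are ignored by assumption.
    """
    total_dim = 0
    total_logdet = 0.0
    for block in hessian_blocks:
        block = np.atleast_2d(np.asarray(block, dtype=float))
        # Cholesky raises LinAlgError if the block is not positive definite.
        chol = np.linalg.cholesky(block)
        # log|B| = 2 * sum(log(diag(L))) when B = L L^T.
        total_logdet += 2.0 * np.sum(np.log(np.diag(chol)))
        total_dim += block.shape[0]
    # log p(data) ~ log p(data, theta_hat) + (d/2) log(2 pi) - (1/2) log|H|
    return (log_joint_at_mode
            + 0.5 * total_dim * np.log(2.0 * np.pi)
            - 0.5 * total_logdet)
```

Under this sketch, candidate numbers of topics could be compared by fitting each model, evaluating the estimate at its fitted parameters, and preferring the largest value. Because only the blocks are factorized, the cost grows with the block sizes and not with the full parameter dimension.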
Citations
stm: An R Package for Structural Topic Models
TL;DR: This paper demonstrates how to use the R package stm for structural topic modeling, which allows researchers to flexibly estimate a topic model that includes document-level metadata.
LDAvis: A method for visualizing and interpreting topics
Carson Sievert, Kenneth E. Shirley
- 01 Jan 2014
TL;DR: LDAvis is a web-based interactive visualization of topics estimated using Latent Dirichlet Allocation, built using a combination of R and D3; the paper also proposes a novel method for choosing which terms to present to a user to aid in the task of topic interpretation.
A model of text for experimentation in the social sciences
TL;DR: A hierarchical mixed membership model for analyzing the topical content of documents, in which mixing weights are parameterized by observed covariates, is posited, enabling researchers to introduce elements of the experimental design that informed document collection into the model, within a generally applicable framework.
cisTopic: cis-regulatory topic modeling on single-cell ATAC-seq data
Carmen Bravo González-Blas, Liesbeth Minnoye, Dafni Papasokrati, Sara Aibar, Gert Hulselmans, Valerie Christiaens, Kristofer Davie, Jasper Wouters, Stein Aerts
TL;DR: As an unsupervised Bayesian framework, cisTopic classifies regions in scATAC-seq data into regulatory topics, which are used for clustering and provides insight into the mechanisms underlying regulatory heterogeneity in cell populations.
CEO Behavior and Firm Performance
TL;DR: A new method to measure CEO behavior in large samples, via a survey that collects high-frequency, high-dimensional diary data and a machine learning algorithm that estimates behavioral types, reveals two types: “leaders,” who do multifunction, high-level meetings, and “managers,” who do individual meetings with core functions.
References
•Proceedings Article
On smoothing and inference for topic models
Arthur U. Asuncion, Max Welling, Padhraic Smyth, Yee Whye Teh
- 18 Jun 2009
TL;DR: In this article, the authors compare the performance of topic models with collapsed Gibbs sampling, variational inference, and maximum a posteriori estimation, and find that the main differences are attributable to the amount of smoothing applied to the counts.
Practical Bayesian Density Estimation Using Mixtures of Normals
Kathryn Roeder, Larry Wasserman
TL;DR: In this paper, the posterior for the number of components in a mixture of normals is not well defined, and posterior simulation does not provide a direct estimate of the posterior of the components in the mixture.
•Posted Content
On Smoothing and Inference for Topic Models
TL;DR: Using the insights gained from this comparative study, it is shown how accurate topic models can be learned in several seconds on text corpora with thousands of documents.
Efficient methods for topic model inference on streaming document collections
Limin Yao, David Mimno, Andrew McCallum
- 28 Jun 2009
TL;DR: Empirical results indicate that SparseLDA can be approximately 20 times faster than traditional LDA and provide twice the speedup of previously published fast sampling methods, while also using substantially less memory.
Integrated likelihood methods for eliminating nuisance parameters
TL;DR: In this paper, the authors review common integrated likelihoods and discuss their strengths and weaknesses relative to other methods, especially those arising from default or non-informative priors.