Efficient Methods for Incorporating Knowledge into Topic Models
Yi Yang, Doug Downey, Jordan Boyd-Graber
- 01 Sep 2015
- pp 308-317
TL;DR: This work proposes a factor graph framework, Sparse Constrained LDA (SC-LDA), for efficiently incorporating prior knowledge into LDA, and evaluates its ability to incorporate word correlation knowledge and document label knowledge on three benchmark datasets.
Abstract: Latent Dirichlet allocation (LDA) is a popular topic modeling technique for exploring hidden topics in text corpora. Increasingly, topic modeling needs to scale to larger topic spaces and use richer forms of prior knowledge, such as word correlations or document labels. However, inference is cumbersome for LDA models with prior knowledge. As a result, LDA models that use prior knowledge only work in small-scale scenarios. In this work, we propose a factor graph framework, Sparse Constrained LDA (SC-LDA), for efficiently incorporating prior knowledge into LDA. We evaluate SC-LDA’s ability to incorporate word correlation knowledge and document label knowledge on three benchmark datasets. Compared to several baseline methods, SC-LDA achieves comparable performance but is significantly faster.
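To make the abstract's setting concrete, the following is a minimal sketch of collapsed Gibbs sampling for LDA in which each topic's sampling weight is multiplied by a knowledge-dependent potential. It is not the paper's SC-LDA implementation: the function name `gibbs_lda`, the `potential(d, w, t)` hook, and the hard document-label constraint in the usage lines are assumptions made for illustration, and the sketch does not exploit the sparsity that the abstract credits for SC-LDA's speed.

```python
import random


def gibbs_lda(docs, num_topics, vocab_size, alpha=0.1, beta=0.01,
              potential=None, iters=50, seed=0):
    """Collapsed Gibbs sampling for LDA with an optional multiplicative potential.

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    potential: hypothetical hook potential(d, w, t) -> non-negative weight for
        assigning topic t to word w in document d (at least one topic must get
        a positive weight). None gives plain LDA.
    Returns (topic assignments, document-topic counts, topic-word counts).
    """
    rng = random.Random(seed)
    n_dt = [[0] * num_topics for _ in docs]               # document-topic counts
    n_tw = [[0] * vocab_size for _ in range(num_topics)]  # topic-word counts
    n_t = [0] * num_topics                                # tokens per topic
    z = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            t = rng.randrange(num_topics)
            z_d.append(t)
            n_dt[d][t] += 1
            n_tw[t][w] += 1
            n_t[t] += 1
        z.append(z_d)

    topics = range(num_topics)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                t_old = z[d][i]
                n_dt[d][t_old] -= 1
                n_tw[t_old][w] -= 1
                n_t[t_old] -= 1
                # Standard LDA conditional, optionally scaled by the potential.
                weights = []
                for t in topics:
                    p = ((n_dt[d][t] + alpha) * (n_tw[t][w] + beta)
                         / (n_t[t] + vocab_size * beta))
                    if potential is not None:
                        p *= potential(d, w, t)
                    weights.append(p)
                t_new = rng.choices(topics, weights=weights)[0]
                # Record the new assignment.
                z[d][i] = t_new
                n_dt[d][t_new] += 1
                n_tw[t_new][w] += 1
                n_t[t_new] += 1
    return z, n_dt, n_tw


if __name__ == "__main__":
    # Toy corpus with one (assumed) label per document, used as a hard constraint.
    docs = [[0, 1, 0, 1], [2, 3, 2, 3]]
    labels = [0, 1]
    z, n_dt, n_tw = gibbs_lda(
        docs, num_topics=2, vocab_size=4,
        potential=lambda d, w, t: 1.0 if t == labels[d] else 0.0)
    print(z)
```

The hook leaves the sampler unchanged apart from one multiplication per topic, so word-correlation knowledge could in principle be expressed through the same hook by making the weight depend on `w` and `t`. Every token still costs time linear in the number of topics, which is the scaling cost the abstract describes.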
Citations
•Posted Content
Pre-training is a Hot Topic: Contextualized Document Embeddings Improve Topic Coherence
TL;DR: This work combines contextualized representations with neural topic models and finds that the approach produces more meaningful and coherent topics than traditional bag-of-words topic models and recent neural models.
200
Pre-training is a Hot Topic: Contextualized Document Embeddings Improve Topic Coherence
Federico Bianchi, Silvia Terragni, Dirk Hovy
- 01 Aug 2021
TL;DR: The authors combine contextualized representations with neural topic models to produce more meaningful and coherent topics than traditional bag-of-words topic models and recent neural models, and their results indicate that future improvements in language models will translate into better topic models.
Topic Modeling Using Latent Dirichlet allocation: A Survey
Uttam Chauhan, Apurva Shah
TL;DR: This survey covers the background and advancement of topic modeling techniques: it introduces the preliminaries and reviews extensions and variations, such as topic modeling over various domains, hierarchical topic modeling, word-embedded topic models, and topic models in multilingual perspectives.
156
•Proceedings Article
A Word Embeddings Informed Focused Topic Model
He Zhao, Lan Du, Wray Buntine
- 11 Nov 2017
TL;DR: This work proposes a focused topic model in which the way a topic focuses on words is informed by word embeddings; the model discovers more informed and focused topics with more representative words, leading to better modelling accuracy and topic quality.
•Posted Content
Neural Topic Model via Optimal Transport.
TL;DR: A new neural topic model via the theory of optimal transport (OT) is presented to learn the topic distribution of a document by directly minimising its OT distance to the document's word distributions through the cost matrix of the OT distance.
References
Latent dirichlet allocation
TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
•Proceedings Article
Latent Dirichlet Allocation
David M. Blei, Andrew Y. Ng, Michael I. Jordan
- 03 Jan 2001
TL;DR: This paper proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models, including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).
•Proceedings Article
Distributed Representations of Words and Phrases and their Compositionality
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, Jeffrey Dean
- 05 Dec 2013
TL;DR: This paper presents a simple method for finding phrases in text, and shows that learning good vector representations for millions of phrases is possible and describes a simple alternative to the hierarchical softmax called negative sampling.
•Posted Content
Distributed Representations of Words and Phrases and their Compositionality
TL;DR: This paper presents extensions to the Skip-gram model, which learns high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships; the extensions improve both the quality of the vectors and the training speed.
Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images
Stuart Geman, Donald Geman
TL;DR: The paper draws an analogy between images and statistical mechanics systems; the analogue of annealing (gradual temperature reduction), applied under the posterior distribution, yields the maximum a posteriori (MAP) estimate of the image given the degraded observations, resulting in a highly parallel "relaxation" algorithm for MAP estimation.