Efficient Methods for Incorporating Knowledge into Topic Models

doi:10.18653/V1/D15-1037

Open Access•Proceedings Article•10.18653/V1/D15-1037

Yi Yang, +2 more

- 01 Sep 2015

- pp 308-317

72

TL;DR: This work proposes a factor graph framework, Sparse Constrained LDA (SC-LDA), for efficiently incorporating prior knowledge into LDA, and evaluates its ability to incorporate word correlation knowledge and document label knowledge on three benchmark datasets.

Citations

•Posted Content

Pre-training is a Hot Topic: Contextualized Document Embeddings Improve Topic Coherence

Federico Bianchi, +2 more

- 08 Apr 2020

- arXiv: Computation and Language

TL;DR: This work combines contextualized representations with neural topic models and finds that the approach produces more meaningful and coherent topics than traditional bag-of-words topic models and recent neural models.

200

•Proceedings Article•10.18653/V1/2021.ACL-SHORT.96

Pre-training is a Hot Topic: Contextualized Document Embeddings Improve Topic Coherence

Federico Bianchi, +2 more

- 01 Aug 2021

TL;DR: The authors combine contextualized representations with neural topic models to produce more meaningful and coherent topics than traditional bag-of-words topic models and recent neural models, and their results indicate that future improvements in language models will translate into better topic models.

174

Journal Article•10.1145/3462478

Topic Modeling Using Latent Dirichlet allocation: A Survey

Uttam Chauhan, +1 more

- 17 Sep 2021

- ACM Computing Surveys

TL;DR: This survey covers the background and advancement of topic modeling techniques: the authors introduce the preliminaries and review extensions and variations, such as topic modeling over various domains, hierarchical topic modeling, word-embedded topic models, and topic models in multilingual perspectives.

156

•Proceedings Article

A Word Embeddings Informed Focused Topic Model

He Zhao, +2 more

- 11 Nov 2017

TL;DR: This work proposes a focused topic model in which the way a topic focuses on words is informed by word embeddings; the model discovers more informed and focused topics with more representative words, leading to better modelling accuracy and topic quality.

48

•Posted Content

Neural Topic Model via Optimal Transport.

He Zhao, +4 more

- 12 Aug 2020

- arXiv: Information Retrieval

TL;DR: A new neural topic model via the theory of optimal transport (OT) is presented to learn the topic distribution of a document by directly minimising its OT distance to the document's word distributions through the cost matrix of the OT distance.

46

References

•Journal Article•10.5555/944919.944937

Latent Dirichlet Allocation

David M. Blei, +2 more

- 01 Mar 2003

- Journal of Machine Learning Research

TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.

36.2K

•Proceedings Article

Latent Dirichlet Allocation

David M. Blei, +2 more

- 03 Jan 2001

TL;DR: This paper proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).

25.5K

•Proceedings Article

Distributed Representations of Words and Phrases and their Compositionality

Tomas Mikolov, +4 more

- 05 Dec 2013

TL;DR: This paper presents a simple method for finding phrases in text, and shows that learning good vector representations for millions of phrases is possible and describes a simple alternative to the hierarchical softmax called negative sampling.

24.1K

•Posted Content

Distributed Representations of Words and Phrases and their Compositionality

Tomas Mikolov, +4 more

- 16 Oct 2013

- arXiv: Computation and Language

TL;DR: In this paper, the Skip-gram model is used to learn high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships and improve both the quality of the vectors and the training speed.

22.9K

Journal Article•10.1109/TPAMI.1984.4767596

Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images

Stuart Geman, +1 more

- 01 Nov 1984

- IEEE Transactions on Pattern Analysis an...

TL;DR: The analogy between images and statistical mechanics systems is made and the analogous operation under the posterior distribution yields the maximum a posteriori (MAP) estimate of the image given the degraded observations, creating a highly parallel "relaxation" algorithm for MAP estimation.

19.5K

Related Papers (5)

Latent Dirichlet Allocation

Machine Reading Tea Leaves: Automatically Evaluating Topic Coherence and Topic Model Quality

Reading Tea Leaves: How Humans Interpret Topic Models

Finding scientific topics

Exploring the Space of Topic Coherence Measures