A Provably Efficient Algorithm for Separable Topic Discovery

doi:10.1109/JSTSP.2016.2555240

•Journal Article•10.1109/JSTSP.2016.2555240

Weicong Ding, +2 more

- 20 Apr 2016

- IEEE Journal of Selected Topics in Signa...

- Vol. 10, Iss: 4, pp 712-725

4

TL;DR: In this article, the authors develop necessary and sufficient conditions and a novel provably consistent and efficient algorithm for discovering topics from observations (documents) that are realized from a probabilistic mixture of shared latent factors that have certain properties.

Citations

•Journal Article

A rate of convergence for mixture proportion estimation, with application to learning from noisy labels

Clayton Scott

- 01 Jan 2015

- Journal of Machine Learning Research

TL;DR: This work establishes a rate of convergence for mixture proportion estimation (MPE) under an appropriate distributional assumption, and argues that this rate of convergence is useful for analyzing weakly supervised learning algorithms that build on MPE.

79

•Journal Article

Decontamination of mutually contaminated models

Gilles Blanchard, +1 more

- 01 Jan 2014

- Journal of Machine Learning Research

TL;DR: A procedure for decontamination of the contaminated models from data is developed, which then facilitates the design of a consistent discrimination rule and relies on a novel method for estimating the error when projecting one distribution onto a convex combination of others, where the projection is with respect to a statistical distance known as the separation distance.

14

•Journal Article•10.1080/00401706.2016.1247017

A Geometric Approach to Archetypal Analysis and Nonnegative Matrix Factorization.

Anil Damle, +1 more

- 27 Apr 2017

- Technometrics

TL;DR: A geometric approach to both NMF and archetypal analysis is described that interprets both problems as finding extreme points of the data cloud, and an efficient approach to finding extreme points in high dimensions is developed and analyzed.

14

•Dissertation

Learning mixed membership models with a separable latent structure: Theory, provably efficient algorithms, and applications

Weicong Ding

- 01 Jan 2015

TL;DR: In a wide spectrum of problems in science and engineering that includes hyperspectral imaging, gene expression analysis, and machine learning tasks such as topic modeling, the observed data is high-dimensional and can be modeled as arising from a data-specific probabilistic mixture of a small collection of latent factors.

3

References

•Journal Article•10.5555/944919.944937

Latent Dirichlet allocation

David M. Blei, +2 more

- 01 Mar 2003

- Journal of Machine Learning Research

TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.

36.2K

UCI Machine Learning Repository

A. Asuncion

- 01 Jan 2007

24.3K

•Journal Article•10.1038/44565

Learning the parts of objects by non-negative matrix factorization

Daniel D. Lee, +2 more

- 21 Oct 1999

- Nature

TL;DR: An algorithm for non-negative matrix factorization is demonstrated that is able to learn parts of faces and semantic features of text and is in contrast to other methods that learn holistic, not parts-based, representations.

14.2K

•Journal Article•10.1073/PNAS.0307752101

Finding scientific topics

Thomas L. Griffiths, +1 more

- 06 Apr 2004

- Proceedings of the National Academy of S...

TL;DR: A generative model for documents, introduced by Blei, Ng, and Jordan, is described, and a Markov chain Monte Carlo algorithm is presented for inference in this model, which is used to analyze abstracts from PNAS, with Bayesian model selection used to establish the number of topics.

6.9K

•Journal Article•10.1145/2133806.2133826

Probabilistic topic models

David M. Blei

- 01 Apr 2012

- Communications of the ACM

TL;DR: Surveying a suite of algorithms that offer a solution to managing large document archives suggests they are well-suited to handle large amounts of data.

5.6K

Related Papers (5)

A Very Fast Algorithm for Matrix Factorization

Clustering Semi-Random Mixtures of Gaussians

Random Feature Approximation for Online Nonlinear Graph Topology Identification

Large-Scale Clustering Algorithms

A PAC-Bayesian Approach to Unsupervised Learning with Application to Co-clustering Analysis