Parameter estimation for text analysis

Open Access


- 01 Jan 2009

826

TL;DR: Presents parameter estimation methods commonly used with discrete probability distributions, which are of particular interest in text modeling, and reviews central concepts such as conjugate distributions and Bayesian networks.
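The conjugacy idea mentioned in the summary can be illustrated with a minimal sketch. It assumes a single multinomial word distribution with a symmetric Dirichlet prior of concentration `alpha`, and assumes every token is drawn from `vocab`. The helper name `estimate_multinomial` and the toy data are hypothetical, chosen for illustration only; this is not code from the paper. Because the Dirichlet is conjugate to the multinomial, the posterior is again a Dirichlet with parameters `counts[w] + alpha`, so the maximum likelihood (ML), maximum a posteriori (MAP) and posterior-mean estimates all have closed forms.

```python
from collections import Counter


def estimate_multinomial(tokens, vocab, alpha):
    """Estimate word probabilities under a symmetric Dirichlet(alpha) prior.

    Hypothetical helper for illustration. Assumes every token is in vocab.
    Returns (ml, map_est, mean), each a dict mapping word -> probability.
    """
    if alpha < 1:
        # The closed-form MAP below needs every posterior parameter
        # counts[w] + alpha to be at least 1.
        raise ValueError("alpha must be >= 1 for the closed-form MAP estimate")
    counts = Counter(tokens)
    n = len(tokens)  # total number of observed tokens
    v = len(vocab)   # vocabulary size
    # Maximum likelihood: relative frequencies, the prior is ignored.
    ml = {w: counts[w] / n for w in vocab}
    # Conjugacy: the posterior is Dirichlet(counts[w] + alpha).
    # MAP estimate = mode of that posterior Dirichlet.
    map_est = {w: (counts[w] + alpha - 1) / (n + v * (alpha - 1)) for w in vocab}
    # Bayesian estimate = mean of the same posterior Dirichlet.
    mean = {w: (counts[w] + alpha) / (n + v * alpha) for w in vocab}
    return ml, map_est, mean


# Toy usage: "c" is never observed.
ml, map_est, mean = estimate_multinomial(["a", "b", "a", "a"], ["a", "b", "c"], alpha=2)
# ml["c"] is 0.0, while map_est["c"] (1/7) and mean["c"] (0.2) stay positive.
```

The three estimates differ only in how the prior pseudo-counts enter: ML ignores them, MAP adds `alpha - 1` per word, and the posterior mean adds `alpha`. This is why unseen words keep a nonzero probability under the posterior mean, and under MAP whenever `alpha > 1`.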


Citations

Journal Article, DOI: 10.5555/944919.944937

Latent Dirichlet allocation

David M. Blei et al.

- 01 Mar 2003

- Journal of Machine Learning Research

TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.


36.2K

Proceedings Article, DOI: 10.1145/2488388.2488514

A biterm topic model for short texts

Xiaohui Yan et al.

- 13 May 2013

TL;DR: The approach can discover more prominent and coherent topics and significantly outperforms baseline methods on several evaluation metrics; BTM is also found to outperform LDA even on normal texts, showing the potential generality and wider applicability of the new topic model.


1.2K

Proceedings Article, DOI: 10.1145/1367497.1367510

Learning to classify short and sparse text & web with hidden topics from large-scale data collections

Xuan-Hieu Phan et al.

- 21 Apr 2008

TL;DR: A general framework for building classifiers that handle short and sparse text and Web segments by exploiting hidden topics discovered from large-scale data collections; the framework is general enough to be applied to different data domains and genres, ranging from Web search results to medical text.


939

Proceedings Article, DOI: 10.1145/2623330.2623715

A Dirichlet multinomial mixture model-based approach for short text clustering

Jianhua Yin et al.

- 24 Aug 2014

TL;DR: This paper proposes a collapsed Gibbs sampling algorithm for the Dirichlet Multinomial Mixture model (GSDMM) for short text clustering and finds that GSDMM can infer the number of clusters automatically, strikes a good balance between the completeness and homogeneity of the clustering results, and converges quickly.


601

Posted Content

A Brief Survey of Text Mining: Classification, Clustering and Extraction Techniques

Mehdi Allahyari et al.

- 10 Jul 2017

- arXiv: Computation and Language

TL;DR: Describes several of the most fundamental text mining tasks and techniques, including text pre-processing, classification and clustering, and briefly discusses text mining in the biomedical and health care domains.


594

...


References

Journal Article, DOI: 10.5555/944919.944937

Latent Dirichlet allocation

David M. Blei et al.

- 01 Mar 2003

- Journal of Machine Learning Research

TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.


36.2K

Journal Article, DOI: 10.1093/GENETICS/155.2.945

Inference of population structure using multilocus genotype data

Jonathan K. Pritchard et al.

- 01 Jun 2000

- Genetics

TL;DR: Proposes a model-based clustering method that uses multilocus genotype data to infer population structure and assign individuals to populations; the method can be applied to most of the commonly used genetic markers, provided that they are not closely linked.


31.5K

Proceedings Article

Latent Dirichlet Allocation

David M. Blei et al.

- 03 Jan 2001

TL;DR: This paper proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models, including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).


25.5K

Journal Article

Inference of population structure using multilocus genotype data

Adam J. Pritchard et al.

- 30 May 2000

- Genomics

20.4K

Journal Article, DOI: 10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9

Indexing by Latent Semantic Analysis

Scott Deerwester et al.

- 01 Sep 1990

- Journal of the Association for Informati...

TL;DR: Describes a new method for automatic indexing and retrieval that takes advantage of implicit higher-order structure in the association of terms with documents (“semantic structure”) to improve the detection of relevant documents on the basis of terms found in queries.


13.5K

...


Related Papers (5)

Latent Dirichlet allocation

Finding scientific topics

Probabilistic latent semantic indexing

Indexing by Latent Semantic Analysis

Probabilistic topic models