Statistical Debugging Using Latent Topic Models

doi:10.1007/978-3-540-74958-5_5

Open Access Book Chapter

David Andrzejewski, +3 more

- 17 Sep 2007

- pp 6-17

95

TL;DR: Qualitative evaluation by domain experts suggests that the novel Delta-Latent-Dirichlet-Allocation model outperforms existing statistical methods for bug cause identification, and may help support other software tasks not addressed by earlier models.

Citations

Journal Article, doi:10.1109/MSP.2010.938079

Probabilistic Topic Models

David M. Blei, +2 more

- 18 Oct 2010

- IEEE Signal Processing Magazine

TL;DR: This paper reviews probabilistic topic models, which can be used to summarize a large collection of documents with a smaller number of distributions over words.

4.1K

Proceedings Article, doi:10.1145/2676726.2677009

Predicting Program Properties from "Big Code"

Veselin Raychev, +2 more

- 14 Jan 2015

TL;DR: This work formulates the problem of inferring program properties as structured prediction and shows how to perform both learning and inference in this context, which opens up new possibilities for attacking a wide range of difficult problems in the context of "Big Code", including invariant generation, decompilation, synthesis, and others.

428

Journal Article, doi:10.1007/S10618-008-0118-X

Sourcerer: mining and searching internet-scale software repositories

Erik Linstead, +5 more

- 01 Apr 2009

- Data Mining and Knowledge Discovery

TL;DR: By combining software textual content with structural information captured by the CodeRank approach, this work is able to significantly improve software retrieval performance, increasing the area under the curve (AUC) retrieval metric to 0.92, roughly 10–30% better than previous approaches based on text alone.

279

Journal Article, doi:10.1007/S10664-015-9402-8

A survey on the use of topic models when mining software repositories

Tse-Hsun Chen, +2 more

- 01 Oct 2016

- Empirical Software Engineering

TL;DR: This paper surveys 167 articles from the software engineering literature that make use of topic models and provides a starting point for new researchers who are interested in using topic models, and may help new researchers and practitioners determine how to best apply topic models to a particular software engineering task.

219

Proceedings Article, doi:10.3115/1621829.1621835

Latent Dirichlet Allocation with Topic-in-Set Knowledge

David Andrzejewski, +1 more

- 04 Jun 2009

TL;DR: This work proposes a mechanism for adding partial supervision, called topic-in-set knowledge, to latent topic modeling, to encourage the recovery of topics which are more relevant to user modeling goals than the topics which would be recovered otherwise.

170

References

Lecture Notes in Computer Science 2382

Petrus Bollen

- 01 Jan 2002

36.7K

Journal Article, doi:10.5555/944919.944937

Latent dirichlet allocation

David M. Blei, +2 more

- 01 Mar 2003

- Journal of Machine Learning Research

TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.

36.2K

Proceedings Article

Latent Dirichlet Allocation

David M. Blei, +2 more

- 03 Jan 2001

TL;DR: This paper proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models, including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).

25.5K

Journal Article, doi:10.1073/PNAS.0307752101

Finding scientific topics

Thomas L. Griffiths, +1 more

- 06 Apr 2004

- Proceedings of the National Academy of S...

TL;DR: A generative model for documents is described, introduced by Blei, Ng, and Jordan, and a Markov chain Monte Carlo algorithm is presented for inference in this model, which is used to analyze abstracts from PNAS by using Bayesian model selection to establish the number of topics.

6.9K

Journal Article, doi:10.1080/01621459.1971.10482356

Objective Criteria for the Evaluation of Clustering Methods

William M. Rand

- 01 Dec 1971

- Journal of the American Statistical Asso...

TL;DR: This article proposes several criteria which isolate specific aspects of the performance of a method, such as its retrieval of inherent structure, its sensitivity to resampling and the stability of its results in the light of new data.

6.7K

Related Papers (5)

Latent dirichlet allocation

Finding scientific topics

Empirical evaluation of the tarantula automatic fault-localization technique

The author-topic model for authors and documents

Hierarchical Dirichlet Processes