Automatically classifying software changes via discriminative topic model

doi:10.1016/J.JSS.2015.12.019

Journal Article•10.1016/J.JSS.2015.12.019

Meng Yan, +5 more

- 01 Mar 2016

- Journal of Systems and Software

- Vol. 113, pp 296-308

59

TL;DR: This paper proposes a discriminative Probability Latent Semantic Analysis (DPLSA) model with a novel initialization method that uses labeled samples to initialize the word distributions of the different topics, which makes DPLSA well suited to cross-project software change message analysis.
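
The initialization idea in the TL;DR can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes one topic per change category, tokenized change messages, and additive smoothing, and the function name, toy messages, and category labels are all hypothetical.

```python
from collections import Counter


def init_topic_word_distributions(messages, labels, smoothing=1.0):
    """Return {label: {word: P(word | topic)}} estimated from labeled messages.

    Assumptions (not taken from the paper): one topic per label, and
    additive smoothing so no vocabulary word starts with zero probability.
    """
    # Shared vocabulary across all labeled messages.
    vocab = sorted({word for msg in messages for word in msg})

    # Count word occurrences separately for each label.
    counts = {}
    for msg, label in zip(messages, labels):
        counts.setdefault(label, Counter()).update(msg)

    # Normalize the smoothed counts into one distribution per label.
    dists = {}
    for label, counter in counts.items():
        total = sum(counter.values()) + smoothing * len(vocab)
        dists[label] = {w: (counter[w] + smoothing) / total for w in vocab}
    return dists


# Hypothetical toy data: tokenized change messages with made-up labels.
messages = [["fix", "null", "pointer"], ["add", "new", "feature"], ["fix", "crash"]]
labels = ["corrective", "perfective", "corrective"]
topic_word = init_topic_word_distributions(messages, labels)
```

In a PLSA-style model, distributions like these would serve only as the starting point for the topic-word parameters, which an EM procedure would then refine.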

Citations

•Proceedings Article•10.1145/3238147.3238190

Neural-machine-translation-based commit message generation: how far are we?

Zhongxin Liu, +5 more

- 03 Sep 2018

TL;DR: A simpler and faster approach is proposed, named NNGen (Nearest Neighbor Generator), to generate concise commit messages using the nearest neighbor algorithm, which is over 2,600 times faster than NMT, and outperforms NMT in terms of BLEU by 21%.

284

•Proceedings Article•10.1109/ASE.2019.00026

Automatic generation of pull request descriptions

Zhongxin Liu, +4 more

- 10 Nov 2019

TL;DR: This paper proposes an approach that automatically generates PR descriptions from the commit messages and the added source code comments in the PRs, using a sequence-to-sequence model.

100

•Journal Article•10.1109/TSE.2018.2831232

Automating Change-Level Self-Admitted Technical Debt Determination

Meng Yan, +5 more

- 01 Dec 2019

- IEEE Transactions on Software Engineerin...

TL;DR: The experimental results show that the proposed change-level SATD determination model achieves promising performance, outperforming four baselines in terms of AUC and cost-effectiveness, and that “Diffusion” is the most discriminative of the three feature dimensions for determining TD-introducing changes.

95

•Proceedings Article•10.1145/3510003.3510205

What Makes a Good Commit Message?

Ying-Jun Tian, +4 more

- 07 Feb 2022

TL;DR: A taxonomy based on recurring patterns in commit messages' expressions is developed, investigating whether “good” commit messages can be automatically identified and whether such automation could prompt developers to write better commit messages.

73

•Journal Article•10.1016/J.ESWA.2020.114176

How we refactor and how we document it? On the use of supervised machine learning algorithms to classify refactoring documentation

Eman Abdullah AlOmar, +5 more

- 01 Apr 2021

- Expert Systems With Applications

TL;DR: The results of the empirical investigation show that fixing code smells is not the main driver for developers to refactor their code bases, and this classification challenges the original definition of refactoring as being exclusive to improving software design and fixing code smells.

66

References

•Journal Article•10.1111/J.2517-6161.1977.TB01600.X

Maximum likelihood from incomplete data via the EM algorithm

Arthur P. Dempster, +2 more

- 01 Sep 1977

- Journal of the royal statistical society...

55.2K

•Journal Article•10.5555/944919.944937

Latent dirichlet allocation

David M. Blei, +2 more

- 01 Mar 2003

- Journal of Machine Learning Research

TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.

36.2K

•Proceedings Article

Latent Dirichlet Allocation

David M. Blei, +2 more

- 03 Jan 2001

TL;DR: This paper proposed a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).

25.5K

•Journal Article•10.1145/219717.219748

WordNet: a lexical database for English

George A. Miller

- 01 Nov 1995

- Communications of The ACM

TL;DR: WordNet provides a more effective combination of traditional lexicographic information and modern computing, and is an online lexical database designed for use under program control.

16.9K

•Journal Article•10.1109/TPAMI.2008.79

Robust Face Recognition via Sparse Representation

John Wright, +4 more

- 01 Feb 2009

- IEEE Transactions on Pattern Analysis an...

TL;DR: This work considers the problem of automatically recognizing human faces from frontal views with varying expression and illumination, as well as occlusion and disguise, and proposes a general classification algorithm for (image-based) object recognition based on a sparse representation computed by ℓ1-minimization.

10.5K