Mining version histories to guide software changes
Thomas Zimmermann,P. Weibgerber,Stephan Diehl,Andreas Zeller +3 more
- 23 May 2004
- pp 563-572
TL;DR: The ROSE prototype can correctly predict further locations to be changed and show up item coupling that is undetectable by program analysis, and can prevent errors due to incomplete changes.
read more
Abstract: We apply data mining to version histories in order to guide programmers along related changes: "Programmers who changed these functions also changed. . . ". Given a set of existing changes, such rules (a) suggest and predict likely further changes, (b) show up item coupling that is indetectable by program analysis, and (c) prevent errors due to incomplete changes. After an initial change, our ROSE prototype can correctly predict 26% of further files to be changed - and 15% of the precise functions or variables. The topmost three suggestions contain a correct location with a likelihood of 64%.
read more
Chat with Paper
AI Agents for this Paper
Find similar papers on Google Scholar, PubMed and Arxiv
Write a critical review of this paper
Analyze citations of this paper to find unaddressed research gaps
Citations
On the naturalness of software
TL;DR: The conjecture that most software is also natural - in the sense that it is created by humans at work, with all the attendant constraints and limitations - and thus, like natural language, it is also likely to be repetitive and predictable is investigated.
939
Mining metrics to predict component failures
Nachiappan Nagappan,Thomas Ball,Andreas Zeller +2 more
- 28 May 2006
TL;DR: Using principal component analysis on the code metrics, this work built regression models that accurately predict the likelihood of post-release defects for new entities and can be generalized to arbitrary projects.
Classifying Software Changes: Clean or Buggy?
TL;DR: A description of the change classification approach, techniques for extracting features from the source code and change histories, a characterization of the performance of change classification across 12 open source projects, and an evaluation of the predictive power of different groups of features.
Predicting source code changes by mining change history
TL;DR: An approach that applies data mining techniques to determine change patterns can be used to recommend potentially relevant source code to a developer performing a modification task and can reveal valuable dependencies by applying to the Eclipse and Mozilla open source projects.
Free/Libre open-source software development: What we know and what we do not know
TL;DR: This work develops a framework for organizing the literature based on the input-mediator-output-input (IMOI) model from the small groups literature, and suggests topics for future research.
References
•Book
Data Mining: Concepts and Techniques
Jiawei Han,Micheline Kamber,Jian Pei +2 more
- 08 Sep 2000
TL;DR: This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.
•Book
Introduction to Modern Information Retrieval
Gerard Salton,Michael J. McGill +1 more
- 01 Jan 1983
TL;DR: Reading is a need and a hobby at once and this condition is the on that will make you feel that you must read.
12.6K
•Proceedings Article
Fast Algorithms for Mining Association Rules in Large Databases
Rakesh Agrawal,Ramakrishnan Srikant +1 more
- 12 Sep 1994
TL;DR: Two new algorithms for solving thii problem that are fundamentally different from the known algorithms are presented and empirical evaluation shows that these algorithms outperform theknown algorithms by factors ranging from three for small problems to more than an order of magnitude for large problems.
•Proceedings Article
Fast algorithms for mining association rules
Rakesh Agrawal,Ramakrishnan Srikant +1 more
- 01 Jul 1998
TL;DR: Two new algorithms for solving thii problem that are fundamentally different from the known algorithms are presented and empirical evaluation shows that these algorithms outperform theknown algorithms by factors ranging from three for small problems to more than an order of magnitude for large problems.
Mining frequent patterns without candidate generation
Jiawei Han,Jian Pei,Yiwen Yin +2 more
- 16 May 2000
TL;DR: This study proposes a novel frequent pattern tree (FP-tree) structure, which is an extended prefix-tree structure for storing compressed, crucial information about frequent patterns, and develops an efficient FP-tree-based mining method, FP-growth, for mining the complete set of frequent patterns by pattern fragment growth.