Journal Article (DOI: 10.1007/S10664-012-9218-8)
Software defect prediction using Bayesian networks
Ahmet Okutan, Olcay Taner Yildiz +1 more
TL;DR: This work uses Bayesian networks to determine the probabilistic influential relationships among software metrics and defect proneness. It shows that response for class, lines of code, and lack of coding quality are the most effective metrics, whereas coupling between objects, weighted method per class, and lack of cohesion of methods are less effective metrics on defect proneness.
Abstract: Many different software metrics have been proposed and used for defect prediction in the literature. Instead of dealing with so many metrics, it would be practical and easier to determine the set of most important metrics and focus on them to predict defectiveness. We use Bayesian networks to determine the probabilistic influential relationships among software metrics and defect proneness. In addition to the metrics used in the Promise data repository, we define two more metrics: NOD for the number of developers and LOCQ for source code quality. We extract these metrics by inspecting the source code repositories of the selected Promise data sets. At the end of our modeling, we learn the marginal defect proneness probability of the whole software system, the set of most effective metrics, and the influential relationships among metrics and defectiveness. Our experiments on nine open source Promise data sets show that response for class (RFC), lines of code (LOC), and lack of coding quality (LOCQ) are the most effective metrics, whereas coupling between objects (CBO), weighted method per class (WMC), and lack of cohesion of methods (LCOM) are less effective metrics on defect proneness. Furthermore, number of children (NOC) and depth of inheritance tree (DIT) have very limited effect and are untrustworthy. On the other hand, based on experiments on the Poi, Tomcat, and Xalan data sets, we observe a positive correlation between the number of developers (NOD) and the level of defectiveness. However, further investigation involving a greater number of projects is needed to confirm these findings.
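The abstract describes learning a marginal defect probability and ranking metrics by how strongly they influence defect proneness. The sketch below illustrates that idea on the smallest possible Bayesian network, a single metric node pointing at a defect node, under several assumptions that are not taken from the paper: the toy rows are invented, each metric is split at its median into hypothetical "low"/"high" bins, conditional probabilities are plain relative frequencies, and mutual information is swapped in as the influence score. It is not the authors' procedure, which learns a network over all metrics jointly.

```python
from collections import Counter
from math import log2
from statistics import median

# Hypothetical toy data: one row per class, two metrics plus a defect label.
# These values are invented for illustration; they are NOT from the Promise data sets.
ROWS = [
    {"RFC": 12, "LOC": 150, "defective": False},
    {"RFC": 45, "LOC": 900, "defective": True},
    {"RFC": 8, "LOC": 80, "defective": False},
    {"RFC": 60, "LOC": 1200, "defective": True},
    {"RFC": 30, "LOC": 300, "defective": False},
    {"RFC": 52, "LOC": 200, "defective": True},
]


def discretize(rows, metric):
    """Assumed binning: values above the metric's median are 'high', the rest 'low'."""
    cut = median(r[metric] for r in rows)
    return ["high" if r[metric] > cut else "low" for r in rows]


def marginal_defect_probability(rows):
    """P(defective = True), estimated as a relative frequency."""
    return sum(r["defective"] for r in rows) / len(rows)


def conditional_table(rows, metric):
    """CPT of the two-node network metric -> defective: P(defective = True | bin)."""
    bins = discretize(rows, metric)
    table = {}
    for b in sorted(set(bins)):
        labels = [r["defective"] for r, rb in zip(rows, bins) if rb == b]
        table[b] = sum(labels) / len(labels)
    return table


def mutual_information(rows, metric):
    """I(metric bin; defective) in bits, used here as a stand-in influence score."""
    bins = discretize(rows, metric)
    labels = [r["defective"] for r in rows]
    n = len(rows)
    joint = Counter(zip(bins, labels))
    p_bin = Counter(bins)
    p_label = Counter(labels)
    mi = 0.0
    for (b, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * log2(p_xy / ((p_bin[b] / n) * (p_label[y] / n)))
    return mi


def rank_metrics(rows, metrics):
    """Order metrics from most to least influential under the stand-in score."""
    return sorted(metrics, key=lambda m: mutual_information(rows, m), reverse=True)


if __name__ == "__main__":
    print("P(defective) =", marginal_defect_probability(ROWS))
    for m in ("RFC", "LOC"):
        print(m, conditional_table(ROWS, m), round(mutual_information(ROWS, m), 3))
    print("ranking:", rank_metrics(ROWS, ["LOC", "RFC"]))
```

On this invented data RFC separates defective from clean classes perfectly, so it ranks above LOC; that ordering says nothing about the paper's results. A full treatment would learn edges among all metrics rather than scoring each metric in isolation.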
Citations
A systematic review of machine learning techniques for software fault prediction
Ruchika Malhotra
01 Feb 2015
TL;DR: Machine learning techniques are able to predict software fault proneness and can be used by software practitioners and researchers; however, their application in software fault prediction is still limited, and more studies should be carried out to obtain well-formed and generalizable results.
637
Software defect prediction using ensemble learning on selected features
TL;DR: Tackling software data issues, including redundancy, correlation, feature irrelevance, and missing samples, with the proposed combined learning model resulted in remarkable classification performance, paving the way for successful quality control.
387
An empirical study on software defect prediction with a simplified metric set
TL;DR: The experimental results indicate that the choice of training data for defect prediction should depend on the specific accuracy requirement, and that a minimum metric subset can be identified to facilitate general defect prediction with an acceptable loss of prediction precision in practice.
A Comprehensive Investigation of the Role of Imbalanced Learning for Software Defect Prediction
TL;DR: Imbalanced learning should only be considered for moderately or highly imbalanced SDP data sets, and the appropriate combination of imbalanced learning method and classifier needs to be carefully chosen to ameliorate the imbalanced learning problem for SDP.
Survey on software defect prediction techniques
TL;DR: This work plans to develop an efficient approach to software defect prediction using soft-computing-based machine learning techniques that help optimize the features and learn them efficiently.