Software defect prediction using Bayesian networks

doi:10.1007/S10664-012-9218-8

Journal Article

Ahmet Okutan, +1 more

- 01 Feb 2014

- Empirical Software Engineering

- Vol. 19, Iss: 1, pp 154-181

337

TL;DR: This work uses Bayesian networks to determine the probabilistic influential relationships among software metrics and defect proneness. It shows that response for class, lines of code, and lack of coding quality are the most effective metrics, whereas coupling between objects, weighted method per class, and lack of cohesion of methods are less effective on defect proneness.

Abstract: There are lots of different software metrics discovered and used for defect prediction in the literature. Instead of dealing with so many metrics, it would be practical and easy if we could determine the set of metrics that are most important and focus on them more to predict defectiveness. We use Bayesian networks to determine the probabilistic influential relationships among software metrics and defect proneness. In addition to the metrics used in Promise data repository, we define two more metrics, i.e. NOD for the number of developers and LOCQ for the source code quality. We extract these metrics by inspecting the source code repositories of the selected Promise data repository data sets. At the end of our modeling, we learn the marginal defect proneness probability of the whole software system, the set of most effective metrics, and the influential relationships among metrics and defectiveness. Our experiments on nine open source Promise data repository data sets show that response for class (RFC), lines of code (LOC), and lack of coding quality (LOCQ) are the most effective metrics whereas coupling between objects (CBO), weighted method per class (WMC), and lack of cohesion of methods (LCOM) are less effective metrics on defect proneness. Furthermore, number of children (NOC) and depth of inheritance tree (DIT) have very limited effect and are untrustworthy. On the other hand, based on the experiments on Poi, Tomcat, and Xalan data sets, we observe that there is a positive correlation between the number of developers (NOD) and the level of defectiveness. However, further investigation involving a greater number of projects is needed to confirm our findings.
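To make the idea of a "marginal defect proneness probability" concrete, here is a minimal sketch of exact inference in a tiny discrete Bayesian network. It is not the network learned in the paper: the structure (LOC influences RFC, and both influence defectiveness), the binary high/low discretization, and every probability value are hypothetical choices made only for illustration.

```python
from itertools import product

# Hypothetical structure: LOC -> RFC, and both LOC and RFC -> Defect.
# All probabilities below are invented for illustration; they are not
# taken from the paper or from any Promise data set.
P_LOC_HIGH = 0.3
P_RFC_HIGH_GIVEN_LOC = {True: 0.8, False: 0.2}  # keyed by "LOC is high"
P_DEFECT_GIVEN = {  # keyed by (LOC is high, RFC is high)
    (True, True): 0.6,
    (True, False): 0.3,
    (False, True): 0.4,
    (False, False): 0.1,
}


def bernoulli(p_true, value):
    """Probability of a binary variable taking `value` when P(True) = p_true."""
    return p_true if value else 1.0 - p_true


def joint(loc_high, rfc_high, defective):
    """Joint probability, factorized along the edges of the network."""
    return (
        bernoulli(P_LOC_HIGH, loc_high)
        * bernoulli(P_RFC_HIGH_GIVEN_LOC[loc_high], rfc_high)
        * bernoulli(P_DEFECT_GIVEN[(loc_high, rfc_high)], defective)
    )


def marginal_defect():
    """P(Defect = true), summing the joint over every metric state."""
    return sum(
        joint(loc, rfc, True) for loc, rfc in product([True, False], repeat=2)
    )


def posterior_rfc_high_given_defect():
    """P(RFC = high | Defect = true) by Bayes' rule."""
    numerator = sum(joint(loc, True, True) for loc in (True, False))
    return numerator / marginal_defect()


if __name__ == "__main__":
    print(f"P(defect) = {marginal_defect():.3f}")
    print(f"P(RFC high | defect) = {posterior_rfc_high_given_defect():.3f}")
```

Enumerating the joint distribution is only practical for a handful of binary variables. A network over the full metric set would normally be learned and queried with a dedicated toolkit rather than by hand.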

Citations

Journal Article•10.1016/J.ASOC.2014.11.023

A systematic review of machine learning techniques for software fault prediction

Ruchika Malhotra

- 01 Feb 2015

TL;DR: Machine learning techniques can predict software fault proneness and can be used by software practitioners and researchers. However, their application in software fault prediction is still limited, and more studies should be carried out to obtain well-formed and generalizable results.

637

Journal Article•10.1016/J.INFSOF.2014.07.005

Software defect prediction using ensemble learning on selected features

Issam H. Laradji, +2 more

- 01 Feb 2015

- Information & Software Technology

TL;DR: Tackling software data issues, including redundancy, correlation, feature irrelevance and missing samples, with the proposed combined learning model resulted in remarkable classification performance paving the way for successful quality control.

387

Journal Article•10.1016/J.INFSOF.2014.11.006

An empirical study on software defect prediction with a simplified metric set

Peng He, +4 more

- 01 Mar 2015

- Information & Software Technology

TL;DR: The experimental results indicate that the choice of training data for defect prediction should depend on the specific requirement of accuracy and the minimum metric subset can be identified to facilitate the procedure of general defect prediction with acceptable loss of prediction precision in practice.

325

Journal Article•10.1109/TSE.2018.2836442

A Comprehensive Investigation of the Role of Imbalanced Learning for Software Defect Prediction

Qinbao Song, +2 more

- 01 Dec 2019

- IEEE Transactions on Software Engineerin...

TL;DR: Imbalanced learning should only be considered for moderate or highly imbalanced SDP data sets and the appropriate combination of imbalanced method and classifier needs to be carefully chosen to ameliorate the imbalanced learning problem for SDP.

277

Journal Article•10.6703/IJASE.202012_17(4).331

Survey on software defect prediction techniques

Mahesh Kumar Thota, +2 more

- 01 Jan 2020

- International Journal of Applied Science...

TL;DR: This work plans to develop an efficient approach to software defect prediction using soft-computing-based machine learning techniques that help to optimize the features and learn them efficiently.

246
