Conference
Predictive Models in Software Engineering
About: Predictive Models in Software Engineering is an academic conference. The conference publishes majorly in the area(s): Computer science & Software. Over the lifetime, 153 publications have been published by the conference receiving 3995 citations.
Papers
12 Sep 2010
TL;DR: The results of this work makes next step towards defining formal methods of reuse defect prediction models by identifying groups of projects within which the same defect prediction model may be used.
Abstract: Background: This paper describes an analysis that was conducted on newly collected repository with 92 versions of 38 proprietary, open-source and academic projects. A preliminary study perfomed before showed the need for a further in-depth analysis in order to identify project clusters.Aims: The goal of this research is to perform clustering on software projects in order to identify groups of software projects with similar characteristic from the defect prediction point of view. One defect prediction model should work well for all projects that belong to such group. The existence of those groups was investigated with statistical tests and by comparing the mean value of prediction efficiency.Method: Hierarchical and k-means clustering, as well as Kohonen's neural network was used to find groups of similar projects. The obtained clusters were investigated with the discriminant analysis. For each of the identified group a statistical analysis has been conducted in order to distinguish whether this group really exists. Two defect prediction models were created for each of the identified groups. The first one was based on the projects that belong to a given group, and the second one - on all the projects. Then, both models were applied to all versions of projects from the investigated group. If the predictions from the model based on projects that belong to the identified group are significantly better than the all-projects model (the mean values were compared and statistical tests were used), we conclude that the group really exists.Results: Six different clusters were identified and the existence of two of them was statistically proven: 1) cluster proprietary B -- T=19, p=0.035, r=0.40; 2) cluster proprietary/open - t(17)=3.18, p=0.05, r=0.59. The obtained effect sizes (r) represent large effects according to Cohen's benchmark, which is a substantial finding.Conclusions: The two identified clusters were described and compared with results obtained by other researchers. The results of this work makes next step towards defining formal methods of reuse defect prediction models by identifying groups of projects within which the same defect prediction model may be used. Furthermore, a method of clustering was suggested and applied.
602 citations
17 Sep 2014
TL;DR: Subjective estimation techniques, e.g. expert judgment-based techniques, planning poker or the use case points method are the one used the most in agile effort estimation studies, thus suggesting numerous possible avenues for future work.
Abstract: Context: Ever since the emergence of agile methodologies in 2001, many software companies have shifted to Agile Software Development (ASD), and since then many studies have been conducted to investigate effort estimation within such context; however to date there is no single study that presents a detailed overview of the state of the art in effort estimation for ASD. Objectives: The aim of this study is to provide a detailed overview of the state of the art in the area of effort estimation in ASD. Method: To report the state of the art, we conducted a systematic literature review in accordance with the guidelines proposed in the evidence-based software engineering literature. Results: A total of 25 primary studies were selected; the main findings are: i) Subjective estimation techniques (e.g. expert judgment, planning poker, use case points estimation method) are the most frequently applied in an agile context; ii) Use case points and story points are the most frequently used size metrics respectively; iii) MMRE (Mean Magnitude of Relative Error) and MRE (Magnitude of Relative Error) are the most frequently used accuracy metrics; iv) team skills, prior experience and task size are cited as the three important cost drivers for effort estimation in ASD; and v) Extreme Programming (XP) and SCRUM are the only two agile methods that are identified in the primary studies. Conclusion: Subjective estimation techniques, e.g. expert judgment-based techniques, planning poker or the use case points method, are the one used the most in agile effort estimation studies. As for the size metrics, the ones that were used the most in the primary studies were story points and use case points. Several research gaps were identified, relating to the agile methods, size metrics and cost drivers, thus suggesting numerous possible avenues for future work.
222 citations
9 Oct 2013
TL;DR: The results show that the proposed distance-based strategies for the selection of training data based on distributional characteristics of the available data improves the achieved success rate of cross-project defect predictions significantly and the quality of the results still cannot compete with within- project defect prediction.
Abstract: Software defect prediction has been a popular research topic in recent years and is considered as a means for the optimization of quality assurance activities. Defect prediction can be done in a within-project or a cross-project scenario. The within-project scenario produces results with a very high quality, but requires historic data of the project, which is often not available. For the cross-project prediction, the data availability is not an issue as data from other projects is readily available, e.g., in repositories like PROMISE. However, the quality of the defect prediction results is too low for practical use. Recent research showed that the selection of appropriate training data can improve the quality of cross-project defect predictions. In this paper, we propose distance-based strategies for the selection of training data based on distributional characteristics of the available data. We evaluate the proposed strategies in a large case study with 44 data sets obtained from 14 open source projects. Our results show that our training data selection strategy improves the achieved success rate of cross-project defect predictions significantly. However, the quality of the results still cannot compete with within-project defect prediction.
172 citations
20 Sep 2011
TL;DR: It is found that the time of the filing of a bug and its location are the most important attributes in the Mozilla project for determining the fix-time of a bugs.
Abstract: Background: Bug fixing lies at the core of most software maintenance efforts. Most prior studies examine the effort needed to fix a bug (fix-effort). However, the effort needed to fix a bug may not correlate with the calendar time needed to fix it (fix-time). For example, the fix-time for bugs with low fix-effort may be long if they are considered to be of low priority.Aims: We study the fix-time for bugs in large open source projects.Method: We study the fix-time along three dimensions: (1) the location of the bug (e.g., which component), (2) the reporter of the bug, and (3) the description of the bug. Using these three dimensions and their associated attributes, we examine the fix-time for bugs in two large open source projects: Eclipse and Mozilla, using a random forest classifier.Results: We show that we can correctly classify ~65% of the time the fix-time for bugs in these projects. We perform a sensitivity analysis to identify the most important attributes in each dimension. We find that the time of the filing of a bug and its location are the most important attributes in the Mozilla project for determining the fix-time of a bug. On the other hand, the fix-time in the Eclipse project is highly dependant on the severity of the bug. Surprisingly, the priority of the bug is not an important attribute when determining the fix-time for a bug in both projects.Conclusion: Attributes affecting the fix-time vary between projects and vary over time within the same project.
138 citations
21 Oct 2015
TL;DR: A dataset extracted from the Jira ITS of four popular open source ecosystems (as well as the tools and infrastructure used for extraction) the Apache Software Foundation, Spring, JBoss and CodeHaus communities is presented, to deeply study the communication process among developers, and how this aspect affects the development process.
Abstract: Issue tracking systems store valuable data for testing hypotheses concerning maintenance, building statistical prediction models and recently investigating developers "affectiveness". In particular, the Jira Issue Tracking System is a proprietary tracking system that has gained a tremendous popularity in the last years and offers unique features like the project management system and the Jira agile kanban board. This paper presents a dataset extracted from the Jira ITS of four popular open source ecosystems (as well as the tools and infrastructure used for extraction) the Apache Software Foundation, Spring, JBoss and CodeHaus communities. Our dataset hosts more than 1K projects, containing more than 700K issue reports and more than 2 million issue comments. Using this data, we have been able to deeply study the communication process among developers, and how this aspect affects the development process. Furthermore, comments posted by developers contain not only technical information, but also valuable information about sentiments and emotions. Since sentiment analysis and human aspects in software engineering are gaining more and more importance in the last years, with this repository we would like to encourage further studies in this direction.
99 citations
Performance Metrics
| Year | Papers |
|---|---|
| 2021 | 5 |
| 2020 | 7 |
| 2019 | 11 |
| 2018 | 11 |
| 2017 | 12 |
| 2016 | 10 |