About: Software regression is a research topic. Over the lifetime, 761 publications have been published within this topic receiving 28673 citations. The topic is also known as: Regression bug & Regression error.
TL;DR: This book brings together a number of procedures developed for regression problems in current use and includes material that either has not previously appeared in a textbook or if it has appeared is not generally available.
Abstract: This book brings together a number of procedures developed for regression problems in current use. Since the emphasis is on practical application, theoretical results are stated without proofs in many cases. This book provides a standard basic course in multiple linear regression, but it also includes material that either has not previously appeared in a textbook or, if it has appeared, is not generally available. Chapters 1 and 3 together provide a course in fitting a straight line without using matrix algebra at all. If chapter 2 is added, the idea of matrix representation of regression problems can be introduced as well. Chapter 4 covers 2 predictor variables and chapter 5 deals with more complicated models. Selecting the best regression equation is discussed in chapter 6. Chapter 7 covers specific problems. Chapters 8 and 9 discuss 1) multiple regression and mathematical model building and 2) multiple regression applied to analysis of variance problems. Chapter 10 contains an introduction to nonlinear estimation. The 2nd edition contains many new regression ideas and techniques. In particular, new computational algorithms and new software regression packages have made it very easy to investigate the adequacy of conjectured models with many different techniques.
TL;DR: This paper applies a machine learning algorithm to the open bug repository to learn the kinds of reports each developer resolves and reaches precision levels of 57% and 64% on the Eclipse and Firefox development projects respectively.
Abstract: Open source development projects typically support an open bug repository to which both developers and users can report bugs. The reports that appear in this repository must be triaged to determine if the report is one which requires attention and if it is, which developer will be assigned the responsibility of resolving the report. Large open source developments are burdened by the rate at which new bug reports appear in the bug repository. In this paper, we present a semi-automated approach intended to ease one part of this process, the assignment of reports to a developer. Our approach applies a machine learning algorithm to the open bug repository to learn the kinds of reports each developer resolves. When a new report arrives, the classifier produced by the machine learning technique suggests a small number of developers suitable to resolve the report. With this approach, we have reached precision levels of 57% and 64% on the Eclipse and Firefox development projects respectively. We have also applied our approach to the gcc open source development with less positive results. We describe the conditions under which the approach is applicable and also report on the lessons we learned about applying machine learning to repositories used in open source development.
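The assignment step described in this abstract can be illustrated with a small text classifier. The sketch below is not the authors' implementation: it assumes a multinomial naive Bayes model over whitespace-separated words, and the class name `TriageClassifier`, the developer names, and the sample reports are all hypothetical. It only shows the general idea of learning from resolved reports and suggesting a small number of developers for a new one.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep purely alphabetic words."""
    return [w for w in text.lower().split() if w.isalpha()]


class TriageClassifier:
    """Multinomial naive Bayes over report text; labels are developer names.

    Illustrative assumption only; the paper's actual learner and features
    are not specified in the abstract.
    """

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # developer -> word frequencies
        self.report_counts = Counter()           # developer -> resolved reports
        self.vocabulary = set()

    def train(self, resolved_reports):
        """resolved_reports: iterable of (report text, developer) pairs."""
        for text, developer in resolved_reports:
            words = tokenize(text)
            self.word_counts[developer].update(words)
            self.report_counts[developer] += 1
            self.vocabulary.update(words)

    def suggest(self, text, k=3):
        """Return up to k developers ranked by log-posterior score."""
        words = tokenize(text)
        total_reports = sum(self.report_counts.values())
        scores = {}
        for developer, n_reports in self.report_counts.items():
            counts = self.word_counts[developer]
            # Laplace smoothing so unseen words do not zero out the score.
            denom = sum(counts.values()) + len(self.vocabulary)
            score = math.log(n_reports / total_reports)
            for w in words:
                score += math.log((counts[w] + 1) / denom)
            scores[developer] = score
        return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical history of resolved reports.
history = [
    ("editor crashes when opening large file", "alice"),
    ("editor freezes on paste", "alice"),
    ("debugger breakpoint ignored in loop", "bob"),
    ("debugger step over skips line", "bob"),
    ("build fails with missing classpath entry", "carol"),
]
clf = TriageClassifier()
clf.train(history)
print(clf.suggest("debugger ignores breakpoint", k=2))
```

Returning a short ranked list rather than a single name matches the semi-automated setting in the abstract, where a human triager makes the final assignment.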
TL;DR: Using principal component analysis on the code metrics, this work built regression models that accurately predict the likelihood of post-release defects for new entities and can be generalized to arbitrary projects.
Abstract: What is it that makes software fail? In an empirical study of the post-release defect history of five Microsoft software systems, we found that failure-prone software entities are statistically correlated with code complexity measures. However, there is no single set of complexity metrics that could act as a universally best defect predictor. Using principal component analysis on the code metrics, we built regression models that accurately predict the likelihood of post-release defects for new entities. The approach can easily be generalized to arbitrary projects; in particular, predictors obtained from one project can also be significant for new, similar projects.
TL;DR: Initial empirical studies indicate that the technique can significantly reduce the cost of regression testing modified software and is at least as precise as other safe regression test selection algorithms.
Abstract: Regression testing is an expensive but necessary maintenance activity performed on modified software to provide confidence that changes are correct and do not adversely affect other portions of the software. A regression test selection technique chooses, from an existing test set, tests that are deemed necessary to validate modified software. We present a new technique for regression test selection. Our algorithms construct control flow graphs for a procedure or program and its modified version and use these graphs to select tests that execute changed code from the original test suite. We prove that, under certain conditions, the set of tests our technique selects includes every test from the original test suite that can expose faults in the modified procedure or program. Under these conditions our algorithms are safe. Moreover, although our algorithms may select some tests that cannot expose faults, they are at least as precise as other safe regression test selection algorithms. Unlike many other regression test selection algorithms, our algorithms handle all language constructs and all types of program modifications. We have implemented our algorithms; initial empirical studies indicate that our technique can significantly reduce the cost of regression testing modified software.
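The graph-based selection idea in this abstract can be sketched as a lockstep walk over the two control flow graphs. The code below is a simplified illustration under stated assumptions, not the published algorithm: each graph is a hypothetical dictionary mapping a node id to a `(statement text, {edge label: successor id})` pair, two nodes count as equivalent when their statement text is identical, both graphs share an `entry` node, and each test's trace is the set of original-graph edges it executed. The function names `dangerous_edges` and `select_tests` are invented for this example.

```python
def dangerous_edges(original, modified, entry="entry"):
    """Walk both CFGs in lockstep and return the original-graph edges
    whose targets differ between the two versions."""
    dangerous = set()
    visited = set()
    stack = [(entry, entry)]
    while stack:
        node, node_mod = stack.pop()
        if (node, node_mod) in visited:
            continue
        visited.add((node, node_mod))
        _, succs = original[node]
        _, succs_mod = modified[node_mod]
        for label, target in succs.items():
            target_mod = succs_mod.get(label)
            if target_mod is None or original[target][0] != modified[target_mod][0]:
                # Code reached through this edge changed: stop here and flag it.
                dangerous.add((node, label))
            else:
                stack.append((target, target_mod))
    return dangerous


def select_tests(edge_traces, dangerous):
    """Keep every test whose recorded trace crosses a dangerous edge."""
    return sorted(t for t, edges in edge_traces.items() if edges & dangerous)


# Hypothetical procedure: the "true" branch body changes between versions.
original = {
    "entry": ("entry", {"": "s1"}),
    "s1": ("if x > 0", {"T": "s2", "F": "s3"}),
    "s2": ("y = x + 1", {"": "exit"}),
    "s3": ("y = 0", {"": "exit"}),
    "exit": ("exit", {}),
}
modified = dict(original)
modified["s2"] = ("y = x + 2", {"": "exit"})

# Hypothetical edge traces recorded while running the original test suite.
edge_traces = {
    "t_pos": {("entry", ""), ("s1", "T"), ("s2", "")},
    "t_neg": {("entry", ""), ("s1", "F"), ("s3", "")},
}

changed = dangerous_edges(original, modified)
print(select_tests(edge_traces, changed))
```

In this example only the test that takes the true branch reaches the changed statement, so it is the only one selected. The walk stops at the first differing node on each path, which is why a test is selected based on the edge leading into changed code rather than on the changed node itself.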
TL;DR: The CUEZILLA prototype is a tool that measures the quality of new bug reports and recommends which elements should be added to improve the quality. The paper also discusses several recommendations for better bug tracking systems, which should focus on engaging bug reporters, better tool support, and improved handling of bug duplicates.
Abstract: In software development, bug reports provide crucial information to developers. However, these reports widely differ in their quality. We conducted a survey among developers and users of APACHE, ECLIPSE, and MOZILLA to find out what makes a good bug report. The analysis of the 466 responses revealed an information mismatch between what developers need and what users supply. Most developers consider steps to reproduce, stack traces, and test cases as helpful, which are at the same time most difficult to provide for users. Such insight is helpful to design new bug tracking tools that guide users at collecting and providing more helpful information. Our CUEZILLA prototype is such a tool and measures the quality of new bug reports; it also recommends which elements should be added to improve the quality. We trained CUEZILLA on a sample of 289 bug reports, rated by developers as part of the survey. In our experiments, CUEZILLA was able to predict the quality of 31–48% of bug reports accurately.
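A tool that recommends missing elements can be approximated with a simple presence check for the three elements the abstract names: steps to reproduce, stack traces, and test cases. The sketch below is a hypothetical checklist, not CUEZILLA itself, which according to the abstract is trained on developer-rated reports. The regular expressions, the `review_report` function, and the equal weighting of elements are all assumptions made for illustration.

```python
import re

# Hypothetical detectors for the elements developers rated as most helpful.
ELEMENT_PATTERNS = {
    "steps to reproduce": re.compile(
        r"steps to reproduce|^\s*\d+[.)]\s", re.I | re.M
    ),
    "stack trace": re.compile(
        r"^\s*at [\w.$]+\(.*\)|Traceback \(most recent call last\)", re.M
    ),
    "test case": re.compile(r"test ?case|@Test\b|\bdef test_", re.I),
}


def review_report(text):
    """Return (score, missing): score is the fraction of elements found,
    missing lists the elements the reporter could still add."""
    missing = [
        name
        for name, pattern in ELEMENT_PATTERNS.items()
        if not pattern.search(text)
    ]
    score = 1 - len(missing) / len(ELEMENT_PATTERNS)
    return score, missing


# Hypothetical report that has reproduction steps but nothing else.
report = (
    "Editor crashes on save.\n"
    "Steps to reproduce:\n"
    "1. Open a file\n"
    "2. Press save\n"
)
score, missing = review_report(report)
print(f"quality score: {score:.2f}; consider adding: {', '.join(missing)}")
```

A fixed checklist like this cannot reproduce the learned quality ratings reported in the abstract; it only shows how feedback of the form "consider adding a stack trace" could be generated while the reporter is still writing.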