TL;DR: The results indicate that the nature of changes (in particular changes related to refactorings), the software design, and the number of active developers are factors related to change entropy.
Abstract: Context: Software systems continuously change for various reasons, such as adding new features, fixing bugs, or refactoring. Changes may either increase the source code complexity and disorganization, or help to reduce it. Aim: This paper empirically investigates the relationship of source code complexity and disorganization (measured using source code change entropy) with four factors, namely the presence of refactoring activities, the number of developers working on a source code file, the participation of classes in design patterns, and the different kinds of changes occurring on the system, classified in terms of their topics extracted from commit notes. Method: We carried out an exploratory study on an interval of the lifetime of four open source systems, namely ArgoUML, Eclipse-JDT, Mozilla, and Samba, with the aim of analyzing the relationship between the source code change entropy and four factors: refactoring activities, number of contributors for a file, participation of classes in design patterns, and change topics. Results: The study shows that (i) the change entropy decreases after refactoring, (ii) files changed by a higher number of developers tend to exhibit a higher change entropy than others, (iii) classes participating in certain design patterns exhibit a higher change entropy than others, and (iv) changes related to different topics exhibit different change entropy; for example, bug fixes exhibit a limited change entropy while changes introducing new features exhibit a high change entropy. Conclusions: Results provided in this paper indicate that the nature of changes (in particular changes related to refactorings), the software design, and the number of active developers are factors related to change entropy. Our findings contribute to understanding the software aging phenomenon and are preliminary to identifying better ways to counteract it.
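Illustration: the abstract above does not spell out how change entropy is computed. A common basic formulation is the Shannon entropy of the distribution of changes over files within a time period, and the minimal sketch below assumes that formulation; it is not necessarily the exact measure used in the study. The function name `change_entropy`, the `n_files` parameter, and the sample file names are hypothetical.

```python
import math
from collections import Counter


def change_entropy(changed_files, n_files=None):
    """Shannon entropy (in bits) of changes across files in one period.

    changed_files: iterable with one file name per recorded change.
    n_files: optional total number of files in the system (assumption);
             when given and > 1, the result is divided by log2(n_files)
             so that it falls in the range [0, 1].
    """
    counts = Counter(changed_files)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # p_i = share of the period's changes that touched file i
    entropy = sum((c / total) * math.log2(total / c) for c in counts.values())
    if n_files is not None and n_files > 1:
        entropy /= math.log2(n_files)
    return entropy


# Changes spread evenly over four files: maximal scattering for four files.
scattered = change_entropy(["A.java", "B.java", "C.java", "D.java"])
# All changes concentrated in a single file: no scattering.
focused = change_entropy(["A.java"] * 4)
```

Under this formulation, a period in which changes are spread over many files scores higher than one in which the same number of changes is concentrated in a few files.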
TL;DR: In this paper, wavelet transforms are used to define a Suspiciously Structured Entropic Change Score (SSECS), a scalar feature that quantifies the suspiciousness of a file based on its distribution of entropic energy across multiple levels of spatial resolution.
TL;DR: An integrated, quality-driven and tool-supported methodology to support object-oriented software evolution based on the novel concept of "correction strategies", which serve as reference descriptions that enable a human-assisted tool to plan and perform all necessary steps for the safe removal of detected design flaws.
Abstract: Software inevitably changes. As a consequence, we observe the phenomenon referred to as "software entropy" or "software decay": the software design continually degrades making maintenance and functional extensions overly costly if not impossible. There exist a number of approaches to identify design flaws (problem detection) and to remedy them (refactoring). There is, however, a conceptual gap between these two stages: There is no appropriate support for the automated mapping of design flaws to possible solutions. Here we propose an integrated, quality-driven and tool-supported methodology to support object-oriented software evolution. Our approach is based on the novel concept of "correction strategies". Correction strategies serve as reference descriptions that enable a human-assisted tool to plan and perform all necessary steps for the safe removal of detected design flaws, with special concern towards the targeted quality goals of the restructuring process. We briefly sketch our tool chain and illustrate our approach with the help of a medium-sized real-world case-study.
TL;DR: This paper uses the metrics derived using entropy of changes to compare five machine learning techniques, namely Gene Expression Programming (GEP), General Regression Neural Network, Locally Weighted Regression, Support Vector Regression and Least Median Square Regression for predicting bugs.
Abstract: There are many approaches for predicting bugs in software systems. A popular approach for bug prediction is using entropy of changes as proposed by Hassan (2009). This paper uses the metrics derived using entropy of changes to compare five machine learning techniques, namely Gene Expression Programming (GEP), General Regression Neural Network, Locally Weighted Regression, Support Vector Regression (SVR) and Least Median Square Regression for predicting bugs. Four software subsystems (mozilla/layout/generic, mozilla/layout/forms, apache/httpd/modules/ssl and apache/httpd/modules/mappers) are used for validation. The extraction of the validation data is automated by an algorithm that employs web scraping and regular expressions. The study suggests GEP and SVR as stable regression techniques for bug prediction using entropy of changes.
TL;DR: In this article, each file is represented as an entropy time series, and a wavelet transform is applied to that series to generate an energy spectrum characterizing the amount of entropic energy at multiple scales of code resolution; the energy spectrum is then used to determine, for each file, whether or not the file is likely to be malicious.
Abstract: A plurality of data files is received. Thereafter, each file is represented as an entropy time series that reflects an amount of entropy across locations in code for such file. A wavelet transform is applied, for each file, to the corresponding entropy time series to generate an energy spectrum characterizing, for the file, an amount of entropic energy at multiple scales of code resolution. It can then be determined, for each file, whether or not the file is likely to be malicious based on the energy spectrum. Related apparatus, systems, techniques and articles are also described.
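Illustration: the pipeline described above (file, then entropy time series, then wavelet energy spectrum) can be sketched as follows. This is a minimal sketch under stated assumptions, not the method claimed in the source: the fixed 256-byte chunk size, the choice of the Haar wavelet, truncation of the series to a power-of-two length, and all function names are assumptions made for illustration. The final classification step is left out because the abstract does not describe it.

```python
import math
from collections import Counter


def chunk_entropy(chunk: bytes) -> float:
    """Shannon entropy of the byte values in one chunk, in bits (0 to 8)."""
    n = len(chunk)
    return sum((c / n) * math.log2(n / c) for c in Counter(chunk).values())


def entropy_series(data: bytes, chunk_size: int = 256) -> list:
    """Entropy of consecutive non-overlapping chunks (chunk size is an assumption)."""
    last_start = len(data) - chunk_size
    return [chunk_entropy(data[i:i + chunk_size])
            for i in range(0, last_start + 1, chunk_size)]


def haar_energy_spectrum(series) -> list:
    """Energy of the Haar detail coefficients per level, finest level first.

    The series is truncated to the largest power-of-two length (assumption).
    """
    n = 1 << (len(series).bit_length() - 1) if series else 0
    approx = list(series[:n])
    energies = []
    while len(approx) > 1:
        next_approx, detail = [], []
        for a, b in zip(approx[0::2], approx[1::2]):
            next_approx.append((a + b) / math.sqrt(2))  # smoothed signal
            detail.append((a - b) / math.sqrt(2))       # change at this scale
        energies.append(sum(d * d for d in detail))
        approx = next_approx
    return energies


# Synthetic input: a uniform high-entropy chunk followed by a constant chunk.
sample = bytes(range(256)) + b"\x00" * 256
series = entropy_series(sample)
spectrum = haar_energy_spectrum(series)
```

Each entry of the spectrum summarizes how much the entropy changes at one scale: rapid alternation between low- and high-entropy chunks concentrates energy at the finest level, while a few large regions of differing entropy concentrate it at coarser levels.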