TL;DR: This paper presents ChangeScribe, an approach designed to generate commit messages automatically from change sets by taking into account the commit stereotype, the type of changes, and the impact set of the underlying changes.
Abstract: Although version control systems allow developers to describe and explain the rationale behind code changes in commit messages, the state of practice indicates that most of the time such commit messages are either very short or even empty. In fact, a recent study of 23K+ Java projects found that only 10% of the messages are descriptive and that over 66% of those messages contain fewer words than a typical English sentence (i.e., 15-20 words). However, accurate and complete commit messages summarizing software changes are important to support a number of development and maintenance tasks. In this paper we present an approach, called ChangeScribe, which is designed to generate commit messages automatically from change sets. ChangeScribe generates natural language commit messages by taking into account the commit stereotype, the type of changes (e.g., file renames, changes done only to property files), as well as the impact set of the underlying changes. We evaluated ChangeScribe in a survey involving 23 developers in which the participants analyzed automatically generated commit messages from real changes and compared them with commit messages written by the original developers of six open source systems. The results demonstrate that the messages generated automatically by ChangeScribe are preferred in about 62% of the cases for large commits, and about 54% for small commits.
TL;DR: This study investigates semantic versioning, a versioning scheme which provides strict rules on major versus minor and patch releases, and analyzes seven years of library release history in Maven Central to find that around one third of all releases introduce at least one breaking change.
Abstract: For users of software libraries or public programming interfaces (APIs), backward compatibility is a desirable trait. Without compatibility, library users will face increased risk and cost when upgrading their dependencies. In this study, we investigate semantic versioning, a versioning scheme which provides strict rules on major versus minor and patch releases. We analyze seven years of library release history in Maven Central, and contrast version identifiers with actual incompatibilities. We find that around one third of all releases introduce at least one breaking change, and that this figure is the same for minor and major releases, indicating that version numbers do not provide developers with information on the stability of interfaces. Additionally, we find that adherence to semantic versioning principles has only marginally increased over time. We also investigate the use of deprecation tags and find that methods are deleted without first being tagged as deprecated, and that methods with deprecated tags are never deleted. We conclude the paper by arguing that adherence to semantic versioning principles should increase because it provides users of an interface with a way to determine the amount of rework that is expected when upgrading to a new version.
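To make the versioning rule concrete, here is a minimal Python sketch that flags a release as violating semantic versioning when it contains a breaking change but is not a major release. It assumes plain three-part `MAJOR.MINOR.PATCH` identifiers (real Maven versions can carry qualifiers such as `-SNAPSHOT`, which this sketch does not handle), treats `0.y.z` releases as exempt as the Semantic Versioning 2.0.0 specification allows, and leaves the detection of breaking changes to a separate compatibility analysis. It is an illustration, not the study's analysis pipeline.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse a plain MAJOR.MINOR.PATCH identifier (no qualifiers)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def release_kind(old: str, new: str) -> str:
    """Classify the step from `old` to `new` as a major, minor or patch release."""
    o, n = parse_version(old), parse_version(new)
    if n[0] > o[0]:
        return "major"
    if n[0] == o[0] and n[1] > o[1]:
        return "minor"
    if n[:2] == o[:2] and n[2] > o[2]:
        return "patch"
    raise ValueError(f"{new} does not follow {old}")


def violates_semver(old: str, new: str, has_breaking_change: bool) -> bool:
    """Breaking changes are only allowed in major releases (or during 0.y.z)."""
    kind = release_kind(old, new)
    if parse_version(old)[0] == 0:
        return False  # major version zero: anything may change
    return has_breaking_change and kind != "major"


print(violates_semver("1.4.2", "1.5.0", has_breaking_change=True))  # True
print(violates_semver("1.4.2", "2.0.0", has_breaking_change=True))  # False
```

Under this rule, a minor release that removes a public method counts as a violation, which is the kind of mismatch between version identifiers and actual incompatibilities the study measures.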
TL;DR: A study that investigates fine-grained co-evolution patterns of production and test code and provides insights into their appearance along the history of the analyzed software systems to provide a better understanding of how test code evolves.
Abstract: Numerous software development practices suggest updating the test code whenever the production code is changed. However, previous studies have shown that co-evolving test and production code is generally a difficult task that needs to be thoroughly investigated. In this paper we perform a study that, following a mixed methods approach, investigates fine-grained co-evolution patterns of production and test code. First, we mine fine-grained changes from the evolution of 5 open-source systems. Then, we use an association rule mining algorithm to generate the co-evolution patterns. Finally, we interpret the obtained patterns by performing a qualitative analysis. The results show 6 co-evolution patterns and provide insights into their appearance along the history of the analyzed software systems. Besides providing a better understanding of how test code evolves, these findings also help identify gaps in the test code thereby assisting both researchers and developers.
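The association-rule step can be illustrated with a small Python sketch that mines single-antecedent rules such as "production method added => test method added" from per-commit sets of fine-grained change types, keeping rules that meet minimum support and confidence thresholds. The change-type labels, the thresholds, and the `pair_rules` helper are hypothetical, and the sketch enumerates item pairs directly rather than reproducing the specific mining algorithm used in the study.

```python
from itertools import permutations
from typing import List, Set, Tuple

Rule = Tuple[str, str, float, float]  # (antecedent, consequent, support, confidence)


def pair_rules(commits: List[Set[str]], min_support: float,
               min_confidence: float) -> List[Rule]:
    """Mine rules A => B where A and B are single change types."""
    n = len(commits)
    items = sorted(set().union(*commits))
    rules: List[Rule] = []
    for a, b in permutations(items, 2):
        count_a = sum(1 for c in commits if a in c)
        count_ab = sum(1 for c in commits if a in c and b in c)
        if count_a == 0:
            continue
        support = count_ab / n           # fraction of commits containing both
        confidence = count_ab / count_a  # estimate of P(B changed | A changed)
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


# Hypothetical change types extracted from four commits.
commits = [
    {"prod_method_added", "test_method_added"},
    {"prod_method_added", "test_method_added"},
    {"prod_method_changed"},
    {"prod_method_added"},
]
for a, b, s, c in pair_rules(commits, min_support=0.5, min_confidence=0.6):
    print(f"{a} => {b}  support={s:.2f}  confidence={c:.2f}")
```

With these four commits, both directions of the production/test pair pass the thresholds, with confidences of about 0.67 and 1.00; the qualitative step described in the abstract would then interpret such rules as co-evolution patterns.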
TL;DR: Zhang et al. propose a context-aware code recommendation approach that recommends exception handling code examples from a number of popular open source code repositories hosted at GitHub. It collects the code examples using the GitHub code search API, and then analyzes, filters and ranks them against the code under development in the IDE by leveraging not only the structural (i.e., graph-based) and lexical features but also the heuristic quality measures of exception handlers in the examples.
Abstract: Studies show that software developers often either misuse exception handling features or use them inefficiently, and such a practice may turn a software project under development into a fragile, insecure and non-robust application system. In this paper, we propose a context-aware code recommendation approach that recommends exception handling code examples from a number of popular open source code repositories hosted at GitHub. It collects the code examples by exploiting the GitHub code search API, and then analyzes, filters and ranks them against the code under development in the IDE by leveraging not only the structural (i.e., graph-based) and lexical features but also the heuristic quality measures of exception handlers in the examples. Experiments with 4,400 code examples and 65 exception handling scenarios as well as comparisons with four existing approaches show that the proposed approach is highly promising.
TL;DR: This research work applies the concept of evolutionary coupling to identify clones that are important for refactoring or tracking. It determines and analyzes constrained association rules of clone fragments that evolved following a particular change pattern, called the Similarity Preserving Change Pattern, and are important from the perspective of refactoring and tracking.
Abstract: Code cloning is a controversial software engineering practice due to contradictory claims regarding its impacts on software evolution and maintenance. While a number of studies identify some positive aspects of code clones, there is strong empirical evidence of some negative impacts of clones too. Focusing on these issues, researchers suggest managing code clones through detection, refactoring, and tracking. However, not all clones in a software system are suitable for refactoring or tracking. Thus, it is important to identify which clones should be considered for refactoring and which should be considered for tracking. In this research work we apply the concept of evolutionary coupling to identify clones that are important for refactoring or tracking. By mining software evolution history, we determine and analyze constrained association rules of clone fragments that evolved following a particular change pattern called Similarity Preserving Change Pattern and are important from the perspective of refactoring and tracking. According to our investigation with rigorous manual analysis on thousands of revisions of six diverse subject systems covering two programming languages, overall 13.20% of all clones in a software system are important candidates for refactoring, and overall 10.27% of all clones are important candidates for tracking. Our implemented system can automatically identify these important candidates and thus can help us better maintain code clones through refactoring and tracking.
TL;DR: This paper proposes a generalization of current bit-level taint analysis techniques that addresses their precision problems, and experiments indicate that the enhanced taint analysis algorithm leads to significant improvements in the quality of deobfuscation.
Abstract: Taint analysis has a wide variety of applications in software analysis, making the precision of taint analysis an important consideration. Current taint analysis algorithms, including previous work on bit-precise taint analyses, suffer from shortcomings that can lead to significant loss of precision (under/over tainting) in some situations. This paper discusses these limitations of existing taint analysis algorithms, shows how they can lead to imprecise taint propagation, and proposes a generalization of current bit-level taint analysis techniques to address these problems and improve their precision. Experiments using a deobfuscation tool indicate that our enhanced taint analysis algorithm leads to significant improvements in the quality of deobfuscation.
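The difference between coarse and bit-level taint propagation can be sketched in Python with per-value taint masks that carry one taint bit per data bit. The propagation rules below are common textbook choices assumed for illustration (ANDing with a constant clears taint on bits the constant forces to 0, ORing clears taint on bits forced to 1, and addition conservatively taints every bit from the lowest tainted bit upward because of carries); they are not the generalized algorithm proposed in the paper, and the function names are hypothetical.

```python
WIDTH = 32
MASK = (1 << WIDTH) - 1


def taint_and_const(taint_x: int, const: int) -> int:
    # Bits where the constant is 0 are forced to 0, so they carry no taint.
    return taint_x & const & MASK


def taint_or_const(taint_x: int, const: int) -> int:
    # Bits where the constant is 1 are forced to 1, so they carry no taint.
    return taint_x & ~const & MASK


def taint_xor(taint_x: int, taint_y: int) -> int:
    # Each result bit depends only on the same bit of each operand.
    return (taint_x | taint_y) & MASK


def taint_add(taint_x: int, taint_y: int) -> int:
    # Carries move information upward: conservatively taint every bit
    # at or above the lowest tainted input bit.
    t = (taint_x | taint_y) & MASK
    if t == 0:
        return 0
    lowest = t & -t  # isolate the lowest set bit
    return ~(lowest - 1) & MASK


def taint_word_level(*taints: int) -> int:
    # Coarse rule: any tainted input bit taints the whole result word.
    return MASK if any(taints) else 0


tx = 0x000000FF  # low byte of x is tainted (e.g. derived from program input)
print(hex(taint_and_const(tx, 0x0F)))  # 0xf: only the kept bits stay tainted
print(hex(taint_word_level(tx)))       # 0xffffffff: over-tainting
print(hex(taint_add(0x10, 0)))         # 0xfffffff0
```

The word-level rule reports the whole result of `x & 0x0F` as tainted, which is the kind of over-tainting that bit-precise propagation avoids.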
TL;DR: PESTO is proposed, a tool that tackles the automated migration of 2nd generation test suites to the 3rd generation, based on visual image recognition; it automatically determines the screen position of each web element located on the DOM by a 2nd generation test case.
Abstract: Automated testing of web applications reduces the effort needed in manual testing. Old 1st generation tools, based on screen coordinates, produce quite fragile test suites that are tightly coupled with the specific screen resolution, window position and size experienced during test case recording. These tools have been replaced by a 2nd generation of tools, which offer easy selection of and interaction with the web elements, based on DOM-oriented commands. Recently, a new 3rd generation of tools based on visual image recognition has emerged, bringing the promise of wider applicability and simplicity. A tester might ask whether migrating to such new technology is worthwhile, since the manual effort to rewrite a test suite might be overwhelming. In this paper, we propose PESTO, a tool that tackles the problem of automatically migrating 2nd generation test suites to the 3rd generation. PESTO automatically determines the screen position of each web element located on the DOM by a 2nd generation test case. It then computes a screenshot image centred around the web element so as to ensure unique visual matching. Finally, the entire source code of the DOM-based test suite is transformed into a visual test suite, based on such automatically extracted images and using specific visual commands.
TL;DR: This paper provides a formal description of an automatic translation of ACSL annotations into C code that can be used by a test generation tool either to trigger and detect specification failures, or to gain confidence, or, under some assumptions, even to confirm that the code is in conformity with respect to the annotations.
Abstract: Software verification and validation often rely on formal specifications that encode desired program properties. Recent research proposed a combined verification approach in which a program can be incrementally verified by alternating between deductive verification and testing. Both techniques should use the same specification expressed in a single specification language. This paper addresses this problem within the Frama-C framework for analysis of C programs, which offers ACSL as a common specification language. We provide a formal description of an automatic translation of ACSL annotations into C code that can be used by a test generation tool either to trigger and detect specification failures, or to gain confidence, or, under some assumptions, even to confirm that the code is in conformity with respect to the annotations. We implement the proposed specification translation in the combined verification tool StaDy. Our initial experiments suggest that the proposed support for a common specification language can be very helpful for combined static-dynamic analyses.
TL;DR: The effect of refactoring on maintainability is discussed qualitatively and analyzed quantitatively using a set of software metrics extending the Chidamber-Kemerer suite, showing that maintainability has increased as an effect of the refactoring.
Abstract: Methods and tools for refactoring of software have been extensively studied during the last decades, and we argue that there is now a need for additional studies of the effects of refactoring on code quality and external code attributes such as computational performance. To study these effects, we have refactored the central parts of a code base developed in academia for a class of computationally demanding scientific computing problems. We made design choices on the basis of the SOLID principles and we used object-oriented techniques, such as the Gang of Four patterns, in the implementation. In this paper, we discuss the effect on maintainability qualitatively and also analyze it quantitatively using a set of software metrics extending the Chidamber-Kemerer suite. Not surprisingly, we find that maintainability has increased as an effect of the refactoring. We also study performance and find that dynamic binding, which inhibits inlining by the compiler, in the most frequently executed parts of the code increases execution times by over 700%. By exploiting static polymorphism, we have been able to reduce the relative increase in execution times to less than 100%. We argue that the code version implementing static polymorphism is less maintainable than the one using dynamic polymorphism, although both versions are considerably more maintainable than the original code.
TL;DR: This study on seven open-source Java systems with diversity in their size, length of evolution and application domain shows that changes are more frequent in cloned code than in non-cloned code and that Type-1 clones are comparatively more vulnerable to the stability of the systems.
Abstract: Clones are the duplicate or similar code blocks in software systems. A large number of studies concerning the impacts of clones on software systems mainly focus on the frequency of changes to evaluate stability, consistency in evolution and introduction of bugs. Although it is obvious that not every type of change has an equal impact on software systems, none of the existing studies take the types of changes and their significance into account during comparative evaluation of stability of cloned and non-cloned code. This paper presents an empirical study on the comparative stability of cloned and non-cloned code from the perspective of different change types. Changes from successive revisions are extracted and classified using ChangeDistiller, which employs Abstract Syntax Tree (AST) differencing of the successive revisions of source code and assigns the corresponding level of significance to each of the classified changes. We detect exact (Type-1) and near-miss (Type-2 and Type-3) clones using the hybrid clone detection tool NiCad. Extracted and classified changes and clone information are then analyzed to compare the stability of cloned and non-cloned code from three different perspectives: types of clones, types of changes with respect to the significance of changes, and size and extent of evolution of the systems. Our study on seven open-source Java systems with diversity in their size, length of evolution and application domain shows that changes are more frequent in cloned code than in non-cloned code and that Type-1 clones are comparatively more vulnerable to the stability of the systems. Therefore, cloned code is less stable than non-cloned code, suggesting that cloned code is likely to pose more maintenance challenges than non-cloned code.
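Exact (Type-1) clones differ only in layout, whitespace, and comments. The Python sketch below finds them by normalizing each fragment into a token string and grouping fragments with equal normal forms. It assumes C-like `//` and `/* */` comments, fragments already extracted as strings, and no comment markers inside string literals; it is a deliberately simplified illustration, not how NiCad works internally, and it does not detect near-miss (Type-2/Type-3) clones. The helper names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List

TOKEN = re.compile(r"\w+|[^\w\s]")  # identifiers/numbers or single punctuation


def normalize(fragment: str) -> str:
    """Drop comments and layout, keeping only the token sequence."""
    code = re.sub(r"/\*.*?\*/", " ", fragment, flags=re.DOTALL)  # block comments
    code = re.sub(r"//[^\n]*", " ", code)                       # line comments
    return " ".join(TOKEN.findall(code))


def type1_clone_classes(fragments: Dict[str, str]) -> List[List[str]]:
    """Group fragment names whose normalized text is identical."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, text in fragments.items():
        groups[normalize(text)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


fragments = {
    "A.sum": "int sum(int a,int b){ return a+b; } // add",
    "B.sum": "int sum(int a, int b) {\n  /* same logic */\n  return a + b;\n}",
    "C.mul": "int mul(int a, int b) { return a * b; }",
}
print(type1_clone_classes(fragments))  # [['A.sum', 'B.sum']]
```

A change-classification step such as the one described above could then count, per clone class, how often and how significantly its members change compared with non-cloned code.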
TL;DR: The SourceMeter SonarQube plug-in extends the built-in Java code analysis engine of SonarQube with FrontEndART's high-end Java code analysis engine, and offers new GUI features on the SonarQube dashboard and drilldown views, making the SonarQube user experience more comfortable and the work with the tool more productive.
Abstract: The SourceMeter SonarQube plug-in is an extension of SonarQube, an open-source platform for managing code quality made by SonarSource S.A., Switzerland. The plug-in extends the built-in Java code analysis engine of SonarQube with FrontEndART's high-end Java code analysis engine. Most of SonarQube's original analysis results are replaced (including the detected source code duplications), while the range of available analyses is extended with a number of additional metrics and issue detectors. Additionally, the plug-in offers new GUI features on the SonarQube dashboard and drilldown views, making the SonarQube user experience more comfortable and the work with the tool more productive.
TL;DR: Taking Picture Description Languages as a case study, this paper considers the challenges and effectiveness of generalizing the ORBS algorithm; the results show that not only is such a generalization possible, but the resulting slicer is quite effective, removing from 27% to 98% of the original source code with an average of 85%.
Abstract: Program slicing has seen a plethora of applications and variations since its introduction over thirty years ago. The dominant method for computing slices involves significant complex source-code analysis to model the dependences in the code. A recently introduced alternative, Observation-Based Slicing (ORBS), sidesteps this complexity by observing the behavior of candidate slices. ORBS has several other strengths, including the ability to slice multi-language systems. However, ORBS remains rooted in tradition as it captures semantics by comparing sequences of values. This raises the question of whether it is possible to extend slicing beyond its traditional semantic roots. A few existing projects have attempted this, but the extension requires considerable effort. If it is possible to build on the ORBS platform to more easily generalize slicing to languages with non-traditional semantics, then there is the potential to vastly increase the range of programming languages to which slicing can be applied. ORBS supports this by reducing the problem to that of generalizing how semantics are captured. Taking Picture Description Languages as a case study, the challenges and effectiveness of such a generalization are considered. The results show that not only is it possible to generalize the ORBS algorithm, but the resulting slicer is quite effective, removing from 27% to 98% of the original source code with an average of 85%. Finally, a qualitative look at the slices finds the technique very effective, at times producing minimal slices.
TL;DR: This paper proposes ACUA, a tool that generates reports containing detailed API change and usage information by analyzing the binary code of both frameworks and client programs written in Java, so that developers can estimate the workload and decide when to start upgrading client programs based on the estimation.
Abstract: Modern software uses frameworks through their Application Programming Interfaces (APIs). Framework APIs may change while frameworks evolve. Client programs have to upgrade to new releases of frameworks if security vulnerabilities are discovered in the used releases. Patching security vulnerabilities can be delayed by non-security-related API changes when the frameworks used by client programs are not up to date. Keeping frameworks updated can reduce the reaction time to patch security leaks. Client program upgrades are not cost-free: developers need to understand the API usages in client programs and the API changes between framework releases before conducting upgrade tasks. In this paper, we propose a tool, ACUA, that generates reports containing detailed API change and usage information by analyzing the binary code of both frameworks and client programs written in Java. Developers can use the API change and usage reports generated by ACUA to estimate the workload and decide when to start upgrading client programs based on the estimation.
TL;DR: Pangea is proposed, an infrastructure allowing fast development of static analyses on multi-language corpora; it uses language-independent meta-models stored as object model snapshots that can be directly loaded into memory and queried without any parsing overhead.
Abstract: Software corpora facilitate reproducibility of analyses; however, static analysis for an entire corpus still requires considerable effort, often duplicated unnecessarily by multiple users. Moreover, most corpora are designed for a single language, increasing the effort for cross-language analysis. To address these aspects we propose Pangea, an infrastructure allowing fast development of static analyses on multi-language corpora. Pangea uses language-independent meta-models stored as object model snapshots that can be directly loaded into memory and queried without any parsing overhead. To reduce the effort of performing static analyses, Pangea provides out-of-the-box support for: creating and refining analyses in a dedicated environment, deploying an analysis on an entire corpus, using a runner that supports parallel execution, and exporting results in various formats. In this tool demonstration we introduce Pangea and provide several usage scenarios that illustrate how it reduces the cost of analysis.
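A very rough approximation of such a report can be obtained by intersecting the API elements a client uses with those that disappear between two framework releases. The Python sketch below assumes that API signatures and client usage sites have already been extracted (ACUA does this from Java binaries, which the sketch does not attempt); the signature strings, the `upgrade_report` function, and the use of call-site count as a workload proxy are all hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class UpgradeReport:
    affected_apis: List[str]            # used by the client but gone in the new release
    usage_sites: Dict[str, List[str]]   # affected API -> client locations to revisit
    sites_to_update: int                # crude workload proxy


def upgrade_report(old_api: Set[str], new_api: Set[str],
                   client_usages: Dict[str, List[str]]) -> UpgradeReport:
    removed = old_api - new_api  # a changed signature shows up as removed + added
    affected = sorted(set(client_usages) & removed)
    sites = {api: client_usages[api] for api in affected}
    return UpgradeReport(affected_apis=affected, usage_sites=sites,
                         sites_to_update=sum(len(s) for s in sites.values()))


old_api = {"Parser.parse(String)", "Parser.parse(File)", "Config.load()"}
new_api = {"Parser.parse(String)", "Parser.parse(Path)", "Config.load()"}
client_usages = {
    "Parser.parse(File)": ["App.java:42", "Loader.java:17"],
    "Config.load()": ["App.java:10"],
}
report = upgrade_report(old_api, new_api, client_usages)
print(report.affected_apis, report.sites_to_update)  # ['Parser.parse(File)'] 2
```

Counting call sites ignores how hard each individual migration is, so it only gives a lower bound on the effort a developer would need to plan for.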
TL;DR: This tool paper develops Ekeko/X from the ground up, starting from its applicative logic meta-programming foundation, and highlights the key choices in this implementation and demonstrates its use through two example program transformations.
Abstract: Developers often need to perform repetitive changes to source code, for instance to repair several instances of a bug or to update all clients of a library to a newer version. Manually performing such changes is laborious and error-prone. Program transformation tools enable automating changes, but specifying changes as a program transformation requires significant expertise. Code templates are often touted as a remedy, yet have never been endorsed wholeheartedly. Their use is mostly limited to expressing the syntactic characteristics of the intended change subjects. Developers have to resort to less familiar means to express their structural, control-flow, and data-flow characteristics. In this tool paper, we introduce a decidedly template-driven program transformation tool called Ekeko/X. Its specifications feature templates for specifying all of the aforementioned characteristics of its subjects. To this end, developers can associate different directives with individual components of a template. Each matching directive imposes particular constraints on the matches for the component it is associated with. Rewriting directives, on the other hand, determine how each match should be changed. We develop Ekeko/X from the ground up, starting from its applicative logic meta-programming foundation. We highlight the key choices in this implementation and demonstrate its use through two example program transformations.
TL;DR: This paper presents an approach to assess the accuracy of forward dynamic slices, which are used in software maintenance and evolution tasks, and is the first work that quantifies the intrinsic limitations of dynamic slicing.
Abstract: Dynamic slicing is a practical and popular analysis technique used in various software-engineering tasks. Dynamic slicing is known to be incomplete because it analyzes only a subset of all possible executions of a program. However, it is less known that its results may inaccurately represent the dependencies that occur in those executions. Some researchers have identified this problem and developed extensions such as relevant slicing, which incorporates static information. Yet, dynamic slicing continues to be widely used, even though the extent of its inaccuracy is not well understood, which can affect the benefits of this analysis. In this paper, we present an approach to assess the accuracy of forward dynamic slices, which are used in software maintenance and evolution tasks. Because finding all actual dependencies is an undecidable problem, our approach instead computes bounds of the precision and recall of forward dynamic slices. Our approach uses sensitivity analysis and execution differencing to find a subset of all program statements that truly depend at runtime on another statement. Using this approach, we studied the accuracy of many forward dynamic slices from a variety of Java applications. Our results show that forward dynamic slicing can have low recall -- for dependencies in the analyzed executions -- and some potential imprecision. We also conducted a case study that shows how this inaccuracy affects a software maintenance task. To the best of our knowledge, ours is the first work that quantifies the intrinsic limitations of dynamic slicing.
TL;DR: The method is able to efficiently use secondary storage for dynamic dependence graphs, thus allowing the method to scale to long program executions, and it is shown that graphs can be constructed for program runs with billions of executed instructions, at slowdowns ranging from 62x to 173x.
Abstract: Dynamic program slicing is widely recognized as a powerful aid for, e.g., program comprehension during debugging. However, its widespread use has been impeded in part by scalability issues that occur when constructing the dynamic dependence graph necessary to compute dynamic slices. A few seconds of execution time on a modern CPU can easily yield dynamic dependence graphs on the order of tens of gigabytes in size. Existing methods either produce imprecise slices, incur large time overheads during slice computation, or run out of memory for long program executions. By carefully designing our method to take advantage of locality, we are able to efficiently use secondary storage for dynamic dependence graphs, thus allowing our method to scale to long program executions. Our prototype implementation runs directly on x86 executables, eliminating problems with, e.g., binary-only libraries. We show in our experiments that graphs can be constructed for program runs with billions of executed instructions, at slowdowns ranging from 62x to 173x. Our optimized format also allows graphs to be traversed at speeds of several million dependence edges per second.
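To illustrate what a dynamic dependence graph records, the Python sketch below derives data-dependence edges from an execution trace in which each executed instruction lists the locations it reads and writes, then computes a backward dynamic slice by graph traversal. It is an in-memory toy that assumes a simplified trace format (the `Step` record is hypothetical) and ignores control dependences; making such graphs scale by storing them on secondary storage, which is the focus of the abstract, is not attempted here.

```python
from typing import Dict, List, NamedTuple, Set


class Step(NamedTuple):
    pc: str            # static instruction identifier
    reads: List[str]   # registers/memory locations read
    writes: List[str]  # registers/memory locations written


def build_ddg(trace: List[Step]) -> Dict[int, Set[int]]:
    """Map each trace index to the earlier trace indices it data-depends on."""
    last_writer: Dict[str, int] = {}
    deps: Dict[int, Set[int]] = {}
    for i, step in enumerate(trace):
        # Reads are resolved before this step's own writes take effect.
        deps[i] = {last_writer[loc] for loc in step.reads if loc in last_writer}
        for loc in step.writes:
            last_writer[loc] = i
    return deps


def backward_slice(deps: Dict[int, Set[int]], criterion: int) -> Set[int]:
    """All trace indices the criterion transitively depends on (including itself)."""
    seen, worklist = {criterion}, [criterion]
    while worklist:
        for d in deps[worklist.pop()]:
            if d not in seen:
                seen.add(d)
                worklist.append(d)
    return seen


trace = [
    Step("i1", [], ["eax"]),         # mov eax, 1
    Step("i2", [], ["ebx"]),         # mov ebx, 2
    Step("i3", ["eax"], ["eax"]),    # add eax, eax
    Step("i4", ["ebx"], ["[mem]"]),  # mov [mem], ebx
    Step("i5", ["eax"], ["ecx"]),    # mov ecx, eax
]
slice_ids = backward_slice(build_ddg(trace), criterion=4)
print(sorted(trace[i].pc for i in slice_ids))  # ['i1', 'i3', 'i5']
```

Because the graph has one node per executed instruction rather than per static instruction, its size grows with execution length, which is why long runs quickly exceed main memory.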
TL;DR: In this article, the authors define metrics that relate nodes and edges in the object graph to elements in the code structure, to measure how they differ, and if the differences are indicative of language or design features such as encapsulation, polymorphism and inheritance.
Abstract: To evolve object-oriented code, one must understand both the code structure in terms of classes, and the runtime structure in terms of abstractions of objects that are being created and relations between those objects. To help with this understanding, static program analysis can extract heap abstractions such as object graphs. But the extracted graphs can become too large if they do not sufficiently abstract objects, or too imprecise if they abstract objects excessively to the point of being similar to a class diagram, where one box for a class represents all the instances of that class. One previously proposed solution uses both annotations and abstract interpretation to extract a global, hierarchical, abstract object graph that conveys both abstraction and design intent, but can still be related to the code structure. In this paper, we define metrics that relate nodes and edges in the object graph to elements in the code structure, to measure how they differ, and if the differences are indicative of language or design features such as encapsulation, polymorphism and inheritance. We compute the metrics across eight systems totaling over 100 KLOC, and show a statistically significant difference between the code and the object graph. In several cases, the magnitude of this difference is large.
TL;DR: A controlled experiment investigating the effect of clone information on the performance of developers in common bug-fixing tasks shows that developers are quite capable of compensating for missing clone information through testing to provide correct solutions; clone information does help to detect cloned defects faster, although developers who exploit semantic code relations such as inheritance uncover cloned defects only slightly slower without it.
Abstract: Duplicated source code -- clones -- is known to occur frequently in software systems and bears the risk of inconsistent updates of the code. The impact of clones has been investigated mostly by retrospective analysis of software systems. Little effort has been spent on investigating human interaction when dealing with clones. A previous study by Chatterji and colleagues found that cloned defects are removed significantly more accurately when clone information is provided to the programmers. We conducted a controlled experiment to extend the previous study on the use of clone information by investigating the effect of clone information on the performance of developers in common bug-fixing tasks. The experiment shows that developers are quite capable of compensating for missing clone information through testing to provide correct solutions. Clone information does help to detect cloned defects faster, although developers may exploit semantic code relations such as inheritance to uncover cloned defects only slightly slower if they do not have clone information. If cloned defects lurk in semantically unrelated places, however, clone information helps to find them faster, with statistical significance. Developers without clone information needed 17 minutes longer on average, or 140% more time in relative terms, to complete the task successfully.
TL;DR: In this article, a pushdown framework for object-oriented languages with full-featured exceptions is presented, which allows precise matching of throwers to catchers and pruning of points-to information.
Abstract: Statically reasoning in the presence of exceptions and about the effects of exceptions is challenging: exception-flows are mutually determined by traditional control-flow and points-to analyses. We tackle the challenge of analyzing exception-flows from two angles. First, from the angle of pruning control-flows (both normal and exceptional), we derive a pushdown framework for an object-oriented language with full-featured exceptions. Unlike traditional analyses, it allows precise matching of throwers to catchers. Second, from the angle of pruning points-to information, we generalize abstract garbage collection to object-oriented programs and enhance it with liveness analysis. We then seamlessly weave the techniques into enhanced reachability computation, yielding highly precise exception-flow analysis, without becoming intractable, even for large applications. We evaluate our pruned, pushdown exception-flow analysis, comparing it with an established analysis on large scale standard Java benchmarks. The results show that our analysis significantly improves analysis precision over traditional analysis within a reasonable analysis time.
TL;DR: A new algorithm is presented that identifies a set of special nodes, which are the only ones that can have interprocedural dominance edges, and extends the intraprocedural dominator trees by deriving those edges.
Abstract: We present a new algorithm for computing interprocedural dominators. The algorithm identifies a set of special nodes, which are the only ones that can have interprocedural dominance edges, and extends the intraprocedural dominator trees by deriving those edges. The computation of the dominators of each node is independent of the computation of any other node, and therefore can be done on demand for each node as required. For the same reason, the algorithm lends itself naturally to parallelization. The algorithm has been implemented, and is shown to be practical for large programs. Because of its cooperative caching behavior, the algorithm gains a large performance boost when running on parallel hardware. We also present an efficient way of extending the algorithm for computing interprocedural dominance frontiers and control dependence.
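As background for the intraprocedural dominator trees that the algorithm extends, the Python sketch below computes dominator sets for a single control-flow graph with the classic iterative data-flow formulation, Dom(entry) = {entry} and Dom(n) = {n} ∪ ⋂ Dom(p) over all predecessors p of n. It is the textbook set-based method, not the paper's interprocedural algorithm, and it assumes every node is reachable from the entry (so every non-entry node has at least one predecessor).

```python
from typing import Dict, List, Set

CFG = Dict[str, List[str]]  # node -> successors


def dominators(cfg: CFG, entry: str) -> Dict[str, Set[str]]:
    nodes = set(cfg) | {s for succs in cfg.values() for s in succs}
    preds: Dict[str, Set[str]] = {n: set() for n in nodes}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].add(n)
    # Start from "every node dominates n" and shrink to the fixed point.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            new = {n} | set.intersection(*(dom[p] for p in preds[n]))
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


cfg = {"entry": ["a"], "a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
dom = dominators(cfg, "entry")
print(sorted(dom["d"]))  # ['a', 'd', 'entry']
```

In this diamond-shaped graph, neither `b` nor `c` dominates `d`, because each can be bypassed through the other branch; the immediate dominator of `d` is therefore `a`.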
TL;DR: A novel algorithm, inspired by concolic testing, is proposed that integrates concrete execution and symbolic reasoning about the error trace to address the scalability and modeling limitations of proof-based fault diagnosis, and that can avoid complex logical encodings when reasoning about traces in low-level C programs.
Abstract: An integral part of all debugging activities is the task of diagnosing the cause of an error. Most existing fault diagnosis techniques rely on the availability of high-quality test suites because they work by comparing failing and passing runs to identify the error cause. This limits their applicability. One alternative is a class of techniques that statically analyze an error trace of the program without relying on additional passing runs to compare against. Particularly promising are novel proof-based approaches that leverage advances in automated theorem proving to obtain an abstraction of the program that aids fault diagnosis. However, existing proof-based approaches still have practical limitations, such as reduced scalability and dependence on complex mathematical models of programs. Such models are notoriously difficult to develop for real-world programs. Inspired by concolic testing, we propose a novel algorithm that integrates concrete execution and symbolic reasoning about the error trace to address these challenges. Specifically, we execute the error trace to obtain intermediate program states that allow us to split the trace into smaller fragments, each of which can be analyzed in isolation using an automated theorem prover. Moreover, we show how this approach can avoid complex logical encodings when reasoning about traces in low-level C programs. We have conducted an experiment in which we applied our new algorithm to error traces generated from faulty versions of UNIX utilities such as gzip and sed. Our experiment indicates that our concolic fault abstraction scales to real-world error traces and generates useful error diagnoses.
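The trace-splitting step can be pictured with a small sketch. It assumes a hypothetical toy trace format (a list of `(variable, expression)` assignments over integer states) and shows only the concrete half of the idea: executing the trace records intermediate states, which then serve as fixed pre- and post-conditions for each fragment. Here a fragment is "checked in isolation" by re-executing it from its recorded pre-state; in the described approach an automated theorem prover would reason symbolically over each fragment instead.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]
# A trace step assigns the value of an expression (evaluated on the current state) to a variable.
Step = Tuple[str, Callable[[State], int]]

def execute(trace: List[Step], init: State) -> List[State]:
    """Run the trace concretely; return the state before each step plus the final state."""
    states = [dict(init)]
    for var, expr in trace:
        nxt = dict(states[-1])
        nxt[var] = expr(states[-1])
        states.append(nxt)
    return states

def split_trace(trace: List[Step], states: List[State],
                size: int) -> List[Tuple[State, List[Step], State]]:
    """Cut the trace into fragments of at most `size` steps, each paired with the
    concrete state observed before its first step and after its last step."""
    fragments = []
    for start in range(0, len(trace), size):
        end = min(start + size, len(trace))
        fragments.append((states[start], trace[start:end], states[end]))
    return fragments

def check_fragment(pre: State, fragment: List[Step], post: State) -> bool:
    """Stand-in for isolated per-fragment analysis: the fragment, started from its
    recorded pre-state alone, must reproduce its recorded post-state."""
    return execute(fragment, pre)[-1] == post
```

Because every fragment carries its own concrete entry state, no fragment's analysis depends on reasoning about the steps before it, which is what keeps each query small.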
TL;DR: This paper proposes a search method for unpreprocessed programs, which are difficult to parse, and presents an alignment tool for branch directives, which converts undisciplined directives to disciplined ones, and a reverse macro expansion tool, which integrates the use of macro calls.
Abstract: Pattern search of programs is a fundamental function for supporting programming. In this paper, we propose a search method for unpreprocessed programs, which are difficult to parse. Our parser parses them directly by rewriting token sequences and tolerates minor errors in syntax trees. The search tool takes queries written in the same format as program fragments. Because the same parser is used for both queries and target programs, programmers do not need to describe the detailed structure of syntax trees in their queries. To support accurate search, we also present an alignment tool for branch directives, which converts undisciplined directives to disciplined ones, and a reverse macro expansion tool, which integrates the use of macro calls. Finally, we present experiments in which we applied the tools to an open source application, and discuss how to improve our tools.
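Reverse macro expansion can be illustrated at the token level. The sketch below is a simplification of our own, not the paper's tool: it assumes each macro parameter matches exactly one token and that every parameter occurs in the macro body. It scans a token list for occurrences of a function-like macro's expansion and replaces each with the corresponding macro call, requiring repeated parameters to bind to the same token.

```python
from typing import Dict, List, Optional

def match_at(tokens: List[str], i: int, params: List[str],
             body: List[str]) -> Optional[Dict[str, str]]:
    """Try to match the macro body at position i; return parameter bindings or None."""
    if i + len(body) > len(tokens):
        return None
    binding: Dict[str, str] = {}
    for j, b in enumerate(body):
        t = tokens[i + j]
        if b in params:
            # A parameter used twice must bind to the same token both times.
            if b in binding and binding[b] != t:
                return None
            binding[b] = t
        elif b != t:
            return None
    return binding

def reverse_expand(tokens: List[str], name: str, params: List[str],
                   body: List[str]) -> List[str]:
    """Replace every expanded occurrence of `body` with a call `name(args)`.
    Assumes (hypothetically) single-token arguments and that all params occur in body."""
    out: List[str] = []
    i = 0
    while i < len(tokens):
        binding = match_at(tokens, i, params, body)
        if binding is None:
            out.append(tokens[i])
            i += 1
            continue
        out.extend([name, "("])
        for k, p in enumerate(params):
            if k:
                out.append(",")
            out.append(binding[p])
        out.append(")")
        i += len(body)
    return out
```

For example, with a macro `SQUARE(x)` whose body is `((x)*(x))`, the tokens of `y = ((a)*(a)) + 1 ;` are rewritten to `y = SQUARE(a) + 1 ;`, while `((a)*(b))` is left untouched because the two uses of `x` would bind to different tokens.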
TL;DR: An approach is presented that brings the software compilation process and security verification together, so that both can be applied simultaneously in a user-friendly manner.
Abstract: Automated verification tools are required to detect coding errors that may lead to severe software vulnerabilities. However, these tools are still not well integrated into the software development life cycle. In this paper, we present our approach that brings the software compilation process and security verification together, so that both can be applied simultaneously in a user-friendly manner. Our security verification engine is implemented as a new GCC pass that can be enabled via the flag -fsecurity-check=checks.xml, where the input XML file contains a set of user-defined security checks. The verification operates on the GIMPLE intermediate representation of the source code, which is language- and platform-independent. The experiments demonstrate the scalability, efficiency, and performance of our engine when used to verify large-scale software, most notably the entire Linux kernel source code.
TL;DR: It is found that a single refactoring makes only a small change (sometimes even decreasing quality), but refactorings done in blocks can significantly increase quality, resulting in not only local but also global improvement of the code.
Abstract: The quality of a software system is mostly defined by its source code. Software evolves continuously: it gets modified and enhanced, and new requirements keep arising. If we do not periodically spend time improving our source code, it becomes messy and its quality inevitably decreases. Literature tells us that we can improve the quality of our software product by regularly refactoring it. But does refactoring really increase software quality? Can a refactoring decrease quality? Is it possible to recognize the change in quality caused by a single refactoring operation? In this paper, we seek answers to these questions in a case study of refactoring large-scale proprietary software systems. We analyzed the source code of 5 systems and measured the quality of several revisions over a period of time. We analyzed 2 million lines of code and identified nearly 200 refactoring commits, which fixed over 500 coding issues. We found that a single refactoring makes only a small change (sometimes even decreasing quality), but when refactorings are done in blocks, they can significantly increase quality, resulting in not only local but also global improvement of the code.
TL;DR: SENSA is presented, a novel dynamic-analysis technique and tool that combines sensitivity analysis and execution differencing to estimate the dependencies among statements that occur in practice and predicts the actual impacts of changes to those statements more accurately than static and dynamic forward slicing.
Abstract: Sensitivity analysis determines how a system responds to variations in its stimuli, which can benefit important software-engineering tasks such as change-impact analysis. We present SENSA, a novel dynamic-analysis technique and tool that combines sensitivity analysis and execution differencing to estimate the dependencies among statements that occur in practice. In addition to identifying dependencies, SENSA quantifies them to estimate how strongly or how likely a statement depends on another. Quantifying dependencies helps developers prioritize and focus their inspection of code relationships. To assess the benefits of quantifying dependencies with SENSA, we applied it to various statements across Java subjects to find and prioritize the potential impacts of changing those statements. We found that SENSA predicts the actual impacts of changes to those statements more accurately than static and dynamic forward slicing. Our SENSA prototype tool is freely available for download.
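The combination of sensitivity analysis and execution differencing can be sketched as follows. Everything here is hypothetical and simplified: a toy program records the value produced by each labeled statement, the value at a source statement is repeatedly replaced with random values, and each other statement's dependence score is the fraction of perturbed runs in which its value differs from the unperturbed run. The real tool differences full execution histories of Java programs; this sketch compares only final per-statement values.

```python
import random
from typing import Callable, Dict, Optional

def toy_program(inp: int, override: Optional[Dict[str, int]] = None) -> Dict[str, int]:
    """Hypothetical program; returns the value computed at each statement.
    `override` forces a statement to produce a given value (the perturbation)."""
    override = override or {}
    vals: Dict[str, int] = {}

    def assign(stmt: str, value: int) -> int:
        vals[stmt] = override.get(stmt, value)
        return vals[stmt]

    a = assign("s1", inp * 2)
    b = assign("s2", a + 1)             # always depends on s1
    c = assign("s3", inp - 3)           # never depends on s1
    assign("s4", b if b > 10 else c)    # depends on s1 only when b > 10
    return vals

def quantify_dependence(program: Callable[..., Dict[str, int]], inp: int,
                        source: str, trials: int = 200,
                        seed: int = 0) -> Dict[str, float]:
    """Perturb `source` with random values and diff each run against the baseline."""
    rng = random.Random(seed)
    base = program(inp)
    counts = {s: 0 for s in base if s != source}
    for _ in range(trials):
        perturbed = program(inp, {source: rng.randint(-100, 100)})
        for s in counts:
            if perturbed.get(s) != base.get(s):
                counts[s] += 1
    return {s: c / trials for s, c in counts.items()}
```

On `toy_program(3)`, perturbing `s1` yields a score near 1 for `s2`, exactly 0 for `s3`, and an intermediate score for `s4`, which is the kind of graded ranking that a boolean slice cannot express.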
TL;DR: The Flowgen framework, which generates flowcharts from annotated C++ source code, is presented and a proof-of-concept application to the VINCIA plug-in for simulating collisions at CERN's Large Hadron Collider is described.
Abstract: We present the Flowgen framework, which generates flowcharts from annotated C++ source code. It generates a set of interconnected high-level UML activity diagrams, one for each function or method in the C++ sources. It provides a simple and visual overview of complex implementations of numerical algorithms. Flowgen is complementary to the widely-used Doxygen documentation tool. The ultimate aim is to render complex C++ codes accessible, and to enhance collaboration between programmers and algorithm or science specialists. We describe the tool and a proof-of-concept application to the VINCIA plug-in for simulating collisions at CERN's Large Hadron Collider.
TL;DR: The purpose and benefits of the SoDA library, the associated toolset, which also includes a graphical user interface, as well as possible usage scenarios are demonstrated.
Abstract: Code coverage is often used in academic and industrial practice of white-box software testing. Various test optimization methods, e.g., test selection and prioritization, rely on code coverage information, and other related fields, such as fault localization, benefit from it as well. These methods require access to the fine details of coverage information and efficient ways of processing this data. The purpose of the (free) SoDA library and toolset is to provide an efficient set of data structures and algorithms that can be used to prepare, store, and analyze code coverage data in various ways. The focus of SoDA is not on the calculation of coverage data (such as instrumentation and test execution) but on the analysis and manipulation of test suites based on such information. An important design goal of the library was to be usable on industrial-size programs and test suites. Furthermore, there is no limitation on programming language, analysis granularity, or coverage criteria. In this paper, we demonstrate the purpose and benefits of the library and the associated toolset, which also includes a graphical user interface, as well as possible usage scenarios. SoDA also includes a repository of prepared programs, ranging from small to large, that can be used for experimentation and as a benchmark for code coverage related research.
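As an example of the kind of coverage-based test suite manipulation described above, here is a sketch of the well-known "additional greedy" test prioritization over a test-to-covered-elements map. It is not SoDA's API; the dictionary representation and function name are assumptions. At each step it picks the test that covers the most not-yet-covered elements, and resets the covered set once no remaining test adds new coverage (a common variant of the technique).

```python
from typing import Dict, List, Set

def additional_greedy_prioritization(coverage: Dict[str, Set[str]]) -> List[str]:
    """Order tests by additional coverage; ties are broken by test name."""
    remaining = dict(coverage)
    order: List[str] = []
    covered: Set[str] = set()
    while remaining:
        # max() keeps the first maximum, so iterating sorted names breaks ties alphabetically.
        best = max(sorted(remaining), key=lambda t: len(remaining[t] - covered))
        if not (remaining[best] - covered) and covered:
            # No test adds anything new: reset and rank the rest from scratch.
            covered = set()
            continue
        order.append(best)
        covered |= remaining.pop(best)
    return order
```

For a coverage map in which `t2` covers three methods and `t1` covers the one method `t2` misses, the order starts `t2`, `t1`, after which the redundant tests are ranked again from an empty covered set.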
TL;DR: It is shown that it is possible to automatically extract explanations for co-changes, that the quality of such explanations improves when structural and semantic properties are taken into account, and when the methods analyzed co-change recurrently.
Abstract: By analyzing historical information from Source Code Management systems, previous research has observed that certain methods tend to change together consistently. Co-change has been identified as a good predictor of the entities that are likely to be affected by a change, of those that might be missing modifications, and of those that might change in the future. However, existing co-change analysis provides no insight into why methods consistently co-change. Being able to identify the rationale that explains co-changes could allow developers to document and enforce design knowledge. This paper proposes an automatic approach to derive the reason behind a co-change. We define the reason for a (set of) co-change(s) as a set of properties common to the elements that co-change. We consider two kinds of properties: structural properties, which indicate explicit dependencies, and semantic properties, which reveal implicit dependencies. We then attempt to identify the reasons behind single commits, as well as the reasons behind co-changes that repeatedly affect the same set of methods. These sets of methods are identified by clustering methods that tend to be modified in the same commit transactions. We perform our analysis over the history of two open-source systems, analyzing nearly 19,000 methods and over 3,700 commits. We show that it is possible to automatically extract explanations for co-changes, that the quality of such explanations improves when structural and semantic properties are taken into account, and when the methods analyzed co-change recurrently.
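The definition of a co-change reason as the set of properties common to co-changing elements can be sketched directly. In this sketch, frequently co-changing method pairs are grouped by connected components, a simple stand-in we chose for the paper's clustering step, and the reason for each group is the intersection of its members' property sets. The property labels (`calls:...` for a structural dependency, `term:...` for a shared identifier term) and method names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[str, str]

def co_change_pairs(commits: Iterable[List[str]], min_support: int) -> Dict[Pair, int]:
    """Count how often each pair of methods is modified in the same commit."""
    counts: Counter = Counter()
    for methods in commits:
        for a, b in combinations(sorted(set(methods)), 2):
            counts[(a, b)] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

def clusters(pairs: Iterable[Pair]) -> List[Set[str]]:
    """Group methods into connected components of the frequent co-change graph."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for a, b in pairs:
        adj[a].add(b)
        adj[b].add(a)
    seen: Set[str] = set()
    result: List[Set[str]] = []
    for start in sorted(adj):
        if start in seen:
            continue
        comp: Set[str] = set()
        stack = [start]
        while stack:
            n = stack.pop()
            if n not in comp:
                comp.add(n)
                stack.extend(adj[n] - comp)
        seen |= comp
        result.append(comp)
    return result

def common_properties(methods: Iterable[str], properties: Dict[str, Set[str]]) -> Set[str]:
    """The 'reason' for a co-change: properties shared by every co-changing method."""
    sets = [properties[m] for m in methods]
    return set.intersection(*sets) if sets else set()
```

If `A.save` and `B.load` change together in most commits and both call the same cache API and mention the term "cache", the shared structural and semantic properties form the extracted explanation for that co-change.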
TL;DR: Predictive models for re-opened bugs are constructed using historical information about supplementary bug fixes with a precision between 72.2% and 97%, as well as a recall between 47.7% and 65.3%.
Abstract: A typical bug fixing cycle involves the reporting of a bug, the triaging of the report, the production and verification of a fix, and the closing of the bug. However, previous work has studied two phenomena where more than one fix is associated with the same bug report. The first is the case where developers re-open a previously fixed bug in the bug repository (sometimes even multiple times) to provide a new bug fix that replaces a previous fix, whereas the second is the case where multiple commits in the version control system contribute to the same bug report ("supplementary bug fixes"). Even though both phenomena seem related, they have never been studied together: are supplementary fixes a subset of re-opened bugs, or the other way around? This paper investigates the interplay between both phenomena in five open source software projects: Mozilla, NetBeans, Eclipse JDT Core, Eclipse Platform SWT, and WebKit. We found that re-opened bugs account for between 21.6% and 33.8% of all supplementary fixes. However, 33% to 57.5% of re-opened bugs had only one commit associated, which means that the original bug report was prematurely closed rather than incorrectly fixed. Furthermore, we constructed predictive models for re-opened bugs using historical information about supplementary bug fixes, with a precision between 72.2% and 97% and a recall between 47.7% and 65.3%. Software researchers and practitioners who are mining data repositories can use our approach to identify potential failures of their bug fixes and the re-opening of bug reports.
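The two overlap measures reported above, and the precision and recall used to evaluate the predictive models, can be computed as in the following sketch. The `Bug` record and its fields are a hypothetical simplification of linked bug-report and commit data: a bug has supplementary fixes when more than one commit is associated with it.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple

@dataclass
class Bug:
    bug_id: int
    reopened: bool    # was the report re-opened at least once?
    num_commits: int  # commits linked to the report

def interplay(bugs: List[Bug]) -> Tuple[float, float]:
    """Return (share of supplementary-fix bugs that were re-opened,
               share of re-opened bugs with only one associated commit)."""
    supplementary = [b for b in bugs if b.num_commits > 1]
    reopened = [b for b in bugs if b.reopened]
    reopened_share = (sum(b.reopened for b in supplementary) / len(supplementary)
                      if supplementary else 0.0)
    single_commit_share = (sum(b.num_commits == 1 for b in reopened) / len(reopened)
                           if reopened else 0.0)
    return reopened_share, single_commit_share

def precision_recall(predicted: Set[int], actual: Set[int]) -> Tuple[float, float]:
    """Precision and recall of predicted re-opened bug ids against the true ones."""
    tp = len(predicted & actual)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(actual) if actual else 0.0
    return precision, recall
```

A re-opened bug with a single commit counts toward the second measure, matching the interpretation above that such reports were closed prematurely rather than fixed incorrectly.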