TL;DR: An empirical study carried out on three Java software systems, namely Apache Ant, Xerces, and ArgoUML, aimed at investigating to what extent refactoring activities induce faults, indicates that, while some kinds of refactorings are unlikely to be harmful, others, such as refactorings involving hierarchies, tend to induce faults very frequently.
Abstract: Refactorings are - as defined by Fowler - behavior preserving source code transformations. Their main purpose is to improve maintainability or comprehensibility, or also reduce the code footprint if needed. In principle, refactorings are defined as simple operations so that they are "unlikely to go wrong" and introduce faults. In practice, refactoring activities could have their risks, as other changes. This paper reports an empirical study carried out on three Java software systems, namely Apache Ant, Xerces, and ArgoUML, aimed at investigating to what extent refactoring activities induce faults. Specifically, we automatically detect (and then manually validate) 15,008 refactoring operations (of 52 different kinds) using an existing tool (Ref-Finder). Then, we use the SZZ algorithm to determine whether it is likely that refactorings induced a fault. Results indicate that, while some kinds of refactorings are unlikely to be harmful, others, such as refactorings involving hierarchies (e.g., pull up method), tend to induce faults very frequently. This suggests more accurate code inspection or testing activities when such specific refactorings are performed.
TL;DR: This work reports the experiences building source-code analysis tools at Google on top of a third-party, open-source, extensible compiler, and describes three tools in use on the Google Java code base.
Abstract: Large software companies need customized tools to manage their source code. These tools are often built in an ad-hoc fashion, using brittle technologies such as regular expressions and home-grown parsers. Changes in the language cause the tools to break. More importantly, these ad-hoc tools often do not support uncommon-but-valid code patterns. We report our experiences building source-code analysis tools at Google on top of a third-party, open-source, extensible compiler. We describe three tools in use on our Java code base. The first, Strict Java Dependencies, enforces our dependency policy in order to reduce JAR file sizes and testing load. The second, error-prone, adds new error checks to the compilation process and automates repair of those errors at a whole-code base scale. The third, Thindex, reduces the indexing burden for a Java IDE so that it can support Google-sized projects.
TL;DR: The extent to which stemming improves the retrieval performance relates to the degree of natural language content in a query, which is mediated by other factors, such as the use of tf-idf to filter commonly occurring terms and the precise nature of the queries.
Abstract: As the popularity of text-based source code search and analysis grows, the use of stemmers to strip suffixes has increased. Although widely investigated in the information retrieval community, the comparative effectiveness of stemmers in the domain of software is relatively unknown. In this paper, we investigate which of the well-known stemmers perform best in the domain of Java software for concern location and bug localization. For these two problems, we evaluate the use of stemming on over 500 search tasks for six different Java applications. Using MAP and Rank Measure, we conducted an overall qualitative study and a query-by-query quantitative study of the impact of stemming on retrieval effectiveness. As one might expect, our contribution demonstrates that how stemming affects retrieval performance is mediated by other factors, such as the use of tf-idf to filter commonly occurring terms and the precise nature of the queries. Specifically, we find that the extent to which stemming improves the retrieval performance relates to the degree of natural language content in a query.
TL;DR: This paper advocates a generic approach to understanding, analyzing and refactoring cross-language code by explicitly specifying and exploiting semantic links with the aim of giving developers the same amount of control over and confidence in multi-language programs they have for single-language code today.
Abstract: Software composed of artifacts written in multiple (programming) languages is pervasive in today's enterprise, desktop, and mobile applications. Since they form one system, artifacts from different languages reference one another, thus creating what we call semantic cross-language links. By their very nature, such links are out of scope of the individual programming language, they are ignored by most language-specific tools and are often only established -- and checked for errors -- at runtime. This is unfortunate since it requires additional testing, leads to brittle code, and lessens maintainability. In this paper, we advocate a generic approach to understanding, analyzing and refactoring cross-language code by explicitly specifying and exploiting semantic links with the aim of giving developers the same amount of control over and confidence in multi-language programs they have for single-language code today.
TL;DR: This work examines eight well-known open source Java systems by grouping the Control Flow Patterns (CFPs) of their methods into equivalence classes; the results suggest that CC and similar measures need to be reconsidered as metrics for code understandability.
Abstract: Assessing the understandability of source code remains an elusive yet highly desirable goal for software developers and their managers. While many metrics have been suggested and investigated empirically, the McCabe cyclomatic complexity metric (CC) -- which is based on control flow complexity -- seems to hold enduring fascination within both industry and the research community despite its known limitations. In this work, we introduce the ideas of Control Flow Patterns (CFPs) and Compressed Control Flow Patterns (CCFPs), which eliminate some repetitive structure from control flow graphs in order to emphasize high-entropy graphs. We examine eight well-known open source Java systems by grouping the CFPs of the methods into equivalence classes, and exploring the results. We observed several surprising outcomes: first, the number of unique CFPs is relatively low, second, CC often does not accurately reflect the intricacies of Java control flow, and third, methods with high CC often have very low entropy, suggesting that they may be relatively easy to understand. These findings challenge the widely-held belief that there is a clear-cut causal relationship between CC and understandability, and suggest that CC and similar measures need to be reconsidered as metrics for code understandability.
TL;DR: A new detection method is proposed that is free from the influence of repeated instructions: it transforms repeated instructions into a special form and then detects code clones using a suffix array algorithm, which prevents many false positives from being detected.
Abstract: A variety of code clone detection methods have been proposed before now. However, only a small part of them is widely used. Widely-used methods are line-based and token-based ones. They have high scalability because they neither require deep source code analysis nor construct complex intermediate structures for the detection. High scalability is one of the big advantages in code clone detection tools. On the other hand, line/token-based detections yield many false positives. One of the factors is the presence of repeated instructions in the source code. For example, assume that there are three consecutive printf statements in C source code. If we apply a token-based detection to them, the former two statements are detected as a code clone of the latter two statements. However, such overlapped code clones are redundant and so not useful for developers. In this paper, we propose a new detection method that is free from the influence of the presence of repeated instructions. The proposed method transforms every run of repeated instructions into a special form, and then it detects code clones using a suffix array algorithm. The transformation prevents many false positives from being detected. Also, the detection speed is maintained. The proposed detection method has already been developed as a software tool, FRISC. We confirmed the usefulness of the proposed method by conducting a quantitative evaluation of FRISC with Bellon's oracle.
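The idea of folding repeated instructions before clone detection can be illustrated with a minimal sketch. This is not FRISC's actual transformation, whose "special form" the abstract does not specify: the `normalize` and `fold_repetitions` helpers and the `$STR`/`$NUM` placeholders are assumptions made for illustration. Consecutive statements that are identical after literal normalization are collapsed into a single entry, so three consecutive printf calls no longer produce an overlapping self-clone.

```python
import re
from itertools import groupby


def normalize(statement: str) -> str:
    """Replace literals with placeholders (hypothetical normalization).

    Statements that differ only in string or integer literals
    become textually identical after this step.
    """
    statement = re.sub(r'"[^"]*"', "$STR", statement)
    statement = re.sub(r"\b\d+\b", "$NUM", statement)
    return statement.strip()


def fold_repetitions(statements):
    """Collapse each run of identical normalized statements.

    Returns a list of (normalized_statement, run_length) pairs; a clone
    detector would then operate on this folded sequence instead of the
    raw statement sequence.
    """
    normalized = map(normalize, statements)
    return [(key, len(list(run))) for key, run in groupby(normalized)]


if __name__ == "__main__":
    source = [
        'printf("a");',
        'printf("b");',
        'printf("c");',
        "x = 1;",
    ]
    for stmt, count in fold_repetitions(source):
        print(count, stmt)
```

The folded sequence is shorter than the original, which is consistent with the abstract's claim that detection speed does not suffer from the transformation.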
TL;DR: An approach, LIBCROOS, is presented that combines the results of any IR technique with BCRs gathered through source code analyses; it statistically improves the rankings of both IR techniques when compared to LSI and VSM alone and may reduce the developers' effort.
Abstract: Bug location assists developers in locating culprit source code that must be modified to fix a bug. Done manually, it requires intensive search activities with unpredictable costs of effort and time. Information retrieval (IR) techniques have been proven useful to speed up bug location in object-oriented programs. IR techniques compute the textual similarities between a bug report and the source code to provide a list of potential culprit classes to developers. They rank the list of classes in descending order of the likelihood of the classes to be related to the bug report. However, due to the low textual similarity between source code and bug reports, IR techniques may put a culprit class at the end of a ranked list, which forces developers to manually verify all non-culprit classes before finding the actual culprit class. Thus, even with IR techniques, developers are not saved from manual effort. In this paper, we conjecture that binary class relationships (BCRs) could improve the rankings by IR techniques of classes and, thus, help reduce developers' manual effort. We present an approach, LIBCROOS, that combines the results of any IR technique with BCRs gathered through source code analyses. We perform an empirical study on four programs -- Jabref, Lucene, muCommander, and Rhino -- to compare the accuracy, in terms of ranking, of LIBCROOS with two IR techniques: latent semantic indexing (LSI) and vector space model (VSM). The results of this empirical study show that LIBCROOS improves the rankings of both IR techniques statistically when compared to LSI and VSM alone and, thus, may reduce the developers' effort.
TL;DR: An innovative toolset is demonstrated that provides the software developers with profile data and directs them to possible top-level, pipeline-style parallelization opportunities for an arbitrary sequential C program, complementary to the methods based on static code analysis and automatic code rewriting.
Abstract: Writing parallel code is traditionally considered a difficult task, even when it is tackled from the beginning of a project. In this paper, we demonstrate an innovative toolset that faces this challenge directly. It provides the software developers with profile data and directs them to possible top-level, pipeline-style parallelization opportunities for an arbitrary sequential C program. This approach is complementary to the methods based on static code analysis and automatic code rewriting and does not impose restrictions on the structure of the sequential code or the parallelization style, even though it is mostly aimed at coarse-grained task-level parallelization. The proposed toolset has been utilized to define parallel code organizations for a number of real-world representative applications and is based on and is provided as free source.
TL;DR: This paper presents recent example advances on cooperative testing and analysis, highlighting the need for effective support for cooperation between engineers and tools in state-of-the-art research and practice.
Abstract: Tool automation to reduce manual effort has been an active research area in various subfields of software engineering such as software testing and analysis. To maximize the value of software testing and analysis, effective support for cooperation between engineers and tools is greatly needed and yet lacking in state-of-the-art research and practice. In particular, testing and analysis are in great need of (1) effective ways for engineers to communicate their testing or analysis goals and guidance to tools and (2) tools with strong enough capabilities to accomplish the given testing or analysis goals and with effective ways to communicate challenges faced by them to engineers -- enabling a feedback loop between engineers and tools to refine and accomplish the testing or analysis goals. In addition, different tools have their respective strengths and weaknesses, and there is also a great need of allowing these tools to cooperate with each other. Similarly, there is a great need of allowing engineers (or even users) to cooperate to help tools such as in the form of crowdsourcing. A new research frontier on synergistic cooperations between humans and tools, tools and tools, and humans and humans is yet to be explored. This paper presents recent example advances on cooperative testing and analysis.
TL;DR: It is concluded that near-miss clones require more attention regarding clone management techniques compared to identical clones.
Abstract: It is often claimed that duplicated source code fragments increase the maintenance effort in software systems. To investigate the impact of so called clones it is useful to analyze how they evolve. A previous study analyzed several aspects of the evolution of identical clones in nine open source systems and has found that the peculiarity of clone evolution is significantly different for each system, which makes a general conclusion difficult. In this paper we investigate in which ways the evolution of near-miss clones differs from the evolution of identical clones. By analyzing seven open source systems we draw comparisons between identical and near-miss clones. Based on the findings we conclude that near-miss clones require more attention regarding clone management techniques compared to identical clones.
TL;DR: This work adapts some well-known notions from partial evaluation in order to have a complete symbolic execution scheme which can then be used to check liveness properties like program termination.
Abstract: Symbolic execution, originally introduced as a method for program testing and debugging, is usually incomplete because of infinite symbolic execution paths. In this work, we adapt some well-known notions from partial evaluation in order to have a complete symbolic execution scheme which can then be used to check liveness properties like program termination. We also introduce a representation of the symbolic transitions as a term rewrite system so that existing termination provers for these systems can be used to verify the termination of the original program.
TL;DR: A hybrid approach to detecting software dependencies by combining conceptual and domain-based coupling metrics is proposed, which is able to detect database and source code dependencies with higher precision and recall as compared to its standalone constituents.
Abstract: Knowledge of software dependencies plays an important role in program comprehension and other maintenance activities. Traditionally, dependencies are derived by source code analysis; however, such an approach can be difficult to use in multi-tier hybrid software systems, or legacy applications where conventional code analysis tools simply do not work as is. In this paper, we propose a hybrid approach to detecting software dependencies by combining conceptual and domain-based coupling metrics. In recent years, a great deal of research focused on deriving various coupling metrics from these sources of information with the aim of assisting software maintainers. Conceptual metrics specify underlying relationships encoded by developers in identifiers and comments of source code classes, whereas domain metrics exploit coupling manifested in domain-level information of software components, which is independent from software implementation. The proposed approach is independent from programming language; as such, it can be used in multi-tier hybrid systems or legacy applications. We report the results of an empirical case study on a large-scale enterprise system where we demonstrate that the combined approach is able to detect database and source code dependencies with higher precision and recall as compared to its standalone constituents.
TL;DR: The design and implementation of an interactive environment for discovering and browsing information flow in SPARK programs are described; the tool utilizes classic slicing and chopping techniques to assist developers in writing information flow contracts.
Abstract: This tool paper describes the design and implementation of an interactive environment for discovering and browsing information flow in SPARK programs. SPARK is a subset of Ada that has been used in a number of industrial contexts for implementing certified safety and security critical systems. SPARK requires explicit specification of information flow properties in the form of procedure contracts. To write such contracts, developers need to understand the data and control dependencies in the program. Our tool Bakar Alir, implemented as an Eclipse Plug-in, utilizes classic slicing and chopping techniques to assist developers in writing information flow contracts.
TL;DR: Input Tracer is presented, a tool that utilizes dynamic taint analysis (DTA) for aiding in manual program comprehension and analysis of unmodified x86 executables running in Linux; its ability to provide exact information on the origin of tainted data is demonstrated through a detailed use case, where the tool is used to find the root cause of a memory corruption bug.
Abstract: Third-party security analysis of closed-source programs has become an important part of a defense-in-depth approach to software security for many companies. In the absence of efficient tools, the analysis has generally been performed through manual reverse engineering of the machine code. As reverse engineering is an extremely time-consuming and costly task, much research has been performed to develop more powerful methods for analysis of program binaries. One such popular method is dynamic taint analysis (DTA), which is a type of runtime data-flow analysis, where certain input data is marked as tainted. By tracking the flow of tainted data, DTA can, for instance, be used to determine which computations in a program are affected by a certain part of the input. In this paper we present Input Tracer, a tool that utilizes DTA for aiding in manual program comprehension and analysis of unmodified x86 executables running in Linux. A brief overview of dynamic taint analysis is given, followed by a description of the tool and its implementation. We also demonstrate the tool's ability to provide exact information on the origin of tainted data through a detailed use case, where the tool is used to find the root cause of a memory corruption bug.
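The core mechanism of dynamic taint analysis, tracking which input bytes influence a computed value, can be sketched at the value level. This is a toy illustration only: Input Tracer works on unmodified x86 executables, whereas the `Tainted` class and `read_input` helper below are hypothetical names that model taint propagation for integer addition in plain Python.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tainted:
    """An integer value plus the set of input offsets it depends on."""

    value: int
    origins: frozenset = frozenset()

    def __add__(self, other):
        # Untainted operands are wrapped with an empty origin set.
        if not isinstance(other, Tainted):
            other = Tainted(other)
        # The result depends on every input byte either operand depends on.
        return Tainted(self.value + other.value, self.origins | other.origins)


def read_input(data: bytes):
    """Mark each input byte as tainted with its own offset."""
    return [Tainted(byte, frozenset({offset})) for offset, byte in enumerate(data)]


if __name__ == "__main__":
    inp = read_input(b"\x01\x02\x03")
    result = inp[0] + inp[2] + 10
    # The origin set tells an analyst which input offsets affected the result.
    print(result.value, sorted(result.origins))
```

Recording origins as sets of input offsets is one way to answer the question the abstract raises, namely which part of the input a given computation was affected by.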
TL;DR: This paper investigates the propagation of security checks in 8 PHP applications that implement access control models and shows how, using the Datalog language, one can implement conceptually complex data-flow algorithms in an incremental, intuitive and compact manner.
Abstract: In this paper, we present novel algorithms for the propagation of pattern-based properties in PHP applications. Intuitively, pattern-based properties designate those properties that are intrinsically associated to syntactic patterns in the source code. Security checks in access control models are an example of pattern-based properties. At the source code level, permissions are typically verified with stereotyped constructs, called security checks, that can be detected with syntactic patterns. Depending on the program, pattern-based properties can be aliased to variables that are propagated through the application. In that context, support from data-flow approaches is needed to track the propagation of patterns through the application. In the context of this paper, we focus on the alias-aware propagation of security checks through PHP applications. Specifically, we investigated the propagation of security checks in 8 PHP applications that implement access control models. We show how, using the Datalog language, one can implement conceptually complex data-flow algorithms in an incremental, intuitive and compact manner. From the results perspective, we show how our algorithm identifies security checks and security check aliased variables in a precise way. The reported false positive rate varies between 0% and 4% for the investigated applications.
TL;DR: A program transformation technique is described that makes collaborative worm defense systems easy to build, predictable and fast-responsive, and software vendors and users can test, in advance, that the defense system will very unlikely apply a mitigation that breaks their software.
Abstract: This paper explores how much source code analysis can assist worm defense systems. Previously-proposed worm defense systems have used disparate mechanisms to detect worms, analyze exploits, verify alerts, and apply mitigations. Furthermore, previous systems have not offered predictability, i.e. it is not possible to verify, in advance, that the defense system will never generate a mitigation that breaks the program. This paper describes a program transformation technique that makes collaborative worm defense systems easy to build, predictable and fast-responsive. Our transformation provides a single building block that can be used to perform worm detection, exploit analysis, alert verification, and mitigation application. In fact, our transformation makes most of these tasks trivial. Furthermore, software vendors and users can test, in advance, that the defense system will very unlikely apply a mitigation that breaks their software. Mitigations are vulnerability-specific, not exploit-specific. Finally, our system can respond extremely quickly to a new worm. The exploit analysis becomes trivial so sentinel hosts can issue an alert the instant they detect a worm. We have implemented a prototype of our system based on the Jones and Kelly program transformation for memory safety. During normal operation, our system incurs only 5% overhead. We take advantage of static analysis to develop several optimizations and make the Jones and Kelly approach to memory safety efficient and practical.
TL;DR: It is shown that a large number of real defects can be captured by impact sets computed by Static Execute After, albeit many of them are large.
Abstract: Impact analysis based on code dependence can be an integral part of software quality assurance by providing opportunities to identify those parts of the software system that are affected by a change. Because changes usually have far reaching effects in programs, effective and efficient impact analysis is vital, which has different applications including change propagation and regression testing. Static Execute After (SEA) is a relation on program elements (procedures) that is efficiently computable and accurate enough to be a candidate for use in impact analysis in practice. To assess the applicability of SEA in terms of capturing real defects, we present results on integrating it into the build system of WebKit, a large, open source software system, and on related experiments. We show that a large number of real defects can be captured by impact sets computed by SEA, albeit many of them are large. We demonstrate that this is not an issue in applying it to regression test prioritization, but generally it can be an obstacle in the path to efficient use of impact analysis. We believe that the main reason for large impact sets is the formation of dependence clusters in code. As apparently dependence clusters cannot be easily avoided in the majority of cases, we focus on determining the effects these clusters have on impact analysis.
TL;DR: This work proposes source code analysis and program transformation to substantially automate the application of LUT transforms, and uses a novel optimization algorithm that selects Pareto optimal sets of expressions that benefit most from LUT transformation, based on error and performance estimates.
Abstract: Scientific programmers can speed up function evaluation by precomputing and storing function results in lookup tables (LUTs), thereby replacing costly evaluation code with an inexpensive memory access. A code transform that replaces computation with LUT code can improve performance; however, accuracy is reduced because of error inherent in reconstructing values from LUT data. LUT transforms are commonly used to approximate expensive elementary functions. The current practice is for software developers to (1) manually identify expressions that can benefit from a LUT transform, (2) modify the code by hand to implement the LUT transform, and (3) run experiments to determine if the resulting error is within application requirements. This approach reduces productivity, obfuscates code, and limits programmer control over accuracy and performance. We propose source code analysis and program transformation to substantially automate the application of LUT transforms. Our approach uses a novel optimization algorithm that selects Pareto optimal sets of expressions that benefit most from LUT transformation, based on error and performance estimates. We demonstrate our methodology with the Mesa tool, which achieves speedups of 1.4-6.9x on scientific codes while managing introduced error. Our tool makes the programmer more productive and improves the chances of finding an effective solution.
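The manual LUT workflow described above (build a table, evaluate through it, then measure the introduced error) can be sketched as follows. This is a hand-written illustration, not output of the Mesa tool: the helper names `build_lut`, `lut_eval`, and `max_error` are assumptions, and linear interpolation is only one of several possible reconstruction schemes.

```python
import math


def build_lut(f, lo, hi, n):
    """Sample f at n evenly spaced points on [lo, hi]."""
    step = (hi - lo) / (n - 1)
    return [f(lo + i * step) for i in range(n)], step


def lut_eval(table, lo, step, x):
    """Approximate f(x) by linear interpolation between table entries."""
    pos = (x - lo) / step
    # Clamp so that x == hi still has a right-hand neighbour.
    i = min(int(pos), len(table) - 2)
    frac = pos - i
    return table[i] + frac * (table[i + 1] - table[i])


def max_error(f, table, lo, hi, step, samples=10_000):
    """Estimate the worst-case absolute error on a dense sample grid."""
    worst = 0.0
    for k in range(samples + 1):
        x = lo + k * (hi - lo) / samples
        worst = max(worst, abs(f(x) - lut_eval(table, lo, step, x)))
    return worst


if __name__ == "__main__":
    lo, hi = 0.0, 2 * math.pi
    for n in (65, 1025):
        table, step = build_lut(math.sin, lo, hi, n)
        print(n, max_error(math.sin, table, lo, hi, step))
```

Table size trades memory for accuracy: for linear interpolation the error shrinks roughly with the square of the step, which is the kind of error and performance trade-off the abstract's optimization algorithm is said to navigate.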
TL;DR: This paper describes a functional dynamic Java byte code obfuscator based on the general ideas introduced by Aucsmith's algorithm that provides a very high level of security for the obfuscated code, but at the cost of an extreme performance overhead.
Abstract: This paper describes a functional dynamic Java byte code obfuscator based on the general ideas introduced by Aucsmith's algorithm. This tool provides a very high level of security for the obfuscated code due to the fact that the code that gets executed is not visible at all in the initial jar file, but at the cost of an extreme performance overhead. However, further improvements promise to drastically improve the performance of the obfuscated application.
TL;DR: The notion of minimal access modifiers is introduced, which is the most restrictive access modifier that allows all existing references to a type or method in the entire source code of a system.
Abstract: Access modifiers allow Java developers to define package and class interfaces tailored for different groups of clients. According to the principles of information hiding and encapsulation, the accessibility of types, methods, and fields should be as restrictive as possible. However, in programming practice, the potential of the given possibilities seems not always to be fully exploited. Access Analysis is a plug-in for the Eclipse IDE that measures the usage of access modifiers for types and methods in Java. It calculates two metrics, Inappropriate Generosity with Accessibility of Types (IGAT) and Inappropriate Generosity with Accessibility of Methods (IGAM), which represent the degree of deviation between actual and necessary access modifiers. As an approximation for the necessary access modifier, we introduce the notion of minimal access modifiers. The minimal access modifier is the most restrictive access modifier that allows all existing references to a type or method in the entire source code of a system. Access Analysis determines minimal access modifiers by static source code analysis using the built-in Java DOM/AST API of Eclipse.
TL;DR: This paper implements a replacement for the Java Collections Framework specialized for points-to analysis, applies it to three different points-to analysis implementations, and demonstrates the benefits in both precision and analysis cost.
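The notion of a minimal access modifier can be made concrete with a small sketch for methods. This is a simplified model, not the Access Analysis plug-in's implementation: the `Reference` record and both function names are hypothetical, and the model ignores nested classes, overriding and interface constraints, and the receiver-type subtleties of Java's `protected`.

```python
from dataclasses import dataclass

# Java access levels, ordered from most to least restrictive.
LEVELS = ["private", "package", "protected", "public"]


@dataclass(frozen=True)
class Reference:
    """One reference to a method (hypothetical record for illustration)."""

    from_class: str
    from_package: str
    from_subclass: bool = False


def required_level(decl_class, decl_package, ref):
    """Least permissive modifier that still admits a single reference."""
    if ref.from_class == decl_class:
        return "private"
    if ref.from_package == decl_package:
        return "package"
    if ref.from_subclass:
        return "protected"
    return "public"


def minimal_access_modifier(decl_class, decl_package, refs):
    """Most restrictive modifier that allows all existing references."""
    needed = [required_level(decl_class, decl_package, r) for r in refs]
    # With no references at all, nothing prevents making the method private.
    return max(needed, key=LEVELS.index, default="private")


if __name__ == "__main__":
    refs = [
        Reference("Parser", "org.example.core"),
        Reference("Lexer", "org.example.core"),
    ]
    print(minimal_access_modifier("Parser", "org.example.core", refs))
```

A method declared `public` whose computed minimal modifier is `package` would be an instance of the "inappropriate generosity" that the IGAM metric is described as measuring.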
Abstract: Points-to information is the basis for many analyses and transformations, e.g., for program understanding and optimization. Collections frameworks are part of most modern programming languages' infrastructures and used by many applications. The richness of features and the inherent structure of collection classes affect both performance and precision of points-to analysis negatively. In this paper, we discuss how to replace original collections frameworks with versions specialized for points-to analysis. We implement such a replacement for the Java Collections Framework and support its benefits for points-to analysis by applying it to three different points-to analysis implementations. In experiments, context-sensitive points-to analyses require, on average, 16-24% less time while at the same time being more precise. Context-insensitive analysis in conjunction with inlining also benefits in both precision and analysis cost.
TL;DR: It is found that third-party plug-ins that use only old non-APIs have a high chance of compatibility success in new SDK releases compared to those that use at least one newly introduced non-API.
Abstract: Incompatibility between applications developed on top of frameworks with new versions of the frameworks is a big nightmare to both developers and users of the applications. Understanding the factors that cause incompatibilities is a step to solving them. One such direction is to analyze and identify parts of the reusable code of the framework that are prone to change. In this study we carried out an empirical investigation on 11 Eclipse SDK releases (1.0 to 3.7) and 288 Eclipse third-party plug-ins (ETPs) with two main goals: First, to determine the relationship between the age of Eclipse non-APIs (internal implementations) used by an ETP and the compatibility of the ETP. We found that third-party plug-ins that use only old non-APIs have a high chance of compatibility success in new SDK releases compared to those that use at least one newly introduced non-API. Second, our goal was to build and test a predictive model for the compatibility of an ETP, supported in a given SDK release in a newer SDK release. Our findings produced 23 statistically significant prediction models having high values of the strength of the relationship between the predictors and the prediction (logistic regression R² of up to 0.810). In addition, the results from model testing indicate high values of up to 100% of precision and recall and up to 98% of accuracy of the predictions. Finally, despite the fact that SDK releases with API breaking changes, i.e., 1.0, 2.0 and 3.0, have got nothing to do with non-APIs, our findings reveal that non-APIs introduced in these releases have a significant impact on the compatibility of the ETPs that use them.