TL;DR: The goal of this work is to define and evaluate a set of metrics that can be used to assess the testability of the classes of a Java system.
Abstract: We investigate factors of the testability of object-oriented software systems. The starting point is given by a study of the literature to obtain both an initial model of testability and existing OO metrics related to testability. Subsequently, these metrics are evaluated by means of two case studies of large Java systems for which JUnit test cases exist. The goal of This work is to define and evaluate a set of metrics that can be used to assess the testability of the classes of a Java system.
TL;DR: A new approach for the detection of clones in source code, which is inspired by the concept of frequent itemsets from data mining, is described, represented as an abstract syntax tree in XML.
Abstract: In this paper we describe a new approach for the detection of clones in source code, which is inspired by the concept of frequent itemsets from data mining. The source code is represented as an abstract syntax tree in XML. Currently, such XML representations exist for instance for Java, C++, or PROLOG. Our approach is very flexible; it can be configured easily to work with multiple programming languages
TL;DR: A new approach for the detection of clones in source code, which is inspired by the concept of frequent itemsets from data mining, is described, represented as an abstract syntax tree in XML.
Abstract: In this paper we describe a new approach for the detection of clones in source code, which is inspired by the concept of frequent itemsets from data mining. The source code is represented as an abstract syntax tree in XML. Currently, such XML representations exist for instance for Java, C++, or PROLOG. Our approach is very flexible; it can be configured easily to work with multiple programming languages
TL;DR: This work reports upon an initial experiment using the technique of formal concept analysis for mining aspectual views from the source code, following a lightweight approach, where the names of classes and methods are considered.
Abstract: We report upon an initial experiment using the technique of formal concept analysis for mining aspectual views from the source code. An aspectual view is a set of source code entities, such as class hierarchies, classes and methods that are structurally related in some way, and often crosscut a particular application. Initially, we follow a lightweight approach, where we only consider the names of classes and methods. This simplistic technique already results in the discovery of interesting and meaningful aspectual views, leaving us confident that more complex approaches will perform even better, and merit to be studied in the future.
TL;DR: This work shows how search-based meta-heuristic algorithms can be used to automate, or partly automate the problem of finding good transformation sequences, and indicates that the genetic algorithm performs significantly better than either hill climbing or random search.
Abstract: Program transformation is useful in a number of applications including program comprehension, reverse engineering and compiler optimization. In all these applications, transformation algorithms are constructed by hand for each different transformation goal. Loosely speaking, a transformation algorithm defines a sequence of transformation steps to apply to a given program. It is notoriously hard to find good transformation sequences automatically, and so much (costly) human intervention is required. This work shows how search-based meta-heuristic algorithms can be used to automate, or partly automate the problem of finding good transformation sequences. In this case, the goal of transformation is to reduce program size, but the approach is sufficiently general that it can be used to optimize any source-code level metric. The search techniques used are random search (RS), hill climbing (HC) and genetic algorithms (GA). The paper reports the result of initial experiments on small synthetic program transformation problems. The results are encouraging. They indicate that the genetic algorithm performs significantly better than either hill climbing or random search.
TL;DR: This work proposes a distribution framework based on AOP and it describes the steps necessary to migrate an existing program to it and all required aspects are generated automatically.
Abstract: Aspect oriented programming (AOP) is a new programming paradigm that offers a novel modularization unit for the crosscutting concerns. Functionalities originally spread across several modules and tangled with each other can be factored out into a single, separate unit, called an aspect. The source code fragments introduced to port an existing application to a distributed environment (such as Java RAH) are typically scattered and tangled, thus representing an ideal candidate for the usage of aspects. In This work, we propose a distribution framework based on AOP and we describe the steps necessary to migrate an existing program to it. In our solution, the original application remains oblivious of the distribution concern and all required aspects are generated automatically. The approach was validated on a case study.
TL;DR: This work discusses the relationship between 'classical' source code (usually written in a programming language) and these other files by analyzing a publicly-available software versioning repository by studying the nature of the software repository, the different mixtures of source code found in several software projects stored in it.
Abstract: The concept of source code, understood as the source components used to obtain a binary, ready to execute version of a program, comprises currently more than source code written in a programming language. Specially when we move apart from systems-programming and enter the realm of end-user applications, we find source files with documentation, interface specifications, internationalization and localization modules, multimedia files, etc. All of them are source code in the sense that the developer works directly with them, and the application is built automatically using them as input. This work discusses the relationship between 'classical' source code (usually written in a programming language) and these other files by analyzing a publicly-available software versioning repository. Aspects that have been studied include the nature of the software repository, the different mixtures of source code found in several software projects stored in it, the specialization of developers to the different tasks, etc.
TL;DR: A tool platform that manages commonly used, fined-grained, information about Java source code by using an XML representation that exposes information resulting from global semantic analysis, which is never provided by the typical AST.
Abstract: Information about calls to the operating system (or kernel libraries) made by a binary executable may be used to determine whether the binary is malicious. Being aware of this approach, malicious programmers hide this information by making such calls without using the call instruction. For instance, the `call addr' instruction may be replaced by two push instructions and a return instruction, the first push pushes the address of the instruction after the return instruction, and the second push pushes the address addr. The code may be further obfuscated by spreading the three instructions and by splitting each instruction into multiple instructions. This paper presents a method to statically detect obfuscated calls in binary code. The notion of abstract stack is introduced to associate each element in the stack to the instruction that pushes the element. An abstract stack graph is a concise representation of all abstract stacks at every point in the program. An abstract stack graph, created by abstract interpretation of the binary executables, may be used to detect obfuscated calls and other stack related obfuscations
TL;DR: The empirical observations show that the heuristic advice provided by the approach can help software designers make better decision of why and how to restructure a program.
Abstract: Program restructuring is a key method for improving the quality of ill-structured programs, thereby increasing the understandability and reducing the maintenance cost. It is a challenging task and a great deal of research is still ongoing. This work presents an approach to program restructuring at the function level, based on clustering techniques with cohesion as the major concern. Clustering has been widely used to group related entities together. The approach focuses on automated support for identifying ill-structured or low-cohesive functions and providing heuristic advice in both the development and evolution phases. A new similarity measure is defined and studied intensively. The approach is applied to restructure a real industrial program. The empirical observations show that the heuristic advice provided by the approach can help software designers make better decision of why and how to restructure a program. Specific source code level software metrics are presented to demonstrate the value of the approach.
TL;DR: The projection framework is used to provide a declarative formulation in terms of the different equivalences preserved by the different forms of slicing to formalize the definition of executable dynamic and forward program slicing.
Abstract: This paper uses a projection theory of slicing to formalize the definition of executable dynamic and forward program slicing. Previous definitions, when given, have been operational, and previous descriptions have been algorithmic. The projection framework is used to provide a declarative formulation in terms of the different equivalences preserved by the different forms of slicing. The analysis of dynamic slicing reveals that the slicing criterion introduced by Korel and Laski contains three inter-woven criteria. It is shown how these three conceptually distinct criteria can be disentangled to reveal two new criteria. The analysis of dynamic slicing also reveals that the subsumes relationship between static and dynamic slicing is more intricate that previous authors have claimed. Finally, the paper uses the projection theory to investigate theoretical properties of forward slicing. This is achieved by first re-formulating forward slicing to provide an executable forward slice. This definition allows for formal investigation of the relationship between forward and backward slicing
TL;DR: In this article, the authors propose a tool platform that manages commonly used, fined-grained, information about Java source code by using an XML representation, which is suitable for developing tools which browse and manipulate actual source code.
Abstract: Recent IDEs have become more extensible tool platforms but do not concern themselves with how other tools running on them collaborate with each other. They compel developers to use proprietary representations or the classical abstract syntax tree (AST) to build source code tools. Although these representations contain sufficient information, they are neither portable nor extensible. This paper proposes a tool platform that manages commonly used, fined-grained, information about Java source code by using an XML representation. Our representation is suitable for developing tools which browse and manipulate actual source code since the original code is annotated with tags based on its structure and retained within the tags. Additionally, it exposes information resulting from global semantic analysis, which is never provided by the typical AST. Our proposed platform allows the developers to extend the representation for the purpose of sharing or exchanging various kinds of information about the source code, and also enables them to build new tools by using existing XML utilities
TL;DR: An approach to the reversal of the control flow of structured programs used to automatically generate adjoint code for numerical programs by semantic source transformation is described and the algorithmic steps for simple branches and loops are shown.
Abstract: We describe an approach to the reversal of the control flow of structured programs. It is used to automatically generate adjoint code for numerical programs by semantic source transformation. After a short introduction to applications and the implementation tool set, we describe the building blocks using a simple example. We then illustrate the code reversal within basic blocks. The main part of the paper covers the reversal of structured control flow graphs. We show the algorithmic steps for simple branches and loops and give a detailed algorithm for the reversal of arbitrary combinations of loops and branches in a general control flow graph
TL;DR: This work shows that by allowing transformation of the program's syntax it is possible to extract both procedures and functions in an amorphous manner, and that although theAmorphous extraction process is meaning preserving it is not necessarily syntax preserving.
Abstract: The procedure extraction problem is concerned with the meaning preserving formation of a procedure from a (not necessarily contiguous) selected set of statements. Previous approaches to the problem have used dependence analysis to identify the non-selected statements which must be 'promoted' (also selected) in order to preserve semantics. All previous approaches to the problem have been syntax preserving. This work shows that by allowing transformation of the program's syntax it is possible to extract both procedures and functions in an amorphous manner. That is, although the amorphous extraction process is meaning preserving it is not necessarily syntax preserving. The amorphous approach is advantageous in a variety of situations. These include when it is desirable to avoid promotion, when a value-returning function is to be extracted from a scattered set of assignments to a variable, and when side effects are present in the program from which the procedure is to be extracted.
TL;DR: The experiment shows that there is no significant increase in precision of the restricted form of slicing compared to the unrestricted traditional slicing, and a large part of an average slice is due to called procedures.
Abstract: Whether context-sensitive program analysis is more effective than context-insensitive analysis is an ongoing discussion. There is evidence that context-sensitivity matters in complex analyses like pointer analysis or program slicing. One might think that the context itself matters, because empirical data shows that context-sensitive program slicing is more precise and under some circumstances even faster than context-insensitive program slicing. Based on some experiments, we will show that this is not the case. The experiment requires backward slices to return to call sites specified by an abstract call stack. Such call stacks can be seen as a poor man's dynamic slicing: for a concrete execution, the call stack is captured, and static slices are restricted to the captured stack. The experiment shows that there is no significant increase in precision of the restricted form of slicing compared to the unrestricted traditional slicing. The reason is that a large part of an average slice is due to called procedures
TL;DR: The Framework Constraint Language (FCL) is a tool for detecting errors in framework usage and can be used to encode design rules such as the Law of Demeter and programming guidelines.
Abstract: The Framework Constraint Language (FCL) is a tool for detecting errors in framework usage. FCL is used to specify the syntactic constraints that frameworks impose on the code of framework-based applications. Violations of these constraints are then detected through static analysis. FCL can also be used to encode design rules such as the Law of Demeter and programming guidelines. This paper introduces FCL and demonstrates its utility in these areas. The version of the FCL language and associated checker described here is targeted at C++
TL;DR: This work takes a "step backward'' and shows how to generate exactly the interprocedural slicing criteria needed, using the program's call graph or a stack, and shows that under certain circumstances these criteria are equal to those generated by the call-graph/stack technique.
Abstract: Weiser's algorithm for computing interprocedural slices has a serious drawback: it generates spurious criteria which are not feasible in the control flow of the program. When these extraneous criteria are used the slice becomes imprecise in that it has statements that are not relevant to the computation. Horwitz, Reps and Binkley solved this problem by devising the system dependence graph with an associated algorithm that produced more precise interprocedural slices. We take a "step backward'' and show how to generate exactly the interprocedural slicing criteria needed, using the program's call graph or a stack. This technique can also be used on a family of program dependence graphs that represent all procedures in a program and are not interconnected by a system dependence graph. Then we show how to use the Horwitz, Reps and Binkley interprocedural slicing algorithm to generate criteria and show that the criteria so generated are equal to those generated by the call-graph/stack technique. Thus we present alternative, equivalent ways to generate precise slicing criteria across procedure boundaries. And finally we show that under certain circumstances. Weiser's technique for slicing across procedures is a bit "too strong," for it always generates sufficient criteria to obtain the entire program as a slice on any criteria.
TL;DR: This work presents the de-pipelining algorithm with a formal description, proof, and a set of working examples, and experiments with loops taken from some practical DSP programs are conducted on popular VLIW digital signal processors to verify the algorithm.
Abstract: Software pipelining is an optimization technique used to speed up loop execution. It is widely implemented in optimizing compilers for VLIW and superscalar processors that support instruction level parallelism. Software de-pipelining is the reverse of software pipelining; it restores the assembly code of a software-pipelined loop back to its semantically equivalent sequential form. Due to the non-sequential nature of the often optimized assembly code, it is very difficult to gain insight into the meaning of the code. Consequently, the task of de-pipelining the code of a software-pipelined loop is very complex and challenging. We present in This work our de-pipelining algorithm with a formal description, proof, and a set of working examples. Experiments with loops taken from some practical DSP programs are conducted on popular VLIW digital signal processors to verify the algorithm. Some applications of software de-pipelining are discussed.
TL;DR: This talk shows how, to accommodate the user base of spreadsheet languages, an interface to these methodologies can be provided in a manner that does not require an understanding of the theory behind the analyses, yet supports the interactive, incremental process by which spreadsheets are created.
Abstract: Summary form only given. Not long ago, most software was written by professional programmers, who could be presumed to have an interest in software engineering methodologies and in tools and techniques for improving software dependability. Today, however, a great deal of software is written not by professionals but by end-users, who create applications such as multimedia simulations, dynamic Web pages, and spreadsheets. Applications such as these are often used to guide important decisions or aid in important tasks, and it is important that they be sufficiently dependable, but evidence shows that they frequently are not. For example, studies have shown that a large percentage of the spreadsheets created by end-users contain faults. Despite such evidence, until recently, relatively little research had been done to help end-users create more dependable software. We have been working to address this problem by finding ways to provide at least some of the benefits of formal software engineering techniques to end-user programmers. In this talk, focusing on the spreadsheet application paradigm, I present several of our approaches, focusing on methodologies that utilize source-code-analysis techniques to help end-users build more dependable spreadsheets. Behind the scenes, our methodologies use static analyses such as dataflow analysis and slicing, together with dynamic analyses such as execution monitoring, to support user tasks such as validation and fault localization. I show how, to accommodate the user base of spreadsheet languages, an interface to these methodologies can be provided in a manner that does not require an understanding of the theory behind the analyses, yet supports the interactive, incremental process by which spreadsheets are created. Finally, I present empirical results gathered in the use of our methodologies that highlight several costs and benefits tradeoffs, and many opportunities for future work