TL;DR: This work presents conditioned slicing as a general slicing framework for program comprehension and shows how slices produced with traditional slicing methods can be reduced to conditioned slices.
Abstract: We present conditioned slicing as a general slicing framework for program comprehension. A conditioned slice consists of a subset of program statements which preserves the behavior of the original program with respect to a set of program executions. The set of initial states of the program that characterize these executions is specified in terms of a first order logic formula on the input variables of the program. Conditioned slicing allows a better decomposition of the program giving the maintainer the possibility to analyze code fragments with respect to different perspectives. We also show how slices produced with traditional slicing methods can be reduced to conditioned slices. Conditioned slices can be identified by using symbolic execution techniques and dependence graphs.
TL;DR: To meet the software quality challenge, research must first meet some substantial challenges and improve the current tools, technologies, and their cost-benefit characterizations, but researchers must also take the lead in looking beyond post-development testing to study the tools and technologies for building inherent quality into software.
Abstract: The challenge of research in software quality is to provide tools and technology that will enable the software industry to deploy software products and services that are safe, dependable, and usable within an economic framework allowing companies to compete effectively. Meeting this challenge will have a significant effect on national economies, national security, international competitiveness, and societal well-being. Software quality concerns are quite broad, including, for example, correctness, robustness, readability, and evolvability. There is no single monolithic measure of software quality, and no general agreement about how to quantify definitively any of the key quality concerns. The actual levels of quality achieved in practice are dictated by quality assessment tools and technologies and willingness to pay for applying them. To meet the software quality challenge, research must first meet some substantial challenges. It is necessary to improve the current tools, technologies, and their cost-benefit characterizations. But researchers must also take the lead in looking beyond post-development testing to study the tools and technologies for building inherent quality into software. New analysis and design technologies will have to play a greater role in achieving and demonstrating the quality of software, increasingly reducing dependence upon after-the-fact testing techniques. While awaiting the results of this needed innovative research, practitioners could benefit significantly from the transfer of past and current research to practice. One major obstacle to transition seems to be a scarcity of experimental work that is sufficiently solid and well measured to justify the risks entailed in a transition to industrial practice. Past technology transfer failures are damaging to both the research and practitioner communities.
TL;DR: It is shown that a high testability is not an unconditionally desirable property for a program, and for programs complex enough that they are unlikely to be completely fault free, increasing testability may produce a program which will be less trustworthy, even after successful testing.
Abstract: Program "testability" is, informally, the probability that a program will fail under test if it contains at least one fault. When a dependability assessment has to be derived from the observation of a series of failure-free test executions (a common need for software subject to "ultra high reliability" requirements), measures of testability can, in theory, be used to draw inferences on program correctness. We rigorously investigate the concept of testability and its use in dependability assessment, criticizing, and improving on, previously published results. We give a general descriptive model of program execution and testing, on which the different measures of interest can be defined. We propose a more precise definition of program testability than that given by other authors, and discuss how to increase testing effectiveness without impairing program reliability in operation. We then study the mathematics of using testability to estimate, from test results, the probability of program correctness and the probability of failures. To derive the probability of program correctness, we use a Bayesian inference procedure and argue that this is more useful than deriving a classical "confidence level". We also show that a high testability is not an unconditionally desirable property for a program. In particular, for programs complex enough that they are unlikely to be completely fault free, increasing testability may produce a program which will be less trustworthy, even after successful testing.
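The Bayesian step described in the abstract above can be illustrated with a small sketch. It assumes a deliberately simplified model that is not taken from the paper: `prior_correct` is the prior probability that the program is fault free, and `testability` is the probability that the test campaign reveals a failure if the program is faulty. Both names, and the function itself, are hypothetical.

```python
def posterior_correct(prior_correct: float, testability: float) -> float:
    """Posterior probability of correctness after a failure-free test campaign.

    Simplified model (an assumption of this sketch, not the paper's model):
    a correct program always passes, and a faulty program passes with
    probability 1 - testability.
    """
    if not (0.0 <= prior_correct <= 1.0 and 0.0 <= testability <= 1.0):
        raise ValueError("probabilities must lie in [0, 1]")
    # Total probability of observing a failure-free campaign.
    p_pass = prior_correct + (1.0 - prior_correct) * (1.0 - testability)
    if p_pass == 0.0:
        raise ValueError("a failure-free campaign is impossible under these inputs")
    # Bayes' rule: P(correct | pass) = P(pass | correct) * P(correct) / P(pass).
    return prior_correct / p_pass
```

Under this model, a testability of 0 leaves the prior unchanged, while a testability of 1 makes a failure-free campaign conclusive. The sketch does not capture the paper's point about operational trustworthiness, which depends on more than this single posterior.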
TL;DR: The traditional software architecture for compilers is revised to provide these features without unnecessarily complicating the analyses themselves, and the user is allowed to selectively trade off time for precision and to customize the termination of these costly analyses in order to provide finer user control.
Abstract: Building efficient tools for understanding large software systems is difficult. Many existing program understanding tools build control flow and data flow representations of the program a priori, and therefore may require prohibitive space and time when analyzing large systems. Since much of these representations may be unused during an analysis, we construct representations on demand, not in advance. Furthermore, some representations, such as the abstract syntax tree, may be used infrequently during an analysis. We discard these representations and recompute them as needed, reducing the overall space required. Finally, we permit the user to selectively trade off time for precision and to customize the termination of these costly analyses in order to provide finer user control. We revised the traditional software architecture for compilers to provide these features without unnecessarily complicating the analyses themselves. These techniques have been successfully applied in the design of a program slicer for the Comprehensive Health Care System (CHCS), a million line hospital management system written in the MUMPS programming language.
TL;DR: This paper addresses the problem of slicing concurrent object-oriented programs, which had not previously been addressed in the literature, and proposes a new program dependence representation named the system dependence net (SDN), which extends previous program dependence representations to represent concurrent object-oriented programs.
Abstract: Program slicing has many applications such as program debugging, testing, maintenance, and complexity measurement. This paper concerns the problem of slicing concurrent object-oriented programs that has not been addressed in the literature until now. To solve this problem, we propose a new program dependence representation named the system dependence net (SDN), which extends previous program dependence representations to represent concurrent object-oriented programs. An SDN of a concurrent object-oriented program consists of a collection of procedure dependence nets each representing a main procedure, a free standing procedure, or a method in a class of the program, and some additional arcs to represent direct dependences between a call and the called procedure/method and transitive interprocedural data dependences. We construct the SDN to represent not only object-oriented features but also concurrency issues in a concurrent object-oriented program. Once a concurrent object-oriented program is represented by its SDN, the slices of the program can be computed based on the SDN as a simple vertex reachability problem in the net.
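The final sentence of the abstract above, slicing as vertex reachability, can be sketched on a plain dependence graph. This is a minimal illustration and not the SDN construction itself: the dictionary encoding, the vertex names, and the function `backward_slice` are all assumptions made for the example.

```python
from collections import deque

def backward_slice(dependences, criterion):
    """Return every vertex that the slicing criterion transitively depends on.

    `dependences` maps a vertex to the vertices it directly depends on
    (data or control dependence; this sketch does not distinguish them).
    """
    seen = {criterion}
    work = deque([criterion])
    while work:
        vertex = work.popleft()
        for source in dependences.get(vertex, ()):
            if source not in seen:
                seen.add(source)
                work.append(source)
    return seen

# Hypothetical dependence graph: s4 depends on s2 and s3, s3 on s1, s5 on s1.
deps = {"s4": ["s2", "s3"], "s3": ["s1"], "s2": [], "s5": ["s1"]}
slice_for_s4 = backward_slice(deps, "s4")
```

Here `slice_for_s4` contains `s1`, `s2`, `s3`, and `s4`, while `s5` is excluded because `s4` does not depend on it.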
TL;DR: The method statically estimates connection matrices which encode the connection relationships between all heap-directed pointers at each program point and can be used by parallelizing compilers to determine when two heap-allocated objects are guaranteed to be disjoint.
Abstract: This paper presents a practical heap analysis technique, connection analysis, that can be used to disambiguate heap accesses in C programs. The technique is designed for analyzing programs that allocate many disjoint objects in the heap such as dynamically-allocated arrays in scientific programs. The method statically estimates connection matrices which encode the connection relationships between all heap-directed pointers at each program point. The results of the analysis can be used by parallelizing compilers to determine when two heap-allocated objects are guaranteed to be disjoint, and thus can be used to improve array dependence and interference analysis. The method has been implemented as a context-sensitive interprocedural analysis in the McCAT optimizing/parallelizing C compiler. Experimental results are given to compare the accuracy of connection analysis versus a conservative estimate based on points-to analysis.
TL;DR: The use of an algebraic source code query technique that blends expressive power with query compactness is demonstrated and a case study where SCA expressions are used to query a program in terms of program organization, resource flow, control flow, metrics and syntactic structure is presented.
Abstract: Querying source code is an essential aspect of a variety of software engineering tasks such as program understanding, reverse engineering, program structure analysis and program flow analysis. In this paper, we present and demonstrate the use of an algebraic source code query technique that blends expressive power with query compactness. The query framework of Source Code Algebra (SCA) permits users to express complex source code queries and views as algebraic expressions. Queries are expressed on an extensible, object-oriented database that stores program source code. The SCA algebraic approach offers multiple benefits such as an applicative query language, high expressive power, seamless handling of structural and flow information, clean formalism and potential for query optimization. We present a case study where SCA expressions are used to query a program in terms of program organization, resource flow, control flow, metrics and syntactic structure. Our experience with an SCA-based prototype query processor indicates that an algebraic approach to source code queries combines the benefits of expressive power and compact query formulation.
TL;DR: The compositional reachability analysis method is extended to check safety properties of subsystems which may contain actions that are not globally observable, and is supported by augmenting finite-state machines with a special undefined state π.
Abstract: The software architecture of a distributed program can be represented by a hierarchical composition of subsystems, with interacting processes at the leaves of the hierarchy. Compositional reachability analysis has been proposed as a promising automated method to derive the overall behavior of a distributed program in stages, based on its architecture. The method is particularly suitable for the analysis of programs which are subject to evolutionary change. When a program evolves, only the behavior of those subsystems affected by the change need be re-evaluated. The method, however, has a limitation. The properties available for analysis are constrained by the set of actions that remain globally observable. The properties of subsystems may not be analyzed. We extend the method to check safety properties of subsystems which may contain actions that are not globally observable. These safety properties can still be checked in the framework of compositional reachability analysis. The extension is supported by augmenting finite-state machines with a special undefined state π. The state is used to capture possible violation of the safety properties specified by software developers. The concepts are illustrated using a gas station system as a case study.
TL;DR: This work presents new techniques for analyzing predicated code and shows how conventional data flow algorithms can be systematically upgraded to be predicate sensitive by incorporating information about predicates into compiler analysis.
Abstract: Predicated execution offers new approaches to exploiting instruction-level parallelism (ILP), but it also presents new challenges for compiler analysis and optimization. In predicated code, each operation is guarded by a boolean operand whose run-time value determines whether the operation is executed or nullified. While research has shown the utility of predication in enhancing ILP, there has been little discussion of the difficulties surrounding compiler support for predicated execution. Conventional program analysis tools (e.g. data flow analysis) assume that operations execute unconditionally within each basic block and thus make incorrect assumptions about the run-time behavior of predicated code. These tools can be modified to be correct without requiring predicate analysis, but this yields overly-conservative results in crucial areas such as scheduling and register allocation. To generate high-quality code for machines offering predicated execution, a compiler must incorporate information about relations between predicates into its analysis. We present new techniques for analyzing predicated code. Operations which compute predicates are analyzed to determine relations between predicate values. These relations are captured in a graph-based data structure, which supports efficient manipulation of boolean expressions representing facts about predicated code. This approach forms the basis for predicate-sensitive data flow analysis. Conventional data flow algorithms can be systematically upgraded to be predicate sensitive by incorporating information about predicates. Predicate-sensitive data flow analysis yields significantly more accurate results than conventional data flow analysis when applied to predicated code.
TL;DR: The paper describes how to generate and simplify path conditions based on program slices and shows that the technique can indeed increase slice precision and reveal manipulations of the so-called calibration path.
Abstract: We show how to combine program slicing and constraint solving in order to obtain better slice accuracy. The method is used in a program analysis tool for the validation of computer-controlled measurement systems. It will be used by the Physikalisch-Technische Bundesanstalt for verification of legally required calibration standards. The paper describes how to generate and simplify path conditions based on program slices. An example shows that the technique can indeed increase slice precision and reveal manipulations of the so-called calibration path.
TL;DR: This paper presents a specification method for program analysis and transformation that is implemented prototypically in the optimizer generator OPTIMIX, and uses a simple variant of graph rewrite systems (edge addition rewrite systems).
Abstract: Implementing program optimizers is a task which swallows an enormous amount of man-power. To reduce development time a simple and practical specification method is highly desirable. Such a method should comprise both program analysis and transformation. However, although several frameworks for program analysis exist, none of them can be used for analysis and transformation uniformly. This paper presents such a method. For program analysis we use a simple variant of graph rewrite systems (edge addition rewrite systems). For program transformation we apply more complex graph rewrite systems. Our specification method has been implemented prototypically in the optimizer generator OPTIMIX. OPTIMIX works with arbitrary intermediate languages and generates real-life program analyses and transformations. We demonstrate this by several examples and measurements.
TL;DR: CIAO is an advanced programming environment supporting logic and constraint programming, with a simple concurrent kernel on top of which declarative and non-declarative extensions are added via libraries.
Abstract: CIAO is an advanced programming environment supporting Logic and Constraint programming. It offers a simple concurrent kernel on top of which declarative and non-declarative extensions are added via libraries. Libraries are available for supporting the ISO-Prolog standard, several constraint domains, functional and higher order programming, concurrent and distributed programming, internet programming, and others. The source language allows declaring properties of predicates via assertions, including types and modes. Such properties are checked at compile-time or at run-time. The compiler and system architecture are designed to natively support modular global analysis, with the two objectives of proving properties in assertions and performing program optimizations, including transparently exploiting parallelism in programs. The purpose of this paper is to report on recent progress made in the context of the CIAO system, with special emphasis on the capabilities of the compiler, the techniques used for supporting such capabilities, and the results in the areas of program analysis and transformation already obtained with the system.
TL;DR: This work argues that implementing an analysis should require writing only the code to generate the constraints, and that a well-engineered library can take care of constraint representation, resolution, and transformation; toward this goal, it develops a scalable, expressive framework for solving a class of set constraints.
Abstract: Constraint-based program analyses are appealing because elaborate analyses can be described with a concise and simple set of constraint generation rules. Constraint resolution algorithms have been developed for many kinds of constraints, conceptually allowing an implementation of a constraint-based program analysis to reuse large pieces of existing code. In practice, however, new analyses often involve re-implementing new, complex constraint solving frameworks, tuned for the particular analysis in question. This approach wastes development time and interferes with the desire to experiment quickly with a number of different analyses. We believe that implementing an analysis should require writing only the code to generate the constraints, and that a well-engineered library can take care of constraint representation, resolution, and transformation. Writing such a library capable of handling small programs is not too difficult, but scaling to large programs is hard. Toward this goal, we are developing a scalable, expressive framework for solving a class of set constraints. Scalability is achieved through four techniques: polymorphism, simplification, separation, and sparse representation of constraints.
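To make the division of labor in the abstract above concrete, the sketch below shows a toy resolver for the simplest kind of set constraint: an analysis would only generate the constraint lists, and the resolver propagates values. It is not the framework the paper describes and uses none of its four scalability techniques; the function `solve` and its input encoding are assumptions of this example.

```python
from collections import defaultdict, deque

def solve(constants, inclusions):
    """Least solution of simple inclusion constraints.

    constants:  iterable of (value, var) pairs, meaning {value} ⊆ var
    inclusions: iterable of (x, y) pairs, meaning x ⊆ y
    """
    solution = defaultdict(set)
    successors = defaultdict(set)
    for x, y in inclusions:
        successors[x].add(y)
    work = deque()
    for value, var in constants:
        if value not in solution[var]:
            solution[var].add(value)
            work.append((value, var))
    # Worklist propagation: push each new value along every inclusion edge.
    while work:
        value, var = work.popleft()
        for target in successors[var]:
            if value not in solution[target]:
                solution[target].add(value)
                work.append((value, target))
    return {var: set(values) for var, values in solution.items()}

# Hypothetical constraints: {a} ⊆ X, {b} ⊆ Y, X ⊆ Y, Y ⊆ Z.
result = solve([("a", "X"), ("b", "Y")], [("X", "Y"), ("Y", "Z")])
```

Each value is enqueued at most once per variable, so propagation terminates even when the inclusion edges form cycles.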
TL;DR: This work studies how experts perform this design recovery activity by analyzing the cognitive processes of six experienced system developers engaged in the program comprehension phase of software reengineering.
TL;DR: This paper presents an accurate and fast multi-level binding-time analysis for higher-order functional languages that is based on constraint systems and runs in almost-linear time in the size of the analyzed programs.
Abstract: Program specialization can divide a computation into several computation stages. We present the key ingredient of our approach to multi-level specialization: an accurate and fast multi-level binding-time analysis. Three efficient program analyses for higher-order, functional languages are presented which are based on constraint systems and run in almost-linear time in the size of the analyzed programs. The three constraint normalizations have been proven correct (soundness, completeness, termination, existence of best solution). The analyses have all been implemented for a substantial, higher-order subset of Scheme. Experiments with widely-available example programs confirm the excellent run-time behavior of the normalization algorithms.
TL;DR: A knowledge-based analysis approach that generates first order predicate logic annotations of loops is presented and a consistent and rigorous functional abstraction of the whole loop is synthesized from the specifications of its individual events.
Abstract: The paper presents a knowledge-based analysis approach that generates first order predicate logic annotations of loops. A classification of loops according to their complexity levels is presented. Based on this taxonomy, variations on the basic analysis approach that best fit each of the different classes are described. In general, mechanical annotation of loops is performed by first decomposing them using data flow analysis. This decomposition encapsulates closely related statements in events, that can be analyzed individually. Specifications of the resulting loop events are then obtained by utilizing patterns, called plans, stored in a knowledge base. Finally, a consistent and rigorous functional abstraction of the whole loop is synthesized from the specifications of its individual events. To test the analysis techniques and to assess their effectiveness, a case study was performed on an existing program of reasonable size. Results concerning the analyzed loops and the plans designed for them are given.
TL;DR: This paper presents an algorithm for performing alias analysis on incomplete programs that lets individual software components such as library routines, subroutines, or subsystems be analyzed independently, so that the results can be reused when analyzing calling programs without incurring the expense of completely reanalyzing each calling program.
Abstract: Interprocedural dataflow information is useful for many software testing and analysis techniques, including dataflow testing, regression testing, program slicing and impact analysis. For programs with aliases, these testing and analysis techniques can yield invalid results, unless the dataflow information accounts for aliasing effects. Recent research provides algorithms for performing interprocedural dataflow analysis in the presence of aliases; however, these algorithms are expensive, and achieve precise results only on complete programs. This paper presents an algorithm for performing alias analysis on incomplete programs, that lets individual software components such as library routines, subroutines, or subsystems be independently analyzed. The paper also presents an algorithm for reusing the results of this separate analysis when linking the individual software components with calling modules. The primary advantage of our algorithms is that they let us analyze frequently used software components, such as library routines or classes, independently, and reuse the results of that analysis when analyzing calling programs, without incurring the expense of completely reanalyzing each calling program.
TL;DR: This dissertation presents a new framework for efficient program analysis using a new program representation called the DJ Graph, and presents several new algorithms for solving problems encountered in program analysis.
Abstract: Program analysis is a process of estimating properties of a program statically. Program analyses have many applications, including compiler optimizations, software maintenance and testing, and program verification. In this dissertation we present a new framework for efficient program analysis. At the heart of our approach is a new program representation called the DJ Graph. Using DJ graphs we present several new algorithms for solving problems encountered in program analysis. The problems that we have solved range from a simple loop identification problem to sophisticated exhaustive and incremental data flow analysis, including the construction of Sparse Evaluation Graphs. The algorithms presented here are simple, more general, and/or more efficient than existing methods for solving similar problems. To study the effectiveness of our algorithms on real programs we implemented many of them, and experimented on a number of FORTRAN procedures taken from standard benchmark suites. Our results indicate that the algorithms presented here perform well in practice.
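As a rough sketch of the representation named in the abstract above, the code below builds a DJ graph from a control flow graph, assuming the usual definition: D-edges are the edges of the dominator tree, and J-edges are the flow graph edges whose source is not the immediate dominator of their target. The simple iterative dominator computation, the dictionary encoding of the flow graph, and all function names are assumptions of this example and are not the dissertation's algorithms; every node is assumed reachable from the entry.

```python
def dominators(cfg, entry):
    """Iterative dominator sets; cfg maps a node to its successor list."""
    nodes = set(cfg) | {s for succs in cfg.values() for s in succs}
    preds = {n: set() for n in nodes}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].add(n)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            if not preds[n]:
                continue  # unreachable node; outside this sketch's assumptions
            new = {n} | set.intersection(*(dom[p] for p in preds[n]))
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom

def dj_graph(cfg, entry):
    """Return (d_edges, j_edges) for the flow graph."""
    dom = dominators(cfg, entry)
    idom = {}
    for n, ds in dom.items():
        strict = ds - {n}
        if strict:
            # Strict dominators form a chain; the closest one to n has the
            # largest dominator set and is the immediate dominator.
            idom[n] = max(strict, key=lambda d: len(dom[d]))
    d_edges = {(parent, child) for child, parent in idom.items()}
    j_edges = {(x, y) for x, succs in cfg.items() for y in succs
               if idom.get(y) != x}
    return d_edges, j_edges

# Hypothetical diamond-shaped flow graph.
diamond = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
d_edges, j_edges = dj_graph(diamond, "A")
```

For the diamond, `A` immediately dominates `B`, `C`, and `D`, so the two edges into the join node `D` become J-edges.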
TL;DR: Abstract interpretation, understood as a theory of semantic approximation, is a basis for a methodology that relies on the idea that the specification of an analyzer is an approximation of a semantics, where concrete or exact properties are replaced
Abstract: Abstract interpretation understood as a theory of semantic approximation is a basis for such a methodology. It relies on the idea that the specification of an analyzer is an approximation of a semantics, where concrete or exact properties are replaced
TL;DR: This work presents trace-based program analysis, a semantics-based framework for statically analyzing and transforming programs with loops, assignments, and nested record structures, based on transfer transition systems, which define the small-step operational semantics of programming languages.
Abstract: We present trace-based program analysis, a semantics-based framework for statically analyzing and transforming programs with loops, assignments, and nested record structures. Trace-based analyses are based on transfer transition systems, which define the small-step operational semantics of programming languages. Intuitively, transfer transition systems provide direct support for reasoning about the possible execution traces of a program, instead of just individual program states. The traces in a transfer transition system have many uses, including the finite representation of all possible terminating executions of a loop. Also, traces may be systematically "pieced together," thus allowing the composition of separately analyzed program fragments. The utility of the approach is demonstrated by showing three applications: software pipelining, loop-invariant removal, and data alias detection.
TL;DR: This paper develops a more precise slicing concept, called p-slices, defined using Dijkstra's weakest precondition (wp), to determine which statements will affect a specified predicate.
Abstract: Program slices have long been used as an aid to program understanding, especially in maintenance activities. Most slicing methods involve data and control flow analysis to determine what statements might affect a set of variables. Here, we develop a more precise slicing concept, called p-slices, defined using Dijkstra's weakest precondition (wp), to determine which statements will affect a specified predicate. Weakest preconditions are already known to be an effective technique for program understanding and analysis, and this paper unifies wp analysis and slicing and simplifies existing slicing algorithms. Slicing rules for assignment, conditional, and repetition statements are developed. The authors are currently using these techniques in their work with software maintenance teams and are incorporating p-slice computation into a program analysis tool.
TL;DR: An extensive study involving three test groups over a period of three different years was performed to determine differences between comprehension of recursive and iterative code constructs, suggesting a tendency toward an interaction effect between task and construct in terms of comprehension time.
TL;DR: Constraint-based program analyses are appealing because elaborate analyses can be described with a concise and simple set of constraint generation rules, and because existing constraint resolution algorithms conceptually allow an implementation to reuse large pieces of existing code.
Abstract: Constraint-based program analyses are appealing because elaborate analyses can be described with a concise and simple set of constraint generation rules. Constraint resolution algorithms have been developed for many kinds of constraints, conceptually allowing an implementation of a constraint-based program analysis to reuse large pieces of existing code. In practice, however, new analyses often involve re-implementing new, complex constraint solving frameworks, tuned for the particular analysis in question. This approach wastes development time and interferes with the desire to experiment quickly with a number of different analyses.
TL;DR: A program analysis system and method are presented that analyze the source code of a program through static analysis and, based on the analysis result, make it easy for the user to understand the process contents and source code containing input/output specifications.
Abstract: It is an object of this invention to provide a program analysis system and a program analysis method which efficiently analyze the source code of a program through static analysis and, based on the analysis result, make it easy for the user to understand the process contents and source code containing input/output specifications. To achieve this object, the data information extracting means (1) extracts from the source code the data information representing the structure of data items contained in the source code. The relation information extracting means (5) extracts relation information representing the relation among data items for each position in the source code, based on the source code and the data information. The process information extracting means (6) extracts various types of relation information on each process in the source code as the process information representing the process, based on the source code, data information, and relation information. The specifying module (7) enables the user to specify an output range and form. The outputting means (2) outputs extracted information in the specified form.
TL;DR: When a first application program (33) raises an error condition, a context switch (33,34,35), in one embodiment, transfers control to one of several Help programs (36), as selected automatically without user or system operator intervention.
Abstract: When a first application program (33) raises an error condition, a context switch (33,34,35), in one embodiment, transfers control to one of several Help programs (36), as selected automatically without user or system operator intervention. Such a first application program (33) operates on a computer system (10) that includes a file system (50), a data structure (35) in memory (30), and a processor (20) that executes in sequence an operating system (32), the first application program (33), a constructor (34), and a second application program (36), for example, a Help program. For each candidate Help program (36,56), the constructor (34) looks for prerequisite files (52,54,58) in the file system (50). If a candidate Help program's prerequisites are met, the constructor (34) sets a link value in the data structure (35) that directs a subsequent call from the first application program (33) to the Help program (36) selected by the constructor (34). In a second embodiment, a general method of developing the first application program (33) for context switching incorporates the step of including (210), in the first application program (33), a transfer of control (220) to a destination program (36) identified by a constructor (34), wherein the constructor (34), precluding manual direction by an operator, identifies the destination program (36) from several candidates (36,56) by testing an operational prerequisite (58).
TL;DR: The main challenges in performance analysis of embedded systems are in the areas of software performance analysis and analysis of concurrent components, and research here can be broadly classified into two areas: program analysis methods and system analysis methods.
Abstract: Embedded computer systems are characterized by the presence of one or more processors running application specific software. A large number of these systems must satisfy performance constraints in addition to cost constraints. Because embedded systems are constructed from large, complex components (CPUs and ASICs), we need new techniques to analyze the performance of these components as well as their compositions. Since the performance analysis of ASICs is considered to be a well studied problem, the main challenges in performance analysis of embedded systems are in the area of software performance analysis and analysis of concurrent components. Here the research can be broadly classified into two areas. Program analysis methods consider the behavior of a single process on a given processor and attempt to determine bounds on the execution time of the process. This task is made complicated for modern CPUs by the presence of relatively non-deterministic features such as caches and instruction pipelines which make the execution time of one instruction depend on both the recent and distant history of the executed instruction trace. System analysis methods study the performance of a network of machines (either CPUs or ASICs) which execute a set of processes. Even if we know the execution time of each process in isolation, we must consider conflicts in allocation of processes to processing elements and the scheduling of events on the processing elements and communication channels.
TL;DR: Some preliminary empirical scalability results suggest that, at least under certain conditions, constraint-based program understanding is close to being applicable to real-world programs.
Abstract: Over the past decade, researchers in program understanding have formulated many program understanding algorithms but have published few studies of their relative scalability. Consequently, it is difficult to understand the relative limitations of these algorithms and to determine whether the field of program understanding is making progress. The paper attempts to address this deficiency by formalizing the search strategies of several different program understanding algorithms as constraint satisfaction problems, and by presenting some preliminary empirical scalability results for these constraint-based implementations. These initial results suggest that, at least under certain conditions, constraint-based program understanding is close to being applicable to real-world programs.
TL;DR: This paper extends, by allowing the use of rank 2 intersection, the non-standard type assignment system for the detection and elimination of dead-code in typed functional programs presented by Coppo et al. at the Static Analysis Symposium '96.
Abstract: In this paper we extend, by allowing the use of rank 2 intersection, the non-standard type assignment system for the detection and elimination of dead-code in typed functional programs presented by Coppo et al. at the Static Analysis Symposium '96. The main application of this method is the optimization of programs extracted from proofs in logical frameworks, but it could also be used to eliminate dead-code determined by program specialization. The use of non-standard types (also called annotated types) makes it possible to exploit the type structure of the language for investigating program properties. Dead-code is detected via annotated type inference, which can be performed in a complete way by reducing it to the solution of a system of inequalities between annotation variables. Even though the language considered in the paper is the simply typed λ-calculus with cartesian product, if-then-else, fixpoint, and arithmetic constants, we can generalize our approach to polymorphic languages like Miranda, Haskell, and CAML.
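The idea of reducing dead-code detection to a system of inequalities between annotation variables can be sketched in a much simplified form. The following is an assumption-laden toy, not the type system of the paper: annotations range over a two-point order `DEAD < LIVE`, each inequality `(lhs, rhs)` requires `value[lhs] >= value[rhs]`, and the least solution is computed by iteration. All names are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

DEAD, LIVE = 0, 1


def least_solution(variables: Iterable[str],
                   inequalities: List[Tuple[str, str]],
                   live_seeds: Set[str]) -> Dict[str, int]:
    """Least assignment with seeds LIVE and value[lhs] >= value[rhs] for each pair."""
    value = {v: DEAD for v in variables}
    for seed in live_seeds:
        value[seed] = LIVE
    changed = True
    while changed:
        changed = False
        for lhs, rhs in inequalities:
            if value[lhs] < value[rhs]:
                # Raise lhs just enough to satisfy the inequality.
                value[lhs] = value[rhs]
                changed = True
    return value


# Toy program: let x = e1 in let y = e2 in fst (x, y)
VARIABLES = ["result", "pair_fst", "pair_snd", "x", "y"]
INEQUALITIES = [
    ("pair_fst", "result"),  # fst uses the first component if the result is used
    ("x", "pair_fst"),       # x flows into the first component
    ("y", "pair_snd"),       # y flows into the second component
]
SOLUTION = least_solution(VARIABLES, INEQUALITIES, {"result"})
DEAD_CODE = sorted(v for v, status in SOLUTION.items() if status == DEAD)
```

Variables left at `DEAD` in the least solution (here the second pair component and `y`) are candidates for removal, since nothing the result depends on forces them to be live.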
TL;DR: The model based approach includes the parameterised denotational semantics techniques developed for functional and imperative languages, but also more generally the use of mathematical modelling in the abstract interpretation of imperative, functional, concurrent and logic languages.
Abstract: Program analysis offers static techniques for predicting safe and computable approximations to the set of values or behaviours arising dynamically during computation; this may be used to validate program transformations or to generate more efficient code. The flow based approach includes the traditional data flow analysis techniques for mainly imperative languages, but also the control flow analysis techniques developed for functional and object oriented languages. The model based approach includes the parameterised denotational semantics techniques developed for functional and imperative languages, but also more generally the use of mathematical modelling in the abstract interpretation of imperative, functional, concurrent and logic languages. The inference based approach includes general logical techniques touching upon program verification techniques, but also annotated type and effect systems developed for functional, imperative and concurrent languages; it is this latter and rather recent approach that we now consider.