TL;DR: The tool supports almost all ANSI-C language features, including pointer constructs, dynamic memory allocation, recursion, and the float and double data types, and is integrated into a graphical user interface.
Abstract: We present a tool for the formal verification of ANSI-C programs using Bounded Model Checking (BMC). The emphasis is on usability: the tool supports almost all ANSI-C language features, including pointer constructs, dynamic memory allocation, recursion, and the float and double data types. From the perspective of the user, the verification is highly automated: the only input required is the BMC bound. The tool is integrated into a graphical user interface. This is essential for presenting long counterexample traces: the tool allows stepping through the trace in the same way a debugger allows stepping through a program.
TL;DR: In this paper, a finite-state abstraction of a sequential program with potentially recursive procedures and input from the environment is checked statically whether there are input sequences that can drive the system into "bad/good" executions.
Abstract: Given a finite-state abstraction of a sequential program with potentially recursive procedures and input from the environment, we wish to check statically whether there are input sequences that can drive the system into “bad/good” executions. Pushdown games have been used in recent years for such analyses and there is by now a very rich literature on the subject. (See, e.g., [BS92,Tho95,Wal96,BEM97,Cac02a,CDT02].)
TL;DR: This paper presents the first scalable context-sensitive, inclusion-based pointer alias analysis for Java programs, and develops a system called bddbddb that automatically translates Datalog programs into highly efficient BDD implementations.
Abstract: This paper presents the first scalable context-sensitive, inclusion-based pointer alias analysis for Java programs. Our approach to context sensitivity is to create a clone of a method for every context of interest, and run a context-insensitive algorithm over the expanded call graph to get context-sensitive results. For precision, we generate a clone for every acyclic path through a program's call graph, treating methods in a strongly connected component as a single node. Normally, this formulation is hopelessly intractable as a call graph often has 10 14 acyclic paths or more. We show that these exponential relations can be computed efficiently using binary decision diagrams (BDDs). Key to the scalability of the technique is a context numbering scheme that exposes the commonalities across contexts. We applied our algorithm to the most popular applications available on Sourceforge, and found that the largest programs, with hundreds of thousands of Java bytecodes, can be analyzed in under 20 minutes.This paper shows that pointer analysis, and many other queries and algorithms, can be described succinctly and declaratively using Datalog, a logic programming language. We have developed a system called bddbddb that automatically translates Datalog programs into highly efficient BDD implementations. We used this approach to develop a variety of context-sensitive algorithms including side effect analysis, type analysis, and escape analysis.
TL;DR: D DynamoRIO is presented, a fully-implemented runtime code manipulation system that supports code transformations on any part of a program, while it executes, with zero to thirty percent time and memory overhead on both Windows and Linux.
Abstract: This thesis addresses the challenges of building a software system for general-purpose runtime code manipulation. Modern applications, with dynamically-loaded modules and dynamically-generated code, are assembled at runtime. While it was once feasible at compile time to observe and manipulate every instruction—which is critical for program analysis, instrumentation, trace gathering, optimization, and similar tools—it can now only be done at runtime. Existing runtime tools are successful at inserting instrumentation calls, but no general framework has been developed for fine-grained and comprehensive code observation and modification without high overheads.
This thesis demonstrates the feasibility of building such a system in software. We present DynamoRIO, a fully-implemented runtime code manipulation system that supports code transformations on any part of a program, while it executes. DynamoRIO uses code caching technology to provide efficient, transparent, and comprehensive manipulation of an unmodified application running on a stock operating system and commodity hardware. DynamoRIO executes large, complex, modern applications with dynamically-loaded, generated, or even modified code. Despite the formidable obstacles inherent in the IA-32 architecture, DynamoRIO provides these capabilities efficiently, with zero to thirty percent time and memory overhead on both Windows and Linux.
DynamoRIO exports an interface for building custom runtime code manipulation tools of all types. It has been used by many researchers, with several hundred downloads of our public release, and is being commercialized in a product for protection against remote security exploits, one of numerous applications of runtime code manipulation. (Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)
TL;DR: Experimental results show that the power of the model checking engine can be used to provide assistance in understanding errors and to isolate faulty portions of the source code.
Abstract: In the event that a system does not satisfy a specification, a model checker will typically automatically produce a counterexample trace that shows a particular instance of the undesirable behavior. Unfortunately, the important steps that follow the discovery of a counterexample are generally not automated. The user must first decide if the counterexample shows genuinely erroneous behavior or is an artifact of improper specification or abstraction. In the event that the error is real, there remains the difficult task of understanding the error well enough to isolate and modify the faulty aspects of the system. This paper describes an automated approach for assisting users in understanding and isolating errors in ANSI C programs. The approach is based on distance metrics for program executions. Experimental results show that the power of the model checking engine can be used to provide assistance in understanding errors and to isolate faulty portions of the source code.
TL;DR: An algorithm is presented which can be used to safely approximate Interprocedural May Alias in the presence of pointers and has been implemented in a prototype analysis tool for C programs.
Abstract: During execution, when two or more names exist for the same location at some program point, we call them aliases. In a language which allows arbitrary pointers, the problem of determining aliases at a program point is P-space-hard [Lan92]. We present an algorithm for the Conditional May Alias problem, which can be used to safely approximate Interprocedural May Alias in the presence of pointers. This algorithm is as precise as possible in the worst case and has been implemented in a prototype analysis tool for C programs. Preliminary speed and precision results are presented.
TL;DR: A technique for identifying program properties that indicate errors, which generates machine learning models of program properties known to result from errors, and applies these models to program properties of user-written code to classify and rank properties that may lead the user to errors.
Abstract: This paper proposes a technique for identifying program properties that indicate errors. The technique generates machine learning models of program properties known to result from errors, and applies these models to program properties of user-written code to classify and rank properties that may lead the user to errors. Given a set of properties produced by the program analysis, the technique selects a subset of properties that are most likely to reveal an error. An implementation, the fault invariant classifier, demonstrates the efficacy of the technique. The implementation uses dynamic invariant detection to generate program properties. It uses support vector machine and decision tree learning tools to classify those properties. In our experimental evaluation, the technique increases the relevance (the concentration of fault-revealing properties) by a factor of 50 on average for the C programs, and 4.8 for the Java programs. Preliminary experience suggests that most of the fault-revealing properties do lead a programmer to an error.
TL;DR: This book details program analyses and transformations that extract the flow of data in computer memory systems and shows that correctness of program transformations is guaranteed by the conservation of data flow.
Abstract: From the Publisher:
This book details program analyses and transformations that extract the flow of data in computer memory systems. It emphasizes a framework for the optimization of code for imperative programs and greater computer systems efficiency. In addition, it shows that correctness of program transformations is guaranteed by the conservation of data flow.
Professionals and researchers in software engineering, computer engineering, program design analysis, and compiler design will benefit from its presentation of data-flow methods and memory optimization of compilers.
TL;DR: DMS is described, a practical, commercial program analysis and transformation system, and sketches a variety of tasks to which it has been applied, from redocumenting to large-scale system migration.
Abstract: While a number of research systems have demonstrated the potential value of program transformations, very few of these systems have made it into practice The core technology for such systems is well understood; what remains is integration and more importantly, the problem of handling the scale of the applications to be processed This paper describes DMS, a practical, commercial program analysis and transformation system, and sketches a variety of tasks to which it has been applied, from redocumenting to large-scale system migration Its success derives partly from a vision of design maintenance and the construction of infrastructure that appears necessary to support that vision DMS handles program scale by careful space management, computational scale via parallelism and knowledge acquisition scale via domains
TL;DR: The method for procedure summarization is based on the insight that in well-synchronized programs, any computation of a thread can be viewed as a sequence of transactions, each of which appears to execute atomically to other threads.
Abstract: The ability to summarize procedures is fundamental to building scalable interprocedural analyses. For sequential programs, procedure summarization is well-understood and used routinely in a variety of compiler optimizations and software defect-detection tools. However, the benefit of summarization is not available to multithreaded programs, for which a clear notion of summaries has so far remained unarticulated in the research literature.In this paper, we present an intuitive and novel notion of procedure summaries for multithreaded programs. We also present a model checking algorithm for these programs that uses procedure summarization as an essential component. Our algorithm can also be viewed as a precise interprocedural dataflow analysis for multithreaded programs. Our method for procedure summarization is based on the insight that in well-synchronized programs, any computation of a thread can be viewed as a sequence of transactions, each of which appears to execute atomically to other threads. We summarize within each transaction; the summary of a procedure comprises the summaries of all transactions within the procedure. We leverage the theory of reduction [17] to infer boundaries of these transactions.The procedure summaries computed by our algorithm allow reuse of analysis results across different call sites in a multithreaded program, a benefit that has hitherto been available only to sequential programs. Although our algorithm is not guaranteed to terminate on multithreaded programs that use recursion (reachability analysis for multithreaded programs with recursive procedures is undecidable [18]), there is a large class of programs for which our algorithm does terminate. We give a formal characterization of this class, which includes programs that use shared variables, synchronization, and recursion.
TL;DR: This work enables the use of whole-program class analyses for testing of polymorphism in partial programs, and identifies analyses that potentially are good candidates for use in coverage tools.
Abstract: Testing of polymorphism in object-oriented software may require coverage of all possible bindings of receiver classes and target methods at call sites. Tools that measure this coverage need to use class analysis to compute the coverage requirements. However, traditional whole-program class analysis cannot be used when testing incomplete programs. To solve this problem, we present a general approach for adapting whole-program class analyses to operate on program fragments. Furthermore, since analysis precision is critical for coverage tools, we provide precision measurements for several analyses by determining which of the computed coverage requirements are actually feasible for a set of subject components. Our work enables the use of whole-program class analyses for testing of polymorphism in partial programs, and identifies analyses that potentially are good candidates for use in coverage tools.
TL;DR: An expressive languagebased mechanism for securely manipulating dynamic security labels is presented both in the context of a Java-like programming language and in a core language based on the typed lambda calculus.
Abstract: This paper explores information flow control in systems in which the security classes of data can vary dynamically. Information flow policies provide the means to express strong security requirements for data confidentiality and integrity. Recent work on security-typed programming languages has shown that information flow can be analyzed statically, ensuring that programs will respect the restrictions placed on data. However, real computing systems have security policies that vary dynamically and that cannot be determined at the time of program analysis. For example, a file has associated access permissions that cannot be known with certainty until it is opened. Although one security-typed programming language has included support for dynamic security labels, there has been no examination of whether such a mechanism can securely control information flow. In this paper, we present an expressive languagebased mechanism for securely manipulating dynamic security labels. The mechanism is presented both in the context of a Java-like programming language and, more formally, in a core language based on the typed lambda calculus. This core language is expressive enough to encode previous dynamic label mechanisms; as importantly, any well-typed program is provably secure because it satisfies noninterference.
TL;DR: It is proved that the parametricity theorem for F implies the noninterference theorem for DCC, and the translation provides insights into DCC's type system and suggests implementation strategies of dependency calculi in polymorphic languages.
Abstract: Abadi et al. introduced the dependency core calculus (DCC) as a unifying framework to study many important program analyses such as binding time, information flow, slicing, and function call tracking. DCC uses a lattice of monads and a nonstandard typing rule for their associated bind operations to describe the dependency of computations in a program. Abadi et al. proved a noninterference theorem that establishes the correctness of DCC's type system and thus the correctness of the type systems for the analyses above.In this paper, we study the relationship between DCC and the Girard-Reynolds polymorphic lambda calculus (System F). We encode the recursion-free fragment of DCC into F via a type-directed translation. Our main theoretical result is that, following from the correctness of the translation, the parametricity theorem for F implies the noninterference theorem for DCC. In addition, the translation provides insights into DCC's type system and suggests implementation strategies of dependency calculi in polymorphic languages.
TL;DR: In this paper, the authors show how to use tabled logic programming to evaluate queries to abductive frameworks with integrity constraints when these frameworks contain both default negation and explicit negation.
Abstract: Abductive logic programming offers a formalism to declaratively express and solve problems in areas such as diagnosis, planning, belief revision and hypothetical reasoning. Tabled logic programming offers a computational mechanism that provides a level of declarativity superior to that of Prolog, and which has supported successful applications in fields such as parsing, program analysis, and model checking. In this paper we show how to use tabled logic programming to evaluate queries to abductive frameworks with integrity constraints when these frameworks contain both default and explicit negation. The result is the ability to compute abduction over well-founded semantics with explicit negation and answer sets. Our approach consists of a transformation and an evaluation method. The transformation adjoins to each objective literal $O$ in a program, an objective literal $\hbox{\it not}(O)$ along with rules that ensure that $\hbox{\it not}(O)$ will be true if and only if $O$ is false. We call the resulting program a dual program. The evaluation method, ABDUAL, then operates on the dual program. ABDUAL is sound and complete for evaluating queries to abductive frameworks whose entailment method is based on either the well-founded semantics with explicit negation, or on answer sets. Further, ABDUAL is asymptotically as efficient as any known method for either class of problems. In addition, when abduction is not desired, ABDUAL operating on a dual program provides a novel tabling method for evaluating queries to ground extended programs whose complexity and termination properties are similar to those of the best tabling methods for the well-founded semantics. A publicly available meta-interpreter has been developed for ABDUAL using the XSB system.
TL;DR: This work presents a mixed solution which builds an observation graph represented in a non symbolic way but where the nodes are essentially symbolic set of states and outperforms the pure symbolic methods which makes it attractive.
Abstract: Symbolic model-checking usually includes two steps: the building of a compact representation of a state graph and the evaluation of the properties of the system upon this data structure. In case of properties expressed with a linear time logic, it appears that the second step is often more time consuming than the first one. In this work, we present a mixed solution which builds an observation graph represented in a non symbolic way but where the nodes are essentially symbolic set of states. Due to the small number of events to be observed in a typical formula, this graph has a very moderate size and thus the complexity time of verification is neglectible w.r.t. the time to build the observation graph. Thus we propose different symbolic implementations for the construction of the nodes of this graph. The evaluations we have done on standard examples show that our method outperforms the pure symbolic methods which makes it attractive.
TL;DR: A survey of Tempo, a specializer for the C language, is presented and an overview of the design and use of its use in specializing realistic applications is given.
TL;DR: This paper gives a polynomial time algorithm to find the optimal program partition for given program input data and uses an option-clustering approach to handle different program partitions for different program execution options.
TL;DR: This work presents a technique to determine sound and precise JSR-14 types at each use of a class for which a generic type specification is available, and uses a precise and context-sensitive pointer analysis to determine possible types at allocation sites.
Abstract: Java 1.5 will include a type system (called JSR-14) that supports parametric polymorphism, or generic classes. This will bring many benefits to Java programmers, not least because current Java practice makes heavy use of logically-generic classes, including container classes. Translation of Java source code into semantically equivalent JSR-14 source code requires two steps: parameterization (adding type parameters to class definitions) and instantiation (adding the type arguments at each use of a parameterized class). Parameterization need be done only once for a class, whereas instantiation must be performed for each client, of which there are potentially many more. Therefore, this work focuses on the instantiation problem. We present a technique to determine sound and precise JSR-14 types at each use of a class for which a generic type specification is available. Our approach uses a precise and context-sensitive pointer analysis to determine possible types at allocation sites, and a set-constraint-based analysis (that incorporates guarded, or conditional, constraints) to choose consistent types for both allocation and declaration sites. The technique handles all features of the JSR-14 type system, notably the raw types that provide backward compatibility. We have implemented our analysis in a tool that automatically inserts type parameters into Java code, and we report its performance when applied to a number of real-world Java programs.
TL;DR: This paper shows how the simple structure of the linear programs encountered during symbolic minimum-cost reachability analysis of priced timed automata can be exploited in order to substantially improve the performance of the current algorithm.
Abstract: In this paper, we show how the simple structure of the linear programs encountered during symbolic minimum-cost reachability analysis of priced timed automata can be exploited in order to substantially improve the performance of the current algorithm. The idea is rooted in duality of linear programs and we show that each encountered linear program can be reduced to the dual problem of an instance of the min-cost flow problem. Thus, we only need to solve instances of the much simpler min-cost flow problem during minimum-cost reachability analysis. Experimental results using Uppaal show a 70-80 percent performance gain. As a main application area, we show how to solve energy-optimal task graph scheduling problems using the framework of priced timed automata.
TL;DR: This paper presented at the 2005 Mid-Atlantic Student Workshop on Programming Languages and Systems (MASPLAS) in April 2005 and was described as a "road map for the future development of Python".
Abstract: Also presented at the 2005 Mid-Atlantic Student Workshop on Programming Languages and Systems (MASPLAS) in April 2005.
TL;DR: A new algorithm based on Tarjan’s algorithm for detecting strongly connected components is presented, showing its correctness, how it can be efficiently implemented, and its interaction with other model checking techniques, such as bitstate hashing.
Abstract: State-of-the-art algorithms for on-the-fly automata-theoretic LTL model checking make use of nested depth-first search to look for accepting cycles in the product of the system and the Buchi automaton. Here we present a new algorithm based on Tarjan’s algorithm for detecting strongly connected components. We show its correctness, describe how it can be efficiently implemented, and discuss its interaction with other model checking techniques, such as bitstate hashing. The algorithm is compared to the old algorithms through experiments on both random and actual state spaces, using random and real formulas. Our measurements indicate that our algorithm investigates at most as many states as the old ones. In the case of a violation of the correctness property, the algorithm often explores significantly fewer states.
TL;DR: The paper describes the specification, design, analysis, and implementation of algorithms and data structures for efficiently solving existential and universal parametric regular path queries, and investigates the efficiency tradeoffs between different formulations of queries.
Abstract: Regular path queries are a way of declaratively expressing queries on graphs as regular-expression-like patterns that are matched against paths in the graph. There are two kinds of queries: existential queries, which specify properties about individual paths, and universal queries, which specify properties about all paths. They provide a simple and convenient framework for expressing program analyses as queries on graph representations of programs, for expressing verification (model-checking) problems as queries on transition systems, for querying semi-structured data, etc. Parametric regular path queries extend the patterns with variables, called parameters, which significantly increase the expressiveness by allowing additional information along single or multiple paths to be captured and relate.This paper shows how a variety of program analysis and model-checking problems can be expressed easily and succinctly using parametric regular path queries. The paper describes the specification, design, analysis, and implementation of algorithms and data structures for efficiently solving existential and universal parametric regular path queries. Major contributions include the first complete algorithms and data structures for directly and efficiently solving existential and universal parametric regular path queries, detailed complexity analysis of the algorithms, detailed analytical and experimental performance comparison of variations of the algorithms and data structures, and investigation of efficiency tradeoffs between different formulations of queries.
TL;DR: In this paper, a software system made of multiple computer program programs is facilitated by using information about cross-program interactions and dependency relationships between programs to analyze the individual programs in such a way that the behavior of the system as a whole is accurately represented.
Abstract: Defect detection in a software system made of multiple computer program programs is facilitated by using information about cross-program interactions and dependency relationships between programs to analyze the individual programs in such a way that the behavior of the system as a whole is accurately represented. A list of dependency relationships is read in; these dependency relationships are used to determine an order in which the programs should be analyzed. The programs are then analyzed in that order. Information from the analysis of the programs is used to inform the analysis of subsequently-analyzed programs.
TL;DR: This work uses (homomorphic) abstractions that reveal an error in the model to guide the model checker in searching for the error state in the original system, based on pattern databases commonly used in artificial intelligence.
Abstract: Arguably the two most important techniques that are used in model checking to counter the combinatorial explosion in the number of states are abstraction and guidance. In this work we combine these techniques in a natural way by using (homomorphic) abstractions that reveal an error in the model to guide the model checker in searching for the error state in the original system. The mechanism used to achieve this is based on pattern databases, commonly used in artificial intelligence. A pattern database represents an abstraction and is used as a heuristic to guide the search. In essence, therefore, the same abstraction is used to reduce the size of the model and guide a search algorithm. We implement this approach in NuSMV and evaluate it using 2 well-known circuit benchmarks. The results show that this method can outperform the original model checker by several orders of magnitude, in both time and space.
TL;DR: In this paper, a concurrent program is analyzed for the presence of data races by the creation of a sequential program from the concurrent program, which contains assertions which can be verified by an analysis tool.
Abstract: A concurrent program is analyzed for the presence of data races by the creation of a sequential program from the concurrent program. The sequential program contains assertions which can be verified by a sequential program analysis tool, and which, when violated, indicate the presence of a data race. The sequential program emulates multiple executions of the concurrent program by nondeterministically scheduling asynchronous threads of the concurrent program on a single runtime stack and nondeterministically removing the currently-executing thread from the stack before instructions of the program. Checking functions are used to provide assertions for data races, along with a global access variable, which indicates if a variable being analyzed for data races is currently being accessed by any threads.
TL;DR: An optimized design taking advantage of specialization relationships and leading to a significant runtime improvement as well as to more compact pattern specifications is described.
Abstract: Event tracing is a well-accepted technique for post-mortem performance analysis of parallel applications. The expert tool supports the analysis of large traces by automatically searching them for execution patterns that indicate inefficient behavior. However, the current search algorithm works with independent pattern specifications and ignores the specialization hierarchy existing between them, resulting in a long analysis time caused by repeated matching attempts as well as in replicated code. This article describes an optimized design taking advantage of specialization relationships and leading to a significant runtime improvement as well as to more compact pattern specifications.
TL;DR: The use of assertions to express correctness properties of programs is growing in practice and checkable redundancy can be very effective in finding defects in programs and in guiding developers to the cause of a defect.
Abstract: The use of assertions to express correctness properties of programs is growing in practice Assertions provide a form of checkable redundancy that can be very effective in finding defects in programs and in guiding developers to the cause of a defect A wide variety of assertion languages and associated validation techniques have been developed, but run-time monitoring is commonly thought to be the only practical solution
TL;DR: This work proposes an approach to supporting slicing over both program and database state, which requires the introduction of two new forms of data dependency into the standard program dependency graph.
Abstract: Program slicing has long been recognised as a valuable technique for supporting the software maintenance process. However, many programs operate over some kind of external state, as well as the internal program state. Arguably, the most significant form of external state is that used to store data associated with the application, for example, in a database management system. We propose an approach to supporting slicing over both program and database state, which requires the introduction of two new forms of data dependency into the standard program dependency graph. Our method expands the usefulness of program slicing techniques to the considerable number of database application programs that are being maintained within industry and science today.
TL;DR: This work aims to specify program transformations in a declarative style, and then to generate executable program transformers from such specifications by drawing on Bernhard Steffen's proposal to use model checking for certain analysis problems, and John Conway's theory of language factors.
Abstract: We aim to specify program transformations in a declarative style, and then to generate executable program transformers from such specifications. Many transformations require non-trivial program analysis to check their applicability, and it is prohibitively expensive to re-run such analyses after each transformation. It is desirable, therefore, that the analysis information is incrementally updated.We achieve this by drawing on two pieces of previous work: first, Bernhard Steffen's proposal to use model checking for certain analysis problems, and second, John Conway's theory of language factors. The first allows the neat specification of transformations, while the second opens the way for an incremental implementation. The two ideas are linked by using regular patterns instead of Steffen's modal logic: these patterns can be viewed as queries on the set of program paths.
TL;DR: A comparison of these two complimentary approaches of Vampir NG and the Debugging Wizard DeWiz delivers some promising ideas for future solutions in the area of parallel and distributed program analysis.
Abstract: Large scale high-performance computing systems pose a tough obstacle for todays program analysis tools. Their demands in computational performance and memory capacity for processing program analysis data exceed the capabilities of standard workstations and traditional analysis tools. The sophisticated approaches of Vampir NG (VNG) and the Debugging Wizard DeWiz intend to provide novel ideas for scalable parallel program analysis. While VNG exploits the power of cluster architectures for near real-time performance analysis, DeWiz utilizes distributed computing infrastructures for distinct analysis activities. A comparison of these two complimentary approaches delivers some promising ideas for future solutions in the area of parallel and distributed program analysis.