TL;DR: It is argued that the step from flow-insensitive to flow-sensitive is fundamentally limited for purely dynamic information-flow controls, and a general framework for hybrid mechanisms that is parameterized in the static part and in the reaction method of the enforcement is presented.
Abstract: This paper seeks to answer fundamental questions about trade-offs between static and dynamic security analysis. It has been previously shown that flow-sensitive static information-flow analysis is a natural generalization of flow-insensitive static analysis, which allows accepting more secure programs. It has also been shown that sound purely dynamic information-flow enforcement is more permissive than static analysis in the flow-insensitive case. We argue that the step from flow-insensitive to flow-sensitive is fundamentally limited for purely dynamic information-flow controls. We prove the impossibility of a sound purely dynamic information-flow monitor that accepts programs certified by a classical flow-sensitive static analysis. A side implication is the impossibility of permissive dynamic instrumented security semantics for information flow, which guides us to uncover an unsound semantics from the literature. We present a general framework for hybrid mechanisms that is parameterized in the static part and in the reaction method of the enforcement (stop, suppress, or rewrite) and give security guarantees with respect to termination-insensitive noninterference for a simple language with output.
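The difference between flow-insensitive and flow-sensitive static checking mentioned in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the paper's formalism: the toy program encoding, the two-level lattice (LOW, HIGH), and the function names are hypothetical, and the toy language has no branches, so implicit flows are not modelled.

```python
LOW, HIGH = 0, 1

def flow_insensitive_ok(program, levels):
    """Each variable keeps one fixed level for the whole program."""
    for op, target, sources in program:
        src = max((levels[s] for s in sources), default=LOW)
        if op == "assign" and src > levels[target]:
            return False  # high data written into a low variable
        if op == "output" and src > LOW:
            return False  # high data sent to the public output
    return True

def flow_sensitive_ok(program, levels):
    """A variable's level is updated at each assignment."""
    env = dict(levels)
    for op, target, sources in program:
        src = max((env[s] for s in sources), default=LOW)
        if op == "assign":
            env[target] = src  # the target takes the level of its sources
        elif op == "output" and src > LOW:
            return False
    return True

# l := h; l := 0; output(l)  -- secure, because l is overwritten before output
overwrite = [("assign", "l", ["h"]), ("assign", "l", []), ("output", None, ["l"])]
# l := h; output(l)          -- insecure, leaks h
leak = [("assign", "l", ["h"]), ("output", None, ["l"])]
levels = {"h": HIGH, "l": LOW}
```

On `overwrite`, the flow-insensitive check rejects the program at the first assignment, while the flow-sensitive check accepts it; both reject `leak`. This only illustrates why flow-sensitivity accepts more secure programs; it says nothing about the dynamic monitors the paper studies.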
TL;DR: In this paper, the equivalent static loads (ESLs) method is proposed and applied to nonlinear dynamic response optimization, where the ESLs are made from the results of nonlinear dynamic analysis and used as external forces in linear static response optimization.
TL;DR: Like types are introduced, a novel intermediate point between dynamic and static typing, which provides some of the benefits of static typing without decreasing the expressiveness of the language.
Abstract: Many large software systems originate from untyped scripting language code. While good for initial development, the lack of static type annotations can impact code quality and performance in the long run. We present an approach for integrating untyped code and typed code in the same system to allow an initial prototype to smoothly evolve into an efficient and robust program. We introduce like types, a novel intermediate point between dynamic and static typing. Occurrences of like-typed variables are checked statically within their scope but, as they may be bound to dynamic values, their usage is checked dynamically. Thus like types provide some of the benefits of static typing without decreasing the expressiveness of the language. We provide a formal account of like types in a core object calculus and evaluate their applicability in the context of a new scripting language.
TL;DR: Practical static and dynamic tools that can find inappropriate use of containers in Java programs are presented, and experimental results show that the static tool has a low false positive rate and produces more relevant information than its dynamic counterpart.
Abstract: Runtime bloat significantly degrades the performance and scalability of software systems. An important source of bloat is the inefficient use of containers. It is expensive to create inefficiently-used containers and to invoke their associated methods, as this may ultimately execute large volumes of code, with call stacks dozens deep, and allocate many temporary objects. This paper presents practical static and dynamic tools that can find inappropriate use of containers in Java programs. At the core of these tools is a base static analysis that identifies, for each container, the objects that are added to this container and the key statements (i.e., heap loads and stores) that achieve the semantics of common container operations such as ADD and GET. The static tool finds problematic uses of containers by considering the nesting relationships among the loops where these semantics-achieving statements are located, while the dynamic tool can instrument these statements and find inefficiencies by profiling their execution frequencies. The high precision of the base analysis is achieved by taking advantage of a context-free language (CFL)-reachability formulation of points-to analysis and by accounting for container-specific properties. It is demand-driven and client-driven, facilitating refinement specific to each queried container object and increasing scalability. The tools built with the help of this analysis can be used both to avoid the creation of container-related performance problems early during development, and to help with diagnosis when problems are observed during tuning. Our experimental results show that the static tool has a low false positive rate and produces more relevant information than its dynamic counterpart. Further case studies suggest that significant optimization opportunities can be found by focusing on statically-identified containers for which high allocation frequency is observed at run time.
TL;DR: TACO is presented, a prototype tool which implements a novel, general and fully automated technique for the SAT-based analysis of JML-annotated Java sequential programs dealing with complex linked data structures, and which can uncover bugs that cannot be detected by state-of-the-art tools based on SAT-solving, model checking or SMT-solving.
Abstract: SAT-based bounded verification of annotated code consists of translating the code together with the annotations to a propositional formula, and analyzing the formula for specification violations using a SAT-solver. If a violation is found, an execution trace exposing the error is exhibited. Code involving linked data structures with intricate invariants is particularly hard to analyze using these techniques. In this article we present TACO, a prototype tool which implements a novel, general and fully automated technique for the SAT-based analysis of JML-annotated Java sequential programs dealing with complex linked data structures. We instrument code analysis with a symmetry-breaking predicate that allows for the parallel, automated computation of tight bounds for Java fields. Experiments show that the translations to propositional formulas require significantly fewer propositional variables, leading, in the experiments we have carried out, to an orders-of-magnitude improvement in the efficiency of the analysis compared to the non-instrumented SAT-based analysis. We show that, in some cases, our tool can uncover bugs that cannot be detected by state-of-the-art tools based on SAT-solving, model checking or SMT-solving.
TL;DR: This work presents an efficient novel static typestate analysis that is flow-sensitive, partially context-sensitive, and that generates residual runtime monitors, and uses an additional backward analysis to partition states into equivalence classes.
Abstract: Typestate analysis determines whether a program violates a set of finite-state properties. Because the typestate-analysis problem is statically undecidable, researchers have proposed a hybrid approach that uses residual monitors to signal property violations at runtime. We present an efficient novel static typestate analysis that is flow-sensitive, partially context-sensitive, and that generates residual runtime monitors. To gain efficiency, our analysis uses precise, flow-sensitive information on an intra-procedural level only, and models the remainder of the program using a flow-insensitive pointer abstraction. Unlike previous flow-sensitive analyses, our analysis uses an additional backward analysis to partition states into equivalence classes. Code locations that transition between equivalent states are irrelevant and require no monitoring. As we show in this work, this notion of equivalent states is crucial to obtaining sound runtime monitors. We proved our analysis correct, implemented the analysis in the Clara framework for typestate analysis, and applied it to the DaCapo benchmark suite. In half of the cases, our analysis determined exactly the property-violating program points. In many other cases, the analysis reduced the number of instrumentation points by large amounts, yielding significant speed-ups during runtime monitoring.
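To make the notion of a finite-state property and a runtime monitor concrete, here is a minimal sketch. The property ("no read after close"), the state and event names, and the function name are hypothetical choices made for illustration; this is a plain dynamic monitor, not the residual monitors or the static analysis described in the abstract.

```python
# Allowed transitions of a toy typestate property: a resource starts "open",
# may be read any number of times, and must not be used after "close".
TRANSITIONS = {
    ("open", "read"): "open",
    ("open", "close"): "closed",
}

def first_violation(events, start="open"):
    """Return the index of the first event that violates the property,
    or None if the whole trace is accepted."""
    state = start
    for index, event in enumerate(events):
        next_state = TRANSITIONS.get((state, event))
        if next_state is None:
            return index  # no transition defined: property violated here
        state = next_state
    return None
```

For example, `first_violation(["read", "close", "read"])` reports index 2, while `["read", "read", "close"]` is accepted. A hybrid approach in the spirit of the abstract would try to prove statically that many of these checks can never fail and leave only the remaining ones in the program.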
TL;DR: A complementary approach based on exploiting parallelism is described, which achieves a scaling of up to 3x on an 8-core machine for a suite of ten large C programs and outperforms a state-of-the-art, highly optimized, serial implementation of the same algorithm.
Abstract: Inclusion-based points-to analysis provides a good trade-off between precision of results and speed of analysis, and it has been incorporated into several production compilers including gcc. There is an extensive literature on how to speed up this algorithm using heuristics such as detecting and collapsing cycles of pointer-equivalent variables. This paper describes a complementary approach based on exploiting parallelism. Our implementation exploits two key insights. First, we show that inclusion-based points-to analysis can be formulated entirely in terms of graphs and graph rewrite rules. This exposes the amorphous data-parallelism in this algorithm and makes it easier to develop a parallel implementation. Second, we show that this graph-theoretic formulation reveals certain key properties of the algorithm that can be exploited to obtain an efficient parallel implementation. Our parallel implementation achieves a scaling of up to 3x on an 8-core machine for a suite of ten large C programs. For all but the smallest benchmarks, the parallel analysis outperforms a state-of-the-art, highly optimized, serial implementation of the same algorithm. To the best of our knowledge, this is the first parallel implementation of a points-to analysis.
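For readers unfamiliar with inclusion-based (Andersen-style) points-to analysis, the following is a minimal serial sketch that iterates the four standard constraint rules to a fixed point. The constraint encoding and function name are assumptions made for illustration; it is deliberately naive, serial, and is not the graph-rewriting or parallel formulation the paper describes.

```python
from collections import defaultdict

def points_to(constraints):
    """Naive fixed-point solver for inclusion-based points-to constraints.

    Each constraint is a tuple (kind, lhs, rhs):
      ("addr",  p, x)  models  p = &x
      ("copy",  p, q)  models  p = q
      ("load",  p, q)  models  p = *q
      ("store", p, q)  models  *p = q
    """
    pts = defaultdict(set)
    changed = True
    while changed:
        changed = False
        for kind, lhs, rhs in constraints:
            if kind == "addr":
                new, targets = {rhs}, [lhs]
            elif kind == "copy":
                new, targets = set(pts[rhs]), [lhs]
            elif kind == "load":
                # everything pointed to by the objects rhs may point to
                new = set().union(*(pts[v] for v in pts[rhs]))
                targets = [lhs]
            elif kind == "store":
                # every object lhs may point to receives pts(rhs)
                new, targets = set(pts[rhs]), list(pts[lhs])
            else:
                raise ValueError(f"unknown constraint kind: {kind}")
            for target in targets:
                if not new <= pts[target]:
                    pts[target] |= new
                    changed = True
    return dict(pts)

# s = *p; *r = q; r = p; q = &y; p = &x
example = [
    ("load", "s", "p"),
    ("store", "r", "q"),
    ("copy", "r", "p"),
    ("addr", "q", "y"),
    ("addr", "p", "x"),
]
result = points_to(example)
```

Here `r` and `p` both point to `x`, the store through `r` makes `x` point to `y`, and the load through `p` therefore gives `s` the target `y`. Production solvers use worklists and cycle collapsing rather than rescanning every constraint on each pass.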
TL;DR: In this paper, a fully coupled thermo-mechanical analysis of one-layered and multilayered isotropic and composite plates is proposed, where the temperature is considered a primary variable as the displacement.
TL;DR: Testing Manimal on several standard MapReduce programs, it is shown that selection alone can automatically reduce a standard program's runtime to 63% of conventional MapReduce execution time on identical hardware.
Abstract: The MapReduce distributed programming framework is very popular, but currently lacks the optimization techniques that have been standard with relational database systems for many years. This paper proposes Manimal, which uses static code analysis to detect MapReduce program semantics and thereby enable wholly-automatic optimization of MapReduce programs. For example, a programmer's map function that emits data only when an if... statement holds true is essentially encoding a selection condition; code analysis can detect and characterize these conditions. If Manimal has an appropriate index available, it can then alter MapReduce execution to use it. Manimal can address many different optimization opportunities, including projections, structure-aware data compression, and others. However, this paper illustrates the system by focusing on one: efficient selection. We give a static analysis algorithm that can detect selections in user programs, and cover how Manimal can employ a B+Tree to execute these selections efficiently at runtime. Testing Manimal on several standard MapReduce programs, we show that selection alone can automatically reduce a standard program's runtime to 63% of conventional MapReduce execution time on identical hardware. We also give an in-depth discussion of other optimization targets and detection techniques.
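The selection idea in this abstract can be sketched as follows. The record fields, the threshold of 10, and all function names are hypothetical, and a sorted list searched with `bisect` stands in for the B+Tree; the sketch assumes the selection condition `rank > 10` has already been detected, and does not show how Manimal's analysis detects it.

```python
import bisect

def map_fn(record):
    """A map function whose guard is, in effect, a selection: rank > 10."""
    if record["rank"] > 10:
        yield (record["url"], record["rank"])

def run_full_scan(records):
    """Conventional execution: call the map function on every record."""
    return [pair for record in records for pair in map_fn(record)]

def run_with_index(records, threshold):
    """Index-assisted execution: skip records that cannot pass the guard.

    The sorted list plays the role of an index on "rank"; in a real system
    it would be built once and reused across jobs.
    """
    ordered = sorted(records, key=lambda r: r["rank"])
    keys = [r["rank"] for r in ordered]
    start = bisect.bisect_right(keys, threshold)  # first rank > threshold
    return [pair for record in ordered[start:] for pair in map_fn(record)]

records = [
    {"url": "a", "rank": 3},
    {"url": "b", "rank": 42},
    {"url": "c", "rank": 10},
    {"url": "d", "rank": 11},
]
```

Both executions emit the same pairs (up to order), but the indexed one invokes the map function only on records that can satisfy the guard.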
TL;DR: It is shown that checking the extremality of a point reduces to checking whether there is only one minimal strongly connected component in a hypergraph, and the latter problem can be solved in almost linear time, which allows redundant generators to be eliminated quickly.
Abstract: We develop a tropical analogue of the classical double description method allowing one to compute an internal representation (in terms of vertices) of a polyhedron defined externally (by inequalities). The heart of the tropical algorithm is a characterization of the extreme points of a polyhedron in terms of a system of constraints which define it. We show that checking the extremality of a point reduces to checking whether there is only one minimal strongly connected component in a hypergraph. The latter problem can be solved in almost linear time, which allows us to quickly eliminate redundant generators. We report extensive tests (including benchmarks from an application to static analysis) showing that the method experimentally outperforms the previous ones by orders of magnitude. The present tools also lead to worst case bounds which improve the ones provided by previous methods.
TL;DR: It is speculated that differential static analysis tools have the potential to be widely deployed in the developer's toolbox despite the fundamental stumbling blocks that limit the adoption of static analysis.
Abstract: It is widely believed that program analysis can be more closely targeted to the needs of programmers if the program is accompanied by further redundant documentation. This may include regression test suites, API protocol usage, and code contracts. To this should be added the largest and most redundant text of all: the previous version of the same program. It is the differences between successive versions of a legacy program already in use which occupy most of a programmer's time. Although differential analysis in the form of equivalence checking has been quite successful for hardware designs, it has not received as much attention in the static program analysis community. This paper briefly summarizes the current state of the art in differential static analysis for software, and suggests a number of promising applications. Although regression test generation has often been thought of as the ultimate goal of differential analysis, we highlight several other applications that can be enabled by differential static analysis. These include equivalence checking, semantic diffing, differential contract checking, summary validation, invariant discovery and better debugging. We speculate that differential static analysis tools have the potential to be widely deployed in the developer's toolbox despite the fundamental stumbling blocks that limit the adoption of static analysis.
TL;DR: This paper presents the ReFEM method for static structural analysis, a novel reanalysis-based finite element method that is explicitly suited for optimisation-based black-box techniques.
TL;DR: This work presents a declarative domain-specific language, Ypnos, for expressing structured grid computations which encourages manual specification of causally sequential operations but then allows a simple, predictable, static analysis to generate optimised, parallel implementations.
Abstract: A fully automatic, compiler-driven approach to parallelisation can result in unpredictable time and space costs for compiled code. On the other hand, a fully manual approach to parallelisation can be long, tedious, prone to errors, hard to debug, and often architecture-specific. We present a declarative domain-specific language, Ypnos, for expressing structured grid computations which encourages manual specification of causally sequential operations but then allows a simple, predictable, static analysis to generate optimised, parallel implementations. We introduce the language and provide some discussion on the theoretical aspects of the language semantics, particularly the structuring of computations around the category theoretic notion of a comonad.
TL;DR: In this paper, the authors present an online algorithm for reducing SMT formulas to a simplified form containing no redundant subparts and demonstrate that on-line simplification of formulas dramatically improves scalability.
Abstract: Static analysis techniques that represent program states as formulas typically generate a large number of redundant formulas that are incrementally constructed from previous formulas. In addition to querying satisfiability and validity, analyses perform other operations on formulas, such as quantifier elimination, substitution, and instantiation, most of which are highly sensitive to formula size. Thus, the scalability of many static analysis techniques requires controlling the size of the generated formulas throughout the analysis. In this paper, we present a practical algorithm for reducing SMT formulas to a simplified form containing no redundant subparts. We present experimental evidence that on-line simplification of formulas dramatically improves scalability.
TL;DR: This work presents a technique that uses both static and dynamic analysis of object-oriented source code to improve resulting impact estimates in terms of recall, and shows that the hybrid technique improved recall between 90 and 115% compared to the static technique, and between 21.2 and 39% compared to the dynamic one.
Abstract: Change impact analysis techniques that underestimate impact may cause important financial losses from the point of view of an IT services company. Thus, reducing false-negatives in these techniques is a goal with strong practical relevance. This work presents a technique that uses both static and dynamic analysis of object-oriented source code to improve resulting impact estimates in terms of recall. The technique consists of three steps: static analysis to identify structural dependencies between code entities, dynamic analysis to identify dependencies based on a succession relation derived from execution traces, and a ranking of results from both analyses that takes into account the relevance of dynamic dependencies. Evaluation was performed through prototype development and a multiple-case quantitative case study that compared our solution against a static technique and a dynamic one. Results showed that our hybrid technique improved recall between 90 and 115% compared to the static technique, and between 21.2 and 39% compared to the dynamic one.
TL;DR: A scheme is proposed, complementary to control flow graph flattening, which does not leak any control flow graph information statically and can specify which minimum of information to hide from the program such that no control flow information is leaked.
Abstract: This paper proposes a general model for hiding control flow graph flattening in C programs. We explain what control flow graph flattening is and illustrate why it is successful as protection against static control flow analysis. Furthermore, we propose a scheme, complementary to control flow graph flattening, which does not leak any control flow graph information statically. Instead of relying on ad hoc security by using variable aliasing and global pointers to complicate data flow analysis of the switch variable, we try to base our security claims more on information theory, data flow, and cryptography. Our formal model is structured and extendable. Moreover, it can specify which minimum of information to hide from the program (e.g. a secret value or function) such that no control flow information is leaked. To express the robustness of our scheme we present some attacks and their feasibility. Finally, we sketch a few scenarios in which our solution could be deployed.
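As background for this abstract, control flow graph flattening can be illustrated with a minimal sketch: the structured loop of a function is replaced by a single dispatcher that selects the next basic block from a state (switch) variable. The example function, the state numbering, and the names are hypothetical. The abstract targets C programs; Python is used here only for illustration, and the sketch shows plain flattening, not the paper's scheme for hiding the switch variable.

```python
def gcd(a, b):
    """Original control flow: an ordinary while loop."""
    while b != 0:
        a, b = b, a % b
    return a

def gcd_flattened(a, b):
    """Same computation with a flattened control flow graph.

    Every basic block sits at the same nesting level inside one dispatcher
    loop; the variable `state` decides which block runs next.
    """
    state = 0
    while True:
        if state == 0:      # block 0: loop test
            state = 1 if b != 0 else 2
        elif state == 1:    # block 1: loop body
            a, b = b, a % b
            state = 0
        elif state == 2:    # block 2: exit
            return a
```

Both functions return the same results, for example 6 for the inputs 48 and 18. An analyst who can track the values of `state` statically can still recover the original graph, which is the weakness the abstract's scheme is meant to address.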
TL;DR: In this paper, a spectral method is proposed for solving static and dynamic problems in reinforced concrete beams in a unified way, and the influence of interfacial delaminations on the dynamic characteristics of the structures is studied.
TL;DR: The key to precision and scalability in all formal methods for static program analysis and verification is the handling of disjunctions arising in relational analyses.
Abstract: The key to precision and scalability in all formal methods for static program analysis and verification is the handling of disjunctions arising in relational analyses, the flow-sensitive traversal of conditionals and loops, the context-sensitive inter-procedural calls, the interleaving of concurrent threads, etc. Explicit case enumeration immediately leads to combinatorial explosion. The art of scalable static analysis is therefore to abstract disjunctions to minimize cost while preserving weak forms of disjunctions for expressivity.
TL;DR: In this paper, the authors propose a system and method for detecting a malicious script in web pages, which includes a script decomposition module for decomposing a web page into scripts, a static analysis module for statically analyzing the decomposed scripts in the form of a document file, a dynamic analysis module that dynamically executes and analyzes the decomposed scripts, and a comparison module for comparing an analysis result of the static analysis and an analysis result of the dynamic analysis to determine whether the decomposed scripts are malicious scripts.
Abstract: Provided are a system and method for detecting a malicious script. The system includes a script decomposition module for decomposing a web page into scripts, a static analysis module for statically analyzing the decomposed scripts in the form of a document file, a dynamic analysis module for dynamically executing and analyzing the decomposed scripts, and a comparison module for comparing an analysis result of the static analysis module and an analysis result of the dynamic analysis module to determine whether the decomposed scripts are malicious scripts. The system and method can recognize a hidden dangerous hypertext markup language (HTML) tag irrespective of an obfuscation technique for hiding a malicious script in a web page and thus can cope with an unknown obfuscation technique.
TL;DR: A geometrico-static model is provided, together with a general procedure aimed at effectively solving, in analytical form, the inverse and direct position problems of cable-driven parallel robots with less than six cables, in crane configuration.
Abstract: This paper studies the kinematics and statics of cable-driven parallel robots with less than six cables, in crane configuration. A geometrico-static model is provided, together with a general procedure aimed at effectively solving, in analytical form, the inverse and direct position problems. The stability of equilibrium is assessed within the framework of a constrained optimization problem, for which a purely algebraic formulation is provided. A spatial robot with three cables is studied as an application example.
TL;DR: A full architecture to perform static analysis on binaries that does not rely on unsound external components such as disassemblers is described, and Bounded Address Tracking is introduced, an abstract domain that is tailored towards machine code and is path sensitive up to a tunable bound assuring termination.
Abstract: Most closed source drivers installed on desktop systems today have never been exposed to formal analysis. Without vendor support, the only way to make these often hastily written, yet critical programs accessible to static analysis is to directly work at the binary level. In this paper, we describe a full architecture to perform static analysis on binaries that does not rely on unsound external components such as disassemblers. To precisely calculate data and function pointers without any type information, we introduce Bounded Address Tracking, an abstract domain that is tailored towards machine code and is path sensitive up to a tunable bound assuring termination. We implemented Bounded Address Tracking in our binary analysis platform Jakstab and used it to verify API specifications on several Windows device drivers. Even without assumptions about executable layout and procedures as made by state of the art approaches [1], we achieve more precise results on a set of drivers from the Windows DDK. Since our technique does not require us to compile drivers ourselves, we also present results from analyzing over 300 closed source drivers.
TL;DR: In this paper, an automatic debugging system and method for automatically identifying a source of a run-time error in a computer system is disclosed, which comprises a static analysis system, an instrumentation system and a post-execution analysis system.
Abstract: An automatic debugging system and method for automatically identifying a source of a run-time error in a computer system is disclosed. The debugging system comprises a static analysis system, an instrumentation system and a post-execution analysis system. The static analysis system is arranged to generate static analysis data on computer program code for the computer system, the static analysis data including information on possible behaviours of the computer program code when executed. The instrumentation system is arranged to instrument the computer program code by inserting marker triggers into the computer program code, the marker triggers being arranged to generate a marker associated with each of a number of predetermined points in the computer program code that would be reached during execution of the computer program code, each marker being uniquely identifiable and the points being determined in dependence on the static analysis data. The post-execution analysis system is arranged to process data on a run-time error produced by execution of said instrumented computer program code, the generated markers and the static analysis data to identify the source of the run-time error.
TL;DR: Sawja is a static analysis workshop for Java that provides OCaml modules for efficiently manipulating Java bytecode programs, including efficient functional data-structures for representing a program with implicit sharing and lazy parsing, an intermediate stack-less representation, and fast computation and manipulation of complete programs.
Abstract: Static analysis is a powerful technique for automatic verification of programs but raises major engineering challenges when developing a full-fledged analyzer for a realistic language such as Java. Efficiency and precision of such a tool rely partly on low level components which only depend on the syntactic structure of the language and therefore should not be redesigned for each implementation of a new static analysis. This paper describes the Sawja library: a static analysis workshop fully compliant with Java 6 which provides OCaml modules for efficiently manipulating Java bytecode programs. We present the main features of the library, including i) efficient functional data-structures for representing a program with implicit sharing and lazy parsing, ii) an intermediate stack-less representation, and iii) fast computation and manipulation of complete programs. We provide experimental evaluations of the different features with respect to time, memory and precision.
TL;DR: In this article, the authors propose a method for automatically generating abstract transformers for static analysis by abstract interpretation, focusing on linear constraints on programs operating on rational, real or floating-point variables and containing linear assignments and tests.
Abstract: We propose a method for automatically generating abstract transformers for static analysis by abstract interpretation. The method focuses on linear constraints on programs operating on rational, real or floating-point variables and containing linear assignments and tests. In addition to loop-free code, the same method also applies for obtaining least fixed points as functions of the precondition, which permits the analysis of loops and recursive functions. Our algorithms are based on new quantifier elimination and symbolic manipulation techniques. Given the specification of an abstract domain, and a program block, our method automatically outputs an implementation of the corresponding abstract transformer. It is thus a form of program transformation. The motivation of our work is data-flow synchronous programming languages, used for building control-command embedded systems, but it also applies to imperative and functional programming.
TL;DR: This paper introduces criteria of termination, denoted by TOC, DTOC and MTOC, that allow the efficient computation of universal solutions for standard constraints, disjunctive constraints, and when the source instance is assumed to be immutable (i.e., it is master data), respectively.
Abstract: A schema-mapping is a high level specification of a data-exchange setting where a set of source-to-target dependencies is used to realize basic operations from source to target relations (such as copy, selection, join or union) while the target schema is subject to a set of target constraints (such as inclusion dependencies or key constraints). In this paper, we consider strong schema-mappings that allow for additional constraints such as source dependencies on the source schema and target-to-source dependencies from the target relations back to the source. Furthermore, strong schema-mappings may include disjunctive dependencies. We argue that this extension is desirable when the source instance is to provide both a lower and upper bound on the information that a target instance can have. We first focus on the implication problem for strong schema-mappings which is to determine whether a given constraint δ is logically implied by the set Σ of constraints (denoted by Σ ⊨ δ). After providing complete characterizations for this problem in terms of universal solutions (while supporting equality constraints), we introduce criteria of termination, denoted by TOC, DTOC and MTOC, that allow the efficient computation of universal solutions for standard constraints, disjunctive constraints, and when the source instance is assumed to be immutable (i.e., it is master data), respectively. We obtain decision procedures for the implication problem, provided that Σ satisfies these termination conditions, and give the corresponding complexity bounds. As an immediate application we revisit the problems of determinacy, relative information completeness and variations thereof, all for strong schema-mappings. Indeed, by viewing them as implication problems we obtain efficient decision procedures when the relevant termination conditions are satisfied. We then focus on the problem of deciding whether source-to-target constraints in a strong schema-mapping are already implied by the embedded (standard) schema-mapping. This problem is important if one wants to use target-to-source constraints in standard data-exchange tools. Since no such constraints are logically implied by standard schema-mappings (and hence the results established earlier are of no use), we provide an alternative semantics for implication. More specifically, we want the constraint to be satisfied by every solution corresponding to the output of a standard data-exchange tool. We consider three semantics based on universal solutions, cores and CWA-solutions, respectively. Decidability of the implication of general (resp. safe) target-to-source constraints is shown for the CWA-based semantics (resp. core-semantics).
TL;DR: The goal of this paper is to investigate how various refinements of allocation sites can improve precision, in particular, abstractions that use call stack, object recency, and heap connectivity information.
Abstract: The quality of a static analysis of heap-manipulating programs is largely determined by its heap abstraction. Object allocation sites are a commonly-used abstraction, but are too coarse for some clients. The goal of this paper is to investigate how various refinements of allocation sites can improve precision. In particular, we consider abstractions that use call stack, object recency, and heap connectivity information. We measure the precision of these abstractions dynamically for four different clients motivated by concurrency and on nine Java programs chosen from the DaCapo benchmark suite. Our dynamic results shed new light on aspects of heap abstractions that matter for precision, which allows us to more effectively navigate the large space of possible heap abstractions.
TL;DR: Staged static analysis is explored as a way to analyze streaming JavaScript programs and it is found that in normal use, where updates to the code are small, it can update static analysis results quickly enough in the browser to be acceptable for everyday use.
Abstract: The advent of Web 2.0 has led to the proliferation of client-side code that is typically written in JavaScript. Recently, there has been an upsurge of interest in static analysis of client-side JavaScript for applications such as bug finding and optimization. However, most approaches in static analysis literature assume that the entire program is available to analysis. This, however, is in direct contradiction with the nature of Web 2.0 programs that are essentially being streamed at the user's browser. Users can see data being streamed to pages in the form of page updates, but the same thing can be done with code, essentially delaying the downloading of code until it is needed. In essence, the entire program is never completely available. Interacting with the application causes more code to be sent to the browser.
This paper explores staged static analysis as a way to analyze streaming JavaScript programs. We observe that while there is variance in terms of the code that gets sent to the client, much of the code of a typical JavaScript application can be determined statically. As a result, we advocate the use of combined offline-online static analysis as a way to accomplish fast, browser-based client-side online analysis at the expense of a more thorough and costly server-based offline analysis on the static code. We find that in normal use, where updates to the code are small, we can update static analysis results quickly enough in the browser to be acceptable for everyday use. We demonstrate the staged analysis approach to be advantageous especially in mobile devices, by experimenting on popular applications such as Facebook.