TL;DR: This paper introduces a theory for estimating the error propagation in abstract interpretation, and hence in program analysis, and introduces a proof system for estimating an upper bound of the error accumulated by the abstract interpreter during program analysis.
Abstract: Imprecision is inherent in any decidable (sound) approximation of undecidable program properties. In abstract interpretation this corresponds to the release of false alarms, e.g., when it is used for program analysis and program verification. As with all alarm systems, a program analysis tool is credible when few false alarms are reported. As a consequence, we have to live with false alarms, but we also need methods to control them. As with all approximation methods, abstract interpretation requires an estimate of the imprecision accumulated during program analysis. In this paper we introduce a theory for estimating the error propagation in abstract interpretation, and hence in program analysis. We enrich abstract domains with a weakening of a metric distance. This enriched structure keeps coherence between the standard partial order relating approximated objects by their relative precision and the effective error made in this approximation. An abstract interpretation is precise when it is complete. We introduce the notion of partial completeness as a weakening of precision. In partial completeness the abstract interpreter may produce a bounded number of false alarms. We prove the key recursive properties of the class of programs for which an abstract interpreter is partially complete with a given bound of imprecision. Then, we introduce a proof system for estimating an upper bound of the error accumulated by the abstract interpreter during program analysis. Our framework is general enough to be instantiated to most known metrics for abstract domains.
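The idea of measuring the error of an abstract interpreter can be illustrated on the interval domain. The following is only a minimal sketch under stated assumptions: the function names and the size-based distance are hypothetical choices made for this illustration, not the paper's definitions. It compares the best interval for the concrete result of squaring with the interval an abstract interpreter computes through standard interval multiplication.

```python
from typing import Iterable, Tuple

Interval = Tuple[int, int]  # closed integer interval [lo, hi]


def alpha(values: Iterable[int]) -> Interval:
    """Abstraction: the smallest interval containing a non-empty set of integers."""
    vs = list(values)
    return (min(vs), max(vs))


def abs_mul(a: Interval, b: Interval) -> Interval:
    """Standard interval multiplication (min/max over endpoint products)."""
    products = [x * y for x in a for y in b]
    return (min(products), max(products))


def size_distance(precise: Interval, approx: Interval) -> int:
    """Hypothetical distance: how many extra integers approx contains.

    Assumes precise is included in approx, which holds for a sound analysis.
    """
    assert approx[0] <= precise[0] and precise[1] <= approx[1]
    return (approx[1] - approx[0]) - (precise[1] - precise[0])


def square_error(inputs: Iterable[int]) -> int:
    """Error of analysing `x := x * x` on the interval domain for the given inputs."""
    vs = list(inputs)
    best = alpha(v * v for v in vs)   # abstraction of the concrete result
    a = alpha(vs)
    computed = abs_mul(a, a)          # what the abstract interpreter returns
    return size_distance(best, computed)


print(square_error([-1, 1]))  # 2: the analysis returns [-1, 1] instead of [1, 1]
print(square_error([2, 3]))   # 0: the analysis is complete on this input
```

In this toy setting, an analysis whose error never exceeds a bound ε on the inputs of interest corresponds informally to the partial completeness described in the abstract.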
TL;DR: This work proposes a new program analysis approach that aims at solving program-level and procedure-level tasks with one model, by taking advantage of the great power of graph neural networks from the level of binary code, and can effectively work around emerging compilation-related problems.
Abstract: With the rapid growth of program scale, program analysis, maintenance and optimization become increasingly diverse and complex. Applying learning-assisted methodologies to program analysis has attracted ever-increasing attention. However, a large number of program factors including syntax structures, semantics, running platforms and compilation configurations block the effective realization of these methods. To overcome these obstacles, existing works prefer to build on source code or abstract syntax trees, but unfortunately are sub-optimal for binary-oriented analysis tasks closely related to the compilation process. To this end, we propose a new program analysis approach that aims at solving program-level and procedure-level tasks with one model, by taking advantage of the great power of graph neural networks from the level of binary code. By fusing the semantics of control flow graphs, data flow graphs and call graphs into one model, and embedding instructions and values simultaneously, our method can effectively work around emerging compilation-related problems. We test the proposed method on two tasks, binary similarity detection and dead store prediction, and the results show that our method achieves accuracy as high as 83.25% and 82.77%.
TL;DR: In this article, the authors present a number of linear-time implementations of graph algorithms in GP 2, an experimental programming language based on graph transformation rules which aims to facilitate program analysis and verification.
TL;DR: This paper presents Kex, a platform for building program analysis tools for JVM bytecode, which provides three abstraction levels, and shows that Kex can be used to implement competitive and powerful program analysis tools.
Abstract: Introduction: Over the last years program analysis methods were widely used for software quality assurance. Different types of program analysis require various levels of program representation, analysis methods, etc. Platforms that provide utilities to implement different types of analysis on their basis become very important because they allow one to simplify the process of development. Purpose: Development of a platform for analysis of JVM programs. Results: In this paper we present Kex, a platform for building program analysis tools for JVM bytecode. Kex provides three abstraction levels. First is Kfg, which is an SSA-based control flow graph representation for bytecode-level analysis and transformation. Second is a symbolic program representation called Predicate State, which consists of first order logic predicates that represent instructions of the original program, constraints, etc. The final level is SMT integration layer for constraint solving. It currently provides an interface for interacting with three SMT solvers. Practical relevance: We have evaluated our platform by considering two prototypes. First prototype is an automatic test generation tool that has participated in SBST 2021 tool competition. Second prototype is a tool for detection of automatic library integration errors. Both prototypes have proved that Kex can be used to implement competitive and powerful program analysis tools.
TL;DR: In this article, the authors present an accurate and efficient approach for IR mapping, which establishes a one-to-one correspondence between IDs of unchanged IR objects for incremental analysis of programs.
Abstract: An Intermediate Representation (IR) is a data structure to represent a program. It represents each program entity as an object (IR object), having a unique identification number (ID). Static program analysis tools perform analysis on IRs of the input program and compute analysis information at program points, which are represented as IR objects. The analysis information is stored against their corresponding ID. Performing incremental analysis on evolving systems involves the reuse of analysis information for the unchanged IR objects between versions of a program. However, the IDs change from one version to the next as the program changes. This acts as an obstacle to the reuse of analysis information. To overcome this, a one-to-one correspondence between IDs of unchanged IR objects is necessary. We term this correspondence IR mapping. This paper presents an accurate and efficient approach for IR mapping. We formally proved the correctness of our IR mapping technique. We evaluated the time consumption of our technique on versions of a core banking application. We found that our approach consumes on average 5.7% of the total time taken by incremental analysis of programs ranging from 9K to 87K LoC.
TL;DR: In this article, the authors propose χ-terms as a means to capture and manipulate context-sensitive program information in a data-flow analysis; χ-terms are implemented as directed acyclic graphs without any redundant subgraphs.
Abstract: Static program analysis is in general more precise if it is sensitive to execution contexts (execution paths). But then it is also more expensive in terms of memory consumption. For languages with conditions and iterations, the number of contexts grows exponentially with the program size. This problem is not just a theoretical issue. Several papers evaluating inter-procedural context-sensitive data-flow analysis report severe memory problems, and the path-explosion problem is a major issue in program verification and model checking. In this paper we propose χ-terms as a means to capture and manipulate context-sensitive program information in a data-flow analysis. χ-terms are implemented as directed acyclic graphs without any redundant subgraphs. We introduce the k-approximation and the l-loop-approximation that limit the size of the context-sensitive information at the cost of analysis precision. We prove that every context-insensitive data-flow analysis has a corresponding k,l-approximated context-sensitive analysis, and that these analyses are sound and guaranteed to reach a fixed point. We also present detailed algorithms outlining a compact, redundancy-free, and DAG-based implementation of χ-terms.
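A redundancy-free DAG of terms, as mentioned in the abstract above, is commonly obtained by hash-consing: structurally equal terms are stored once and shared. The sketch below illustrates only that general technique; the `TermTable` class, its method names, and the rule that collapses a choice between identical alternatives are assumptions made for this example and are not taken from the paper's algorithms.

```python
from typing import Dict, List, Tuple


class TermTable:
    """Hash-consing table: every structurally distinct term is stored exactly once."""

    def __init__(self) -> None:
        self._ids: Dict[Tuple, int] = {}
        self._nodes: List[Tuple] = []

    def leaf(self, value: str) -> int:
        """Return the node id of a leaf holding an abstract value."""
        return self._intern(("leaf", value))

    def chi(self, block: str, *children: int) -> int:
        """Return the node id of a term selecting among children at `block`."""
        # Assumed simplification: choosing among identical alternatives
        # yields that alternative, so no new node is created.
        if children and all(c == children[0] for c in children):
            return children[0]
        return self._intern(("chi", block, children))

    def _intern(self, key: Tuple) -> int:
        # Reuse the existing node when an equal term was built before.
        if key not in self._ids:
            self._ids[key] = len(self._nodes)
            self._nodes.append(key)
        return self._ids[key]

    def __len__(self) -> int:
        return len(self._nodes)


table = TermTable()
one, two = table.leaf("1"), table.leaf("2")
first = table.chi("b1", one, two)
second = table.chi("b1", one, two)   # same structure, so the same node is shared
print(first == second, len(table))
```

Because equal subterms share one node, equality of two terms reduces to comparing their integer ids.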
TL;DR: In this article, the notion of flow-insensitive completeness is formalized with two collecting semantics, together with a program transformation that makes it possible to analyze a program in a flow-insensitive manner without sacrificing the precision obtainable with a flow-sensitive approach.
Abstract: When designing a static analysis, choosing between a flow-insensitive and a flow-sensitive analysis often amounts to favoring scalability over precision. It is well known that specific program representations can help to reconcile the two objectives. For example, the SSA representation is used in modern compilers to perform a constant propagation analysis flow-insensitively without any loss of precision. This paper proposes a provably correct program transformation that reconciles them for any analysis. We formalize the notion of Flow-Insensitive-Completeness with two collecting semantics and provide a program transformation that makes it possible to analyze a program in a flow-insensitive manner without sacrificing the precision we could obtain with a flow-sensitive approach.
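The SSA remark in the abstract above can be made concrete with a toy flow-insensitive constant propagation, which keeps a single abstract value per variable name for the whole program. This is a minimal sketch for straight-line code only (no branches, no φ-functions); the statement encoding and function names are assumptions made for this illustration.

```python
from typing import Dict, List, Optional, Tuple, Union

TOP = "TOP"                                # "not a constant"
Value = Union[int, str, None]              # int constant, TOP, or None (no information)
Stmt = Tuple[str, Union[int, str]]         # (target, integer constant or source variable)


def join(a: Value, b: Value) -> Value:
    """Least upper bound in the flat constant lattice."""
    if a is None:
        return b
    if b is None:
        return a
    return a if a == b else TOP


def flow_insensitive_constants(stmts: List[Stmt]) -> Dict[str, Value]:
    """One abstract value per variable, ignoring statement order."""
    env: Dict[str, Value] = {}
    changed = True
    while changed:                         # iterate to a fixed point
        changed = False
        for target, rhs in stmts:
            value: Optional[Value] = rhs if isinstance(rhs, int) else env.get(rhs)
            new = join(env.get(target), value)
            if new != env.get(target):
                env[target] = new
                changed = True
    return env


# x = 1; y = x; x = 2; z = x
original = [("x", 1), ("y", "x"), ("x", 2), ("z", "x")]
# The same program after renaming each assignment to a fresh SSA name.
ssa = [("x1", 1), ("y1", "x1"), ("x2", 2), ("z1", "x2")]

print(flow_insensitive_constants(original))  # every variable is TOP
print(flow_insensitive_constants(ssa))       # every variable is a constant
```

On the original program the two assignments to `x` are merged, so all precision is lost; on the SSA form each name has a single definition and the flow-insensitive result matches what a flow-sensitive analysis would find.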
TL;DR: The advent of a new program analysis paradigm that allows anyone to make precise statements about the behavior of programs as they run in production across hundreds and millions of machines or devices is discussed.
Abstract: We discuss the advent of a new program analysis paradigm that allows anyone to make precise statements about the behavior of programs as they run in production across hundreds and millions of machines or devices. The scale-oblivious, in vivo program analysis leverages an almost inconceivable rate of user-generated program executions across large fleets to analyze programs of arbitrary size and composition with negligible performance overhead. In this paper, we reflect on the program analysis problem, the prevalent paradigm, and the practical reality of program analysis at large software companies. We illustrate the new paradigm using several success stories and suggest a number of exciting new research directions.
TL;DR: Sydr as discussed by the authors proposes a strong optimistic solving method that eliminates irrelevant path predicate constraints for target branch inversion and separately handles symbolic branches that have nested control transfer instructions that pass control beyond the parent branch scope, e.g. return, goto, break, etc.
Abstract: Dynamic symbolic execution (DSE) is an effective method for automated program testing and bug detection. It increases code coverage by exploring complex branches during hybrid fuzzing. DSE tools invert the branches along some execution path and help the fuzzer examine previously unavailable program parts. DSE often faces over- and underconstraint problems. The first one leads to significant analysis complication while the second one causes inaccurate symbolic execution. We propose a strong optimistic solving method that eliminates irrelevant path predicate constraints for target branch inversion. We eliminate those symbolic constraints that the target branch is not control dependent on. Moreover, we separately handle symbolic branches that have nested control transfer instructions that pass control beyond the parent branch scope, e.g. return, goto, break, etc. We implement the proposed method in our dynamic symbolic execution tool Sydr. We evaluate the strong optimistic strategy, the optimistic strategy that contains only the last constraint negation, and their combination. The results show that combining the strategies helps increase either the code coverage or the average number of correctly inverted branches per minute. It is optimal to apply both strategies together in contrast with other configurations.
TL;DR: An extension of the Java bytecode instrumentation tool BISM is presented that captures and prepares a model that abstracts the program behavior at the intra-procedural level and is presented for the users to write static analyzers and combine both static and runtime verification.
Abstract: In this paper, we present an extension of the Java bytecode instrumentation tool BISM that captures and prepares a model that abstracts the program behavior at the intra-procedural level. We analyze program methods we are interested in monitoring and construct a control-flow graph automaton where the states represent actions of the program that produce events. Directed towards monitoring general behavioral properties at runtime, the resulting model is presented for the users to write static analyzers and combine both static and runtime verification.
TL;DR: Tai-e as discussed by the authors is a new static analysis framework for Java that provides a series of fundamental services such as program abstraction, control flow graph construction, and points-to/alias information computation.
Abstract: Static analysis is a mature field with applications to bug detection, security analysis, code optimization, etc. To facilitate these applications, static analysis frameworks play an essential role by providing a series of fundamental services such as program abstraction, control flow graph construction, and points-to/alias information computation. However, despite the impressive progress of static analysis and the several popular frameworks the field has seen in the last decades, it is still not clear how a static analysis framework should be designed so that analysis developers benefit more: for example, what should a good IR (for analysis) look like? What functionalities should the module of fundamental analyses provide to ease client analyses? How can new analyses be developed and integrated conveniently? How should multiple analyses be managed? To answer these questions, in this work, we discuss the design trade-offs for the crucial components of a static analysis framework, and argue for the most appropriate design by following the HBDC (Harnessing the Best Designs of Classics) principle: for each crucial component, we compare the design choices made for it (possibly) by different classic frameworks such as Soot, WALA, SpotBugs and Doop, and choose arguably the best one, but if none is good enough, we then propose a better design. These selected or newly proposed designs finally constitute Tai-e, a new static analysis framework for Java. Specifically, Tai-e is novel in the designs of several aspects such as IR, pointer analysis and development of new analyses, leading to an easy-to-learn, easy-to-use and efficient system. To our knowledge, this is the first work that systematically explores the designs and implementations of various static analysis frameworks, and we believe it provides useful materials and viewpoints for building better static analysis infrastructures.
TL;DR: In this article, the authors propose a framework for interactive abstract interpretation in static program analysis, which incrementalizes the analysis infrastructure, including postprocessing, without necessitating any modifications to the analysis specifications themselves, and uses lazy invalidation for analysis results affected by program change.
Abstract: To put static program analysis at the fingertips of the software developer, we propose a framework for interactive abstract interpretation. While providing sound analysis results, abstract interpretation in general can be quite costly. To achieve quick response times, we incrementalize the analysis infrastructure, including postprocessing, without necessitating any modifications to the analysis specifications themselves. We rely on the local generic fixpoint engine TD, which dynamically tracks dependencies, while exploring the unknowns contributing to answering an initial query. Lazy invalidation is employed for analysis results affected by program change. Dedicated improvements support the incremental analysis of concurrency deficiencies such as data-races. The framework has been implemented for multithreaded C within the static analyzer Goblint, using MagpieBridge to relay findings to IDEs. We evaluate our implementation w.r.t. the yard sticks of response time and consistency: formerly proven invariants should be retained - when they are not affected by the change. The results indicate that with our approach, a reanalysis after small changes only takes a fraction of from-scratch analysis time, while most of the precision is retained. We also provide examples of program development highlighting the usability of the overall approach.
TL;DR: The evaluation results show that static, dynamic and symbolic analysis techniques fail to identify the hidden information in Ambitr, and it is demonstrated that manual analysis of Ambitr is extremely challenging.
Abstract: Software systems may contain critical program components such as patented program logic or sensitive data. When those components are reverse-engineered by adversaries, it can cause significant damage (e.g., financial loss or operational failures). While protecting critical program components (e.g., code or data) in software systems is of utmost importance, existing approaches, unfortunately, have two major weaknesses: (1) they can be reverse-engineered via various program analysis techniques and (2) when an adversary obtains a legitimate-looking critical program component, he or she can be sure that it is genuine. In this paper, we propose Ambitr, a novel technique that hides critical program components. The core of Ambitr is the Ambiguous Translator, which can generate the critical program components when the input is a correct secret key. The translator is ambiguous as it can accept any input and produces a number of legitimate-looking outputs, making it difficult to know whether an input is the correct secret key or not. The executions of the translator when it processes the correct secret key and other inputs are also indistinguishable, making the analysis inconclusive. Our evaluation results show that static, dynamic and symbolic analysis techniques fail to identify the hidden information in Ambitr. We also demonstrate that manual analysis of Ambitr is extremely challenging.
TL;DR: In this article, the authors introduce the concept of rule graphs that expose to the developer selected information about the internal rules of data-flow analyses, which can help developers understand how the underlying analyses interpret the analyzed code and their reasoning for reporting certain warnings.
Abstract: As static data-flow analysis becomes able to report increasingly complex bugs, using an ever-growing set of complex internal rules encoded into flow functions, the analysis tools themselves grow more and more complex. As a result, for users to be able to effectively use those tools on specific codebases, they require special configurations, a task which in industry is typically performed by individual developers or dedicated teams. To efficiently use and configure static analysis tools, developers need to build a certain understanding of the analysis’ rules, i.e., how the underlying analyses interpret the analyzed code and their reasoning for reporting certain warnings. In this article, we explore how to assist developers in understanding the analysis’ warnings, and finding weaknesses in the analysis’ rules. To this end, we introduce the concept of rule graphs that expose to the developer selected information about the internal rules of data-flow analyses. We have implemented rule graphs on top of a taint analysis, and show how the graphs can support the above-mentioned tasks. Our user study and empirical evaluation show that using rule graphs helps developers understand analysis warnings more accurately than using simple warning traces, and that rule graphs can help developers identify causes for false positives in analysis rules.
TL;DR: In this paper, a node locator data structure is introduced that maps between source code spans and program representation nodes, and helps identify probed nodes in a robust way, after modifications to the source code.
Abstract: We present property probes, a mechanism for helping a developer interactively explore partial program analysis results in terms of the source program, and as the program is edited. A node locator data structure is introduced that maps between source code spans and program representation nodes, and that helps identify probed nodes in a robust way, after modifications to the source code. We have developed a client-server based tool supporting property probes, and argue that it is very helpful in debugging and understanding program analyses. We have evaluated our tool on several languages and analyses, including a full Java compiler and a tool for intraprocedural dataflow analysis. Our performance results show that the probe overhead is negligible even when analyzing large projects.
TL;DR: In this article, the authors propose a points-to analysis that can recover targets for function pointer calls, virtual calls and method calls for use in a static analysis, and the analysis results are intended for flow- and path-sensitive analysis.
Abstract: We propose a points-to analysis that can recover targets for function pointer calls, virtual calls and method calls for use in a static analysis. We use a flow-insensitive analysis, and the analysis results are intended for flow- and path-sensitive analysis which can improve the initial analysis precision within a single function. We implemented the proposed approach in a static analyzer for finding errors in C, C++, Go, Java and Kotlin programs. The devirtualization algorithm is fast enough and spends less than 6% of the total analysis time. It can work for projects like Tizen 7 with 27.5 MLoC of source code.
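To illustrate what a flow-insensitive points-to analysis computes for function pointers, here is a minimal inclusion-based sketch in the style of Andersen's analysis. It is a generic textbook technique swapped in for illustration, not the algorithm of the paper above; the constraint encoding and all names in the example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def points_to(
    address_of: Iterable[Tuple[str, str]],   # (pointer, function) for `pointer = &function`
    copies: Iterable[Tuple[str, str]],       # (dst, src) for `dst = src`
) -> Dict[str, Set[str]]:
    """Propagate points-to sets along copy constraints until nothing changes."""
    pts: Dict[str, Set[str]] = defaultdict(set)
    for pointer, function in address_of:
        pts[pointer].add(function)
    copy_list = list(copies)
    changed = True
    while changed:
        changed = False
        for dst, src in copy_list:
            before = len(pts[dst])
            pts[dst] |= pts[src]             # inclusion constraint: pts(src) ⊆ pts(dst)
            if len(pts[dst]) != before:
                changed = True
    return dict(pts)


# handler = &on_open; fallback = &on_error; cb = handler; cb = fallback; cb2 = cb
result = points_to(
    address_of=[("handler", "on_open"), ("fallback", "on_error")],
    copies=[("cb2", "cb"), ("cb", "handler"), ("cb", "fallback")],
)
# Possible targets of an indirect call through cb2, regardless of statement order.
print(sorted(result["cb2"]))
```

Because statement order is ignored, an indirect call through `cb2` is resolved to both functions; a later flow- or path-sensitive pass, as the abstract describes, could narrow this set within a single function.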
TL;DR: The SPLNum2Analyzer as mentioned in this paper analyzes #if-annotated C programs using a lifted domain, which is parametric in the choice of the domains for representing linear constraints and leaf nodes.
TL;DR: In this paper, a cross-level characterization approach for understanding the behavior of program execution at different levels in the process of writing, compiling, and running a program is presented, which provides a richer view of the sources of performance gains and losses and helps identify program execution in a more accurate manner.
Abstract: Characterization of program execution plays a key role in performance improvement. There are numerous transformations applied to each step a program takes on its lowering from source code to a compiler intermediate representation to machine language to microarchitecture-specific execution. The unpredictable benefit of each transformation step could lead a notionally superior algorithm to exhibit inferior performance once actually run, and it can be opaque which step in the transformation path contradicted the code developer's assumptions. However, conventional approaches to program execution characterization consider the behavior after only a single one of those steps, which limits the information that can be provided to the user. To help address the issue of myopic views of program execution, this paper presents a novel cross-level characterization approach for understanding the behavior of program execution at different levels in the process of writing, compiling, and running a program. We show that this approach provides a richer view of the sources of performance gains and losses and helps identify program execution in a more accurate manner.
TL;DR: In this paper, the authors focus on dead code detection and test input generation using symbolic execution and show that the symbolic execution method can be used to reduce symbolic execution time and to find unreachable paths with fewer test cases.
Abstract: Testing is a well-known technique for identifying errors in software programs. Testing can be done in two ways: static analysis and dynamic analysis. Symbolic execution plays a vital role in static analysis for test case generation and for finding unreachable paths with a minimum number of test cases. An unreachable path is a part of a program which can never be executed, i.e., the symbolic execution does not continue along that path and the current execution stops there. Symbolic execution generates a test suite for loop-free programs that achieves path coverage. In the best case, program loops imply an exponential increase in the number of paths, and in the worst case the program will not terminate. The functions of symbolic execution are test input generation, unreachable path detection, finding bugs in software programs, and debugging. In this paper, we focus on dead code detection and test input generation using symbolic execution. Our execution for Java programs uses the Java Path Finder (JPF) model tester. Our analysis shows that the symbolic execution method can be used to reduce symbolic execution time and to find unreachable paths with fewer test cases.
TL;DR: The mwp-flow analysis as discussed by the authors is a certified complexity analysis for C programs that is based on implicit computational complexity (ICC) and is able to determine polynomial bounds on the size of the values manipulated by an imperative program.
Abstract: Implicit Computational Complexity (ICC) drives better understanding of complexity classes, but it also guides the development of resource-aware languages and static source code analyzers. Among the methods developed, the mwp-flow analysis certifies polynomial bounds on the size of the values manipulated by an imperative program. This result is obtained by bounding the transitions between states instead of focusing on states in isolation, as most static analyzers do, and is not concerned with termination or tight bounds on values. Those differences, along with its built-in compositionality, make the mwp-flow analysis a good target for determining how ICC-inspired techniques diverge compared with more traditional static analysis methods. This paper's contributions are threefold: we fine-tune the internal machinery of the original analysis to make it tractable in practice; we extend the analysis to function calls and leverage its machinery to compute the result of the analysis efficiently; and we implement the resulting analysis as a lightweight tool to automatically perform data-size analysis of C programs. This documented effort prepares and enables the development of certified complexity analysis, by transforming a costly analysis into a tractable program that furthermore decorrelates the problem of deciding if a bound exists from the problem of computing it.
TL;DR: This paper proposes a binary program dynamic slicing technology that uses the information related to the reading and writing of registers and memory addresses in the program execution trace to find the relationship of the data flow and control flow between the two instructions.
Abstract: In vulnerability cause analysis, multi-level pointer dereferences and array element index analysis are often encountered at the code level, which shows up as indirect addressing at the assembly level. At present, the program slicing technology commonly used for vulnerability cause analysis cannot completely analyze the data flow and control flow of indirect addressing. In order to solve this problem, this paper proposes a binary program dynamic slicing technology for vulnerability cause analysis. This technology uses the information related to the reading and writing of registers and memory addresses in the program execution trace to find the data-flow and control-flow relationships between two instructions, which can more completely retain the information related to the instructions to be sliced, increase the degree of automation in vulnerability cause analysis, and reduce the cost of manual analysis. In addition, using the static characteristics of the execution trace, this paper can meet the needs of researchers for repeated debugging and analysis of a program execution at different time points in the process of program execution.
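The use of register and memory read/write information from an execution trace can be sketched as a backward data-dependence slice. The code below is a simplified illustration under stated assumptions: the `TraceEntry` layout, the location naming (`mem:0x1000`) and the sample trace are hypothetical, and only data flow is followed, not control flow. Because the trace records concrete addresses, a store through a register and a later load from the same address are linked without any alias reasoning.

```python
from typing import FrozenSet, Iterable, List, NamedTuple


class TraceEntry(NamedTuple):
    text: str                  # disassembly, for display only
    reads: FrozenSet[str]      # registers and concrete memory locations read
    writes: FrozenSet[str]     # registers and concrete memory locations written


def backward_slice(trace: List[TraceEntry], start: int, locations: Iterable[str]) -> List[int]:
    """Indices of trace entries that the given locations at `start` depend on."""
    live = set(locations)      # locations whose defining instruction is still sought
    kept: List[int] = []
    for index in range(start, -1, -1):
        entry = trace[index]
        if entry.writes & live:
            kept.append(index)
            live -= entry.writes   # these definitions have been found
            live |= entry.reads    # now look for what they were computed from
    return sorted(kept)


trace = [
    TraceEntry("mov rax, 0x1000", frozenset(), frozenset({"rax"})),
    TraceEntry("mov rbx, 7", frozenset(), frozenset({"rbx"})),
    TraceEntry("mov [rax], rbx", frozenset({"rax", "rbx"}), frozenset({"mem:0x1000"})),
    TraceEntry("mov rcx, 5", frozenset(), frozenset({"rcx"})),
    TraceEntry("mov rdx, [rax]", frozenset({"rax", "mem:0x1000"}), frozenset({"rdx"})),
]

for index in backward_slice(trace, start=4, locations={"rdx"}):
    print(index, trace[index].text)
```

The unrelated `mov rcx, 5` is dropped from the slice, while the indirect store and load through `rax` are both kept.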
TL;DR: In this paper, the authors present a static analysis technique to compute executes-before pairs of tasks for a general class of event driven programs based on a small but comprehensive set of rules evaluated on a novel structure called the task post graph of a program.
Abstract: The executes-before relation between tasks is fundamental in the analysis of Event Driven Programs with several downstream applications like race detection and identifying redundant synchronizations. We present a sound, efficient, and effective static analysis technique to compute executes-before pairs of tasks for a general class of event driven programs. The analysis is based on a small but comprehensive set of rules evaluated on a novel structure called the task post graph of a program. We show how to use the executes-before information to identify disjoint-blocks in event driven programs and further use them to improve the precision of data race detection for these programs. We have implemented our analysis in the Flowdroid framework in a tool called AndRacer and evaluated it on several Android apps, bringing out the scalability, recall, and improved precision of the analyses.
TL;DR: In this paper, a secure static-analysis-as-a-service (SaaaS) system is proposed, where a client may outsource static analysis to the cloud. The goal is to protect the privacy of both the design and implementation of the static analysis and the source code of the target program.
Abstract: We present a secure static-analysis-as-a-service (SaaaS) system where a client may outsource static analysis to the cloud. To address copyright concerns associated with SaaaS, clients are allowed to encrypt the source code of a target program and upload it to the cloud. Our goal is to secure the privacy of the design and implementation of static analysis as well as the source code of the target program. Considering a family of static analyses written in Datalog, we propose a generic protocol that combines homomorphic encryption (HE) with secure two-party computation to manage the huge cost of HE operations. The server occasionally delegates sub-parts of analysis which are costly in the cipher-world to the client without exposing the design of analysis. During server-client interactions, the information of both sides (client and server) is not leaked to the opposite. We evaluated our system on two static analyses in Datalog in secrecy, which have not been feasible using the previous techniques. For example, Andersen pointer analysis is completed in an average of 45 mins for 14 C programs comprising up to 1.6 KLoC.
TL;DR: Cross-Level Characterization (CLC) as discussed by the authors analyzes similarities and differences in resource counts as measured at each level of instrumentation during a program's transformation from source code through execution on a specific microarchitecture.
Abstract: Program behavior can be defined as a collection of executions [1]. Program behavior strongly relates to actual program performance but can be complicated to characterize and analyze. Characterization is important as it helps better understand program behavior by measuring various operations a program performs. There are many existing techniques [2]–[7] for program characterization, which operate at different levels of instrumentation: source code, intermediate representation (IR), instruction set architecture (ISA), and CPU microarchitecture. Each of these levels provides different capabilities and limitations. In this paper, we introduce Cross-Level Characterization (CLC), an analysis of similarities and differences in resource counts as measured at each level of instrumentation during a program’s transformation from source code through execution on a specific microarchitecture.
TL;DR: NoCFG as discussed by the authors is a sound and scalable method for approximating a call graph that supports a wide variety of programming languages, such as Python and C#, and it works on a coarse abstraction of the program, discarding many programming language constructs.
Abstract: Interprocedural analysis refers to gathering information about the entire program rather than for a single procedure only, as in intraprocedural analysis. It enables a more precise analysis; however, it is complicated due to the difficulty of constructing an accurate program call graph. Algorithms for constructing sound call graphs must trade off precision against scalability. Many precise call graph techniques are complex and difficult to scale because of the kind of type-inference analysis they use, in particular variations of points-to analysis. This forces use cases that require both soundness and scale, such as vulnerability propagation analysis, to resort to simpler variants such as Class Hierarchy Analysis. These kinds of analyses have no sound equivalent for dynamically typed languages such as Python and JavaScript, which have gained popularity in recent years. To address this problem, we propose NoCFG, a new sound and scalable method for approximating a call graph that supports a wide variety of programming languages. A key property of NoCFG is that it works on a coarse abstraction of the program, discarding many of the programming language constructs. Due to the coarse program abstraction, extending it to support other languages is easy. We evaluate NoCFG on real-world projects written in both Python and C#, and the results demonstrate a high precision rate of ≥ 89% and scalability through a security use-case over projects with up to 2 million lines of code.
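As background for the Class Hierarchy Analysis baseline mentioned above, here is a minimal sketch of how CHA resolves a virtual call in a statically typed, single-inheritance setting: every subtype of the receiver's declared type contributes the method implementation it would dispatch to. This is not NoCFG; the names `resolve` and `cha_targets` and the dictionary-based class model are assumptions made for illustration.

```python
# Illustrative CHA sketch under assumed inputs (not NoCFG):
#   parent:  maps each class to its direct superclass (single inheritance)
#   defines: maps each class to the set of method names it implements itself
def resolve(parent, defines, cls, method):
    """Walk up from cls to the nearest class that implements method."""
    while cls is not None:
        if method in defines.get(cls, ()):
            return (cls, method)
        cls = parent.get(cls)
    return None


def cha_targets(parent, defines, declared_type, method):
    """Possible targets of x.method() when x has static type declared_type."""
    classes = set(parent) | set(parent.values()) | set(defines)
    targets = set()
    for c in classes:
        # Check whether c is declared_type or one of its transitive subclasses.
        k = c
        while k is not None and k != declared_type:
            k = parent.get(k)
        if k == declared_type:
            target = resolve(parent, defines, c, method)
            if target is not None:
                targets.add(target)
    return targets


# Hypothetical hierarchy: Puppy < Dog < Animal, Cat < Animal, Other unrelated.
parent = {"Dog": "Animal", "Cat": "Animal", "Puppy": "Dog"}
defines = {"Animal": {"speak"}, "Dog": {"speak"}, "Other": {"speak"}}
animal_targets = cha_targets(parent, defines, "Animal", "speak")
dog_targets = cha_targets(parent, defines, "Dog", "speak")
```

The sketch relies on declared receiver types, which is why, as the abstract notes, this style of analysis has no direct sound counterpart in dynamically typed languages.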
Abstract: Pointer analysis addresses a fundamental problem in program analysis: determining statically whether or not a given pointer may reference an object in the program. It underpins almost all other forms of static analysis, including program understanding, program verification, bug detection, security analysis, compiler optimization, and symbolic execution. However, existing pointer analysis techniques suffer from efficiency and scalability issues for large programs. Improving their efficiency while still maintaining their precision is a long-standing hard problem. This thesis aims to improve the efficiency and scalability of pointer analysis for object-oriented programming languages such as Java by exploring fine-grained context sensitivity. Unlike traditional approaches, which apply context-sensitivity either uniformly to all methods or selectively to a subset of methods in a program, we go one step further by applying context-sensitivity only to a subset of precision-critical variables and objects so that we can significantly reduce the scale of the Pointer Assignment Graph (PAG). Conducting pointer analysis on a smaller PAG enables the pointer analysis to run significantly faster while preserving most of its precision. This thesis makes its contributions by introducing three different fine-grained pointer analysis approaches for Java programs. The first approach, called TURNER, can significantly accelerate k-object-sensitive pointer analysis (i.e., kOBJ) for Java with negligible precision loss by exploiting object containment and reachability. The second approach, called context debloating, can accelerate all existing object-sensitive pointer analysis algorithms for Java by eliminating the context explosion problem completely for context-independent objects. In addition, we have developed the first supporting tool, named CONCH, for identifying context-independent objects. The last approach, called P3CTX, represents the first precision-preserving technique for accelerating k-callsite-sensitive pointer analysis (kCFA) for Java based on a complete CFL-reachability formulation of kCFA for Java with built-in on-the-fly call graph construction (for the first time).
TL;DR: This work investigates whether a static analysis of compiled P4 programs can be fast and accurate enough for an on-device attack scenario by implementing such an analysis for P4 programs compiled to BPF bytecode, and finds that it remains accurate even when several program behaviours are abstracted away.
Abstract: Programmable network data planes are paving the way for networking innovations, with the ability to perform complex, stateful tasks defined in high-level languages such as P4. The enhanced capabilities of programmable data plane devices have made verification of their runtime behaviour, using established methods such as probe packets, impossible to scale beyond probabilistic detection. This has created a potential opportunity for an attacker, with access to a compromised device, to subtly alter its forwarding program to mishandle only a small subset of packets, evading probabilistic detection. In practice, such subtle binary instrumentation attacks require extensive knowledge of the forwarding program, yet it is unclear whether a static analysis of compiled P4 programs to obtain this knowledge can be fast and accurate enough for an on-device attack scenario. In this work, we investigate this possibility by implementing a static analysis of P4 programs compiled to BPF bytecode. This analysis gathers sufficient information for the attacker to identify appropriate (reliably correct) edits to the program. We found that, due to predictable compiler behaviours, our analysis remains accurate even when several program behaviours are abstracted away. Our evaluation of the analysis requirements shows that, from a defensive perspective, there is scope for selectively manipulating those instructions in P4-BPF programs that are critical to attack-focused analysis in order to increase its difficulty, without increasing the number of program instructions.
TL;DR: This paper introduces QueryMax, the first tool that significantly speeds up an application code analysis without dropping any precision, and enables these two analyses to achieve, relative to a whole-program analysis, an average recall of 87%, a precision of 100% and a geometric mean speedup of 10x.
Abstract: Long analysis times are a key bottleneck for the widespread adoption of whole-program static analysis tools. Fortunately, however, a user is often only interested in finding errors in the application code, which constitutes a small fraction of the whole program. Current application-focused analysis tools overapproximate the effect of the library and hence reduce the precision of the analysis results. However, empirical studies have shown that users have high expectations of precision and will ignore tool results that don't meet these expectations. In this paper, we introduce QueryMax, the first tool that significantly speeds up an application code analysis without dropping any precision. QueryMax acts as a pre-processor to an existing analysis tool to select a partial library that is most relevant to the analysis queries in the application code. The selected partial library plus the application is given as input to the existing static analysis tool, with the remaining library pointers treated as the bottom element in the abstract domain. This achieves a significant speedup over a whole-program analysis, at the cost of a few lost errors, and with no loss in precision. We instantiate and run experiments on QueryMax for a cast-check analysis and a null-pointer analysis. For a particular configuration, QueryMax enables these two analyses to achieve, relative to a whole-program analysis, an average recall of 87%, a precision of 100% and a geometric mean speedup of 10x.
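To make the "recall" and "precision" figures concrete, the sketch below shows one way such numbers can be computed when the whole-program analysis results are taken as the reference. The helper `precision_recall` and the alarm identifiers are hypothetical and are not data or code from the paper.

```python
def precision_recall(reference, reported):
    """Precision and recall of `reported` alarms measured against `reference`."""
    reference, reported = set(reference), set(reported)
    true_pos = reference & reported
    # With nothing reported (or nothing to find) the ratio is defined as 1.0.
    precision = len(true_pos) / len(reported) if reported else 1.0
    recall = len(true_pos) / len(reference) if reference else 1.0
    return precision, recall


# Hypothetical alarm identifiers, invented for this example.
whole_program = {"A.java:10", "A.java:42", "B.java:7", "C.java:3"}
partial_library = {"A.java:10", "A.java:42", "B.java:7"}
precision, recall = precision_recall(whole_program, partial_library)
```

Here the partial-library run reports a strict subset of the whole-program alarms, so precision stays at 1.0 while recall drops to 0.75. This mirrors the trade-off described in the abstract, where some errors are lost but no spurious ones are introduced.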