TL;DR: It is shown that Slither's bug detection is fast, accurate, and outperforms other static analysis tools at finding issues in Ethereum smart contracts in terms of speed, robustness, and balance of detection and false positives.
Abstract: This paper describes Slither, a static analysis framework designed to provide rich information about Ethereum smart contracts. It works by converting Solidity smart contracts into an intermediate representation called SlithIR. SlithIR uses Static Single Assignment (SSA) form and a reduced instruction set to ease implementation of analyses while preserving semantic information that would be lost in transforming Solidity to bytecode. Slither allows for the application of commonly used program analysis techniques like dataflow and taint tracking. Our framework has four main use cases: (1) automated detection of vulnerabilities, (2) automated detection of code optimization opportunities, (3) improvement of the user's understanding of the contracts, and (4) assistance with code review. In this paper, we present an overview of Slither, detail the design of its intermediate representation, and evaluate its capabilities on real-world contracts. We show that Slither's bug detection is fast, accurate, and outperforms other static analysis tools at finding issues in Ethereum smart contracts in terms of speed, robustness, and balance of detection and false positives. We compared tools using a large dataset of smart contracts and manually reviewed results for 1000 of the most used contracts.
TL;DR: This paper proposes to use as the global context the Program Dependence Graph and Data Flow Graph to connect the method under investigation with the other relevant methods that might contribute to the buggy code to reduce the false positive rate and improve the recall of the model.
Abstract: Bug detection has been shown to be an effective way to help developers in detecting bugs early, thus, saving much effort and time in software development process. Recently, deep learning-based bug detection approaches have gained successes over the traditional machine learning-based approaches, the rule-based program analysis approaches, and mining-based approaches. However, they are still limited in detecting bugs that involve multiple methods and suffer high rate of false positives. In this paper, we propose a combination approach with the use of contexts and attention neural network to overcome those limitations. We propose to use as the global context the Program Dependence Graph (PDG) and Data Flow Graph (DFG) to connect the method under investigation with the other relevant methods that might contribute to the buggy code. The global context is complemented by the local context extracted from the path on the AST built from the method’s body. The use of PDG and DFG enables our model to reduce the false positive rate, while to complement for the potential reduction in recall, we make use of the attention neural network mechanism to put more weights on the buggy paths in the source code. That is, the paths that are similar to the buggy paths will be ranked higher, thus, improving the recall of our model. We have conducted several experiments to evaluate our approach on a very large dataset with +4.973M methods in 92 different project versions. The results show that our tool can have a relative improvement up to 160% on F-score when comparing with the state-of-the-art bug detection approaches. Our tool can detect 48 true bugs in the list of top 100 reported bugs, which is 24 more true bugs when comparing with the baseline approaches. We also reported that our representation is better suitable for bug detection and relatively improves over the other representations up to 206% in accuracy.
TL;DR: Slither as discussed by the authors uses Static Single Assignment (SSA) form and a reduced instruction set to ease implementation of analyses while preserving semantic information that would be lost in transforming Solidity to bytecode.
Abstract: This paper describes Slither, a static analysis framework designed to provide rich information about Ethereum smart contracts. It works by converting Solidity smart contracts into an intermediate representation called SlithIR. SlithIR uses Static Single Assignment (SSA) form and a reduced instruction set to ease implementation of analyses while preserving semantic information that would be lost in transforming Solidity to bytecode. Slither allows for the application of commonly used program analysis techniques like dataflow and taint tracking. Our framework has four main use cases: (1) automated detection of vulnerabilities, (2) automated detection of code optimization opportunities, (3) improvement of the user's understanding of the contracts, and (4) assistance with code review.
In this paper, we present an overview of Slither, detail the design of its intermediate representation, and evaluate its capabilities on real-world contracts. We show that Slither's bug detection is fast, accurate, and outperforms other static analysis tools at finding issues in Ethereum smart contracts in terms of speed, robustness, and balance of detection and false positives. We compared tools using a large dataset of smart contracts and manually reviewed results for 1000 of the most used contracts.
TL;DR: Solt-verify takes smart contracts written in Solidity and discharges verification conditions using modular program analysis and SMT solvers, built on top of the Solidity compiler, to effectively reason about high-level contract properties while modeling low-level language semantics precisely.
Abstract: We present solc-verify, a source-level verification tool for Ethereum smart contracts. solc-verify takes smart contracts written in Solidity and discharges verification conditions using modular program analysis and SMT solvers. Built on top of the Solidity compiler, solc-verify reasons at the level of the contract source code, as opposed to the more common approaches that operate at the level of Ethereum bytecode. This enables solc-verify to effectively reason about high-level contract properties while modeling low-level language semantics precisely. The properties, such as contract invariants, loop invariants, and function pre- and post-conditions, can be provided as annotations in the code by the developer. This enables automated, yet user-friendly formal verification for smart contracts. We demonstrate solc-verify by examining real-world examples where our tool can effectively find bugs and prove correctness of non-trivial properties with minimal user effort.
TL;DR: SOC-verify as discussed by the authors is a source-level verification tool for Ethereum smart contracts written in Solidity and discharges verification conditions using modular program analysis and SMT solvers.
Abstract: We present solc-verify, a source-level verification tool for Ethereum smart contracts. Solc-verify takes smart contracts written in Solidity and discharges verification conditions using modular program analysis and SMT solvers. Built on top of the Solidity compiler, solc-verify reasons at the level of the contract source code, as opposed to the more common approaches that operate at the level of Ethereum bytecode. This enables solc-verify to effectively reason about high-level contract properties while modeling low-level language semantics precisely. The contract properties, such as contract invariants, loop invariants, and function pre- and post-conditions, can be provided as annotations in the code by the developer. This enables automated, yet user-friendly formal verification for smart contracts. We demonstrate solc-verify by examining real-world examples where our tool can effectively find bugs and prove correctness of non-trivial properties with minimal user effort.
TL;DR: Senx is proposed, which, given a set of safety properties and a single input that triggers the vulnerability, detects the safety property violated by the vulnerability input and generates a corresponding patch that enforces the safetyproperty and thus, removes the vulnerability.
Abstract: Security vulnerabilities are among the most critical software defects in existence. When identified, programmers aim to produce patches that prevent the vulnerability as quickly as possible, motivating the need for automatic program repair (APR) methods to generate patches automatically. Unfortunately, most current APR methods fall short because they approximate the properties necessary to prevent the vulnerability using examples. Approximations result in patches that either do not fix the vulnerability comprehensively, or may even introduce new bugs. Instead, we propose property-based APR, which uses human-specified, program-independent and vulnerability-specific safety properties to derive source code patches for security vulnerabilities. Unlike properties that are approximated by observing the execution of test cases, such safety properties are precise and complete. The primary challenge lies in mapping such safety properties into source code patches that can be instantiated into an existing program. To address these challenges, we propose Senx, which, given a set of safety properties and a single input that triggers the vulnerability, detects the safety property violated by the vulnerability input and generates a corresponding patch that enforces the safety property and thus, removes the vulnerability. Senx solves several challenges with property-based APR: it identifies the program expressions and variables that must be evaluated to check safety properties and identifies the program scopes where they can be evaluated, it generates new code to selectively compute the values it needs if calling existing program code would cause unwanted side effects, and it uses a novel access range analysis technique to avoid placing patches inside loops where it could incur performance overhead. Our evaluation shows that the patches generated by Senx successfully fix 32 of 42 real-world vulnerabilities from 11 applications including various tools or libraries for manipulating graphics/media files, a programming language interpreter, a relational database engine, a collection of programming tools for creating and managing binary programs, and a collection of basic file, shell, and text manipulation tools.
TL;DR: The experiments show that VetPLC outperforms state-of-the-art techniques and can generate event sequences that can be used to automatically detect hidden safety violations.
Abstract: Safety violations in programmable logic controllers (PLCs), caused either by faults or attacks, have recently garnered significant attention. However, prior efforts at PLC code vetting suffer from many drawbacks. Static analyses and verification cause significant false positives and cannot reveal specific runtime contexts. Dynamic analyses and symbolic execution, on the other hand, fail due to their inability to handle real-world PLC programs that are event-driven and timing sensitive. In this paper, we propose VetPLC, a temporal context-aware, program analysis-based approach to produce timed event sequences that can be used for automatic safety vetting. To this end, we (a) perform static program analysis to create timed event causality graphs in order to understand causal relations among events in PLC code and (b) mine temporal invariants from data traces collected in Industrial Control System (ICS) testbeds to quantitatively gauge temporal dependencies that are constrained by machine operations. Our VetPLC prototype has been implemented in 15K lines of code. We evaluate it on 10 real-world scenarios from two different ICS settings. Our experiments show that VetPLC outperforms state-of-the-art techniques and can generate event sequences that can be used to automatically detect hidden safety violations.
TL;DR: A new deep-learning-based graph embedding approach to accurate detection of CFR vulnerabilities by applying a recent graph convolutional network to embed code fragments in a compact and low-dimensional representation that preserves high-level control-flow information of a vulnerable program.
Abstract: Static vulnerability detection has shown its effectiveness in detecting well-defined low-level memory errors. However, high-level control-flow related (CFR) vulnerabilities, such as insufficient control flow management (CWE-691), business logic errors (CWE-840), and program behavioral problems (CWE-438), which are often caused by a wide variety of bad programming practices, posing a great challenge for existing general static analysis solutions. This paper presents a new deep-learning-based graph embedding approach to accurate detection of CFR vulnerabilities. Our approach makes a new attempt by applying a recent graph convolutional network to embed code fragments in a compact and low-dimensional representation that preserves high-level control-flow information of a vulnerable program. We have conducted our experiments using 8,368 real-world vulnerable programs by comparing our approach with several traditional static vulnerability detectors and state-of-the-art machine-learning-based approaches. The experimental results show the effectiveness of our approach in terms of both accuracy and recall. Our research has shed light on the promising direction of combining program analysis with deep learning techniques to address the general static analysis challenges.
TL;DR: In this paper, the authors propose Sequence-coverage Directed Fuzzing (SCDF), a lightweight directed fuzzing technique which explores towards the user-specified program statements efficiently.
Abstract: Existing directed fuzzers are not efficient enough. Directed symbolic-execution-based whitebox fuzzers, e.g. BugRedux, spend lots of time on heavyweight program analysis and constraints solving at runtime. Directed greybox fuzzers, such as AFLGo, perform well at runtime, but considerable calculation during instrumentation phase hinders the overall performance. In this paper, we propose Sequence-coverage Directed Fuzzing (SCDF), a lightweight directed fuzzing technique which explores towards the user-specified program statements efficiently. Given a set of target statement sequences of a program, SCDF aims to generate inputs that can reach the statements in each sequence in order and trigger bugs in the program. Moreover, we present a novel energy schedule algorithm, which adjusts on demand a seed's energy according to its ability of covering the given statement sequences calculated on demand. We implement the technique in a tool LOLLY in order to achieve efficiency both at instrumentation time and at runtime. Experiments on several real-world software projects demonstrate that LOLLY outperforms two well-established tools on efficiency and effectiveness, i.e., AFLGo-a directed greybox fuzzer and BugRedux-a directed symbolic-execution-based whitebox fuzzer.
TL;DR: A large experimental evaluation is presented showing that JFS is both competitive with and complementary to state-of-the-art SMT solvers with respect to solving floating-point constraints, and that the coverage-guided approach of JFS provides significant benefit over naive fuzzing in the floating- point domain.
Abstract: We investigate the use of coverage-guided fuzzing as a means of proving satisfiability of SMT formulas over finite variable domains, with specific application to floating-point constraints. We show how an SMT formula can be encoded as a program containing a location that is reachable if and only if the program’s input corresponds to a satisfying assignment to the formula. A coverage-guided fuzzer can then be used to search for an input that reaches the location, yielding a satisfying assignment. We have implemented this idea in a tool, Just Fuzz-it Solver (JFS), and we present a large experimental evaluation showing that JFS is both competitive with and complementary to state-of-the-art SMT solvers with respect to solving floating-point constraints, and that the coverage-guided approach of JFS provides significant benefit over naive fuzzing in the floating-point domain. Applied in a portfolio manner, the JFS approach thus has the potential to complement traditional SMT solvers for program analysis tasks that involve reasoning about floating-point constraints.
TL;DR: This work provides Code4Bench, a benchmark comprising 3,421,357 programs totaling of 306,053,105 lines of code in 41 versions of 28 programming languages such as C/C++, Java, Python, and Kotlin, which advances the state-of-the-art in conducting reproducible and comparative experiments.
Abstract: Reproducible research relies on well-designed benchmarks. However, evaluation on a single benchmark increases the risk of overfitting; that is, an optimization to reach a certain performance. In recent years several well-designed benchmarks have been constructed for different subfields of program analysis. However, they often involve real-world industrial projects in few languages such as C or Java. We provide Code4Bench, a benchmark comprising 3,421,357 programs totaling of 306,053,105 lines of code in 41 versions of 28 programming languages such as C/C++, Java, Python, and Kotlin. We have constructed this benchmark from Codeforces, a famous programming competition website, which is widely used by international programmers. Code4Bench advances the state-of-the-art in conducting reproducible and comparative experiments. It helps mitigate the bias and increase the generality and conclusiveness of the results. We present our methodology in construction of Code4Bench and give various descriptive statistics. We have also conducted an online survey on the users of Codeforces’ website whose code is included in the benchmark. The survey is concerned about the user's demographic information and programming habits, whose results are also provided in the benchmark. Finally, we leveraged an automatic process by which we localized faults within the faulty versions and categorize them according to a coarse-grained classification. In addition to its usage in empirical studies, Code4Bench can be used to teach programming and evolve algorithmic problems. We release Code4Bench in database format to allow researchers to extract other data of the benchmark by arbitrary queries. Code4Bench version 1.0.0 is publicly available at https://zenodo.org/record/2582968 , with DOI 10.5281/zenodo.2582968, under the terms of Creative Commons Attribution 4.0 International license.
TL;DR: In this paper, a dynamic analysis based on atomic conditions is proposed to detect program inputs that trigger large floating-point errors in numerical code, which can be used for debugging, repair and synthesis.
Abstract: This paper tackles the important, difficult problem of detecting program inputs that trigger large floating-point errors in numerical code. It introduces a novel, principled dynamic analysis that leverages the mathematically rigorously analyzed condition numbers for atomic numerical operations, which we call atomic conditions, to effectively guide the search for large floating-point errors. Compared with existing approaches, our work based on atomic conditions has several distinctive benefits: (1) it does not rely on high-precision implementations to act as approximate oracles, which are difficult to obtain in general and computationally costly; and (2) atomic conditions provide accurate, modular search guidance. These benefits in combination lead to a highly effective approach that detects more significant errors in real-world code (e.g., widely-used numerical library functions) and achieves several orders of speedups over the state-of-the-art, thus making error analysis significantly more practical. We expect the methodology and principles behind our approach to benefit other floating-point program analysis tasks such as debugging, repair and synthesis. To facilitate the reproduction of our work, we have made our implementation, evaluation data and results publicly available on GitHub at https://github.com/FP-Analysis/atomic-condition.
TL;DR: This paper introduces AutoSC, which combines program analysis and the principle of software naturalness to fill in a partially completed statement, and shows that it outperforms a state-of-the-art approach from 9X-69X in top-1 accuracy.
Abstract: Automatic code completion helps improve developers' productivity in their programming tasks. A program contains instructions expressed via code statements, which are considered as the basic units of program execution. In this paper, we introduce AutoSC, which combines program analysis and the principle of software naturalness to fill in a partially completed statement. AutoSC benefits from the strengths of both directions, in which the completed code statement is both frequent and valid. AutoSC is first trained on a large code corpus to derive the templates of candidate statements. Then, it uses program analysis to validate and concretize the templates into syntactically and type-valid candidate statements. Finally, these candidates are ranked by using a language model trained on the lexical form of the source code in the code corpus. Our empirical evaluation on the large datasets of real-world projects shows that AutoSC achieves 38.9--41.3% top-1 accuracy and 48.2--50.1% top-5 accuracy in statement completion. It also outperforms a state-of-the-art approach from 9X--69X in top-1 accuracy.
TL;DR: In this article, a type-based technique for detecting side-channel leaks is proposed, which leverages Datalog-based declarative analysis and domain-specific optimizations.
Abstract: The code generation modules inside modern compilers, which use a limited number of CPU registers to store a large number of program variables, may introduce side-channel leaks even in software equipped with state-of-the-art countermeasures. We propose a program analysis and transformation based method to eliminate such leaks. Our method has a type-based technique for detecting leaks, which leverages Datalog-based declarative analysis and domain-specific optimizations to achieve high efficiency and accuracy. It also has a mitigation technique for the compiler's backend, more specifically the register allocation modules, to ensure that leaky intermediate computation results are stored in different CPU registers or memory locations. We have implemented and evaluated our method in LLVM for the x86 instruction set architecture. Our experiments on cryptographic software show that the method is effective in removing the side channel while being efficient, i.e., our mitigated code is more compact and runs faster than code mitigated using state-of-the-art techniques.
TL;DR: The comprehensive evaluation of the tool SemCluster on benchmarks drawn from solutions to small programming assignments shows that it generates far fewer clusters, precisely identifies distinct solution strategies, and boosts the performance of clustering-based program repair, all within a reasonable amount of time.
Abstract: A fundamental challenge in automated reasoning about programming assignments at scale is clustering student submissions based on their underlying algorithms. State-of-the-art clustering techniques are sensitive to control structure variations, cannot cluster buggy solutions with similar correct solutions, and either require expensive pair-wise program analyses or training efforts. We propose a novel technique that can cluster small imperative programs based on their algorithmic essence: (A) how the input space is partitioned into equivalence classes and (B) how the problem is uniquely addressed within individual equivalence classes. We capture these algorithmic aspects as two quantitative semantic program features that are merged into a program's vector representation. Programs are then clustered using their vector representations. The computation of our first semantic feature leverages model counting to identify the number of inputs belonging to an input equivalence class. The computation of our second semantic feature abstracts the program's data flow by tracking the number of occurrences of a unique pair of consecutive values of a variable during its lifetime. The comprehensive evaluation of our tool SemCluster on benchmarks drawn from solutions to small programming assignments shows that SemCluster (1) generates far fewer clusters than other clustering techniques, (2) precisely identifies distinct solution strategies, and (3) boosts the performance of clustering-based program repair, all within a reasonable amount of time.
TL;DR: A probabilistic framework, Drake, to apply program analyses to continuously evolving programs is presented and it is shown that Drake is applicable to a broad range of analyses that are based on deductive reasoning.
Abstract: Programs often evolve by continuously integrating changes from multiple programmers. The effective adoption of program analysis tools in this continuous integration setting is hindered by the need to only report alarms relevant to a particular program change. We present a probabilistic framework, Drake, to apply program analyses to continuously evolving programs. Drake is applicable to a broad range of analyses that are based on deductive reasoning. The key insight underlying Drake is to compute a graph that concisely and precisely captures differences between the derivations of alarms produced by the given analysis on the program before and after the change. Performing Bayesian inference on the graph thereby enables to rank alarms by likelihood of relevance to the change. We evaluate Drake using Sparrow—a static analyzer that targets buffer-overrun, format-string, and integer-overflow errors—on a suite of ten widely-used C programs each comprising 13k–112k lines of code. Drake enables to discover all true bugs by inspecting only 30 alarms per benchmark on average, compared to 85 (3× more) alarms by the same ranking approach in batch mode, and 118 (4× more) alarms by a differential approach based on syntactic masking of alarms which also misses 4 of the 26 bugs overall.
TL;DR: This article presents an interprocedural and configuration-aware change impact analysis (CIA) approach for determining the possibly impacted source code elements when changing the source code of a product family.
Abstract: Understanding variability is essential to allow the configuration of software systems to diverse requirements. Variability-aware program analysis techniques have been proposed for analyzing the space of program variants. Such techniques are highly beneficial, e.g., to determine the potential impact of changes during maintenance. This article presents an interprocedural and configuration-aware change impact analysis (CIA) approach for determining the possibly impacted source code elements when changing the source code of a product family. The approach also supports engineers, who are adapting the code of specific product variants after an initial pre-configuration. The approach can be adapted to work with different variability mechanisms, it is more precise than existing CIA approaches, and it can be implemented using standard control flow and data flow analysis. We report evaluation results on the benefit and performance of the approach using industrial product lines.
TL;DR: A type-based technique for detecting leaks, which leverages Datalog-based declarative analysis and domain-specific optimizations to achieve high efficiency and accuracy and a mitigation technique for the compiler's backend, more specifically the register allocation modules, to ensure that leaky intermediate computation results are stored in different CPU registers or memory locations.
Abstract: The code generation modules inside modern compilers such as GCC and LLVM, which use a limited number of CPU registers to store a large number of program variables, may introduce side-channel leaks even in software equipped with state-of-the-art countermeasures. We propose a program analysis and transformation based method to eliminate this side channel. Our method has a type-based technique for detecting leaks, which leverages Datalog-based declarative analysis and domain-specific optimizations to achieve high efficiency and accuracy. It also has a mitigation technique for the compiler's backend, more specifically the register allocation modules, to ensure that potentially leaky intermediate computation results are always stored in different CPU registers or spilled to memory with isolation. We have implemented and evaluated our method in LLVM for the x86 instruction set architecture. Our experiments on cryptographic software show that the method is effective in removing the side channel while being efficient, i.e., our mitigated code is more compact and runs faster than code mitigated using state-of-the-art techniques.
TL;DR: This work design and implement their own general-purpose Datalog engine, called RecStep, on top of a parallel single-node relational system, and demonstrates that it generally out-performs state-of-the-art parallel Datalogs engines on complex and large-scale Datalogg evaluation, by a 4-6X margin.
Abstract: Recursive query processing has experienced a recent resurgence, as a result of its use in many modern application domains, including data integration, graph analytics, security, program analysis, networking and decision making. Due to the large volumes of data being processed, several research efforts across multiple communities have explored how to scale up recursive queries, typically expressed in Datalog. Our experience with these tools indicate that their performance does not translate across domains---e.g., a tool designed for large-scale graph analytics does not exhibit the same performance on program-analysis tasks, and vice versa.Starting from the above observation, we make the following two contributions. First, we perform a detailed experimental evaluation comparing a number of state-of-the-art Datalog systems on a wide spectrum of graph analytics and program-analysis tasks, and summarize the pros and cons of existing techniques. Second, we design and implement our own general-purpose Datalog engine, called RecStep, on top of a parallel single-node relational system. We outline the techniques we applied on RecStep, as well as the contribution of each technique to the overall performance. Using RecStep as a baseline, we demonstrate that it generally out-performs state-of-the-art parallel Datalog engines on complex and large-scale Datalog evaluation, by a 4-6X margin. An additional insight from our work is that it is possible to build a high-performance Datalog system on top of a relational engine, an idea that has been dismissed in past work.
TL;DR: A new machine-learning algorithm with disjunctive model for data-driven program analysis as well as a learning algorithm to find the model parameters that is able to express nonlinear combinations of program properties.
Abstract: We present a new machine-learning algorithm with disjunctive model for data-driven program analysis. One major challenge in static program analysis is a substantial amount of manual effort required for tuning the analysis performance. Recently, data-driven program analysis has emerged to address this challenge by automatically adjusting the analysis based on data through a learning algorithm. Although this new approach has proven promising for various program analysis tasks, its effectiveness has been limited due to simple-minded learning models and algorithms that are unable to capture sophisticated, in particular disjunctive, program properties. To overcome this shortcoming, this article presents a new disjunctive model for data-driven program analysis as well as a learning algorithm to find the model parameters. Our model uses Boolean formulas over atomic features and therefore is able to express nonlinear combinations of program properties. A key technical challenge is to efficiently determine a set of good Boolean formulas, as brute-force search would simply be impractical. We present a stepwise and greedy algorithm that efficiently learns Boolean formulas. We show the effectiveness and generality of our algorithm with two static analyzers: context-sensitive points-to analysis for Java and flow-sensitive interval analysis for C. Experimental results show that our automated technique significantly improves the performance of the state-of-the-art techniques including ones hand-crafted by human experts.
TL;DR: In a preliminary experiment on a neural model recently proposed in the literature, it is found that the model is very brittle, and simple perturbations in the input can cause the model to make mistakes in its prediction.
Abstract: Deep neural networks have been increasingly used in software engineering and program analysis tasks. They usually take a program and make some predictions about it, e.g., bug prediction. We call these models neural program analyzers. The reliability of neural programs can impact the reliability of the encompassing analyses. In this paper, we describe our ongoing efforts to develop effective techniques for testing neural programs. We discuss the challenges involved in developing such tools and our future plans. In our preliminary experiment on a neural model recently proposed in the literature, we found that the model is very brittle, and simple perturbations in the input can cause the model to make mistakes in its prediction.
TL;DR: The experimental results with 18 real-world programs show that the algorithm can learn a good controller and the analysis with this controller meets the constraint and utilizes available memory effectively.
Abstract: We present a new technique for developing a resource-aware program analysis. Such an analysis is aware of constraints on available physical resources, such as memory size, tracks its resource use, and adjusts its behaviors during fixpoint computation in order to meet the constraint and achieve high precision. Our resource-aware analysis adjusts behaviors by coarsening program abstraction, which usually makes the analysis consume less memory and time until completion. It does so multiple times during the analysis, under the direction of what we call a controller. The controller constantly intervenes in the fixpoint computation of the analysis and decides how much the analysis should coarsen the abstraction. We present an algorithm for learning a good controller automatically from benchmark programs. We applied our technique to a static analysis for C programs, where we control the degree of flow-sensitivity to meet a constraint on peak memory consumption. The experimental results with 18 real-world programs show that our algorithm can learn a good controller and the analysis with this controller meets the constraint and utilizes available memory effectively.
TL;DR: The review covers eight computer security problems being solved by applications of Deep Learning: security-oriented program analysis, defending return-oriented programming (ROP) attacks, achieving control-flow integrity (CFI), defending network attacks, malware classification, system-event-based anomaly detection, memory forensics, and fuzzing for software security.
Abstract: Although using machine learning techniques to solve computer security challenges is not a new idea, the rapidly emerging Deep Learning technology has recently triggered a substantial amount of interests in the computer security community. This paper seeks to provide a dedicated review of the very recent research works on using Deep Learning techniques to solve computer security challenges. In particular, the review covers eight computer security problems being solved by applications of Deep Learning: security-oriented program analysis, defending return-oriented programming (ROP) attacks, achieving control-flow integrity (CFI), defending network attacks, malware classification, system-event-based anomaly detection, memory forensics, and fuzzing for software security.
TL;DR: This article categorizes locality definitions in three groups and shows whether and how they can be interconverted, and gives a new measurement algorithm that is asymptotically more time/space efficient than previous approaches.
Abstract: In many areas of program and system analysis and optimization, locality is a common concept and has been defined and measured in many ways. This article aims to formally establish relations between these previously disparate types of locality. It categorizes locality definitions in three groups and shows whether and how they can be interconverted. For the footprint, a recent metric, it gives a new measurement algorithm that is asymptotically more time/space efficient than previous approaches. Using the conversion relations, the new algorithm derives with the same efficiency different locality metrics developed and used in program analysis, memory management, and cache design.
TL;DR: This work identifies several classes of simplification techniques that are critical for the efficient processing of string constraints in SMT solvers and provides experimental evidence that implementing them results in significant improvements over the performance of state-of-the-art SMTsolvers for extended string constraints.
Abstract: Satisfiability Modulo Theories (SMT) solvers with support for the theory of strings have recently emerged as powerful tools for reasoning about string-manipulating programs. However, due to the complex semantics of extended string functions, it is challenging to develop scalable solvers for the string constraints produced by program analysis tools. We identify several classes of simplification techniques that are critical for the efficient processing of string constraints in SMT solvers. These techniques can reduce the size and complexity of input constraints by reasoning about arithmetic entailment, multisets, and string containment relationships over input terms. We provide experimental evidence that implementing them results in significant improvements over the performance of state-of-the-art SMT solvers for extended string constraints.
TL;DR: The experiments show that the proposed approach is effective to locate vulnerabilities on 24 out of 25 DARPA’s CGC programs, and can effectively classifies 453 program crashes into 19 groups based on their root causes.
Abstract: Locating vulnerabilities is an important task for security auditing, exploit writing, and code hardening. However, it is challenging to locate vulnerabilities in binary code, because most program semantics (e.g., boundaries of an array) is missing after compilation. Without program semantics, it is difficult to determine whether a memory access exceeds its valid boundaries in binary code. In this work, we propose an approach to locate vulnerabilities based on memory layout recovery. First, we collect a set of passed executions and one failed execution. Then, for passed and failed executions, we restore their program semantics by recovering fine-grained memory layouts based on the memory addressing model. With the memory layouts recovered in passed executions as reference, we can locate vulnerabilities in failed execution by memory layout identification and comparison. Our experiments show that the proposed approach is effective to locate vulnerabilities on 24 out of 25 DARPA’s CGC programs (96%), and can effectively classifies 453 program crashes (in 5 Linux programs) into 19 groups based on their root causes.
TL;DR: This chapter reviews the cutting-edge research accomplishments in addressing the scalability challenges such as constraint solving and path explosion, as well as advances in applying symbolic execution in testing, security, and probabilistic program analysis.
Abstract: Symbolic execution is a systematic technique for checking programs, which forms a basis for various software testing and verification techniques. It provides a powerful analysis in principle but remains challenging to scale and generalize symbolic execution in practice. This chapter reviews the cutting-edge research accomplishments in addressing these challenges in the last 5 years, including advances in addressing the scalability challenges such as constraint solving and path explosion, as well as advances in applying symbolic execution in testing, security, and probabilistic program analysis.
TL;DR: This paper analyses one of the most fundamental and versatile variational inference algorithms, called score estimator or REINFORCE, using tools from denotational semantics and program analysis, and formally expresses what this algorithm does on models denoted by programs, and exposes implicit assumptions made by the algorithm on the models.
Abstract: Probabilistic programming is the idea of writing models from statistics and machine learning using program notations and reasoning about these models using generic inference engines. Recently its combination with deep learning has been explored intensely, leading to the development of deep probabilistic programming languages such as Pyro. At the core of this development lie inference engines based on stochastic variational inference algorithms. When asked to find information about the posterior distribution of a model written in such a language, these algorithms convert this posterior-inference query into an optimisation problem and solve it approximately by gradient ascent. In this paper, we analyse one of the most fundamental and versatile variational inference algorithms, called score estimator, using tools from denotational semantics and program analysis. We formally express what this algorithm does on models denoted by programs, and expose implicit assumptions made by the algorithm. The violation of these assumptions may lead to an undefined optimisation objective or the loss of convergence guarantee of the optimisation process. We then describe rules for proving these assumptions, which can be automated by static program analyses. Some of our rules use nontrivial facts from continuous mathematics, and let us replace requirements about integrals in the assumptions, by conditions involving differentiation or boundedness, which are much easier to prove automatically. Following our general methodology, we have developed a static program analysis for Pyro that aims at discharging the assumption about what we call model-guide support match. Applied to the eight representative model-guide pairs from the Pyro webpage, our analysis finds a bug in one of these cases, reveals a non-standard use of an inference engine in another, and shows the assumptions are met in the remaining cases.
TL;DR: It is shown that the proposed VM-based obfuscator not only renders the frequency analysis attack ineffective, but dictates the execution and program analysis to follow the original control flow of the program, making state-of-the-art backward tainting and slicing ineffective.
Abstract: VM-based software obfuscation has emerged as an effective technique for program obfuscation. Despite various attempts in improving its effectiveness and security, existing VM-based software obfuscators use potentially multiple but static secret mappings between virtual and native opcodes to hide the underlying instructions. In this paper, we present an attack using frequency analysis to effectively recover the secret mapping to compromise the protection, and then propose a novel VM-based obfuscator in which each basic block uses a dynamic and control-flow-aware mapping between the virtual and native instructions. We show that our proposed VM-based obfuscator not only renders the frequency analysis attack ineffective, but dictates the execution and program analysis to follow the original control flow of the program, making state-of-the-art backward tainting and slicing ineffective. We implement a prototype of our VM-based obfuscator and show its effectiveness with experiments on SPEC benchmarking and other real-world applications.
TL;DR: It is proved that the solution to the linear program corresponding to the relaxation of OPA is integral, and hence is also a solution to OPA, which yields the first efficient protocol mixer.
Abstract: Multi-party computation (MPC) protocols have been extensively optimized in an effort to bring this technology to practice, which has already started bearing fruits. The choice of which MPC protocol to use depends on the computation we are trying to perform. Protocol mixing is an effective black-box ---with respect to the MPC protocols---approach to optimize performance. Despite, however, considerable progress in the recent years existing works are heuristic and either give no guarantee or require an exponential (brute-force) search to find the optimal assignment, a problem which was conjectured to be NP hard. We provide a theoretically founded approach to optimal (MPC) protocol assignment, i.e., optimal mixing, and prove that under mild and natural assumptions, the problem is tractable both in theory and in practice for computing best two-out-of-three combinations. Concretely, for the case of two protocols, we utilize program analysis techniques---which we tailor to MPC---to define a new integer program, which we term the Optimal Protocol Assignment (in short, OPA) problem whose solution is the optimal (mixed) protocol assignment for these two protocols. Most importantly, we prove that the solution to the linear program corresponding to the relaxation of OPA is integral, and hence is also a solution to OPA. Since linear programming can be efficiently solved, this yields the first efficient protocol mixer. We showcase the quality of our OPA solver by applying it to standard benchmarks from the mixing literature. Our OPA solver can be applied on any two-out-of-three protocol combinations to obtain a best two-out-of-three protocol assignment.