Top 47 papers published in the topic of Program analysis in 2022

Showing papers on "Program analysis" published in 2022

Partial (In)Completeness in abstract interpretation: limiting the imprecision in program analysis

Marco Campion, Mila Dalla Preda, Roberto Giacobazzi

11 Jan 2022-Proceedings of the ACM on programming languages

TL;DR: This paper introduces a theory for estimating error propagation in abstract interpretation, and hence in program analysis, together with a proof system for estimating an upper bound on the error accumulated by the abstract interpreter during program analysis.

Abstract: Imprecision is inherent in any decidable (sound) approximation of undecidable program properties. In abstract interpretation this corresponds to the release of false alarms, e.g., when it is used for program analysis and program verification. As all alarming systems, a program analysis tool is credible when few false alarms are reported. As a consequence, we have to live together with false alarms, but also we need methods to control them. As for all approximation methods, also for abstract interpretation we need to estimate the accumulated imprecision during program analysis. In this paper we introduce a theory for estimating the error propagation in abstract interpretation, and hence in program analysis. We enrich abstract domains with a weakening of a metric distance. This enriched structure keeps coherence between the standard partial order relating approximated objects by their relative precision and the effective error made in this approximation. An abstract interpretation is precise when it is complete. We introduce the notion of partial completeness as a weakening of precision. In partial completeness the abstract interpreter may produce a bounded number of false alarms. We prove the key recursive properties of the class of programs for which an abstract interpreter is partially complete with a given bound of imprecision. Then, we introduce a proof system for estimating an upper bound of the error accumulated by the abstract interpreter during program analysis. Our framework is general enough to be instantiated to most known metrics for abstract domains.

17 citations

Proceedings Article•10.1145/3524610.3527900•

Exploring GNN Based Program Embedding Technologies for Binary Related Tasks

Yixin Guo, Pengcheng Li, Yingwei Luo, Xiaoli Wang, Zhenlin Wang

1 May 2022

TL;DR: This work proposes a new program analysis approach that aims at solving program-level and procedure-level tasks with one model, by taking advantage of the great power of graph neural networks from the level of binary code, and can effectively work around emerging compilation-related problems.

Abstract: With the rapid growth of program scale, program analysis, maintenance and optimization become increasingly diverse and complex. Applying learning-assisted methodologies to program analysis has attracted ever-increasing attention. However, a large number of program factors including syntax structures, semantics, running platforms and compilation configurations block the effective realization of these methods. To overcome these obstacles, existing works prefer to build on source code or abstract syntax trees, but unfortunately are sub-optimal for binary-oriented analysis tasks closely related to the compilation process. To this end, we propose a new program analysis approach that aims at solving program-level and procedure-level tasks with one model, by taking advantage of the great power of graph neural networks from the level of binary code. By fusing the semantics of control flow graphs, data flow graphs and call graphs into one model, and embedding instructions and values simultaneously, our method can effectively work around emerging compilation-related problems. Testing the proposed method on two tasks, binary similarity detection and dead store prediction, shows that it achieves accuracies as high as 83.25% and 82.77%.

12 citations

Journal Article•10.1016/J.SCICO.2021.102727•

Fast rule-based graph programs

Graham Campbell¹, Brian Courtehoute², Detlef Plump²

Newcastle University¹, University of York²

01 Feb 2022-Science of Computer Programming

TL;DR: In this article, the authors present a number of linear-time implementations of graph algorithms in GP 2, an experimental programming language based on graph transformation rules which aims to facilitate program analysis and verification.

8 citations

Journal Article•10.31799/1684-8853-2022-1-30-43•

Kex: A Platform For Analysis Of JVM Programs

Azat Abdullin, Vladimir Itsykson

02 Mar 2022-Informatsionno-upravliaiushchie sistemy

TL;DR: This paper presents Kex, a platform for building program analysis tools for JVM bytecode that provides three abstraction levels; two prototypes show that Kex can be used to implement competitive and powerful program analysis tools.

Abstract: Introduction: Over the last years program analysis methods were widely used for software quality assurance. Different types of program analysis require various levels of program representation, analysis methods, etc. Platforms that provide utilities to implement different types of analysis on their basis become very important because they allow one to simplify the process of development. Purpose: Development of a platform for analysis of JVM programs. Results: In this paper we present Kex, a platform for building program analysis tools for JVM bytecode. Kex provides three abstraction levels. First is Kfg, which is an SSA-based control flow graph representation for bytecode-level analysis and transformation. Second is a symbolic program representation called Predicate State, which consists of first order logic predicates that represent instructions of the original program, constraints, etc. The final level is SMT integration layer for constraint solving. It currently provides an interface for interacting with three SMT solvers. Practical relevance: We have evaluated our platform by considering two prototypes. First prototype is an automatic test generation tool that has participated in SBST 2021 tool competition. Second prototype is a tool for detection of automatic library integration errors. Both prototypes have proved that Kex can be used to implement competitive and powerful program analysis tools.

7 citations

Book Chapter•10.1007/978-3-031-08679-3_9•

Traits: Correctness-by-Construction for Free

Tobias Runge, Alex Potanin, Thomas Thüm, Ina Schaefer

1 Jan 2022

5 citations

Proceedings Article•10.1145/3511430.3511451•

IR Mapping: Intermediate Representation (IR) based Mapping to facilitate Incremental Static Analysis

Vaidehi Ghime, Ankita Khadsare, Anushri Jana, Bharti Chimdyalwar

24 Feb 2022

TL;DR: In this article, the authors present an accurate and efficient approach for IR mapping, which uses a one-to-one correspondence between IDs of unchanged IR objects for incremental analysis of programs.

Abstract: An Intermediate Representation (IR) is a data structure to represent a program. It represents each program entity as an object (IR object), having a unique identification number (ID). Static program analysis tools perform analysis on IRs of the input program and compute analysis information at program points, which are represented as IR objects. The analysis information is stored against their corresponding ID. Performing incremental analysis on evolving systems involves the reuse of analysis information for the unchanged IR objects between versions of a program. However, the IDs change from one version to the next when the program changes. This acts as an obstacle to the reuse of analysis information. To overcome this, a one-to-one correspondence between IDs of unchanged IR objects is necessary. We term this correspondence IR mapping. This paper presents an accurate and efficient approach for IR mapping. We formally proved the correctness of our IR mapping technique. We evaluated the time consumption of our technique on versions of a core banking application. We found that our approach consumes on average 5.7% of the total time taken by incremental analysis of programs ranging from 9K to 87K LoC.
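
The correspondence described in this abstract can be pictured with a small sketch. It assumes, purely for illustration, that every IR object carries a textual signature and that unchanged objects keep their relative order between versions. The function `map_ir_ids`, the signature strings, and the use of Python's `difflib` for alignment are assumptions of this sketch and not the paper's technique.

```python
from difflib import SequenceMatcher


def map_ir_ids(old, new):
    """Return {old_id: new_id} for IR objects that appear unchanged.

    `old` and `new` are lists of (id, signature) pairs in program order.
    The signature format is hypothetical; any hashable value works.
    """
    old_sigs = [sig for _, sig in old]
    new_sigs = [sig for _, sig in new]
    matcher = SequenceMatcher(None, old_sigs, new_sigs, autojunk=False)
    mapping = {}
    # Matching blocks are non-overlapping and ordered, so every ID is
    # used at most once on each side: the mapping is one-to-one.
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            mapping[old[block.a + k][0]] = new[block.b + k][0]
    return mapping


# Version 2 inserts one statement, which shifts the IDs that follow it.
old_ir = [(1, "x = 0"), (2, "y = x + 1"), (3, "return y")]
new_ir = [(1, "x = 0"), (2, "log(x)"), (3, "y = x + 1"), (4, "return y")]
id_map = map_ir_ids(old_ir, new_ir)

# Analysis facts stored against old IDs can be carried over to the new IDs.
old_facts = {2: "y is constant 1", 3: "returns 1"}
reused = {id_map[i]: fact for i, fact in old_facts.items() if i in id_map}
print(id_map, reused)
```

In this toy setting only the inserted object (new ID 2) has no counterpart, so an incremental analysis would need to compute fresh information for it alone.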

4 citations

Journal Article•10.1007/s00224-022-10093-w•

A Framework for Memory Efficient Context-Sensitive Program Analysis

Mathias Hedenborg, Jonas Lundberg, Welf Löwe, Martin Trapp

18 Jul 2022-Theory of computing systems

TL;DR: In this article, the authors propose χ-terms as a means to capture and manipulate context-sensitive program information in a data-flow analysis; χ-terms are implemented as directed acyclic graphs without any redundant subgraphs.

Abstract: Static program analysis is in general more precise if it is sensitive to execution contexts (execution paths). But then it is also more expensive in terms of memory consumption. For languages with conditions and iterations, the number of contexts grows exponentially with the program size. This problem is not just a theoretical issue. Several papers evaluating inter-procedural context-sensitive data-flow analysis report severe memory problems, and the path-explosion problem is a major issue in program verification and model checking. In this paper we propose χ-terms as a means to capture and manipulate context-sensitive program information in a data-flow analysis. χ-terms are implemented as directed acyclic graphs without any redundant subgraphs. We introduce the k-approximation and the l-loop-approximation that limit the size of the context-sensitive information at the cost of analysis precision. We prove that every context-insensitive data-flow analysis has a corresponding k,l-approximated context-sensitive analysis, and that these analyses are sound and guaranteed to reach a fixed point. We also present detailed algorithms outlining a compact, redundancy-free, and DAG-based implementation of χ-terms.
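
One common way to obtain a DAG "without any redundant subgraphs" is hash-consing, where structurally equal terms are created only once and then shared. The sketch below shows that general idea only. The class `TermFactory`, its methods `leaf` and `chi`, and the block labels are hypothetical names, and the paper's own algorithms may differ.

```python
class TermFactory:
    """Hash-consing factory: structurally equal terms share one node."""

    def __init__(self):
        self._table = {}  # structural key -> node id
        self._nodes = []  # node id -> structural key

    def _intern(self, key):
        # Create a node only if no structurally equal node exists yet.
        if key not in self._table:
            self._table[key] = len(self._nodes)
            self._nodes.append(key)
        return self._table[key]

    def leaf(self, value):
        """An analysis value that does not depend on the context."""
        return self._intern(("leaf", value))

    def chi(self, block, *children):
        """A term selecting among `children` depending on how `block` was reached."""
        # Children are node ids, so equal subterms collapse automatically.
        return self._intern(("chi", block, children))

    def size(self):
        return len(self._nodes)


factory = TermFactory()
t1 = factory.chi("b1", factory.leaf(1), factory.leaf(2))
t2 = factory.chi("b1", factory.leaf(1), factory.leaf(2))  # same structure as t1
t3 = factory.chi("b2", t1, t2)                            # shares the t1 node twice
print(t1 == t2, factory.size())
```

Because children are referred to by node id, a term that reuses a subterm many times costs one node per distinct subterm rather than one node per occurrence.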

4 citations

Book Chapter•10.1007/978-3-030-94583-1_10•

A Flow-Insensitive-Complete Program Representation

Solène Mirliaz, David Pichardie

1 Jan 2022

TL;DR: In this article, the notion of flow-insensitive-completeness is formalized with two collecting semantics, and a program transformation is provided that makes it possible to analyze a program in a flow-insensitive manner without sacrificing the precision obtainable with a flow-sensitive approach.

Abstract: When designing a static analysis, choosing between a flow-insensitive or a flow-sensitive analysis often amounts to favoring scalability over precision. It is well known that specific program representations can help to reconcile the two objectives at the same time. For example, the SSA representation is used in modern compilers to perform a constant propagation analysis flow-insensitively without any loss of precision. This paper proposes a provably correct program transformation that reconciles them for any analysis. We formalize the notion of Flow-Insensitive-Completeness with two collecting semantics and provide a program transformation that makes it possible to analyze a program in a flow-insensitive manner without sacrificing the precision we could obtain with a flow-sensitive approach.
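
The SSA example mentioned in this abstract can be made concrete with a small sketch. It treats an SSA program as an unordered set of definitions and iterates to a fixed point while ignoring statement order entirely. The encoding of definitions as tuples and the names `constant_propagation`, `UNDEF`, and `NAC` are assumptions of this illustration, and it does not reproduce the paper's transformation.

```python
UNDEF, NAC = "undef", "nac"  # bottom and top ("not a constant") of the lattice


def join(a, b):
    """Least upper bound in the flat constant lattice."""
    if a == UNDEF:
        return b
    if b == UNDEF:
        return a
    return a if a == b else NAC


def value_of(operand, env):
    # Operands are either SSA variable names (str) or integer literals.
    return env[operand] if isinstance(operand, str) else operand


def evaluate(defn, env):
    kind = defn[0]
    if kind == "const":
        return defn[1]
    if kind == "input":
        return NAC
    args = [value_of(op, env) for op in defn[1:]]
    if kind == "phi":
        result = UNDEF
        for arg in args:
            result = join(result, arg)
        return result
    if kind == "add":
        if NAC in args:
            return NAC
        if UNDEF in args:
            return UNDEF
        return args[0] + args[1]
    raise ValueError(f"unknown definition kind: {kind}")


def constant_propagation(defs):
    """Flow-insensitive fixed point over {ssa_var: definition}."""
    env = {var: UNDEF for var in defs}
    changed = True
    while changed:
        changed = False
        for var, defn in defs.items():
            new = join(env[var], evaluate(defn, env))
            if new != env[var]:
                env[var] = new
                changed = True
    return env


# Source program: x = 1; y = x + 1; x = 2; z = x + 1; w = phi(y, z)
# In SSA, each assignment to x gets its own name, so no order is needed.
ssa = {
    "x1": ("const", 1),
    "y1": ("add", "x1", 1),
    "x2": ("const", 2),
    "z1": ("add", "x2", 1),
    "w1": ("phi", "y1", "z1"),
}
result = constant_propagation(ssa)
print(result)
```

Without SSA renaming, a flow-insensitive analysis would see a single `x` assigned both 1 and 2 and would lose the constants for `y` and `z`; with the renaming, `y1` and `z1` are resolved to 2 and 3.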

4 citations

Proceedings Article•10.1109/icse-nier55298.2022.9793535•

Statistical Reasoning About Programs

Marcel Böhme

1 May 2022

TL;DR: The advent of a new program analysis paradigm that allows anyone to make precise statements about the behavior of programs as they run in production across hundreds and millions of machines or devices is discussed.

Abstract: We discuss the advent of a new program analysis paradigm that allows anyone to make precise statements about the behavior of programs as they run in production across hundreds and millions of machines or devices. The scale-oblivious, in vivo program analysis leverages an almost inconceivable rate of user-generated program executions across large fleets to analyze programs of arbitrary size and composition with negligible performance overhead. In this paper, we reflect on the program analysis problem, the prevalent paradigm, and the practical reality of program analysis at large software companies. We illustrate the new paradigm using several success stories and suggest a number of exciting new research directions.

3 citations

Proceedings Article•10.1109/ivmem57067.2022.9983965•

Strong Optimistic Solving for Dynamic Symbolic Execution

23 Sep 2022

TL;DR: The authors propose a strong optimistic solving method, implemented in the dynamic symbolic execution tool Sydr, that eliminates irrelevant path predicate constraints for target branch inversion and separately handles symbolic branches that have nested control transfer instructions that pass control beyond the parent branch scope, e.g. return, goto, break, etc.

Abstract: Dynamic symbolic execution (DSE) is an effective method for automated program testing and bug detection. It increases code coverage by exploring complex branches during hybrid fuzzing. DSE tools invert the branches along some execution path and help the fuzzer examine previously unavailable program parts. DSE often faces over- and underconstraint problems. The first one leads to significant analysis complication while the second one causes inaccurate symbolic execution. We propose a strong optimistic solving method that eliminates irrelevant path predicate constraints for target branch inversion. We eliminate the symbolic constraints that the target branch is not control dependent on. Moreover, we separately handle symbolic branches that have nested control transfer instructions that pass control beyond the parent branch scope, e.g. return, goto, break, etc. We implement the proposed method in our dynamic symbolic execution tool Sydr. We evaluate the strong optimistic strategy, the optimistic strategy that contains only the last constraint negation, and their combination. The results show that combining the strategies helps increase either the code coverage or the average number of correctly inverted branches per minute. It is optimal to apply both strategies together in contrast with other configurations.

2 citations

Proceedings Article•10.1145/3477314.3507239•

Capturing program models with BISM

Chukri Soueidi, Yliès Falcone

25 Apr 2022

TL;DR: An extension of the Java bytecode instrumentation tool BISM is presented that captures and prepares a model abstracting the program behavior at the intra-procedural level; the resulting model lets users write static analyzers and combine static and runtime verification.

Abstract: In this paper, we present an extension of the Java bytecode instrumentation tool BISM that captures and prepares a model that abstracts the program behavior at the intra-procedural level. We analyze program methods we are interested in monitoring and construct a control-flow graph automaton where the states represent actions of the program that produce events. Directed towards monitoring general behavioral properties at runtime, the resulting model is presented for the users to write static analyzers and combine both static and runtime verification.

Posted Content•10.48550/arxiv.2208.00337•

Tai-e: A Static Analysis Framework for Java by Harnessing the Best Designs of Classics

30 Jul 2022

TL;DR: Tai-e is a new static analysis framework for Java that provides a series of fundamental services such as program abstraction, control flow graph construction, and points-to/alias information computation.

Abstract: Static analysis is a mature field with applications to bug detection, security analysis, and code optimization, etc. To facilitate these applications, static analysis frameworks play an essential role by providing a series of fundamental services such as program abstraction, control flow graph construction, and points-to/alias information computation, etc. However, despite impressive progress of static analysis, and this field has seen several popular frameworks in the last decades, it is still not clear how a static analysis framework should be designed in a way that analysis developers could benefit more: for example, what a good IR (for analysis) ought to look like? What functionalities should the module of fundamental analyses provide to ease client analyses? How to develop and integrate new analysis conveniently? How to manage multiple analyses? To answer these questions, in this work, we discuss the design trade-offs for the crucial components of a static analysis framework, and argue for the most appropriate design by following the HBDC (Harnessing the Best Designs of Classics) principle: for each crucial component, we compare the design choices made for it (possibly) by different classic frameworks such as Soot, WALA, SpotBugs and Doop, and choose arguably the best one, but if none is good enough, we then propose a better design. These selected or newly proposed designs finally constitute Tai-e, a new static analysis framework for Java. Specifically, Tai-e is novel in the designs of several aspects like IR, pointer analysis and development of new analyses, etc., leading to an easy-to-learn, easy-to-use and efficient system. To our knowledge, this is the first work that systematically explores the designs and implementations of various static analysis frameworks, and we believe it provides useful materials and viewpoints for building better static analysis infrastructures.

Posted Content•10.48550/arxiv.2209.10445•

Interactive Abstract Interpretation: Reanalyzing Whole Programs for Cheap

21 Sep 2022

TL;DR: In this article, the authors propose a framework for interactive abstract interpretation that incrementalizes the analysis infrastructure, including postprocessing, without requiring any modifications to the analysis specifications themselves, and that uses lazy invalidation for analysis results affected by program changes.

Abstract: To put static program analysis at the fingertips of the software developer, we propose a framework for interactive abstract interpretation. While providing sound analysis results, abstract interpretation in general can be quite costly. To achieve quick response times, we incrementalize the analysis infrastructure, including postprocessing, without necessitating any modifications to the analysis specifications themselves. We rely on the local generic fixpoint engine TD, which dynamically tracks dependencies, while exploring the unknowns contributing to answering an initial query. Lazy invalidation is employed for analysis results affected by program change. Dedicated improvements support the incremental analysis of concurrency deficiencies such as data-races. The framework has been implemented for multithreaded C within the static analyzer Goblint, using MagpieBridge to relay findings to IDEs. We evaluate our implementation w.r.t. the yard sticks of response time and consistency: formerly proven invariants should be retained - when they are not affected by the change. The results indicate that with our approach, a reanalysis after small changes only takes a fraction of from-scratch analysis time, while most of the precision is retained. We also provide examples of program development highlighting the usability of the overall approach.

Proceedings Article•10.1145/3510003.3510139•

Hiding Critical Program Components via Ambiguous Translation

Chi-Gon Jung, Doowon Kim, An Chen, Weihang Wang, Yunhui Zheng, Kyu Hyung Lee, Yonghwi Kwon

1 May 2022

TL;DR: The evaluation results show that static, dynamic and symbolic analysis techniques fail to identify the hidden information in Ambitr, and it is demonstrated that manual analysis of Ambitr is extremely challenging.

Abstract: Software systems may contain critical program components such as patented program logic or sensitive data. When those components are reverse-engineered by adversaries, it can cause significant damage (e.g., financial loss or operational failures). While protecting critical program components (e.g., code or data) in software systems is of utmost importance, existing approaches, unfortunately, have two major weaknesses: (1) they can be reverse-engineered via various program analysis techniques and (2) when an adversary obtains a legitimate-looking critical program component, he or she can be sure that it is genuine. In this paper, we propose Ambitr, a novel technique that hides critical program components. The core of Ambitr is the Ambiguous Translator, which can generate the critical program components when the input is a correct secret key. The translator is ambiguous as it can accept any input and produce a number of legitimate-looking outputs, making it difficult to know whether an input is the correct secret key or not. The executions of the translator when it processes the correct secret key and other inputs are also indistinguishable, making the analysis inconclusive. Our evaluation results show that static, dynamic and symbolic analysis techniques fail to identify the hidden information in Ambitr. We also demonstrate that manual analysis of Ambitr is extremely challenging.

Journal Article•10.1109/tse.2020.2999534•

Explaining Static Analysis With Rule Graphs

01 Feb 2022-IEEE Transactions on Software Engineering

TL;DR: In this article, the authors introduce the concept of rule graphs that expose to the developer selected information about the internal rules of data-flow analyses, which can help developers understand how the underlying analyses interpret the analyzed code and their reasoning for reporting certain warnings.

Abstract: As static data-flow analysis becomes able to report increasingly complex bugs, using an ever-growing set of complex internal rules encoded into flow functions, the analysis tools themselves grow more and more complex. As a result, for users to be able to effectively use those tools on specific codebases, they require special configurations, a task which in industry is typically performed by individual developers or dedicated teams. To efficiently use and configure static analysis tools, developers need to build a certain understanding of the analysis’ rules, i.e., how the underlying analyses interpret the analyzed code and their reasoning for reporting certain warnings. In this article, we explore how to assist developers in understanding the analysis’ warnings, and finding weaknesses in the analysis’ rules. To this end, we introduce the concept of rule graphs that expose to the developer selected information about the internal rules of data-flow analyses. We have implemented rule graphs on top of a taint analysis, and show how the graphs can support the above-mentioned tasks. Our user study and empirical evaluation show that using rule graphs helps developers understand analysis warnings more accurately than using simple warning traces, and that rule graphs can help developers identify causes for false positives in analysis rules.

Proceedings Article•10.1145/3567512.3567525•

Property Probes: Source Code Based Exploration of Program Analysis Results

Anton Risberg Alaküla, Görel Hedin, Niklas Fors, Adrian Pop

29 Nov 2022

TL;DR: In this paper, a node locator data structure is introduced that maps between source code spans and program representation nodes, and helps identify probed nodes in a robust way after modifications to the source code.

Abstract: We present property probes, a mechanism for helping a developer interactively explore partial program analysis results in terms of the source program, and as the program is edited. A node locator data structure is introduced that maps between source code spans and program representation nodes, and that helps identify probed nodes in a robust way, after modifications to the source code. We have developed a client-server based tool supporting property probes, and argue that it is very helpful in debugging and understanding program analyses. We have evaluated our tool on several languages and analyses, including a full Java compiler and a tool for intraprocedural dataflow analysis. Our performance results show that the probe overhead is negligible even when analyzing large projects.

Proceedings Article•10.1109/ispras57371.2022.10076859•

Devirtualization for static analysis with low level intermediate representation

1 Dec 2022

TL;DR: In this article, the authors propose a points-to analysis that can recover targets for function pointer calls, virtual calls and method calls for use in static analysis; the analysis results are intended for flow- and path-sensitive analysis.

Abstract: We propose a points-to analysis that can recover targets for function pointer calls, virtual calls and method calls for use in static analysis. We use a flow-insensitive analysis, and the analysis results are intended for flow- and path-sensitive analysis which can improve the initial analysis precision within a single function. We implemented the proposed approach in a static analyzer for finding errors in C, C++, Go, Java and Kotlin programs. The devirtualization algorithm is fast, taking less than 6% of the total analysis time. It can work for projects like Tizen 7 with 27.5 MLoC of source code.

Journal Article•10.1016/j.scico.2022.102845•

Special issue on Application-oriented aspects of graphs and graph transformation (ICGT 2020)

Timo Kehrer, Fabio Gadducci

01 Sep 2022-Science of Computer Programming

TL;DR: SPLNum2Analyzer analyzes #if-annotated C programs using a lifted domain, which is parametric in the choice of the domains for representing linear constraints and leaf nodes.

Proceedings Article•10.1109/mascots56607.2022.00013•

Cross-Level Characterization of Program Execution

1 Oct 2022

TL;DR: In this paper, a cross-level characterization approach is presented for understanding the behavior of program execution at different levels in the process of writing, compiling, and running a program; it provides a richer view of the sources of performance gains and losses and helps identify program execution in a more accurate manner.

Abstract: Characterization of program execution plays a key role in performance improvement. There are numerous transformations applied to each step a program takes on its lowering from source code to a compiler intermediate representation to machine language to microarchitecture-specific execution. The unpredictable benefit of each transformation step could lead a notionally superior algorithm to exhibit inferior performance once actually run, and it can be unclear which step in the transformation path contradicted the code developer's assumptions. However, conventional approaches to program execution characterization consider the behavior after only a single one of those steps, which limits the information that can be provided to the user. To help address the issue of myopic views of program execution, this paper presents a novel cross-level characterization approach for understanding the behavior of program execution at different levels in the process of writing, compiling, and running a program. We show that this approach provides a richer view of the sources of performance gains and losses and helps identify program execution in a more accurate manner.

Journal Article•10.13053/cys-26-2-3887•

Test Case Generation using Symbolic Execution

Saumendra Pattnaik, Bidush Kumar Sahoo, Chhabi Rani Panigrahi, Binod Kumar Pattanayak, Bibudhendu Pati

30 Jun 2022-Computación Y Sistemas

TL;DR: In this paper, the authors focus on dead code detection and test input generation using symbolic execution and show that the symbolic execution method can be used to reduce symbolic execution time and to find unreachable paths with fewer test cases.

Abstract: Testing is a well-known technique for identifying errors in software programs. Testing can be done in two ways: static analysis and dynamic analysis. Symbolic execution plays a vital role in static analysis for test case generation and for finding unreachable paths with a minimum number of test cases. An unreachable path is a part of a program which can never be executed, i.e., the symbolic execution doesn’t continue for that path and the current execution stops there. It generates a test suite for loop-free programs that achieves path coverage. In the best case, program loops imply an exponential increase in the number of paths, and in the worst case the program will not terminate. The functions of symbolic execution are test input generation, unreachable path detection, finding bugs in software programs, and debugging. In this paper, we focus on dead code detection and test input generation using symbolic execution. Our execution for Java programs uses Java Path Finder (JPF) model tester. Our analysis shows that the symbolic execution method can be used to reduce symbolic execution time and to find unreachable paths with fewer test cases.

Book Chapter•10.1007/978-3-030-94822-1_19•

ReHAna: An Efficient Program Analysis Framework to Uncover Reflective Code in Android

Jinming Luo¹

University of Nebraska–Lincoln¹

1 Jan 2022

Posted Content•10.48550/arxiv.2203.03943•

mwp-Analysis Improvement and Implementation: Realizing Implicit Computational Complexity

8 Mar 2022

TL;DR: The mwp-flow analysis, a technique from implicit computational complexity (ICC), certifies polynomial bounds on the size of the values manipulated by an imperative program; this paper makes the analysis tractable in practice and implements it as a lightweight tool for C programs.

Abstract: Implicit Computational Complexity (ICC) drives better understanding of complexity classes, but it also guides the development of resources-aware languages and static source code analyzers. Among the methods developed, the mwp-flow analysis certifies polynomial bounds on the size of the values manipulated by an imperative program. This result is obtained by bounding the transitions between states instead of focusing on states in isolation, as most static analyzers do, and is not concerned with termination or tight bounds on values. Those differences, along with its built-in compositionality, make the mwp-flow analysis a good target for determining how ICC-inspired techniques diverge compared with more traditional static analysis methods. This paper's contributions are threefold: we fine-tune the internal machinery of the original analysis to make it tractable in practice; we extend the analysis to function calls and leverage its machinery to compute the result of the analysis efficiently; and we implement the resulting analysis as a lightweight tool to automatically perform data-size analysis of C programs. This documented effort prepares and enables the development of certified complexity analysis, by transforming a costly analysis into a tractable program that furthermore decorrelates the problem of deciding whether a bound exists from the problem of computing it.

Proceedings Article•10.1109/cvidliccea56201.2022.9825448•

Research on Binary Program Dynamic Slicing Technology for Cause Analysis of Vulnerability

Peiyu Lu, Chao Feng, Chao-jing Tang

20 May 2022

TL;DR: This paper proposes a binary program dynamic slicing technology that uses the information related to the reading and writing of registers and memory addresses in the program execution trace to find the relationship of the data flow and control flow between the two instructions.

Abstract: In cause analysis of vulnerability, multi-level dereference of pointer and array element index analysis are often encountered at the code level, which is reflected in the case of indirect addressing at the assembly level. At present, the program slicing technology commonly used for cause analysis of vulnerability cannot completely analyze the data flow and control flow of indirect addressing. In order to solve this problem, this paper proposes a binary program dynamic slicing technology for cause analysis of vulnerability. This technology uses the information related to the reading and writing of registers and memory addresses in the program execution trace to find the relationship of the data flow and control flow between two instructions, which can more completely retain the information related to the instructions to be sliced, improve the automation component in cause analysis of vulnerability and reduce the cost of manual analysis. In addition, by using the static characteristics of the execution trace, the approach can meet the needs of researchers for repeated debugging and analysis of a program execution at different time points in the process of program execution.
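The idea of slicing over a recorded trace can be illustrated with a minimal sketch. It is not the authors' tool: the `TraceEntry` record, the `mem:0x1000` location naming, and the `demo_trace` contents are all hypothetical, and only data dependences are followed (control dependences are left out). Because the trace records the concrete address touched by each indirect access, a load through a pointer can be matched to the earlier store that wrote the same address.

```python
from typing import NamedTuple


class TraceEntry(NamedTuple):
    """One executed instruction in a recorded trace (hypothetical format)."""
    index: int          # position in the execution trace
    text: str           # disassembly text, for display only
    reads: frozenset    # registers / concrete memory addresses read
    writes: frozenset   # registers / concrete memory addresses written


def backward_slice(trace, criterion_index):
    """Return trace indices the criterion instruction data-depends on."""
    live = set(trace[criterion_index].reads)   # locations still unexplained
    in_slice = [criterion_index]
    for entry in reversed(trace[:criterion_index]):
        if live & entry.writes:
            in_slice.append(entry.index)
            live -= entry.writes   # this entry defines those locations
            live |= entry.reads    # now explain what it consumed
    return sorted(in_slice)


# Hypothetical trace: an indirect store followed by an indirect load.
demo_trace = [
    TraceEntry(0, "mov rbx, 0x1000", frozenset(), frozenset({"rbx"})),
    TraceEntry(1, "mov [rbx], rcx", frozenset({"rbx", "rcx"}),
               frozenset({"mem:0x1000"})),
    TraceEntry(2, "mov rdx, 5", frozenset(), frozenset({"rdx"})),
    TraceEntry(3, "mov rax, [rbx]", frozenset({"rbx", "mem:0x1000"}),
               frozenset({"rax"})),
    TraceEntry(4, "call rax", frozenset({"rax"}), frozenset()),
]

print(backward_slice(demo_trace, 4))
```

Slicing from the `call rax` entry keeps the indirect store and load and drops the unrelated write to `rdx`.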

Proceedings Article•10.1145/3540250.3549116•

Static executes-before analysis for event driven programs

Rekha Pai, Abhishek Uppar, Akshatha Shenoy, Pranshul Kushwaha, Deepak D'Souza (+1 more)

7 Nov 2022

TL;DR: This paper presents a static analysis technique to compute executes-before pairs of tasks for a general class of event driven programs, based on a small but comprehensive set of rules evaluated on a novel structure called the task post graph of a program.

Abstract: The executes-before relation between tasks is fundamental in the analysis of Event Driven Programs with several downstream applications like race detection and identifying redundant synchronizations. We present a sound, efficient, and effective static analysis technique to compute executes-before pairs of tasks for a general class of event driven programs. The analysis is based on a small but comprehensive set of rules evaluated on a novel structure called the task post graph of a program. We show how to use the executes-before information to identify disjoint-blocks in event driven programs and further use them to improve the precision of data race detection for these programs. We have implemented our analysis in the Flowdroid framework in a tool called AndRacer and evaluated it on several Android apps, bringing out the scalability, recall, and improved precision of the analyses.

Journal Article•10.1109/access.2022.3177841•

Datalog Static Analysis in Secrecy

01 Jan 2022-IEEE Access

TL;DR: This paper proposes a secure static-analysis-as-a-service (SaaaS) system in which a client may outsource static analysis to the cloud. The goal is to protect the privacy of both the design and implementation of the static analysis and the source code of the target program.

Abstract: We present a secure static-analysis-as-a-service (SaaaS) system where a client may outsource static analysis to the cloud. To address copyright concerns associated with SaaaS, clients are allowed to encrypt the source code of a target program and upload it to the cloud. Our goal is to secure the privacy of the design and implementation of static analysis as well as the source code of the target program. Considering a family of static analyses written in Datalog, we propose a generic protocol that combines homomorphic encryption (HE) with secure two-party computation to manage the huge cost of HE operations. The server occasionally delegates sub-parts of the analysis that are costly in the cipher-world to the client without exposing the design of the analysis. During server-client interactions, the information of each side (client and server) is not leaked to the other. We evaluated our system on two static analyses in Datalog in secrecy, which have not been feasible using the previous techniques. For example, Andersen pointer analysis is completed in an average of 45 minutes for 14 C programs comprising up to 1.6 KLoC.
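For readers unfamiliar with Andersen pointer analysis written as Datalog rules, the following plaintext sketch may help. It is an illustration only and makes no attempt to reproduce the paper's encrypted protocol: there is no homomorphic encryption or two-party computation here, and the relation names (`addr_of`, `copy`, `load`, `store`) and the sample program are assumptions chosen for the example. The four standard inclusion rules are applied repeatedly until no new points-to fact appears, which mirrors naive bottom-up Datalog evaluation.

```python
def andersen(addr_of, copy, load, store):
    """Naive fixed-point evaluation of Andersen-style points-to rules.

    Each input is a set of (lhs, rhs) variable-name pairs:
      addr_of: p = &o    copy: p = q    load: p = *q    store: *p = q
    Returns the set of derived (pointer, object) facts.
    """
    pts = set(addr_of)                      # pts(p, o) :- addr_of(p, o).
    while True:
        new = set()
        for p, q in copy:                   # pts(p, o) :- copy(p, q), pts(q, o).
            new |= {(p, o) for v, o in pts if v == q}
        for p, q in load:                   # pts(p, o) :- load(p, q), pts(q, r), pts(r, o).
            for v, r in pts:
                if v == q:
                    new |= {(p, o) for w, o in pts if w == r}
        for p, q in store:                  # pts(r, o) :- store(p, q), pts(p, r), pts(q, o).
            for v, r in pts:
                if v == p:
                    new |= {(r, o) for w, o in pts if w == q}
        if new <= pts:                      # nothing new: fixed point reached
            return pts
        pts |= new


# Hypothetical program: a = &x; p = &a; c = &y; *p = c; b = *p; d = b;
result = andersen(
    addr_of={("a", "x"), ("p", "a"), ("c", "y")},
    copy={("d", "b")},
    load={("b", "p")},
    store={("p", "c")},
)
print(sorted(result))
```

The analysis is flow-insensitive, so `a` is reported as possibly pointing to both `x` and `y` regardless of statement order.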

Proceedings Article•10.1109/ispass55109.2022.00036•

Cross-Level Characterization of Program Behavior : (Extended Poster Abstract)

1 May 2022

TL;DR: Cross-Level Characterization (CLC) analyzes similarities and differences in resource counts as measured at each level of instrumentation during a program's transformation from source code through execution on a specific microarchitecture.

Abstract: Program behavior can be defined as a collection of executions [1]. Program behavior strongly relates to actual program performance but can be complicated to characterize and analyze. Characterization is important as it helps better understand program behavior by measuring various operations a program performs. There are many existing techniques [2]–[7] for program characterization, which operate at different levels of instrumentation: source code, intermediate representation (IR), instruction set architecture (ISA), and CPU microarchitecture. Each of these levels provides different capabilities and limitations. In this paper, we introduce Cross-Level Characterization (CLC), an analysis of similarities and differences in resource counts as measured at each level of instrumentation during a program’s transformation from source code through execution on a specific microarchitecture.

Proceedings Article•10.1145/3477314.3507126•

A lightweight approach for sound call graph approximation

25 Apr 2022

TL;DR: NoCFG is a sound and scalable method for approximating a call graph that supports a wide variety of programming languages and is evaluated on Python and C# projects. It works on a coarse abstraction of the program, discarding many programming language constructs.

Abstract: Interprocedural analysis refers to gathering information about the entire program rather than for a single procedure only, as in intraprocedural analysis. It enables a more precise analysis; however, it is complicated due to the difficulty of constructing an accurate program call graph. Algorithms for constructing sound call graphs must trade off precision against scalability. Many precise call graph techniques are complex and are difficult to scale due to the kind of type-inference analysis they use, in particular the use of some variations of points-to analysis. This forces use cases that require both soundness and scale, such as vulnerability propagation analysis, to resort to simpler variants such as Class Hierarchy Analysis. These kinds of analyses have no sound equivalent for dynamically typed languages such as Python and JavaScript, which have gained popularity in recent years. To address this problem, we propose NoCFG, a new sound and scalable method for approximating a call graph that supports a wide variety of programming languages. A key property of NoCFG is that it works on a coarse abstraction of the program, discarding many of the programming language constructs. Due to the coarse program abstraction, extending it to support other languages is easy. We evaluate NoCFG for real-world projects written in both Python and C# and the results demonstrate a high precision rate of ≥ 89% and scalability through a security use-case over projects with up to 2 million lines of code.
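As background for the Class Hierarchy Analysis (CHA) baseline mentioned in the abstract, here is a minimal sketch of how CHA resolves a virtual call. It is not NoCFG, whose algorithm the abstract does not spell out. The `parent` and `methods` maps and the class names are hypothetical, and single inheritance is assumed. CHA relies only on the declared (static) type of the receiver: every subtype of that type is considered a possible runtime type, and the method each one would dispatch to becomes a call graph target.

```python
def resolve(cls, method, parent, methods):
    """Find the definition of `method` that class `cls` inherits or declares."""
    while cls is not None:
        if method in methods.get(cls, ()):
            return (cls, method)
        cls = parent.get(cls)          # walk up to the superclass
    return None


def cha_targets(declared_type, method, parent, methods):
    """Possible targets of `recv.method()` when recv has static type declared_type."""
    children = {}
    for child, sup in parent.items():
        children.setdefault(sup, []).append(child)

    targets = set()
    stack = [declared_type]
    while stack:                       # visit declared_type and all its subtypes
        cls = stack.pop()
        target = resolve(cls, method, parent, methods)
        if target is not None:
            targets.add(target)
        stack.extend(children.get(cls, ()))
    return targets


# Hypothetical hierarchy: Dog and Cat extend Animal; Puppy extends Dog.
parent = {"Dog": "Animal", "Cat": "Animal", "Puppy": "Dog"}
methods = {"Animal": {"speak"}, "Dog": {"speak"}, "Cat": set(), "Puppy": set()}

print(cha_targets("Animal", "speak", parent, methods))
```

The result over-approximates the real targets, which is what makes CHA sound but imprecise. It also depends on declared types, which is why the abstract notes that such analyses have no sound equivalent for dynamically typed languages.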

Journal Article•10.26190/unsworks/23987•

Efficient and Precise Pointer Analysis with Fine-Grained Context Sensitivity

He Dongjie

2 May 2022

Abstract: Pointer analysis addresses a fundamental problem in program analysis: determining statically whether or not a given pointer may reference an object in the program. It underpins almost all forms of other static analysis, including program understanding, program verification, bug detection, security analysis, compiler optimization, and symbolic execution. However, existing pointer analysis techniques suffer from efficiency and scalability issues for large programs. Improving their efficiency while still maintaining their precision is a long-standing hard problem. This thesis aims to improve the efficiency and scalability of pointer analysis for object-oriented programming languages such as Java by exploring fine-grained context sensitivity. Unlike traditional approaches, which apply context-sensitivity either uniformly to all methods or selectively to a subset of methods in a program, we go one step further by applying context-sensitivity only to a subset of precision-critical variables and objects so that we can reduce significantly the scale of Pointer Assignment Graph (PAG). Conducting pointer analysis on a smaller PAG enables the pointer analysis to run significantly faster while preserving most of its precision. This thesis makes its contributions by introducing three different fine-grained pointer analysis approaches for Java programs. The first approach, called TURNER, can accelerate k-object-sensitive pointer analysis (i.e., kOBJ) for Java significantly with negligible precision loss by exploiting object containment and reachability. The second approach, called context debloating, can accelerate all existing object-sensitive pointer analysis algorithms for Java by eliminating the context explosion problem completely for context-independent objects. In addition, we have also developed the first supporting tool, named CONCH, for identifying context-independent objects. 
The last approach, called P3CTX, represents the first precision-preserving technique for accelerating k-callsite-sensitive pointer analysis (kCFA) for Java based on a complete CFL-reachability formulation of kCFA for Java with built-in on-the-fly call graph construction (for the first time).

Proceedings Article•10.1109/netsoft54395.2022.9844121•

Investigating the Vulnerability of Programmable Data Planes to Static Analysis-Guided Attacks

27 Jun 2022

TL;DR: This work investigates whether a static analysis of compiled P4 programs can be fast and accurate enough for an on-device attack scenario, by implementing a static analysis of P4 programs compiled to BPF bytecode. The analysis remains accurate even when several program behaviours are abstracted away.

Abstract: Programmable network data planes are paving the way for networking innovations, with the ability to perform complex, stateful tasks defined in high-level languages such as P4. The enhanced capabilities of programmable data plane devices have made verification of their runtime behaviour, using established methods such as probe packets, impossible to scale beyond probabilistic detection. This has created a potential opportunity for an attacker, with access to a compromised device, to subtly alter its forwarding program to mishandle only a small subset of packets, evading probabilistic detection. In practice, such subtle binary instrumentation attacks require extensive knowledge of the forwarding program, yet it is unclear whether a static analysis of compiled P4 programs to obtain this knowledge can be fast and accurate enough for an on-device attack scenario. In this work, we investigate this possibility by implementing a static analysis of P4 programs compiled to BPF bytecode. This analysis gathers sufficient information for the attacker to identify appropriate (reliably correct) edits to the program. We found that, due to predictable compiler behaviours, our analysis remains accurate even when several program behaviours are abstracted away. Our evaluation of the analysis requirements shows that, from a defensive perspective, there is scope for selectively manipulating those instructions in P4-BPF programs that are critical to attack-focused analysis in order to increase its difficulty, without increasing the number of program instructions.

Proceedings Article•10.1145/3510003.3510046•

Fast and Precise Application Code Analysis using a Partial Library

Akshay Utture, Jens Palsberg

1 May 2022

TL;DR: This paper introduces the first tool QueryMax, which significantly speeds up an application code analysis without dropping any precision, and enables these two analyses to achieve, relative to a whole-program analysis, an average recall of 87%, a precision of 100% and a geometric mean speedup of 10x.

Abstract: Long analysis times are a key bottleneck for the widespread adoption of whole-program static analysis tools. Fortunately, however, a user is often only interested in finding errors in the application code, which constitutes a small fraction of the whole program. Current application-focused analysis tools overapproximate the effect of the library and hence reduce the precision of the analysis results. However, empirical studies have shown that users have high expectations on precision and will ignore tool results that don't meet these expectations. In this paper, we introduce the first tool QueryMax that significantly speeds up an application code analysis without dropping any precision. QueryMax acts as a pre-processor to an existing analysis tool to select a partial library that is most relevant to the analysis queries in the application code. The selected partial library plus the application is given as input to the existing static analysis tool, with the remaining library pointers treated as the bottom element in the abstract domain. This achieves a significant speedup over a whole-program analysis, at the cost of a few lost errors, and with no loss in precision. We instantiate and run experiments on QueryMax for a cast-check analysis and a null-pointer analysis. For a particular configuration, QueryMax enables these two analyses to achieve, relative to a whole-program analysis, an average recall of 87%, a precision of 100% and a geometric mean speedup of 10x.
