TL;DR: A methodology for automatically evaluating the effectiveness of static code analyzers based on CVE reports is proposed and it is found that this false negative rate can be reduced to 30% to 69% when combining the results of static analyzers, at the cost of 15 percentage points more functions flagged.
Abstract: Static code analysis is often used to scan source code for security vulnerabilities. Given the wide range of existing solutions implementing different analysis techniques, it is very challenging to perform an objective comparison between static analysis tools to determine which ones are most effective at detecting vulnerabilities. Existing studies are thereby limited in that (1) they use synthetic datasets, whose vulnerabilities do not reflect the complexity of security bugs that can be found in practice and/or (2) they do not provide differentiated analyses w.r.t. the types of vulnerabilities output by the static analyzers. Hence, their conclusions about an analyzer's capability to detect vulnerabilities may not generalize to real-world programs. In this paper, we propose a methodology for automatically evaluating the effectiveness of static code analyzers based on CVE reports. We evaluate five free and open-source and one commercial static C code analyzer(s) against 27 software projects containing a total of 1.15 million lines of code and 192 vulnerabilities (ground truth). While static C analyzers have been shown to perform well in benchmarks with synthetic bugs, our results indicate that state-of-the-art tools miss in-between 47% and 80% of the vulnerabilities in a benchmark set of real-world programs. Moreover, our study finds that this false negative rate can be reduced to 30% to 69% when combining the results of static analyzers, at the cost of 15 percentage points more functions flagged. Many vulnerabilities hence remain undetected, especially those beyond the classical memory-related security bugs.
TL;DR: Wang et al. as discussed by the authors proposed a hybrid approach that examines permissions, text, and network-based features both statically and dynamically by monitoring memory usage, system call logs, and CPU usage.
TL;DR: In this paper , the authors present a comprehensive study and analysis of current malware and detection techniques using the snowball approach, including machine learning and deep learning techniques, and compare the performance evaluation metrics, current research challenges, and possible future direction of research.
TL;DR: A Transformer-based learning approach to identify false positive bug warnings is proposed and it is demonstrated that these models can improve the precision of static analysis by 17.5%.
Abstract: Due to increasingly complex software design and rapid iterative development, code defects and security vulnerabilities are prevalent in modern software. In response, programmers rely on static analysis tools to regularly scan their codebases and find potential bugs. In order to maximize coverage, however, these tools generally tend to report a significant number of false positives, requiring developers to manually verify each warning. To address this problem, we propose a Transformer-based learning approach to identify false positive bug warnings. We demonstrate that our models can improve the precision of static analysis by 17.5%. In addition, we validated the generalizability of this approach across two major bug types: null dereference and resource leak.
TL;DR: A hypothesis to this end is posed that assumes the applicability of machine-learning solutions for IoT system static analysis, and a proposal of an intelligent framework concept for the static analysis of IoT systems is proposed.
Abstract: Ensuring security for modern IoT systems requires the use of complex methods to analyze their software. One of the most in-demand methods that has repeatedly been proven to be effective is static analysis. However, the progressive complication of the connections in IoT systems, the increase in their scale, and the heterogeneity of elements requires the automation and intellectualization of manual experts’ work. A hypothesis to this end is posed that assumes the applicability of machine-learning solutions for IoT system static analysis. A scheme of this research, which is aimed at confirming the hypothesis and reflecting the ontology of the study, is given. The main contributions to the work are as follows: systematization of static analysis stages for IoT systems and decisions of machine-learning problems in the form of formalized models; review of the entire subject area publications with analysis of the results; confirmation of the machine-learning instrumentaries applicability for each static analysis stage; and the proposal of an intelligent framework concept for the static analysis of IoT systems. The novelty of the results obtained is a consideration of the entire process of static analysis (from the beginning of IoT system research to the final delivery of the results), consideration of each stage from the entirely given set of machine-learning solutions perspective, as well as formalization of the stages and solutions in the form of “Form and Content” data transformations.
TL;DR: Wang et al. as discussed by the authors proposed a new robust dynamic analysis method for malware detection by using specific fine-grained behavioral features, i.e., control flow traces, which can effectively detect malware with an accuracy of up to 95.7%.
TL;DR: In this paper , the authors highlight strengths and weaknesses of CNNs used for static malware detection, starting from images obtained from byte-wise conversion of binary executable files to pixel images to critically analyze the assumptions underlying the performance of this type of technique.
TL;DR: In this paper , a new method is proposed by using static analysis and gathering as most useful features of android applications as possible, along with two new proposed features, and then passing them to a functional API deep learning model.
Abstract: The computers nowadays are being replaced by the smartphones for the most of the internet users around the world, and Android is getting the most of the smartphone systems’ market. This rise of the usage of smartphones generally, and the Android system specifically, leads to a strong need to effectively secure Android, as the malware developers are targeting it with sophisticated and obfuscated malware applications. Consequently, a lot of studies were performed to propose a robust method to detect and classify android malicious software (malware). Some of them were effective, some were not; with accuracy below 90%, and some of them are being outdated; using datasets that became old containing applications for old versions of Android that are rarely used today. In this paper, a new method is proposed by using static analysis and gathering as most useful features of android applications as possible, along with two new proposed features, and then passing them to a functional API deep learning model we made. This method was implemented on a new and classified android application dataset, using 14079 malware and benign samples in total, with malware samples classified into four malware classes. Two major experiments with this dataset were implemented, one for malware detection with the dataset samples categorized into two classes as just malware and benign, the second one was made for malware detection and classification, using all the five classes of the dataset. As a result, our model overcomes the related works when using just two classes with F1-score of 99.5%. Also, high malware detection and classification performance was obtained by using the five classes, with F1-score of 97%.
TL;DR: In this article , a hybrid framework for malware classification is presented, where static malware executables and dynamic process memory dumps are converted to images mapped through space-filling curves, from which visual features are extracted for classification.
TL;DR: JuCify as discussed by the authors proposes a unified model of all code in Android apps, where they extract and merge call graphs of native code and bytecode to make the final model readily-usable by a common Android analysis framework.
Abstract: Native code is now commonplace within Android app packages where it co-exists and interacts with Dex bytecode through the Java Native Interface to deliver rich app functionalities. Yet, state-of-the-art static analysis approaches have mostly overlooked the presence of such native code, which, however, may implement some key sensitive, or even malicious, parts of the app behavior. This limitation of the state of the art is a severe threat to validity in a large range of static analyses that do not have a complete view of the executable code in apps. To address this issue, we propose a new advance in the ambitious research direction of building a unified model of all code in Android apps. The JuCify approach presented in this paper is a significant step towards such a model, where we extract and merge call graphs of native code and bytecode to make the final model readily-usable by a common Android analysis framework: in our implementation, JuCify builds on the Soot internal intermediate representation. We performed empirical investigations to highlight how, without the unified model, a significant amount of Java methods called from the native code are "unreachable" in apps' call-graphs, both in goodware and malware. Using JuCify, we were able to enable static analyzers to reveal cases where malware relied on native code to hide invocation of payment library code or of other sensitive code in the Android framework. Additionally, JuCify's model enables state-of-the-art tools to achieve better precision and recall in detecting data leaks through native code. Finally, we show that by using JuCify we can find sensitive data leaks that pass through native code.
TL;DR: Wang et al. as mentioned in this paper proposed a hybrid Android malware detection approach that combines dynamic and static tactics based on machine-learning-based method and improved state-based algorithm based on DAMBA.
Abstract: Recent years have witnessed a significant increase in the use of Android devices in many aspects of our life. However, users can download Android apps from third-party channels, which provides numerous opportunities for malware. Attackers utilize unsolicited permissions to gain access to the sensitive private intelligence of users. Since signature-based antivirus solutions no longer meet practical needs, efficient and adaptable solutions are desperately needed, especially in new variants. As a remedy, we propose a hybrid Android malware detection approach that combines dynamic and static tactics. We firstly adopt static analysis inferring different permission usage patterns between malware and benign apps based on the machine-learning-based method. To classify the suspicious apps further, we extract the object reference relationships from the memory heap to construct a dynamic feature base. We then present an improved state-based algorithm based on DAMBA. Experimental results on a real-world dataset of 21,708 apps show that our approach outperforms the well-known detector with 97.5% F1-measure. Besides, our system is demonstrated to resist permission abuse behaviors and obfuscation techniques.
TL;DR: In this paper , the authors performed an analysis of 10 influential research works on Android malware detection using a common evaluation framework and identified five factors that, if not taken into account when creating datasets and designing detectors, significantly affect the trained ML models and their performances.
TL;DR: Seader is an example-based approach to detect and repair security-API misuses via pattern matching and can help developers correctly use security APIs.
Abstract: The Java libraries JCA and JSSE offer cryptographic APIs to facilitate secure coding. When developers misuse some of the APIs, their code becomes vulnerable to cyber-attacks. To eliminate such vulnerabilities, people built tools to detect security-API misuses via pattern matching. However, most tools do not (1) fix misuses or (2) allow users to extend tools' pattern sets. To overcome both limitations, we created Seader-an example-based approach to detect and repair security-API misuses. Given an exemplar $\langle\text{insecure, secure}\rangle$ code pair, Seader compares the snippets to infer any API-misuse template and corresponding fixing edit. Based on the inferred info, given a program, Seader performs inter-procedural static analysis to search for security-API misuses and to propose customized fixes. For evaluation, we applied Seader to 28 $\langle\text{insecure, secure}\rangle$ code pairs; Seader successfully inferred 21 unique API-misuse templates and related fixes. With these $\langle\text{vulnerability, fix}\rangle$ patterns, we applied Seader to a program benchmark that has 86 known vulnerabilities. Seader detected vulnerabilities with 95% precision, 72% recall, and 82% F-score. We also applied Seader to 100 open-source projects and manually checked 77 suggested repairs; 76 of the repairs were correct. Seader can help developers correctly use security APIs.
TL;DR: In some cases, user dissatisfaction even leads to tool abandonment as mentioned in this paper , which hinders the adoption of static analysis tools, and in some cases leads to a tool abandonment due to user dissatisfaction.
Abstract: Static analysis tools support developers in detecting potential coding issues, such as bugs or vulnerabilities. Research on static analysis emphasizes its technical challenges but also mentions severe usability shortcomings. These shortcomings hinder the adoption of static analysis tools, and in some cases, user dissatisfaction even leads to tool abandonment.
TL;DR: A chaincode vulnerability detection framework that combines the dynamic symbolic execution and the static abstract syntax tree analysis technology and the accuracy of the proposed vulnerability detection method for Hyperledger Fabric smart contracts is proposed.
Abstract: Hyperledger Fabric is another development of blockchain technology after Ethereum, which is more suitable as an operating platform for smart contracts. However, the testing technology of Hyperledger Fabric smart contracts (also known as chaincode) is not yet mature currently. Based on this, this paper studies the vulnerability detection of Golang chaincodes. Firstly, we summarize 17 kinds of Golang chaincode vulnerabilities by investigating existing research. Secondly, taking the high accuracy of dynamic detection and the high efficiency of static detection into consideration, we propose a chaincode vulnerability detection framework that combines the dynamic symbolic execution and the static abstract syntax tree analysis technology. We also implement a supporting-tool that can detect the above 15 types of vulnerabilities. Finally, we test the tool by 15 chaincodes collected from GitHub and unknown vulnerabilities were detected in 13 projects. The precision turned out to be 91% after manual inspection. In order to verify the recall rate, we manually inject 30 vulnerabilities into the collected chaincodes and all of them are detected. The evaluation results show the accuracy of the proposed vulnerability detection method for Hyperledger Fabric smart contracts.
TL;DR: Microwalk-CI is introduced, a novel side-channel analysis framework for easy integration into a JavaScript development workflow, that extends existing dynamic approaches with a new analysis algorithm, that allows efficient localization and quantification of leakages, making it suitable for use in practical development.
Abstract: Secret-dependent timing behavior in cryptographic implementations has resulted in exploitable vulnerabilities, undermining their security. Over the years, numerous tools to automatically detect timing leakage or even to prove their absence have been proposed. However, a recent study at IEEE S&P 2022 showed that, while many developers are aware of one or more analysis tools, they have major difficulties integrating these into their workflow, as existing tools are tedious to use and mapping discovered leakages to their originating code segments requires expert knowledge. In addition, existing tools focus on compiled languages like C, or analyze binaries, while the industry and open-source community moved to interpreted languages, most notably JavaScript. In this work, we introduce Microwalk-CI, a novel side-channel analysis framework for easy integration into a JavaScript development workflow. First, we extend existing dynamic approaches with a new analysis algorithm, that allows efficient localization and quantification of leakages, making it suitable for use in practical development. We then present a technique for generating execution traces from JavaScript applications, which can be further analyzed with our and other algorithms originally designed for binary analysis. Finally, we discuss how Microwalk-CI can be integrated into a continuous integration (CI) pipeline for efficient and ongoing monitoring. We evaluate our analysis framework by conducting a thorough evaluation of several popular JavaScript cryptographic libraries, and uncover a number of critical leakages.
TL;DR: In this paper , symbolic execution is used to deobfuscate and analyze Excel 4.0 Office Macro (XL4) files automatically in a dynamic analysis environment (a sandbox) and concretize any symbolic values encountered during symbolic exploration.
Abstract: Malicious software (malware) poses a significant threat to the security of our networks and users. In the ever-evolving malware landscape, Excel 4.0 Office macros (XL4) have recently become an important attack vector. These macros are often hidden within apparently legitimate documents and under several layers of obfuscation. As such, they are difficult to analyze using static analysis techniques. Moreover, the analysis in a dynamic analysis environment (a sandbox) is challenging because the macros execute correctly only under specific environmental conditions that are not always easy to create. This paper presents SYMBEXCEL, a novel solution that leverages symbolic execution to deobfuscate and analyze Excel 4.0 macros automatically. Our approach proceeds in three stages: (1) The malicious document is parsed and loaded in memory; (2) Our symbolic execution engine executes the XL4 formulas; and (3) Our Engine concretizes any symbolic values encountered during the symbolic exploration, therefore evaluating the execution of each macro under a broad range of (meaningful) environment configurations. SYMBEXCEL significantly outperforms existing deobfuscation tools, allowing us to reliably extract Indicators of Compromise (IoCs) and other critical forensics information. Our experiments demonstrate the effectiveness of our approach, especially in deobfuscating novel malicious documents that make heavy use of environment variables and are often not identified by commercial anti-virus software.
TL;DR: The results prove the efficiency of permissions and the action repetition feature set and their influential roles in detecting malware in Android applications and show empirically very close accuracy results when using static, dynamic, and hybrid analyses.
Abstract: Android applications have recently witnessed a pronounced progress, making them among the fastest growing technological fields to thrive and advance. However, such level of growth does not evolve without some cost. This particularly involves increased security threats that the underlying applications and their users usually fall prey to. As malware becomes increasingly more capable of penetrating these applications and exploiting them in suspicious actions, the need for active research endeavors to counter these malicious programs becomes imminent. Some of the studies are based on dynamic analysis, and others are based on static analysis, while some are completely dependent on both. In this paper, we studied static, dynamic, and hybrid analyses to identify malicious applications. We leverage machine learning classifiers to detect malware activities as we explain the effectiveness of these classifiers in the classification process. Our results prove the efficiency of permissions and the action repetition feature set and their influential roles in detecting malware in Android applications. Our results show empirically very close accuracy results when using static, dynamic, and hybrid analyses. Thus, we use static analyses due to their lower cost compared to dynamic and hybrid analyses. In other words, we found the best results in terms of accuracy and cost (the trade-off) make us select static analysis over other techniques.
TL;DR: This paper applies numerous obfuscation techniques to Wasm programs, and test their effectiveness in producing a fully obfuscated Wasm program, and shows that obfuscation can be highly effective and can cause even a state-of-the-art detector to misclassify the obfuscate Wasm samples.
Abstract: WebAssembly (Wasm) has seen a lot of attention lately as it spreads through the mobile computing domain and becomes the new standard for performance-oriented web development. It has diversified its uses far beyond just web applications by acting as an execution environment for mobile agents, containers for IoT devices, and enabling new serverless approaches for edge computing. Within the numerous uses of Wasm, not all of them are benign. With the rise of Wasm-based cryptojacking malware, analyzing Wasm applications has been a hot topic in the literature, resulting in numerous Wasm-based cryptojacking detection systems. Many of these methods rely on static analysis, which traditionally can be circumvented through obfuscation. However, the feasibility of the obfuscation techniques for Wasm programs has never been investigated thoroughly. In this paper, we address this gap and perform the first look at code obfuscation for Wasm. We apply numerous obfuscation techniques to Wasm programs, and test their effectiveness in producing a fully obfuscated Wasm program. Particularly, we obfuscate both benign Wasm-based web applications and cryptojacking malware instances and feed them into a state-of-the-art Wasm cryptojacking detector to see if current Wasm analysis methods can be subverted with obfuscation. Our analysis shows that obfuscation can be highly effective and can cause even a state-of-the-art detector to misclassify the obfuscated Wasm samples.
TL;DR: In this paper , a hybrid solution that combines static and dynamic approaches is proposed to take advantage of their strengths while covering their weaknesses, and an iterative clustering process is applied on the processed data in order to generate the microservices decomposition.
Abstract: In order to benefit from the advantages offered by the microservices architectural design, many companies have started migrating their monolithic application to this newer design. However, due to the high cost and development time associated to this task, automated approaches need to be developed to solve these issues. Solutions that tackle this problem can be classified based on the information available for the monolithic application which are often based on source code or runtime traces. The latter provides a more accurate representation of the interactions between the classes within the application however it often fails to cover all of the classes. On the other hand, the source code of the application is more readily available and can be used to extract additional information like semantic meaning of the classes. The objective of this paper is to provide a hybrid solution that combines both of these approaches in order to take advantage of their strengths while covering their weaknesses. The proposed solution performs static and dynamic analysis on the monolithic application based on the available information and the user’s input. Afterwards, an iterative clustering process is applied on the processed data in order to generate the microservices decomposition. We compare different strategies for combining the static and dynamic approaches and we evaluate the performance of the hybrid approach compared to each of the separate approaches on 4 monolith applications. We provide as well a comparison with state-of-the-art solutions.
TL;DR: In this paper, the authors introduce an explicit notion of consistency as a graduated property, depending on the number of constraint violations in a graph, which has enabled the definition of notions such as constraint-preserving and constraint-guaranteeing graph transformations.
TL;DR: VarAlyzer as mentioned in this paper transforms preprocessor constructs to plain C while preserving their variability and semantics, and then solves any given distributive analysis problem on transformed product lines in a variability-aware manner.
Abstract: Abstract Many critical codebases are written in C, and most of them use preprocessor directives to encode variability, effectively encoding software product lines. These preprocessor directives, however, challenge any static code analysis. SPLlift, a previously presented approach for analyzing software product lines, is limited to Java programs that use a rather simple feature encoding and to analysis problems with a finite and ideally small domain. Other approaches that allow the analysis of real-world C software product lines use special-purpose analyses, preventing the reuse of existing analysis infrastructures and ignoring the progress made by the static analysis community. This work presents VarAlyzer , a novel static analysis approach for software product lines. VarAlyzer first transforms preprocessor constructs to plain C while preserving their variability and semantics. It then solves any given distributive analysis problem on transformed product lines in a variability-aware manner. VarAlyzer ’s analysis results are annotated with feature constraints that encode in which configurations each result holds. Our experiments with 95 compilation units of OpenSSL show that applying VarAlyzer enables one to conduct inter-procedural, flow-, field- and context-sensitive data-flow analyses on entire product lines for the first time, outperforming the product-based approach for highly-configurable systems.
TL;DR: A vulnerability assessment model to unify the vulnerability measurement standards is proposed and a static analysis tool called SmartFast is designed, which expresses the contract source code as a novel intermediate representation named SmartIR and can detect more kinds of vulnerabilities with a higher precision rate and a recall rate.
TL;DR: In this article , the authors present a literature review on static analysis techniques for Infrastructure as Code (IaC) in DevOps, and describe analysis techniques, defect categories and platforms targeted by tools.
Abstract: The increasing use of Infrastructure as Code (IaC) in DevOps leads to benefits in speed and reliability of deployment operation, but extends to infrastructure challenges typical of software systems. IaC scripts can contain defects that result in security and reliability issues in the deployed infrastructure: techniques for detecting and preventing them are needed. We analyze and survey the current state of research in this respect by conducting a literature review on static analysis techniques for IaC. We describe analysis techniques, defect categories and platforms targeted by tools in the literature.
TL;DR: Sorald as mentioned in this paper uses metaprogramming templates to transform the abstract syntax trees of programs and suggests fixes for static analysis warnings, reducing the burden on the developer from interpreting and fixing static issues, to inspecting and approving full fledged solutions.
Abstract: Previous work has shown that early resolution of issues detected by static code analyzers can prevent major costs later on. However, developers often ignore such issues for two main reasons. First, many issues should be interpreted to determine if they correspond to actual flaws in the program. Second, static analyzers often do not present the issues in a way that is actionable. To address these problems, we present Sorald: a novel system that uses metaprogramming templates to transform the abstract syntax trees of programs and suggests fixes for static analysis warnings. Thus, the burden on the developer is reduced from interpreting and fixing static issues, to inspecting and approving full fledged solutions. Sorald fixes violations of 10 rules from SonarJava, one of the most widely used static analyzers for Java. We evaluate Sorald on a dataset of 161 popular repositories on Github. Our analysis shows the effectiveness of Sorald as it fixes 65% (852/1,307) of the violations that meets the repair preconditions. Overall, our experiments show it is possible to automatically fix notable violations of the static analysis rules produced by the state-of-the-art static analyzer SonarJava.
TL;DR: In this paper, the authors present the results of two studies that investigate the impact of using static analysis to complement the performance of existing dynamic analysis tools tailored for mining Android sandboxes, in the task of identifying malicious behavior.
TL;DR: Thorough experiments show that, compared with the existing web application vulnerability detection methods, Cefuzz significantly improves the verification effect of RCE vulnerabilities, discovering 13 unknown vulnerabilities in 10 popular web CMSes.
Abstract: Current static detection technology for web application vulnerabilities relies highly on specific vulnerability patterns, while dynamic analysis technology has the problem of low vulnerability coverage. In order to improve the ability to detect unknown web application vulnerabilities, this paper proposes a PHP Remote Command/Code Execution (RCE) vulnerability directed fuzzing method. Our method is a combination of static and dynamic methods. First, we obtained the potential RCE vulnerability information of the web application through fine-grained static taint analysis. Then we performed instrumentation for the source code of the web application based on the potential RCE vulnerability information to provide feedback information for fuzzing. Finally, a loop feedback web application vulnerability automatic verification mechanism was established in which the vulnerability verification component provides feedback information, and the seed mutation component improves the vulnerability test seed based on the feedback information. On the basis of this method, the prototype system Cefuzz (Command/Code Execution Fuzzer) is implemented. Thorough experiments show that, compared with the existing web application vulnerability detection methods, Cefuzz significantly improves the verification effect of RCE vulnerabilities, discovering 13 unknown vulnerabilities in 10 popular web CMSes.
TL;DR: In this paper , the authors reformulated the static and kinematic theorems for limit analysis of masonry arches on moving supports in terms of the work done by loads.
TL;DR: A dynamic analysis mechanism for Android malware detection that modeled the syscall trace of an application as an ordered graph which enabled to infer various kinds of features in the form of centrality measures related to that syscal trace of the application.
Abstract: These days it is found that malware authors tend to create new variants of existing Android malware by using various kinds of obfuscation techniques. These kinds of obfuscated malware applications can bypass all the current antimalware products which rely on static analysis techniques to detect the malicious behavior. Hence, it is essential to develop innovative dynamic analysis mechanisms for Android malware detection. It is known that, the malicious behavior of statically obfuscated malware applications can get reflected in the system call (syscall) trace generated by them. Most of the existing syscall based mechanisms depend only on the features derived from the syscall counts for malware detection. These syscall count related features are inadequate to capture many other useful characteristics related to the syscalls in a sequence. In order to overcome this limitation, we modeled the syscall trace of an application as an ordered graph which enabled to infer various kinds of features in the form of centrality measures related to that syscall trace of the application. Then, these centrality measures are fed to an ML model to predict the malicious behavior. From the implementation results, we found that our mechanism can detect malware apps with an accuracy of 0.99.