TL;DR: WhiteFox, a white-box compiler fuzzer, leverages Large Language Models to generate high-quality test programs, exercising intricate optimizations and detecting deep logic bugs in compilers, including PyTorch Inductor, TensorFlow-XLA, and TensorFlow Lite, with 92 previously unknown bugs confirmed.
Abstract: Compiler correctness is crucial, as miscompilation can falsify program behaviors, leading to serious consequences over the software supply chain. In the literature, fuzzing has been extensively studied to uncover compiler defects. However, compiler fuzzing remains challenging: existing approaches focus on black- and grey-box fuzzing, which generates test programs without sufficient understanding of internal compiler behaviors. As such, they often fail to construct test programs to exercise intricate optimizations. Meanwhile, traditional white-box techniques, such as symbolic execution, are computationally inapplicable to the giant codebase of compiler systems. Recent advances demonstrate that Large Language Models (LLMs) excel in code generation/understanding tasks and even have achieved state-of-the-art performance in black-box fuzzing. Nonetheless, guiding LLMs with compiler source-code information remains a missing piece of research in compiler testing. To this end, we propose WhiteFox, the first white-box compiler fuzzer using LLMs with source-code information to test compiler optimization, with a spotlight on detecting deep logic bugs in the emerging deep learning (DL) compilers. WhiteFox adopts a multi-agent framework: (i) an LLM-based analysis agent examines the low-level optimization source code and produces requirements on the high-level test programs that can trigger the optimization; (ii) an LLM-based generation agent produces test programs based on the summarized requirements. Additionally, optimization-triggering tests are also used as feedback to further enhance the test generation prompt on the fly. Our evaluation on the three most popular DL compilers (i.e., PyTorch Inductor, TensorFlow-XLA, and TensorFlow Lite) shows that WhiteFox can generate high-quality test programs to exercise deep optimizations requiring intricate conditions, exercising up to 8 times more optimizations than state-of-the-art fuzzers. To date, WhiteFox has found in total 101 bugs for the compilers under test, with 92 confirmed as previously unknown and 70 already fixed. Notably, WhiteFox has been recently acknowledged by the PyTorch team, and is in the process of being incorporated into its development workflow. Finally, beyond DL compilers, WhiteFox can also be adapted for compilers in different domains, such as LLVM, where WhiteFox has already found multiple bugs.
TL;DR: Boosting compiler testing by injecting real-world code significantly improves the expressiveness of existing test generators and leads to the discovery of numerous previously unknown bugs.
Abstract: We introduce a novel approach for testing optimizing compilers with code from real-world applications. The main idea is to construct well-formed programs by fusing multiple code snippets from various real-world projects. The key insight is backed by the fact that the large volume of real-world code exercises rich syntactical and semantic language features, which current engineering-intensive approaches like random program generators are hard to fully support. To construct well-formed programs from real-world code, our approach works by (1) extracting real-world code at the granularity of function, (2) injecting function calls into seed programs, and (3) leveraging dynamic execution information to maintain the semantics and build complex data dependencies between injected functions and the seed program. With this idea, our approach complements the existing generators by boosting their expressiveness via fusing real-world code in a semantics-preserving way. We implement our idea in a tool, Creal, to test C compilers. In a nine-month testing period, we have reported 132 bugs to GCC and LLVM, two of the most popular and well-tested C compilers. At the time of writing, 121 of them have been confirmed as unknown bugs, and 101 of them have been fixed. Most of these bugs were miscompilations, and many were recognized as long-latent and critical. Our evaluation results evidently demonstrate the significant advantage of using real-world code to stress-test compilers. We believe this idea will benefit the general compiler testing direction and will be directly applicable to other compilers.
TL;DR: NAQC offers long coherence times, scalability, and multi-qubit gate support. To fully utilize these capabilities, new software tools are required. Close collaboration between tool developers and hardware experts is essential to ensure that software tools adhere to physical constraints.
Abstract: Neutral Atom Quantum Computing (NAQC) emerges as a promising hardware platform primarily due to its long coherence times and scalability. Additionally, NAQC offers computational advantages encompassing potential long-range connectivity, native multi-qubit gate support, and the ability to physically rearrange qubits with high fidelity. However, for the successful operation of a NAQC processor, one additionally requires new software tools to translate high-level algorithmic descriptions into a hardware executable representation, taking maximal advantage of the hardware capabilities. Realizing new software tools requires a close connection between tool developers and hardware experts to ensure that the corresponding software tools obey the corresponding physical constraints. This work aims to provide a basis to establish this connection by investigating the broad spectrum of capabilities intrinsic to the NAQC platform and its implications on the compilation process. To this end, we first review the physical background of NAQC and derive how it affects the overall compilation process by formulating suitable constraints and figures of merit. We then provide a summary of the compilation process and discuss currently available software tools in this overview. Finally, we present selected case studies and employ the discussed figures of merit to evaluate the different capabilities of NAQC and compare them between two hardware setups.
TL;DR: LLM-driven testsuite for compiler validation generates thousands of tests using LLMs to validate OpenACC directives. The results show that LLMs can be effectively used to automate test generation and improve compiler validation.
Abstract: Large language models (LLMs) are a new and powerful tool for a wide span of applications involving natural language and demonstrate impressive code generation abilities. The goal of this work is to automatically generate tests and use these tests to validate and verify compiler implementations of a directive-based parallel programming paradigm, OpenACC. To do so, in this paper, we explore the capabilities of state-of-the-art LLMs, including open-source LLMs (Meta's Codellama, Phind's fine-tuned version of Codellama, and Deepseek's Deepseek Coder) and closed-source LLMs (OpenAI's GPT-3.5-Turbo and GPT-4-Turbo). We further fine-tune the open-source LLMs and GPT-3.5-Turbo using our own testsuite dataset along with the OpenACC specification. We also explored these LLMs using various prompt engineering techniques that include code template, template with retrieval-augmented generation (RAG), one-shot example, one-shot with RAG, and expressive prompt with code template and RAG. This paper highlights our findings from over 5000 tests generated via all the above-mentioned methods. Our contributions include: (a) exploring the capabilities of the latest and relevant LLMs for code generation, (b) investigating fine-tuning and prompt methods, and (c) analyzing the outcome of LLM-generated tests, including manual analysis of a representative set of tests. We found the LLM Deepseek-Coder-33b-Instruct produced the most passing tests followed by GPT-4-Turbo.
Shashank Sonkar, X Chen, M. van Le, N.-S. Liu, Debshila Basu Mallick, Richard G. Baraniuk
18 Mar 2024
TL;DR: High-quality synthetic datasets are essential for LLM-based ITS. However, complex calculations present a challenge for GPT-4. This paper introduces a novel stateful prompt design that incorporates Python code soliloquies to address this limitation. The approach significantly enhances the accuracy and computational reliability of the model.
Abstract: High-quality conversational datasets are crucial for the successful development of Intelligent Tutoring Systems (ITS) that utilize a Large Language Model (LLM) backend. Synthetic student-teacher dialogues, generated using advanced GPT-4 models, are a common strategy for creating these datasets. However, subjects like physics that entail complex calculations pose a challenge. While GPT-4 presents impressive language processing capabilities, its limitations in fundamental mathematical reasoning curtail its efficacy for such subjects. To tackle this limitation, we introduce in this paper an innovative stateful prompt design. Our design orchestrates a mock conversation where both student and tutorbot roles are simulated by GPT-4. Each student response triggers an internal monologue, or 'code soliloquy' in the GPT-tutorbot, which assesses whether its subsequent response would necessitate calculations. If a calculation is deemed necessary, it scripts the relevant Python code and uses the Python output to construct a response to the student. Our approach notably enhances the quality of synthetic conversation datasets, especially for subjects that are calculation-intensive. The preliminary Subject Matter Expert evaluations reveal that our Higgs model, a fine-tuned LLaMA model, effectively uses Python for computations, which significantly enhances the accuracy and computational reliability of Higgs' responses.
TL;DR: Compiler autotuning through multiple phase learning significantly outperforms existing techniques by leveraging lightweight learning and a novel particle swarm algorithm to predict and tune optimization flag combinations.
Abstract: Widely used compilers like GCC and LLVM usually have hundreds of optimizations controlled by optimization flags, which are enabled or disabled during compilation to improve the runtime performance (e.g., short execution time) of the compiled program. Due to the large number of optimization flags and their combinations, it is difficult for compiler users to manually tune compiler optimization flags. In the literature, a number of autotuning techniques have been proposed, which tune optimization flags for a compiled program by comparing its actual runtime performance with different optimization flag combinations. Due to the huge search space and heavy actual runtime cost, these techniques suffer from the widely recognized efficiency problem. To reduce the heavy runtime cost, in this article we propose a lightweight learning approach that uses a small number of actual runtime performance data to predict the runtime performance of a compiled program with various optimization flag combinations. Furthermore, to reduce the search space, we design a novel particle swarm algorithm that tunes compiler optimization flags with the prediction model. To evaluate the performance of the proposed approach, CompTuner, we conduct an extensive experimental study on two popular C compilers, GCC and LLVM, with two widely used benchmarks, cBench and PolyBench. The experimental results show that CompTuner significantly outperforms the six compared techniques, including the state-of-the-art technique BOCA.
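The idea of searching flag combinations against a prediction model instead of real runs can be sketched as follows. This is a minimal illustration only: it uses a textbook binary particle swarm, not the novel variant the abstract refers to, and `predicted_runtime`, its linear weights, and all parameter values are hypothetical stand-ins for a learned model.

```python
import math
import random

def predicted_runtime(flags, weights, base=100.0):
    """Hypothetical surrogate: a linear cost over the enabled flags.
    In a real tuner this would be a model trained on a few measured runs."""
    return base + sum(w for f, w in zip(flags, weights) if f)

def binary_pso(predict, n_flags, n_particles=20, iters=30, seed=0):
    """Textbook binary PSO: each bit is set with probability sigmoid(velocity)."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_flags)] for _ in range(n_particles)]
    vel = [[0.0] * n_flags for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_score = [predict(p) for p in pos]
    g = min(range(n_particles), key=pbest_score.__getitem__)
    gbest, gbest_score = pbest[g][:], pbest_score[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(n_flags):
                # Pull each bit towards the particle's best and the swarm's best.
                v = (0.7 * vel[i][d]
                     + 1.5 * rng.random() * (pbest[i][d] - pos[i][d])
                     + 1.5 * rng.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-4.0, min(4.0, v))  # clamp to keep sigmoid away from 0/1
                prob_one = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob_one else 0
            score = predict(pos[i])  # cheap model query, no compilation or execution
            if score < pbest_score[i]:
                pbest[i], pbest_score[i] = pos[i][:], score
                if score < gbest_score:
                    gbest, gbest_score = pos[i][:], score
    return gbest, gbest_score

# Illustrative weights: negative means the flag is predicted to speed the program up.
weights = [-5.0, 3.0, -2.0, 4.0, -1.0, 0.5]
best_flags, best_score = binary_pso(lambda f: predicted_runtime(f, weights), len(weights))
print(best_flags, best_score)
```

Every candidate is scored by the surrogate, so the expensive compile-and-run step is only needed to train the model and to confirm the final choice.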
TL;DR: High performance compiler for very large-scale surface code computations, translating quantum circuits to surface code operations based on lattice surgery. Supports error correction workflow, customizable circuit layouts, quantum benchmarking, and resource estimation. Can process millions of gates using a streaming pipeline at high speed.
Abstract: We present the first high performance compiler for very large scale quantum error correction: it translates an arbitrary quantum circuit to surface code operations based on lattice surgery. Our compiler offers an end to end error correction workflow implemented by a pluggable architecture centered around an intermediate representation of lattice surgery instructions. Moreover, the compiler supports customizable circuit layouts, can be used for quantum benchmarking and includes a quantum resource estimator. The compiler can process millions of gates using a streaming pipeline at a speed geared towards real-time operation of a physical device. We compiled within seconds 80 million logical surface code instructions, corresponding to a high precision Clifford+T implementation of the 128-qubit Quantum Fourier Transform (QFT).
Utkarsh Utkarsh, Valentin Churavy, Yongxin Ma, Tim Besard, Prakitr Srisuma, Tim Gymnich, Adam R. Gerlach, Alan Edelman, George Barbastathis, Richard D. Braatz, Chris Rackauckas
TL;DR: High-performance, vendor-agnostic ODE/SDE solver library for GPUs, enabling fast and efficient numerical solution of complex scientific models on multiple platforms.
Abstract: We demonstrate a high-performance vendor-agnostic method for massively parallel solving of ensembles of ordinary differential equations (ODEs) and stochastic differential equations (SDEs) on GPUs. The method is integrated with a widely used differential equation solver library in a high-level language (Julia's DifferentialEquations.jl) and enables GPU acceleration without requiring code changes by the user. Our approach achieves state-of-the-art performance compared to hand-optimized CUDA-C++ kernels while performing 20–100× faster than the vectorizing map (vmap) approach implemented in JAX and PyTorch. Performance evaluation on NVIDIA, AMD, Intel, and Apple GPUs demonstrates performance portability and vendor-agnosticism. We show composability with MPI to enable distributed multi-GPU workflows. The implemented solvers are fully featured – supporting event handling, automatic differentiation, and incorporation of datasets via the GPU's texture memory – allowing scientists to take advantage of GPU acceleration on all major current architectures without changing their model code and without loss of performance. We distribute the software as an open-source library, DiffEqGPU.jl.
Chris Cummins, Volker Seeker, Dejan Grubisic, Jérémy Rapin, Jonas Gehring, Gabriel Synnaeve, Hugh Leather
27 Jun 2024
TL;DR: Researchers introduce Meta Large Language Model Compiler (LLM Compiler), a pre-trained model for code optimization tasks, trained on 546 billion tokens of LLVM-IR and assembly code, achieving 77% of autotuning search potential and 45% disassembly round trip accuracy.
Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities across a variety of software engineering and coding tasks. However, their application in the domain of code and compiler optimization remains underexplored. Training LLMs is resource-intensive, requiring substantial GPU hours and extensive data collection, which can be prohibitive. To address this gap, we introduce Meta Large Language Model Compiler (LLM Compiler), a suite of robust, openly available, pre-trained models specifically designed for code optimization tasks. Built on the foundation of Code Llama, LLM Compiler enhances the understanding of compiler intermediate representations (IRs), assembly language, and optimization techniques. The model has been trained on a vast corpus of 546 billion tokens of LLVM-IR and assembly code and has undergone instruction fine-tuning to interpret compiler behavior. LLM Compiler is released under a bespoke commercial license to allow wide reuse and is available in two sizes: 7 billion and 13 billion parameters. We also present fine-tuned versions of the model, demonstrating its enhanced capabilities in optimizing code size and disassembling from x86_64 and ARM assembly back into LLVM-IR. These achieve 77% of the optimising potential of an autotuning search, and 45% disassembly round trip (14% exact match). This release aims to provide a scalable, cost-effective foundation for further research and development in compiler optimization by both academic researchers and industry practitioners.
TL;DR: Single-pass compilers for WebAssembly baseline tiers improve startup time while maintaining code quality.
Abstract: Compilers face an intrinsic tradeoff between compilation speed and code quality. The tradeoff is particularly stark in a dynamic setting where JIT compilation time contributes to application runtime. Many systems now employ multiple compilation tiers, where one tier offers fast compile speed while another has much slower compile speed but produces higher quality code. With proper heuristics on when to use each, the overall performance is better than using either compiler in isolation. At the introduction of WebAssembly into the Web platform in 2017, most engines employed optimizing compilers and pre-compiled entire modules before execution. Yet since that time, all Web engines have introduced new "baseline" compiler tiers for Wasm to improve startup time. Further, many new non-web engines have appeared, some of which also employ simple compilers. In this paper, we demystify single-pass compilers for Wasm, explaining their internal algorithms and tradeoffs, as well as providing a detailed empirical study of those employed in production. We show the design of a new single-pass compiler for a research Wasm engine that integrates with an in-place interpreter and host garbage collector using value tags, while also supporting flexible instrumentation. In experiments, we measure the effectiveness of optimizations targeting value tags and find, somewhat surprisingly, that the runtime overhead can be reduced to near zero. We also assess the relative compile speed and execution time of six baseline compilers and place these baseline compilers in a two-dimensional tradeoff space with other execution tiers for Wasm.
Ettore Tiotto, Víctor L. Pérez, Whitney Tsang, Lukáš Sommer, Julian Oppermann, Victor Lomüller, Mehdi Goli, James Brodman
2 Mar 2024
TL;DR: An MLIR-based SYCL compiler is designed and implemented to address challenges in SYCL compiler design due to the loss of high-level structure and semantics and the inability to reason about host and device code simultaneously. The approach enables powerful device code optimizations and analyses across host and device code, resulting in speedups of up to 4.3x on benchmark applications compared to two LLVM-based SYCL implementations.
Abstract: Similar to other programming models, compilers for SYCL, the open programming model for heterogeneous computing based on C++, would benefit from access to higher-level intermediate representations. The loss of high-level structure and semantics caused by premature lowering to low-level intermediate representations and the inability to reason about host and device code simultaneously present major challenges for SYCL compilers. The MLIR compiler framework, through its dialect mechanism, makes it possible to model domain-specific, high-level intermediate representations and provides the necessary facilities to address these challenges. This work therefore describes practical experience with the design and implementation of an MLIR-based SYCL compiler. By modeling key elements of the SYCL programming model in host and device code in the MLIR dialect framework, the presented approach enables the implementation of powerful device code optimizations as well as analyses across host and device code. Compared to two LLVM-based SYCL implementations, this yields speedups of up to 4.3x on a collection of SYCL benchmark applications. Finally, this work also discusses challenges encountered in the design and implementation and how these could be addressed in the future.
TL;DR: Refined input leads to degraded output in some cases due to unexpected interactions and phase ordering issues. This work identifies and quantifies such issues and develops a testing method to identify and fix them.
Abstract: To optimize a program, a compiler needs precise information about it. Significant effort is dedicated to improving the ability of compilers to analyze programs, with the expectation that more information results in better optimization. But this assumption does not always hold: due to unexpected interactions between compiler components and phase ordering issues, sometimes more information leads to worse optimization. This can lead to wasted research and engineering effort whenever compilers cannot efficiently leverage additional information. In this work, we systematically examine the extent to which additional information can be detrimental to compilers. We consider two types of information: dead code, i.e., whether a program location is unreachable, and value ranges, i.e., the possible values a variable can take at a specific program location. Given a seed program, we refine it with additional information and check whether this degrades the output. Based on this approach, we develop a fully automated and effective testing method for identifying such issues, and through an extensive evaluation and analysis, we quantify their existence and prevalence in widely used compilers. In particular, we have reported 59 cases in GCC and LLVM, of which 55 have been confirmed or fixed so far, highlighting the practical relevance and value of our findings. This work’s fresh perspective opens up a new direction in understanding and improving compilers.
TL;DR: Unsolvable loop analysis: Automatically synthesizing invariants for unsolvable loops. The problem of automatically generating invariants for unsolvable loops is challenging, but this paper presents a technique for synthesizing invariants for such loops.
Abstract: Automatically generating invariants, key to computer-aided analysis of probabilistic and deterministic programs and compiler optimisation, is a challenging open problem. Whilst the problem is in general undecidable, the goal is settled for restricted classes of loops. For the class of solvable loops, introduced by Rodríguez-Carbonell and Kapur (in: Proceedings of the ISSAC, pp 266–273, 2004), one can automatically compute invariants from closed-form solutions of recurrence equations that model the loop behaviour. In this paper we establish a technique for invariant synthesis for loops that are not solvable, termed unsolvable loops. Our approach automatically partitions the program variables and identifies the so-called defective variables that characterise unsolvability. Herein we consider the following two applications. First, we present a novel technique that automatically synthesises polynomials from defective monomials, that admit closed-form solutions and thus lead to polynomial loop invariants. Second, given an unsolvable loop, we synthesise solvable loops with the following property: the invariant polynomials of the solvable loops are all invariants of the given unsolvable loop. Our implementation and experiments demonstrate both the feasibility and applicability of our approach to both deterministic and probabilistic programs.
TL;DR: The newly developed mzML and imzML libraries enable the processing of mass spectrometry data in Julia, offering high performance and scalability for large-scale MS-Omics and MS imaging data processing workflows.
Abstract: Julia combines the virtues of high-level and low-level programming languages: The code is human-readable, and the performance of the created binaries competes with machine-orientated compilers. Thus, Julia is popular in "Big Data" sciences. Reading mass spectrometry (MS) data with Julia was impossible until now due to missing libraries. Here, we present a Julia library for importing mass spectrometry (MS) data in HUPO standard mzML and imzML formats and demonstrate its function with direct and ambient ionization MS, liquid chromatography-MS, and MS imaging data on standard platforms (Windows, Linux, and Mac OS). The processing speed of Julia for reading imzML MS imaging files was up to 214 times faster than the comparable code in R. Julia can remove bottlenecks for computationally demanding tasks in large-scale MS-Omics and MS imaging data processing workflows and supports their agile development. In addition, time-critical and complex data evaluation tasks become possible, such as following the real-time monitoring of biological processes and pattern recognition in large MS imaging projects. Our mzML/imzML libraries and code examples are available under the terms of the MIT license from https://github.com/CINVESTAV-LABI/julia_mzML_imzML.
TL;DR: Tarski is an open-source software package, written in C/C++, that provides operations like formula simplification and quantifier elimination for Tarski formulas.
TL;DR: VeriPipe is a near-zero-cost compiler/architecture codesign scheme for soft error resilience that achieves negligible overhead while ensuring reliable data execution.
Abstract: Among existing schemes for soft error resilience, acoustic-sensor-based detection stands out owing to its ability to prevent silent data corruption at low hardware cost. However, the state-of-the-art work not only incurs a considerable run-time overhead but also complicates the processor pipeline with intrusive microarchitectural modifications, hindering its practical deployment in real silicon. To this end, this paper presents VeriPipe, a near-zero-cost compiler/architecture codesign scheme for soft error resilience. The VeriPipe compiler statically partitions the input program into a series of regions (epochs), while the VeriPipe hardware dynamically verifies that they are error-free. In particular, VeriPipe achieves a simple yet efficient region-level verification where each region goes through a three-stage (Execute, Verify, and Commit) verification pipeline to ensure the absence of soft errors before proceeding to the next region. Moreover, the VeriPipe hardware overlaps the Verify stage of each region with the Execute stage of the next region, thereby effectively hiding the Verify delay. Experiments with 43 applications from SPEC2006/2017/NPB-CPP/SPLASH3/DoE Mini-Apps highlight the negligible overheads of VeriPipe, i.e., an average of 1% run-time overhead and a storage overhead of only 3 registers and 1 countdown timer.
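A toy timing model, not taken from the paper, illustrates why overlapping the Verify stage of one region with the Execute stage of the next hides most of the verification delay. The function names, the fixed per-region verify latency, and the assumption that execution never stalls on verification are simplifications made here for illustration; the Commit stage is not modelled.

```python
def serial_time(exec_times, verify_time):
    """No overlap: every region is executed, then verified, before the next one starts."""
    return sum(exec_times) + verify_time * len(exec_times)

def overlapped_time(exec_times, verify_time):
    """Two-stage pipeline: region i is verified while region i+1 executes.
    Assumes a single verify unit, so verifications cannot overlap each other."""
    exec_end = 0
    verify_end = 0
    for e in exec_times:
        exec_end += e  # execution proceeds back to back
        # Verification starts once the region has executed and the verify unit is free.
        verify_end = max(exec_end, verify_end) + verify_time
    return verify_end

regions = [10, 10, 10]  # hypothetical execute latencies per region, in cycles
print(serial_time(regions, 3))      # 39
print(overlapped_time(regions, 3))  # 33
```

In this model only the final region's Verify latency is exposed as long as verification is shorter than the following region's execution.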
Y.-L. Li, Dongwei Xiao, Zhibo Liu, Qi Pang, Shuai Wang
12 Jul 2024
TL;DR: MT-MPC is introduced, a metamorphic testing framework specifically designed for MPC compilers to effectively uncover erroneous compilations, and proposes three metamorphic relations (MRs) that are tailored for MPC programs to mutate high-level MPC programs.
Abstract: The demanding need to perform privacy-preserving computations among multiple data owners has led to the prosperous development of secure multi-party computation (MPC) protocols. MPC offers protocols for parties to jointly compute a function over their inputs while keeping those inputs private. To date, MPC has been widely adopted in various real-world, privacy-sensitive sectors, such as healthcare and finance. Moreover, to ease the adoption of MPC, industrial and academic MPC compilers have been developed to automatically translate programs describing arbitrary MPC procedures into low-level MPC executables. Compiling high-level descriptions into high-efficiency MPC executables is challenging: the compilation often involves converting high-level languages into several intermediate representations (IR), e.g., arithmetic or boolean circuits, optimizing the computation/communication cost, and picking proper MPC protocols (and underlying virtual machines) for a particular task and threat model. Various optimizations and heuristics are employed during the compilation procedure to improve the efficiency of the generated MPC executables. Despite the prosperous adoption of MPC compilers by industrial vendors and academia, a principled and systematic understanding of the correctness of MPC compilers does not yet exist. To fill this critical gap, this paper introduces MT-MPC, a metamorphic testing (MT) framework specifically designed for MPC compilers to effectively uncover erroneous compilations. Our approach proposes three metamorphic relations (MRs) that are tailored for MPC programs to mutate high-level MPC programs (compiler inputs). We then examine if MPC compilers yield semantics-equivalent MPC executables regarding the original and mutated MPC programs by comparing their execution results. Real-world MPC compilers exhibit a high level of engineering quality. Nevertheless, we detected 4,772 inputs that can result in erroneous compilations in three popular MPC compilers available on the market. While the discovered error-triggering inputs do not cause the MPC compilers to crash directly, they can lead to the generation of incorrect MPC executables, jeopardizing the underlying dependability of the computation. With substantial manual effort and help from the MPC compiler developers, we uncovered thirteen bugs in these MPC compilers by debugging them using the error-triggering inputs. Our proposed testing frameworks and findings can be used to guide developers in their efforts to improve MPC compilers.
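The general shape of metamorphic compiler testing (mutate the input with a semantics-preserving relation, compile both versions, compare execution results) can be sketched on a toy expression compiler. Nothing here comes from MT-MPC itself: the abstract does not name its three relations, so the operand-swap relation, `toy_compile`, and the planted bug are all hypothetical.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul, "-": operator.sub}

def toy_compile(expr):
    """Toy 'compiler' from nested tuples to a Python callable.
    A bug is planted on purpose: ('*', 2, e) is 'strength-reduced' to e + 2 instead of e + e."""
    if isinstance(expr, str):
        return lambda env: env[expr]       # variable lookup
    if isinstance(expr, int):
        return lambda env: expr            # integer constant
    op, lhs, rhs = expr
    f, g = toy_compile(lhs), toy_compile(rhs)
    if op == "*" and lhs == 2:             # planted miscompilation
        return lambda env: g(env) + 2
    return lambda env: OPS[op](f(env), g(env))

def swap_commutative(expr):
    """Illustrative metamorphic relation: swapping operands of + and * preserves semantics."""
    if not isinstance(expr, tuple):
        return expr
    op, lhs, rhs = expr
    lhs, rhs = swap_commutative(lhs), swap_commutative(rhs)
    return (op, rhs, lhs) if op in ("+", "*") else (op, lhs, rhs)

def find_violations(expr, inputs):
    """Return the inputs on which the original and mutated programs disagree."""
    original = toy_compile(expr)
    mutated = toy_compile(swap_commutative(expr))
    return [env for env in inputs if original(env) != mutated(env)]

program = ("+", ("*", 2, "x"), "y")
inputs = [{"x": 2, "y": 1}, {"x": 5, "y": 1}]
print(find_violations(program, inputs))  # [{'x': 5, 'y': 1}]
```

The input with `x = 2` does not expose the bug because `2 + 2` happens to equal `2 * 2`, which mirrors the abstract's point that such miscompilations produce wrong results silently rather than crashing the compiler.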
TL;DR: EDPM is a domain-specific language for performance monitoring in C/C++ programs that simplifies and optimizes the process of gathering performance data. It reduces the number of lines of code required for instrumentation, enables flexible configurations of regions, and integrates seamlessly with existing software processes.
Abstract: The utilization of performance monitoring probes is a valuable tool for programmers to gather performance data. However, the manual insertion of these probes can result in an increase in code size, code obfuscation, and an added burden of learning different APIs associated with performance monitoring tools. To mitigate these issues, EDPM, an embedded domain-specific language, was developed to provide a higher level of abstraction for annotating regions of code that require instrumentation in C and C++ programs. This paper presents the design and implementation of EDPM and compares it to the well-known tool PAPI, in terms of required lines of code, flexibility in configuring regions, and performance overhead. The results of this study demonstrate that EDPM is a low-resolution profiling tool that offers a reduction in required lines of code and enables programmers to express various configurations of regions. Furthermore, the design of EDPM is such that its pragmas are ignored by the standard compiler, allowing for seamless integration into existing software processes without disrupting build systems or increasing the size of the executable. Additionally, the design of the EDPM pre-compiler allows for the extension of available performance counters while maintaining a high level of abstraction for programmers. Therefore, EDPM offers a promising solution to simplify and optimize performance monitoring in C and C++ programs.
TL;DR: Merging architectural design and robotic planning using interactive agent-based modelling for collective robotic construction leads to more designer control, adjustment for tolerances, and potential for architectural reconfiguration.
Abstract: Most research on collective robotic construction (CRC) separates the architectural design and robotic path planning phases of the overall construction process. Specifically, a structure is designed and afterwards sent to a planner or compiler that returns instructions for the assembly of the structure with the robots at hand. Although this has led to the assembly of spatially complex structures, it obscures the planning process, making it inaccessible to the architect. Considering that one potential of CRC is that the architect can perform as more than a designer of static structures, this paper showcases how agent-based modelling can collapse the architectural design and robotic planning phases for CRC. As such the overall construction workflow is upended, leading to more designer control, adjustment for tolerances in the construction process, a more general understanding of the processes, and the potential for architectural reconfiguration when working with CRC systems. This is demonstrated through the presentation of an agent-based model for assembling a planar structure using a previously developed CRC system.
TL;DR: This paper investigates numerical deviations introduced by the TVM compiler in deep learning models, proposing TracNe, an approach to reveal and isolate such deviations, and evaluating its effectiveness on 69 models, outperforming existing techniques in detection and isolation.
Abstract: Deep learning (DL) compilers are crucial for deploying DL models and speeding up their inference. Meanwhile, they may introduce numerical deviations, and ultimately undefined or unexpected behaviours, into DL models. Much effort has been spent on studying DL compilers’ logic bugs, whilst researchers often overlook numerical deviations introduced by DL compilers. This paper studies hazards and root causes of numerical deviations introduced by Apache’s TVM, a state-of-the-art, open-source DL compiler. This paper further proposes TracNe, an approach composed of a MEGA searcher and a tracer, to reveal and isolate compiler-introduced numerical deviations. Given a DL model, the MEGA searcher searches for deviation-triggering inputs and checks whether the model suffers from numerical deviations. The tracer performs a semantic-based matching between the models before and after compilation, isolating an erroneous scope in the compiled model. We evaluate TracNe on 60 synthesis and 9 industrial-edge models. The results show that TracNe reveals 5.6× more deviation-prone models than two typical search algorithms (MCMC and DEMC); it also localizes 64% more deviations than PLiner, a state-of-the-art isolation technique, while reducing PLiner's isolation time by 94.6%.
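To make the notion of a deviation-triggering input concrete, here is a minimal Python sketch. It does not use TVM and it is not TracNe's MEGA searcher: `reference`, `rewritten`, and `search_worst_input` are hypothetical names, and a plain sweep over candidate inputs stands in for the search. The two functions are equal on paper, in the way a model is meant to be equal before and after compilation, yet they disagree in floating point.

```python
import math


def reference(x: float) -> float:
    # Stand-in for the model before compilation (assumption for illustration).
    return math.log1p(x)


def rewritten(x: float) -> float:
    # Stand-in for the model after a hypothetical rewrite. Mathematically the
    # same as log1p(x), but 1.0 + x rounds away the low bits of a small x.
    return math.log(1.0 + x)


def relative_deviation(x: float) -> float:
    # Relative error of the rewritten form against the reference.
    ref = reference(x)
    return abs(ref - rewritten(x)) / abs(ref)


def search_worst_input(candidates):
    # Plain sweep: return the candidate with the largest relative deviation.
    return max(candidates, key=relative_deviation)


candidates = [10.0 ** -e for e in range(1, 21)]
worst = search_worst_input(candidates)
print(worst, relative_deviation(worst))
```

On IEEE-754 doubles, inputs below roughly 1e-16 are absorbed entirely by `1.0 + x`, so the rewritten form returns 0 and the relative deviation reaches 1, while an input such as 0.1 shows almost no deviation. A smarter search is only needed when such regions are too sparse for a sweep to hit.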
Mengxiao Zhang, Yongqiang Tian, Zhenyang Xu, Yiwen Dong, Shin Hwei Tan, C. P. Sun
11 Sep 2024
TL;DR: This paper proposes LPR, a novel technique that leverages large language models (LLMs) to perform language-specific program reduction for multiple languages, achieving better results than state-of-the-art Vulcan and C-Reduce in terms of program size and efficiency.
Abstract: Program reduction is a widely used technique to facilitate debugging compilers by automatically minimizing programs that trigger compiler bugs. Existing program reduction techniques are either generic to a wide range of languages (such as Perses and Vulcan) or specifically optimized for one certain language by exploiting language-specific knowledge (e.g., C-Reduce). However, synergistically combining both generality across languages and optimality to a specific language in program reduction is yet to be explored. This paper proposes LPR, the first LLM-aided technique leveraging LLMs to perform language-specific program reduction for multiple languages. The key insight is to utilize both the language generality of program reducers such as Perses and the language-specific semantics learned by LLMs. Concretely, language-generic program reducers can efficiently reduce programs into a small size that is suitable for LLMs to process; LLMs can effectively transform programs via the learned semantics to create new reduction opportunities for the language-generic program reducers to further reduce the programs. Our thorough evaluation on 50 benchmarks across three programming languages (i.e., C, Rust and JavaScript) has demonstrated LPR's practicality and superiority over Vulcan, the state-of-the-art language-generic program reducer. For effectiveness, LPR surpasses Vulcan by producing 24.93%, 4.47%, and 11.71% smaller programs on benchmarks in C, Rust and JavaScript, respectively. Moreover, LPR and Vulcan have the potential to complement each other. For the C language, for which C-Reduce is optimized, by applying Vulcan to the output produced by LPR, we can attain program sizes that are on par with those achieved by C-Reduce. For efficiency perceived by users, LPR is more efficient when reducing large and complex programs, taking 10.77%, 34.88%, and 36.96% less time than Vulcan to finish all the benchmarks in C, Rust and JavaScript, respectively.
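The alternation described in this abstract (a language-generic reducer shrinks the program, a semantics-aware transformation opens up new deletions, and the reducer runs again) can be sketched as a small loop. This is a toy under stated assumptions, not LPR itself: `generic_reduce` is a greedy line-deletion stand-in for a reducer such as Perses, `inline_constant` is a hard-coded stand-in for the LLM transformation, and `still_fails` is a hypothetical oracle that treats "calls `trigger(1)` exactly once" as the bug-triggering property.

```python
from typing import Callable, List

Lines = List[str]


def still_fails(lines: Lines) -> bool:
    # Hypothetical oracle: the program "triggers the bug" if it runs without
    # error and calls trigger(1) exactly once.
    seen = []
    env = {"trigger": seen.append, "unused": lambda: None}
    try:
        exec("\n".join(lines), env)
    except Exception:
        return False
    return seen == [1]


def generic_reduce(lines: Lines, oracle: Callable[[Lines], bool]) -> Lines:
    # Greedy one-line deletion, repeated until no single deletion keeps the
    # property. A stand-in for a language-generic reducer.
    changed = True
    while changed:
        changed = False
        for i in range(len(lines)):
            candidate = lines[:i] + lines[i + 1:]
            if oracle(candidate):
                lines, changed = candidate, True
                break
    return lines


def inline_constant(lines: Lines) -> Lines:
    # Stand-in for the LLM step: a semantics-aware rewrite that a purely
    # syntactic reducer cannot find on its own.
    return [line.replace("trigger(y)", "trigger(1)") for line in lines]


def alternate_reduce(lines, oracle, transform, max_rounds=10):
    # Reduce, transform, reduce again; stop when a round no longer helps.
    lines = generic_reduce(lines, oracle)
    for _ in range(max_rounds):
        proposal = transform(lines)
        if not oracle(proposal):
            break  # the transformation lost the property; discard it
        reduced = generic_reduce(proposal, oracle)
        if len(reduced) >= len(lines):
            break  # no further progress
        lines = reduced
    return lines


program = ["x = 1", "y = x", "unused()", "trigger(y)"]
print(generic_reduce(program, still_fails))
print(alternate_reduce(program, still_fails, inline_constant))
```

Deletion alone gets stuck at three lines because `trigger(y)` depends on both assignments; once the transformation inlines the constant, the reducer can delete them. Every transformed program is re-checked against the oracle, so an unreliable transformation can waste a round but cannot produce a result that no longer triggers the property.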
Qi Zhan, Xing Hu, Z. Y. Li, Xin Xia, David Lo, Shanping Li
12 Apr 2024
TL;DR: PS3 is a new approach to precisely test the presence of a patch in a large-scale software system based on semantic-level symbolic signature.
Abstract: During software development, vulnerabilities have posed a significant threat to users. Patches are the most effective way to combat vulnerabilities. In a large-scale software system, testing the presence of a security patch in every affected binary is crucial to ensure system security. Identifying whether a binary has been patched for a known vulnerability is challenging, as there may only be small differences between patched and vulnerable versions. Existing approaches mainly focus on detecting patches in binaries that are compiled with the same compiler options. However, it is common for developers to compile programs with very different compiler options in different situations, which causes inaccuracy for existing methods. In this paper, we propose a new approach named PS3, referring to precise patch presence test based on semantic-level symbolic signature. PS3 exploits symbolic emulation to extract signatures that are stable under different compiler options. PS3 can then precisely test the presence of the patch by comparing the signatures of the reference and the target at the semantic level.
TL;DR: LLM4CBI isolates compiler bugs by using large language models to generate effective witness test programs, and it adds new components to overcome the challenges of prompt formulation and prompt selection.
Abstract: Compiler bugs pose a significant threat to safety-critical applications, and promptly as well as effectively isolating these bugs is crucial for assuring the quality of compilers. However, the limited availability of debugging information on reported bugs complicates the compiler bug isolation task. Existing compiler bug isolation approaches typically convert the problem into a test program mutation problem, but they are still limited by ineffective mutation strategies or high human effort requirements. Drawing inspiration from the recent progress of pre-trained Large Language Models (LLMs), such as ChatGPT, in code generation, we propose a new approach named LLM4CBI to utilize LLMs to generate effective test programs for compiler bug isolation. However, using LLMs directly for test program mutation may not yield the desired results due to the challenges associated with formulating precise prompts and selecting specialized prompts. To overcome the challenges, three new components are designed in LLM4CBI. First, LLM4CBI utilizes a program complexity-guided prompt production component, which leverages data and control flow analysis to identify the most valuable variables and locations in programs for mutation. Second, LLM4CBI employs a memorized prompt selection component, which adopts reinforcement learning to select specialized prompts for mutating test programs continuously. Third, a test program validation component is proposed to select specialized feedback prompts to avoid repeating the same mistakes during the mutation process. Compared with the state-of-the-art approaches (DiWi and RecBi) over 120 real bugs from the two most popular compilers, namely GCC and LLVM, our evaluation demonstrates the advantages of LLM4CBI: It can isolate 69.70%/21.74% and 24.44%/8.92% more bugs than DiWi and RecBi within Top-1/Top-5 ranked results. 
Additionally, we demonstrate that the LLMs component (i.e., GPT-3.5) used in LLM4CBI can be easily replaced by other LLMs while still achieving reasonable results in comparison to related studies.
Abstract: Formally verified compilers and formally verified static analyzers are a solution to the problem that certain industries face when they have to demonstrate to authorities that the object code they run truly corresponds to its source code and that it satisfies certain properties. From a scientific and technological point of view, they are a challenge: not only a number of nontrivial invariants and algorithms must be proved to be correct, but also the implementation must be reasonably effective so that the tools operate within reasonable time. Many optimizations in compilers rely on static analysis, and thus a formally verified compiler entails formally verified static analyses. In this article, we explain some difficulties, possible solutions, design choices and trade-offs pertaining to verified static analysis, in particular when the solution of the analysis is expressed as some form of tree, map or set.
TL;DR: Modern HLS tools require hardware design knowledge and non-trivial design space exploration. This research proposes a multi-level approach to bridge the gap between HLS and high-level frameworks, improving productivity and performance.
Abstract: High-Level Synthesis (HLS) tools simplify the design of hardware accelerators by automatically generating Verilog/VHDL code starting from a general-purpose software programming language. Because of the mismatch between the requirements of hardware descriptions and the characteristics of input languages, HLS tools still require hardware design knowledge and non-trivial design space exploration, which might be an obstacle for domain scientists seeking to accelerate applications written, for example, in Python-based programming frameworks. This research proposes a modern approach based on multi-level compiler technologies to bridge the gap between HLS and high-level frameworks, and to use domain-specific abstractions to solve domain-specific problems. The key enabling technology is the Multi-Level Intermediate Representation (MLIR), a framework that supports building reusable compiler infrastructure. The proposed approach uses MLIR to introduce new optimizations at appropriate levels of abstraction outside the HLS tool while still relying on years of HLS research in the low-level hardware generation steps; users and developers of HLS tools can thus increase their productivity, obtain accelerators with higher performance, and not be limited by the features of a specific (possibly closed-source) backend. The presented tools and techniques were designed, implemented, and tested to synthesize machine learning algorithms, but they are broadly applicable to any input specification written in a language that has a translation to MLIR. Generated accelerators can be deployed on Field Programmable Gate Arrays or Application-Specific Integrated Circuits, and they can reach high energy efficiency without any manual optimization of the code.
Yue Zhang, Melih Sirlanci, Ruoyu Wang, Zhiqiang Lin
2 Dec 2024
TL;DR: This study empirically investigates the impact of 282 compiler flags on dynamic symbolic execution (DSE) performance, analyzing 2,978,976 binary programs, and finds that most optimizations slow down DSE, but some reduce instructions and paths, improving performance.
Abstract: Compiler optimizations intend to transform a program into a semantic-equivalent one with improved performance, but it is unclear how these optimizations may impact the performance of dynamic symbolic execution (DSE) on binary code. To systematically understand the impact of compiler optimizations on two popular DSE techniques (i.e., symbolic exploration and symbolic tracing), this paper presents an empirical study that quantifies 209 GCC compilation flags and 73 Clang compilation flags to reveal both positive and negative optimizations to DSE. Our data set contains 992 unique test cases, which are produced from 3,449 source files in the GCC test suite. After analyzing 2,978,976 binary programs that we compiled with two compilers and various compilation flags, we found that although some optimizations make DSE faster, most optimizations will actually slow down DSE. Our analysis further reveals root causes behind these impacts. The most positive impacts that optimizations have on DSE come from the reduction of the number of instructions and program paths, whereas negative impacts are caused by a series of unexpected behaviors, including increased numbers of instructions or program paths, library function inlining preventing DSE engines from using function summaries, and arithmetic optimizations leading to more sophisticated constraints. Being the first in-depth analysis on why compiler flags influence the performance of DSE, this project sheds light on program transformations that can be applied before performing DSE tasks for better performance.