TL;DR: This paper introduces L A, a novel Hoare-style program logic that supports partial and total correctness, derives contracts for arbitrary control flow, and allows flexible decomposition strategy, while avoiding approximations and invariants, for various instruction set architectures.
Abstract: Enabling Hoare-style reasoning for low-level code is attractive since it opens the way to regain structure and modularity in a domain where structure is essentially absent. The field, however, has not yet arrived at a fully satisfactory solution, in the sense of avoiding restrictions on control flow (important for compiler optimization), controlling access to intermediate program points (important for modularity), and supporting total correctness. Proposals in the literature support some of these properties, but a solution that meets them all is yet to be found. We introduce the novel Hoare-style program logic L A, which interprets postconditions relative to program points when these are first encountered. The logic supports both partial and total correctness, derives contracts for arbitrary control flow, and allows one to freely choose decomposition strategy during verification while avoiding step-indexed approximations and global invariants. The logic can be instantiated for a variety of concrete instruction set architectures and intermediate languages. The rules of L A have been verified in the interactive theorem prover HOL4 and integrated with the toolbox HolBA for semi-automated program verification, which supports the ARMv6, ARMv8 and RISC-V instruction sets.
Mohamed Gamal Talaat, Mostafa M. Hassan, Mohamed Sayed, Ishrat Ahmed
14 Dec 2025
TL;DR: GrammLLM generates comprehensive test cases for compiler validation by analyzing BNF grammar, constructing a graph-based model, and leveraging Large Language Models to propose and implement exhaustive scenarios, ensuring thorough coverage and highlighting corner cases.
Abstract: We present a method for generating comprehensive test cases of a specific language-construct by leveraging grammar rules defined in a Language Reference Manual (LRM). The approach begins by analyzing Backus-Naur Form (BNF) grammar representation, then constructing a graphical traversal that captures all relevant paths for the targeted feature. This graph-based model illustrates every node and transitional path governed by the feature's grammar rules, enabling a clear visualization of feature behaviour and potential edge cases. We then employ a Large Language Model (LLM) to automatically propose scenarios that exhaustively cover these paths. Finally, these scenarios are passed through another Large Language Model (LLM) to implement these scenarios. Our method ensures thorough coverage and highlights corner cases that can be overlooked by traditional testing approaches. The resulting flow not only streamlines the testing and validation process for programming language features but also provides a flexible framework for verifying new or evolving functionalities.
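As a rough illustration of the path-enumeration step described in this abstract, the sketch below treats a BNF-like grammar as a graph of nonterminals and alternatives and lists every terminal sequence reachable within a bounded expansion depth. The toy `if` grammar, the names `GRAMMAR` and `expand`, and the depth bound are all assumptions made for this example; the abstract does not specify how GrammLLM builds or traverses its graph.

```python
# Hypothetical toy grammar for an `if` construct (not taken from any real LRM).
# Keys are nonterminals; each alternative is a list of symbols.
GRAMMAR = {
    "if_stmt": [["if", "cond", "block", "else_part"]],
    "else_part": [[], ["else", "block"], ["else", "if_stmt"]],
    "cond": [["(", "expr", ")"]],
    "block": [["{", "}"]],
    "expr": [["id"], ["num"]],
}


def expand(symbol, depth):
    """Yield each terminal sequence derivable from `symbol`.

    `depth` bounds how many nested nonterminal expansions are allowed,
    which keeps recursive rules (else -> if_stmt) from looping forever.
    """
    if symbol not in GRAMMAR:
        # Terminals expand to themselves at any depth.
        yield [symbol]
        return
    if depth == 0:
        # Out of budget: this branch contributes no paths.
        return
    for alternative in GRAMMAR[symbol]:
        # Build the cross product of the expansions of each symbol in turn.
        partials = [[]]
        for sym in alternative:
            partials = [p + s for p in partials for s in expand(sym, depth - 1)]
        yield from partials


if __name__ == "__main__":
    for path in expand("if_stmt", 3):
        print(" ".join(path))
```

Each printed line is one candidate scenario skeleton; in a flow like the one described, such skeletons would be handed to a language model to be turned into concrete test programs.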
Xin Sun, Ming Zhong, Lulin Wang, Fang Lv, Xianbo He
24 Oct 2025
TL;DR: Researchers propose AutoTD, a framework that automates target description file code completion dataset generation for fine-tuning large language models, achieving substantial accuracy improvements in compiler backend development tasks with reduced manual effort.
Abstract: Mainstream compilers like LLVM and GCC rely on target description files (TDFs) written in domain-specific languages (DSLs) such as TableGen and Machine Description to encode hardware-specific information. However, developing and maintaining TDFs remains a time-consuming and error-prone process due to their complex syntax and lack of automation tools. While large language models (LLMs) have shown strong performance in code completion tasks, they struggle with TDFs due to limited DSL exposure in pretraining code corpora. To efficiently leverage LLMs for TDF development, we propose AutoTD, a DSL-agnostic framework that automates the construction of TDF code completion datasets for fine-tuning LLMs. Given DSL syntax rules and TDF code as input, AutoTD performs syntax-rule parsing, tokenization, and target-specific value extraction to generate a high-quality training dataset tailored for TDF code completion. Experiments show that four LLMs fine-tuned on AutoTD-generated datasets achieve substantial accuracy improvements in TDF code completion tasks. Notably, the fine-tuned QWen2.5-Coder-1.5B outperforms six larger-scale LLMs (6.7B – 14B parameter sizes), highlighting AutoTD’s effectiveness in enhancing compiler backend development efficiency and enabling LLMs to handle TDF code completion tasks with less manual effort.
Abstract: Compiler pass auto-tuning is critical for enhancing software performance, yet finding the optimal pass sequence for a specific program is an NP-hard problem. Traditional, general-purpose optimization flags like -O3 and -Oz adopt a one-size-fits-all approach, often failing to unlock a program's full performance potential. To address this challenge, we propose a novel Hybrid, Knowledge-Guided Evolutionary Framework. This framework intelligently guides online, personalized optimization using knowledge extracted from a large-scale offline analysis phase. During the offline stage, we construct a comprehensive compilation knowledge base composed of four key components: (1) Pass Behavioral Vectors to quantitatively capture the effectiveness of each optimization; (2) Pass Groups derived from clustering these vectors based on behavior similarity; (3) a Synergy Pass Graph to model beneficial sequential interactions; and (4) a library of Prototype Pass Sequences evolved for distinct program types. In the online stage, a bespoke genetic algorithm leverages this rich knowledge base through specially designed, knowledge-infused genetic operators. These operators transform the search by performing semantically-aware recombination and targeted, restorative mutations. On a suite of seven public datasets, our framework achieves an average of 11.0% additional LLVM IR instruction reduction over the highly-optimized opt -Oz baseline, demonstrating its state-of-the-art capability in discovering personalized, high-performance optimization sequences.
TL;DR: A fluctuation-guided adaptive random compiler for Hamiltonian simulation is proposed, adapting sampling probabilities to system dynamics, achieving higher fidelity, and prioritizing sensitive terms, with overhead reduced by classical shadows in discrete-variable, continuous-variable, and hybrid-variable systems.
Abstract: Stochastic methods offer an effective way to suppress coherent errors in quantum simulation. In particular, the randomized compilation protocol may reduce circuit depth by randomly sampling Hamiltonian terms rather than following the deterministic Trotter-Suzuki sequence. However, its fixed sampling distribution does not adapt to the dynamics of the system, limiting its accuracy. In this work, we propose a fluctuation-guided adaptive algorithm that updates sampling probabilities based on fluctuations of Hamiltonian terms to achieve higher simulation fidelity. Remarkably, the protocol admits an intuitive physical interpretation: Hamiltonian terms with greater sensitivity to the state evolution should be prioritized during sampling. The overhead of measuring the fluctuations needed to update the sampling probabilities is affordable, and can be further reduced substantially by classical shadows. We demonstrate the effectiveness of the method with numerical simulations across discrete-variable, continuous-variable and hybrid-variable systems.
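The sampling loop this abstract refers to can be pictured with a small classical sketch: draw Hamiltonian term indices from a probability distribution, and periodically replace that distribution with one derived from per-term fluctuation estimates. The abstract does not give the actual update rule, so the proportional rule in `reweight` is a placeholder assumption, and both function names are hypothetical.

```python
import random


def sample_terms(weights, n_steps, seed=0):
    """Draw `n_steps` term indices with probability proportional to `weights`.

    A fixed `weights` vector corresponds to the non-adaptive randomized
    protocol; an adaptive scheme would refresh it between batches.
    """
    rng = random.Random(seed)
    return rng.choices(range(len(weights)), weights=weights, k=n_steps)


def reweight(fluctuations, floor=1e-12):
    """Normalize nonnegative fluctuation estimates into sampling probabilities.

    ASSUMPTION: probabilities proportional to the estimated fluctuation.
    This is a stand-in for whatever rule the paper actually derives.
    The floor keeps every term sampleable even if its estimate is zero.
    """
    clipped = [max(f, floor) for f in fluctuations]
    total = sum(clipped)
    return [c / total for c in clipped]


if __name__ == "__main__":
    probs = reweight([1.0, 3.0])        # made-up fluctuation estimates
    print(probs)
    print(sample_terms(probs, 10))      # indices of terms to apply, in order
```

On real hardware the fluctuation estimates would come from measurements (the abstract mentions classical shadows for this); here they are just numbers supplied by the caller.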
Florian Rupprecht, Jason Kai, B. Shrestha, Steven Giavasis, Ting Xu, Tristan Glatard, Michael P. Milham, Gregory Kiar
30 Jul 2025
TL;DR: Styx, a compiler, generates language-native wrapper functions from tool metadata, enabling seamless integration of command-line tools in data science ecosystems, with NiWrap providing a proof-of-concept implementation for neuroimaging tools in Python, R, and TypeScript.
Abstract: In numerous scientific domains, established tools have often been developed with complex command-line interfaces. Such is the case for brain imaging and bioinformatics, making the use of powerful legacy tools in modern workflow paradigms challenging. We present (i) Styx, a compiler for generating language-native wrapper functions from static tool metadata, leading to seamless integration of command-line tools within the data science ecosystem. Alongside Styx, we have created (ii) NiWrap, a collection of more than 1900 neuroimaging command-line function descriptions as a proof-of-concept implementation. These interfaces, available in Python, R, and TypeScript (available at https://github.com/styx-api), significantly reduce the complexity of writing and interpreting software pipelines, particularly when composing workflows across packages with distinct API standards. The compiler architecture of Styx facilitates maintainability and portability across computing environments. As with all metadata-dependent infrastructure, creating sufficient metadata annotations remains a barrier to adoption. Accordingly, NiWrap demonstrates approaches that lower this barrier through direct source code extraction and LLM-assisted documentation parsing. Together, Styx and NiWrap offer a sustainable solution for interfacing diverse command-line tools with modern data science ecosystems. This modular approach enhances reproducibility and efficiency in pipeline development while ensuring portability across computing environments and programming languages.
TL;DR: IntraJ, a Java code analysis framework, provides interactive analysis results directly in the editor, leveraging Reference Attribute Grammars for on-demand evaluation, and achieves a response time of under 0.1 seconds for most compilation units.
Abstract: Static analysis tools play a crucial role in software development by detecting bugs and vulnerabilities. However, running these tools separately from the code editing process often causes developers to switch contexts, which can reduce productivity. Previous work has shown how Reference Attribute Grammars (RAGs) can be used for declarative implementation of competitive tooling for intraprocedural control-flow and dataflow analysis of Java source code, embodied in the tool IntraJ. In this paper, we demonstrate how IntraJ can be leveraged to provide interactive analysis results directly in the editor, similar to compile-time error detection, relying on automatic on-demand evaluation of RAGs. We discuss the architecture of IntraJ, and demonstrate how it can be integrated into the development process in three different ways: in the command line, in an editor integration based on the Language Server Protocol, and in an integration with the debugging tool CodeProber. We showcase the extensibility of IntraJ by illustrating how new client analyses and language constructs can be added to the framework through RAG specifications. Finally, we evaluate the interactive performance of IntraJ on a set of real-world Java benchmarks, demonstrating that IntraJ can provide interactive feedback to developers, achieving a response time of under 0.1 seconds for most compilation units.
Mengyao Chen, Pengyan Yan, Lin Han, Haoran Li, Cuixia Wang
3 Jan 2025
TL;DR: This study optimizes thread-level parallelism in GCC compiler's OpenMP implementation by merging parallel regions, reducing overhead, and improving performance by 20% through experimental validation using NPB 3.4-OMP test suite.
Abstract: The automatic OpenMP implementation in the current GCC compiler adopts the fork-join model, where frequent creation and convergence of thread groups result in significant management control overhead. This paper studies methods to reduce thread group creation and convergence to improve the efficiency of automatic OpenMP programs. A universal optimization method for merging parallel regions is proposed in this paper to address the fork-join model. Through variable attribute modifications, handling of serial statements, and synchronization operation optimization, adjacent continuous parallel regions are merged into a larger parallel region to reduce parameter passing within the parallel region and lower the overhead caused by the creation and destruction of parallel regions. This work is implemented based on the GCC 10.3.0 compiler and experimentally validated using the NPB 3.4-OMP test suite, achieving an average overall performance improvement of 20%. The experimental results demonstrate the effectiveness and generality of the method proposed in this paper. This method can effectively enhance the runtime efficiency of OpenMP programs, serve as a reference for optimizing the implementation of OpenMP programs, and provide support for thread dynamic merging techniques in AI compilation.
Abstract: This report summarises our findings on the maturity of Flang, the Fortran compiler of the LLVM project. We have analysed two aspects that are important for the DARE project: the current OpenMP support, and the general compliance and correctness of the implementation based on existing Fortran test suites. We conclude that Flang delivers a reasonable level of language compliance, but the OpenMP support is still somewhat lacking, especially for features that go beyond OpenMP 2.5. However, the support is slightly scattered and is influenced by how difficult it is to implement OpenMP functionality in the existing LLVM infrastructure. This work has received funding from the DARE SGA1 Project, from the European High-Performance Computing Joint Undertaking (JU) under Grant Agreement No 101202459 and from PCI2024-161687-3 Project funded by MICIU/AEI/10.13039/501100011033 and European Union “NextGenerationEU”/PRTR.
Abstract: This thesis covers the extension of the existing Rapid codebase to support native compilation of Idris programs which can make full use of multiple processor cores. After giving an outline of the current architecture I describe the required changes to the compiler and runtime system. Idris is a high-level, general purpose, functional programming language supporting dependent and linear types, focusing on fast compilation and execution. Dependent types are a promising approach for several use cases, from increasing confidence in the correctness of selected code sections to formally verified proofs of complete programs. Rapid is a compiler backend plugging into the main Idris compiler, capable of generating native machine code and aiming for full compatibility with existing programs written in Idris. Rapid’s runtime system provides the generated code with interfaces to the OS and includes a generational tracing garbage collector for automatic memory management. Performance measurements of CPU-bound workloads indicate good efficiency at thread counts typical for personal computing devices. While the marginal performance gains decrease with higher core counts, opportunities for further improvements are presented.
Abstract: This deposit contains the supplementary Alloy model (toy_compilomorphism.als) for the paper "Compilomorphism: A Principled Approach to Validating Compilers for Structured Data Pipelines." The model provides a simplified, illustrative encoding of the core compilomorphism concepts. To run the model: 1. Download and install the Alloy Analyzer (alloytools.org) and Java. 2. Open `toy_compilomorphism.als` in the Alloy Analyzer. 3. Execute the `check` and `run` commands found at the end of the file. Detailed instructions, including expected outcomes for each command, are provided in the README.md file included in this deposit.
TL;DR: This study introduces MLIRTracer, a top-down fuzzing approach for MLIR, addressing limitations of random fuzzing by systematically traversing the hierarchical code space and prioritizing bug-prone areas, detecting 73 bugs with 61 already resolved.
Abstract: MLIR is a new way of creating compiler infrastructures that can be easily reused and extended. Current MLIR fuzzing methods focus primarily on test case generation or mutation using randomly selected passes. However, they often overlook the hierarchical structure of MLIR, resulting in inefficiencies in bug detection, especially for issues triggered by downstream dialects. Random testing lacks a focused approach to exploring the code space, resulting in wasted resources on normal components and overlooking bug-prone areas. To address these limitations, we introduce MLIRTracer, a top-down fuzzing approach that targets the highest level of MLIR programs (tosa IR) with a directed testing strategy. Our method systematically traverses the hierarchical code space of MLIR, from tosa IR to the lower levels, while prioritizing tests of bug-prone areas through directed exploration. MLIRTracer has successfully detected 73 bugs, with 61 already resolved by the MLIR developers.
TL;DR: Researchers propose a unified approach to modeling compilation as a reduction system on open terms in multi-language semantics, eliminating duplication and providing insights into compiler correctness, secure compilation, and type preservation.
Abstract: Modeling interoperability between programs in different languages is a key problem when modeling verified and secure compilation, which has been successfully addressed using multi-language semantics. Unfortunately, existing models of compilation using multi-language semantics define two variants of each compiler pass: a syntactic translation on open terms to model compilation, and a run-time translation of closed terms at multi-language boundaries to model interoperability. In this talk, I discuss a work-in-progress approach to uniformly model a compiler entirely as a reduction system on open terms in a multi-language semantics, rather than as a syntactic translation. This simultaneously defines the compiler and the interoperability semantics, reducing duplication. It also provides interesting semantic insights. Normalization of the cross-language redexes performs ahead-of-time (AOT) compilation. Evaluation in the multi-language models just-in-time (JIT) compilation. Confluence of multi-language reduction implies compiler correctness, and part of the secure compilation proof (full abstraction), enabling focus on the difficult part of the proof. Subject reduction of the multi-language reduction implies type-preservation of the compiler.
Abstract: Context: Just-in-Time (JIT) compilers are able to specialize the code they generate according to a continuous profiling of the running programs. This gives them an advantage when compared to Ahead-of-Time (AoT) compilers that must choose the code to generate once and for all. Inquiry: Is it possible to improve the performance of AoT compilers by adding Dynamic Binary Modification (DBM) to the executions? Approach: We added to the Hopc AoT JavaScript compiler a new optimization based on DBM to the inline cache (IC), a classical optimization dynamic languages use to implement object property accesses efficiently. Knowledge: Reducing the number of memory accesses, as the new optimization does, does not shorten execution times on contemporary architectures. Grounding: The DBM optimization we have implemented is fully operational on x86_64 architectures. We have conducted several experiments to evaluate its impact on performance and to study the reasons for the lack of acceleration. Importance: The (negative) result we present in this paper sheds new light on the best strategy to be used to implement dynamic languages. It shows that the days when removing instructions or memory reads always yielded a speedup are over. Nowadays, implementing sophisticated compiler optimizations is only worth the effort if the processor is not able by itself to accelerate the code. This result applies to AoT compilers as well as JIT compilers.
Abstract: During the 2024 Pwn2Own competition, security researcher Manfred Paul demonstrated two "critical severity" JavaScript vulnerabilities that could allow an attacker to execute arbitrary code remotely on Mozilla Firefox browsers [4, 5]. Although this was a competition, similar bugs are regularly exploited in the wild [21]. The fact that these vulnerabilities are regularly exploited by attackers showcases weaknesses in Firefox’s security model. We hope to tighten this security model by sandboxing the Firefox JIT compiler using software fault isolation (SFI) techniques. Our research hopes to make it impossible for an attacker to utilize arbitrary read/write JIT bugs to achieve remote code execution. We were able to accomplish this with a performance degradation of ∼13.7%. Implementing the heap and stack masking portions of the sandbox only required modifying 4000 lines of code across 72 files.
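The masking idea behind software fault isolation can be shown with plain integer arithmetic: before a memory access, the address is forced into the sandbox region so that even an attacker-controlled value cannot point outside it. The sketch below is only an illustration of that general technique; the base address, region size, and the name `mask_address` are hypothetical and are not taken from the Firefox work described above.

```python
# Hypothetical sandbox layout: a 4 GiB region whose base is aligned to its size.
SANDBOX_SIZE = 1 << 32          # must be a power of two for masking to work
SANDBOX_BASE = 0x1_0000_0000    # must be a multiple of SANDBOX_SIZE


def mask_address(addr: int) -> int:
    """Force `addr` into [SANDBOX_BASE, SANDBOX_BASE + SANDBOX_SIZE).

    The low bits select an offset inside the region; the high bits are
    discarded and replaced with the sandbox base.
    """
    offset = addr & (SANDBOX_SIZE - 1)
    return SANDBOX_BASE | offset


if __name__ == "__main__":
    # An in-sandbox address is left unchanged.
    print(hex(mask_address(0x1_0000_0010)))
    # An arbitrary out-of-sandbox address is redirected into the region.
    print(hex(mask_address(0xDEADBEEF_CAFEBABE)))
```

In a real JIT the same two operations would be emitted as machine instructions in front of each guarded load or store, which is where the reported runtime overhead comes from.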
TL;DR: This thesis introduces LibraryX, a framework for cross-library-call optimization, enabling scientific applications to use standard domain-specific libraries while replacing them with optimized implementations during execution without source code modification through semantic capture and code generation.
Abstract: Developing scientific computing applications that are both maintainable and achieve good performance is a challenging task. At the software level, software design principles increase productivity but obfuscate the ability to easily discover performance opportunities. This is exacerbated by the complexity of modern hardware systems, which require deep hardware knowledge to achieve good performance. This leaves application developers with two main options for performance-critical operations: use domain-specific software libraries to write their applications, or ask a performance expert to optimize their application. The library approach has the benefit of providing usability with good performance, but leaves performance on the table, as there are opportunities to optimize across the library call boundary. Unfortunately, a compiler cannot easily find these because library calls are treated as black boxes. A performance expert can provide the best performance, but has to write specialized code removing the library calls and any usability. To address the gap between writing productive software and achieving optimized performance, this thesis introduces LibraryX, a framework for cross-library-call optimization. Using LibraryX, scientific applications can be written using standard domain-specific libraries which will be replaced with an optimized implementation during execution without source code modification. This is done through a combination of library call semantic capture, optimized code generation, and runtime compilation. LibraryX is able to recognize the semantics of library calls or what operation the library call is performing. The computation semantics is then sent to the SPIRAL code generation system for analysis and optimization, producing an optimized implementation.
This implementation is then executed in place of the original library call sequence. We showcase the high level design of the LibraryX framework, specifically showing how it can be used for a few key domains within scientific computing. These domains include spectral methods, graph analytics/sparse linear algebra, and structured grids. We demonstrate how LibraryX uses various library capture mechanisms for each domain and how the SPIRAL code generation system can optimize specific library call sequences for each domain. This enables LibraryX to cross not only the library call boundary, but also the library domain boundary, allowing developers to use different domain libraries simultaneously. We then showcase how LibraryX can be extended to support multi-accelerator systems by plugging into a runtime system called IRIS. We finally show how LibraryX can be extended to support legacy applications written in Fortran and act as a hardware accelerator offloading system.
TL;DR: This paper presents a type-directed approach to calculating abstract machines and compilers, starting from an intrinsically typed interpreter, and applies it to derive a compiler for a simple expression language and an optimizing evaluator for the simply typed lambda calculus.
Abstract: This paper explores a principled approach to calculating abstract machines and associated compilers, starting from an intrinsically typed interpreter. After deriving a compiler for a simple expression language in some detail, the first steps of this calculation are repeated to derive an optimizing evaluator for the simply typed lambda calculus.
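To make the interpreter-to-compiler relationship concrete, here is a minimal sketch of the kind of end result such a calculation produces for an arithmetic expression language: an evaluator, a compiler to stack-machine code, and a machine that runs that code, related by the property that running compiled code leaves the evaluated value on the stack. Everything here (the `Val`/`Add` syntax, the `PUSH`/`ADD` instructions, the function names) is an assumed textbook-style example written in untyped Python; it does not reproduce the paper's intrinsically typed development or its derivation steps.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Val:
    n: int


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


Expr = Union[Val, Add]


def evaluate(e):
    """Reference interpreter: the specification the compiler must respect."""
    if isinstance(e, Val):
        return e.n
    return evaluate(e.left) + evaluate(e.right)


def compile_expr(e):
    """Translate an expression into a flat list of stack-machine instructions."""
    if isinstance(e, Val):
        return [("PUSH", e.n)]
    # Operands first, then the operator (postfix order).
    return compile_expr(e.left) + compile_expr(e.right) + [("ADD",)]


def run(code, stack=None):
    """Execute instructions on a stack and return the final stack."""
    stack = [] if stack is None else list(stack)
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        else:  # ("ADD",)
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
    return stack


if __name__ == "__main__":
    expr = Add(Val(1), Add(Val(2), Val(3)))
    # Correctness property: run(compile_expr(e), s) == s + [evaluate(e)]
    print(run(compile_expr(expr)), evaluate(expr))
```

In a calculational derivation the definitions of `compile_expr` and `run` are not guessed up front but fall out of solving the correctness equation by induction on the expression; the code above only shows what the solved form looks like.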
TL;DR: This paper proposes a novel secure compilation framework that lifts security guarantees of Spectre countermeasures from weaker to stronger speculative semantics, enabling comprehensive security analysis of 9 countermeasures against 5 Spectre attack classes.
Abstract: Mainstream compilers implement different countermeasures to prevent specific classes of speculative execution attacks. Unfortunately, these countermeasures either lack formal guarantees or come with proofs restricted to speculative semantics capturing only a subset of the speculation mechanisms supported by modern CPUs, thereby limiting their practical applicability. Ideally, these security proofs should target a speculative semantics capturing the effects of all speculation mechanisms implemented in modern CPUs. However, this is impractical and requires new secure compilation proofs to support additional speculation mechanisms. In this paper, we address this problem by proposing a novel secure compilation framework that allows lifting the security guarantees provided by Spectre countermeasures from weaker speculative semantics (ignoring some speculation mechanisms) to stronger ones (accounting for the omitted mechanisms) without requiring new secure compilation proofs. Using our lifting framework, we performed the most comprehensive security analysis of Spectre countermeasures implemented in mainstream compilers to date. Our analysis spans 9 different countermeasures against 5 classes of Spectre attacks, which we proved secure against a speculative semantics accounting for 5 different speculation mechanisms. Our analysis highlights that fence-based and retpoline-based countermeasures can be securely lifted to the strongest speculative semantics under study. In contrast, countermeasures based on speculative load hardening cannot be securely lifted to semantics supporting indirect jump speculation.