TL;DR: Experimental results over a range of benchmarks including large real-world programs such as the compiler itself and the ML-Kit indicate that the compile-time cost of flow analysis and closure conversion is extremely small, and that the dispatches and coercions inserted by the algorithm are dynamically infrequent.
Abstract: This paper presents a new closure conversion algorithm for simply-typed languages. We have have implemented the algorithm as part of MLton, a whole-program compiler for Standard ML (SML). MLton first applies all functors and eliminates polymorphism by code duplication to produce a simply-typed program. MLton then performs closure conversion to produce a first-order, simply-typed program. In contrast to typical functional language implementations, MLton performs most optimizations on the first-order language, after closure conversion. There are two notable contributions of our work:
1. The translation uses a general flow-analysis framework which includes OCFA. The types in the target language fully capture the results of the analysis. MLton uses the analysis to insert coercions to translate between different representations of a closure to preserve type correctness of the target language program.
2. The translation is practical. Experimental results over a range of benchmarks including large real-world programs such as the compiler itself and the ML-Kit [25] indicate that the compile-time cost of flow analysis and closure conversion is extremely small, and that the dispatches and coercions inserted by the algorithm are dynamically infrequent.
TL;DR: This paper gives a formal presentation of contification in MLton, a whole-program optimizing Standard ML compiler, as well as a new algorithm based on the dominator tree of a program's call graph that is optimal.
Abstract: Contification is a compiler optimization that turns a function that always returns to the same place into a continuation. Compilers for functional languages use contification to expose the control-flow information that is required by many optimizations, including traditional loop optimizations. This paper gives a formal presentation of contification in MLton, a whole-program optimizing Standard ML compiler. We present two existing algorithms for contification in our framework, as well as a new algorithm based on the dominator tree of a program's call graph. We prove that the dominator algorithm is optimal. We present benchmark results on realistic SML programs demonstrating that contification has minimal overhead on compile time and significantly improves run time.
TL;DR: The proposed compilation technique can produce self-adjusting programs whose performance is consistent with the asymptotic bounds and experimental results obtained via manual rewriting (up to a constant factor) and is validated by extending Standard ML and by integrating the transformation into MLton, a whole-program optimizing compiler for Standard ML.
Abstract: Self-adjusting programs respond automatically and efficiently to input changes by tracking the dynamic data dependences of the computation and incrementally updating the output as needed. In order to identify data dependences, previously proposed approaches require the user to make use of a set of monadic primitives. Rewriting an ordinary program into a self-adjusting program with these primitives, however, can be difficult and error-prone due to various monadic and proper-usage restrictions, some of which cannot be enforced statically. Previous work therefore suggests that self-adjusting computation would benefit from direct language and compiler support.In this paper, we propose a language-based technique for writing and compiling self-adjusting programs from ordinary programs. To compile self-adjusting programs, we use a continuation-passing style (cps) transformation to automatically infer a conservative approximation of the dynamic data dependences. To prevent the inferred, approximate dependences from degrading the performance of change propagation, we generate memoized versions of cps functions that can reuse previous work even when they are invoked with different continuations. The approach offers a natural programming style that requires minimal changes to existing code, while statically enforcing the invariants required by self-adjusting computation.We validate the feasibility of our proposal by extending Standard ML and by integrating the transformation into MLton, a whole-program optimizing compiler for Standard ML. Our experiments indicate that the proposed compilation technique can produce self-adjusting programs whose performance is consistent with the asymptotic bounds and experimental results obtained via manual rewriting (up to a constant factor).
TL;DR: Benchmark results indicate parasitic threads enable construction of scalable and efficient message-passing parallel programs in MLton, a whole-program optimizing compiler and runtime for Standard ML.
Abstract: Message-passing is an attractive thread coordination mechanism because it cleanly delineates points in an execution when threads communicate, and unifies synchronization and communication: a sender is allowed to proceed only when a receiver willing to accept the data being sent is available and vice versa. To enable greater performance, however, asynchronous or non-blocking extensions are usually provided that allow senders and receivers to proceed even if a matching partner is unavailable. Lightweight threads with synchronous message-passing can be used to encapsulate asynchronous message-passing operations, although such implementations have greater thread management costs that can negatively impact scalability and performance.This paper introduces parasitic threads, a novel mechanism for expressing asynchronous computation, that combines the efficiency of a non-declarative solution with the ease of use provided by languages with first-class channels and lightweight threads. A parasitic thread is a lightweight data structure that encapsulates an asynchronous computation using the resources provided by a host thread. Parasitic threads need not execute cooperatively, impose no restrictions on the computations they encapsulate, or the communication actions they perform, and impose no additional burden on thread scheduling mechanisms.We describe an implementation of parasitic threads in MLton, a whole-program optimizing compiler and runtime for Standard ML. Benchmark results indicate parasitic threads enable construction of scalable and efficient message-passing parallel programs.
TL;DR: This paper presents a safe and efficient checkpointing mechanism for Concurrent ML that can be used to recover from transient faults, and introduces a new linguistic abstraction, called stabilizers, that permits the specification of per-thread monitors and the restoration of globally consistent checkpoints.
Abstract: Transient faults that arise in large-scale software systems can often be repaired by reexecuting the code in which they occur. Ascribing a meaningful semantics for safe reexecution in multithreaded code is not obvious, however. For a thread to reexecute correctly a region of code, it must ensure that all other threads that have witnessed its unwanted effects within that region are also reverted to a meaningful earlier state. If not done properly, data inconsistencies and other undesirable behavior might result. However, automatically determining what constitutes a consistent global checkpoint is not straightforward because thread interactions are a dynamic property of the program. In this paper, we present a safe and efficient checkpointing mechanism for Concurrent ML (CML) that can be used to recover from transient faults. We introduce a new linguistic abstraction, called stabilizers, that permits the specification of per-thread monitors and the restoration of globally consistent checkpoints. Safe global states are computed through lightweight monitoring of communication events among threads (e.g., message-passing operations or updates to shared variables). We present a formal characterization of its design, and provide a detailed description of its implementation within MLton, a whole-program optimizing compiler for Standard ML. Our experimental results on microbenchmarks as well as several realistic, multithreaded, server-style CML applications, including a web server and a windowing toolkit, show that the overheads to use stabilizers are small, and lead us to conclude that they are a viable mechanism for defining safe checkpoints in concurrent functional programs.