Practical automatic loop specialization
Taewook Oh, Hanjun Kim, Nick P. Johnson, Jae W. Lee, David I. August
- 16 Mar 2013
- Vol. 41, Issue 1, pp. 419-430
TL;DR: Invariant-induced Pattern based Loop Specialization (IPLS), the first fully-automatic specialization technique designed for everyday use on real applications, profiles the values of instructions that depend solely on invariants and recognizes repeating patterns across multiple iterations of hot loops.
Abstract: Program specialization optimizes a program with respect to program invariants, including known, fixed inputs. These invariants can be used to enable optimizations that are otherwise unsound. In many applications, a program input induces predictable patterns of values across loop iterations, yet existing specializers cannot fully capitalize on this opportunity. To address this limitation, we present Invariant-induced Pattern based Loop Specialization (IPLS), the first fully-automatic specialization technique designed for everyday use on real applications. Using dynamic information-flow tracking, IPLS profiles the values of instructions that depend solely on invariants and recognizes repeating patterns across multiple iterations of hot loops. IPLS then specializes these loops, using those patterns to predict values across a large window of loop iterations. This enables aggressive optimization of the loop; conceptually, this optimization reconstructs recurring patterns induced by the input as concrete loops in the specialized binary. IPLS specializes real-world programs that prior techniques fail to specialize without requiring hints from the user. Experiments demonstrate a geomean speedup of 14.1% with a maximum speedup of 138% over the original codes when evaluated on three script interpreters and eleven scripts each.
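The pattern-recognition step described in the abstract can be illustrated with a minimal sketch. It assumes the profiler has already recorded, for one invariant-dependent instruction, the value that instruction produced on each iteration of a hot loop. The sketch then searches for the smallest period p such that the trace repeats every p iterations. This exact-match period search is an illustrative stand-in, not the detection algorithm IPLS actually uses; the function name `smallest_period` and its `min_repeats` parameter are hypothetical.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def smallest_period(trace: Sequence[T], min_repeats: int = 2) -> Optional[int]:
    """Return the smallest period p such that every value in the trace equals
    the value observed p iterations earlier (the last repetition may be
    partial). The trace must contain at least `min_repeats` full repetitions
    of its first p values; return None if no such p exists."""
    n = len(trace)
    # Limiting p to n // min_repeats guarantees at least min_repeats full copies.
    for p in range(1, n // min_repeats + 1):
        if all(trace[i] == trace[i - p] for i in range(p, n)):
            return p
    return None
```

For example, `smallest_period(["LOAD", "ADD", "STORE"] * 4)` returns 3, so a specializer could predict the value of this instruction for any later iteration i as `trace[i % 3]`. A real profiler would need to tolerate noise and phase changes, which this exact-match check deliberately ignores.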
Figures

Figure 1. A static input script induces a repeating pattern in variable OPC. The interpreter can be specialized with respect to the input script by exploiting this repetition: (a) program, static and dynamic inputs, (b) trace of recurring values across loop iterations, (c) loop iterations stitched into a specialized loop, (d) final specialized code. 
Figure 5. IPLS uses object-relative memory profiling to generate repeatable, symbolic names for relocatable addresses. Variable INn_cnt maintains the invocation count of instruction INn. 
Figure 6. Meta-level loop/trace detection extracts a graph that resembles a control-flow graph, in which loops are identified. 
Figure 8. Whole-program speedup for three interpreters (Lua, Perl, and Python) with 11 input scripts each. 
Table 2. Ratio of dynamic instruction count of the original program to that of the specialized program for Lua-5.2.0. Larger numbers indicate a greater reduction in dynamic instructions. 
Table 3. Unexpected exits from the specialized loop as a fraction of the number of iterations running in a specialized loop.
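The idea behind Figure 1 and the "unexpected exits" counted in Table 3 can be sketched with a toy interpreter. This is not IPLS, which specializes compiled interpreter binaries; the accumulator instruction set, `HANDLERS`, `run_generic`, and `make_specialized` are all hypothetical. `run_generic` looks up the handler for the opcode (OPC) on every iteration. `make_specialized` takes an opcode pattern predicted by profiling, resolves the handlers once ahead of time, and guards each iteration against the prediction. On a mismatch it hands the rest of the script back to the original loop, which stands in for an unexpected exit. The per-iteration opcode comparison is a simplification made to keep the sketch safe and short.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy instruction set: each handler transforms an accumulator.
HANDLERS: Dict[str, Callable[[int], int]] = {
    "INC": lambda acc: acc + 1,
    "DBL": lambda acc: acc * 2,
    "NEG": lambda acc: -acc,
}


def run_generic(script: List[str], acc: int = 0) -> int:
    """Original interpreter loop: dispatches on OPC every iteration."""
    for opc in script:
        acc = HANDLERS[opc](acc)
    return acc


def make_specialized(pattern: List[str]) -> Callable[..., Tuple[int, int]]:
    """Build a loop specialized for a predicted, repeating opcode pattern.

    Returns a function that yields (result, number_of_unexpected_exits).
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")
    resolved = [HANDLERS[opc] for opc in pattern]  # dispatch resolved once
    period = len(pattern)

    def run(script: List[str], acc: int = 0) -> Tuple[int, int]:
        for i, opc in enumerate(script):
            if opc != pattern[i % period]:
                # Prediction failed: exit to the original loop for the rest.
                return run_generic(script[i:], acc), 1
            acc = resolved[i % period](acc)
        return acc, 0

    return run
```

With `run = make_specialized(["INC", "DBL", "NEG"])`, a script that follows the predicted pattern runs entirely in the specialized loop and returns the same result as `run_generic` with zero exits. A script that deviates, such as `["INC", "DBL", "NEG", "INC", "INC"]`, still produces the correct result but reports one exit.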
Citations
REDSPY: Exploring Value Locality in Software
Shasha Wen, Milind Chabbi, Xu Liu
- 04 Apr 2017
TL;DR: REDSPY pinpointed a dramatically high volume of redundancies in programs that have been optimization targets for decades, such as the SPEC CPU2006 suite, the Rodinia benchmarks, and NWChem (a production computational chemistry code), and eliminated redundancies to achieve significant speedups.
Redundant loads: a software inefficiency indicator
Pengfei Su, Shasha Wen, Hailong Yang, Milind Chabbi, Xu Liu
- 25 May 2019
TL;DR: LoadSpy identifies and quantifies redundant load operations in programs and associates the redundancies with program execution contexts and scopes, focusing developers' attention on problematic code; redundant loads are often a symptom of many other redundant operations.
What every scientific programmer should know about compiler optimizations
Jialiang Tan, Shuyin Jiao, Milind Chabbi, Xu Liu
- 29 Jun 2020
TL;DR: This paper investigates an important compiler optimization, dead and redundant operation elimination, and shows that modern compilers miss several optimization opportunities and even introduce some inefficiencies, which require programmers to refactor the source code.
I/O Optimisation and elimination via partial evaluation
Christopher S.F. Smowton
- 01 Jan 2014
TL;DR: A new, more accurate partial evaluation system that can specialise programs, written in low-level languages including C and C++, that interact with the operating system to read external data and can achieve significant runtime improvements with little manual assistance is presented.