Practical automatic loop specialization

doi:10.1145/2451116.2451161

Open Access•Proceedings Article•10.1145/2451116.2451161

Taewook Oh, +4 more

- 16 Mar 2013

- Vol. 41, Iss: 1, pp 419-430

23

TL;DR: Invariant-induced Pattern based Loop Specialization (IPLS), the first fully-automatic specialization technique designed for everyday use on real applications, profiles the values of instructions that depend solely on invariants and recognizes repeating patterns across multiple iterations of hot loops.
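
As a rough illustration of the pattern-recognition idea in the summary above, the sketch below finds the shortest repeating period in a profiled sequence of values. The function name `shortest_period` and the sample opcode trace are hypothetical and not taken from the paper; IPLS itself profiles the values of invariant-dependent instructions in hot loops and is considerably more involved than this.

```python
from typing import Optional, Sequence


def shortest_period(trace: Sequence[int], min_repeats: int = 2) -> Optional[int]:
    """Return the smallest p such that trace[i] == trace[i - p] for all i >= p,
    provided the trace spans at least `min_repeats` full periods; else None."""
    n = len(trace)
    for p in range(1, n // min_repeats + 1):
        if all(trace[i] == trace[i - p] for i in range(p, n)):
            return p
    return None


# Hypothetical values of an opcode variable observed across loop iterations.
opc_trace = [3, 7, 7, 1, 3, 7, 7, 1, 3, 7, 7, 1]
period = shortest_period(opc_trace)
# The repeating unit that a specializer could stitch into one loop body.
pattern = list(opc_trace[:period]) if period is not None else None
```

A trace with no repetition returns `None`, in which case there would be nothing to specialize on.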

Figures

Figure 5. IPLS uses object-relative memory profiling to generate repeatable, symbolic names for relocatable addresses. Variable INn_cnt maintains the invocation count of the instruction INn.

Figure 1. A static input script induces a repeating pattern in variable OPC. The interpreter can be specialized with respect to the input script by exploiting this repetition: (a) program, static and dynamic inputs, (b) trace of recurring values across loop iterations, (c) loop iterations stitched into a specialized loop, (d) final specialized code.

Figure 6. Meta-level loops/traces detection extracts a graph which resembles a control-flow graph in which loops are identified.

Table 2. Ratio of dynamic instruction count of the original program to that of the specialized program for Lua-5.2.0. Larger numbers indicate a greater reduction in dynamic instructions.

Figure 8. Whole-program speedup with three interpreters: Lua, Perl, and Python, and 11 input scripts for each.

Table 3. Unexpected exits from the specialized loop as a fraction of the number of iterations running in a specialized loop.
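
The Figure 1 caption (iterations stitched into a specialized loop) and the Table 3 caption (unexpected exits from that loop) can be pictured with a toy interpreter. This is only a sketch under stated assumptions: the opcodes `ADD`, `MUL`, `NEG`, the accumulator machine, and both function names are hypothetical, and the paper's transformation works automatically on compiled interpreter code rather than on hand-written Python.

```python
from typing import List

ADD, MUL, NEG = 0, 1, 2  # hypothetical opcodes of a toy accumulator machine


def step(opc: int, acc: int, x: int) -> int:
    """Generic dispatch: one iteration of the interpreter loop."""
    if opc == ADD:
        return acc + x
    if opc == MUL:
        return acc * x
    if opc == NEG:
        return -acc
    raise ValueError(f"unknown opcode {opc}")


def interpret(script: List[int], x: int) -> int:
    """Unspecialized interpreter: dispatches on every opcode."""
    acc = 0
    for opc in script:
        acc = step(opc, acc, x)
    return acc


def interpret_specialized(script: List[int], x: int) -> int:
    """Variant specialized for the repeating pattern [ADD, MUL, NEG]."""
    acc, i, n = 0, 0, len(script)
    # Guard: stay in the specialized loop only while the pattern holds.
    while i + 3 <= n and script[i:i + 3] == [ADD, MUL, NEG]:
        acc = -((acc + x) * x)  # three iterations stitched, dispatch removed
        i += 3
    # Guard failed or script ended: exit to the generic loop.
    for opc in script[i:]:
        acc = step(opc, acc, x)
    return acc
```

The fall-through to the generic loop plays the role of an exit from the specialized loop: results stay correct when the observed pattern breaks, at the cost of losing the specialized fast path.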

Citations

Proceedings Article•10.1145/3037697.3037729

REDSPY: Exploring Value Locality in Software

Shasha Wen, +2 more

- 04 Apr 2017

TL;DR: REDSPY pinpointed a dramatically high volume of redundancies in programs that had been optimization targets for decades, such as the SPEC CPU2006 suite, the Rodinia benchmark, and NWChem (a production computational chemistry code), and eliminating these redundancies yielded significant speedups.

43

Proceedings Article•10.1109/ICSE.2019.00103

Redundant loads: a software inefficiency indicator

Pengfei Su, +4 more

- 25 May 2019

TL;DR: LoadSpy identifies and quantifies redundant load operations, which are often a symptom of many redundant operations, and associates the redundancies with program execution contexts and scopes to focus developers' attention on problematic code.

28

Posted Content

Redundant Loads: A Software Inefficiency Indicator

Pengfei Su, +4 more

- 14 Feb 2019

- arXiv: Performance

TL;DR: LoadSpy, a whole-program profiler, pinpoints redundant memory load operations, which are often a symptom of many redundant operations in programs; guided by it, the authors optimize several well-known benchmarks and real-world applications, yielding significant speedups.

23

Proceedings Article•10.1145/3392717.3392754

What every scientific programmer should know about compiler optimizations

Jialiang Tan, +3 more

- 29 Jun 2020

TL;DR: This paper investigates an important compiler optimization, dead and redundant operation elimination, and shows that modern compilers miss several optimization opportunities and even introduce some inefficiencies, which require programmers to refactor the source code.

21

I/O Optimisation and elimination via partial evaluation

Christopher S.F. Smowton

- 01 Jan 2014

TL;DR: This work presents a new, more accurate partial evaluation system that can specialise programs written in low-level languages, including C and C++, that interact with the operating system to read external data, and that can achieve significant runtime improvements with little manual assistance.

15

References

Proceedings Article•10.5555/977395.977673

LLVM: a compilation framework for lifelong program analysis & transformation

Chris Lattner, +1 more

- 20 Mar 2004

TL;DR: The design of the LLVM representation and compiler framework is evaluated in three ways: the size and effectiveness of the representation, including the type information it provides; compiler performance for several interprocedural problems; and illustrative examples of the benefits LLVM provides for several challenging compiler problems.

5.4K

Program Analysis and Specialization for the C Programming Language

Lars Ole Andersen, +1 more

- 01 Jan 2005

TL;DR: This thesis presents an automatic partial evaluator for the ANSI C programming language, proves that partial evaluation can accomplish at most linear speedup, and develops an automatic speedup analysis.

1.1K

Proceedings Article•10.1145/1024393.1024404

Secure program execution via dynamic information flow tracking

G. Edward Suh, +3 more

- 07 Oct 2004

TL;DR: This work presents a simple architectural mechanism called dynamic information flow tracking that can significantly improve the security of computing systems with negligible performance overhead and is transparent to users or application programmers.

842

Proceedings Article•10.1145/1542476.1542528

Trace-based just-in-time type specialization for dynamic languages

Andreas Gal, +15 more

- 15 Jun 2009

TL;DR: This work presents an alternative compilation technique for dynamically-typed languages that identifies frequently executed loop traces at run-time and then generates machine code on the fly that is specialized for the actual dynamic types occurring on each path through the loop.

414

Proceedings Article•10.1145/1565824.1565827

Tracing the meta-level: PyPy's tracing JIT compiler

Carl Friedrich Bolz, +3 more

- 06 Jul 2009

TL;DR: This paper shows how to guide tracing JIT compilers to greatly improve the speed of bytecode interpreters, and how to unroll the bytecode dispatch loop, based on two kinds of hints provided by the implementer of the bytecode interpreter.

310