Journal Article (DOI: 10.1109/12.931894)
An architectural framework for runtime optimization
M.C. Merten, A.R. Trick, Ronald D. Barnes, Erik M. Nystrom, Christopher Neith George, John Gyllenhaal, Wen-mei W. Hwu
TL;DR: A new hardware mechanism for generating and deploying runtime-optimized code. The mechanism resides in the retirement stage of the processor pipeline, accepts an instruction execution stream as input, and produces instruction profiles and sets of linked, optimized traces as output.
Abstract: Wide-issue processors continue to achieve higher performance by exploiting greater instruction-level parallelism. Dynamic techniques such as out-of-order execution and hardware speculation have proven effective at increasing instruction throughput. Runtime optimization promises to provide an even higher level of performance by adaptively applying aggressive code transformations on a larger scope. This paper presents a new hardware mechanism for generating and deploying runtime optimized code. The mechanism can be viewed as a filtering system that resides in the retirement stage of the processor pipeline, accepts an instruction execution stream as input, and produces instruction profiles and sets of linked, optimized traces as output. The code deployment mechanism uses an extension to the branch prediction mechanism to migrate execution into the new code without modifying the original code. These new components do not add delay to the execution of the program except during short bursts of reoptimization. This technique provides a strong platform for runtime optimization because the hot execution regions are extracted, optimized, and written to main memory for execution and because these regions persist across context switches. The current design of the framework supports a suite of optimizations, including partial function inlining (even into shared libraries), code straightening optimizations, loop unrolling, and peephole optimizations.
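To make the "filtering system" idea concrete, here is a minimal Python sketch of a retirement-stage profiler that watches retired branches, builds a per-branch profile, and signals when execution has settled into a hot region. It is an illustration under stated assumptions, not the authors' hardware: the class names, the per-branch execution and taken counters, the single saturating detection counter, and every threshold (`candidate_threshold`, `detect_max`, `dec`, `inc`) are assumptions chosen for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RetiredBranch:
    pc: int       # address of the retired branch instruction
    taken: bool   # resolved direction at retirement


class HotSpotFilter:
    """Hypothetical retirement-stage filter (a sketch, not the paper's design).

    Per-branch counters build an execution/taken profile. One saturating
    detection counter falls while retired branches are already frequent
    ("candidates") and rises otherwise; reaching zero signals that execution
    is concentrated in a small, hot region worth extracting as traces.
    """

    def __init__(self, candidate_threshold=16, detect_max=64, dec=2, inc=1):
        self.candidate_threshold = candidate_threshold  # assumed value
        self.detect_max = detect_max                    # assumed value
        self.dec = dec                                  # assumed value
        self.inc = inc                                  # assumed value
        self.exec_count = defaultdict(int)
        self.taken_count = defaultdict(int)
        self.detector = detect_max

    def retire(self, br: RetiredBranch) -> bool:
        """Observe one retired branch; return True once a hot spot is detected."""
        self.exec_count[br.pc] += 1
        if br.taken:
            self.taken_count[br.pc] += 1
        if self.exec_count[br.pc] >= self.candidate_threshold:
            self.detector = max(0, self.detector - self.dec)
        else:
            self.detector = min(self.detect_max, self.detector + self.inc)
        return self.detector == 0

    def profile(self):
        """Map each candidate branch PC to (executions, taken fraction)."""
        return {
            pc: (n, self.taken_count[pc] / n)
            for pc, n in self.exec_count.items()
            if n >= self.candidate_threshold
        }


if __name__ == "__main__":
    f = HotSpotFilter()
    # A two-branch loop: a taken back-edge at 0x100 and a never-taken exit test at 0x104.
    for step in range(1, 1000):
        pc, taken = (0x100, True) if step % 2 else (0x104, False)
        if f.retire(RetiredBranch(pc, taken)):
            print(f"hot spot detected after {step} retired branches")
            print({hex(p): stats for p, stats in f.profile().items()})
            break
```

With these assumed parameters the demo reports a hot spot after 62 retired branches, with the back-edge at 0x100 always taken and the exit test at 0x104 never taken. Because non-candidate branches push the counter back up, it reaches zero only when candidate branches make up a sustained, large share of the retired stream (with `dec=2` and `inc=1`, more than one third of it).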
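The abstract's deployment mechanism extends the branch predictor so that fetch migrates into optimized traces without modifying the original code. The sketch below is a hypothetical software model of that idea, not the paper's hardware: the names `TraceRedirectTable`, `install`, `invalidate`, and `redirect`, the table `capacity`, and the addresses used in the demo are all assumptions for illustration.

```python
from types import MappingProxyType


class TraceRedirectTable:
    """Hypothetical model of a branch-predictor extension that steers fetch
    into optimized traces; the original code is only read, never patched."""

    def __init__(self, capacity=4):
        self.capacity = capacity          # assumed table size
        self._map: dict[int, int] = {}    # original target PC -> optimized trace entry PC

    def install(self, orig_pc: int, trace_pc: int) -> bool:
        """Publish a trace entry point; return False if the table is full."""
        if orig_pc not in self._map and len(self._map) >= self.capacity:
            return False  # no room: execution simply stays in the original code
        self._map[orig_pc] = trace_pc
        return True

    def invalidate(self, orig_pc: int) -> None:
        """Drop a mapping, e.g. when a trace is discarded during reoptimization."""
        self._map.pop(orig_pc, None)

    def redirect(self, predicted_target: int) -> int:
        """Return the trace entry for a predicted target, or the target itself."""
        return self._map.get(predicted_target, predicted_target)


if __name__ == "__main__":
    # Original program text is exposed read-only; traces live in a separate region.
    original_code = MappingProxyType({0x100: "loop head (original)"})
    trace_memory = {0x8000: "loop head (unrolled, optimized trace)"}

    table = TraceRedirectTable()
    print(hex(table.redirect(0x100)))            # 0x100: no trace installed yet
    table.install(0x100, 0x8000)
    target = table.redirect(0x100)
    print(hex(target), trace_memory[target])     # fetch now enters the trace
    print(original_code[0x100])                  # original code is unchanged
```

Keeping the redirection in a lookup structure rather than rewriting branch targets in place is what lets the original code stay untouched: removing a mapping is enough to fall back to the unoptimized path.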
Citations
Dissertation: Efficient, transparent, and comprehensive runtime code manipulation
Derek L. Bruening, Saman Amarasinghe
- 01 Jan 2004
TL;DR: Presents DynamoRIO, a fully implemented runtime code manipulation system that supports code transformations on any part of a program while it executes, with zero to thirty percent time and memory overhead on both Windows and Linux.
Managing multi-configuration hardware via dynamic working set analysis
Ashutosh S. Dhodapkar, James E. Smith
- 01 May 2002
TL;DR: When applied to reconfigurable instruction caches, an algorithm that identifies recurring phases achieves power savings and performance similar to the best algorithm reported to date, but with orders-of-magnitude savings in the number of re-tunings.
Characterizing and predicting program behavior and its variability
Evelyn Duesterwald, Calin Cascaval, Sandhya Dwarkadas
- 27 Sep 2003
TL;DR: This work argues that program behavior variability requires adaptive systems to be predictive rather than reactive, and introduces a new class of predictors that use one metric to predict another, thus making possible an efficient coupling of multiple predictors.
Managing multi-configuration hardware via dynamic working set analysis
Ashutosh S. Dhodapkar, James E. Smith
- 25 Jun 2003
TL;DR: Dynamically adjusting hardware configurations based on working set analysis achieves significant power savings and performance improvements.
Transition phase classification and prediction
Jeremy Lau, S. Schoenmackers, Brad Calder
- 12 Feb 2005
TL;DR: This paper describes an adaptive system that dynamically adjusts classification thresholds and splits phases with poor homogeneity. It also improves phase prediction accuracy by applying confidence to phase prediction, and develops architectures that can accurately predict the outcome of the next phase change.
References
Trace cache: a low latency approach to high bandwidth instruction fetching
Eric Rotenberg, Steve Bennett, James E. Smith
- 02 Dec 1996
TL;DR: Shows that the trace cache's efficient, low-latency approach enables it to outperform more complex mechanisms that work solely out of the instruction cache.
Continuous profiling: where have all the cycles gone?
Jennifer M. Anderson, Lance M. Berc, Jeffrey Dean, Sanjay Ghemawat, Monika Henzinger, Shun-Tak Albert Leung, Richard L. Sites, Mark T. Vandevoorde, Carl A. Waldspurger, William E. Weihl
- 01 Oct 1997
TL;DR: The Digital Continuous Profiling Infrastructure is a sampling-based profiling system designed to run continuously on production systems. It supports multiprocessors, works on unmodified executables, and collects profiles for entire systems, including user programs, shared libraries, and the operating system kernel.
Optimally profiling and tracing programs
Thomas Ball, James R. Larus
TL;DR: Presents algorithms for inserting monitoring code to profile and trace programs, and shows that edge profiling with edge counters works well in practice because it is simple and efficient and finds optimal counter placements in most cases.
DAISY: dynamic compilation for 100% architectural compatibility
Kemal Ebcioglu, Erik R. Altman
- 01 May 1997
TL;DR: Discusses the architectural requirements for a VLIW machine that achieves 100% compatibility with an existing architecture through dynamic compilation, including how to deal with self-modifying code, precise exceptions, and aggressive reordering of memory references in the presence of strong MP consistency and memory-mapped I/O.
ProfileMe: hardware support for instruction-level profiling on out-of-order processors
Jeffrey Dean, James W. Hicks, Carl A. Waldspurger, William E. Weihl, George Z. Chrysos
- 01 Dec 1997
TL;DR: An inexpensive hardware implementation of ProfileMe is described, a variety of software techniques to extract useful profile information from the hardware are outlined, and several ways in which this information can provide valuable feedback for programmers and optimizers are explained.
Related Papers
Kemal Ebcioglu, Erik R. Altman
- 01 May 1997
Ashutosh S. Dhodapkar, James E. Smith
- 03 Dec 2003