Memory forwarding: enabling aggressive layout optimizations by guaranteeing the safety of data relocation
Chi-Keung Luk, Todd C. Mowry
- 01 May 1999
- Vol. 27, Iss: 2, pp 88-99
Abstract: By optimizing data layout at run-time, we can potentially enhance the performance of caches by actively creating spatial locality, facilitating prefetching, and avoiding cache conflicts and false sharing. Unfortunately, it is extremely difficult to guarantee that such optimizations are safe in practice on today's machines, since accurately updating all pointers to an object requires perfect alias information, which is well beyond the scope of the compiler for languages such as C. To overcome this limitation, we propose a technique called memory forwarding which effectively adds a new layer of indirection within the memory system whenever necessary to guarantee that data relocation is always safe. Because actual forwarding rarely occurs (it exists as a safety net), the mechanism can be implemented as an exception in modern superscalar processors. Our experimental results demonstrate that the aggressive layout optimizations enabled by memory forwarding can result in significant speedups (more than twofold in some cases) by reducing the number of cache misses, improving the effectiveness of prefetching, and conserving memory bandwidth.
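The indirection described in the abstract can be illustrated with a toy software simulation. This is only a sketch under assumptions the abstract does not state: a word-addressed memory, one forwarding bit per word, and relocation that leaves the new address behind in the old word. The names `ForwardingMemory`, `resolve`, `relocate`, and `hops` are hypothetical and chosen for illustration; the paper proposes handling forwarding in the memory system, not in application code.

```python
class ForwardingMemory:
    """Toy word-addressed memory with one forwarding bit per word (assumed layout)."""

    def __init__(self, size):
        self.words = [0] * size          # data, or a forwarding address
        self.forwarded = [False] * size  # True: the word holds a forwarding address
        self.hops = 0                    # how often the safety net was actually used

    def resolve(self, addr):
        # Follow forwarding addresses until reaching a word that holds data.
        while self.forwarded[addr]:
            addr = self.words[addr]
            self.hops += 1
        return addr

    def load(self, addr):
        return self.words[self.resolve(addr)]

    def store(self, addr, value):
        self.words[self.resolve(addr)] = value

    def relocate(self, old, new, count):
        # Move `count` words; assumes source and destination do not overlap.
        for i in range(count):
            src = self.resolve(old + i)
            self.words[new + i] = self.words[src]
            self.forwarded[new + i] = False
            self.words[src] = new + i    # leave a forwarding address behind
            self.forwarded[src] = True


mem = ForwardingMemory(16)
for i, value in enumerate([10, 20, 30]):
    mem.store(i, value)

mem.relocate(0, 8, 3)      # pack the object into a new location
stale = 1                  # a pointer that was never updated
print(mem.load(stale))     # still reads the relocated data
mem.store(stale, 99)       # writes through the stale pointer reach the new copy
print(mem.load(9), mem.hops)
```

Accesses through up-to-date pointers (address 9 here) never touch the forwarding path, which matches the abstract's point that forwarding is a rarely exercised safety net.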
Citations
Patent
Profile-driven data layout optimization
Gerald Dwayne Kuch, Brian C. Beckman, Jason L. Zander
- 29 Mar 2001
TL;DR: Data members of different groups can be placed in separately loadable units of memory, and more frequently referenced data members, including those that tend to be referenced at times close to each other, reside at neighboring locations in the memory system.
Book
A Primer on Hardware Prefetching
Babak Falsafi, Thomas F. Wenisch
- 01 Jun 2014
TL;DR: This primer offers an overview of the various classes of hardware prefetchers for instructions and data proposed in the research literature, and presents examples of techniques incorporated into modern microprocessors.
Improving Locality for Adaptive Irregular Scientific Codes
Hwansoo Han, Chau-Wen Tseng
- 10 Aug 2000
TL;DR: A cost model is developed that can be used to calculate an efficient optimization frequency; it may be applied dynamically by instrumenting the program to measure execution time per time-step iteration. The results show that locality optimization can improve performance even for adaptive codes.
Access pattern based local memory customization for low power embedded systems
Peter Grun, Nikil Dutt, Alexandru Nicolau
- 13 Mar 2001
TL;DR: This approach customizes the local memory architecture to match the diverse access patterns and locality types present in the application, reducing the main memory bandwidth requirement and significantly improving power consumption without sacrificing performance.
Recursive data structure profiling
Easwaran Raman, David I. August
- 12 Jun 2005
TL;DR: A method for collecting a recursive data structure (RDS) profile without requiring any high-level program representation or type information is described; it achieves this with manageable space and time overhead on a mixture of pointer-intensive benchmarks from the SPEC, Olden, and other benchmark suites.