Book Chapter • DOI: 10.1007/978-1-4471-3501-2_8
A Quantitative Algorithm for Data Locality Optimization
François Bodin,William Jalby,Daniel Windheiser,Christine Eisenbeis +3 more
- 01 Jan 1992
- pp 119-145
35
TL;DR: A register allocation algorithm and a cache usage optimization algorithm based on the reference window concept which can be effectively implemented in a compiler system are described.
Abstract: In this paper, we consider the problem of optimizing register allocation and cache behavior for loop array references. We exploit techniques developed initially for data locality estimation and improvement. First we review the concept of “reference window” that serves as our basic tool for both data locality evaluation and management. Then we study how some loop restructuring techniques (interchanging, tiling, ...) can help to improve data locality. We describe a register allocation algorithm and a cache usage optimization algorithm based on the window concept which can be effectively implemented in a compiler system. Experimental speedup measurements on a RISC processor, the IBM RS/6000, give evidence of the efficiency of our technique.
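The loop restructuring techniques named in the abstract (interchanging and tiling) can be illustrated on a matrix multiplication loop nest. The sketch below is not the paper's algorithm: the function names, the `tile` parameter, and the choice of matrix multiplication as the example are assumptions made for illustration. It shows only that the restructured loops compute the same result while visiting array elements in a different order; in a compiled language with contiguous arrays, that order is what determines register and cache reuse.

```python
def matmul_naive(a, b, n):
    """Reference i-j-k loop nest: c[i][j] += a[i][k] * b[k][j]."""
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_interchanged(a, b, n):
    """Same computation with the j and k loops interchanged (i-k-j order).

    The innermost loop now walks a row of b and a row of c, and a[i][k]
    is invariant in the inner loop, so it can be held in a scalar.
    """
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            a_ik = a[i][k]  # inner-loop invariant: a register candidate
            for j in range(n):
                c[i][j] += a_ik * b[k][j]
    return c


def matmul_tiled(a, b, n, tile=4):
    """Interchanged nest with the k and j loops tiled.

    Each (kk, jj) pair selects a tile-by-tile block of b that is reused
    for every i before the next block is touched. `tile` is a
    hypothetical tuning parameter, not a value taken from the paper.
    """
    c = [[0.0] * n for _ in range(n)]
    for kk in range(0, n, tile):
        for jj in range(0, n, tile):
            for i in range(n):
                for k in range(kk, min(kk + tile, n)):
                    a_ik = a[i][k]
                    for j in range(jj, min(jj + tile, n)):
                        c[i][j] += a_ik * b[k][j]
    return c
```

All three functions accumulate each `c[i][j]` in increasing `k` order, so they return identical results; only the traversal order of `a`, `b`, and `c` differs. Choosing the loop order and tile size quantitatively is the problem the paper addresses, and this sketch leaves it open.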
Citations
Skewed associativity improves program performance and enhances predictability
François Bodin,André Seznec +1 more
TL;DR: In this paper, the authors show that the four-way skewed associative cache yields very stable execution times and good average miss ratios on blocked algorithms, and therefore, execution time is faster and much more predictable than with conventional caches.
49
•Journal Article
Skewed associativity improves program performance and enhances predictability
François Bodin,André Seznec +1 more
TL;DR: It is shown that the recently proposed four-way skewed associative cache yields very stable execution times and good average miss ratios on blocked algorithms, which means that execution time is faster and much more predictable than with conventional caches.
43
Low power memory storage and transfer organization for the MPEG-4 full pel motion estimation on a multimedia processor
TL;DR: This paper estimates that a software reference implementation of an MPEG-4 video encoder typically requires five Gtransfers/s to main memory for a simple profile level L2, and applies the ACROPOLIS methodology to relieve this data access bottleneck, arriving at an implementation which needs a factor 65 less background accesses.
Fortran-S: A Fortran interface for shared virtual memory architectures
François Bodin,L. Kervella,Thierry Priol +2 more
- 01 Dec 1993
TL;DR: A programming environment for distributed memory parallel computers, consisting of a Fortran 77 compiler enhanced with directives to specify parallelism, is introduced and preliminary results obtained with the first prototype of the compiler are presented.
39
Managing pages in shared virtual memory systems: getting the compiler into the game
Elana D. Granston,Harry A. G. Wijshoff +1 more
- 01 Aug 1993
TL;DR: The issue of compiler involvement in areas ranging from loop transformations and scheduling issues, to data layout strategies, page placement decisions, access pattern analysis, and use of run time system directives are discussed.
36
References
•Book
Compilers: Principles, Techniques, and Tools
Alfred V. Aho,Ravi Sethi,Jeffrey D. Ullman +2 more
- 01 Jan 1986
TL;DR: This book discusses the design of a Code Generator, the role of the Lexical Analyzer, and other topics related to code generation and optimization.
9.7K
A data locality optimizing algorithm
Michael Wolf,Monica S. Lam +1 more
- 01 May 1991
TL;DR: An algorithm that improves the locality of a loop nest by transforming the code via interchange, reversal, skewing and tiling is proposed, and is successful in optimizing codes such as matrix multiplication, successive over-relaxation, LU decomposition without pivoting, and Givens QR factorization.
The cache performance and optimizations of blocked algorithms
Monica S. Lam,Edward E. Rothberg,Michael E. Wolf +2 more
- 01 Apr 1991
TL;DR: It is shown that the degree of cache interference is highly sensitive to the stride of data accesses and the size of the blocks, and can cause wide variations in machine performance for different matrix sizes.
Advanced compiler optimizations for supercomputers
David Padua,Michael Wolfe +1 more
TL;DR: Compilers for vector or multiprocessor computers need certain optimization features to generate parallel code successfully.
Dependence graphs and compiler optimizations
David J. Kuck,Robert H. Kuhn,David Padua,Bruce Leasure,Michael Wolfe +4 more
- 26 Jan 1981
TL;DR: This paper defines such graphs and discusses two kinds of transformations, simple rewriting transformations that remove dependence arcs and abstraction transformations that deal more globally with a dependence graph.
752