Journal Article, DOI: 10.1002/CPE.851
Recurrence analysis for effective array prefetching in Java
TL;DR: Describes a new unified compile-time analysis for software prefetching of arrays and linked structures in Java that hides memory latency, and shows that the additional loop transformations and careful prefetch scheduling used in previous work are not always necessary for modern architectures and Java programs.
Abstract: Java is an attractive choice for numerical, as well as other, algorithms due to the software engineering benefits of object-oriented programming. Because numerical programs often use large arrays that do not fit in the cache, they suffer from poor memory performance. To hide memory latency, we describe a new unified compile-time analysis for software prefetching of arrays and linked structures in Java. Our previous work uses data-flow analysis to discover linked data structure accesses. We generalize our prior approach to identify loop induction variables as well, and we call the result recurrence analysis. Our algorithm schedules prefetches for all array references that contain induction variables. We evaluate our technique using a simulator of an out-of-order superscalar processor running a set of array-based Java programs. Across all our programs, prefetching reduces execution time by a geometric mean of 23%, and the largest improvement is 58%. We also evaluate prefetching on a PowerPC processor, where it reduces execution time by a geometric mean of 17%. Because our analysis is much simpler and quicker than previous techniques, it is suitable for inclusion in a just-in-time compiler. Traditional software prefetching algorithms for C and Fortran use locality analysis and sophisticated loop transformations. We further show that the additional loop transformations and careful scheduling of prefetches from previous work are not always necessary for modern architectures and Java programs.
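The core idea in the abstract, finding loop induction variables and prefetching the array references indexed by them, can be illustrated with a deliberately small sketch. Everything here is an assumption made for illustration: the `Assign` and `ArrayRef` classes stand in for a compiler IR, the fixed prefetch `distance` (in iterations) is a hypothetical parameter, and the output is a textual "prefetch" pseudo-instruction because Java has no standard prefetch primitive. It is not the paper's algorithm, which is built on data-flow analysis.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy illustration only: the IR classes and the fixed prefetch distance
// are assumptions, not the paper's actual representation or algorithm.
class RecurrenceSketch {

    // Hypothetical loop-body statement: target = source + constant.
    static final class Assign {
        final String target, source;
        final int constant;
        Assign(String target, String source, int constant) {
            this.target = target;
            this.source = source;
            this.constant = constant;
        }
    }

    // Hypothetical array read inside the loop: array[index].
    static final class ArrayRef {
        final String array, index;
        ArrayRef(String array, String index) {
            this.array = array;
            this.index = index;
        }
    }

    // Maps each variable whose only definition in the loop body is a
    // self-update v = v + c (c != 0) to its per-iteration stride c.
    static Map<String, Integer> findInductionVariables(List<Assign> body) {
        Map<String, Integer> defCount = new LinkedHashMap<>();
        for (Assign a : body) {
            defCount.merge(a.target, 1, Integer::sum);
        }
        Map<String, Integer> strides = new LinkedHashMap<>();
        for (Assign a : body) {
            boolean selfUpdate = a.target.equals(a.source) && a.constant != 0;
            if (selfUpdate && defCount.get(a.target) == 1) {
                strides.put(a.target, a.constant);
            }
        }
        return strides;
    }

    // Emits one textual prefetch per array reference indexed by an induction
    // variable, looking `distance` iterations ahead along that variable's stride.
    static List<String> schedulePrefetches(List<Assign> body, List<ArrayRef> refs, int distance) {
        Map<String, Integer> strides = findInductionVariables(body);
        List<String> prefetches = new ArrayList<>();
        for (ArrayRef r : refs) {
            Integer stride = strides.get(r.index);
            if (stride == null) {
                continue; // index is not a basic induction variable in this sketch
            }
            int offset = stride * distance;
            String sign = offset >= 0 ? " + " : " - ";
            prefetches.add("prefetch " + r.array + "[" + r.index + sign + Math.abs(offset) + "]");
        }
        return prefetches;
    }

    public static void main(String[] args) {
        // Loop body: i = i + 1; j = j + 2; t = i + 4
        List<Assign> body = Arrays.asList(
            new Assign("i", "i", 1),
            new Assign("j", "j", 2),
            new Assign("t", "i", 4));
        // Array reads in the loop: a[i], b[j], c[t]
        List<ArrayRef> refs = Arrays.asList(
            new ArrayRef("a", "i"),
            new ArrayRef("b", "j"),
            new ArrayRef("c", "t"));
        for (String p : schedulePrefetches(body, refs, 8)) {
            System.out.println(p);
        }
    }
}
```

With a distance of 8 iterations, the sketch emits prefetches for `a[i + 8]` and `b[j + 16]`. It skips `c[t]` because `t` is derived from `i` rather than updated from itself; a fuller analysis would presumably also recognize such derived induction variables.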
Citations
Starc: static analysis for efficient repair of complex data
Bassem Elkarablieh,Sarfraz Khurshid,Duy Vu,Kathryn S. McKinley +3 more
- 21 Oct 2007
TL;DR: STARC uses static analysis to repair data structures with tens of thousands of nodes, up to 100 times larger than in prior work. Its efficiency is probably not yet practical for very large data structures in deployed systems, but it opens a promising direction for future work.
Prefetching in functional languages
Sam Ainsworth,Timothy M. Jones +1 more
- 16 Jun 2020
TL;DR: This work adds language primitives for software prefetching to the OCaml language and observes significant performance improvements on a variety of micro- and macro-benchmarks.