Book Chapter, DOI: 10.1007/3-540-44849-7_5
Cache-oblivious algorithms
Charles E. Leiserson
- 28 May 2003
- pp 5-5
TL;DR: It is proved that an optimal cache-oblivious algorithm designed for two levels of memory is also optimal across a multilevel cache hierarchy, and it is shown that the assumption of optimal replacement made by the ideal-cache model can be simulated efficiently by LRU replacement.
Abstract: Computers with multiple levels of caching have traditionally required techniques such as data blocking in order for algorithms to exploit the cache hierarchy effectively. These "cache-aware" algorithms must be properly tuned to achieve good performance using so-called "voodoo" parameters which depend on hardware properties, such as cache size and cache-line length.
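To make the "voodoo" parameter concrete, here is a minimal sketch of a cache-aware (blocked) matrix multiplication. The function name `blocked_matmul` and its `tile` argument are hypothetical illustrations, not code from the chapter. `tile` stands in for the hardware-dependent parameter that would have to be retuned for each cache size. Pure Python lists will not show real cache effects, so this only illustrates the loop structure:

```python
def blocked_matmul(a, b, tile):
    """Cache-aware product C = A * B of two n x n matrices (lists of lists).

    `tile` is the tuning ("voodoo") parameter: on real hardware it would be
    chosen so that three tile x tile blocks fit in the target cache.
    """
    n = len(a)
    c = [[0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                # Multiply block A[ii.., kk..] by block B[kk.., jj..] into C[ii.., jj..];
                # min() handles edge blocks when tile does not divide n.
                for i in range(ii, min(ii + tile, n)):
                    a_row, c_row = a[i], c[i]
                    for k in range(kk, min(kk + tile, n)):
                        a_ik, b_row = a_row[k], b[k]
                        for j in range(jj, min(jj + tile, n)):
                            c_row[j] += a_ik * b_row[j]
    return c


if __name__ == "__main__":
    print(blocked_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]], tile=1))  # [[19, 22], [43, 50]]
```

The result is the same for every `tile`; only the memory-access pattern, and hence the cache behavior on a real machine, depends on it.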
Surprisingly, however, for a variety of problems (including matrix multiplication, FFT, and sorting), asymptotically optimal "cache-oblivious" algorithms do exist that contain no voodoo parameters. They perform an optimal amount of work and move data optimally among multiple levels of cache. Since they need not be tuned, cache-oblivious algorithms are more portable than traditional cache-aware algorithms.
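By contrast, a cache-oblivious algorithm recurses until subproblems fit in whatever cache exists, without ever being told its size. The sketch below uses one standard divide-and-conquer scheme, assumed here for illustration since the abstract does not spell out the algorithm: it halves the largest of the three dimensions m, n, p at each step. The names `co_matmul` and `_rec` are hypothetical. The base case is a single scalar product so that no tuning constant appears; practical implementations usually coarsen the base case for speed.

```python
def co_matmul(a, b):
    """Cache-oblivious product C = A * B for an m x n matrix A and n x p matrix B.

    No cache-size parameter appears anywhere: recursion alone produces
    subproblems small enough to fit in every level of the memory hierarchy.
    """
    m, n = len(a), len(b)
    p = len(b[0]) if b else 0
    c = [[0] * p for _ in range(m)]
    if m and n and p:
        _rec(a, b, c, 0, m, 0, n, 0, p)
    return c


def _rec(a, b, c, i0, i1, k0, k1, j0, j1):
    """Add A[i0:i1][k0:k1] * B[k0:k1][j0:j1] into C[i0:i1][j0:j1]."""
    m, n, p = i1 - i0, k1 - k0, j1 - j0
    if m == 1 and n == 1 and p == 1:
        c[i0][j0] += a[i0][k0] * b[k0][j0]
    elif m >= n and m >= p:
        # Split the rows of A and C: two independent half-size products.
        mid = i0 + m // 2
        _rec(a, b, c, i0, mid, k0, k1, j0, j1)
        _rec(a, b, c, mid, i1, k0, k1, j0, j1)
    elif n >= p:
        # Split the shared dimension: both halves accumulate into the same C block.
        mid = k0 + n // 2
        _rec(a, b, c, i0, i1, k0, mid, j0, j1)
        _rec(a, b, c, i0, i1, mid, k1, j0, j1)
    else:
        # Split the columns of B and C.
        mid = j0 + p // 2
        _rec(a, b, c, i0, i1, k0, k1, j0, mid)
        _rec(a, b, c, i0, i1, k0, k1, mid, j1)


if __name__ == "__main__":
    print(co_matmul([[1, 2, 3]], [[4], [5], [6]]))  # [[32]]
```

Because the largest dimension is always the one halved, subproblems stay roughly square, which is what lets the same code use each cache level well without knowing its size.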
We employ an "ideal-cache" model to analyze these algorithms. We prove that an optimal cache-oblivious algorithm designed for two levels of memory is also optimal across a multilevel cache hierarchy. We also show that the assumption of optimal replacement made by the ideal-cache model can be simulated efficiently by LRU replacement. Finally, we provide some empirical results on the effectiveness of cache-oblivious algorithms in practice.
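The LRU claim can be spot-checked on small traces. The sketch below counts misses for LRU and for Belady's offline optimal policy (evict the block whose next use is farthest in the future), which plays the role of the ideal cache's optimal replacement. The function names and the toy trace are hypothetical. A classical competitive-analysis bound (Sleator and Tarjan) says LRU with k blocks incurs at most k/(k-h+1) times the optimal misses with h blocks, plus h; with k = 2h the factor is below 2, so LRU given twice the capacity stays within a constant factor of optimal replacement.

```python
from collections import OrderedDict


def lru_misses(trace, capacity):
    """Count misses of an LRU cache holding `capacity` blocks, starting empty."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    misses = 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)  # hit: mark as most recently used
        else:
            misses += 1
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used block
            cache[block] = None
    return misses


def opt_misses(trace, capacity):
    """Count misses of Belady's offline optimal policy, starting empty."""
    never = len(trace)  # sentinel index meaning "not requested again"
    next_use = [never] * len(trace)
    seen = {}
    for i in range(len(trace) - 1, -1, -1):
        next_use[i] = seen.get(trace[i], never)
        seen[trace[i]] = i
    cache = {}  # block -> index of its next request
    misses = 0
    for i, block in enumerate(trace):
        if block not in cache:
            misses += 1
            if len(cache) >= capacity:
                # Evict the block requested farthest in the future.
                del cache[max(cache, key=cache.get)]
        cache[block] = next_use[i]
    return misses


if __name__ == "__main__":
    trace = [0, 1, 2] * 4  # cyclic access to 3 blocks
    print(lru_misses(trace, 2), opt_misses(trace, 2), lru_misses(trace, 4))  # 12 7 3
```

The cyclic trace is LRU's worst case: with 2 blocks it misses on every access, while optimal replacement misses 7 times; doubling LRU's capacity to 4 leaves only the 3 compulsory misses.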
Citations
The Design and Implementation of FFTW3
Matteo Frigo, Steven G. Johnson
- 24 Jan 2005
TL;DR: It is shown that such an approach can yield an implementation of the discrete Fourier transform that is competitive with hand-optimized libraries, and the software structure that makes the current FFTW3 version flexible and adaptive is described.
External memory algorithms and data structures: dealing with massive data
TL;DR: The state of the art in the design and analysis of external memory algorithms and data structures, where the goal is to exploit locality in order to reduce the I/O costs is surveyed.
X-Stream: edge-centric graph processing using streaming partitions
Amitabha Roy, Ivo Mihailovic, Willy Zwaenepoel
- 03 Nov 2013
TL;DR: X-Stream is novel in using an edge-centric rather than a vertex-centric implementation of this model, and streaming completely unordered edge lists rather than performing random access, and competes favorably with existing systems for graph processing.
Linear work suffix array construction
TL;DR: A generalized algorithm, DC, is presented that allows a space-efficient implementation, supports the choice of a space-time tradeoff, and is asymptotically faster than all previous suffix tree or array construction algorithms.
CloudIQ: a framework for processing base stations in a data center
Sourjya Bhaumik, Shoban Preeth Chandrabose, Manjunath Kashyap Jataprolu, Gautam Kumar, Anand Muralidhar, Paul Anthony Polakos, Vikram Srinivasan, Thomas Woo
- 22 Aug 2012
TL;DR: It is shown that the centralized architecture can potentially result in savings of at least 22 % in compute resources by exploiting the variations in the processing load across base stations, and a framework is designed that has two objectives: partitioning the set of base stations into groups that are simultaneously processed on a shared homogeneous compute platform for a given statistical guarantee.