Journal Article (Open Access)
Skewed associativity improves program performance and enhances predictability
François Bodin, André Seznec
TL;DR: The recently proposed four-way skewed-associative cache yields very stable execution times and good average miss ratios on blocked algorithms, so execution time is both shorter and much more predictable than with conventional caches.
Abstract: Performance tuning becomes harder as computer technology advances. One of the factors is the increasing complexity of memory hierarchies. Most modern machines now use at least one level of cache memory. To reduce execution stalls, cache miss ratios must be very low. Software techniques that improve locality, such as loop blocking and copying, have been developed for numerical codes. Unfortunately, the behavior of direct-mapped and set-associative caches is still erratic when large data arrays are accessed. Execution time can vary drastically for the same loop kernel depending on uncontrolled factors such as the array's leading dimension. The only software method available to improve execution-time stability is copying frequently used data, which is itself costly in execution time. Users are usually not cache organization experts: they are unaware of such phenomena and have no control over them. In this paper, we show that the recently proposed four-way skewed-associative cache yields very stable execution times and good average miss ratios on blocked algorithms. As a result, execution time is both shorter and much more predictable than with conventional caches. It is therefore possible to use larger block sizes in blocked algorithms, which further reduces blocking overhead costs.
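To make the leading-dimension sensitivity concrete, here is a toy Python model. It is a sketch, not the paper's simulator. It counts misses when one column of a row-major array is read repeatedly, assuming one element per cache line, and compares a 64-line direct-mapped cache with a two-bank (two-way) skewed-associative cache of the same capacity. The paper studies a four-way organization. The XOR-based skewing functions in the hypothetical `skew_index` helper are a simplified choice for illustration and are not the paper's exact mapping functions. Replacement evicts the least recently used of the two candidate lines, which is also an assumption.

```python
"""Toy comparison: direct-mapped vs. two-way skewed-associative cache.

Assumptions (not from the paper): integer line addresses, 64 lines per cache,
one array element per line, and simplified XOR-based skewing functions.
"""


def column_walk(leading_dim: int, rows: int = 8, passes: int = 10) -> list[int]:
    """Line addresses for repeatedly reading one column of a row-major array
    whose consecutive rows are `leading_dim` cache lines apart."""
    return [r * leading_dim for _ in range(passes) for r in range(rows)]


def direct_mapped_misses(trace: list[int], n_lines: int = 64) -> int:
    """Each address maps to exactly one line: index = address mod n_lines."""
    tags = [None] * n_lines
    misses = 0
    for addr in trace:
        idx = addr % n_lines
        if tags[idx] != addr:
            tags[idx] = addr
            misses += 1
    return misses


def skew_index(addr: int, bank: int, bits: int) -> int:
    """Hypothetical skewing: XOR the low index field with the next field;
    bank 1 first rotates that second field left by one bit."""
    mask = (1 << bits) - 1
    a1 = addr & mask
    a2 = (addr >> bits) & mask
    if bank == 1:
        a2 = ((a2 << 1) | (a2 >> (bits - 1))) & mask
    return a1 ^ a2


def skewed_misses(trace: list[int], bits: int = 5) -> int:
    """Two banks of 2**bits lines (64 lines total for bits=5). Each bank uses
    its own index function; on a miss the older of the two candidates is
    evicted (empty slots count as oldest, bank 0 wins ties)."""
    size = 1 << bits
    tags = [[None] * size for _ in range(2)]
    stamp = [[-1] * size for _ in range(2)]
    misses = 0
    for t, addr in enumerate(trace):
        slots = [(b, skew_index(addr, b, bits)) for b in range(2)]
        hit = next(((b, i) for b, i in slots if tags[b][i] == addr), None)
        if hit is None:
            misses += 1
            b, i = min(slots, key=lambda s: stamp[s[0]][s[1]])
            tags[b][i] = addr
        else:
            b, i = hit
        stamp[b][i] = t  # record last use for the replacement decision
    return misses


if __name__ == "__main__":
    for ld in (64, 65):
        trace = column_walk(ld)
        print(f"leading dim {ld}: direct-mapped {direct_mapped_misses(trace)} misses, "
              f"skewed {skewed_misses(trace)} misses")
```

With a leading dimension of 64 lines, all eight column elements collide in the direct-mapped cache, giving 80 misses over 10 passes. The skewed cache keeps all eight lines resident after the first pass, giving 8 misses. With a leading dimension of 65, both caches incur only the 8 compulsory misses. Changing nothing but the leading dimension thus swings the direct-mapped miss count tenfold in this toy model.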
Citations
Using Randomized Caches in Probabilistic Real-Time Systems
Eduardo Quinones, Emery D. Berger, Guillem Bernat, Francisco J. Cazorla
- 01 Jul 2009
TL;DR: This paper presents simulation-based results on representative examples that illustrate the problem of performance anomalies with standard cache replacement policies and shows that, by eliminating dependencies on access history, randomized replacement greatly reduces the risk of these cache-based performance anomalies, enabling probabilistic worst-case execution time analysis.
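The history-dependence that entry describes can be illustrated with a small sketch, which is not that paper's setup. A cyclic access pattern touching one more line than a fully associative cache holds makes LRU miss on every access. Random replacement, modeled here with a seeded `random.Random`, breaks that dependence on access history. The function name and the 4-way, 5-address trace are hypothetical choices.

```python
import random


def fully_assoc_misses(trace: list[int], ways: int, policy: str, seed: int = 0) -> int:
    """Miss count for a fully associative cache with `ways` lines.
    policy="lru": evict the least recently used line.
    policy="random": evict a uniformly random line (seeded for repeatability)."""
    rng = random.Random(seed)
    lines: list[int] = []  # for LRU, index 0 holds the least recently used line
    misses = 0
    for addr in trace:
        if addr in lines:
            if policy == "lru":
                lines.remove(addr)  # move to most-recently-used position
                lines.append(addr)
            continue
        misses += 1
        if len(lines) == ways:
            victim = 0 if policy == "lru" else rng.randrange(ways)
            lines.pop(victim)
        lines.append(addr)
    return misses


if __name__ == "__main__":
    cyclic = list(range(5)) * 20  # 5 distinct lines, 4-way cache, 100 accesses
    print("LRU:", fully_assoc_misses(cyclic, 4, "lru"))
    print("random:", fully_assoc_misses(cyclic, 4, "random"))
```

Under LRU, every one of the 100 accesses misses, because the evicted line is always the next one needed. Random replacement hits most of the time after warm-up.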
Exploiting the cache capacity of a single-chip multi-core processor with execution migration
P. Michaud
- 14 Feb 2004
TL;DR: The affinity algorithm, a method for automatically distributing cache lines across several caches, is introduced, and it is shown that on working sets exhibiting a property called "splittability", it is possible to trade cache misses for migrations.
Cost-effective flow table designs for high-speed routers: architecture and performance evaluation
Jun Xu, Mukesh Singhal
TL;DR: The software-based design, adapted from the hash table data structure, employs a practical and effective technique to solve the garbage collection problem caused by expired flows. Performance evaluation results from both trace-driven simulation and statistical analysis show that both designs are cost-effective for their targeted router models.
Eliminating conflict misses using prime number-based cache indexing
TL;DR: An in-depth analysis of the pathological behavior of cache hashing functions is presented, and two new hashing functions, prime modulo and odd-multiplier displacement, are proposed. They resist pathological behavior and can still eliminate the worst-case conflict behavior in the L2 cache.
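As a minimal sketch of the prime-modulo idea from that entry, the set index can be computed modulo the largest prime not exceeding the number of sets, instead of modulo a power of two. This is an illustration, not that paper's hardware design. The helper names are hypothetical, and the remaining sets (here 64 − 61 = 3) simply go unused in this toy.

```python
def largest_prime_at_most(n: int) -> int:
    """Largest prime p <= n (trial division; fine for cache-sized n)."""
    if n < 2:
        raise ValueError("need n >= 2")

    def is_prime(k: int) -> bool:
        if k < 2:
            return False
        d = 2
        while d * d <= k:
            if k % d == 0:
                return False
            d += 1
        return True

    while not is_prime(n):
        n -= 1
    return n


def pow2_index(line_addr: int, n_sets: int) -> int:
    """Conventional indexing: low-order bits when n_sets is a power of two."""
    return line_addr % n_sets


def prime_modulo_index(line_addr: int, n_sets: int) -> int:
    """Prime-modulo indexing: only the first p sets are ever used."""
    return line_addr % largest_prime_at_most(n_sets)


if __name__ == "__main__":
    column = [r * 64 for r in range(8)]  # stride of 64 lines, as in a column walk
    print("power-of-two sets used:", sorted({pow2_index(a, 64) for a in column}))
    print("prime-modulo sets used:", sorted({prime_modulo_index(a, 64) for a in column}))
```

With 64 sets and a 64-line stride, conventional indexing sends all eight lines to set 0. Modulo 61, they spread over eight distinct sets (0, 3, 6, ..., 21).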
Eliminating intra-warp conflict misses in GPU
Bin Wang, Zhuo Liu, Xinning Wang, Weikuan Yu
- 09 Mar 2015
TL;DR: Through an in-depth analysis of GPU access patterns, it is found that column-major strided accesses are likely to incur high intra-warp conflict concentration, and a Full Permutation (FUP) based indexing method is proposed that adapts to both large and medium strides in this pattern.
References
A data locality optimizing algorithm
Michael Wolf, Monica S. Lam
- 01 May 1991
TL;DR: An algorithm that improves the locality of a loop nest by transforming the code via interchange, reversal, skewing and tiling is proposed, and is successful in optimizing codes such as matrix multiplication, successive over-relaxation, LU decomposition without pivoting, and Givens QR factorization.
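Tiling (loop blocking), one of the transformations in that entry and the blocking the abstract refers to, can be sketched for matrix multiplication as follows. The `tile` parameter is a hypothetical block size. In compiled code, it would be chosen so that the working set of one tile fits in cache. In pure Python, the example only shows the loop structure and does not demonstrate a speedup.

```python
def matmul_naive(a: list[list[int]], b: list[list[int]]) -> list[list[int]]:
    """Reference i-k-j matrix product."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            aik = a[i][k]
            for j in range(p):
                c[i][j] += aik * b[k][j]
    return c


def matmul_tiled(a: list[list[int]], b: list[list[int]], tile: int = 4) -> list[list[int]]:
    """Same product, with the i, k, j loops blocked into tile x tile chunks so
    that each inner block reuses a small sub-matrix of a, b and c."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, m, tile):
            for jj in range(0, p, tile):
                # Work on one block; min() handles edges when tile does not divide the size.
                for i in range(ii, min(ii + tile, n)):
                    for k in range(kk, min(kk + tile, m)):
                        aik = a[i][k]
                        for j in range(jj, min(jj + tile, p)):
                            c[i][j] += aik * b[k][j]
    return c


if __name__ == "__main__":
    a = [[i + k for k in range(6)] for i in range(5)]
    b = [[k * j - 1 for j in range(7)] for k in range(6)]
    print(matmul_tiled(a, b, tile=4) == matmul_naive(a, b))
```

The abstract's point about larger block sizes maps onto `tile` here. On a conventional cache, a larger tile risks self-interference that depends on the array's leading dimension. A more predictable cache lets a larger tile be used safely.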
Skewed-associative Caches
André Seznec, François Bodin
- 14 Jun 1993
TL;DR: In order to improve cache hit ratios, set-associative caches are used in some of the new superscalar microprocessors.
A strategy for array management in local memory
TL;DR: This paper shows how to compute approximations of window sets defined by Gannon, Jalby, and Gallivan, which allows derivation of a global strategy of data management for local memories which may be combined efficiently with various parallelization and/or vectorization optimizations.
A Quantitative Algorithm for Data Locality Optimization
François Bodin, William Jalby, Daniel Windheiser, Christine Eisenbeis
- 01 Jan 1992
TL;DR: A register allocation algorithm and a cache usage optimization algorithm based on the reference window concept which can be effectively implemented in a compiler system are described.
Randomization and Associativity in the Design of Placement-Insensitive Caches
Michael Schlansker, Robert Shaw, Sivaram Sivaramakrishnan
- 01 Jan 1993
TL;DR: A pseudo-random hash function is presented and used to randomize addresses into cache sets, and a counting technique is used to determine miss ratios, demonstrating a close relationship between analysis and at least one real application.