Skewed associativity improves program performance and enhances predictability

Open Access • Journal Article

François Bodin, +1 more

- 01 Sep 1997

- IEEE Transactions on Software Engineerin...

- Vol. 23, Iss: 9, pp 530-544

43

TL;DR: It is shown that the recently proposed four-way skewed associative cache yields very stable execution times and good average miss ratios on blocked algorithms, which means that execution time is faster and much more predictable than with conventional caches.

Abstract: Performance tuning becomes harder as computer technology advances. One of the factors is the increasing complexity of memory hierarchies. Most modern machines now use at least one level of cache memory. To reduce execution stalls, cache miss ratios must be very low. Software techniques used to improve locality have been developed for numerical codes, such as loop blocking and copying. Unfortunately, the behavior of direct mapped and set associative caches is still erratic when large data arrays are accessed. Execution time can vary drastically for the same loop kernel depending on uncontrolled factors such as array leading size. The only software method available to improve execution time stability is the copying of frequently used data, which is costly in execution time. Users are not usually cache organization experts. They are not aware of such phenomena and have no control over them. In this paper, we show that the recently proposed four-way skewed associative cache yields very stable execution times and good average miss ratios on blocked algorithms. As a result, execution time is faster and much more predictable than with conventional caches. It is therefore possible to use larger block sizes in blocked algorithms, which will further reduce blocking overhead costs.
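The sensitivity to array leading size described in the abstract can be illustrated with a toy cache simulation. The sketch below is not the paper's experimental setup: the cache geometry (8 KB, four ways, 32-byte lines), the XOR-and-rotate index function `skewed_index`, and the timestamp-based replacement are all assumptions chosen for brevity, and the hash is not the skewing function from the skewed-associative cache literature. It repeatedly sweeps 32 elements that are one leading dimension apart and reports the miss ratio.

```python
LINE_BYTES = 32      # assumed cache line size
WAYS = 4             # four-way associativity, as in the abstract
SETS = 64            # sets per way: 4 * 64 * 32 B = 8 KB (assumed size)
INDEX_BITS = 6       # log2(SETS)


def rotl(x, r, bits=INDEX_BITS):
    """Rotate a `bits`-wide value left by r positions."""
    r %= bits
    return ((x << r) | (x >> (bits - r))) & ((1 << bits) - 1)


def conventional_index(line, way):
    """Set-associative mapping: every way uses the same index bits."""
    return line % SETS


def skewed_index(line, way):
    """Illustrative skewing: each way mixes in higher bits differently."""
    low = line % SETS
    high = (line // SETS) % SETS
    return low ^ rotl(high, way)


def miss_ratio(addresses, index_fn):
    """Simulate the cache and return misses / accesses.

    On a miss, the least recently used of the WAYS candidate slots is
    replaced (empty slots count as oldest). With conventional_index this
    is plain LRU within a set.
    """
    tags = [[None] * SETS for _ in range(WAYS)]
    stamp = [[-1] * SETS for _ in range(WAYS)]
    misses = 0
    for t, addr in enumerate(addresses):
        line = addr // LINE_BYTES
        slots = [(w, index_fn(line, w)) for w in range(WAYS)]
        slot = next(((w, i) for w, i in slots if tags[w][i] == line), None)
        if slot is None:
            misses += 1
            slot = min(slots, key=lambda s: stamp[s[0]][s[1]])
            tags[slot[0]][slot[1]] = line
        stamp[slot[0]][slot[1]] = t
    return misses / len(addresses)


def column_sweeps(leading_dim, rows=32, sweeps=10, elem_bytes=8):
    """Byte addresses of `rows` elements spaced one leading dimension
    apart, traversed `sweeps` times."""
    return [r * leading_dim * elem_bytes
            for _ in range(sweeps) for r in range(rows)]


if __name__ == "__main__":
    for ld in (1000, 1024):
        trace = column_sweeps(ld)
        print(ld,
              miss_ratio(trace, conventional_index),
              miss_ratio(trace, skewed_index))
```

Under these assumptions, a leading dimension of 1024 maps all 32 lines to a single set of the conventional cache, so every access misses, whereas a leading dimension of 1000 spreads them over distinct sets and only the cold misses remain. With the illustrative skewed mapping, the 1024 case also drops to cold misses only, which is the kind of stability the abstract attributes to skewed associativity.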

Citations

•Proceedings Article•10.1109/ECRTS.2009.30

Using Randomized Caches in Probabilistic Real-Time Systems

Eduardo Quinones, +3 more

- 01 Jul 2009

TL;DR: This paper presents simulation-based results on representative examples that illustrate the problem of performance anomalies with standard cache replacement policies and shows that, by eliminating dependencies on access history, randomized replacement greatly reduces the risk of these cache-based performance anomalies, enabling probabilistic worst-case execution time analysis.

55

•Proceedings Article•10.1109/HPCA.2004.10026

Exploiting the cache capacity of a single-chip multi-core processor with execution migration

P. Michaud

- 14 Feb 2004

TL;DR: The affinity algorithm, a method for distributing cache lines automatically on several caches, is introduced, and it is shown that on working-sets exhibiting a property called "splittability", it is possible to trade cache misses for migrations.

39

•Journal Article•10.1109/TC.2002.1032627

Cost-effective flow table designs for high-speed routers: architecture and performance evaluation

Jun Xu, +1 more

- 01 Sep 2002

- IEEE Transactions on Computers

TL;DR: The software-based design, adapted from the hash table data structure, employs a practical and effective technique to solve the garbage collection problem caused by expired flows. Performance evaluation results from both trace-driven simulation and statistical analysis demonstrate that both designs are cost-effective for their targeted router models.

26

•Journal Article•10.1109/TC.2005.79

Eliminating conflict misses using prime number-based cache indexing

Mazen Kharbutli, +2 more

- 01 May 2005

- IEEE Transactions on Computers

TL;DR: An in-depth analysis of the pathological behavior of cache hashing functions is presented, and two new hashing functions, prime modulo and odd-multiplier displacement, are proposed that are resistant to pathological behavior and yet are able to eliminate the worst-case conflict behavior in the L2 cache.

24

•Proceedings Article•10.5555/2755753.2755911

Eliminating intra-warp conflict misses in GPU

Bin Wang, +3 more

- 09 Mar 2015

TL;DR: Through an in-depth analysis of GPU access patterns, it is found that column-majored strided accesses are likely to incur high intra-warp concentration and a Full Permutation (FUP) based indexing method is proposed that adapts to both large and medium strides in this pattern.

17

References

•Proceedings Article•10.1145/113445.113449

A data locality optimizing algorithm

Michael Wolf, +1 more

- 01 May 1991

TL;DR: An algorithm that improves the locality of a loop nest by transforming the code via interchange, reversal, skewing and tiling is proposed, and is successful in optimizing codes such as matrix multiplication, successive over-relaxation, LU decomposition without pivoting, and Givens QR factorization.

1.4K

•Book Chapter•10.1007/3-540-56891-3_24

Skewed-associative Caches

André Seznec, +1 more

- 14 Jun 1993

TL;DR: In order to improve cache hit ratios, set-associative caches are used in some of the new superscalar microprocessors.

70

•Journal Article•10.1007/BF01582075

A strategy for array management in local memory

Christine Eisenbeis, +3 more

- 25 Feb 1994

- Mathematical Programming

TL;DR: This paper shows how to compute approximations of window sets defined by Gannon, Jalby, and Gallivan, which allows derivation of a global strategy of data management for local memories which may be combined efficiently with various parallelization and/or vectorization optimizations.

48

•Book Chapter•10.1007/978-1-4471-3501-2_8

A Quantitative Algorithm for Data Locality Optimization

François Bodin, +3 more

- 01 Jan 1992

TL;DR: A register allocation algorithm and a cache usage optimization algorithm based on the reference window concept which can be effectively implemented in a compiler system are described.

35

Randomization and Associativity in the Design of Placement-Insensitive Caches

Michael Schlansker, +2 more

- 01 Jan 1993

TL;DR: A pseudo-random hash function is presented and used to randomize addresses into cache sets and a counting technique is used to determine miss ratios, demonstrating a close relationship between analysis and at least one real application.

27

Related Papers (5)

Data cache locking for higher program predictability

V-P cache: a storage efficient virtual cache organization

Caches and algorithms

Cache performance analysis of algorithms

An algorithm for deciding minimal cache sizes in real-time systems