Proceedings Article, DOI: 10.1109/PACT.2005.38
Trace cache sampling filter
Michael Behar, Avi Mendelson, Avinoam Kolodny +2 more
- 17 Sep 2005
- pp 255-266
TL;DR: It is shown that the sampling filter improves trace cache and overall system performance while reducing power dissipation, and that it reduces the overall number of misses in the first level of the cache hierarchy.
Abstract: This paper presents a new technique for efficient use of small trace caches. A trace cache can significantly increase the performance of wide out-of-order processors, but to be effective, the trace cache should be large. Power and timing considerations, however, favor a small trace cache, with special mechanisms that increase its effectiveness despite its limited size. Hence, several authors have proposed filtering methods that select "good traces" to keep in the trace cache from among the general population of traces. This paper presents a new filtering technique based on sampling. Our technique suggests that, instead of building all the traces and then trying to select the good ones among them, it is more efficient to make a preliminary selection of traces, and this selection is based on a random sampling approach. We show that the sampling filter improves trace cache and overall system performance while reducing power dissipation. The sampling filter reduces the admission of traces that are not used before they are evicted from the cache, and increases the fraction of its stay in the cache that a trace spends in its live phase. Moreover, the sampling filter reduces duplication between the trace cache and the instruction cache, and thus reduces the overall number of misses in the first level of the cache hierarchy.
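The admission idea in the abstract can be illustrated with a minimal sketch, assuming a toy model: on a trace-cache miss, the trace is built and inserted only with a fixed probability (the hypothetical `sample_rate`), instead of inserting every trace. The fully-associative LRU organization, the class and function names (`TraceCache`, `hot_cold_workload`), the fixed sampling rate, and the synthetic workload are all illustrative assumptions, not the authors' design. The sketch also tracks two quantities the abstract discusses: traces evicted without ever being used, and the share of residency spent in the live phase (insertion to last hit) rather than the dead phase (last hit to eviction).

```python
import random
from collections import OrderedDict


class TraceCache:
    """Toy fully-associative LRU trace cache with a random sampling admission filter.

    Assumption: a trace is any hashable key (e.g. a start address plus branch
    outcomes). On a miss, the trace is built and inserted only with probability
    `sample_rate` (a hypothetical parameter); otherwise fetch would fall back to
    the instruction cache, which is not modeled. sample_rate=1.0 means no filter.
    """

    def __init__(self, capacity, sample_rate=1.0, seed=0):
        self.capacity = capacity
        self.sample_rate = sample_rate
        self.rng = random.Random(seed)        # seeded for reproducible runs
        self.lines = OrderedDict()            # key -> [insert_time, last_hit_time or None]
        self.time = 0
        self.hits = 0
        self.misses = 0
        self.dead_on_arrival = 0              # evicted without a single hit
        self.live_time = 0                    # sum of (last hit - insertion) over evicted traces
        self.resident_time = 0                # sum of (eviction - insertion) over evicted traces

    def access(self, key):
        """Look up a trace; return True on a hit."""
        self.time += 1
        entry = self.lines.get(key)
        if entry is not None:
            self.hits += 1
            entry[1] = self.time              # extend the live phase
            self.lines.move_to_end(key)       # mark as most recently used
            return True
        self.misses += 1
        # Sampling filter: admit only a random subset of missed traces.
        if self.rng.random() < self.sample_rate:
            if len(self.lines) >= self.capacity:
                self._evict()
            self.lines[key] = [self.time, None]
        return False

    def _evict(self):
        _, (inserted, last_hit) = self.lines.popitem(last=False)  # LRU victim
        if last_hit is None:
            self.dead_on_arrival += 1
            last_hit = inserted               # zero-length live phase
        self.live_time += last_hit - inserted
        self.resident_time += self.time - inserted

    def live_fraction(self):
        """Share of evicted traces' residency spent in the live phase."""
        return self.live_time / self.resident_time if self.resident_time else 0.0


def hot_cold_workload(rounds, n_hot=4):
    """Hypothetical workload: n_hot reused traces interleaved with one-shot traces."""
    cold = 0
    for _ in range(rounds):
        for h in range(n_hot):
            yield ("hot", h)
            yield ("cold", cold)
            cold += 1


if __name__ == "__main__":
    for rate in (1.0, 0.1):
        tc = TraceCache(capacity=4, sample_rate=rate, seed=1)
        for key in hot_cold_workload(1000):
            tc.access(key)
        print(f"sample_rate={rate}: hits={tc.hits} "
              f"dead_on_arrival={tc.dead_on_arrival} "
              f"live_fraction={tc.live_fraction():.2f}")
```

With `sample_rate=1.0` every missed trace is admitted, so in this synthetic workload the one-shot traces push the reused ones out before they are accessed again and the cache never hits. With a low rate, a reused trace that happens to be sampled tends to survive until its next access. How well this carries over to real programs, and which rate to pick, are what the paper evaluates; this toy model does not answer either question.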
Citations
PIPP: promotion/insertion pseudo-partitioning of multi-core shared caches
Yuejian Xie, Gabriel H. Loh +1 more
- 20 Jun 2009
TL;DR: This work proposes a new cache management approach that combines dynamic insertion and promotion policies to provide the benefits of cache partitioning, adaptive insertion, and capacity stealing all with a single mechanism.
Extending the effectiveness of 3D-stacked DRAM caches with an adaptive multi-queue policy
Gabriel H. Loh
- 12 Dec 2009
TL;DR: This work proposes a cache where each set is organized as multiple logical FIFO or queue structures that simultaneously provide performance isolation between threads as well as reduce the number of entries occupied by dead lines.
L1 Cache Filtering Through Random Selection of Memory References
Yoav Etsion, Dror G. Feitelson +1 more
- 15 Sep 2007
TL;DR: A simple probabilistic filtering mechanism based on random sampling is suggested to identify and select the frequently used blocks, and it is shown that a 16K direct-mapped L1 cache, augmented with a fully-associative 2K filter, achieves on average over 10% more instructions per cycle (IPC) than a regular 16K, 4-way set-associative cache, and even ~5% more IPC than a 32K, 4-way cache.
A register-file approach for row buffer caches in die-stacked DRAMs
Gabriel H. Loh
- 03 Dec 2011
TL;DR: This work proposes a “file-managed” row buffer cache (FM-RB$) approach inspired by traditional register allocation and peep-hole optimization ideas from compiler design that can deliver performance benefits beyond a conventional cache approach.
CATCH: a mechanism for dynamically detecting Cache-Content-Duplication and its application to instruction caches
Marios Kleanthous, Yiannakis Sazeides +1 more
- 10 Mar 2008
TL;DR: It is shown that Cache-Content-Duplication (CCD) is a frequent phenomenon and that an idealized duplication-detection mechanism for instruction caches has the potential to increase the performance of an out-of-order processor with a 2-way, 16 KB instruction cache holding eight instructions per block, often by more than 5% and by up to 20%.
References
CACTI: an enhanced cache access and cycle time model
TL;DR: In this paper, an analytical model for the access and cycle times of on-chip direct-mapped and set-associative caches is presented, where the inputs to the model are the cache size, block size, and associativity, as well as array organization and process parameters.
Cache decay: exploiting generational behavior to reduce cache leakage power
Stefanos Kaxiras, Zhigang Hu, Margaret Martonosi +2 more
- 01 May 2001
TL;DR: This paper discusses policies and implementations for reducing cache leakage by invalidating and “turning off” cache lines when they hold data not likely to be reused, and proposes adaptive policies that effectively reduce L1 cache leakage energy by 5x for SPEC2000 with only negligible degradation in performance.
The microarchitecture of the Pentium 4 processor
G. Hinton
- 01 Jan 2001
TL;DR: The main features and functions of the NetBurst microarchitecture of Intel’s new flagship Pentium 4 processor are described, including its new form of instruction cache called the Execution Trace Cache.
Trace cache: a low latency approach to high bandwidth instruction fetching
Eric Rotenberg, Steve Bennett, James E. Smith +2 more
- 02 Dec 1996
TL;DR: It is shown that the trace cache's efficient, low latency approach enables it to outperform more complex mechanisms that work solely out of the instruction cache.