Trace cache sampling filter

doi:10.1109/PACT.2005.38

Proceedings Article

Michael Behar, +2 more

- 17 Sep 2005

- pp 255-266

17

TL;DR: The sampling filter is shown to improve trace cache and overall system performance while reducing both power dissipation and the overall misses in the first level of the cache hierarchy.

Abstract: This paper presents a new technique for efficient usage of small trace caches. A trace cache can significantly increase the performance of wide out-of-order processors, but to be effective, the size of the trace cache should be large. Power and timing considerations indicate that a small trace cache is desirable, with special mechanisms to increase its effectiveness despite the limited size. Hence several authors have proposed various filtering methods to select "good traces" for keeping in the trace cache, from among the general population of traces. This paper presents a new filtering technique, which is based on sampling. Our new technique suggests that instead of building all the traces and trying to select the good ones among them, it is more efficient to make a preliminary selection of traces. This selection is based on a random sampling approach. We show that the sampling filter improves trace cache and overall system performance, while reducing power dissipation. The sampling filter reduces admission of traces that are not used prior to their eviction from the cache, and increases the percentage of time a trace spends in its live phase during its stay in the cache. Moreover, the sampling filter reduces duplication between the trace cache and the instruction cache and thus reduces the overall misses in the first level of the cache hierarchy.
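
The sampling idea in the abstract can be illustrated with a minimal sketch. It is a toy model, not the paper's mechanism: the class name `SampledTraceCache`, the `sample_prob` parameter, the LRU replacement policy, and the synthetic access stream in `run` are all assumptions made for illustration. On a miss, a trace is built and admitted only if a random draw falls below the sampling probability, so frequently executed traces get many chances to be admitted while rarely executed ones are mostly filtered out.

```python
import random
from collections import OrderedDict


class SampledTraceCache:
    """Toy LRU trace cache that admits a missing trace only with probability sample_prob.

    All names and policies here are hypothetical; the abstract does not
    specify the paper's replacement policy or sampling parameters.
    """

    def __init__(self, capacity, sample_prob, seed=0):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        if not 0.0 <= sample_prob <= 1.0:
            raise ValueError("sample_prob must be in [0, 1]")
        self.capacity = capacity
        self.sample_prob = sample_prob
        self.rng = random.Random(seed)
        self.lines = OrderedDict()  # trace id -> hits since admission, in LRU order
        self.hits = 0
        self.misses = 0
        self.admitted = 0
        self.evicted_unused = 0  # traces evicted without ever being hit

    def access(self, trace_id):
        """Look up a trace; return True on a hit, False on a miss."""
        if trace_id in self.lines:
            self.lines[trace_id] += 1
            self.lines.move_to_end(trace_id)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        # Sampling filter: build and admit the trace only on a sampled miss.
        if self.rng.random() < self.sample_prob:
            if len(self.lines) >= self.capacity:
                _, uses = self.lines.popitem(last=False)  # evict the LRU trace
                if uses == 0:
                    self.evicted_unused += 1
            self.lines[trace_id] = 0
            self.admitted += 1
        return False


def run(sample_prob, n=20000, seed=1):
    """Drive the cache with a synthetic mix of reused and one-off traces."""
    stream_rng = random.Random(seed)
    cache = SampledTraceCache(capacity=16, sample_prob=sample_prob, seed=seed)
    for i in range(n):
        if stream_rng.random() < 0.5:
            trace = ("hot", stream_rng.randrange(12))  # small, reused working set
        else:
            trace = ("cold", i)  # unique trace that is never seen again
        cache.access(trace)
    return cache
```

Calling `run(1.0)` models a cache with no filter (every missing trace is admitted), and `run(0.1)` models a filter that samples one miss in ten on average. Comparing the `hits` and `evicted_unused` counters of the two runs gives a rough feel for the effect on this synthetic stream only; it says nothing about the workloads or results reported in the paper.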


Citations

Proceedings Article•10.1145/1555754.1555778

PIPP: promotion/insertion pseudo-partitioning of multi-core shared caches

Yuejian Xie, +1 more

- 20 Jun 2009

TL;DR: This work proposes a new cache management approach that combines dynamic insertion and promotion policies to provide the benefits of cache partitioning, adaptive insertion, and capacity stealing all with a single mechanism.

366

Proceedings Article•10.1145/1669112.1669139

Extending the effectiveness of 3D-stacked DRAM caches with an adaptive multi-queue policy

Gabriel H. Loh

- 12 Dec 2009

TL;DR: This work proposes a cache where each set is organized as multiple logical FIFO or queue structures that simultaneously provide performance isolation between threads as well as reduce the number of entries occupied by dead lines.

90

Proceedings Article•10.1109/PACT.2007.44

L1 Cache Filtering Through Random Selection of Memory References

Yoav Etsion, +1 more

- 15 Sep 2007

TL;DR: A simple probabilistic filtering mechanism based on random sampling is suggested to identify and select the frequently used blocks, and it is shown that a 16K direct-mapped L1 cache, augmented with a fully-associative 2K filter, achieves on average over 10% more instructions per cycle (IPC) than a regular 16K, 4-way set-associative cache, and even ~5% more IPC than a 32K, 4-way cache.

30

Proceedings Article•10.1145/2155620.2155662

A register-file approach for row buffer caches in die-stacked DRAMs

Gabriel H. Loh

- 03 Dec 2011

TL;DR: This work proposes a “file-managed” row buffer cache (FM-RB$) approach inspired by traditional register allocation and peep-hole optimization ideas from compiler design that can deliver performance benefits beyond a conventional cache approach.

20

Proceedings Article•10.1145/1403375.1403720

CATCH: a mechanism for dynamically detecting Cache-Content-Duplication and its application to instruction caches

Marios Kleanthous, +1 more

- 10 Mar 2008

TL;DR: It is shown that CCD is a frequent phenomenon and that an idealized duplication-detection mechanism for instruction caches has the potential to increase performance of an out-of-order processor, with a 2-way eight instruction per block 16 KB instruction cache, often by more than 5% and up to 20%.

16

References

Proceedings Article

Wattch: a framework for architectural-level power analysis and optimizations

Brooks, +2 more

- 01 Jan 2000

926

Journal Article•10.1109/4.509850

CACTI: an enhanced cache access and cycle time model

Steven J. E. Wilton, +1 more

- 01 May 1996

- IEEE Journal of Solid-state Circuits

TL;DR: In this paper, an analytical model for the access and cycle times of on-chip direct-mapped and set-associative caches is presented, where the inputs to the model are the cache size, block size, and associativity, as well as array organization and process parameters.

885

Proceedings Article•10.1145/379240.379268

Cache decay: exploiting generational behavior to reduce cache leakage power

Stefanos Kaxiras, +2 more

- 01 May 2001

TL;DR: This paper discusses policies and implementations for reducing cache leakage by invalidating and “turning off” cache lines when they hold data not likely to be reused, and proposes adaptive policies that effectively reduce L1 cache leakage energy by 5x for the SPEC2000 with only negligible degradations in performance.

733

The microarchitecture of the Pentium 4 processor

G. Hinton

- 01 Jan 2001

TL;DR: The main features and functions of the NetBurst microarchitecture of Intel’s new flagship Pentium 4 processor are described, including its new form of instruction cache called the Execution Trace Cache.

671

Proceedings Article•10.5555/243846.243854

Trace cache: a low latency approach to high bandwidth instruction fetching

Eric Rotenberg, +2 more

- 02 Dec 1996

TL;DR: It is shown that the trace cache's efficient, low latency approach enables it to outperform more complex mechanisms that work solely out of the instruction cache.

645