Tag RAM

Papers published on a yearly basis

Papers

Proceedings Article•10.1145/605397.605420•

An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches

Changkyu Kim¹, Doug Burger¹, Stephen W. Keckler¹•Institutions (1)

University of Texas at Austin¹

1 Oct 2002

TL;DR: This paper proposes physical designs for these Non-Uniform Cache Architectures (NUCAs) and extends these physical designs with logical policies that allow important data to migrate toward the processor within the same level of the cache.

Abstract: Growing wire delays will force substantive changes in the designs of large caches. Traditional cache architectures assume that each level in the cache hierarchy has a single, uniform access time. Increases in on-chip communication delays will make the hit time of large on-chip caches a function of a line's physical location within the cache. Consequently, cache access times will become a continuum of latencies rather than a single discrete latency. This non-uniformity can be exploited to provide faster access to cache lines in the portions of the cache that reside closer to the processor. In this paper, we evaluate a series of cache designs that provides fast hits to multi-megabyte cache memories. We first propose physical designs for these Non-Uniform Cache Architectures (NUCAs). We extend these physical designs with logical policies that allow important data to migrate toward the processor within the same level of the cache. We show that, for multi-megabyte level-two caches, an adaptive, dynamic NUCA design achieves 1.5 times the IPC of a Uniform Cache Architecture of any size, outperforms the best static NUCA scheme by 11%, outperforms the best three-level hierarchy--while using less silicon area--by 13%, and comes within 13% of an ideal minimal hit latency solution.
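The migration idea in the abstract can be illustrated with a toy model. The sketch below is not the paper's design: it assumes one cache set spread over banks ordered nearest-first, made-up per-bank latencies, and a simple policy in which a hit swaps the line one bank closer to the processor. The class name `MigratingBankSet` and the insert-at-farthest-bank rule are hypothetical choices made for illustration.

```python
class MigratingBankSet:
    """Toy model: one cache set spread across banks, ordered nearest-first."""

    def __init__(self, latencies):
        self.latencies = list(latencies)       # cycles per bank (illustrative numbers)
        self.banks = [None] * len(latencies)   # one resident tag per bank

    def access(self, tag):
        """Return (hit, latency). A hit promotes the line by one bank."""
        for i, resident in enumerate(self.banks):
            if resident == tag:
                latency = self.latencies[i]
                if i > 0:
                    # Swap with the line in the next-closer bank (assumed policy).
                    self.banks[i - 1], self.banks[i] = self.banks[i], self.banks[i - 1]
                return True, latency
        # Miss: place the line in the farthest bank, evicting its occupant.
        self.banks[-1] = tag
        return False, None
```

With latencies `[4, 8, 12]`, a line that is hit repeatedly is served at 12, then 8, then 4 cycles, which is the non-uniform, location-dependent hit time the abstract describes.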

831 citations

Patent•

Unified re-map and cache-index table with dual write-counters for wear-leveling of non-volatile flash RAM mass storage

Ricardo H. Bruce, Rolando H. Bruce, Earl T. Cohen, Allan J. Christie

25 Aug 1997

TL;DR: A unified re-map table in RAM arbitrarily re-maps all logical addresses from a host system to physical addresses of flash-memory devices; wear-leveling is performed on a block being written when both its total and incremental write counts exceed system-wide total and incremental thresholds.

Abstract: A flash-memory system provides solid-state mass storage as a replacement to a hard disk. A unified re-map table in a RAM is used to arbitrarily re-map all logical addresses from a host system to physical addresses of flash-memory devices. Each entry in the unified re-map table contains a physical block address (PBA) of the flash memory allocated to the logical address, and a cache valid bit and a cache index. When the cache valid bit is set, the data is read or written to a line in the cache pointed to by the cache index. A separate cache tag RAM is not needed. When the cache valid bit is cleared, the data is read from the flash memory block pointed to by the PBA. Two write count values are stored with the PBA in the table entry. A total-write count indicates a total number of writes to the flash block since manufacture. An incremental-write count indicates the number of writes since the last wear-leveling operation that moved the block. Wear-leveling is performed on a block being written when both total and incremental counts exceed system-wide total and incremental thresholds. The incremental-write count is cleared after a block is wear-leveled, but the total-write count is never cleared. The incremental-write count prevents moving a block again immediately after wear-leveling. The thresholds are adjusted as the system ages to provide even wear.
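The table entry and the dual-counter rule described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `RemapEntry`, `locate`, `record_write`, and `wear_level` are hypothetical, the thresholds are arbitrary, and the sketch does not model how counts follow a physical block when it is moved or how thresholds are adjusted as the system ages.

```python
from dataclasses import dataclass


@dataclass
class RemapEntry:
    """One unified re-map table entry (field names are illustrative)."""
    pba: int                   # physical block address in flash
    cache_valid: bool = False  # set when the data lives in the cache
    cache_index: int = 0       # cache line to use when cache_valid is set
    total_writes: int = 0      # writes since manufacture, never cleared
    incr_writes: int = 0       # writes since the last wear-leveling move


def locate(entry):
    """Return where the data is served from; no separate tag RAM lookup."""
    if entry.cache_valid:
        return ("cache", entry.cache_index)
    return ("flash", entry.pba)


def record_write(entry, total_threshold, incr_threshold):
    """Count a write; return True when BOTH counts exceed their thresholds."""
    entry.total_writes += 1
    entry.incr_writes += 1
    return (entry.total_writes > total_threshold
            and entry.incr_writes > incr_threshold)


def wear_level(entry, new_pba):
    """Re-map to a new block and clear only the incremental count."""
    entry.pba = new_pba
    entry.incr_writes = 0
```

Because `wear_level` clears only `incr_writes`, the next write to the same entry does not trigger another move even though `total_writes` is still above its threshold, which matches the abstract's remark about not moving a block again immediately.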

592 citations

Proceedings Article•

The Multi-Queue Replacement Algorithm for Second Level Buffer Caches

Yuanyuan Zhou, James Philbin, Kai Li

25 Jun 2001

402 citations

Proceedings Article•10.1145/339647.339685•

Reconfigurable caches and their application to media processing

Parthasarathy Ranganathan¹, Sarita V. Adve², Norman P. Jouppi•Institutions (2)

Rice University¹, University of Illinois at Urbana–Champaign²

1 May 2000

TL;DR: A new reconfigurable cache design is proposed that enables the cache SRAM arrays to be dynamically divided into multiple partitions that can be used for different processor activities.

Abstract: High performance general-purpose processors are increasingly being used for a variety of application domains - scientific, engineering, databases, and more recently, media processing. It is therefore important to ensure that architectural features that use a significant fraction of the on-chip transistors are applicable across these different domains. For example, current processor designs often devote the largest fraction of on-chip transistors (up to 80%) to caches. Many workloads, however, do not make effective use of large caches; e.g., media processing workloads which often have streaming data access patterns and large working sets. This paper proposes a new reconfigurable cache design. This design enables the cache SRAM arrays to be dynamically divided into multiple partitions that can be used for different processor activities. These activities can benefit applications that would otherwise not use the storage allocated to large conventional caches. Our design involves relatively few modifications to conventional cache design, and analysis using a modification of the CACTI analytical model shows a small impact on cache access time. We evaluate one representative use of reconfigurable caches - instruction reuse for media processing. We find this use gives IPC improvements ranging from 1.04X to 1.20X in simulation across eight media processing benchmarks.

328 citations

Proceedings Article•10.5555/563998.564007•

Reducing set-associative cache energy via way-prediction and selective direct-mapping

Michael D. Powell¹, Amit Agarwal¹, T. N. Vijaykumar¹, Babak Falsafi², Kaushik Roy¹•Institutions (2)

Purdue University¹, Carnegie Mellon University²

1 Dec 2001

TL;DR: Two previously-proposed techniques, way-prediction and selective direct-mapping, are applied to reducing L1 cache dynamic energy while maintaining high performance, and caches achieve the energy-delay of sequential access while maintaining the performance of parallel access.

Abstract: Set-associative caches achieve low miss rates for typical applications but result in significant energy dissipation. Set-associative caches minimize access time by probing all the data ways in parallel with the tag lookup, although the output of only the matching way is used. The energy spent accessing the other ways is wasted. Eliminating the wasted energy by performing the data lookup sequentially following the tag lookup substantially increases cache access time, and is unacceptable for high-performance L1 caches. In this paper, we apply two previously-proposed techniques, way-prediction and selective direct-mapping, to reducing L1 cache dynamic energy while maintaining high performance. The techniques predict the matching way and probe only the predicted way and not all the ways, achieving energy savings. While these techniques were originally proposed to improve set-associative cache access times, this is the first paper to apply them to reducing cache energy. We evaluate the effectiveness of these techniques in reducing L1 d-cache, L1 i-cache, and overall processor energy. Using these techniques, our caches achieve the energy-delay of sequential access while maintaining the performance of parallel access. Relative to parallel access L1 i- and d-caches, the techniques achieve overall processor energy-delay reduction of 8%, while perfect way-prediction with no performance degradation achieves 10% reduction. The performance degradation of the techniques is less than 3%, compared to an aggressive, 1-cycle, 4-way, parallel access cache.
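As a rough illustration of way-prediction, the sketch below models a single cache set that probes one predicted data way first and counts data-way reads as a crude energy proxy. It is a simplified model, not the paper's mechanism: the predictor (remember the last way that hit or was filled), the fallback of probing all remaining ways on a misprediction, the victim choice, and the class name `WayPredictedSet` are all assumptions, and selective direct-mapping is not modelled.

```python
class WayPredictedSet:
    """Toy model of one set in a set-associative cache with a predicted way."""

    def __init__(self, num_ways):
        self.tags = [None] * num_ways   # tag stored in each way (None = empty)
        self.predicted_way = 0          # way to probe first (assumed predictor)
        self.data_probes = 0            # data ways read so far (energy proxy)

    def access(self, tag):
        """Return 'hit-predicted', 'hit-mispredicted', or 'miss'."""
        # First probe: read only the predicted data way.
        self.data_probes += 1
        if self.tags[self.predicted_way] == tag:
            return "hit-predicted"

        # Misprediction: fall back to probing every other way (assumed fallback).
        self.data_probes += len(self.tags) - 1
        for way, resident in enumerate(self.tags):
            if resident == tag:
                self.predicted_way = way
                return "hit-mispredicted"

        # Miss: fill the first empty way, else the way after the predicted one.
        if None in self.tags:
            victim = self.tags.index(None)
        else:
            victim = (self.predicted_way + 1) % len(self.tags)
        self.tags[victim] = tag
        self.predicted_way = victim
        return "miss"
```

In this model a correctly predicted hit reads one data way, whereas a parallel-access baseline would read all `num_ways` data ways on every access; the gap between the two is the wasted energy the abstract refers to.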

321 citations

Year	Papers
2019	3
2018	2
2017	20
2016	32
2015	34
2014	59

Related Topics (5)

Performance Metrics