Topic

Cache coherence

About: Cache coherence is a research topic. Over its lifetime, 2,910 publications have appeared within this topic, receiving 76,672 citations.

Papers

Proceedings Article • 10.1145/285930.285997

Memory consistency and event ordering in scalable shared-memory multiprocessors

Kourosh Gharachorloo¹, Daniel E. Lenoski¹, James Laudon¹, Phillip B. Gibbons¹, Anoop Gupta¹, John L. Hennessy¹

Stanford University¹

1 May 1990

TL;DR: A new model of memory consistency, called release consistency, that allows for more buffering and pipelining than previously proposed models is introduced and is shown to be equivalent to the sequential consistency model for parallel programs with sufficient synchronization.

Abstract: Scalable shared-memory multiprocessors distribute memory among the processors and use scalable interconnection networks to provide high bandwidth and low latency communication. In addition, memory accesses are cached, buffered, and pipelined to bridge the gap between the slow shared memory and the fast processors. Unless carefully controlled, such architectural optimizations can cause memory accesses to be executed in an order different from what the programmer expects. The set of allowable memory access orderings forms the memory consistency model or event ordering model for an architecture. This paper introduces a new model of memory consistency, called release consistency, that allows for more buffering and pipelining than previously proposed models. A framework for classifying shared accesses and reasoning about event ordering is developed. The release consistency model is shown to be equivalent to the sequential consistency model for parallel programs with sufficient synchronization. Possible performance gains from the less strict constraints of the release consistency model are explored. Finally, practical implementation issues are discussed, concentrating on issues relevant to scalable architectures.

1,275 citations

Journal Article • 10.1109/2.121510

The Stanford Dash multiprocessor

Daniel E. Lenoski¹, James Laudon¹, Kourosh Gharachorloo¹, Wolf-Dietrich Weber¹, Anoop Gupta¹, John L. Hennessy¹, Mark Horowitz¹, Monica S. Lam¹

Stanford University¹

01 Mar 1992 • IEEE Computer

TL;DR: The directory architecture for shared memory (Dash) allows shared data to be cached, significantly reducing the latency of memory accesses and yielding higher processor utilization and higher overall performance; a distributed directory-based protocol provides cache coherence without compromising scalability.

Abstract: The overall goals and major features of the directory architecture for shared memory (Dash) are presented. The fundamental premise behind the architecture is that it is possible to build a scalable high-performance machine with a single address space and coherent caches. The Dash architecture is scalable in that it achieves linear or near-linear performance growth as the number of processors increases from a few to a few thousand. This performance results from distributing the memory among processing nodes and using a network with scalable bandwidth to connect the nodes. The architecture allows shared data to be cached, significantly reducing the latency of memory accesses and yielding higher processor utilization and higher overall performance. A distributed directory-based protocol that provides cache coherence without compromising scalability is discussed in detail. The Dash prototype machine and the corresponding software support are described.

1,060 citations

Proceedings Article • 10.1145/605397.605420

An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches

Changkyu Kim¹, Doug Burger¹, Stephen W. Keckler¹

University of Texas at Austin¹

1 Oct 2002

TL;DR: This paper proposes physical designs for these Non-Uniform Cache Architectures (NUCAs) and extends these physical designs with logical policies that allow important data to migrate toward the processor within the same level of the cache.

Abstract: Growing wire delays will force substantive changes in the designs of large caches. Traditional cache architectures assume that each level in the cache hierarchy has a single, uniform access time. Increases in on-chip communication delays will make the hit time of large on-chip caches a function of a line's physical location within the cache. Consequently, cache access times will become a continuum of latencies rather than a single discrete latency. This non-uniformity can be exploited to provide faster access to cache lines in the portions of the cache that reside closer to the processor. In this paper, we evaluate a series of cache designs that provides fast hits to multi-megabyte cache memories. We first propose physical designs for these Non-Uniform Cache Architectures (NUCAs). We extend these physical designs with logical policies that allow important data to migrate toward the processor within the same level of the cache. We show that, for multi-megabyte level-two caches, an adaptive, dynamic NUCA design achieves 1.5 times the IPC of a Uniform Cache Architecture of any size, outperforms the best static NUCA scheme by 11%, outperforms the best three-level hierarchy--while using less silicon area--by 13%, and comes within 13% of an ideal minimal hit latency solution.

831 citations

Proceedings Article • 10.1145/121132.121159

Implementation and performance of Munin

John B. Carter¹, John K. Bennett¹, Willy Zwaenepoel¹

Rice University¹

1 Sep 1991

TL;DR: This work evaluates the implementation of Munin and describes the execution of two Munin programs that achieve performance within ten percent of message passing implementations of the same programs.

Abstract: Munin is a distributed shared memory (DSM) system that allows shared memory parallel programs to be executed efficiently on distributed memory multiprocessors. Munin is unique among existing DSM systems in its use of multiple consistency protocols and in its use of release consistency. In Munin, shared program variables are annotated with their expected access pattern, and these annotations are then used by the runtime system to choose a consistency protocol best suited to that access pattern. Release consistency allows Munin to mask network latency and reduce the number of messages required to keep memory consistent. Munin's multiprotocol release consistency is implemented in software using a delayed update queue that buffers and merges pending outgoing writes. A sixteen-processor prototype of Munin is currently operational. We evaluate its implementation and describe the execution of two Munin programs that achieve performance within ten percent of message passing implementations of the same programs. Munin achieves this level of performance with only minor annotations to the shared memory programs.
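
The delayed update queue mentioned in the abstract can be pictured with a small sketch. This is only an illustration of buffering and merging writes until a release, not Munin's actual implementation: the class `DelayedUpdateQueue`, the `send` callback, and the address/value representation are all hypothetical names chosen for the example.

```python
from collections import OrderedDict


class DelayedUpdateQueue:
    """Toy delayed update queue: buffers writes and merges them until a release."""

    def __init__(self, send):
        self._pending = OrderedDict()  # address -> latest value written
        self._send = send              # hypothetical callback that ships one batch
        self.writes_buffered = 0       # total writes seen, merged or not

    def write(self, address, value):
        # A later write to the same address replaces the buffered one (merge),
        # so only the final value needs to travel over the network.
        self._pending[address] = value
        self.writes_buffered += 1

    def release(self):
        # At a release point (e.g. an unlock), flush all merged updates
        # as a single outgoing message.
        if self._pending:
            self._send(dict(self._pending))
            self._pending.clear()


sent = []                      # stands in for the network in this sketch
queue = DelayedUpdateQueue(sent.append)
queue.write(0x10, 1)
queue.write(0x14, 2)
queue.write(0x10, 3)           # merged with the first write to 0x10
queue.release()
```

In this toy run, three writes are buffered but only one message carrying two merged updates is sent, which is the message-reduction effect the abstract attributes to release consistency.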

772 citations

Journal Article • 10.1007/BF01762111

Competitive snoopy caching

Anna R. Karlin¹, Mark S. Manasse, Larry Rudolph², Daniel D. Sleator³

Stanford University¹, Hebrew University of Jerusalem², Carnegie Mellon University³

01 Nov 1988 • Algorithmica

TL;DR: This work presents new on-line algorithms to be used by the caches of snoopy cache multiprocessor systems to decide which blocks to retain and which to drop in order to minimize communication over the bus.

Abstract: In a snoopy cache multiprocessor system, each processor has a cache in which it stores blocks of data. Each cache is connected to a bus used to communicate with the other caches and with main memory. Each cache monitors the activity on the bus and in its own processor and decides which blocks of data to keep and which to discard. For several of the proposed architectures for snoopy caching systems, we present new on-line algorithms to be used by the caches to decide which blocks to retain and which to drop in order to minimize communication over the bus. We prove that, for any sequence of operations, our algorithms' communication costs are within a constant factor of the minimum required for that sequence; for some of our algorithms we prove that no on-line algorithm has this property with a smaller constant.
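
The retain-or-drop decision described in the abstract has the flavor of a rent-or-buy problem. The sketch below is an illustrative threshold rule under an assumed cost model, not necessarily the algorithm analyzed in the paper: each remote write to a retained block is assumed to cost one bus unit, re-reading a dropped block is assumed to cost `refetch_cost`, and the function name and event labels are hypothetical.

```python
def online_bus_cost(events, refetch_cost):
    """Bus cost for one block under a drop-after-threshold rule.

    events: sequence of "remote_write" (another cache updates the block)
            or "local_read" (this cache's own processor reads it).
    refetch_cost: assumed bus cost of re-reading a dropped block.
    """
    cost = 0
    cached = True
    updates_since_use = 0
    for event in events:
        if event == "remote_write":
            if cached:
                cost += 1                  # one bus unit to update our copy
                updates_since_use += 1
                if updates_since_use >= refetch_cost:
                    cached = False         # updates have cost as much as a refetch: drop
        elif event == "local_read":
            if not cached:
                cost += refetch_cost       # miss: re-read the whole block
                cached = True
            updates_since_use = 0          # block was useful, restart the count
        else:
            raise ValueError(f"unknown event: {event}")
    return cost


few_updates = ["remote_write"] * 2 + ["local_read"]
many_updates = ["remote_write"] * 10 + ["local_read"]
print(online_bus_cost(few_updates, 4))    # 2: block retained, read hits
print(online_bus_cost(many_updates, 4))   # 8: 4 updates paid, then one refetch
```

The point of the threshold is that the cache never spends more on keeping a block up to date than it would have spent fetching it again, which bounds the online cost without knowing future accesses.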

733 citations

Performance Metrics

Papers: 2,998

Citations: 31,187

No. of papers in the topic in previous years
Year	Papers
2025	8
2024	4
2023	26
2022	40
2021	46
2020	54
