Taming Single-Thread Program Performance on Many Distributed On-Chip L2 Caches

doi:10.1109/ICPP.2008.29

Proceedings Article•10.1109/ICPP.2008.29

Lei Jin, +1 more

- 09 Sep 2008

- pp 487-494

10

TL;DR: A dynamic cache management scheme is proposed that determines the home cache slice and cache bin for memory pages without any static program information. It adapts well to the behavior of multiprogrammed workloads and performs significantly better than both the private caching scheme and the shared caching scheme.

Citations

Proceedings Article•10.1109/HPCA.2011.5749731

CloudCache: Expanding and shrinking private caches

Hyunjin Lee, +2 more

- 12 Feb 2011

TL;DR: This work proposes a novel scalable cache management framework called CloudCache that creates dynamically expanding and shrinking L2 caches for working threads with fine-grained hardware monitoring and control and demonstrates that CloudCache significantly improves performance of a wide range of workloads when all or a subset of cores are occupied.

78

Proceedings Article•10.1109/PACT.2009.14

SOS: A Software-Oriented Distributed Shared Cache Management Approach for Chip Multiprocessors

Lei Jin, +1 more

- 12 Sep 2009

TL;DR: SOS, the authors' software-oriented distributed shared cache management approach, infers a program’s data affinity hints through a novel machine learning based analysis of its L2 cache access behavior, and achieves an average speedup of 10% and up to 23% over the shared cache scheme.

28

Proceedings Article•10.1145/1944862.1944889

Cache equalizer: a placement mechanism for chip multiprocessor distributed shared caches

Mohammad Hammoud, +2 more

- 24 Jan 2011

TL;DR: Cache Equalizer, a novel distributed cache management scheme for large-scale chip multiprocessors (CMPs), decouples the physical locations of cache blocks from their addresses to reduce misses caused by destructive interference.

9

Proceedings Article•10.1109/AHS.2011.5963917

MAESTRO: Orchestrating predictive resource management in future multicore systems

Sangyeun Cho, +1 more

- 06 Jun 2011

TL;DR: A case is made for a novel framework called MAESTRO which predictively manages system resources in shared-memory parallel computing platforms built with advanced multicore processors.

4

Proceedings Article•10.1109/ICCRD.2011.5763907

Private cache partitioning: A method to reduce the off-chip missrate of concurrently executing applications in Chip-Multiprocessors

Li Hao, +3 more

- 11 Mar 2011

TL;DR: Private Cache Partitioning is presented, a low-overhead runtime mechanism that partitions all of the private low-level caches, which are organized as a large shared cache by a distributed directory.

1

References

Book

Computer Architecture: A Quantitative Approach

John L. Hennessy, +1 more

- 01 Dec 1989

TL;DR: This best-selling title, considered for over a decade to be essential reading for every serious student and practitioner of computer design, has been updated throughout to address the most important trends facing computer designers today.

12.6K

The Landscape of Parallel Computing Research: A View from Berkeley

Krste Asanovic, +10 more

- 18 Dec 2006

TL;DR: The parallel landscape is framed with seven questions, and the following are recommended to explore the design space rapidly: (1) the overarching goal should be to make it easy to write programs that execute efficiently on highly parallel computing systems; (2) the target should be 1000s of cores per chip, as these chips are built from processing elements that are the most efficient in MIPS (Million Instructions per Second) per watt, MIPS per area of silicon, and MIPS per development dollar.

2.4K

Journal Article•10.1109/2.982917

SimpleScalar: an infrastructure for computer system modeling

Todd Austin, +2 more

- 01 Feb 2002

- IEEE Computer

TL;DR: The SimpleScalar tool set provides an infrastructure for simulation and architectural modeling that can model a variety of platforms ranging from simple unpipelined processors to detailed dynamically scheduled microarchitectures with multiple-level memory hierarchies.

1.8K

Proceedings Article•10.1109/MICRO.2006.49

Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches

Moinuddin K. Qureshi, +1 more

- 09 Dec 2006

TL;DR: In this article, the authors propose a low-overhead, runtime mechanism that partitions a shared cache between multiple applications depending on the reduction in cache misses that each application is likely to obtain for a given amount of cache resources.

1.1K

Proceedings Article•10.1145/605397.605420

An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches

Changkyu Kim, +2 more

- 01 Oct 2002

TL;DR: This paper proposes physical designs for these Non-Uniform Cache Architectures (NUCAs) and extends these physical designs with logical policies that allow important data to migrate toward the processor within the same level of the cache.

831

...

Related Papers (5)

A Flexible Two-Layer Buffer Caching Scheme for Shared Storage Cache

A Prediction Based CMP Cache Migration Policy

Capturing dynamic memory reference behavior with adaptive cache topology

SDC: a software defined cache for efficient data indexing

A Phase Behavior Aware Dynamic Cache Partitioning Scheme for CMPs