DeNovoSync: Efficient Support for Arbitrary Synchronization without Writer-Initiated Invalidations

doi:10.1145/2694344.2694356

Proceedings Article•10.1145/2694344.2694356

Hyojin Sung, +1 more

- 14 Mar 2015

- Vol. 50, Iss: 4, pp 545-559

32

TL;DR: This paper proposes DeNovoSync, a technique to support arbitrary synchronization in DeNovo that combines registration of all synchronization reads with a judicious hardware backoff to limit unnecessary registrations, and shows comparable or up to 22% lower execution time and up to 58% lower network traffic.

Citations

Proceedings Article•10.1145/2749469.2750374

Stash: have your scratchpad and cache it too

Rakesh Komuravelli, +7 more

- 13 Jun 2015

TL;DR: This work proposes an efficient heterogeneous memory system where specialized memory components are tightly coupled in a unified and coherent address space, and shows that the stash provides better performance and energy than a cache and a scratchpad, while enabling new use cases for heterogeneous systems.

77

Proceedings Article•10.1145/2830772.2830821

Efficient GPU synchronization without scopes: saying no to complex consistency models

Matthew D. Sinclair, +2 more

- 05 Dec 2015

TL;DR: This work applies the DeNovo coherence protocol to GPUs and compares it with conventional GPU coherence under the DRF and HRF consistency models, and shows that the complexity of the HRF model is neither necessary nor sufficient to obtain high performance.

71

Proceedings Article•10.1109/ISCA.2018.00031

Spandex: a flexible interface for efficient heterogeneous coherence

Johnathan Alsop, +2 more

- 02 Jun 2018

TL;DR: Spandex is introduced, an improved coherence interface based on the simple and scalable DeNovo coherence protocol that directly interfaces devices with diverse coherence properties and memory demands, enabling each device to communicate in a manner appropriate for its specific access properties.

50

Proceedings Article•10.5555/3195638.3195669

Lazy release consistency for GPUs

Johnathan Alsop, +3 more

- 15 Oct 2016

TL;DR: This work proposes to adapt lazy release consistency - previously only proposed for homogeneous CPU systems - to a heterogeneous system, and uses a DeNovo-like mechanism to track ownership of synchronization variables, lazily performing coherence actions only when a synchronization variable changes locations.

43

Proceedings Article•10.1145/3079856.3080206

Chasing Away RAts: Semantics and Evaluation for Relaxed Atomics on Heterogeneous Systems

Matthew D. Sinclair, +2 more

- 24 Jun 2017

TL;DR: A new model is introduced, Data-Race-Free-Relaxed (DRFrlx), that extends DRF0 to provide SC-centric semantics for the common use cases of relaxed atomics, and is evaluated in CPU-GPU systems for these use cases.

43

References

Proceedings Article•10.1145/223982.223990

The SPLASH-2 programs: characterization and methodological considerations

Steven Cameron Woo, +4 more

- 01 May 1995

TL;DR: This paper quantitatively characterizes the SPLASH-2 programs in terms of fundamental properties and architectural interactions that are important to understand them well, including the computational load balance, communication to computation ratio and traffic needs, important working set sizes, and issues related to spatial locality.

4.1K

Journal Article•10.1109/2.982916

Simics: A full system simulation platform

Peter S. Magnusson, +8 more

- 01 Feb 2002

- IEEE Computer

TL;DR: Simics is a platform for full system simulation that can run actual firmware and completely unmodified kernel and driver code, and it provides both functional accuracy for running commercial workloads and sufficient timing accuracy to interface to detailed hardware models.

2.3K

Journal Article•10.1145/1105734.1105747

Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset

Milo M. K. Martin, +8 more

- 01 Nov 2005

- ACM Sigarch Computer Architecture News

TL;DR: The Wisconsin Multifacet Project has created a simulation toolset to characterize and evaluate the performance of multiprocessor hardware systems commonly used as database and web servers; the toolset includes a set of timing simulator modules for modeling the timing of the memory system and microprocessors.

1.6K

Multifacet's General Execution-Driven Multiprocessor Simulator (GEMS) Toolset

M. M. Martin

- 01 Jan 2005

TL;DR: The Wisconsin Multifacet Project has created a simulation toolset to characterize and evaluate the performance of multiprocessor hardware systems commonly used as database and web servers and has released a set of timing simulator modules for modeling the timing of the memory system and microprocessors.

1.4K

Journal Article•10.1109/2.546611

Shared memory consistency models: a tutorial

Sarita V. Adve, +1 more

- 01 Dec 1996

- IEEE Computer

TL;DR: Most relaxed consistency models emphasize the system optimizations they support; this work describes an alternative, programmer-centric view that characterizes them in terms of program behavior, not system optimizations.

1.2K