Synchronization state buffer: supporting efficient fine-grain synchronization on many-core architectures

doi:10.1145/1250662.1250668

Proceedings Article•10.1145/1250662.1250668

Weirong Zhu, +3 more

- 09 Jun 2007

- Vol. 35, Iss: 2, pp 35-45

117

TL;DR: The Synchronization State Buffer is proposed, a scalable architectural design for fine-grain synchronization that efficiently synchronizes concurrent threads by recording and managing the states of frequently synchronized data using modest hardware support.

Citations

Journal Article•10.1145/3145812

Approximate Communication: Techniques for Reducing Communication Bottlenecks in Large-Scale Parallel Systems

Filipe Betzel, +5 more

- 10 Jan 2018

- ACM Computing Surveys

TL;DR: Compression and approximate value prediction show great promise for reducing the communication bottleneck in bandwidth-constrained applications, while relaxed synchronization is found to provide large speedups for select error-tolerant applications, but suffers from limited general applicability and unreliable output degradation guarantees.

122

Proceedings Article•10.1109/HPCA51647.2021.00031

SynCron: Efficient Synchronization Support for Near-Data-Processing Architectures

Christina Giannoula, +9 more

- 01 Feb 2021

TL;DR: SynCron is an end-to-end synchronization solution for near-data-processing (NDP) systems that adds low-cost hardware support near memory to accelerate synchronization and avoids the need for hardware cache coherence support.

92

Proceedings Article•10.1145/2751205.2751232

Fine-Grained Synchronizations and Dataflow Programming on GPUs

Ang Li, +3 more

- 08 Jun 2015

TL;DR: This paper proposes a novel approach for fine-grained inter-thread synchronization on the shared memory of modern GPUs, demonstrates its performance, and applies it to Needleman-Wunsch, a 2D wavefront application involving massive cross-loop data dependencies.

59

Journal Article•10.1007/S11390-009-9295-3

Godson-T: An Efficient Many-Core Architecture for Parallel Program Executions

Dongrui Fan, +11 more

- 06 Nov 2009

- Journal of Computer Science and Technolo...

TL;DR: This work proposes a many-core architecture, Godson-T, which features a region-based cache coherence protocol, asynchronous data transfer agents, and hardware-supported synchronization mechanisms, to enable highly efficient utilization of on-chip resources.

46

Book Chapter•10.1007/978-3-642-11515-8_4

Low-Overhead, high-speed multi-core barrier synchronization

John Sartori, +1 more

- 25 Jan 2010

TL;DR: Three barrier implementations that are hybrids of software and dedicated hardware barriers and are specifically tailored for CMPs are presented and evaluated, providing low latency comparable to that of dedicated hardware networks at a fraction of the cost.

44

References

Proceedings Article•10.1145/165123.165164

Transactional memory: architectural support for lock-free data structures

Maurice Herlihy, +1 more

- 01 May 1993

TL;DR: Simulation results show that transactional memory matches or outperforms the best known locking techniques for simple benchmarks, even in the absence of priority inversion, convoying, and deadlock.

2.5K

Journal Article•10.1145/103727.103729

Algorithms for scalable synchronization on shared-memory multiprocessors

John Mellor-Crummey, +1 more

- 01 Feb 1991

- ACM Transactions on Computer Systems

TL;DR: The principal conclusion is that contention due to synchronization need not be a problem in large-scale shared-memory multiprocessors, and the existence of scalable algorithms greatly weakens the case for costly special-purpose hardware support for synchronization, and provides protection against so-called “dance hall” architectures.

1.2K

Proceedings Article•10.1145/264107.264206

The SGI Origin: a ccNUMA highly scalable server

James Laudon, +1 more

- 01 May 1997

TL;DR: The motivation for building the Origin 2000 is discussed and the architecture and implementation of the multiprocessor is described, and performance results are presented for the NAS Parallel Benchmarks V2.2 and the SPLASH2 applications.

923

Proceedings Article•10.1109/HPCA.2006.1598134

LogTM: log-based transactional memory

Kevin E. Moore, +4 more

- 27 Feb 2006

TL;DR: This paper presents a new implementation of transactional memory, log-based transactional memory (LogTM), that makes commits fast by storing old values to a per-thread log in cacheable virtual memory and storing new values in place.

785

Journal Article•10.1109/TPDS.2004.8

Hazard pointers: safe memory reclamation for lock-free objects

Maged M. Michael

- 01 Jun 2004

- IEEE Transactions on Parallel and Distri...

TL;DR: Hazard pointers is presented, a memory management methodology that allows memory reclamation for arbitrary reuse and offers a lock-free solution for the ABA problem using only practical single-word instructions and guaranteeing continuous progress and availability, even in the presence of thread failures and arbitrary delays.

633