About: Advanced Synchronization Facility (ASF) is a research topic. Over its lifetime, 7 publications have been published within this topic, receiving 442 citations.
TL;DR: Measurements on a wide range of benchmarks indicate that the overheads traditionally associated with software transactional memories can be significantly reduced with the help of ASF.
Abstract: AMD's Advanced Synchronization Facility (ASF) is an x86 instruction set extension proposal intended to simplify and speed up the synchronization of concurrent programs. In this paper, we report our experiences using ASF for implementing transactional memory. We have extended a C/C++ compiler to support language-level transactions and generate code that takes advantage of ASF. We use a software fallback mechanism for transactions that cannot be committed within ASF (e.g., because of hardware capacity limitations). Our evaluation uses a cycle-accurate x86 simulator that we have extended with ASF support. Building a complete ASF-based software stack allows us to evaluate the performance gains that a user-level program can obtain from ASF. Our measurements on a wide range of benchmarks indicate that the overheads traditionally associated with software transactional memories can be significantly reduced with the help of ASF.
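The software fallback described above can be sketched as follows. This is a minimal single-process emulation in C++17, not ASF code. ASF is a proposal, so no intrinsics for it are assumed here, and the class `Tx`, the function `atomic_run` and the capacity of four locations are all hypothetical. A "hardware" attempt buffers its writes and aborts once it touches more locations than the assumed capacity. The software path then re-executes the body without a limit. A single global mutex provides atomicity for both paths, so the sketch models only the control flow of the fallback, not conflict detection or concurrency.

```cpp
#include <cstddef>
#include <functional>
#include <limits>
#include <mutex>
#include <unordered_map>

// Hypothetical emulation of a best-effort transaction (not the ASF interface).
// Writes are buffered and published only on commit; a "hardware" attempt
// aborts once it would track more than `limit` distinct locations, mimicking a
// hardware capacity abort.
class Tx {
public:
    explicit Tx(std::size_t limit) : limit_(limit) {}

    // Buffer a write; returns false to signal an abort on capacity overflow.
    bool write(long* addr, long value) {
        if (buf_.count(addr) == 0 && buf_.size() >= limit_) return false;
        buf_[addr] = value;
        return true;
    }

    // Read-your-own-writes: a buffered value takes precedence over memory.
    long read(long* addr) const {
        auto it = buf_.find(addr);
        return it != buf_.end() ? it->second : *addr;
    }

    // Publish all buffered writes to memory.
    void commit() {
        for (auto& [addr, value] : buf_) *addr = value;
    }

private:
    std::size_t limit_;
    std::unordered_map<long*, long> buf_;
};

struct Stats {
    int hw_commits = 0;
    int fallbacks = 0;
};

// Run `body` atomically: first as a capacity-limited "hardware" attempt, then,
// if that aborts, on an unlimited software path. The body returns false when a
// write reports an abort. One global mutex supplies atomicity for both paths,
// so conflicts and true concurrency are not modeled.
void atomic_run(const std::function<bool(Tx&)>& body, Stats& stats,
                std::size_t hw_capacity = 4) {  // assumed capacity, not an ASF figure
    static std::mutex global;
    std::lock_guard<std::mutex> guard(global);

    Tx hw(hw_capacity);
    if (body(hw)) {  // the speculative attempt fit within capacity
        hw.commit();
        ++stats.hw_commits;
        return;
    }
    // Capacity abort: the writes buffered in `hw` are discarded unpublished.
    Tx sw(std::numeric_limits<std::size_t>::max());  // software fallback path
    body(sw);
    sw.commit();
    ++stats.fallbacks;
}
```

Capacity aborts are not retried in this sketch because, unlike conflict aborts, they would typically fail again; a runtime could still retry conflict aborts a few times before falling back.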
TL;DR: An out-of-order hardware design that implements ASF on a future AMD processor is developed. The experimental results show that combining the L1 cache with the load/store (LS) unit greatly improves the performance robustness of ASF-based lock-free data structures, and that selectively using speculative accesses lets transactional programs scale with limited ASF hardware resources.
Abstract: Advanced Synchronization Facility (ASF) is an AMD64 hardware extension for lock-free data structures and transactional memory. It provides a speculative region that atomically executes the speculative accesses within it. Five new instructions are added to demarcate the region, use speculative accesses selectively, and control the speculative hardware context. Programmers can use speculative regions to build flexible multi-word atomic primitives without additional software support, relying on the minimum guarantee of available ASF hardware resources for lock-free programming. Transactional programs with high-level TM language constructs can either be compiled directly to ASF code or be linked to software TM systems that use ASF to accelerate transactional execution. In this paper we develop an out-of-order hardware design to implement ASF on a future AMD processor and evaluate it with an in-house simulator. The experimental results show that the combined use of the L1 cache and the load/store (LS) unit is very helpful for the performance robustness of ASF-based lock-free data structures, and that the selective use of speculative accesses enables transactional programs to scale with limited ASF hardware resources.
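As an illustration of a multi-word atomic primitive, the sketch below gives the reference semantics of a two-word compare-and-swap (DCAS), the kind of operation a speculative region covering two memory words could provide. Since ASF is a proposal, a global mutex (`emu::region`, hypothetical) emulates the region's atomicity, and the names `dcas` and `load` are illustrative rather than ASF instructions. The emulation is only correct if every access to the protected words goes through these functions.

```cpp
#include <mutex>

// Reference semantics of a two-word compare-and-swap (DCAS). A global mutex
// stands in for a speculative region here (assumption); with ASF the same
// effect would come from speculative accesses committed atomically.
namespace emu {
std::mutex region;  // hypothetical stand-in for a speculative region

// Atomically: if *a == expect_a and *b == expect_b, store both new values.
bool dcas(long* a, long expect_a, long new_a,
          long* b, long expect_b, long new_b) {
    std::lock_guard<std::mutex> guard(region);
    if (*a != expect_a || *b != expect_b) return false;  // no change on mismatch
    *a = new_a;
    *b = new_b;
    return true;
}

// Reads of the protected words must also go through the emulated region.
long load(const long* p) {
    std::lock_guard<std::mutex> guard(region);
    return *p;
}
}  // namespace emu
```

With ASF, the same semantics would be expressed as speculative loads and stores inside one region, which the hardware either commits atomically or aborts, so no lock would be involved.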
TL;DR: Several new hybrid TM algorithms are presented that can execute HTM and STM transactions concurrently and thus provide good performance over a large spectrum of workloads; they are evaluated on AMD's Advanced Synchronization Facility.
Abstract: Transactional memory (TM) is a speculative shared-memory synchronization mechanism used to speed up concurrent programs. Most current TM implementations are software-based (STM) and incur noticeable overheads for each transactional memory access. Hardware TM proposals (HTM) address this issue but typically suffer from other restrictions, such as limits on the number of data locations that can be accessed in a transaction. In this paper, we present several new hybrid TM algorithms that can execute HTM and STM transactions concurrently and can thus provide good performance over a large spectrum of workloads. The algorithms exploit the ability of some HTMs to have both speculative and nonspeculative (nontransactional) memory accesses within a transaction to decrease the transactions' runtime overhead, abort rates, and hardware capacity requirements. We evaluate implementations of these algorithms based on AMD's Advanced Synchronization Facility, an x86 instruction set extension proposal that has been shown to provide a sound basis for HTM.
TL;DR: An initial performance simulation and usability study of ASF’s application to a lock-free data structure (a singly linked list) and to accelerating a state-of-the-art STM system, TinySTM, indicate that ASF can significantly increase the throughput and scaling behavior of these workloads.
Abstract: In this paper, we report on a new CPU-architecture extension proposal, named Advanced Synchronization Facility (ASF), which is geared toward accelerating and easing lock-free programming and software transactional memory (STM). We present an initial performance simulation and usability study of ASF’s application to a lock-free data structure (a singly linked list) and to accelerating a state-of-the-art STM system, TinySTM. Our results indicate that ASF can significantly increase the throughput and scaling behavior of these workloads: single-thread performance increased by up to 15 %, and the scaling factor at eight CPUs increased by up to 20 %.
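For context, the conventional baseline that ASF aims to simplify is a lock-free list built from single-word compare-and-swap. The sketch below is a generic insert-only sorted singly linked list using `std::atomic`; it is not the list from the study, and the class name `LockFreeList` is hypothetical. Deletion is omitted because it needs marked pointers and a safe memory-reclamation scheme, the kind of multi-word bookkeeping that atomic multi-word updates could reduce.

```cpp
#include <atomic>

// Generic insert-only lock-free sorted singly linked list using single-word
// CAS (std::atomic). Deletion is deliberately omitted: it would require
// marked pointers and safe memory reclamation.
class LockFreeList {
    struct Node {
        long key;
        std::atomic<Node*> next;
        Node(long k, Node* n) : key(k), next(n) {}
    };
    Node head_{0, nullptr};  // sentinel; its key is never compared

public:
    // Not safe to run concurrently with other operations.
    ~LockFreeList() {
        Node* n = head_.next.load();
        while (n) {
            Node* nx = n->next.load();
            delete n;
            n = nx;
        }
    }

    // Inserts key in sorted position; returns false if it is already present.
    bool insert(long key) {
        Node* node = new Node(key, nullptr);
        for (;;) {
            Node* prev = &head_;
            Node* curr = prev->next.load(std::memory_order_acquire);
            while (curr && curr->key < key) {
                prev = curr;
                curr = curr->next.load(std::memory_order_acquire);
            }
            if (curr && curr->key == key) {
                delete node;
                return false;
            }
            node->next.store(curr, std::memory_order_relaxed);
            // Link in only if prev still points at curr; otherwise rescan.
            if (prev->next.compare_exchange_weak(curr, node,
                                                 std::memory_order_release,
                                                 std::memory_order_relaxed))
                return true;
        }
    }

    bool contains(long key) const {
        Node* curr = head_.next.load(std::memory_order_acquire);
        while (curr && curr->key < key)
            curr = curr->next.load(std::memory_order_acquire);
        return curr && curr->key == key;
    }
};
```

A failed compare-and-swap means another thread changed `prev->next` in the meantime, so the sketch simply rescans from the head; this retry loop is what a multi-word speculative region would let an implementation restructure.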
TL;DR: This paper exploits support for immediate non-transactional stores in the AMD Advanced Synchronization Facility to build a mechanism for communication among transactions, explores which forms of nesting are possible, and identifies the constraints on nesting that follow from how best-effort hardware transactional memory (BEHTM) is designed.
Abstract: The guiding design principle behind best-effort hardware transactional memory (BEHTM) is simplicity of implementation and verification. Only minimal modifications to the base processor architecture are allowed, thereby reducing the burden of verification and long-term support. In exchange, the hardware can support only relatively simple multiword atomic operations, and must fall back to a software run-time for any operation that exceeds the abilities of the hardware. This paper demonstrates that BEHTM simplicity does not prohibit advanced and complex transactional behaviors. We exploit support for immediate non-transactional stores in the AMD Advanced Synchronization Facility to build a mechanism for communication among transactions. While our system allows arbitrary communication patterns, we focus on a design point where each transaction communicates with a system-wide manager thread. The API for the manager thread allows BEHTM transactions to delegate unsafe operations (such as system calls) to helper threads, and also enables the creation of nested parallel transactions. This paper also explores which forms of nesting are possible, and identifies constraints on nesting that are a consequence of how BEHTM is designed.
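The delegation idea can be illustrated with a generic manager thread that executes queued operations and returns their results through futures. This is a hypothetical sketch: `Manager` and `delegate` are invented names, and an ordinary mutex-protected queue stands in for the immediate non-transactional stores that the paper uses to post requests from inside a transaction. How a transaction waits for or consumes the result is not modeled.

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical manager thread that runs operations delegated to it, such as
// system calls a best-effort hardware transaction cannot execute itself.
// A mutex-protected queue stands in for non-transactional request stores.
class Manager {
public:
    Manager() : worker_([this] { run(); }) {}

    // Drains any queued operations, then stops and joins the worker.
    ~Manager() {
        {
            std::lock_guard<std::mutex> guard(m_);
            stop_ = true;
        }
        cv_.notify_one();
        worker_.join();
    }

    // Queue an operation and return a future for its result.
    std::future<long> delegate(std::function<long()> op) {
        std::packaged_task<long()> task(std::move(op));
        std::future<long> result = task.get_future();
        {
            std::lock_guard<std::mutex> guard(m_);
            queue_.push_back(std::move(task));
        }
        cv_.notify_one();
        return result;
    }

private:
    void run() {
        for (;;) {
            std::packaged_task<long()> task;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return stop_ || !queue_.empty(); });
                if (queue_.empty()) return;  // stop requested, nothing left
                task = std::move(queue_.front());
                queue_.pop_front();
            }
            task();  // execute outside the lock
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::deque<std::packaged_task<long()>> queue_;
    bool stop_ = false;
    std::thread worker_;  // declared last so the members above exist when it starts
};
```

Because `worker_` is declared after the queue and the flags, the worker thread only starts once everything it touches has been constructed.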