Proceedings Article, DOI: 10.1145/2555243.2555252
Efficient deterministic multithreading without global barriers
Kai Lu,Xu Zhou,Tom Bergan,Xiaoping Wang +3 more
- 06 Feb 2014
- Vol. 49, Iss: 8, pp 287-300
TL;DR: This paper implemented a DMT system based on an execution model called deterministic lazy release consistency (DLRC), which guarantees that programs execute deterministically even when they contain data races, and evaluated it using 16 parallel applications.
Abstract: Multithreaded programs execute nondeterministically on conventional architectures and operating systems. This complicates many tasks, including debugging and testing. Deterministic multithreading (DMT) makes the output of a multithreaded program depend only on its inputs, which addresses these problems. However, current DMT implementations suffer from a common inefficiency: they use frequent global barriers to enforce a deterministic ordering on memory accesses. In this paper, we eliminate that inefficiency using an execution model we call deterministic lazy release consistency (DLRC). Our execution model uses the Kendo algorithm to enforce a deterministic ordering on synchronization, and it uses a deterministic version of the lazy release consistency memory model to propagate memory updates across threads. Our approach guarantees that programs execute deterministically even when they contain data races. We implemented a DMT system based on these ideas (RFDet) and evaluated it using 16 parallel applications. Our implementation targets C/C++ programs that use POSIX threads. Results show that RFDet achieves a nearly 2x speedup over DThreads, a state-of-the-art DMT system.
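The abstract's idea of a deterministic ordering on synchronization can be illustrated with a minimal sketch. It assumes a simplified reading of Kendo-style logical clocks: each thread counts its own deterministic work, and the thread with the smallest clock (ties broken by thread ID) takes the next lock. The function name `deterministic_lock_order` and the `work` input format are hypothetical, and this is a single-threaded simulation, not RFDet's or Kendo's actual implementation.

```python
def deterministic_lock_order(work):
    """Simulate the order in which threads acquire a lock.

    `work` maps a thread id to a list of work units (a stand-in for
    deterministically counted instructions) that the thread performs
    before each of its lock requests. This format is an assumption
    made for illustration only.
    """
    clocks = {tid: 0 for tid in work}
    pending = {tid: list(segments) for tid, segments in work.items()}

    # Advance every thread to its first lock request.
    for tid, segments in pending.items():
        if segments:
            clocks[tid] += segments[0]

    order = []
    while any(pending.values()):
        # The waiting thread with the smallest (clock, id) pair goes next.
        # Wall-clock timing never enters the decision, so the result
        # depends only on the input.
        tid = min((t for t in pending if pending[t]),
                  key=lambda t: (clocks[t], t))
        order.append(tid)
        pending[tid].pop(0)
        clocks[tid] += 1  # acquiring the lock ticks the logical clock
        if pending[tid]:
            clocks[tid] += pending[tid][0]  # run up to the next request
    return order


if __name__ == "__main__":
    print(deterministic_lock_order({0: [5, 5], 1: [3, 10], 2: [5]}))
```

Because the order is a function of the logical clocks alone, repeated runs with the same input yield the same acquisition order, whatever the real scheduling of the threads would have been.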
Citations
Code-pointer integrity
Volodymyr Kuznetsov,Laszlo Szekeres,Mathias Payer,George Candea,R. C. Sekar,Dawn Song +5 more
- 06 Oct 2014
TL;DR: This chapter describes code-pointer integrity (CPI), a new design point that guarantees the integrity of all code pointers in a program and thereby prevents all control-flow hijack attacks that exploit memory corruption errors, including attacks that bypass control-flow integrity mechanisms, such as control-flow bending.
SoK: The Challenges, Pitfalls, and Perils of Using Hardware Performance Counters for Security
Sanjeev Das,Jan Werner,Manos Antonakakis,Michalis Polychronakis,Fabian Monrose +4 more
- 19 May 2019
TL;DR: A year-long effort to study the best practices for obtaining accurate measurement of events using hardware performance counters (HPCs), understand the challenges and pitfalls of using HPCs in various settings, explore ways to obtain consistent and accurate measurements across different settings and architectures, and empirically evaluate how failure to account for various subtleties in the use of HPCs can undermine the effectiveness of security applications.
LASER: Light, Accurate Sharing dEtection and Repair
Liang Luo,Akshitha Sriraman,Brooke Fugate,Shiliang Hu,Gilles Pokam,Chris J. Newburn,Joseph Devietti +6 more
- 12 Mar 2016
TL;DR: The Light, Accurate Sharing dEtection and Repair (LASER) system is presented, which leverages new performance counter capabilities available on Intel's Haswell architecture that identify the source of expensive cache coherence events.
Remix: online detection and repair of cache contention for the JVM
Ariel Eizenberg,Shiliang Hu,Gilles Pokam,Joseph Devietti +3 more
- 02 Jun 2016
TL;DR: Remix is a modified version of the Oracle HotSpot JVM which can detect cache contention bugs and repair false sharing at runtime and incurs no statistically-significant performance overhead on other benchmarks that do not exhibit cache contention, making Remix practical for always-on use.
Taming Parallelism in a Multi-Variant Execution Environment
Stijn Volckaert,Bart Coppens,Bjorn De Sutter,Koen De Bosschere,Per Larsen,Michael Franz +5 more
- 23 Apr 2017
TL;DR: An MVEE-specific synchronization scheme is developed that lets us execute a set of multithreaded variants in lockstep without causing benign divergence, which makes MVEEs a viable defense for a far greater range of realistic workloads.
References
The SPLASH-2 programs: characterization and methodological considerations
Steven Cameron Woo,Moriyoshi Ohara,Evan Torrie,Jaswinder Pal Singh,Anoop Gupta +4 more
- 01 May 1995
TL;DR: This paper quantitatively characterizes the SPLASH-2 programs in terms of fundamental properties and architectural interactions that are important to understand them well, including the computational load balance, communication to computation ratio and traffic needs, important working set sizes, and issues related to spatial locality.
The PARSEC benchmark suite: characterization and architectural implications
Christian Bienia,Sanjeev Kumar,Jaswinder Pal Singh,Kai Li +3 more
- 25 Oct 2008
TL;DR: This paper presents and characterizes the Princeton Application Repository for Shared-Memory Computers (PARSEC), a benchmark suite for studies of Chip-Multiprocessors (CMPs), and shows that the benchmark suite covers a wide spectrum of working sets, locality, data sharing, synchronization and off-chip traffic.
Evaluating MapReduce for Multi-core and Multiprocessor Systems
C. Ranger,R. Raghuraman,A. Penmetsa,Gary Bradski,Christos Kozyrakis +4 more
- 10 Feb 2007
TL;DR: It is established that, given a careful implementation, MapReduce is a promising model for scalable performance on shared-memory systems with simple parallel code.
The problem with threads
TL;DR: For concurrent programming to become mainstream, threads must be discarded as a programming model, and nondeterminism should be judiciously and carefully introduced where needed, and it should be explicit in programs.
Hoard: a scalable memory allocator for multithreaded applications
Emery D. Berger,Kathryn S. McKinley,Robert D. Blumofe,Paul R. Wilson +3 more
- 12 Nov 2000
TL;DR: Hoard combines one global heap and per-processor heaps with a novel discipline that provably bounds memory consumption and has very low synchronization costs in the common case; it is the first allocator to simultaneously solve the above problems.
Related Papers (5)
Tongping Liu,Charlie Curtsinger,Emery D. Berger +2 more
- 23 Oct 2011
Marek Olszewski,Jason Ansel,Saman Amarasinghe +2 more
- 07 Mar 2009
Joseph Devietti,Brandon Lucia,Luis Ceze,Mark Oskin +3 more
- 07 Mar 2009