Journal Article, DOI: 10.1145/2780584
Remote Memory Access Programming in MPI-3
Torsten Hoefler,James Dinan,Rajeev Thakur,Brian W. Barrett,Pavan Balaji,William Gropp,Keith D. Underwood +6 more
- 29 Jun 2015
- Vol. 2, Iss: 2, pp 9
TL;DR: The new RMA interface is presented and formal axiomatic models for data consistency and access semantics are specified to help users reason about details of the semantics that are hard to extract from the English prose in the standard.
Abstract: The Message Passing Interface (MPI) 3.0 standard, introduced in September 2012, includes a significant update to the one-sided communication interface, also known as remote memory access (RMA). In particular, the interface has been extended to better support popular one-sided and global-address-space parallel programming models, to provide better access to hardware performance features, and to enable new data-access modes. We present the new RMA interface and specify formal axiomatic models for data consistency and access semantics. Such models can help users reason about details of the semantics that are hard to extract from the English prose in the standard. They also foster the development of tools and compilers, enabling them to automatically analyze, optimize, and debug RMA programs.
Citations
FaRM: fast remote memory
Aleksandar Dragojevic,Dushyanth Narayanan,Orion Hodson,Miguel Castro +3 more
- 02 Apr 2014
TL;DR: The design and implementation of FaRM is described, a new main memory distributed computing platform that exploits RDMA to improve both latency and throughput by an order of magnitude relative to state-of-the-art main memory systems that use TCP/IP.
•Posted Content
Demystifying Graph Databases: Analysis and Taxonomy of Data Organization, System Designs, and Graph Queries.
Maciej Besta,Emanuel Peter,Robert Gerstenberger,Marc Fischer,Michal Podstawski,Claude Barthels,Gustavo Alonso,Torsten Hoefler +7 more
TL;DR: This work presents the first survey and taxonomy of graph database systems, identifying and analyzing fundamental categories of these systems, and outlines graph database queries and relationships with associated domains (NoSQL stores, graph streaming, and dynamic graph algorithms).
Red-blue pebbling revisited: near optimal parallel matrix-matrix multiplication
Grzegorz Kwasniewski,Marko Kabić,Maciej Besta,Joost VandeVondele,Raffaele Solcà,Torsten Hoefler +5 more
- 17 Nov 2019
TL;DR: COSMA is a parallel matrix-matrix multiplication algorithm that is near communication-optimal for all combinations of matrix dimensions, processor counts, and memory sizes. The authors use the red-blue pebble game to model MMM dependencies and derive constructive, tight sequential and parallel I/O lower bound proofs.
•Posted Content
Red-blue pebbling revisited: near optimal parallel matrix-matrix multiplication
Grzegorz Kwasniewski,Marko Kabić,Maciej Besta,Joost VandeVondele,Raffaele Solcà,Torsten Hoefler +5 more
TL;DR: COSMA is a parallel matrix-matrix multiplication algorithm that is near communication-optimal for all combinations of matrix dimensions, processor counts, and memory sizes, and outperforms the established ScaLAPACK, CARMA, and CTF algorithms in all scenarios.
•Posted Content
Fault Tolerance for Remote Memory Access Programming Models
Maciej Besta,Torsten Hoefler +1 more
TL;DR: A model for reasoning about fault tolerance for RMA, addressing both flat and hierarchical hardware, is proposed, along with several highly scalable mechanisms that provide efficient low-overhead in-memory checkpointing, transparent logging of remote memory accesses, and a scheme for transparent recovery of failed processes.
References
X10: an object-oriented approach to non-uniform cluster computing
Philippe Charles,Christian Grothoff,Vijay Saraswat,Christopher Michael Donawa,Allan H. Kielstra,Kemal Ebcioglu,Christoph von Praun,Vivek Sarkar +7 more
- 12 Oct 2005
TL;DR: A modern object-oriented programming language, X10, is designed for high-performance, high-productivity programming of NUCC systems, and an overview of the X10 programming model and language, experience with the reference implementation, and results from some initial productivity comparisons between the X10 and Java™ languages are presented.
Algorithms for scalable synchronization on shared-memory multiprocessors
TL;DR: The principal conclusion is that contention due to synchronization need not be a problem in large-scale shared-memory multiprocessors, and the existence of scalable algorithms greatly weakens the case for costly special-purpose hardware support for synchronization, and provides protection against so-called “dance hall” architectures.
Shared memory consistency models: a tutorial
TL;DR: Most relaxed consistency models emphasize the system optimizations they support; this work describes an alternative, programmer-centric view that characterizes them in terms of program behavior rather than system optimizations.
Parallel Programmability and the Chapel Language
Bradford L. Chamberlain,David Callahan,Hans P. Zima +2 more
- 01 Aug 2007
TL;DR: A candidate list of desirable qualities for a parallel programming language is offered, and how these qualities are addressed in the design of the Chapel language is described, providing an overview of Chapel's features and how they help address parallel productivity.
Co-array Fortran for parallel programming
Robert W. Numrich,John Reid +1 more
TL;DR: The Co-Array Fortran extension is introduced, examples illustrate how clear, powerful, and flexible it can be, and a technical definition is provided.