MPICH


Papers

Journal Article • DOI: 10.1016/0167-8191(96)00024-5

A high-performance, portable implementation of the MPI message passing interface standard


William Gropp¹, Ewing Lusk¹, Nathan E. Doss², Anthony Skjellum²

Argonne National Laboratory¹, Mississippi State University²

1 Sep 1996

TL;DR: MPI (Message Passing Interface) is a standard library for message passing defined by the MPI Forum, a broadly based group of parallel computer vendors, library writers, and applications specialists.


Abstract: MPI (Message Passing Interface) is a specification for a standard library for message passing that was defined by the MPI Forum, a broadly based group of parallel computer vendors, library writers, and applications specialists. Multiple implementations of MPI have been developed. In this paper, we describe MPICH, unique among existing implementations in its design goal of combining portability with high performance. We document its portability and performance and describe the architecture by which these features are simultaneously achieved. We also discuss the set of tools that accompany the free distribution of MPICH, which constitute the beginnings of a portable parallel programming environment. A project of this scope inevitably imparts lessons about parallel computing, the specification being followed, the current hardware and software environment for parallel computing, and project management; we describe those we have learned. Finally, we discuss future developments for MPICH, including those necessary to accommodate extensions to the MPI Standard now being contemplated by the MPI Forum.


2,420 citations

Journal Article • DOI: 10.1177/1094342005051521

Optimization of Collective Communication Operations in MPICH


Rajeev Thakur¹, Rolf Rabenseifner², William Gropp¹

Argonne National Laboratory¹, University of Stuttgart²

1 Feb 2005

TL;DR: The work on improving the performance of collective communication operations in MPICH is described, with results indicating that to achieve the best performance for a collective communication operation, one needs to use a number of different algorithms and select the right algorithm for a particular message size and number of processes.


Abstract: We describe our work on improving the performance of collective communication operations in MPICH for clusters connected by switched networks. For each collective operation, we use multiple algorithms depending on the message size, with the goal of minimizing latency for short messages and minimizing bandwidth use for long messages. Although we have implemented new algorithms for all MPI collective operations, because of limited space we describe only the algorithms for allgather, broadcast, all-to-all, reduce-scatter, reduce, and allreduce. Performance results on a Myrinet-connected Linux cluster and an IBM SP indicate that, in all cases, the new algorithms significantly outperform the old algorithms used in MPICH on the Myrinet cluster, and, in many cases, they outperform the algorithms used in IBM's MPI on the SP. We also explore in further detail the optimization of two of the most commonly used collective operations, allreduce and reduce, particularly for long messages and non-power-of-two numbers of processes. The optimized algorithms for these operations perform several times better than the native algorithms on a Myrinet cluster, IBM SP, and Cray T3E. Our results indicate that to achieve the best performance for a collective communication operation, one needs to use a number of different algorithms and select the right algorithm for a particular message size and number of processes.


1,074 citations
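To make the message-size trade-off concrete, here is a minimal C sketch that compares two broadcast algorithms under the common latency-bandwidth (alpha-beta) cost model. The model assumes alpha is the time per message and beta is the time per byte. A binomial tree costs ⌈lg p⌉(α + nβ), while van de Geijn's scatter followed by a ring allgather costs (⌈lg p⌉ + p − 1)α + 2((p − 1)/p)nβ. The `cost_model` struct, the function names, and the alpha and beta values used below are hypothetical. This is not MPICH's actual selection logic. The paper only states that the implementation picks an algorithm based on message size and number of processes.

```c
#include <assert.h>

/* Hypothetical latency-bandwidth (alpha-beta) cost model:
   alpha = time per message (s), beta = time per byte (s/byte). */
typedef struct {
    double alpha;
    double beta;
} cost_model;

typedef enum { BCAST_BINOMIAL, BCAST_SCATTER_ALLGATHER } bcast_alg;

/* ceil(log2(p)) computed with integers, so no libm is needed. */
static int ceil_lg(int p) {
    assert(p >= 1);
    int rounds = 0;
    for (int reach = 1; reach < p; reach *= 2)
        rounds++;
    return rounds;
}

/* Binomial tree: ceil(lg p) rounds, each forwarding all n bytes. */
double bcast_binomial_cost(cost_model m, int p, double n) {
    return ceil_lg(p) * (m.alpha + n * m.beta);
}

/* Scatter (binomial) then ring allgather: more messages, but the
   bandwidth term is only about 2n(p-1)/p instead of n*lg(p). */
double bcast_scatter_allgather_cost(cost_model m, int p, double n) {
    double frac = (double)(p - 1) / p;
    return (ceil_lg(p) + p - 1) * m.alpha + 2.0 * frac * n * m.beta;
}

/* Pick whichever algorithm the model predicts is faster. */
bcast_alg choose_bcast(cost_model m, int p, double n) {
    return bcast_binomial_cost(m, p, n) <= bcast_scatter_allgather_cost(m, p, n)
               ? BCAST_BINOMIAL
               : BCAST_SCATTER_ALLGATHER;
}
```

With the illustrative values α = 10 µs, β = 1 ns/byte, and 16 processes, the model favors the binomial tree for a 1 KB message and scatter-allgather for a 1 MB message. This is consistent with the paper's finding that the best algorithm depends on message size.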

Posted Content

MPICH-G2: A Grid-Enabled Implementation of the Message Passing Interface


Nicholas T. Karonis¹, Brian Toonen², Ian Foster³

Northern Illinois University¹, Argonne National Laboratory², University of Chicago³

25 Jun 2002 · arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: MPICH-G2 is a Grid-enabled implementation of the Message Passing Interface (MPI) that allows a user to run MPI programs across multiple computers, at the same or different sites, using the same commands that would be used on a parallel computer.


Abstract: Application development for distributed computing "Grids" can benefit from tools that variously hide or enable application-level management of critical aspects of the heterogeneous environment. As part of an investigation of these issues, we have developed MPICH-G2, a Grid-enabled implementation of the Message Passing Interface (MPI) that allows a user to run MPI programs across multiple computers, at the same or different sites, using the same commands that would be used on a parallel computer. This library extends the Argonne MPICH implementation of MPI to use services provided by the Globus Toolkit for authentication, authorization, resource allocation, executable staging, and I/O, as well as for process creation, monitoring, and control. Various performance-critical operations, including startup and collective operations, are configured to exploit network topology information. The library also exploits MPI constructs for performance management; for example, the MPI communicator construct is used for application-level discovery of, and adaptation to, both network topology and network quality-of-service mechanisms. We describe the MPICH-G2 design and implementation, present performance results, and review application experiences, including record-setting distributed simulations.


638 citations
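The abstract says MPICH-G2 configures collective operations to exploit network topology information. The minimal C sketch below shows why that pays off. It assumes a hypothetical two-level layout in which each rank is tagged with a site number. A flat broadcast sends one wide-area message per remote rank. A topology-aware broadcast sends one wide-area message per remote site and lets a leader at that site fan out over the local network. The `site_of` array and both function names are hypothetical, and the sketch is not MPICH-G2's actual multilevel scheme.

```c
#include <assert.h>

/* site_of[r] is the (hypothetical) site number of rank r. */

/* Flat broadcast: root sends directly to every rank, so every rank at a
   different site costs one wide-area message. */
int wan_msgs_flat(int p, const int *site_of, int root) {
    assert(p >= 1 && root >= 0 && root < p);
    int count = 0;
    for (int r = 0; r < p; r++)
        if (site_of[r] != site_of[root])
            count++;
    return count;
}

/* Two-level broadcast: root sends one wide-area message to a leader at each
   remote site (here, its lowest rank); leaders then forward locally. */
int wan_msgs_two_level(int p, const int *site_of, int root) {
    assert(p >= 1 && root >= 0 && root < p);
    int count = 0;
    for (int r = 0; r < p; r++) {
        if (site_of[r] == site_of[root])
            continue;
        int first_at_site = 1; /* is r the lowest rank at its site? */
        for (int q = 0; q < r; q++) {
            if (site_of[q] == site_of[r]) {
                first_at_site = 0;
                break;
            }
        }
        count += first_at_site; /* only site leaders receive over the WAN */
    }
    return count;
}
```

Consider six ranks on sites {0, 0, 0, 1, 1, 2} with root 0. The flat scheme needs three wide-area messages and the two-level scheme needs two. The gap grows with the number of ranks per site.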

Proceedings Article • DOI: 10.5555/762761.762815

MPICH-V: Toward a Scalable Fault Tolerant MPI for Volatile Nodes


George Bosilca¹, Aurelien Bouteiller¹, Franck Cappello¹, Samir Djilali¹, Gilles Fedak¹, Cécile Germain¹, Thomas Herault¹, Pierre Lemarinier¹, Oleg Lodygensky¹, Frédéric Magniette¹, Vincent Neri¹, Anton Selikhov¹

University of Paris¹

16 Nov 2002

TL;DR: This work presents MPICH-V, an automatic volatility-tolerant MPI environment based on uncoordinated checkpoint/rollback and distributed message logging, along with a detailed performance evaluation of each component and of the whole system on non-trivial parallel applications.


Abstract: Global Computing platforms, large-scale clusters, and future TeraGRID systems gather thousands of nodes for computing parallel scientific applications. At this scale, node failures or disconnections are frequent events. This volatility reduces the MTBF of the whole system to the range of hours or minutes. We present MPICH-V, an automatic volatility-tolerant MPI environment based on uncoordinated checkpoint/rollback and distributed message logging. The MPICH-V architecture relies on Channel Memories, Checkpoint Servers, and theoretically proven protocols to execute existing or new SPMD and Master-Worker MPI applications on volatile nodes. To evaluate its capabilities, we run MPICH-V within a framework in which the number of nodes, Channel Memories, and Checkpoint Servers, as well as node volatility, can be fully configured. We present a detailed performance evaluation of every component of MPICH-V and of its global performance for non-trivial parallel applications. Experimental results demonstrate good scalability and high tolerance to node volatility.


338 citations

Journal Article • DOI: 10.1002/CPE.1206

Collective communication: theory, practice, and experience


Ernie Chan¹, Marcel Heimlich¹, Avi Purkayastha¹, Robert A. van de Geijn¹

University of Texas at Austin¹

10 Sep 2007 · Concurrency and Computation: Practice and Experience

TL;DR: This paper discusses the design and high‐performance implementation of collective communications operations on distributed‐memory computer architectures and develops implementations that have improved performance in most situations compared to those currently supported by public domain implementations of MPI.


Abstract: We discuss the design and high-performance implementation of collective communications operations on distributed-memory computer architectures. Using a combination of known techniques (many of which were first proposed in the 1980s and early 1990s) along with careful exploitation of communication modes supported by MPI, we have developed implementations that have improved performance in most situations compared to those currently supported by public domain implementations of MPI such as MPICH. Performance results from a large Intel Xeon/Pentium 4 processor cluster are included.


305 citations
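One classical technique in this area is the ring, or "bucket", allgather. In this scheme, p processes pass blocks around a logical ring for p − 1 steps, so that every process ends up holding every block. The following minimal C sketch simulates that data movement in a single address space. It illustrates the technique and is not the authors' implementation. The function name and the p × p buffer layout are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simulates a ring ("bucket") allgather on p logical processes.
   buf is a p*p row-major array: buf[r*p + b] is process r's copy of block b.
   On entry only the diagonal entries buf[r*p + r] need to be valid.
   Returns the number of communication steps taken (p - 1). */
int ring_allgather_sim(int p, int *buf) {
    assert(p >= 1 && buf != NULL);
    for (int step = 0; step < p - 1; step++) {
        for (int r = 0; r < p; r++) {
            int right = (r + 1) % p;
            /* In step s, process r forwards block (r - s) mod p, which it
               owned initially (s = 0) or received in step s - 1. */
            int block = ((r - step) % p + p) % p;
            /* The receiver writes a slot the sender does not read this step,
               so processing ranks sequentially is safe. */
            buf[right * p + block] = buf[r * p + block];
        }
    }
    return p - 1;
}
```

Each step moves one block of n/p bytes per process. Under the alpha-beta model, the ring therefore costs (p − 1)α + ((p − 1)/p)nβ. It is bandwidth-optimal but linear in latency, which is why implementations generally prefer other algorithms, such as recursive doubling, for short messages.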


Year	Papers
2021	11
2020	7
2019	6
2018	9
2017	3
2016	10


Related Topics (5)

Performance Metrics