Quantifying the potential task-based dataflow parallelism in MPI applications

doi:10.1007/978-3-642-23400-2_5

Book Chapter•10.1007/978-3-642-23400-2_5

Vladimir Subotic, +4 more

- 29 Aug 2011

- pp 39-51

16

TL;DR: This paper introduces a framework that a programmer can use to: 1) estimate how much an application could benefit from dataflow parallelism; and 2) find the best strategy to expose dataflow parallelism in that application.

Abstract: Task-based parallel programming languages require the programmer to partition the traditional sequential code into smaller tasks in order to take advantage of the existing dataflow parallelism inherent in the applications. However, obtaining the partitioning that achieves optimal parallelism is not trivial because it depends on many parameters such as the underlying data dependencies and global problem partitioning. In order to help the process of finding a partitioning that achieves high parallelism, this paper introduces a framework that a programmer can use to: 1) estimate how much his application could benefit from dataflow parallelism; and 2) find the best strategy to expose dataflow parallelism in his application. Our framework automatically detects data dependencies among tasks in order to estimate the potential parallelism in the application. Furthermore, based on the framework, we develop an interactive approach to find the optimal partitioning of code. To illustrate this approach, we present a case study of porting High Performance Linpack from MPI to MPI/SMPSs. The presented approach requires only superficial knowledge of the studied code and iteratively leads to the optimal partitioning strategy. Finally, the environment provides visualization of the simulated MPI/SMPSs execution, thus allowing the developer to qualitatively inspect potential parallelization bottlenecks.
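
The idea of estimating potential parallelism from data dependencies among tasks can be illustrated with a small sketch. This is not the paper's framework: the `Task` class, the `find_dependencies` and `potential_speedup` functions, and the example costs are all hypothetical names and values chosen for illustration. The sketch assumes each task declares the data it reads and writes, treats any read/write overlap with an earlier task as a dependency, and reports total work divided by critical-path length as an upper bound on speedup with unlimited cores.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical task record: a name, a cost, and the data it touches."""
    name: str
    cost: float                      # assumed execution time of the task
    reads: frozenset = frozenset()   # names of data items the task reads
    writes: frozenset = frozenset()  # names of data items the task writes


def find_dependencies(tasks):
    """Map each task index to the earlier task indices it must wait for.

    Tasks are assumed to be listed in sequential program order. Any
    read-after-write, write-after-read, or write-after-write overlap with
    an earlier task counts as a dependency (a conservative assumption;
    a runtime that renames data could drop the last two kinds).
    """
    deps = {j: set() for j in range(len(tasks))}
    for j, later in enumerate(tasks):
        for i in range(j):
            earlier = tasks[i]
            raw = earlier.writes & later.reads
            war = earlier.reads & later.writes
            waw = earlier.writes & later.writes
            if raw or war or waw:
                deps[j].add(i)
    return deps


def potential_speedup(tasks):
    """Return total work divided by critical-path length."""
    deps = find_dependencies(tasks)
    finish = []  # earliest finish time of each task with unlimited cores
    for j, task in enumerate(tasks):
        start = max((finish[i] for i in deps[j]), default=0.0)
        finish.append(start + task.cost)
    work = sum(t.cost for t in tasks)
    critical_path = max(finish, default=0.0)
    return work / critical_path if critical_path else 1.0


# Illustrative partitioning into four tasks (made-up costs and data names).
tasks = [
    Task("A", 2.0, writes=frozenset({"x"})),
    Task("B", 2.0, writes=frozenset({"y"})),
    Task("C", 1.0, reads=frozenset({"x", "y"}), writes=frozenset({"z"})),
    Task("D", 2.0, reads=frozenset({"x"}), writes=frozenset({"w"})),
]
print(find_dependencies(tasks))
print(potential_speedup(tasks))
```

Re-running such an estimate with a different partitioning (for example, splitting one task into several smaller ones) gives a rough way to compare candidate decompositions before porting any code.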

Citations

•Proceedings Article•10.1109/CLUSTER.2017.50

Automating the Application Data Placement in Hybrid Memory Systems

Harald Servat, +5 more

- 01 Sep 2017

TL;DR: The evaluation shows that the proposal identifies the key objects to promote into fast on-package memory in order to optimize performance, and can even surpass hardware-based solutions.

48

Book Chapter•10.1007/978-3-662-48096-0_5

Low-Overhead Detection of Memory Access Patterns and Their Time Evolution

Harald Servat, +8 more

- 24 Aug 2015

TL;DR: This work presents a performance analysis tool that reports the temporal evolution of the memory access patterns of in-production applications, helping analysts understand the accesses to the application data structures and providing detailed insight into their memory access behavior.

10

Book Chapter•10.1007/978-3-319-07518-1_10

Automatic Exploration of Potential Parallelism in Sequential Applications

Vladimir Subotic, +3 more

- 22 Jun 2014

TL;DR: This work designs an environment that, given a sequential code and a configuration of the target parallel architecture, iteratively runs Tareador to find an efficient parallelization strategy. It also proposes an autonomous algorithm, based on simple metrics and a cost function, that gives the programmer sufficient information to turn that parallelization strategy into an actual parallel program.

8

Book Chapter•10.1007/978-3-319-16012-2_4

Tareador: The Unbearable Lightness of Exploring Parallelism

Vladimir Subotic, +5 more

- 01 Jan 2015

TL;DR: This work proposes Tareador, a tool that helps a programmer explore various parallelization strategies and find the one that exposes the highest potential parallelism, and outlines how it could be used together with the parallel programming model and the parallelization workflow to facilitate parallelizing applications.

5
