Book Chapter, DOI: 10.1007/978-3-642-23400-2_5
Quantifying the potential task-based dataflow parallelism in MPI applications
Vladimir Subotic,Roger Ferrer,Jose Carlos Sancho,Jesús Labarta,Mateo Valero +4 more
- 29 Aug 2011
- pp 39-51
TL;DR: This paper introduces a framework that a programmer can use to: 1) estimate how much his application could benefit from dataflow parallelism; and 2) find the best strategy to expose dataflow parallelism in his application.
Abstract: Task-based parallel programming languages require the programmer to partition the traditional sequential code into smaller tasks in order to take advantage of the existing dataflow parallelism inherent in the applications. However, obtaining the partitioning that achieves optimal parallelism is not trivial because it depends on many parameters such as the underlying data dependencies and global problem partitioning. In order to help the process of finding a partitioning that achieves high parallelism, this paper introduces a framework that a programmer can use to: 1) estimate how much his application could benefit from dataflow parallelism; and 2) find the best strategy to expose dataflow parallelism in his application. Our framework automatically detects data dependencies among tasks in order to estimate the potential parallelism in the application. Furthermore, based on the framework, we develop an interactive approach to find the optimal partitioning of code. To illustrate this approach, we present a case study of porting High Performance Linpack from MPI to MPI/SMPSs. The presented approach requires only superficial knowledge of the studied code and iteratively leads to the optimal partitioning strategy. Finally, the environment provides visualization of the simulated MPI/SMPSs execution, thus allowing the developer to qualitatively inspect potential parallelization bottlenecks.
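The idea of detecting data dependencies among tasks and deriving a parallelism estimate from them can be illustrated with a small sketch. This is not the paper's framework: the `Task` record, the function names, and the toy task list are all hypothetical, and the model assumes unlimited cores and zero scheduling overhead. Each task declares the data it reads and writes; a later task depends on an earlier one when both touch the same data and at least one of them writes it. The estimate is the ratio of total work to critical-path length.

```python
from collections import namedtuple

# Hypothetical task record: name, cost in arbitrary time units,
# set of data names read, set of data names written.
Task = namedtuple("Task", ["name", "cost", "reads", "writes"])


def detect_dependencies(tasks):
    """Map each task index to the set of earlier task indices it must wait for.

    Tasks are given in sequential program order. A dependency exists when two
    tasks share data and at least one writes it (read-after-write,
    write-after-read, or write-after-write).
    """
    deps = {j: set() for j in range(len(tasks))}
    for j, later in enumerate(tasks):
        for i in range(j):
            earlier = tasks[i]
            raw = earlier.writes & later.reads
            war = earlier.reads & later.writes
            waw = earlier.writes & later.writes
            if raw or war or waw:
                deps[j].add(i)
    return deps


def potential_parallelism(tasks, deps):
    """Return (total work, critical-path length, work / critical path)."""
    finish = []  # earliest finish time of each task, in program order
    for j, task in enumerate(tasks):
        start = max((finish[i] for i in deps[j]), default=0)
        finish.append(start + task.cost)
    work = sum(t.cost for t in tasks)
    critical_path = max(finish, default=0)
    ratio = work / critical_path if critical_path else 0.0
    return work, critical_path, ratio


# Toy partitioning (illustrative only): two independent chains, then a join.
tasks = [
    Task("init_a", 2, set(), {"a"}),
    Task("init_b", 2, set(), {"b"}),
    Task("update_a", 3, {"a"}, {"a"}),
    Task("update_b", 3, {"b"}, {"b"}),
    Task("combine", 1, {"a", "b"}, {"c"}),
]
deps = detect_dependencies(tasks)
work, critical_path, ratio = potential_parallelism(tasks, deps)
print(work, critical_path, round(ratio, 2))
```

In this toy case the two chains can overlap, so 11 units of work fit in a critical path of 6 units. Repartitioning the code changes the task list and therefore the ratio, which is the kind of comparison the iterative approach described in the abstract relies on.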
Citations
Automating the Application Data Placement in Hybrid Memory Systems
Harald Servat,Antonio J. Peña,Germán Llort,Estanislao Mercadal,Hans-Christian Hoppe,Jesús Labarta +5 more
- 01 Sep 2017
TL;DR: The results of the evaluation reveal that the proposal is able to identify the key objects to be promoted into fast on-package memory in order to optimize performance, even surpassing hardware-based solutions.
Programmability and portability for exascale: Top down programming methodology and tools with StarSs
Vladimir Subotic,Steffen Brinkmann,Vladimir Marjanovic,Rosa M. Badia,José Gracia,Christoph Niethammer,Eduard Ayguadé,Jesús Labarta,Mateo Valero +13 more
TL;DR: This paper focuses on the methodology and tools that complement the programming model, forming a consistent development environment with the objective of simplifying the life of application developers.
16
Low-Overhead Detection of Memory Access Patterns and Their Time Evolution
Harald Servat,Germán Llort,Juan Gonzalez,Judit Gimenez,Jesús Labarta +8 more
- 24 Aug 2015
TL;DR: A performance analysis tool is presented that reports the temporal evolution of the memory access patterns of in-production applications, helping analysts understand the accesses to the application data structures and providing detailed insight into their memory access behavior.
Automatic Exploration of Potential Parallelism in Sequential Applications
Vladimir Subotic,Eduard Ayguadé,Jesús Labarta,Mateo Valero +3 more
- 22 Jun 2014
TL;DR: This work designs an environment that, given a sequential code and the configuration of the target parallel architecture, iteratively runs Tareador to find an efficient parallelization strategy, and proposes an autonomous algorithm based on simple metrics and a cost function that provides the programmer with sufficient information to turn that parallelization strategy into an actual parallel program.
8
Tareador: The Unbearable Lightness of Exploring Parallelism
Vladimir Subotic,Arturo Campos,Alejandro Velasco,Eduard Ayguadé,Jesús Labarta,Mateo Valero +5 more
- 01 Jan 2015
TL;DR: This work proposes Tareador, a tool that helps a programmer explore various parallelization strategies and find the one that exposes the highest potential parallelism, and outlines how it could be used together with the parallel programming model and the parallelization workflow to facilitate the parallelization of applications.
5
References
•Book
MPI: The Complete Reference
Marc Snir,Steve W. Otto,David W. Walker,Jack Dongarra,Steven Huss-Lederman +4 more
- 01 Jan 1996
TL;DR: MPI: The Complete Reference is an annotated manual for the latest 1.1 version of the standard that illuminates the more advanced and subtle features of MPI and covers such advanced issues in parallel computing and programming as true portability, deadlock, high-performance message passing, and libraries for distributed and parallel computing.
2.8K
Cilk: An Efficient Multithreaded Runtime System
Robert D. Blumofe,Christopher F. Joerg,Bradley C. Kuszmaul,Charles E. Leiserson,Keith H. Randall,Yuli Zhou +5 more
TL;DR: It is shown that on real and synthetic applications, the “work” and “critical-path length” of a Cilk computation can be used to model performance accurately, and it is proved that for the class of “fully strict” (well-structured) programs, the Cilk scheduler achieves space, time, and communication bounds all within a constant factor of optimal.
1.7K
Limits of instruction-level parallelism
David W. Wall
- 01 Apr 1991
TL;DR: The results of simulations of 18 different test programs under 375 different models of available parallelism analysis are presented, showing how simulations based on instruction traces can model techniques at the limits of feasibility and even beyond.
•Book
Limits of instruction-level parallelism
David W. Wall
- 01 Mar 1995
TL;DR: In this paper, the authors present the results of simulations of 18 different test programs under 375 different models of available parallelism analysis, including branch prediction, register renaming and alias analysis.
592
•Book
Euro-Par 2010 - Parallel Processing
Pasqua D'Ambra,Mario Rosario Guarracino,Domenico Talia +2 more
- 01 Jan 2011
74