Journal Article · DOI: 10.1016/0167-8191(95)00052-6
Techniques for compiling programs on distributed memory multicomputers
PeiZong Lee
- 01 Dec 1995
- Vol. 21, Iss: 12, pp 1895-1923
17
TL;DR: This paper presents techniques for compiling programs on distributed memory parallel computers: it derives a dynamic programming algorithm for data distribution, shows how to reduce communication time by pipelining data, and illustrates how to use data-dependence information for pipelining data.
Abstract: It is widely accepted that distributed memory parallel computers will play an important role in solving computation-intensive problems. However, the design of an algorithm in a distributed memory system is time-consuming and error-prone, because a programmer is forced to manage both parallelism and communication. In this paper, we present techniques for compiling programs on distributed memory parallel computers. We will study the storage management of data arrays and the execution schedule arrangement of Do-loop programs on distributed memory parallel computers. First, we introduce formulas for representing data distribution of specific data arrays across processors. Then, we define communication cost for some message-passing communication operations. Next, we derive a dynamic programming algorithm for data distribution. After that, we show how to improve the communication time by pipelining data, and illustrate how to use data-dependence information for pipelining data. Jacobi's iterative algorithm and the Gauss elimination algorithm for linear systems are used to illustrate our method. We also present experimental results on a 32-node nCUBE-2 computer.
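The two central ideas in the abstract, mapping array elements to processors and choosing a data distribution for each Do-loop by dynamic programming, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's actual formulas or algorithm: the function names, the scheme labels `"row"` and `"col"`, and all cost values are hypothetical, and the cost model (a per-loop execution cost plus a fixed redistribution cost between consecutive loops) is a simplification chosen for the example.

```python
from typing import Dict, List, Tuple


def block_owner(i: int, n: int, p: int) -> int:
    """Processor that owns element i of an n-element array split into
    p contiguous blocks (hypothetical helper, not from the paper)."""
    block = -(-n // p)  # ceil(n / p) using integer arithmetic
    return i // block


def cyclic_owner(i: int, p: int) -> int:
    """Processor that owns element i when elements are dealt out
    round-robin over p processors (hypothetical helper)."""
    return i % p


def choose_distributions(
    exec_cost: List[Dict[str, float]],
    redist_cost: Dict[Tuple[str, str], float],
) -> Tuple[float, List[str]]:
    """Pick one distribution scheme per loop to minimise total cost.

    exec_cost[k][s]    : assumed cost of running loop k under scheme s
    redist_cost[(t,s)] : assumed cost of redistributing from t to s
    Returns (total cost, chosen scheme for each loop).
    """
    schemes = list(exec_cost[0])
    # best[s] = cheapest way to finish the loops seen so far, ending in scheme s
    best = {s: (exec_cost[0][s], [s]) for s in schemes}
    for costs in exec_cost[1:]:
        nxt = {}
        for s in schemes:
            candidates = []
            for t in schemes:
                move = 0.0 if t == s else redist_cost[(t, s)]
                candidates.append((best[t][0] + move, best[t][1]))
            total, path = min(candidates, key=lambda c: c[0])
            nxt[s] = (total + costs[s], path + [s])
        best = nxt
    return min(best.values(), key=lambda c: c[0])


if __name__ == "__main__":
    # Made-up costs for three consecutive loops and two candidate schemes.
    loops = [
        {"row": 10.0, "col": 4.0},
        {"row": 3.0, "col": 9.0},
        {"row": 2.0, "col": 8.0},
    ]
    moves = {("row", "col"): 5.0, ("col", "row"): 5.0}
    print(choose_distributions(loops, moves))
```

With these made-up costs, the cheapest plan runs the first loop under `"col"`, pays one redistribution, and runs the remaining two loops under `"row"`. The table-per-loop structure keeps the work proportional to the number of loops times the square of the number of candidate schemes.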
Citations
•Proceedings Article
Mapping nested loop algorithms into multi-dimensional systolic arrays
PeiZong Lee,Zvi M. Kedem +1 more
- 01 Jan 1989
TL;DR: In this paper, the authors consider transforming depth p-nested for loop algorithms into q-dimensional systolic VLSI arrays, where 1 ≤ q ≤ p − 1.
65
A Performance Study for Iterative Stencil Loops on GPUs with Ghost Zone Optimizations
Jiayuan Meng,Kevin Skadron +1 more
TL;DR: A performance model is established using NVIDIA’s Tesla architecture as a case study and a framework is proposed that uses the performance model to automatically select the ghost zone size that performs best and generate appropriate code to automate this process on shared memory systems.
Automatic data and computation decomposition on distributed memory parallel computers
PeiZong Lee,Zvi M. Kedem +1 more
TL;DR: In this paper, the authors propose a method for handling computation and data synergistically to minimize the overall execution time on distributed memory parallel computers (DMPCs), based on a number of novel techniques, also presented in this article.
Efficient algorithms for data distribution on distributed memory parallel computers
TL;DR: The paper prunes the search space and derives efficient dynamic programming algorithms for determining effective data distribution schemes for executing a sequence of Do-loops with a general structure, if the communication cost due to performing this sequence of Do-loops is larger than a threshold value.
30
Redundant computation partition on distributed-memory systems
Li Chen,Zhaoqing Zhang,Xiaobing Feng +2 more
- 23 Oct 2002
TL;DR: The main idea is to select computation redundancy, represented by a redundant vector, properly for each partitioned loop nest in a parallel loop sequence, so as to acquire a larger parallel region.
11
References
A data locality optimizing algorithm
Michael Wolf,Monica S. Lam +1 more
- 01 May 1991
TL;DR: An algorithm that improves the locality of a loop nest by transforming the code via interchange, reversal, skewing and tiling is proposed, and is successful in optimizing codes such as matrix multiplication, successive over-relaxation, LU decomposition without pivoting, and Givens QR factorization.
•Book
Supercompilers for parallel and vector computers
Hans P. Zima,Barbara Chapman +1 more
- 01 Jan 1990
TL;DR: This paper presents a meta-modelling architecture for supercompilers that automates the very labor-intensive and therefore time-heavy and expensive process of learning and optimization of supercomputing systems.
778
A loop transformation theory and an algorithm to maximize parallelism
Michael Wolf,Monica S. Lam +1 more
TL;DR: The loop transformation theory is applied to the problem of maximizing the degree of coarse- or fine-grain parallelism in a loop nest, and it is shown that the maximum degree of parallelism can be achieved by transforming the loops into a nest of coarsest fully permutable loop nests and wavefronting the fully permutable nests.
727
Global optimizations for parallelism and locality on scalable parallel machines
Jennifer M. Anderson,Monica S. Lam +1 more
- 01 Jun 1993
TL;DR: A compiler algorithm that automatically finds computation and data decompositions that optimize both parallelism and locality that is designed for use with both distributed and shared address space machines.
399
SUPERB: A tool for semi-automatic MIMD/SIMD parallelization
Hans P. Zima,Heinz-J Bast,Michael Gerndt +2 more
- 01 Jan 1988
TL;DR: The design of an interactive system for the semi-automatic transformation of FORTRAN 77 programs into parallel programs for the SUPERNUM machine is described, characterized by a powerful analysis component, a catalog of MIMD and SIMD parallelization transformations, and a flexible dialog facility.
384