Journal Article, DOI: 10.1006/JPDC.1996.0075
Tiling Nested Loops into Maximal Rectangular Blocks
17
TL;DR: The proposed method aggregates independent computations of a loop nest into rectangular blocks and maximizes the block sizes in order to maximize parallelism. It is formulated as systematic procedures that can easily be implemented in a parallelizing compiler.
About: This article was published in the Journal of Parallel and Distributed Computing on 15 Jun 1996. It focuses on the topics: Nested loop join & Block (programming).
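As a rough illustration of what aggregating a loop nest's iterations into rectangular blocks means, the sketch below cuts a two-dimensional iteration space into fixed-size rectangular tiles and visits it tile by tile. It is not the article's procedure: the summary above does not say how the maximal block sizes are chosen, so the block sizes `b1` and `b2` and the function names are hypothetical inputs picked for the example, and the iterations are simply assumed to be independent.

```python
from itertools import product


def tiled_iterations(n1, n2, b1, b2):
    """Yield (tile_index, point) pairs for an n1 x n2 iteration space
    cut into rectangular blocks of at most b1 x b2 points.

    b1 and b2 are hypothetical block sizes; choosing them maximally is
    the subject of the article and is not attempted here.
    """
    # Outer loops step over tile origins.
    for t1, t2 in product(range(0, n1, b1), range(0, n2, b2)):
        # Inner loops enumerate the points of one tile; min() clips
        # partial tiles at the boundary of the iteration space.
        for i in range(t1, min(t1 + b1, n1)):
            for j in range(t2, min(t2 + b2, n2)):
                yield (t1 // b1, t2 // b2), (i, j)


def untiled_iterations(n1, n2):
    """Return the points of the original (untiled) loop nest in order."""
    return [(i, j) for i in range(n1) for j in range(n2)]


if __name__ == "__main__":
    for tile, point in tiled_iterations(4, 4, 2, 2):
        print(tile, point)
```

Tiling only reorders the iterations: every point of the original nest is visited exactly once, grouped by the block it belongs to. If all iterations inside a block are independent, each block could in principle be handed to a separate processor.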
Citations
•Journal Article
Determining the Idle Time of a Tiling: New Results
TL;DR: In this paper, the authors extend the results of Hogsted, Carter, and Ferrante to all possible distributions of the tiles to processors and provide an accurate solution for all values of the rise parameter that relates the shape of the iteration space to that of tiles.
30
Tiling with limited resources
Pierre-Yves Calland,Jack Dongarra,Yves Robert +2 more
- 14 Jul 1997
TL;DR: This work derives the optimal mapping and scheduling of tiles to physical processors under some reasonable assumptions, in the context of limited computational resources and assuming communication-computation overlap.
Automatic partitioning of parallel loops with parallelepiped-shaped tiles
F. Rastello,Yves Robert +1 more
TL;DR: An efficient algorithm to implement loop partitioning is introduced and an efficient heuristic to determine the optimal tile shape is designed, showing its usefulness using both examples of Agarwal et al. and a large collection of randomly generated data.
16
Generating efficient tiled code for distributed memory machines
Peiyi Tang,Jingling Xue +1 more
- 01 Oct 2000
TL;DR: A suite of compiler techniques is presented for generating efficient SPMD programs that execute rectangularly tiled iteration spaces on distributed memory machines, and two memory optimisations are given to reduce memory usage for skewed iteration spaces and expanded arrays, respectively.
14
Tiling on systems with communication/computation overlap
TL;DR: This work derives the optimal mapping and scheduling of tiles to physical processors under some reasonable assumptions, in the context of limited computational resources and assuming communication‐computation overlap.
References
A data locality optimizing algorithm
Michael Wolf,Monica S. Lam +1 more
- 01 May 1991
TL;DR: An algorithm that improves the locality of a loop nest by transforming the code via interchange, reversal, skewing and tiling is proposed, and is successful in optimizing codes such as matrix multiplication, successive over-relaxation, LU decomposition without pivoting, and Givens QR factorization.
•Book
Supercompilers for parallel and vector computers
Hans P. Zima,Barbara Chapman +1 more
- 01 Jan 1990
TL;DR: This paper presents a meta-modelling architecture for supercompilers that automates the very labor-intensive and therefore time-heavy and expensive process of learning and optimization of supercomputing systems.
778
A loop transformation theory and an algorithm to maximize parallelism
Michael Wolf,Monica S. Lam +1 more
TL;DR: The loop transformation theory is applied to the problem of maximizing the degree of coarse- or fine-grain parallelism in a loop nest, and it is shown that the maximum degree of parallelism can be achieved by transforming the loops into a nest of coarsest fully permutable loop nests and wavefronting the fully permutable nests.
727
The parallel execution of DO loops
TL;DR: Methods are developed for the parallel execution of different iterations of a DO loop and practical application to the design of compilers for such computers is discussed.
Supernode partitioning
François Irigoin,R. Triolet +1 more
- 13 Jan 1988
TL;DR: A class of partitionings is presented that encompasses previous techniques and provides enough flexibility to adapt code to multiprocessors with two levels of parallelism and two levels of memory.
635