Matrix Multiplication on Boolean Cubes using Generic Communication Primitives

Open Access

- 01 Jan 1989

- pp 108-156

35

TL;DR: Generic primitives for matrix operations, as defined by levels one, two, and three of the BLAS, are of great value: they make user programs much simpler and hide in the primitives most of the architectural detail that matters for performance.

Abstract: Generic primitives for matrix operations, as defined by levels one, two, and three of the BLAS, are of great value: they make user programs much simpler and hide in the primitives most of the architectural detail that matters for performance. We describe generic shared memory primitives such as one-to-all and all-to-all broadcasting and one-to-all and all-to-all personalized communication, together with implementations thereof that are within a factor of two of the best known lower bounds. We describe algorithms for the multiplication of arbitrarily shaped matrices using these primitives. All three loops of a standard matrix multiplication algorithm expressed in Fortran can be parallelized. We show that if one loop is parallelized, the processors should be aligned with the loop having the most elements. Depending on the initial matrix allocation, data permutations may be required to accomplish the processor/loop alignment; this permutation is included in our analysis. We show that when two loops are parallelized, the optimum aspect ratio of the processing plane is equal to the ratio of the number of matrix elements in the two loops being parallelized.
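The one-to-all and all-to-all broadcasting primitives named in the abstract can be illustrated with a small simulation. The following is a minimal Python sketch, not the authors' implementation. It assumes a synchronous model in which each node exchanges data with one cube neighbour per step, uses a single spanning binomial tree for one-to-all broadcasting and dimension exchange (recursive doubling) for all-to-all broadcasting, and the function names one_to_all_broadcast and all_to_all_broadcast are hypothetical. Nodes are numbered 0 to 2**n - 1, and two nodes are neighbours when their numbers differ in exactly one bit.

```python
from typing import Any, Dict, List, Optional

# Minimal simulation sketch (assumption: synchronous steps, each node
# exchanges data with one cube neighbour per step). Function names are
# hypothetical and not taken from the paper.


def one_to_all_broadcast(n: int, source: int, value: Any) -> List[Optional[Any]]:
    """Broadcast `value` from `source` to all 2**n nodes of a Boolean n-cube.

    Spanning binomial tree: in step d, every node that already holds the
    value sends it to its neighbour across dimension d, so the set of
    holders doubles each step and the broadcast takes n steps.
    """
    p = 1 << n
    received = [False] * p
    received[source] = True
    for d in range(n):
        # Snapshot the holders so nodes reached in this step forward only later.
        holders = [node for node in range(p) if received[node]]
        for node in holders:
            received[node ^ (1 << d)] = True  # neighbour across dimension d
    return [value if got else None for got in received]


def all_to_all_broadcast(n: int, values: List[Any]) -> List[Dict[int, Any]]:
    """Node i starts with values[i]; afterwards every node holds all 2**n items.

    Dimension exchange (recursive doubling): in step d, each node swaps
    everything it holds with its neighbour across dimension d, so messages
    carry 1, 2, ..., 2**(n-1) items and the operation takes n steps.
    """
    p = 1 << n
    if len(values) != p:
        raise ValueError("need exactly one value per node")
    held = [{node: values[node]} for node in range(p)]
    for d in range(n):
        # Copy first so every exchange uses data from the start of the step.
        before = [dict(h) for h in held]
        for node in range(p):
            held[node].update(before[node ^ (1 << d)])
    return held


if __name__ == "__main__":
    n = 3  # an 8-node cube
    print(one_to_all_broadcast(n, source=5, value="A"))
    # → ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A']
    gathered = all_to_all_broadcast(n, [f"x{i}" for i in range(1 << n)])
    print(len(gathered[0]))  # → 8
```

Both routines take n steps on a 2**n-node cube. In the all-to-all case the amount of data exchanged doubles every step, so each node receives 2**n - 1 items in total. The implementations described in the paper, which come within a factor of two of the best known lower bounds, may organize the communication differently.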

Citations

Journal Article • 10.1109/71.642949

Efficient algorithms for all-to-all communications in multiport message-passing systems

Jehoshua Bruck, +4 more

- 01 Nov 1997

- IEEE Transactions on Parallel and Distri...

TL;DR: This work presents efficient algorithms for two all-to-all communication operations in message-passing systems, index and concatenation, using complexity measures based on the communication start-up time and the communication bandwidth.

362

Proceedings Article • 10.1145/63047.63048

What have we learnt from using real parallel machines to solve real problems

Geoffrey C. Fox

- 03 Jan 1989

TL;DR: A space-time analogy is used to classify problems, showing how a division into synchronous, loosely synchronous, and asynchronous problems is helpful and isolating the asynchronous class as the one for which major uncertainties about possible parallelism exist.

91

Journal Article • 10.1109/71.342134

Optimal broadcast in all-port wormhole-routed hypercubes

Ching-Tien Ho, +1 more

- 01 Feb 1995

- IEEE Transactions on Parallel and Distri...

TL;DR: An optimal algorithm is given that broadcasts on an n-dimensional hypercube in $O(n/\log_2(n+1))$ routing steps using wormhole e-cube routing and all-port communication.

67

Journal Article • 10.1007/BF00128176

Compiling parallel programs by optimizing performance

Marina C. Chen, +2 more

- 01 Oct 1988

- The Journal of Supercomputing

TL;DR: This paper describes how Crystal, a language based on familiar mathematical notation and lambda calculus, addresses the issues of programmability and performance for parallel supercomputers and illustrates the power of its approach with benchmarks of compiled parallel code from Crystal source.

67

Journal Article • 10.1137/S0097539795283619

Fast Gossiping by Short Messages

Jean-Claude Bermond, +3 more

- 01 Aug 1998

- SIAM Journal on Computing

TL;DR: This paper considers the problem of gossiping in communication networks under the restriction that communicating nodes can exchange up to a fixed number p of packets at each round, and determines the optimal number of communication rounds to perform gossiping for several classes of graphs, including Hamiltonian graphs and complete k-ary trees.

60

...

Related Papers (5)

Solving Problems on Concurrent Processors

A survey of gossiping and broadcasting in communication networks

Methods and problems of communication in usual networks

Computing Fast Fourier Transforms On Boolean Cubes And Related Networks

Time-Efficient Maze Routing Algorithms on Reconfigurable Mesh Architectures