Top 31 papers presented at Parallel Computing in 1984

Journal Article•10.1016/S0167-8191(84)90413-7•

FFT algorithms for vector computers

[...]

National Center for Atmospheric Research¹

1 Aug 1984

TL;DR: Several methods for lengthening vectors are discussed, including the case of multiple and multi-dimensional transforms where M sequences of length N can be transformed as a single sequence of length MN using a 'truncated' FFT.

Abstract: The adaptation of the Cooley-Tukey, the Pease and the Stockham FFT's to vector computers is discussed. Each of these algorithms computes the same result namely, the discrete Fourier transform. They differ only in the way that intermediate computations are stored. Yet it is this difference that makes one or the other more appropriate depending on the application. This difference also influences the computational efficiency on a vector computer and motivates the development of methods to improve efficiency. Each of the FFT's is defined rigorously by a short expository FORTRAN program which provides the basis for discussions about vectorization. Several methods for lengthening vectors are discussed, including the case of multiple and multi-dimensional transforms where M sequences of length N can be transformed as a single sequence of length MN using a 'truncated' FFT. The implementation of an in place FFT on a computer with memory-to-memory architecture is made possible by in place matrix-vector multiplication.
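
The storage difference between the variants can be illustrated with a minimal Python sketch of a radix-2 Stockham (autosort) FFT. It is not the paper's FORTRAN program; the function name `stockham_fft` and the loop layout are assumptions made for illustration. The transform ping-pongs between two arrays, so the result comes out in natural order without a bit-reversal pass, and the inner loop over `k` is the one a vector machine would run as a single vector operation.

```python
import cmath


def stockham_fft(x):
    """Radix-2 Stockham autosort FFT (illustrative sketch, power-of-two length)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    a = [complex(v) for v in x]   # current pass input
    b = [0j] * n                  # current pass output (ping-pong buffer)
    half = n // 2                 # number of twiddle factors in this pass
    stride = 1                    # vector length of the inner loop
    while half >= 1:
        for j in range(half):
            w = cmath.exp(-2j * cmath.pi * j / (2 * half))
            for k in range(stride):          # vectorizable inner loop
                c0 = a[k + j * stride]
                c1 = a[k + j * stride + half * stride]
                b[k + 2 * j * stride] = c0 + c1
                b[k + 2 * j * stride + stride] = w * (c0 - c1)
        a, b = b, a               # swap buffers instead of reordering in place
        half //= 2
        stride *= 2
    return a
```

In this layout the inner vector length doubles on every pass while the outer loop count halves, which is one way to see why short vectors in the early passes motivate the vector-lengthening methods the abstract mentions.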

184 citations

Journal Article•10.1016/S0167-8191(84)90165-0•

On some parallel banded system solvers

[...]

Jack Dongarra¹, Ahmed H. Sameh²•Institutions (2)

Argonne National Laboratory¹, University of Illinois at Urbana–Champaign²

1 Dec 1984

TL;DR: Algorithms suitable for multiprocessing systems are described for solving narrow banded systems and the Helmholtz difference equations; their organization highlights the large-grain parallelism inherent in the problems.

Abstract: This paper describes algorithms for solving narrow banded systems and the Helmholtz difference equations that are suitable for multiprocessing systems. The organization of the algorithms highlights the large-grain parallelism inherent in the problems.

96 citations

Journal Article•10.1016/S0167-8191(84)90072-3•

Pseudo-random trees in Monte Carlo

[...]

Paul O. Frederickson¹, Robert Hiromoto¹, Thomas L. Jordan¹, Burton J. Smith, Tony Warnock²•Institutions (2)

Los Alamos National Laboratory¹, Cray²

1 Dec 1984

TL;DR: Pseudo-random trees can be used to give reproducibility, as well as speed, in Monte Carlo computations on parallel computers with either the SIMD architecture of the current generation of supercomputer or the MIMD architecture characteristic of the next generation.

Abstract: We present the concept of a pseudo-random tree, and generalize the Lehmer pseudo-random number generator as an efficient implementation of the concept. Pseudo-random trees can be used to give reproducibility, as well as speed, in Monte Carlo computations on parallel computers with either the SIMD architecture of the current generation of supercomputer or the MIMD architecture characteristic of the next generation. Monte Carlo simulations based on pseudo-random trees are free of certain pitfalls, even for sequential computers, which can make them considerably more useful.
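
The idea of a pseudo-random tree can be sketched in Python as a Lehmer generator with two successor functions: a "right" successor that yields the next number in a stream and a "left" successor that spawns the seed of a new, independent stream. The modulus and both multipliers below are well-known Lehmer parameters chosen for illustration only; they are assumptions, not the constants used in the paper.

```python
M = 2**31 - 1        # prime modulus (illustrative choice)
A_RIGHT = 16807      # multiplier for "next number in this stream" (assumed)
A_LEFT = 48271       # multiplier for "seed of a child stream" (assumed)


def right(x):
    """Advance along a stream: the ordinary Lehmer step."""
    return (A_RIGHT * x) % M


def left(x):
    """Branch off: produce the seed of a new stream."""
    return (A_LEFT * x) % M


def spawn(seed, n_tasks):
    """Seeds for n_tasks child streams, fixed by tree position only."""
    seeds = []
    x = seed
    for _ in range(n_tasks):
        x = left(x)
        seeds.append(x)
    return seeds


def stream(seed, count):
    """First `count` uniform variates in (0, 1) of the stream rooted at seed."""
    values = []
    x = seed
    for _ in range(count):
        x = right(x)
        values.append(x / M)
    return values
```

Because each task's numbers depend only on its position in the tree, running the tasks in any order, or on any number of processors, reproduces the same per-task streams.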

52 citations

Journal Article•10.1016/0010-4655(84)90002-X•

A highly optimized vectorized code for Monte Carlo simulations of SU(3) lattice gauge theories

[...]

D. Barkai, K.J.M. Moriarty¹,², Claudio Rebbi³•Institutions (3)

Royal Holloway, University of London¹, Dalhousie University², Brookhaven National Laboratory³

1 Mar 1984

TL;DR: New methods are introduced for improving the performance of the vectorized Monte Carlo SU(3) lattice gauge theory algorithm using the CDC CYBER 205, and the performance achieved for a 16^4 lattice on a 2-pipe system is discussed.

Abstract: New methods are introduced for improving the performance of the vectorized Monte Carlo SU(3) lattice gauge theory algorithm using the CDC CYBER 205. Structure, algorithm and programming considerations are discussed. The performance achieved for a 16^4 lattice on a 2-pipe system may be phrased in terms of the link update time or overall MFLOPS rates. For 32-bit arithmetic, it is 36.3 microseconds/link for 8 hits per iteration (40.9 microseconds for 10 hits) or 101.5 MFLOPS.

41 citations

Journal Article•10.1016/S0167-8191(84)90424-1•

A compact algorithm for Gaussian elimination over GF(2) implemented on highly parallel computers

[...]

Dennis Parkinson¹, Marvin C. Wunderlich²•Institutions (2)

Queen Mary University of London¹, Northern Illinois University²

1 Aug 1984

TL;DR: A method for Gaussian elimination over GF(2) is developed that needs no extra storage to record the history of the elimination; the algorithm is presented and its correctness proved.

Abstract: Gaussian elimination over GF(2) is used in a number of applications including the factorisation of large integers. The boolean nature of arithmetic in GF(2) makes the task well suited to highly parallel bit-organised computers. A program to work with up to 4096 x 4096 matrices has been developed for the ICL-DAP. A method has been developed that needs no extra storage to store the history of the elimination. The algorithm is presented and its correctness proved.
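
Why GF(2) arithmetic suits bit-organised hardware can be seen in a small Python sketch in which each matrix row is packed into one integer, so that adding one row to another is a single XOR. This is plain Gauss-Jordan elimination written for illustration; the function name `gf2_eliminate` is hypothetical, and the sketch does not reproduce the paper's compact scheme for storing the elimination history or its ICL-DAP mapping.

```python
def gf2_eliminate(rows, ncols):
    """Reduce a GF(2) matrix whose rows are bit-packed integers.

    Bit c of rows[r] holds the entry in row r, column c.
    Returns (reduced_rows, rank).
    """
    rows = list(rows)
    rank = 0
    for col in range(ncols):
        mask = 1 << col
        # Find a row at or below `rank` with a 1 in this column.
        pivot = next((i for i in range(rank, len(rows)) if rows[i] & mask), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        # Row addition over GF(2) is XOR; clear the column everywhere else.
        for i in range(len(rows)):
            if i != rank and rows[i] & mask:
                rows[i] ^= rows[rank]
        rank += 1
    return rows, rank
```

For example, `gf2_eliminate([0b011, 0b110, 0b101], 3)` reports rank 2, because the third row is the XOR of the first two.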

34 citations

Journal Article•10.1016/S0167-8191(84)90446-0•

Parallel pivoting algorithms for sparse symmetric matrices

[...]

Frans J. Peters¹•Institutions (1)

Eindhoven University of Technology¹

1 Aug 1984

TL;DR: It is shown that for dense sets of equations all the pivots must necessarily be processed one at a time; only if the set is sufficiently sparse, some pivots may be processed simultaneously.

Abstract: In this paper it is investigated which pivots may be processed simultaneously when solving a set of linear equations. It is shown that for dense sets of equations all the pivots must necessarily be processed one at a time; only if the set is sufficiently sparse, some pivots may be processed simultaneously. We present parallel pivoting algorithms for MIMD computers with sufficiently many processors and a common memory. Moreover we present algorithms for MIMD computers with an arbitrary, but fixed number of processors. For both types of computers algorithms embodying an ordering strategy are given.
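
The contrast between dense and sparse systems can be sketched in Python under one simplifying assumption: two pivots of a symmetric matrix are treated as compatible when the off-diagonal entry coupling them is zero, so a set of simultaneous pivots is an independent set in the matrix graph. This criterion and the greedy selection below are assumptions for illustration, not the paper's algorithms or its ordering strategy.

```python
def independent_pivots(adjacency):
    """Greedily pick pivots that are pairwise uncoupled.

    adjacency maps each row index to the set of indices j != i with a_ij != 0.
    """
    chosen = []
    blocked = set()
    for i in sorted(adjacency):
        if i not in blocked:
            chosen.append(i)
            blocked.add(i)
            blocked.update(adjacency[i])   # neighbours must wait for a later step
    return chosen
```

With a dense 3 x 3 matrix, `{0: {1, 2}, 1: {0, 2}, 2: {0, 1}}`, only one pivot is selected per step, whereas for a tridiagonal 5 x 5 matrix the pivots 0, 2 and 4 are selected together.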

33 citations

Journal Article•10.1016/S0167-8191(84)90229-1•

Vectorized Monte Carlo photon transport

[...]

F. W. Bobrowicz¹, J. E. Lynch¹, K. J. Fisher¹, J. E. Tabor¹•Institutions (1)

Los Alamos National Laboratory¹

1 Dec 1984

TL;DR: The results of current research on a Cray algorithm for time-dependent Monte Carlo photon radiation transport are presented; the method is a fully vectorized particle-vector scheme.

Abstract: The results of current research in the development of a Cray algorithm for time-dependent Monte Carlo photon radiation transport are presented. The method that has been developed is a fully vectorized particle-vector scheme. This technique tracks groups of particles simultaneously using a vector-stack formalism based upon particle events. Timing comparisons between this algorithm and the traditional single-particle approach are presented.

25 citations

Journal Article•10.1016/S0167-8191(84)90181-9•

A simulator for MIMD performance prediction: application to the S-1 MkIIa multiprocessor

[...]

T. S. Axelrod¹, Paul F. Dubois¹, Peter G. Eltgroth¹•Institutions (1)

Lawrence Livermore National Laboratory¹

1 Dec 1984

TL;DR: A MIMD multiprocessor simulator is described and applied to the investigation of the behavior of four problems on the S-1: The benchmark physics code SIMPLE, a conjugate gradient linear algebra problem, a simple Monte-Carlo problem, and a new method for neutron transport calculations.

Abstract: We describe a MIMD multiprocessor simulator and application of that simulator to a multiprocessor of current interest, the S-1 MkIIa. The simulator runs on the CRAY-1 and is designed so that computational physics benchmarks are actually run and produce results. Simulator output from this run is fed into a second level (hardware) simulator which calculates the behavior of the multiprocessor. The simulator can simulate multiprocessors whose basic architecture is that of a few, large processors with or without data caches, sharing global memory through an interconnection switch. The simulator is applied to the investigation of the behavior of four problems on the S-1: The benchmark physics code SIMPLE, a conjugate gradient linear algebra problem, a simple Monte-Carlo problem, and a new method for neutron transport calculations.

25 citations

Journal Article•10.1016/S0167-8191(84)90149-2•

A parallelized point rowwise successive over-relaxation method on a multiprocessor

[...]

Nisheeth R. Patel¹, Harry F. Jordan²•Institutions (2)

United States Department of the Army¹, University of Colorado Boulder²

1 Dec 1984

TL;DR: The present study suggests the possibility of both reducing the real time processing and increasing the scope of computational modeling in the Heterogeneous Element Processor (HEP) multiple instruction stream computer.

Abstract: A parallelized point rowwise Successive Over-Relaxation (SOR) iterative algorithm is developed for the Heterogeneous Element Processor (HEP) multiple instruction stream computer. The classical point SOR method is not easily vectorizable with rowwise ordering of the grid points, but it can be effectively parallelized on a multiple instruction stream machine without suffering in computational and convergence rate. The details of the implementation, including restructuring of a serial FORTRAN program and techniques needed to exploit the parallel processing architectural concept of the HEP, are presented. The parallelized algorithm is analyzed in detail. The lessons learned in this study are documented and may provide some guidelines for similar future coding, since programming a multiple instruction stream machine requires new approaches and restructuring techniques that are totally different from those required for programming an algorithm on a vector processor. To assess the capabilities of the parallelized algorithm, it was used to solve Laplace's equation on a rectangular field with Dirichlet boundary conditions. Computer run times are presented which indicate significant speed gain over a scalar version of the code. For a moderate to large size problem, seventeen or more processes are required to make efficient use of the parallel processing hardware. Also, to demonstrate the capability of the algorithm for a realistic problem, it was used to obtain the numerical solution of a viscous incompressible fluid in a square cavity. Since point iterative relaxation schemes are at the core of many systems of elliptic as well as non-elliptic partial differential equations occurring in engineering and scientific applications, the present study suggests the possibility of both reducing the real time processing and increasing the scope of computational modeling.
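
The serial starting point for such a method can be sketched in Python as classical point SOR with rowwise ordering, applied to Laplace's equation with Dirichlet boundary values. This is only the scalar baseline; the function name, the relaxation factor `omega=1.5` and the stopping rule are assumptions, and the sketch does not show the HEP parallelization.

```python
def sor_laplace(grid, omega=1.5, tol=1e-8, max_sweeps=10_000):
    """Point SOR, rowwise ordering, five-point Laplace stencil.

    grid is a list of lists; its outer ring holds the Dirichlet boundary
    values and is never modified. Interior values are updated in place.
    Returns the number of sweeps performed.
    """
    rows, cols = len(grid), len(grid[0])
    for sweep in range(1, max_sweeps + 1):
        largest_change = 0.0
        for i in range(1, rows - 1):          # rowwise ordering
            for j in range(1, cols - 1):
                # Uses already-updated west/north neighbours: this dependence
                # is what makes the rowwise sweep hard to vectorize.
                average = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                  + grid[i][j - 1] + grid[i][j + 1])
                change = omega * (average - grid[i][j])
                grid[i][j] += change
                largest_change = max(largest_change, abs(change))
        if largest_change < tol:
            return sweep
    return max_sweeps
```

A convenient check is a boundary taken from the linear function u(i, j) = i + j, which the five-point stencil reproduces exactly, so the converged interior should match i + j.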

24 citations

Journal Article•10.1016/S0167-8191(84)90435-6•

Stability aspects in using parallel algorithms

[...]

Wolfgang Rönsch

1 Aug 1984

TL;DR: The problem of stability of parallel algorithms, which has as yet received very little attention in the literature, is discussed using arithmetic expressions as an example and results are given for some of these expressions to classify their numerical quality.

Abstract: After a detailed description of the theoretical foundations of forward error analysis formulated by Stummel, the problem of stability of parallel algorithms, which has as yet received very little attention in the literature, is discussed using arithmetic expressions as an example. The stability analysis includes several summation algorithms, the evaluation of products, parallelization of alternating expressions such as the Horner expression and finite continued fractions and finally general arithmetic expressions. For the CRAY-1 timing versus stability results are given for some of these expressions to classify their numerical quality.
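
The link between summation order and numerical quality can be sketched in Python by comparing a sequential sum with a pairwise (fan-in) sum, the shape a parallel reduction naturally takes. Both functions are illustrative assumptions, not the algorithms analysed in the paper, and `math.fsum` is used only as an accurate reference.

```python
import math


def sequential_sum(values):
    """Left-to-right summation: n - 1 dependent additions."""
    total = 0.0
    for v in values:
        total += v
    return total


def pairwise_sum(values):
    """Fan-in summation: about log2(n) levels of independent additions."""
    level = list(values)
    if not level:
        return 0.0
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:            # odd element is carried to the next level
            nxt.append(level[-1])
        level = nxt
    return level[0]


def absolute_error(summed, values):
    """Distance from the correctly rounded sum."""
    return abs(summed - math.fsum(values))
```

Each partial result in the pairwise tree passes through roughly log2(n) roundings instead of up to n - 1, which is why the parallel form is often no worse, and frequently better, than the sequential one in forward error.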

16 citations

Journal Article•10.1016/S0167-8191(84)90133-9•

Experiences with the Denelcor HEP

[...]

Robert Hiromoto¹, Olaf M. Lubeck¹, James W. Moore¹•Institutions (1)

Los Alamos National Laboratory¹

1 Dec 1984

TL;DR: Three FORTRAN codes, each typical of a class of simulation problems at Los Alamos National Laboratory, have been converted to execute on the Denelcor HEP.

Abstract: Three FORTRAN codes, each typical of a class of simulation problems at Los Alamos National Laboratory, have been converted to execute on the Denelcor HEP. The codes are (i) PIC, a particle-in-cell code; (ii) SIMPLE, a two-dimensional Lagrangian hydrodynamics program, and (iii) TRAC, a nuclear reactor simulation. The programming paradigm that was used and the algorithmic nature of the concurrency are discussed. Speedups as a function of number of processes are given.

Journal Article•10.1016/S0167-8191(84)90048-6•

Buffering for vector performance on a pipelined MIMD machine

[...]

Danny C. Sorensen¹•Institutions (1)

Argonne National Laboratory¹

1 Dec 1984

TL;DR: Empirical evidence is presented to show that up to 5.8 megaflop performance is possible from the Denelcor HEP on very regular tasks such as matrix vector products, indicating that an apparently minor refinement to the architectural design could provide very efficient vector operations in addition to the parallelism and low-overhead synchronization already offered by the HEP architecture.

Abstract: A technique is presented for obtaining vector performance from a pipelined MIMD computer that does not have hardwired vector instructions. The specific computer in mind is the Denelcor HEP, but the technique might influence the use and possibly even the design of future machines with this type of architecture. This preliminary report presents the basic idea and demonstrates that it can be implemented. Buffering blocks of data to registers is used in conjunction with pipelined floating-point operations to achieve vector performance. Empirical evidence is presented to show that up to 5.8 megaflop performance is possible from the Denelcor HEP on very regular tasks such as matrix vector products. While this rate is not in the 'super-computer' range, it is certainly respectable given the hardware capabilities of the HEP (this machine is rated at 10 MIPS peak). This performance indicates that an apparently minor refinement to the architectural design could provide very efficient vector operations in addition to the parallelism and low-overhead synchronization already offered by the HEP architecture.

Journal Article•10.1016/S0167-8191(84)90012-7•

A numerical seismic 3-D migration model for vector multiprocessors

[...]

Christopher C. Hsiung¹, Werner Butscher¹•Institutions (1)

Cray¹

1 Dec 1984

TL;DR: It is demonstrated that careful algorithm design can lead to a significant speedup of the calculation when more than one processor is used, and the throughput times obtained in this study are an order of magnitude faster than some conventional approaches.

Abstract: The availability of a multiprocessor vector machine, such as the CRAY X-MP, along with large, fast secondary memory such as the CRAY SSD, opens new frontiers to numerical algorithm design for 3-D simulations. The 3-D seismic migration, which is of crucial importance in exploration seismology, will be studied as a model problem. The numerical model discussed in this paper employs an alternating direction implicit (ADI) Crank-Nicolson scheme which takes full advantage of the parallel architecture of the underlying machine. It is demonstrated that careful algorithm design can lead to a significant speedup of the calculation when more than one processor is used. The throughput times obtained in this study are an order of magnitude faster than some conventional approaches.

Journal Article•10.1016/S0167-8191(84)90213-8•

Performance evaluation of vector implementations of combinatorial algorithms

[...]

Celso C. Ribeiro¹•Institutions (1)

The Catholic University of America¹

1 Dec 1984

TL;DR: The computational results obtained show the adequacy of the performance evaluation model and very important gains concerning computing times, showing that vector computers will be of great importance in the field of combinatorial optimization.

Abstract: We study the performance and the use of vector computers for the solution of combinatorial optimization problems, particularly dynamic programming and shortest path problems. A general model for performance evaluation and vector implementations for the problems described above are studied. These implementations were done on a CRAY-1 vector computer and the computational results obtained show (i) the adequacy of the performance evaluation model and (ii) very important gains concerning computing times, showing that vector computers will be of great importance in the field of combinatorial optimization.

Journal Article•10.1016/S0167-8191(84)90293-X•

Short communication: Parallel marching Poisson solvers

[...]

Marian Vajteric¹•Institutions (1)

Slovak Academy of Sciences¹

1 Dec 1984

TL;DR: Using orthogonal decomposition properties of arising matrices, the algorithms can be formulated in terms of transformed vectors for solving Poisson equation at N^2 mesh points with complexity bound O(log N).

Abstract: The paper presents parallel algorithms for solving Poisson equation at N^2 mesh points. The methods based on marching techniques are structured for efficient parallel realization. Using orthogonal decomposition properties of arising matrices, the algorithms can be formulated in terms of transformed vectors. On a MIMD computer with not more than N processors, the computations can be performed in horizontal slices with minimal synchronization requirements. Considering an SIMD machine with N^2 processors, the complexity bound O(log N) has been achieved, whereby the single marching requires 10 log N steps only.

Journal Article•10.1016/S0167-8191(84)90024-3•

Vectorized finite-element stiffness generation: tuning the Noor-Lambiotte algorithm

[...]

Matthias Kratz

1 Dec 1984

TL;DR: A variant of the Noor-Lambiotte algorithm for finite-element stiffness computations on vector machines is presented and proves equally successful for rather simple models such as plane strain or plane stress situations as well as more complicated two- or three-dimensional elements.

Abstract: A variant of the Noor-Lambiotte algorithm for finite-element stiffness computations on vector machines is presented. Its considerable speed up is explained in comparison with conventional software and run-times measured on a CRAY-1 are given. The method proves equally successful for rather simple models such as plane strain or plane stress situations as well as more complicated two- or three-dimensional elements. Discussion focuses both on the exploitation of the pipeline effect and the elimination of superfluous operations.

Journal Article•10.1016/S0167-8191(84)90197-2•

A parallel algorithm for the enumeration of the spanning trees of a graph

[...]

Shao-Wen Mai¹, David J. Evans¹•Institutions (1)

Loughborough University¹

1 Dec 1984

TL;DR: A parallel algorithm is presented for enumerating the spanning trees of a graph, a problem that arises in contexts such as computer-aided design and computer networks; it is based on the principle of the inclusion and exclusion of sets, and not directly on the partitioning of the graph itself.

Abstract: As is well known, the strategy of divide-and-conquer is widely used in problem solving. The method of partitioning is also a fundamental strategy for the design of a parallel algorithm. The problem of enumerating the spanning trees of a graph arises in several contexts such as computer-aided design and computer networks. A parallel algorithm for solving the problem is presented in this paper. It is based on the principle of the inclusion and exclusion of sets, and not directly based on the partitioning of the graph itself. The results of the preliminary experiments on a MIMD system appear promising.
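
One reading of the inclusion and exclusion idea can be sketched in Python: every spanning tree either includes a given edge or excludes it, so the search splits into two independent subproblems that separate processors could explore. The recursion below is an assumption made for illustration and is not claimed to be the paper's algorithm; the function name and the component-label bookkeeping are hypothetical.

```python
def spanning_trees(n_vertices, edges):
    """Enumerate spanning trees of an undirected graph by edge include/exclude.

    edges is a list of (u, v) pairs with vertices numbered 0..n_vertices-1.
    Returns a list of trees, each a tuple of edges.
    """
    trees = []

    def extend(index, chosen, component):
        if len(chosen) == n_vertices - 1:
            trees.append(tuple(chosen))      # n-1 acyclic edges form a tree
            return
        remaining = len(edges) - index
        if len(chosen) + remaining < n_vertices - 1:
            return                           # not enough edges left
        u, v = edges[index]
        if component[u] != component[v]:
            # Branch 1: include the edge (it joins two components).
            merged = [component[u] if c == component[v] else c
                      for c in component]
            extend(index + 1, chosen + [edges[index]], merged)
        # Branch 2: exclude the edge.
        extend(index + 1, chosen, component)

    extend(0, [], list(range(n_vertices)))
    return trees
```

The two branches share no state, which is what makes this decomposition a natural fit for a MIMD machine; on the complete graph with four vertices the sketch finds the 16 trees predicted by Cayley's formula.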

Journal Article•10.1016/0167-8191(88)90076-2•

A Sparse Matrix Algorithm on the Boolean Vector Machine

[...]

A R Wagner¹, L M Patrick¹•Institutions (1)

Duke University¹

1 Jan 1984

TL;DR: This paper relates the experiences in implementing a basic matrix-vector iteration algorithm for sparse matrices on the BVM and shows that a $2^{20}$ PE BVM can deliver over 1 billion $(10^9)$ useful floating point operations per second for this problem.

Abstract: The Boolean Vector Machine (BVM) is a large network of extremely small processors with very small memories operating in SIMD mode using bit-serial arithmetic. Individual processors communicate via a hardware implementation of the Cube Connected Cycles (CCC) network. A prototype BVM with 2048 processing elements, each with 200 binary bits of memory, is currently being built using VLSI technology. The BVM's bit-serial arithmetic and the small memories of individual processors are apparently a drawback to its effectiveness when applied to large numerical problems. In this paper we relate our experiences in implementing a basic matrix-vector iteration algorithm for sparse matrices on the BVM. We show that a $2^{20}$ PE BVM can deliver over 1 billion $(10^9)$ useful floating point operations per second for this problem. The algorithm is expressed in a new language (BVL) which has been defined for programming the BVM.

Journal Article•10.1016/S0167-8191(84)90402-2•

Vorton dynamics: a case study of developing a fluid dynamics model for a vector processor

[...]

M. J. Kascic Jr.

1 Aug 1984

TL;DR: The raw performance of vector processors such as the CDC CYBER-205 has been well documented, and the ability to apply this raw power to ever more complex algebraic algorithms has been reported in [9].

Abstract: The raw performance of vector processors such as the CDC CYBER-205 has been well documented. The ability to apply this raw power to ever more complex algebraic algorithms has been reported in [9]. The final step in making computers of this class truly the revolutionary tools they are claimed to be is to develop whole applications that perform at a significant fraction of the raw power. This involves two distinct subclasses of problems. On the one hand, there are those pre-existing applications that must be mapped onto vector processors in such a way that not only is performance maintained, but also a (sometimes vague) set of computational boundary conditions of the user community is satisfied. On the other hand, there are those models which are developed ab initio with machines such as the CYBER-205 in mind. The development of solutions to problems in the former class involves psychology and politics as well as mathematics and computer science. We limit ourselves here to reporting on an example of the latter class, viz. a model to study a particular fluid-dynamic phenomenon, that was specifically designed with the CYBER-205 in mind.

Journal Article•10.1016/S0167-8191(84)90245-X•

Conference report: Conference on forefronts of large-scale computational problems

[...]

B. L. Buzbee¹, H. J. Raveché²•Institutions (2)

Los Alamos National Laboratory¹, National Institute of Standards and Technology²

1 Dec 1984

Journal Article•10.1016/S0167-8191(84)90277-1•

Supercomputers in Europe

[...]

Iain S. Duff¹•Institutions (1)

Argonne National Laboratory¹

1 Dec 1984

TL;DR: There are now over thirty supercomputers in use in Europe and their distribution both by geographical location and by the principal activity of each site is considered, leading to an estimate of the amount of supercomputer usage in the major areas of scientific research.

Abstract: There are now over thirty supercomputers in use in Europe. In this short communication, we consider their distribution both by geographical location and by the principal activity of each site. The latter leads naturally to an estimate of the amount of supercomputer usage in the major areas of scientific research.

Journal Article•10.1016/0167-8191(89)90115-4•

Parallel Solution of Arbitrarily Sparse Linear Systems

[...]

A R Wagner¹•Institutions (1)

Duke University¹

1 Jan 1984

TL;DR: Analysis of a parallel algorithm for the iterative solution of sparse linear systems suggests that a network of Processing Elements equal in number to the number of non-zero matrix entries is particularly useful.

Abstract: A parallel algorithm for the iterative solution of sparse linear systems is presented. This algorithm is shown to be efficient for arbitrarily sparse matrices. Analysis of this algorithm suggests that a network of Processing Elements [PEs] equal in number to the number $R$ of non-zero matrix entries is particularly useful. If this collection of PEs is interconnected by a message-passing, or a synchronous, communication network which is fast enough, the iteration time grows as the logarithm of the number of PEs. A comparison with earlier work, which suggested that only $\sqrt{R}$ PEs are useful for this task, is also presented.
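
The logarithmic growth can be made concrete with a Python sketch that simulates one processing element per non-zero entry: every PE forms one product, and the products of each row are then combined by a fan-in tree. The coordinate-list input format and the function name are assumptions for illustration; the sketch models only the arithmetic depth and ignores the communication network.

```python
def sparse_matvec(entries, x, n_rows):
    """Simulate y = A x with one PE per non-zero entry.

    entries is a list of (row, col, value) triples.
    Returns (y, depth), where depth is the number of fan-in addition levels.
    """
    # Step 1: all PEs multiply at once (one product per non-zero).
    by_row = {}
    for row, col, value in entries:
        by_row.setdefault(row, []).append(value * x[col])

    # Step 2: per-row tree reduction; rows proceed independently.
    y = [0.0] * n_rows
    depth = 0
    for row, terms in by_row.items():
        levels = 0
        while len(terms) > 1:
            terms = [sum(terms[k:k + 2]) for k in range(0, len(terms), 2)]
            levels += 1
        y[row] = terms[0]
        depth = max(depth, levels)
    return y, depth
```

A row with r non-zeros needs about log2(r) addition levels, so the depth of one iteration is bounded by the logarithm of the total number of PEs.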

Journal Article•10.1016/S0167-8191(84)90391-0•

Numerical algorithms in computational fluid dynamics on vector computers

[...]

Wolfgang Gentzsch

1 Aug 1984

TL;DR: The vectorization of five well-known algorithms, the explicit-implicit MacCormack scheme, the implicit scheme of Beam and Warming, a boundary-layer algorithm, a Galerkin procedure and a Monte Carlo simulation, for the solution of problems in computational fluid dynamics is discussed and computation times are given.

Abstract: Modern vector computers tend to favour certain classes of algorithms (e.g. explicit, Jacobi-type) while other important algorithms, such as implicit ones or Monte Carlo, in their serial versions are not very suitable for these machines. However, restructuring of serial algorithms often enables the user to exploit fully the potential of vector machines, which will often result in remarkable performance improvements. In the following contribution, the vectorization of five well-known algorithms, the explicit-implicit MacCormack scheme, the implicit scheme of Beam and Warming, a boundary-layer algorithm, a Galerkin procedure and a Monte Carlo simulation, for the solution of problems in computational fluid dynamics is discussed and computation times are given.

Proceedings Article•10.5555/2902381.2902643•

A simulator for MIMD performance prediction

[...]

T. Axelrod, P. Dubois, P. Eltgroth

1 Dec 1984

TL;DR: In this paper, a MIMD multiprocessor simulator and application of that simulator to a S-1 MkIIa is described, which runs on the CRAY-1 and is designed so that comput...

Journal Article•10.1016/S0167-8191(84)90084-X•

Short communication: The minimal average latency of multiconfigurable pipelines

[...]

Jürgen Tappe¹•Institutions (1)

RWTH Aachen University¹

1 Dec 1984

TL;DR: It is shown that the average latencies of all sequences of optimal cycles through a fixed vertex in the state graph of a multiconfigurable pipeline have a common limit, provided that their initiation numbers are unbounded and approximate the same ratio of the underlying operations.

Abstract: It is shown that the average latencies of all sequences of optimal cycles through a fixed vertex in the state graph of a multiconfigurable pipeline have a common limit, provided that their initiation numbers are unbounded and approximate the same ratio of the underlying operations. This leads to a definition of the minimal average latency of a multiconfigurable pipeline.

Journal Article•10.1016/S0167-8191(84)90108-X•

Conference report: Conference on experiences in applying parallel processors to scientific computation

[...]

B. L. Buzbee¹, George Michael²•Institutions (2)

Los Alamos National Laboratory¹, Lawrence Livermore National Laboratory²

1 Dec 1984

TL;DR: A broad spectrum of scientific computation is amenable to parallel processing, and parallel formulations have been achieved for meaningful computational kernels from such areas as Plasma simulation, Lagrangian fluid flow simulation, Reactor safety simulation, Automated reasoning.

Abstract: There is general agreement that uniprocessor speeds are nearing fundamental limits and that the only way that more speed can be attained is to exploit computational parallelism of one sort or another. The concept is simple but the implementations are not. Clearly, the way to the future must be evolutionary, and it will require the skills of the computer industry, universities, and the national laboratories. The first production multiprocessor supercomputer systems are now being delivered. Indications are that within the next five years supercomputer manufacturers will offer systems with up to 16 processors. Successful exploitation of their potential performance will require algorithms, software, and hardware that, when combined as a single system, will achieve high-average processor utilization and introduce little additional work relative to a single-processor implementation. The questions of how best to do these things are occupying the attention of researchers all over the world. This work is considered the key to expansion of computer performance. Within the past two years, a significant amount of experimentation on the parallel processing of scientific computation has been done. Most of this work as related to Supercomputers has not been previously reported. Consequently, on March 13-15, Lawrence Livermore National Laboratory and Los Alamos National Laboratory hosted a meeting at Gleneden Beach, Oregon, at which many of these experiments were discussed. The general thrust of these presentations suggests several important results such as 1. A broad spectrum of scientific computation is amenable to parallel processing. The presentations revealed that parallel formulations have been achieved for meaningful computational kernels from such areas as Plasma simulation, Lagrangian fluid flow simulation, Reactor safety simulation, Automated reasoning,

Journal Article•10.1016/S0167-8191(84)90325-9•

Short communication: VLSI systems for some problems of computational geometry

[...]

Ondrej Sýkora¹•Institutions (1)

Slovak Academy of Sciences¹

1 Dec 1984

TL;DR: The capability of the Mesh of Trees network for VLSI systems that solve computational geometry problems quickly is shown on two examples: determination of the convex hull of a weakly externally visible polygon and determination of the visibility polygon of a polygon.

Abstract: Fast solution of computational geometry problems is important for computer graphics, image processing and pattern recognition. The capability of the Mesh of Trees network for VLSI systems that solve computational geometry problems quickly is shown on two examples: determination of the convex hull of a weakly externally visible polygon and determination of the visibility polygon of a polygon.

Journal Article•10.1016/S0167-8191(84)90096-6•

Short communication: Algorithms for pipeline control

[...]

Jürgen Tappe¹•Institutions (1)

RWTH Aachen University¹

1 Dec 1984

TL;DR: The control of a statically configured pipeline corresponds to certain paths in its state graph, and properties of this graph and algorithms for optimal paths are discussed.

Abstract: The control of a statically configured pipeline corresponds to certain paths in its state graph. Properties of this graph and algorithms for optimal paths are discussed.
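
The correspondence between control and paths in a state graph can be sketched in Python with the textbook collision-vector model: a state is the set of latencies currently forbidden, an edge is a permitted initiation latency, and a control strategy is a cycle in the graph. The construction and the brute-force cycle search below are assumptions for illustration and are not the algorithms of the paper; the forbidden set in the usage note is a made-up example.

```python
from fractions import Fraction


def state_graph(forbidden):
    """Build the state graph of a static pipeline from its forbidden latencies."""
    longest = max(forbidden)
    initial = frozenset(forbidden)
    edges = {}
    pending = [initial]
    while pending:
        state = pending.pop()
        if state in edges:
            continue
        out = []
        # Latency longest + 1 stands for every latency beyond the table length.
        for latency in range(1, longest + 2):
            if latency in state:
                continue
            shifted = frozenset(f - latency for f in state if f > latency)
            successor = shifted | initial
            out.append((latency, successor))
            pending.append(successor)
        edges[state] = out
    return initial, edges


def minimal_average_latency(forbidden):
    """Smallest mean edge latency over all simple cycles of the state graph."""
    _, edges = state_graph(forbidden)
    best = None

    def walk(start, node, total, count, on_path):
        nonlocal best
        for latency, successor in edges[node]:
            if successor == start:
                mean = Fraction(total + latency, count + 1)
                if best is None or mean < best:
                    best = mean
            elif successor not in on_path:
                walk(start, successor, total + latency, count + 1,
                     on_path | {successor})

    for state in edges:
        walk(state, state, 0, 0, {state})
    return best
```

For the forbidden latency set {2, 4, 5, 7} the graph has three states, and the best cycle is a self-loop with latency 3, so `minimal_average_latency({2, 4, 5, 7})` equals 3. Exhaustive search over simple cycles is adequate only for small graphs.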

Journal Article•10.1016/S0167-8191(84)90309-0•

Higher-order communications for concurrent programming

[...]

Alberto Pettorossi, Andrzej Skowron¹•Institutions (1)

University of North Carolina at Charlotte¹

1 Dec 1984

TL;DR: This paper extends the approach of Pettorossi and Skowron (1983) and also considers 'higher-order' communications.

Abstract: In Pettorossi and Skowron (1983) a recursive-equations language is introduced. Its operational semantics is specified by means of computing agents which communicate and exchange messages. Those communications are, so to speak, zero-order, in the sense that the exchanged messages are values of a data structure, possibly defined by the programmer. In this paper we extend that approach and also consider 'higher-order' communications by allowing the exchange of agents' behaviours, i.e. sets of computations, among computing agents. This extension leads to a new programming methodology which makes use of proofs of computing agents' behaviours and their related strategies.

Journal Article•10.1016/S0167-8191(84)90036-X•

A collection of parallel linear equations routines for the Denelcor HEP

[...]

Jack Dongarra¹, Robert Hiromoto²•Institutions (2)

Argonne National Laboratory¹, Los Alamos National Laboratory²

1 Dec 1984

TL;DR: This paper describes the implementation and performance results for a few standard linear algebra routines on the Denelcor HEP computer, based on high-level modules that facilitate portability and perform efficiently in a wide range of environments.

Abstract: This paper describes the implementation and performance results for a few standard linear algebra routines on the Denelcor HEP computer. The algorithms used here are based on high-level modules that facilitate portability and perform efficiently in a wide range of environments. The modules are chosen to be of a large enough computational granularity so that reasonably optimum performance may be insured. The design of algorithms with such fundamental modules in mind will also facilitate their replacement by others more suited to gain the desired performance on a particular computer architecture.
