Top 28 papers published in Parallel Computing in 1985

Journal Article • 10.1016/0167-8191(85)90016-X

A parallel partition method for solving banded systems of linear equations

1 Mar 1985

TL;DR: The partition method of Wang for tridiagonal equations is generalized to the arbitrary band case and the algorithm is compared to Gaussian elimination and cyclic reduction.

Abstract: The partition method of Wang for tridiagonal equations is generalized to the arbitrary band case. A stability criterion is given. The algorithm is compared to Gaussian elimination and cyclic reduction.

60 citations

Journal Article • 10.1016/0167-8191(85)90024-9

MIMD computing in the USA - 1984

Roger W. Hockney¹

University of Reading¹

1 Jun 1985

TL;DR: A broad classification of MIMD computers is proposed, and the computers are discussed in this framework, together with brief details of the architecture and performance of each machine.

Abstract: It is often said that the 1980s are becoming the decade of multiple instruction stream, multiple data stream (MIMD) computers, while the 1970s could be described as the decade of SIMD (single instruction stream, multiple data stream) computers. The availability of microprocessors and VLSI facilities has led to the proposal and construction of novel computer architectures that link many hundreds or even thousands of microprocessors or specially designed VLSI chips. Some of the larger manufacturers offer computers with a small number of CPUs. Because of the variety of these new developments, a survey of proposed and existing MIMD computers in the U.S. was conducted, based on a simple classification of the different devices. Particular attention is given to computers designed for numerical work with floating-point numbers and the solution of large problems in physics, chemistry, and engineering.

40 citations

Journal Article • 10.1016/0167-8191(85)90014-6

() measurements on the 2-CPU CRAY X-MP

Roger W. Hockney

1 Mar 1985

39 citations

Journal Article • 10.1016/0167-8191(85)90003-1

Inner/outer iterative methods and numerical Schwarz algorithms

Garry Rodrigue¹

Lawrence Livermore National Laboratory¹

1 Nov 1985

TL;DR: Variants of the numerical Schwarz algorithms for solving elliptic partial differential equations on multiprocessing systems are described, and it is shown that, under certain matrix nonnegativity conditions, the convergence rate of the global iteration is invariant to the amount of overlap of the subdomains.

Abstract: Variants of the numerical Schwarz algorithms for solving elliptic partial differential equations on multiprocessing systems are described and analyzed. The methods are described in terms of domain decomposition techniques and mathematically cast into an inner/outer iterative form. It is shown that, under certain matrix nonnegativity conditions, the convergence rate of the global iteration is invariant to the amount of overlap of the subdomains.
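
As a concrete illustration of domain decomposition, the following is a minimal Python sketch of the classical alternating (multiplicative) Schwarz method, not Rodrigue's inner/outer variants. It assumes a 1D model problem, -u'' = f on (0, 1) with u(0) = u(1) = 0, discretized by central differences on n interior points and split into two overlapping subdomains; the function names and the exact tridiagonal subdomain solves are illustrative choices. Each sweep solves on the first subdomain using the current iterate as boundary data, then on the second.

```python
def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system; lower[0] and upper[-1] are ignored."""
    n = len(rhs)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def solve_subdomain(u, f, h, lo, hi):
    """Exactly solve the discrete problem on unknowns lo..hi-1, taking boundary values from u."""
    k = hi - lo
    rhs = [h * h * f[i] for i in range(lo, hi)]
    rhs[0] += u[lo - 1] if lo > 0 else 0.0      # left boundary: current iterate, or u(0) = 0
    rhs[-1] += u[hi] if hi < len(u) else 0.0    # right boundary: current iterate, or u(1) = 0
    u[lo:hi] = thomas([-1.0] * k, [2.0] * k, [-1.0] * k, rhs)


def alternating_schwarz(f, overlap, sweeps):
    """Two overlapping subdomains: unknowns [0, mid + overlap) and [mid - overlap, n)."""
    n = len(f)
    h = 1.0 / (n + 1)
    mid = n // 2
    if not 1 <= overlap < min(mid, n - mid):
        raise ValueError("overlap must leave both subdomains inside the grid")
    u = [0.0] * n
    for _ in range(sweeps):
        solve_subdomain(u, f, h, 0, mid + overlap)    # first subdomain
        solve_subdomain(u, f, h, mid - overlap, n)    # second subdomain, using fresh values
    return u
```

With f = 1 the discrete solution equals x(1 - x)/2 at the grid points, which gives an easy check of the sketch.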

36 citations

Journal Article • 10.1016/0167-8191(85)90002-X

Overview of parallel processing

George S. Almasi

1 Nov 1985

TL;DR: The paper gives an overview of the promises and accomplishments of parallel processing, as well as the problems and work that remain, and argues that the field is at an interesting juncture.

Abstract: We are on the threshold of a new era in computer architecture. It is becoming increasingly difficult to obtain more performance from the time-honored von Neumann model, and many of the technological constraints that influenced its design over thirty years ago have changed drastically. Many of the arguments for processing a single instruction at a time no longer apply, and a number of enthusiastic parallel processing projects are working on various ways to allow many processors to work on a single problem at the same time. However, this re-opens a Pandora's box of questions about how computation should be done, and some of the strengths of the von Neumann model which temporarily closed this box three decades ago become especially apparent when one tries to replace it. This overview treats the promises and accomplishments of parallel processing as well as the problems and work that remain. The paper is organized as follows: Current driving forces for parallel processing; Definitions and fundamental questions; Survey of projects; Emerging answers. As will be shown, the field is at an interesting juncture. Much work has been done, and the ideas are now there for putting it all together. But some large experiments are needed to provide real results from real programs if the pace of progress is to be maintained.

27 citations

Journal Article • 10.1016/0167-8191(85)90029-8

Vector coding the finite-volume procedure for the CYBER 205

Arthur Rizzi

1 Dec 1985

TL;DR: A method for the large-scale numerical simulation of fluid flow and fundamental principles of vector programming in FORTRAN are discussed in order to set the stage for the main topic, the vector coding and execution of the finite-volume procedure on the CYBER 205.

Abstract: The paper reviews a method for the large-scale numerical simulation of fluid flow and discusses fundamental principles of vector programming in FORTRAN in order to set the stage for the main topic, the vector coding and execution of the finite-volume procedure on the CYBER 205. With the proper structure given to the data by the grid transformation, each coordinate direction can be differenced throughout the entire grid in one vector operation. Boundary conditions must be interleaved, which tends to inhibit the concurrency of the overall scheme, but a strategy of no data motion together with only inner-loop vectorization is judged to be the best compromise. The computed example of transonic vortex flow separating from the sharp leading edge of a delta wing demonstrates the processing performance of the procedure. Vectors over 40000 elements long are obtained, and a rate of over 125 megaflops sustained over the entire computation indicates the high degree of vectorization achieved.

18 citations

Journal Article • 10.1016/0167-8191(85)90031-6

CEPROL: a cellular programming language

Friedhelm Seutter¹

Braunschweig University of Technology¹

1 Dec 1985

TL;DR: A cellular programming language, CEPROL, is presented which offers means for programming and controlling cellular automata that process systolic algorithms.

Abstract: Physically realized cellular automata may be operated by universal computer systems as programmable special-purpose processors for parallelizable problems. Because of their architecture (local neighbourhood, small storage size per cell), they are well suited for processing systolic algorithms. A cellular programming language, named CEPROL, is presented which offers means for programming and controlling cellular automata that process such algorithms.
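
The execution model that CEPROL targets, in which every cell updates synchronously from its local neighbourhood, can be illustrated with a one-dimensional cellular automaton. This Python sketch is not CEPROL syntax; the elementary-rule encoding (Wolfram rule numbers), the periodic boundary, and the function name are assumptions chosen for brevity.

```python
def ca_step(cells, rule):
    """One synchronous step of an elementary (radius-1, two-state) cellular automaton.

    Every new state is computed from the old states only, which is what allows
    all cells to update in parallel. The boundary wraps around (periodic)."""
    n = len(cells)
    new = []
    for i in range(n):
        left, centre, right = cells[i - 1], cells[i], cells[(i + 1) % n]
        index = (left << 2) | (centre << 1) | right  # neighbourhood as a 3-bit number
        new.append((rule >> index) & 1)              # look up that bit of the rule number
    return new


# Rule 90 (new state = left XOR right) grown from a single live cell.
row = [0, 0, 0, 1, 0, 0, 0]
for _ in range(3):
    print("".join(".#"[c] for c in row))  # ...#...  then  ..#.#..  then  .#...#.
    row = ca_step(row, 90)
```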

15 citations

Journal Article • 10.1016/0167-8191(85)90007-9

Synchronization and control of parallel algorithms

Paul O. Frederickson¹, Rondall E. Jones², Brian T. Smith³

Los Alamos National Laboratory¹, Sandia National Laboratories², Argonne National Laboratory³

1 Nov 1985

TL;DR: A modest collection of primitives for synchronization and control in parallel numerical algorithms is proposed, phrased in a syntax that is compatible with FORTRAN, creating a publication language for parallel software.

Abstract: We propose a modest collection of primitives for synchronization and control in parallel numerical algorithms. These are phrased in a syntax that is compatible with FORTRAN, creating a publication language for parallel software. A preprocessor may be used to map code written in this extended FORTRAN into standard FORTRAN with calls to the run-time libraries of the various parallel systems now in use. We solicit the reader's comments on the clarity, as well as the adequacy, of the primitives we have proposed.
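
The paper's primitives are FORTRAN-syntax extensions whose names are not listed here, so the following is only an analogy: a minimal Python sketch of generic synchronization and control operations (forking workers, a barrier between two phases of work, joining), built on the standard threading.Barrier. The function name and the two-phase sum are hypothetical.

```python
import threading


def two_phase_sum(data, num_workers=4):
    """Each worker sums its own slice, waits at a barrier, then reads every partial sum."""
    partial = [0] * num_workers
    totals = [None] * num_workers
    barrier = threading.Barrier(num_workers)

    def worker(w):
        partial[w] = sum(data[w::num_workers])  # phase 1: independent work on own slice
        barrier.wait()                          # no worker continues until all partials exist
        totals[w] = sum(partial)                # phase 2: safe to read other workers' results

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(num_workers)]  # fork
    for t in threads:
        t.start()
    for t in threads:
        t.join()                                # join: wait for every worker to finish
    return totals
```

Without the barrier, a fast worker could read partial sums that other workers have not written yet; the barrier is what makes phase 2 correct.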

15 citations

Journal Article • 10.1016/0167-8191(85)90017-1

Parallel, iterative solution of sparse linear systems: Models and architectures

Daniel A. Reed¹, Merrell L. Patrick²

University of Illinois at Urbana–Champaign¹, Duke University²

1 Mar 1985

TL;DR: A model of a general class of asynchronous, iterative solution methods for linear systems is developed and a data transfer model predicting both the probability that data must be transferred between two tasks and the amount of data to be transferred is presented.

Abstract: A model of a general class of asynchronous, iterative solution methods for linear systems is developed. In the model, the system is solved by creating several cooperating tasks that each compute a portion of the solution vector. A data transfer model predicting both the probability that data must be transferred between two tasks and the amount of data to be transferred is presented. This model is used to derive an execution time model for predicting parallel execution time and an optimal number of tasks, given the dimension and sparsity of the coefficient matrix and the costs of computation, synchronization, and communication. The suitability of different parallel architectures for solving randomly sparse linear systems is discussed. In view of the complexity of task scheduling, one parallel architecture, built around a broadcast bus, is presented and analyzed.

15 citations

Journal Article • 10.1016/0167-8191(85)90022-5

FACOM VP-100/200: Supercomputers with ease of use

Hiroshi Tamura¹, Sachio Kamiya¹, Takahiro Ishigai¹

Fujitsu¹

1 Jun 1985

TL;DR: The key components that provide ease of use and higher performance are described, together with actual performance measured on the VP system.

Abstract: FUJITSU has developed the FACOM VP-100/200, pipelined supercomputers built with the latest technology and a new architecture. Based on extensive analyses of application programs, the following advanced features are employed in the VP system: (1) dynamically reconfigurable vector registers with large capacity; (2) efficient vector operations for vectorizing IF statements in DO loops; (3) a high level of concurrency for parallel scalar-vector and vector-vector operations; (4) a powerful vectorizing compiler for utilizing the advanced features; (5) effective tuning tools to extract higher performance from application programs; and (6) good affinity with general-purpose computer systems. The final goals of the development of the VP system are both ease of use and higher performance for various scientific and engineering applications, and these goals have been achieved successfully. This paper describes the key components that establish the ease of use and higher performance, and also describes actual performance on the VP system.

14 citations

Journal Article • 10.1016/0167-8191(85)90026-2

Parallel algorithms for tree traversals

N.C. Kalra¹, P.C.P. Bhatt¹

Indian Institute of Technology Delhi¹

1 Jun 1985

TL;DR: This paper establishes a one-to-one correspondence between the set of nodes that possess a right sibling and the set of leaf nodes in any forest, and uses it to obtain a parallel pre-order traversal.

Abstract: Three commonly used traversal methods for binary trees (forests) are pre-order, in-order and post-order. It is well known that sequential algorithms for these traversals take O(N) time, where N is the total number of nodes. This paper establishes a one-to-one correspondence between the set of nodes that possess a right sibling and the set of leaf nodes for any forest. For the case of pre-order traversal, this result is shown to provide an alternate characterization that leads to a simple and elegant parallel algorithm of time complexity O(log N), with or without read conflicts, on an N-processor SIMD shared-memory model, where N is the total number of nodes in a forest.
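
The paper's sibling/leaf correspondence is not reproduced here. For comparison, this Python sketch computes pre-order numbers with a different, standard PRAM technique (the Euler tour technique with pointer jumping, in the style of Tarjan and Vishkin), which also needs O(log N) parallel rounds. It assumes a single rooted tree with node 0 as the root and children listed left to right; each synchronous round is simulated sequentially, where a PRAM would assign one processor per tour arc. The function name is hypothetical.

```python
def preorder_numbers(children):
    """Pre-order number of every node; children[v] lists v's children left to right, root is 0."""
    n = len(children)
    parent = [-1] * n
    next_sibling = [-1] * n
    for v, kids in enumerate(children):
        for k, c in enumerate(kids):
            parent[c] = v
            if k + 1 < len(kids):
                next_sibling[c] = kids[k + 1]

    # Euler tour: each tree edge yields a downward arc (v, c) and an upward arc (c, v).
    arcs, index = [], {}
    for v, kids in enumerate(children):
        for c in kids:
            for arc in ((v, c), (c, v)):
                index[arc] = len(arcs)
                arcs.append(arc)

    # Successor of each arc along the tour; downward arcs weigh 1, upward arcs 0.
    m = len(arcs)
    succ, weight = [-1] * m, [0] * m
    for k, (u, v) in enumerate(arcs):
        if parent[v] == u:                    # downward arc u -> v
            weight[k] = 1
            succ[k] = index[(v, children[v][0])] if children[v] else index[(v, u)]
        elif next_sibling[u] != -1:           # upward arc u -> v, then visit u's next sibling
            succ[k] = index[(v, next_sibling[u])]
        elif parent[v] != -1:                 # u was the last child: keep climbing
            succ[k] = index[(v, parent[v])]
        # otherwise this is the final arc of the tour and succ stays -1

    # Pointer jumping: each arc learns the weight sum from itself to the end of the tour.
    # Both comprehensions read only the previous round's arrays, like synchronous PRAM steps.
    rank = weight[:]
    while any(s != -1 for s in succ):
        rank = [rank[k] + rank[succ[k]] if succ[k] != -1 else rank[k] for k in range(m)]
        succ = [succ[succ[k]] if succ[k] != -1 else -1 for k in range(m)]

    # Downward arcs up to and including (parent, v) = (N - 1) - suffix + 1 = N - suffix.
    pre = [0] * n
    for v in range(n):
        if parent[v] != -1:
            pre[v] = n - rank[index[(parent[v], v)]]
    return pre
```

For children = [[1, 4], [2, 3], [], [], [5], []] the sketch numbers the nodes 0 to 5 in the order a sequential depth-first pre-order walk would visit them.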

Journal Article • 10.1016/0167-8191(85)90005-5

Concurrency in programming languages: A survey

Carlo Ghezzi¹

Polytechnic University of Milan¹

1 Nov 1985

TL;DR: Concurrency issues in programming languages are surveyed, including a review of synchronization and communication mechanisms (semaphores, messages and mailboxes, and monitors), and the concurrency aspects of ADA are presented as a case study of a state-of-the-art programming language.

Abstract: This paper surveys concurrency issues of programming languages. The evolution of these issues is analyzed in the context of the evolution of other language concepts, such as data and control abstraction. Specific concurrency concepts discussed in the paper include: granularity of parallelism, degree of parallelism, synchronization and communication, and physical distribution. The review of the problems of synchronization and communication includes semaphores, messages and mailboxes, and monitors. Concurrency aspects of ADA are also presented as a case study of a state-of-the-art programming language.
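
Of the synchronization mechanisms the survey reviews, the monitor is perhaps the easiest to show in a few lines. This is a minimal Python sketch of a monitor-style bounded buffer built from threading.Lock and threading.Condition; the class, method, and demo names are hypothetical. Python's condition variables follow Mesa-style semantics (a woken thread re-checks its condition in a loop), which is one of several monitor variants.

```python
import threading
from collections import deque


class BoundedBuffer:
    """One lock guards all state (the monitor); two conditions signal 'not full' and 'not empty'."""

    def __init__(self, capacity):
        self._items = deque()
        self._capacity = capacity
        lock = threading.Lock()
        self._not_full = threading.Condition(lock)
        self._not_empty = threading.Condition(lock)

    def put(self, item):
        with self._not_full:                      # enter the monitor
            while len(self._items) >= self._capacity:
                self._not_full.wait()             # releases the lock while waiting
            self._items.append(item)
            self._not_empty.notify()

    def get(self):
        with self._not_empty:
            while not self._items:
                self._not_empty.wait()
            item = self._items.popleft()
            self._not_full.notify()
            return item


def demo(n_items=100, capacity=4):
    """One producer and one consumer passing n_items through a small buffer."""
    buf = BoundedBuffer(capacity)
    received = []

    def produce():
        for i in range(n_items):
            buf.put(i)

    def consume():
        for _ in range(n_items):
            received.append(buf.get())

    threads = [threading.Thread(target=produce), threading.Thread(target=consume)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received
```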

Journal Article • 10.1016/0167-8191(85)90030-4

Two parallel algorithms for the convex hull problem in a two dimensional space

David J. Evans¹, Shao-Wen Mai¹

Loughborough University¹

1 Dec 1985

TL;DR: Two parallel algorithms for determining the convex hull of a set of data points in two-dimensional space are presented, and experimental results on a MIMD parallel system of 4 processors are analysed and presented.

Abstract: Two parallel algorithms for determining the convex hull of a set of data points in two-dimensional space are presented. Both are suitable for MIMD parallel systems. The first is based on a divide-and-conquer strategy, in which small convex hulls are generated first and the final convex hull of all points is then obtained by repeatedly merging pairs of sub-hulls. The second algorithm proceeds by picking out the points that must lie on the convex hull and discarding the points that definitely do not. Experimental results on a MIMD parallel system of 4 processors are analysed and presented.
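
A minimal Python sketch of the divide-and-conquer strategy: the points are dealt into chunks, each chunk's hull is computed independently, and sub-hulls are merged pairwise, relying on the fact that the hull of a union equals the hull of the union of the sub-hulls. The per-chunk routine (Andrew's monotone chain) and the naive merge, which simply recomputes a hull over both vertex sets, are assumptions rather than the authors' procedures. CPython threads only show the structure here; a ProcessPoolExecutor would be needed for a real speedup on CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def hull(points):
    """Andrew's monotone chain: hull vertices counter-clockwise, collinear points dropped."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def parallel_hull(points, workers=4):
    """Hull each chunk concurrently, then merge sub-hulls pairwise until one remains."""
    chunks = [points[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        hulls = list(pool.map(hull, chunks))
        while len(hulls) > 1:
            pairs = [hulls[i] + (hulls[i + 1] if i + 1 < len(hulls) else [])
                     for i in range(0, len(hulls), 2)]
            hulls = list(pool.map(hull, pairs))
    return hulls[0]
```

Every extreme point of the whole set is also extreme within its own chunk, so it survives each merge; that is why the pairwise merge loses nothing.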

Journal Article • 10.1016/0167-8191(85)90023-7

Task granularity studies on a many-processor CRAY X-MP

Donald A. Calahan¹

University of Michigan¹

1 Jun 1985

TL;DR: A hybrid granularity model is proposed for general concurrent solution and relevance to a many-processor CRAY X-MP is demonstrated by simulation.

Abstract: A hybrid granularity model is proposed for general concurrent solution. It is applied to the triangular factorization of a dense matrix ranging in size from 4 to 1024. Concurrency is achieved at two levels: (1) with small (micro) task granularity and (2) with large (blocked) task granularity. Relevance to a many-processor CRAY X-MP is demonstrated by simulation.

Journal Article • 10.1016/0167-8191(85)90020-1

The efficient use of vector computers with emphasis to computational fluid dynamics

Willi Schönauer¹, Wolfgang Gentzsch

Karlsruhe Institute of Technology¹

1 Mar 1985

Journal Article • 10.1016/0167-8191(85)90006-7

Function-based computing and parallelism: A review

Peter M. Kogge¹

IBM¹

1 Nov 1985

TL;DR: An alternative approach, based on function-based computing, is reviewed that to a large degree eliminates or avoids much of the Von Neumann bottleneck, and offers opportunities for the exploitation of parallelism in ways not even conceivable in classical computing.

Abstract: One of today's most popular computing folk theorems states that true parallel processing and conventional computing techniques are mutually incompatible. The term Von Neumann bottleneck summarizes what many feel are the basic stumbling blocks preventing the successful application of parallelism in day-to-day computing. This paper reviews an alternative approach, based on function-based computing, that to a large degree eliminates or avoids much of the Von Neumann bottleneck, and offers opportunities for the exploitation of parallelism in ways not even conceivable in classical computing. Topics covered include a review of the Von Neumann bottleneck and imperative languages, the mathematical foundation of functional computing (namely the lambda calculus), how this foundation provides opportunities for parallelism, and characteristics of the design space for implementation of these concepts in real computing hardware.

Journal Article • 10.1016/0167-8191(85)90025-0

A technique for achieving portability among multiprocessors: Implementation on the Lemur

J. A. Clausing¹, Ray Hagstrom¹, Ewing Lusk¹, Ross Overbeek¹

Argonne National Laboratory¹

1 Jun 1985

TL;DR: A programming methodology for multiprocessors is described that leads to well-structured code, ease of debugging, and, most important, portability among multiprocessors offering different synchronization primitives.

Abstract: We describe here a programming methodology for multiprocessors that leads to well-structured code, ease of debugging, and, most important, portability among multiprocessors offering quite different synchronization primitives. The emphasis in this paper is on the implementation of this methodology for the Lemur, an eight-processor machine built at Argonne National Laboratory. Included are several complete programs illustrating the methodology.

Journal Article • 10.1016/0167-8191(85)90015-8

Dynamic computer structures for manifold utilization

Wolfgang Händler¹

University of Erlangen-Nuremberg¹

1 Mar 1985

TL;DR: It seems possible to create a Standard Processor (STP) that unifies the many different operation modes in computing; its performance averaged over all modes would be higher than could be achieved, e.g., by transposing a typical APP problem onto a conventional GPP processor.

Abstract: The question is raised whether the flexibility of computer structures, which has proved a fruitful concept in computer history, can be extended to a selectable utilization of different operation modes such as General Purpose Processor (GPP), High Level Language Processor (HLL), Reduction Automaton (RED), Data Flow Processor (FLO), Associative Parallel Processor (APP), Cellular Automaton (CEL), and, e.g., Digital Differential Analyser (DDA). It is argued that all these principles (each of which has a certain merit) are not incompatible in principle. Instead, it seems possible to create a Standard Processor (STP) that unifies the many different operation modes. These modes are made selectable by the programmer. The resulting performance will not be the highest possible with respect to any one specific operation mode. Nevertheless, the performance averaged over all modes will be higher than could be achieved, e.g., by transposing a typical APP problem onto a conventional GPP processor (or transposing in the reverse direction). The STP is not designed in detail; the paper is intended rather as a stimulus to investigate a universal hardware set of registers, control, and logic circuits that admits quite different interpretation modes in computing.

Journal Article • 10.1016/0167-8191(85)90010-9

Control-driven, data-driven and demand-driven computer architecture

Philip Treleaven

1 Nov 1985

Journal Article • 10.1016/0167-8191(85)90034-1

Taxonomy of parallel processing and definitions

Swamy Kutti¹

Deakin University¹

1 Dec 1985

TL;DR: A simple taxonomy for the interconnected computer systems is presented by using the address space or buffer type as the key identifying element to distinguish the major difference between multicomputer and multiprocessor systems.

Abstract: This paper presents a simple taxonomy for interconnected computer systems, using the address space or buffer type as the key identifying element. The main aim of this classification is to identify the major difference between multicomputer and multiprocessor systems and to derive definitions for both.

Journal Article • 10.1016/0167-8191(85)90004-3

Overview of parallel processing research in Japan

Ryutarou Ohbuchi¹

IBM¹

1 Nov 1985

TL;DR: An overview of Japanese research and development efforts in parallel processing architectures is given, with general trends and examples of research projects for application domains such as artificial intelligence, numerical processing, and others including database, image processing, and graphics.

Abstract: This paper gives an overview of Japanese research and development efforts in parallel processing architectures. Projects are categorized by their application domains. Following an introduction, general trends are described and examples of research projects are presented for each application domain: artificial intelligence, numerical processing, and others such as database, image processing, and graphics.

Journal Article • 10.1016/0167-8191(85)90009-2

Parallel processing in artificial intelligence

Scott E. Fahlman

1 Nov 1985

TL;DR: The parallel approaches to AI are divided into three broad categories, though the boundaries between them are often fuzzy: the general programming approach, applications of parallelism to the processing of specialized programming languages, and massively parallel active memory systems.

Abstract: Intelligence, whether in a machine or in a living creature, is a mixture of many abilities. Our current artificial intelligence (AI) technology does a good job of emulating some aspects of human intelligence, generally those things that, when they are done by people, seem to be serial and conscious. AI is very far from being able to match other human abilities, generally those things that seem to happen “in a flash” and without any feeling of sustained mental effort. We are left with an unbalanced technology that is powerful enough to be of real commercial value, but that is very far from exhibiting intelligence in any broad, human-like sense of the word. It is ironic that AI’s successes have come in emulating the specialized performance of human experts, and yet we cannot begin to approach the common sense of a five-year-old child or the sensory abilities and physical coordination of a rat.

Journal Article • 10.1016/0167-8191(85)90008-0

Comparison of five multiprocessor systems

Daniel D. Gajski¹, Jih-Kwon Peir¹

University of Illinois at Urbana–Champaign¹

1 Nov 1985

TL;DR: This paper generalizes the traditional dataflow model of computation and defines the essential problems in multiprocessing: control implementation, program partitioning, scheduling, synchronization, and memory access.

Abstract: This paper generalizes the traditional dataflow model of computation and defines the essential problems in multiprocessing: control implementation, program partitioning, scheduling, synchronization, and memory access. The paper assumes that these essential problems are axes of a multiprocessor design space and that the solutions to these problems are values on the axes. Each point in the space represents a multiprocessor including a computational paradigm that a user must follow to achieve high performance and efficiency on the particular machine. Thus, a classification of machines from the user's point of view is introduced naturally. Five well-known multiprocessors are compared using this classification scheme.

Journal Article • 10.1016/0167-8191(85)90027-4

Parallel algorithms for rounding exact evaluation of sums of products

Wilhelm Oberaigner

1 Jun 1985

TL;DR: The first method transforms products to sums and applies one of the known methods for rounding exact summation, in time complexity O(n^2) with n processors (n denoting the “length” of the expression).

Abstract: We propose two parallel algorithms for the rounding exact evaluation of sums of products. The first method transforms products to sums and applies one of the known methods for rounding exact summation in time complexity O(n^2) with n processors (n denoting the “length” of the expression). The second method approximates the products as well as the sum; it has average time complexity O(ld(n)) with n/2 processors (ld denoting the binary logarithm) and average time complexity O(n) viewed as a sequential algorithm.
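
The first method's idea, turning each product into an exact sum of floating-point numbers and then summing everything with a single final rounding, can be sketched sequentially in Python. The sketch assumes IEEE 754 double precision with no overflow or underflow, uses Dekker's TwoProduct (with Veltkamp splitting) as the product-to-sum transformation, and uses math.fsum, which avoids intermediate rounding by tracking exact partial sums, in place of the paper's parallel rounding exact summation. All function names are hypothetical.

```python
import math
from fractions import Fraction

SPLITTER = 2.0 ** 27 + 1.0  # Veltkamp constant for 53-bit doubles


def split(a):
    """Split a into hi + lo, each fitting in 26 significant bits (assumes no overflow)."""
    c = SPLITTER * a
    hi = c - (c - a)
    return hi, a - hi


def two_product(a, b):
    """Dekker's TwoProduct: p = fl(a*b) and p + err == a*b exactly (no over/underflow)."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    err = ((ah * bh - p) + ah * bl + al * bh) + al * bl
    return p, err


def rounded_sum_of_products(xs, ys):
    """Replace every product by two floats, then sum all of them with one final rounding."""
    terms = []
    for x, y in zip(xs, ys):
        terms.extend(two_product(x, y))
    return math.fsum(terms)


def exact_reference(xs, ys):
    """Slow reference: exact rational arithmetic, rounded once at the end."""
    return float(sum(Fraction(x) * Fraction(y) for x, y in zip(xs, ys)))
```

A naive loop would return 0.0 for the products 1e16*1, 1*1 and -1e16*1, because 1e16 + 1 rounds back to 1e16; the sketch returns the exact answer 1.0.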

Journal Article • 10.1016/0167-8191(85)90018-3

An algorithm for inverse square-roots

J. J. Modi¹, John S. Rollett

University of Cambridge¹

1 Mar 1985

TL;DR: The algorithm is designed to be particularly suited for parallel computation, in which floating-point multiplication, floating-point addition and fixed-point arithmetic can be performed simultaneously.

Abstract: An algorithm is presented for finding x^(-1/2), given x. The algorithm is designed to be particularly suited for parallel computation, in which floating-point multiplication, floating-point addition and fixed-point arithmetic can be performed simultaneously.
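
Modi and Rollett's algorithm is not reproduced here. This Python sketch instead uses the standard Newton-Raphson iteration for the inverse square root, which has a similar division-free flavour: exponent manipulation (the fixed-point part) is independent of the multiply-add refinement (the floating-point part). The linear starting guess and the default of 6 iterations are assumptions, chosen so that the error falls to rounding level in double precision.

```python
import math


def inv_sqrt(x, iterations=6):
    """Approximate x ** -0.5 for finite x > 0 via Newton's iteration y <- y * (3 - m*y*y) / 2."""
    if not (x > 0 and math.isfinite(x)):
        raise ValueError("x must be positive and finite")
    # Fixed-point part: write x = m * 2**e and make e even so that 2**(-e/2) is exact.
    m, e = math.frexp(x)          # 0.5 <= m < 1
    if e % 2:
        m, e = 2.0 * m, e - 1     # now 0.5 <= m < 2 and e is even
    # Floating-point part: a linear first guess for m ** -0.5, refined by Newton steps
    # that use only multiplications and additions (no division, no square root).
    y = 1.5 - 0.5 * m
    for _ in range(iterations):
        y = y * (1.5 - 0.5 * m * y * y)
    return math.ldexp(y, -(e // 2))
```

On [0.5, 2) the linear guess keeps the relative error 1 - m*y*y between 0 and 0.5, and each Newton step maps an error ε to (3ε² + ε³)/4, so six steps are ample.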

Journal Article • 10.1016/0167-8191(85)90032-8

Physical data representation in a multiprocessor database machine

Jørgen Staunstrup¹, Jens Ove Jespersen¹, Ole V. Johansen¹

Aarhus University¹

1 Dec 1985

TL;DR: By choosing a particular representation, the grid file, and analyzing its behaviour, the authors point out the difficulties encountered in trying to achieve speed improvements from a multiprocessor.

Abstract: By using a multiprocessor to implement the lowest level of a relational database we want to achieve fast execution of database operations such as join, find, and update. But the potential speed improvements provided by a multiprocessor can only be achieved if one can construct algorithms and corresponding physical data representations that can utilize the potential. By choosing a particular representation, the grid file, and analyzing its behaviour, we want to point out the difficulties encountered in trying to achieve speed improvements from a multiprocessor.

Journal Article • 10.1016/0167-8191(85)90019-5

A method for SIMD/MIMD functionally reconfigurable multimicroprocessor systems design and parallel data exchange algorithms

Nikola Kasabov

1 Mar 1985

TL;DR: The problems of designing SIMD/MIMD functionally reconfigurable multimicroprocessor systems (MMPSs) are discussed, as well as some realisations of a data exchange module as a register module and some algorithms for parallel data exchange between the microprocessor modules (MPMs).

Abstract: In SIMD/MIMD functionally reconfigurable multimicroprocessor systems (MMPS), some of the microprocessor modules (MPM) can execute a common program (SIMD mode) while the rest of the MPMs execute their own programs (MIMD mode). At any moment, every MPM can be functionally reconfigured from one mode to the other. In this paper the problems of designing such MMPSs are discussed, as well as some realisations of a data exchange module as a register module and some algorithms for parallel data exchange between the MPMs. A hierarchically structured MMPS is developed.

Journal Article • 10.1016/0167-8191(85)90033-X

The transformation of collections of communicating sequential processes that represent pipeline configurations

Shirley Williams¹

University of Reading¹

1 Dec 1985

TL;DR: This paper proposes a method to merge two communicating sequential processes (that would be adjacent in the pipeline) into one communicating sequential process by matching the output expressions of the first communicating sequential process with the appropriate input expressions of the second.

Abstract: The segments of a pipelined process can be represented as communicating sequential processes. The communication between the segments of the pipeline is represented as channel communication between the communicating sequential processes. It is possible to merge two communicating sequential processes (that would be adjacent in the pipeline) into one communicating sequential process. This is done by matching the output expressions of the first communicating sequential process (e.g. ch!expr) with the appropriate input expressions of the second communicating sequential process (e.g. ch?var) and replacing each pair by a single assignment statement (var = expr).
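
The merging transformation can be sketched in Python under stated assumptions: a bounded queue.Queue(maxsize=1) stands in for a CSP channel (real CSP channels are unbuffered rendezvous), threads stand in for processes, None is an assumed end-of-stream marker, and the two stages (squaring, then adding one) are hypothetical. The merged version shows the output ch!expr of the first process and the matching input ch?var of the second replaced by the single assignment var = expr.

```python
import queue
import threading


def producer(xs, ch):
    for x in xs:
        ch.put(x * x)          # ch!expr, with expr = x * x
    ch.put(None)               # assumed end-of-stream marker


def consumer(ch, out):
    while True:
        var = ch.get()         # ch?var
        if var is None:
            break
        out.append(var + 1)


def pipeline(xs):
    """Two communicating processes joined by a channel."""
    ch = queue.Queue(maxsize=1)
    out = []
    procs = [threading.Thread(target=producer, args=(xs, ch)),
             threading.Thread(target=consumer, args=(ch, out))]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return out


def merged(xs):
    """The same computation after merging: each ch!expr / ch?var pair becomes var = expr."""
    out = []
    for x in xs:
        var = x * x
        out.append(var + 1)
    return out
```

The merged form needs no channel or second thread, which is the saving the transformation aims at when adjacent pipeline segments run on the same processor.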
