Top 251 papers presented at Parallel Computing in 1994

Journal Article•10.1016/0167-8191(94)90028-0•

Monitors, messages, and clusters: the p4 parallel programming system

Ralph Butler¹, Ewing Lusk²•Institutions (2)

University of North Florida¹, Argonne National Laboratory²

1 Apr 1994

TL;DR: The design goals, history, and system architecture of p4 are discussed and a diverse collection of applications that have demonstrated the utility of the p4 system are described.

Abstract: p4 is a portable library of C and Fortran subroutines for programming parallel computers. It is the current version of a system that has been in use since 1984. It includes features for explicit parallel programming of shared-memory machines, distributed-memory machines (including heterogeneous networks of workstations), and clusters, by which we mean shared-memory multiprocessors communicating via message passing. We discuss here the design goals, history, and system architecture of p4 and describe briefly a diverse collection of applications that have demonstrated the utility of p4.

217 citations

Journal Article•10.1016/0167-8191(94)90033-7•

The design of a standard message passing interface for distributed memory concurrent computers

David W. Walker¹•Institutions (1)

Oak Ridge National Laboratory¹

1 Apr 1994

TL;DR: An overview of MPI, a proposed standard message passing interface for MIMD distributed memory concurrent computers, which includes point-to-point and collective communication routines, as well as support for process groups, communication contexts, and application topologies is presented.

Abstract: This paper presents an overview of MPI, a proposed standard message passing interface for MIMD distributed memory concurrent computers. The design of MPI has been a collective effort involving researchers in the United States and Europe from many organizations and institutions. MPI includes point-to-point and collective communication routines, as well as support for process groups, communication contexts, and application topologies. While making use of new ideas where appropriate, the MPI standard is based largely on current practice.

145 citations

Proceedings Article•

Parallel computing (2nd ed.): theory and practice

Michael J. Quinn¹•Institutions (1)

Oregon State University¹

2 Jan 1994

135 citations

Journal Article•10.1016/0167-8191(94)90080-9•

Message-passing multi-cell molecular dynamics on the Connection Machine 5

D. M. Beazley¹, Peter S. Lomdahl¹•Institutions (1)

Los Alamos National Laboratory¹

1 Feb 1994

TL;DR: A scalable message-passing multi-cell algorithm is proposed for short-range molecular dynamics simulations on distributed memory MIMD multicomputers.

Abstract: We present a new scalable algorithm for short-range molecular dynamics simulations on distributed memory MIMD multicomputers based on a message-passing multi-cell approach. We have implemented the algorithm on the Connection Machine 5 (CM-5) and demonstrate that meso-scale molecular dynamics with more than 10⁸ particles is now possible on massively parallel MIMD computers. Typical runs show single particle update-times of 0.15 μs in 2 dimensions (2D) and approximately 1 μs in 3 dimensions (3D) on a 1024 node CM-5 without vector units, corresponding to more than 1.8 Gflops overall performance. We also present a scaling equation which agrees well with actually observed timings.
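
The multi-cell idea for short-range forces can be illustrated with a minimal single-process sketch: particles are binned into cells at least as wide as the interaction cutoff, so each particle only needs to be compared against its own and adjacent cells. This is not the authors' CM-5 code and contains no message passing; the names `dist2`, `build_cells`, and `neighbor_pairs`, the 2D periodic box, and all parameters are assumptions chosen for illustration.

```python
from itertools import product

def dist2(p, q, box):
    """Squared distance under the minimum-image convention for a periodic box."""
    total = 0.0
    for a, b in zip(p, q):
        d = a - b
        d -= box * round(d / box)
        total += d * d
    return total

def build_cells(positions, box, cutoff):
    """Bin 2D particle indices into square cells whose side is at least the cutoff."""
    ncell = max(1, int(box // cutoff))
    size = box / ncell
    cells = {}
    for idx, (x, y) in enumerate(positions):
        key = (int(x / size) % ncell, int(y / size) % ncell)
        cells.setdefault(key, []).append(idx)
    return cells, ncell

def neighbor_pairs(positions, box, cutoff):
    """Return all index pairs (i, j), i < j, closer than the cutoff."""
    cells, ncell = build_cells(positions, box, cutoff)
    pairs = set()
    for (cx, cy), members in cells.items():
        # Only the cell itself and its 8 periodic neighbors can hold partners.
        for dx, dy in product((-1, 0, 1), repeat=2):
            other = cells.get(((cx + dx) % ncell, (cy + dy) % ncell), [])
            for i in members:
                for j in other:
                    if i < j and dist2(positions[i], positions[j], box) <= cutoff ** 2:
                        pairs.add((i, j))
    return pairs

# Two particles interact across the periodic boundary; the third is isolated.
print(neighbor_pairs([(0.1, 5.0), (9.9, 5.0), (5.0, 5.0)], box=10.0, cutoff=1.0))
# prints {(0, 1)}
```

In a distributed-memory setting, one would presumably assign blocks of cells to processors and exchange only boundary cells, which is where message passing would enter.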

110 citations

Journal Article•10.1016/0167-8191(94)90021-3•

An overview of message passing environments

Oliver A. McBryan¹•Institutions (1)

University of Colorado Boulder¹

1 Apr 1994

TL;DR: An introduction to MPP systems in general is provided, and the development of ‘portability platforms’, message passing systems devised solely to allow portability of message passing programs between different systems, is reviewed.

Abstract: A majority of the MPP systems designed to date have been MIMD distributed memory systems. For almost all of these systems, message passing environments have provided the primary mechanism for programming multiprocessor applications. In this paper we provide an introduction to MPP systems in general. We then introduce current MPP message passing interfaces, by tracing their historical development over the last 10 years. In addition to their use within a single MPP architecture, we discuss the use of message passing systems to interconnect more loosely coupled processors in heterogeneous environments. Finally we review the development of ‘portability platforms’ — message passing systems that have been devised solely to allow portability of message passing programs between different systems.

109 citations

Proceedings Article•10.1109/MPCS.1994.367018•

Massively parallel computing systems with real time constraints: the "Algorithm Architecture Adequation" methodology

Y. Sorel

2 May 1994

TL;DR: With this methodology, the application algorithm as well as the MPCS are specified with graphs; the implementation of an algorithm on an MPCS with respect to real-time constraints may then be formalized in terms of graph transformations, which allows one to optimize the real-time performance of the implementation taking into account critical inter-processor communications.

Abstract: Massively Parallel Computing Systems (MPCS) provide high performance computing generally used to accelerate numerical computation applications. We present a methodology called "Algorithm Architecture Adequation" used to take advantage of the computation power of these systems in the case of real-time applications. With this methodology, the application algorithm as well as the MPCS are specified with graphs; the implementation of an algorithm on an MPCS with respect to real-time constraints may then be formalized in terms of graph transformations. This allows one to optimize the real-time performance of the implementation taking into account critical inter-processor communications. As a result, real-time distributed executives are produced automatically without deadlock and with minimum overhead. This drastically reduces the development cycle for real-time applications running on MPCS.

88 citations

Journal Article•10.1016/0167-8191(94)90004-3•

Scalable iterative solution of sparse linear systems

Mark T. Jones¹, Paul E. Plassmann¹•Institutions (1)

Argonne National Laboratory¹

1 May 1994

TL;DR: It is found that the increase in parallelism afforded by the coloring-based orderings more than offsets any increase in the number of iterations required for the convergence of the conjugate gradient algorithm.

Abstract: The efficiency of a parallel implementation of the conjugate gradient method preconditioned by an incomplete Cholesky factorization can vary dramatically depending on the column ordering chosen. One method to minimize the number of major parallel steps is to choose an ordering based on a coloring of the symmetric graph representing the nonzero adjacency structure of the matrix. In this paper, we compare the performance of the preconditioned conjugate gradient method using these coloring orderings with a number of standard orderings on matrices arising from finite element models. Because optimal colorings for these systems may not be known a priori, we employ a graph coloring heuristic to obtain consistent colorings. Based on lower bounds obtained from the local structure of these systems, we find that the colorings determined by the heuristic are nearly optimal. For these problems, we find that the increase in parallelism afforded by the coloring-based orderings more than offsets any increase in the number of iterations required for the convergence of the conjugate gradient algorithm. We also demonstrate that the performance of this parallel preconditioner is scalable. We give results from the Intel iPSC/860 to support our claims.
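
A coloring-based ordering can be illustrated with a minimal sketch of a greedy graph coloring. The abstract does not say which heuristic the authors use, so the largest-degree-first greedy rule below, and the names `greedy_coloring` and `color_classes`, are assumptions for illustration only. Vertices that share a color are mutually non-adjacent, so the corresponding unknowns can be processed in the same parallel step.

```python
def greedy_coloring(adjacency):
    """Give each vertex the smallest color unused by its already-colored neighbors."""
    colors = {}
    # Visit high-degree vertices first (an assumed, commonly used ordering).
    for v in sorted(adjacency, key=lambda u: -len(adjacency[u])):
        used = {colors[n] for n in adjacency[v] if n in colors}
        c = 0
        while c in used:
            c += 1
        colors[v] = c
    return colors

def color_classes(colors):
    """Group vertices by color; each group is one candidate parallel step."""
    classes = {}
    for v, c in colors.items():
        classes.setdefault(c, []).append(v)
    return [sorted(classes[c]) for c in sorted(classes)]

# Adjacency structure of a 4-cycle 0-1-2-3-0 (a toy stand-in for a sparse matrix graph).
graph = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
print(color_classes(greedy_coloring(graph)))  # prints [[0, 2], [1, 3]]
```

Ordering the unknowns color by color would then give two parallel steps for this toy graph, at the possible cost of more conjugate gradient iterations, which is the trade-off the abstract evaluates.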

86 citations

Journal Article•10.1016/0167-8191(94)90031-0•

Portable programming with the PARMACS message-passing library

R. Calkin, Rolf Hempel, H.-C. Hoppe, P. Wypior

1 Apr 1994

TL;DR: The PARMACS library which is presented in this paper defines a portability layer which has been implemented on most MIMD computers, ranging from MPP systems to workstation networks, and does not cause any significant overhead.

Abstract: Message passing is the most efficient and most general programming paradigm currently used on parallel machines with distributed memory. In the absence of a message passing standard the broad variety of vendor-specific interfaces inhibits the portability of application programs. The PARMACS library which is presented in this paper defines a portability layer which has been implemented on most MIMD computers, ranging from MPP systems to workstation networks. The new release version 6.0 is discussed in detail. It is available for applications written in Fortran 77 and C. To assess the time overhead caused by PARMACS, two benchmark applications with differing communication requirements have been implemented using machine-specific interfaces and portably using PARMACS. The performance has been compared for problems of various sizes on three machines of different architectures. In general the use of PARMACS does not cause any significant overhead.

79 citations

Journal Article•10.1016/0167-8191(94)90117-1•

Microscopic traffic modeling on parallel high performance computers

K. Nagel, A. Schleicher

1 Jan 1994

TL;DR: A simple, rule-based approach to traffic flow can yield astonishingly realistic results and is therefore a candidate for very fast large scale microscopic traffic simulations; the highest computing speed is reached by employing a single-bit coding scheme used in, e.g., Ising-model programming.

Abstract: A simple, rule-based approach to traffic flow can yield astonishingly realistic results and is therefore a candidate for very fast large scale microscopic traffic simulations. In the present article, we evaluate two conceptually different codings of the same dynamics on parallel supercomputers. We use a Parsytec GCel-3 (1024 nodes), an Intel iPSC/860 (32 nodes), and a Connection Machine CM-5 (32 nodes). For comparison purposes, we use as well a NEC-SX3/11 traditional single node vector computer, and a net of coupled workstations. Compared to published computing speeds of microscopic traffic models, our model proves to be up to about 1000 times faster. We find our highest computing speed by employing a single-bit coding scheme used in, e.g., Ising-model programming. As traffic flow is a one-dimensional problem, a complication is that geometric parallelization has to be done in the same direction as single-bit coding. Nevertheless, we reach efficiencies near 100 percent for large systems. We use these computational resources in order to obtain high quality data of the model's average behavior (fundamental diagrams). In addition, we present results from modeling a road network, whose composition out of basic objects leads in a natural way to some temporal ‘slackness’ which helps balancing load asymmetries.
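
The abstract does not list the rules of the model. As a minimal sketch of what a rule-based microscopic traffic update can look like, the code below assumes the well-known Nagel-Schreckenberg cellular automaton (accelerate, keep a safe gap, brake at random, move) on a single-lane ring road. The function name `nasch_step`, the list-based road representation, and the default parameters are illustrative assumptions, and no single-bit coding or parallelization is attempted.

```python
import random

def nasch_step(road, v_max=5, p_brake=0.3, rng=random):
    """One parallel update on a ring road; road[i] is a car's speed or None if empty."""
    n = len(road)
    new_road = [None] * n
    for i, v in enumerate(road):
        if v is None:
            continue
        # Count empty cells up to the next car ahead (positions from the old state).
        gap = 0
        while gap < n - 1 and road[(i + gap + 1) % n] is None:
            gap += 1
        v = min(v + 1, v_max)   # rule 1: accelerate
        v = min(v, gap)         # rule 2: do not run into the car ahead
        if v > 0 and rng.random() < p_brake:
            v -= 1              # rule 3: random braking
        new_road[(i + v) % n] = v  # rule 4: move
    return new_road

# Deterministic run (no random braking): both cars start at rest and advance one cell.
road = [0, None, None, 0, None, None, None, None]
print(nasch_step(road, p_brake=0.0))
# prints [None, 1, None, None, 1, None, None, None]
```

Because every car is updated from the old state using only local information, the road can in principle be cut into segments and distributed across processors, which is the kind of geometric parallelization the abstract refers to.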

78 citations

Journal Article•10.1016/0167-8191(94)90022-1•

The IBM external user interface for scalable parallel systems

Vasanth Bala¹, Jehoshua Bruck¹, Raymond M. Bryant¹, Robert Cypher¹, Peter de Jong¹, Pablo Elustondo¹, D. D. Frye¹, Alex Ho¹, Ching-Tien Ho¹, Gail Irwin¹, Shlomo Kipnis¹, Richard D. Lawrence¹, Marc Snir¹•Institutions (1)

IBM¹

1 Apr 1994

TL;DR: This paper examines several aspects of the design and development of the EUI, a library of coordination and communication routines that can be invoked from within FORTRAN or C application programs.

Abstract: The IBM External User Interface (EUI) for scalable parallel systems is a parallel programming library designed for the IBM line of scalable parallel computers. The first computer in this line, the IBM 9076 SP1, was announced in February 1993. In essence, the EUI is a library of coordination and communication routines that can be invoked from within FORTRAN or C application programs. The EUI consists of four main components: task management routines, message passing routines, task group routines, and collective communication routines. This paper examines several aspects of the design and development of the EUI.

64 citations

Journal Article•10.1016/0167-8191(94)90001-9•

Mapping uniform loop nests onto distributed memory architectures

Alain Darte¹, Yves Robert¹•Institutions (1)

École normale supérieure de Lyon¹

1 May 1994

TL;DR: The partitioning technique extends the methods developed for systolic array design methodologies to loop nests with several statements to synthesize a virtual grid architecture from the original loop nest.

Abstract: This paper deals with scheduling, mapping and partitioning techniques for uniform loop nests. Target machines are SPMD distributed memory parallel computers. We use affine-by-statement scheduling and affine-by-variable mapping to synthesize a virtual grid architecture from the original loop nest. The virtual grid architecture is then partitioned into a physical processor grid. The key to the mapping strategy is the communication graph, which enables us to derive optimal mappings, i.e. where the number of communications is proved to be minimal. The partitioning technique extends the methods developed for systolic array design methodologies to loop nests with several statements.

Journal Article•10.1016/0167-8191(94)90130-9•

Topological properties of the crossed cube architecture

Kemal Efe, P. K. Blackwell¹, W. Slough¹, T. Shiau²•Institutions (2)

University of Missouri¹, New Jersey Institute of Technology²

1 Dec 1994

TL;DR: An analysis of the number of isomorphic subgraphs, a formal proof for the diameter, and some new embedding properties of the crossed cube are investigated.

Abstract: Crossed cube is a variant obtained from the hypercube by redirecting a subset of the edges to span two or more dimensions. As a result, the diameter is reduced by half without increasing the link complexity. The use of the crossed cube as a parallel architecture, and in a reconfigurable system has been investigated earlier. The topological properties of the crossed cube are investigated in this paper. The main results of this paper include: an analysis of the number of isomorphic subgraphs, a formal proof for the diameter, and some new embedding properties.

Journal Article•10.1016/0167-8191(94)90023-X•

The NX message passing interface

Paul R. Pierce¹•Institutions (1)

Intel¹

1 Apr 1994

TL;DR: This paper explains design tradeoffs and why the NX interface and its implementations represent successful achievement of that balance in a highly evolved, full featured, high performance interface for parallel applications.

Abstract: The challenge in designing a message passing interface for massively parallel distributed memory supercomputers is to balance high performance with usability. The NX interface and its implementations represent successful achievement of that balance in a highly evolved, full featured, high performance interface for parallel applications. It is the vendor-supplied programming interface on Intel multicomputers, implementing the typed send/receive model of multicomputer message passing. Central to this paper is a section on the philosophy behind the NX design, which explains design tradeoffs and why we think they result in a successful interface.

Journal Article•10.1016/0167-8191(94)90030-2•

Express is not just a message passing system: current and future directions in Express

Jon Flower, Adam Kolawa

1 Apr 1994

TL;DR: A recently developed programming style which greatly simplifies programming as well as directly addressing complex issues such as dynamic load balancing and fault tolerance is introduced.

Abstract: We describe some of the features of Express and the way that they were developed as a response to the needs of application programmers. We show how currently emerging computing platforms have led to new application needs and show how these are satisfied with Express features. We introduce a recently developed programming style which greatly simplifies programming as well as directly addressing complex issues such as dynamic load balancing and fault tolerance. Finally, we present a comparison of Express' features and motivation to the Message Passing Interface (MPI) standard currently being developed.

Journal Article•10.1016/0304-3975(94)90163-5•

Methods for message routing in parallel machines

Tom Leighton¹•Institutions (1)

Massachusetts Institute of Technology¹

6 Jun 1994

TL;DR: This paper surveys many of the approaches that have been proposed for solving communication problems in parallel machines from a theoretician's perspective, although the paper was written for a general audience.

Abstract: In this paper, we survey many of the approaches that have been proposed for solving communication problems in parallel machines. The material is presented from a theoretician's perspective, although the paper was written for a general audience.

Journal Article•10.1016/0167-8191(94)90066-3•

Summary of GENESIS work at the European Centre for Medium-range Weather Forecasts (ECMWF)

Tuomo Kauranne¹•Institutions (1)

University of Eastern Finland¹

1 Nov 1994

TL;DR: Benchmarks and performance estimates indicate that atmospheric models can be implemented efficiently on future parallel supercomputers with thousands of processors; with the so-called transposition strategy, current sequential numerical methods, and even most subroutines, can be reused in the parallel models without any modifications.

Abstract: Benchmarks with simplified atmospheric models on small parallel computers, and performance estimates with operational weather models on massively parallel computers, indicate that atmospheric models can be efficiently implemented on future parallel supercomputers with thousands of processors. By using the so-called transposition strategy, which employs a time-varying domain decomposition, current sequential numerical methods, and even most subroutines, can be employed in the parallel models without any modifications.

Journal Article•10.1016/0167-8191(94)90002-7•

Automating non-unimodular loop transformations for massive parallelism

Jingling Xue¹•Institutions (1)

Nanyang Technological University¹

1 May 1994

TL;DR: An algorithm is presented that rewrites a loop nest under any non-singular (unimodular or non-unimodular) transformation in a mechanical manner, with unimodular transformations treated as a special case.

Abstract: Loop transformations have been shown to be very useful in parallelising compilation and regular array design. This paper provides a solution to the open problem of automatically rewriting loop nests for non-unimodular transformations. We present an algorithm that rewrites a loop nest under any non-singular (unimodular or non-unimodular) transformation in a mechanical manner. The algorithm works nicely with unimodular transformations being treated as a special case. The extra time complexity incurred due to non-unimodularity is polynomially bounded by the depth of the loop nest.

Journal Article•10.1016/0167-8191(94)90029-9•

The design and evolution of Zipcode

Anthony Skjellum¹, Steven G. Smith², Nathan E. Doss¹, Alvin P. Leung³, Manfred Morari⁴•Institutions (4)

Mississippi State University¹, Lawrence Livermore National Laboratory², Syracuse University³, California Institute of Technology⁴

1 Apr 1994

TL;DR: Key features of Zipcode appear in the forthcoming MPI standard; recent additions include ‘gather-send’ and ‘receive-scatter’ semantics, based on persistent Zipcode ‘invoices’, both as a means to simplify message passing and as a means to reveal more potential runtime optimizations.

Abstract: Zipcode is a message-passing and process-management system that was designed for multicomputers and homogeneous networks of computers in order to support libraries and large-scale multicomputer software. The system has evolved significantly over the last five years, based on our experiences and identified needs. Features of Zipcode that were originally unique to it were its simultaneous support of static process groups, communication contexts, and virtual topologies, forming the ‘mailer’ data structure. Point-to-point and collective operations reference the underlying group, and use contexts to avoid mixing up messages. Recently, we have added ‘gather-send’ and ‘receive-scatter’ semantics, based on persistent Zipcode ‘invoices’, both as a means to simplify message passing, and as a means to reveal more potential runtime optimizations. Key features in Zipcode appear in the forthcoming MPI standard.

Journal Article•10.1016/S0167-8191(06)80013-X•

Parallel multiple shooting for the solution of initial value problems

M. Kiehl¹•Institutions (1)

Technische Universität München¹

1 Mar 1994

TL;DR: The computing time for the numerical solution of initial-value problems is closely related to the number of evaluations of the right-hand side, and in general this number can only be reduced slightly on parallel computers, even if simultaneous evaluations of the right-hand side are counted as one evaluation.

Abstract: The computing time for the numerical solution of initial-value problems is closely related to the number of evaluations of the right-hand side. In general this number can only be reduced slightly on parallel computers, even if simultaneous evaluations of the right-hand side are counted as one evaluation. For special problems, however, it is possible to construct special methods which show a remarkable speedup on parallel computers. Multiple shooting, a method for boundary-value problems with an inherent parallelism, can also be applied efficiently to linear initial-value problems and to non-linear initial-value problems if good approximations are available.

Journal Article•10.1016/0167-8191(94)90120-1•

A partially asynchronous and iterative algorithm for distributed load balancing

Jianjian Song¹•Institutions (1)

National University of Singapore¹

1 Jun 1994

TL;DR: It is proved that the algorithm achieves a maximum load imbalance of no more than ⌈d/2⌉ tasks, where d is the diameter of the network, and that it converges geometrically, as assured by a theorem for balancing continuous workload.

Abstract: Defining tasks as independent entities with identical execution time and the workload of a processor as the number of tasks, load balancing is to distribute tasks among processors of a network so that the resulting workload of every processor will be as close to the average over all the workloads as possible. We propose in this paper a partially asynchronous and iterative algorithm for distributed load balancing, show its properties, and report its simulation results. The algorithm converges geometrically as assured by a theorem for balancing continuous workload. We prove that the algorithm can achieve the maximum load imbalance of no more than ⌈d/2⌉ tasks, where d is the diameter of a network. Our simulation not only validated the properties but also showed that the algorithm could produce much smaller load imbalances for hypercubes. The obtained imbalances for hypercubes of order up to ten were no more than two tasks and 56% of the sample runs produced only one task difference, as opposed to the theoretical maximum of six tasks.
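
The abstract does not spell out the partially asynchronous algorithm, so the sketch below swaps in a different, simpler technique to illustrate iterative balancing of indivisible tasks on a hypercube: one synchronous sweep of classical dimension exchange, in which each node splits its combined load with its neighbor along each dimension in turn. The function name `dimension_exchange` and the example loads are assumptions. For this scheme the final imbalance after one sweep is at most d tasks on a hypercube of dimension d, which is a weaker guarantee than the bound stated in the abstract.

```python
def dimension_exchange(loads):
    """One sweep of dimension-exchange balancing on a hypercube.

    loads[i] is the integer task count of node i; the node count must be a
    power of two. Node i's neighbor along dimension k is i XOR 2**k.
    """
    n = len(loads)
    d = n.bit_length() - 1
    assert 1 << d == n, "node count must be a power of two"
    loads = list(loads)
    for k in range(d):
        for i in range(n):
            j = i ^ (1 << k)
            if i < j:  # handle each pair once
                total = loads[i] + loads[j]
                loads[i] = total // 2           # lower half (rounded down)
                loads[j] = total - total // 2   # remainder, so no task is lost
    return loads

# 8 nodes (a 3-dimensional hypercube) with a very uneven initial load of 64 tasks.
print(dimension_exchange([40, 0, 3, 7, 0, 0, 12, 2]))
# prints [7, 7, 8, 8, 8, 8, 9, 9]
```

Each pairwise split uses only information from two neighboring nodes, which is what makes schemes of this kind suitable for distributed execution.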

Journal Article•10.1016/0167-8191(94)90109-0•

Heuristic algorithms for task assignment and scheduling in a processor network

Shen Shen Wu¹, David Sweeting¹•Institutions (1)

Queen Mary University of London¹

1 Jan 1994

TL;DR: The proposed algorithms are based on the family of heuristic approaches, and are particularly suitable for large grain parallel tasks and message passing parallel computers.

Abstract: This paper addresses the problem of statically assigning and scheduling parallel executable tasks to processor networks in parallel or distributed computers to meet real-time computing constraints. Three procedures are carried out for the task assignment and scheduling. First, a task assignment algorithm is used to group M tasks into N task clusters and then initially to assign the N clusters onto N processors without considering the communication link constraints of the processors. Then a communication link number reduction algorithm is applied to remove the excess number of links according to the design limitations of the processors. A new structure of the processor network suitable for the N task clusters is then determined. Finally, the N assigned task clusters are scheduled by a task scheduling algorithm in order to minimize the idle time caused by interprocessor synchronized data communication and indirect data communication. The proposed algorithms are based on the family of heuristic approaches, and are particularly suitable for large grain parallel tasks and message passing parallel computers.

Journal Article•10.1016/0167-8191(94)90123-6•

On alternating segment Crank-Nicolson scheme

Zhang Baolin, Li Wenzhi¹•Institutions (1)

Academia Sinica¹

1 Jun 1994

TL;DR: The Alternating Segment Crank-Nicolson method for the diffusion equation is developed; the method is unconditionally stable and has the obvious property of parallelism.

Abstract: In this paper the Alternating Segment Crank-Nicolson method for the diffusion equation is developed. The method is unconditionally stable and has the obvious property of parallelism. The numerical experiments by the method are made on a 5 transputer system, and the speedup of the method may be greater than 3.
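
For context, the sketch below shows one step of the standard (sequential) Crank-Nicolson scheme for the diffusion equation u_t = u_xx with zero boundary values, solved with the Thomas algorithm. It does not reproduce the alternating-segment construction of the paper; the function names, grid size, and time step are assumptions chosen for illustration.

```python
import math

def thomas(a, b, c, d):
    """Solve a tridiagonal system; a = sub-, b = main, c = super-diagonal, d = right-hand side."""
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def crank_nicolson_step(u, r):
    """Advance the interior values u by one time step; r = dt / dx**2."""
    n = len(u)
    rhs = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0       # zero boundary value
        right = u[i + 1] if i < n - 1 else 0.0  # zero boundary value
        rhs.append(u[i] + 0.5 * r * (left - 2.0 * u[i] + right))
    # Implicit half: (1 + r) on the diagonal, -r/2 on both off-diagonals.
    return thomas([-0.5 * r] * n, [1.0 + r] * n, [-0.5 * r] * n, rhs)

def simulate(nx=50, dt=0.001, steps=100):
    """Evolve u(x, 0) = sin(pi x) on [0, 1]; the exact solution decays as exp(-pi**2 t)."""
    dx = 1.0 / nx
    u = [math.sin(math.pi * (i + 1) * dx) for i in range(nx - 1)]
    r = dt / dx ** 2
    for _ in range(steps):
        u = crank_nicolson_step(u, r)
    return u

u = simulate()
print(u[24], math.exp(-math.pi ** 2 * 0.1))  # midpoint value vs. exact solution at t = 0.1
```

The tridiagonal solve couples all grid points in every step, which limits parallelism in the standard scheme; the abstract's method is aimed at that limitation.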

Journal Article•10.1016/0304-3975(94)90170-8•

Markov analysis of multiple-disk prefetching strategies for external merging

Vinay Sadananda Pai, Alejandro A. Schäffer¹, Peter Varman¹•Institutions (1)

Rice University¹

6 Jun 1994

TL;DR: Closed-form expressions for the average parallelism obtainable for a given cache size and number of disks are derived for both prefetching strategies, and these analytic results are confirmed by simulation.

Abstract: Multiple-disk organizations can be used to improve the I/O performance of problems like external merging. Concurrency can be introduced by overlapping I/O requests at different disks and by prefetching additional blocks on each I/O operation. To support this prefetching, a memory cache is required. Markov models for two prefetching strategies are developed and analyzed. Closed-form expressions for the average parallelism obtainable for a given cache size and number of disks are derived for both prefetching strategies. These analytic results are confirmed by simulation.

Journal Article•10.1016/S0167-8191(06)80016-5•

Load balancing with network partitioning using host groups

David J. Evans¹, W. U. N. Butt¹•Institutions (1)

Loughborough University¹

1 Mar 1994

TL;DR: Network partitioning strategies are proposed to reduce the communication overhead of load balancing algorithms in a large distributed system environment by limiting the exchange of load information messages within smaller groups of hosts while restricting the transfer of tasks to long distance remote hosts which involve high communication costs.

Abstract: One of the major issues concerning the efficiency and effectiveness of dynamic load balancing algorithms is their scalability. As the size of the distributed computer system increases, the overheads of a load balancing algorithm may increase, resulting in poor scalability. In this paper, network partitioning strategies are proposed to reduce the communication overhead of load balancing algorithms in a large distributed system environment. Several host-grouping strategies are suggested to improve the performance of load balancing algorithms. This is achieved by limiting the exchange of load information messages within smaller groups of hosts while restricting the transfer of tasks to long distance remote hosts which involve high communication costs. The group memberships are changed dynamically to adapt to the varying load conditions across the entire network. Effectiveness of the proposed strategies is evaluated by simulations.

Proceedings Article•10.1109/MPCS.1994.367020•

The Paprica massively parallel processor

Alberto Broggi, G. Conte, Francesco Gregoretti, Claudio Sansoe, Leonardo Reyneri

2 May 1994

TL;DR: The main goal of the project is to develop a subsystem that operates as a processing unit attached to a standard workstation and in perspective as a low-cost low-sized specialized embedded system devoted to low level image analyses and cellular neural networks emulation.

Abstract: This paper describes a complete 6-year project, starting from its theoretical basis up to the hardware and software system implementation, and to the description of its future evolution. The main goal of the project is to develop a subsystem that operates as a processing unit attached to a standard workstation and in perspective as a low-cost low-sized specialized embedded system devoted to low level image analyses and cellular neural networks emulation. The architecture has been extensively used for basic low level image analysis tasks up to optical flow computation and feature tracking, showing encouraging performances even in the first prototype version.

Journal Article•10.1016/0167-8191(94)90011-6•

Multiplication of matrices of arbitrary shape on a data parallel computer

Kapil K. Mathur, S. Lennart Johnsson

1 Jul 1994

TL;DR: Some level-2 and level-3 Distributed Basic Linear Algebra Subroutines (DBLAS) that have been implemented on the Connection Machine system CM-200 are described.

Abstract: Some level-2 and level-3 Distributed Basic Linear Algebra Subroutines (DBLAS) that have been implemented on the Connection Machine system CM-200 are described. No assumption is made on the shape or ...

Journal Article•10.1016/0167-8191(94)90024-8•

CMMD: active messages on the CM-5

Lewis W. Tucker, Alan Mainwaring

1 Apr 1994

TL;DR: Active messages are presented as an important new communication primitive for building message passing systems, and examples show how developers may incorporate them in application-specific ways.

Abstract: Active messages provide an important new communication primitive for building message passing systems. CMMD, the message passing system of the CM-5, uses active messages as a basic substrate for constructing multiple low-overhead communication paradigms. Examples are given which show how developers may incorporate active messages in application-specific ways.

Journal Article•10.1016/0167-8191(94)90025-6•

Message passing on the Meiko CS-2

Eric Barton, James Cownie, Moray McLaren

1 Apr 1994

TL;DR: The hardware and software resources which are provided to the programmer to support message passing on the Meiko Computing Surface 2 (CS-2), as well as the philosophy which drove the design are described.

Abstract: This paper describes the hardware and software resources which are provided to the programmer to support message passing on the Meiko Computing Surface 2 (CS-2), as well as the philosophy which drove the design. It also gives some measured communication performance numbers achieved by the machine.

Journal Article•10.1016/0304-3975(94)90169-4•

Guarded repair of dependable systems

Hermann de Meer¹, Kishor S. Trivedi¹, Mario Dal Cin²•Institutions (2)

Duke University¹, University of Erlangen-Nuremberg²

6 Jun 1994

TL;DR: It is shown that guarded repair can improve system performance and dependability significantly and a time-dependent optimality of dependable, parallel configurations can be determined from the results.

Abstract: Imperfect coverage and nonnegligible reconfiguration delay are known to have a deleterious effect on the dependability and the performance of a multiprocessor system. In particular, increasing the number of processor elements does not always increase dependability. An obvious reason for this is that the total failure rate increases, generally, linearly with the number of components in the system. It is also a well-known fact that the performance gain due to parallelism mostly turns out to be sublinear with the number of processors. It is therefore important to optimize the degree of parallelism in system design. A related issue is that by deferring repair, it is sometimes possible to improve system dependability. In this case decisions have to be made dynamically as to when to repair and when not to repair. Most of the current research deals with static optimization of the number of processors. No systematic approach for dynamic control of dependable systems has been proposed so far. Dynamic, i.e. transient, decision of whether or not to repair is the optimization problem considered in this paper. We propose extended Markov reward models (EMRM) to capture such questions. EMRM are a marriage between performability modeling techniques and Markov decision theory. A numerical solution procedure is developed to provide optimal solution trajectories for this problem. EMRM are a general framework for the dynamic optimization of reconfigurable, dependable systems. The optimization is applied on the basis of several performance and dependability measures. In particular, we explore availability, capacity-oriented availability, performance-oriented unavailability, and performability measures. Furthermore, off-line and on-line repair strategies are compared. We show that guarded repair can improve system performance and dependability significantly. The control strategies and reward functions differ a lot in each case. Each scenario turns out to be of interest in its own right. A time-dependent optimality of dependable, parallel configurations can be determined from our results.

Journal Article•10.1016/0304-3975(94)90162-7•

Clock construction in fully asynchronous parallel systems and PRAM simulation

Yonatan Aumann¹, Michael O. Rabin¹•Institutions (1)

Hebrew University of Jerusalem¹

6 Jun 1994

TL;DR: A novel clock for asynchronous systems is given that can be implemented in a system with no atomicity assumptions and in the presence of an adaptive adversary scheduler; the clock is then harnessed to drive an efficient PRAM simulation on an asynchronous system.

Abstract: We consider the problem of simulating synchronous computations on asynchronous shared memory systems. The systems we consider allow for arbitrary asynchronous behavior of the processors. In addition, we make very limited (and in some cases no) assumptions about the atomicity of read and write operations to shared memory. We provide detailed definitions of these asynchronous systems and their atomicity properties. The first construction in this paper is a novel clock for asynchronous systems. The clock is a basic tool for synchronization in the asynchronous environment. The construction we give is extremely robust, and can be implemented in a system with no atomicity assumptions, and in the presence of an adaptive adversary scheduler. The correct behavior of the clock is obtained with overwhelming probability (> 1 − 2^(−αn), α > 0). We then show how to harness this clock to drive an efficient PRAM simulation on an asynchronous system. The simulation requires an O(log^2 n) work, and O(log n) space, overhead. This improves by a log n factor on the efficiency of previously obtained simulation results, while relaxing the assumptions on the underlying asynchronous system.
