Top 250 papers published in the topic of Distributed memory in 1990

Showing papers on "Distributed memory" published in 1990

Proceedings Article•10.1145/285930.285997•

Memory consistency and event ordering in scalable shared-memory multiprocessors

Kourosh Gharachorloo¹, Daniel E. Lenoski¹, James Laudon¹, Phillip B. Gibbons¹, Anoop Gupta¹, John L. Hennessy¹•Institutions (1)

Stanford University¹

1 May 1990

TL;DR: A new model of memory consistency, called release consistency, that allows for more buffering and pipelining than previously proposed models is introduced and is shown to be equivalent to the sequential consistency model for parallel programs with sufficient synchronization.

Abstract: Scalable shared-memory multiprocessors distribute memory among the processors and use scalable interconnection networks to provide high bandwidth and low latency communication. In addition, memory accesses are cached, buffered, and pipelined to bridge the gap between the slow shared memory and the fast processors. Unless carefully controlled, such architectural optimizations can cause memory accesses to be executed in an order different from what the programmer expects. The set of allowable memory access orderings forms the memory consistency model or event ordering model for an architecture. This paper introduces a new model of memory consistency, called release consistency, that allows for more buffering and pipelining than previously proposed models. A framework for classifying shared accesses and reasoning about event ordering is developed. The release consistency model is shown to be equivalent to the sequential consistency model for parallel programs with sufficient synchronization. Possible performance gains from the less strict constraints of the release consistency model are explored. Finally, practical implementation issues are discussed, concentrating on issues relevant to scalable architectures.

1,275 citations

Proceedings Article•10.1145/99163.99182•

Munin: distributed shared memory based on type-specific memory coherence

John K. Bennett¹, John B. Carter¹, Willy Zwaenepoel¹•Institutions (1)

Rice University¹

1 Feb 1990

TL;DR: This paper focuses on the design and use of Munin's memory coherence mechanisms, and compares the approach to previous work in this area.

Abstract: We are developing Munin, a system that allows programs written for shared memory multiprocessors to be executed efficiently on distributed memory machines. Munin attempts to overcome the architectural limitations of shared memory machines, while maintaining their advantages in terms of ease of programming. Our system is unique in its use of loosely coherent memory, based on the partial order specified by a shared memory parallel program, and in its use of type-specific memory coherence. Instead of a single memory coherence mechanism for all shared data objects, Munin employs several different mechanisms, each appropriate for a different class of shared data object. These type-specific mechanisms are part of a runtime system that accepts hints from the user or the compiler to determine the coherence mechanism to be used for each object. This paper focuses on the design and use of Munin's memory coherence mechanisms, and compares our approach to previous work in this area.

466 citations

Proceedings Article•10.1109/ICDCS.1990.89257•

Real-time synchronization protocols for shared memory multiprocessors

Ragunathan Rajkumar¹•Institutions (1)

IBM¹

1 Jan 1990

TL;DR: A priority-based synchronization protocol that explicitly uses shared-memory primitives is defined and analyzed; the underlying priority considerations for a shared-memory synchronization protocol are studied, and priority assignments to be used by the protocol are derived.

Abstract: A priority-based synchronization protocol that explicitly uses shared-memory primitives is defined and analyzed. A solution that has been proposed for bounding and minimizing synchronization delays in real-time systems is briefly reviewed. The waiting times introduced by synchronization requirements in multiple-processor environments are identified, and a set of goals for priority-based multiprocessor synchronization protocols is derived. The underlying priority considerations for a shared-memory synchronization protocol are studied, and priority assignments to be used by the protocol are derived.

259 citations

Journal Article•10.1109/2.53355•

Algorithms implementing distributed shared memory

Michael Stumm¹, S. Zhou¹•Institutions (1)

University of Toronto¹

01 May 1990-IEEE Computer

TL;DR: It is shown that the correct choice of algorithm is determined largely by the memory access behavior of the applications, and some limitations of distributed shared memory are noted.

Abstract: Four basic algorithms for implementing distributed shared memory are compared. Conceptually, these algorithms extend local virtual address spaces to span multiple hosts connected by a local area network, and some of them can easily be integrated with the hosts' virtual memory systems. The merits of distributed shared memory and the assumptions made with respect to the environment in which the shared memory algorithms are executed are described. The algorithms are then described, and a comparative analysis of their performance in relation to application-level access behavior is presented. It is shown that the correct choice of algorithm is determined largely by the memory access behavior of the applications. Two particularly interesting extensions of the basic algorithms are described, and some limitations of distributed shared memory are noted.

258 citations

Journal Article•10.1016/0743-7315(90)90129-D•

Run-time scheduling and execution of loops on message passing machines

Joel H. Saltz¹, Kathleen Crowley¹,², Ravi Mirchandaney¹,², Harry Berryman¹•Institutions (2)

Langley Research Center¹, Yale University²

01 Apr 1990-Journal of Parallel and Distributed Computing

TL;DR: This work examines the effectiveness of optimizations aimed at allowing distributed machines to efficiently compute inner loops over globally defined data structures, targeting loops in which some array references are made through a level of indirection.
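
The TL;DR above concerns loops whose array references go through a level of indirection. One common way to handle such loops on a distributed memory machine is an inspector/executor split. The listing does not say that this is the paper's exact scheme, so the Python sketch below is only an illustration under that assumption. The block distribution and the names `owner`, `inspector`, `executor`, and `fetch` are hypothetical.

```python
from typing import Callable, Dict, List, Sequence


def owner(global_index: int, block: int) -> int:
    """Processor owning a global index under an assumed block distribution."""
    return global_index // block


def inspector(indirection: Sequence[int], me: int, block: int) -> List[int]:
    """Scan the indirection array once and list the off-processor indices it touches."""
    return sorted({g for g in indirection if owner(g, block) != me})


def executor(
    indirection: Sequence[int],
    local_x: Sequence[float],
    me: int,
    block: int,
    schedule: Sequence[int],
    fetch: Callable[[Sequence[int]], Sequence[float]],
) -> List[float]:
    """Evaluate y[i] = x[indirection[i]] using a single batched fetch."""
    # One exchange brings in every remote value the loop will need.
    ghost: Dict[int, float] = dict(zip(schedule, fetch(schedule)))
    base = me * block  # first global index stored locally
    return [
        local_x[g - base] if owner(g, block) == me else ghost[g]
        for g in indirection
    ]
```

The point of the split is that the schedule can be computed once and reused for every later execution of the loop, as long as the indirection array does not change.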

200 citations

Proceedings Article•10.1145/99163.99183•

Supporting shared data structures on distributed memory architectures

C. Koelbel¹, Piyush Mehrotra², J. Van Rosendale²•Institutions (2)

Purdue University¹, Langley Research Center²

1 Feb 1990

TL;DR: A new programming environment for distributed memory architectures is presented, providing a global name space and allowing direct access to remote parts of data values and the efficiency of the resulting code on the NCUBE/7 and IPSC/2 hypercubes is presented.

Abstract: Programming nonshared memory systems is more difficult than programming shared memory systems, since there is no support for shared data structures. Current programming languages for distributed memory architectures force the user to decompose all data structures into separate pieces, with each piece “owned” by one of the processors in the machine, and with all communication explicitly specified by low-level message-passing primitives. This paper presents a new programming environment for distributed memory architectures, providing a global name space and allowing direct access to remote parts of data values. We describe the analysis and program transformations required to implement this environment, and present the efficiency of the resulting code on the NCUBE/7 and IPSC/2 hypercubes.

178 citations

Journal Article•10.1364/AO.29.002058•

Potentials of two-photon based 3-D optical memories for high performance computing

Susan Hunter¹, Fouad Kiamilev¹, Sadik C. Esener¹, Dimitri A. Parthenopoulos², Peter M. Rentzepis²•Institutions (2)

University of California, San Diego¹, University of California, Irvine²

10 May 1990-Applied Optics

TL;DR: An optical volume memory based on the two-photon effect, which allows for high density and parallel access, is proposed; its high capacity and throughput may overcome the disadvantages of current memories.

Abstract: The advent of optoelectronic computers and highly parallel electronic processors has brought about a need for storage systems with enormous memory capacity and memory bandwidth. These demands cannot be met with current memory technologies (i.e., semiconductor, magnetic, or optical disk) without having the memory system completely dominate the processors in terms of the overall cost, power consumption, volume, and weight. As a solution, we propose an optical volume memory based on the two-photon effect which allows for high density and parallel access. In addition, the two-photon 3-D memory system has the advantages of having high capacity and throughput which may overcome the disadvantages of current memories.

149 citations

Patent•

Method and apparatus for independently resetting processors and cache controllers in multiple processor systems

David A. Miller, Kenneth A. Jansen, Paul R. Culley, Mark E. Taylor, Javier F. Izquierdo

24 Oct 1990

TL;DR: In this paper, a method and system for independently resetting primary and secondary processors 20 and 120 respectively under program control in a multiprocessor, cache memory system is presented.

Abstract: A method and system for independently resetting primary and secondary processors 20 and 120 respectively under program control in a multiprocessor, cache memory system. Processors 20 and 120 are reset without causing cache memory controllers 24 and 124 to reset.

143 citations

Journal Article•10.1109/12.54839•

Recoverable distributed shared virtual memory

Kun-Lung Wu¹, W.K. Fuchs¹•Institutions (1)

University of Illinois at Urbana–Champaign¹

01 Apr 1990-IEEE Transactions on Computers

TL;DR: A user-transparent checkpointing recovery scheme and a new twin-page disk storage management technique are presented for implementing recoverable distributed shared virtual memory.

Abstract: The problem of rollback recovery in distributed shared virtual environments, in which the shared memory is implemented in software in a loosely coupled distributed multicomputer system, is examined. A user-transparent checkpointing recovery scheme and a new twin-page disk storage management technique are presented for implementing recoverable distributed shared virtual memory. The checkpointing scheme can be integrated with the memory coherence protocol for managing the shared virtual memory. The twin-page disk design allows checkpointing to proceed in an incremental fashion without an explicit undo at the time of recovery. The recoverable distributed shared virtual memory allows the system to restart computation from a checkpoint without a global restart.

120 citations

Journal Article•10.1109/71.80128•

Parallel simulated annealing algorithms for cell placement on hypercube multiprocessors

Prithviraj Banerjee¹, M.H. Jones², J.S. Sargent•Institutions (2)

University of Illinois at Urbana–Champaign¹, AT&T²

01 Jan 1990-IEEE Transactions on Parallel and Distributed Systems

TL;DR: A discussion is presented of two ways of mapping the cells in a two-dimensional area of a chip onto processors in an n-dimensional hypercube such that both small and large cell moves can be applied.

Abstract: A discussion is presented of two ways of mapping the cells in a two-dimensional area of a chip onto processors in an n-dimensional hypercube such that both small and large cell moves can be applied. Two types of move are allowed: cell exchanges and cell displacements. The computation of the cost function in parallel among all the processors in the hypercube is described, along with a distributed data structure that needs to be stored in the hypercube to support such a parallel cost evaluation. A novel tree broadcasting strategy is presented for the hypercube that is used extensively in the algorithm for updating cell locations in the parallel environment. A dynamic parallel annealing schedule is proposed that estimates the errors due to interacting parallel moves and adapts the rate of synchronization automatically. Two novel approaches in controlling error in parallel algorithms are described: heuristic cell coloring and adaptive sequence control. The performance on an Intel iPSC-2/D4/MX hypercube is reported.

110 citations

Patent•

Mechanism for passing messages between several processors coupled through a shared intelligent memory

Alaiwan Haissam¹•Institutions (1)

IBM¹

27 Feb 1990

TL;DR: In this paper, the authors propose a message passing mechanism for a plurality of processors interconnected by a shared intelligent memory for secure passing of messages between tasks operated on said processors, where each processor includes serving means for getting the messages to the task operated by each processor.

Abstract: In the environment of a plurality of processors interconnected by a shared intelligent memory, a mechanism for the secure passing of messages between tasks operated on said processors is provided. Inter-task message passing is provided by shared intelligent memory for storing the messages transmitted by sending tasks. Further, each processor includes serving means for getting the messages to be sent to the task operated by said each processor. The passing of messages from a processor to the shared intelligent memory and from the latter to another processor is made, using a set of high-level microcoded commands. A process is provided using the message passing mechanism together with redundancies built into the shared memory, to ensure fault-tolerant message passing in which the tasks operated primarily on a processor are automatically replaced by back-up tasks executed on another processor if the first processor fails.

A New Design for Distributed Systems: The Remote Memory Model

Douglas Comer, James Griffioen

1 Jan 1990

TL;DR: This paper examines the design of a highly efficient, reliable, machine-independent protocol used by the remote memory server to communicate with the client machines, and outlines the algorithms and data structures employed by the Remote Memory Model to efficiently locate the data stored on the server.

Abstract: This paper describes a new model for constructing distributed systems called the Remote Memory Model. The remote memory model consists of several client machines, one or more dedicated machines called remote memory servers, and a communication channel interconnecting them. In the remote memory model, client machines share the memory resources located on the remote memory server. Client machines that exhaust their local memory move portions of their address space to the remote memory server and retrieve pieces as needed. Because the remote memory server uses a machine-independent protocol to communicate with client machines, the remote memory server can support multiple heterogeneous client machines simultaneously. This paper describes the remote memory model and discusses the advantages and issues of systems that use this model. It examines the design of a highly efficient, reliable, machine-independent protocol used by the remote memory server to communicate with the client machines. It also outlines the algorithms and data structures employed by the remote memory server to efficiently locate the data stored on the server. Finally, it presents measurements of a prototype implementation that clearly demonstrate the viability and competitive performance of the remote memory model.

Patent•

Checkpointing mechanism for fault-tolerant systems

Haissam Alaiwan¹, Claude Basso¹, Jean Calvignac¹, Jacques Combes¹, Francois Kermarec¹, Andre Pauporte¹•Institutions (1)

IBM¹

8 Feb 1990

TL;DR: In this paper, the active and backup processors are coupled asynchronously with some hardware assist functions comprising a memory change detector, which captures memory changes in the memory of the active processor, and a mirroring control circuit, which causes the memory changes, when committed by establish recovery point signals generated by the active processor, to be dumped into the memory of the backup processor.

Abstract: A checkpointing mechanism implemented in a data processing system comprising a dual processor configuration gives the system a fault tolerance capability while minimizing the complexity of both the software and the hardware. The active and backup processors are coupled asynchronously with some hardware assist functions comprising a memory change detector which captures the memory changes in the memory of the active processor and a mirroring control circuit which causes the memory changes when committed by establish recovery point signals generated by the active processor to be dumped into the memory of the back up processor so that the backup processor can resume the operations of the active processor from the last established recovery point. The active and backup processors may each be connected to a dedicated memory and recovery point storing means, or to a memory including two dual sides shared by all the processors for storing data structures and recovery points.

Proceedings Article•10.1145/97444.97672•

Asynchronous shared memory parallel computation

Naomi Nishimura¹•Institutions (1)

University of Toronto¹

1 May 1990

TL;DR: A new model of asynchronous shared memory parallel computation is introduced and shown to fulfil all the listed requirements; the complexity of several fundamental parallel algorithms is also analyzed in this model.

Abstract: The contributions of this paper are twofold. First, we outline criteria by which any model of asynchronous shared memory parallel computation can be judged. Previous models are considered with respect to these factors. Next, we introduce a new model, and show that this model fulfils all the listed requirements. We also analyze in our model the complexity of several fundamental parallel algorithms.

Journal Article•10.1145/255129.255179•

Pandore: a system to manage data distribution

Françoise André, Jean-Louis Pazat, Henry Thomas

1 Jun 1990

TL;DR: The goal of the Pandore system is to allow the execution of parallel algorithms on DMPCs (Distributed Memory Parallel Computers) without having to take into account the low-level characteristics of the target distributed computer to program the algorithm.

Abstract: The goal of the Pandore system is to allow the execution of parallel algorithms on DMPCs (Distributed Memory Parallel Computers) without having to take into account the low-level characteristics of the target distributed computer to program the algorithm. No explicit process definition and interprocess communications are needed. Parallelization is achieved through logical data organization. The Pandore system provides the user with a means to specify data partitioning and data distribution over a domain of virtual processors for each parallel step of his algorithm. At compile time, Pandore splits the original program into parallel processes. Each process will execute some appropriate parts of the original code, according to the given data decomposition. In order to achieve a correct utilization of the data structures distributed over the processors, the Pandore system provides an execution scheme based on a communication layer, which is an abstraction of a message-passing architecture. This intermediate level is then implemented using the effective primitives of the real architecture (in our specific case, an Intel iPSC/2).

Programming distributed memory architectures using Kali

Piyush Mehrotra¹, John Vanrosendale•Institutions (1)

Purdue University¹

1 Oct 1990

TL;DR: The paper presents a new programming environment, Kali, which provides a global name space and allows direct access to remote data values, along with a system of annotations that lets the user control those aspects of the program critical to performance, such as data distribution and load balancing.

Abstract: Programming nonshared memory systems is more difficult than programming shared memory systems, in part because of the relatively low level of current programming environments for such machines. A new programming environment is presented, Kali, which provides a global name space and allows direct access to remote data values. In order to retain efficiency, Kali provides a system of annotations, allowing the user to control those aspects of the program critical to performance, such as data distribution and load balancing. The primitives and constructs provided by the language are described, and some of the issues raised in translating a Kali program for execution on distributed memory systems are also discussed.

Proceedings Article•10.1109/DMCC.1990.556323•

An Automatic and Symbolic Parallelization System for Distributed Memory Parallel Computers

K. Ikudome¹, Geoffrey C. Fox, A. Kolawa, J.W. Flower•Institutions (1)

California Institute of Technology¹

8 Apr 1990

TL;DR: ASPAR (Automatic and Symbolic PARallelization) is described, which consists of a source-to-source parallelizer and a set of interactive graphic tools and is designed for easy modification for other languages such as Fortran.

Abstract: This paper describes ASPAR (Automatic and Symbolic PARallelization) which consists of a source-to-source parallelizer and a set of interactive graphic tools. While the issues of data dependency have already been explored and used in many parallel computer systems such as vector and shared memory machines, distributed memory parallel computers require, in addition, explicit data decomposition. New symbolic analysis and data-dependency analysis methods are used to determine an explicit data decomposition scheme. Automatic parallelization models using high level communications are also described in this paper. The target applications are of the “regular-mesh" type typical of many scientific calculations. The system has been implemented for the language C, and is designed for easy modification for other languages such as Fortran.

Patent•

Method and apparatus for exploiting communications bandwidth as for providing shared memory

Daniel Manuel Dias¹, Balakrishna R. Iyer¹•Institutions (1)

IBM¹

8 Nov 1990

TL;DR: In this article, a multiprocessor system linked by a fiber optic ring network uses some of the bandwidth of the ring network as a shared memory resource, which can carry message packets from one processor to another or network memory packets which circulate indefinitely on the network.

Abstract: A multiprocessor system linked by a fiber optic ring network uses some of the bandwidth of the ring network as a shared memory resource. Data slots are defined on the network which can carry message packets from one processor to another or network memory packets which circulate indefinitely on the network. One use of these network memory packets is as a lock management system for controlling concurrent access to a shared database by the multiple processors. The network memory packets are treated as lock entities. A processor indicates that it wants to procure a lock entity by circulating a packet, having a first network memory type, around the network. If no conflicting packets are detected when the circulated packet returns, the type of the slot is changed to a second network memory type indicating a procured lock entity.
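
The lock procedure in this abstract can be illustrated with a small single-threaded simulation. This is only a sketch of the idea as the abstract states it, not the patented mechanism: the slot layout, the class and method names, and the decision to drop a request on conflict are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

REQUEST = "request"  # stands in for the first network memory type (lock wanted)
HELD = "held"        # stands in for the second network memory type (lock procured)


@dataclass
class Packet:
    kind: str
    lock_id: str
    owner: int


class Ring:
    """Hypothetical ring of data slots; None marks an empty slot."""

    def __init__(self, n_slots: int) -> None:
        self.slots: List[Optional[Packet]] = [None] * n_slots

    def try_acquire(self, owner: int, lock_id: str) -> bool:
        if None not in self.slots:
            return False  # no free slot to carry the request
        idx = self.slots.index(None)
        self.slots[idx] = Packet(REQUEST, lock_id, owner)
        # Simulate one trip around the ring: look for a conflicting packet.
        conflict = any(
            p is not None and i != idx and p.lock_id == lock_id
            for i, p in enumerate(self.slots)
        )
        if conflict:
            self.slots[idx] = None  # assumed policy: withdraw the request
            return False
        self.slots[idx].kind = HELD  # request returned unchallenged
        return True

    def release(self, owner: int, lock_id: str) -> bool:
        for i, p in enumerate(self.slots):
            if p and p.kind == HELD and p.owner == owner and p.lock_id == lock_id:
                self.slots[i] = None
                return True
        return False
```

A real ring would have several requests in flight at once, so two processors could see each other's request packets; how such ties are broken is not stated in the abstract and is not modeled here.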

Journal Article•10.1016/0020-0190(90)90103-5•

The processor identity problem

Richard J. Lipton¹, A. Park²•Institutions (2)

Princeton University¹, University of California, Davis²

02 Oct 1990-Information Processing Letters

TL;DR: A probabilistic protocol is presented that solves the Processor Identity Problem for asynchronous processors that communicate through a common shared memory, and that simplifies shared memory processor design by eliminating the need to encode processor identifiers in system hardware or software structures.

Journal Article•10.1007/BF01901067•

A highly flexible multiprocessor solution for ray tracing

Stuart A. Green¹, Derek J. Paddon¹•Institutions (1)

University of Bristol¹

01 Mar 1990-The Visual Computer

TL;DR: A general-purpose multiprocessor solution for ray tracing which may be used to reduce execution time without restricting development of the ray tracing code is described.

Abstract: The ray tracing algorithm continues to attract much research and development to improve the quality of the images that are generated, and to reduce the time taken to produce them. By identifying the key requirements of a development system from the user's point of view, we describe a general-purpose multiprocessor solution for ray tracing which may be used to reduce execution time without restricting development of the ray tracing code. The solution is based upon a distributed memory multiprocessor system in which each processor addresses a small amount of memory relative to the size of the model database. Methods for exploiting the coherence of references to entries in the database are described which use a combination of dynamic and static caching techniques. This scheme allows databases of arbitrary size to be supported on multiprocessors with limited distributed memory.

Journal Article•10.1016/0167-8191(90)90032-5•

Finding the roots of a polynomial on an MIMD multicomputer

Michel Cosnard¹, Pierre Fraigniaud¹•Institutions (1)

École normale supérieure de Lyon¹

1 Sep 1990

TL;DR: It is shown that, among the classical processor network topologies (ring, 2D torus, or n-cube), the hypercube topology minimizes the communications.

Abstract: This paper introduces the parallelization on a distributed memory multicomputer of two iterative methods for finding all the roots of a given polynomial. The parallel algorithms share the computation of the roots among the processors and perform a total exchange of the data at each step. Since the amount of communication is the main drawback of this approach, we study the effect of the network topology on the performance of the algorithms. In particular, we show that, among the classical processor network topologies (ring, 2D torus, or n-cube), the hypercube topology minimizes the communications. The optimal number of processors is computed for each topology. Experiments on the hypercube FPS T40 illustrate the results.
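
The pattern in this abstract, where each processor updates its share of the roots and all processors then exchange results, can be sketched as follows. The abstract does not name the two iterative methods, so the Durand-Kerner iteration is used here purely as a stand-in. The processors are simulated sequentially, and the cyclic assignment of roots, the starting values, and the function name are assumptions.

```python
from typing import List, Sequence


def roots_by_total_exchange(
    coeffs: Sequence[complex], n_procs: int, iters: int = 100
) -> List[complex]:
    """Approximate all roots of a monic polynomial (highest degree first)."""
    n = len(coeffs) - 1
    roots = [(0.4 + 0.9j) ** k for k in range(n)]  # conventional distinct starts
    owned = [range(p, n, n_procs) for p in range(n_procs)]  # cyclic split

    def p_eval(z: complex) -> complex:
        acc = 0j
        for c in coeffs:  # Horner's rule
            acc = acc * z + c
        return acc

    for _ in range(iters):
        updates = []
        for proc in range(n_procs):
            # Local phase: each simulated processor updates only its own roots,
            # reading the full root vector from the previous step.
            local = {}
            for i in owned[proc]:
                denom = 1 + 0j
                for j in range(n):
                    if j != i:
                        denom *= roots[i] - roots[j]
                local[i] = roots[i] - p_eval(roots[i]) / denom
            updates.append(local)
        # Total exchange: every processor receives every other processor's roots.
        new_roots = list(roots)
        for local in updates:
            for i, z in local.items():
                new_roots[i] = z
        roots = new_roots
    return roots
```

Each step needs the complete root vector, which is why the exchange cost, and hence the network topology, dominates as the number of processors grows.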

Book•

The impact of vector and parallel architectures on the Gaussian elimination algorithm

Yves Robert¹•Institutions (1)

École Normale Supérieure¹

1 Jan 1990

TL;DR: This book presents three case studies of Gaussian elimination on vector and parallel architectures, a task system for Gaussian elimination, and design methodologies for systolic arrays (dependence mapping method, complexity results, folding).

Abstract: Introduction: background - Gaussian elimination, speedup and efficiency vector and parallel architectures: pipeline computers vector computers parallel computers three case studies. Part 1 Parallel algorithm design - vector multiprocessor computing - vectorization of vector-vector operations, Gaussian elimination in terms of vector-vector kernels, vector register re-use, Gaussian elimination in terms of matrix-vector kernels, cache re-use, Gaussian elimination in terms of matrix-matrix kernels, vectorization epilogue, fine-grain parallelism, parallel Gaussian elimination hypercube computing - topological properties of hypercubes, broadcasting, centralized Gaussian elimination, local pipelined algorithms, a word on speedup evaluation, matrices over finite fields systolic computing - 2D arrays, solving the triangular system on the fly, 1D arrays, matrices over finite fields. Part 2 Models and tools: task graph scheduling - task system for Gaussian elimination, bounds for parallel execution, an optimal schedule, with an arbitrary number of processors analysis of distributed algorithms - data allocation strategies, speedup evaluation on distributed memory machines design methodologies for systolic arrays - dependence mapping method, complexity results, folding.

Patent•

Fault-tolerant digital computing system with reduced memory redundancy

Scott Gray¹, Steven R. Thompson¹•Institutions (1)

Honeywell¹

10 Apr 1990

TL;DR: In this paper, a linear block code error detection scheme is implemented with each shared memory, wherein the effect of random memory faults is sufficiently detected such that the inherent fault tolerance of a pair-spare architecture is not compromised.

Abstract: A highly reliable data processing system using the pair-spare architecture obviates the need for separate memory arrays for each processor. A single memory is shared between each pair of processors wherein a linear block code error detection scheme is implemented with each shared memory, wherein the effect of random memory faults is sufficiently detected such that the inherent fault tolerance of a pair-spare architecture is not compromised.
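
As a small illustration of error detection with a linear block code, the sketch below uses the (7,4) Hamming code as a detector only. The abstract does not say which code the patent uses, so the choice of code, the word width, and the function names are assumptions made for illustration.

```python
from typing import List

# Parity-check matrix of the (7,4) Hamming code: column k (1-based) is the
# binary representation of k, least significant bit in the first row.
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def encode(data: List[int]) -> List[int]:
    """Encode 4 data bits into a 7-bit codeword (parity bits at positions 1, 2, 4)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]


def syndrome(word: List[int]) -> List[int]:
    """Multiply the stored word by H over GF(2); all zeros means no fault seen."""
    return [sum(h * w for h, w in zip(row, word)) % 2 for row in H]


def fault_detected(word: List[int]) -> bool:
    """Detection only: report a memory fault without attempting correction."""
    return any(syndrome(word))
```

Used this way the code flags every one-bit and two-bit corruption of a stored word, because its minimum distance is 3; some corruptions of three or more bits map to another valid codeword and go unnoticed.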

Journal Article•10.2118/19804-PA•

Reservoir Simulation on a Hypercube

John A. Wheeler¹, Richard A. Smith¹•Institutions (1)

ExxonMobil¹

01 Nov 1990-SPE Reservoir Engineering

TL;DR: Tests of a 3D parallel implicit reservoir simulator on an Intel iPSC/2 hypercube with 16 vector processors are presented, demonstrating that up to 96% of the available CPU time on the hypercube can be used.

Abstract: Tests of a 3D parallel implicit reservoir simulator on an Intel iPSC/2 hypercube with 16 vector processors are presented. The simulator is based on an oil/water model. A correlation of computation efficiency with problem size and the number of processors demonstrates that up to 96% of the available CPU time on the hypercube can be used. Such high efficiencies were achieved by developing special algorithms well suited for multiple processors and distributed memory.

Patent•

Distributed data driven process

Chao-Kuang Pian, Minh-Tram D. Nguyen, Theodore E. Posch, Jeffrey E. Juhre

14 Jun 1990

TL;DR: In this article, a data driven method for coordinating the processing of arithmetic tasks in a multiple computer system having a multiplicity of arithmetic processors by determining whether an arithmetic task is in a blocked condition or is in an execution ready condition is presented.

Abstract: A data driven method for coordinating the processing of arithmetic tasks in a multiple computer system having a multiplicity of arithmetic processors by determining whether an arithmetic task is in a blocked condition or is in an execution ready condition. A source distributed processor stores data in a local memory for processing by a local processor and then transfers the processed data to a global memory for buffering in preparation for subsequent processing by a destination distributed processor. The source distributed processor generates a produce message to a destination distributed processor to indicate that the data to be transferred is available in a buffer in the global memory. The destination distributed processor loads the data to be transferred from the buffer in the global memory and then generates a consume message to the source distributed processor to indicate that the data has been transferred from the global memory and the buffer in the global memory is now available.
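
The produce/consume handshake described in this abstract can be sketched as a single-threaded simulation. The class and function names, the single-buffer channel, and the use of queues for the two message directions are assumptions for illustration, not details taken from the patent.

```python
from collections import deque
from typing import Any, Callable, Optional


class Channel:
    """One buffer in global memory plus the two message queues (hypothetical layout)."""

    def __init__(self) -> None:
        self.buffer: Optional[Any] = None
        self.buffer_free = True
        self.to_destination: deque = deque()  # carries "produce" messages
        self.to_source: deque = deque()       # carries "consume" messages


def source_step(ch: Channel, local_data: Any, process: Callable[[Any], Any]) -> bool:
    """Run the source task if it is execution ready; return False if blocked."""
    while ch.to_source:  # a consume message frees the global buffer
        ch.to_source.popleft()
        ch.buffer_free = True
    if not ch.buffer_free:
        return False  # blocked: previous data has not been consumed yet
    ch.buffer = process(local_data)  # process locally, then copy to global memory
    ch.buffer_free = False
    ch.to_destination.append("produce")
    return True


def destination_step(ch: Channel) -> Optional[Any]:
    """Run the destination task if a produce message is waiting; else return None."""
    if not ch.to_destination:
        return None  # blocked: nothing has been produced
    ch.to_destination.popleft()
    data = ch.buffer  # load from the buffer in global memory
    ch.buffer = None
    ch.to_source.append("consume")
    return data
```

Whether a task is blocked or execution ready is decided entirely by which messages have arrived, which is the data driven aspect the abstract describes.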

Proceedings Article•

A Comparison of Programming Models for Shared Memory Multiprocessors.

Calvin Lin, Lawrence Snyder

1 Jan 1990

Proceedings Article•10.1109/SPDP.1990.143621•

Multi-version memory: software cache management for concurrent B-trees

W.E. Weihl¹, P. Wang¹•Institutions (1)

Massachusetts Institute of Technology¹

2 Dec 1990

TL;DR: The authors describe a new concurrent B-tree algorithm designed to work well in large-scale parallel or distributed systems in which the number of processors sharing the tree is large or the communication delay between processors is large relative to the speed of local computation.

Abstract: The authors describe a new concurrent B-tree algorithm. The algorithm is designed to work well in large-scale parallel or distributed systems in which the number of processors sharing the tree is large or the communication delay between processors (or between processors and the global memory for a shared-memory system) is large relative to the speed of local computation. The basis of the algorithm is an abstraction that is similar to coherent shared memory, but provides a weaker semantics; this abstraction is called multi-version memory. Multi-version memory uses caches but weakens the semantics of ordinary shared memory by allowing a process reading data to be given an old version of the data. This semantics is adequate for the non-leaf nodes in the B-tree algorithms presented.
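
The weakened read semantics can be illustrated with a small Python sketch. It is a toy model under assumptions, not the authors' design: the classes `MultiVersionMemory` and `CachingReader` and the `min_version` parameter are hypothetical, and the abstract does not say how a reader asks for fresher data.

```python
class MultiVersionMemory:
    """Authoritative store that tags every value with a version number."""

    def __init__(self):
        self._store = {}  # key -> (version, value)

    def write(self, key, value):
        version = self._store.get(key, (0, None))[0] + 1
        self._store[key] = (version, value)
        return version

    def read_current(self, key):
        return self._store[key]


class CachingReader:
    """Reader-side cache that is allowed to return an old version."""

    def __init__(self, memory):
        self._memory = memory
        self._cache = {}  # key -> (version, value), possibly stale

    def read(self, key, min_version=0):
        cached = self._cache.get(key)
        if cached is not None and cached[0] >= min_version:
            return cached  # may be stale; no round trip to the shared store
        fresh = self._memory.read_current(key)  # cache miss or version too old
        self._cache[key] = fresh
        return fresh
```

A reader that has cached version 1 of a key keeps seeing version 1 after a later write, unless it explicitly requests a newer version; that is the kind of staleness the abstract says is acceptable for non-leaf B-tree nodes.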

Patent•

Method and apparatus for circuit simulation using parallel processors including memory arrangements and matrix decomposition synchronization

Gabriel P. Bischoff, Steven S. Greenberg

23 Apr 1990

TL;DR: A digital data processing system with a plurality of processors runs a program in parallel to load process data into a two-dimensional matrix having a plurality of matrix entries. Because each processor can separately generate process data for different matrix entries from the preliminary data, the processors do not conflict in accessing memory locations during generation of the process data.

Abstract: A digital data processing system including a plurality of processors processes a program in parallel to load process data into a two-dimensional matrix having a plurality of matrix entries. So that the processors will not have to synchronize loading of process data into particular locations in the matrix, the matrix has a third dimension defining a plurality of memory locations, with each series of locations along the third dimension being associated with one of the matrix entries. Each processor initially loads preliminary process data into a memory location along the third dimension. After that has been completed, each processor generates process data for an entry of the two-dimensional matrix from the preliminary process data in the locations along the third dimension related thereto. Since the processors separately load preliminary process data into different memory locations along the third dimension, there is no conflict in accessing memory locations among the various processors during generation of the preliminary process data. Further, since the processors can separately generate process data for different matrix entries from the preliminary data, there is no conflict in accessing the memory locations among the various processors during generation of the process data.
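
The two-phase scheme in this abstract can be sketched as follows. This is a minimal illustration under assumptions, not the patented apparatus: threads stand in for processors, each worker owns one slot along the third dimension, the final entry is assumed to be the sum of its slots, and the function name `load_matrix` and the row-striped ownership in the second phase are hypothetical.

```python
import threading


def load_matrix(rows, cols, contributions_per_worker):
    """contributions_per_worker[w] is a list of (row, col, value) tuples for worker w."""
    workers = len(contributions_per_worker)
    # Third dimension: one private slot per worker for every matrix entry.
    scratch = [[[0.0] * workers for _ in range(cols)] for _ in range(rows)]
    matrix = [[0.0] * cols for _ in range(rows)]

    def load_phase(w):
        # Phase 1: worker w writes only to slot w, so no locking is needed.
        for r, c, v in contributions_per_worker[w]:
            scratch[r][c][w] += v

    def reduce_phase(w):
        # Phase 2: each worker owns a distinct set of rows (assumed striping).
        for r in range(w, rows, workers):
            for c in range(cols):
                matrix[r][c] = sum(scratch[r][c])  # assumed combination rule: sum

    for phase in (load_phase, reduce_phase):
        threads = [threading.Thread(target=phase, args=(w,)) for w in range(workers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()  # acts as the barrier between the two phases
    return matrix
```

The only synchronization is the join between phases; within a phase no two workers ever write the same memory location, which is the property the abstract emphasizes.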

Book Chapter•10.1007/3-540-54195-0_63•

PARSAC-2: A Parallel SAC-2 Based on Threads

Wolfgang Kuechlin¹•Institutions (1)

Ohio State University¹

20 Aug 1990-Applicable Algebra in Engineering, Communication and Computing

TL;DR: It is demonstrated that S-threads permit a parallelization of SAC-2 down to the lowest algebraic level, and how a key parameter of the S-threads memory design influences parallel performance is shown.

Abstract: We describe the design of PARSAC-2, a parallel version of the SAC-2 Computer Algebra system. In PARSAC-2, parallelism is based on multiple threads (lightweight processes) executing on a shared memory multiprocessor. The S-threads subsystem provides threads which are capable of parallel list processing on a shared heap. The S-threads heap memory is designed to allow concurrent list cell allocation by multiple threads with minimal synchronization overhead. S-threads may also perform parallel garbage collection, and a slightly weaker form of storage management called preventive garbage collection. We present an example of algorithm development in PARSAC by parallelizing the SAC-2 algorithm IPRODK, an integer multiplication routine based on Karatsuba's method. Using empirical data from this experiment, we demonstrate that S-threads permit a parallelization of SAC-2 down to the lowest algebraic level. Finally, we show how a key parameter of the S-threads memory design influences parallel performance.
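
Karatsuba's method lends itself to parallelization because its three sub-products are independent. The sketch below shows that structure in Python; it is not IPRODK or the S-threads API. The function names, the `cutoff_bits` parameter, and the use of a thread pool for only the top-level split are assumptions, inputs are assumed to be non-negative integers, and CPython threads will not give a real speedup here.

```python
from concurrent.futures import ThreadPoolExecutor


def _split(x, y):
    """Split both operands at half the larger bit length."""
    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    return half, (x >> half, x & mask), (y >> half, y & mask)


def karatsuba(x, y, cutoff_bits=64):
    """Sequential Karatsuba product of two non-negative integers."""
    if x.bit_length() <= cutoff_bits or y.bit_length() <= cutoff_bits:
        return x * y  # small operands: fall back to the built-in product
    half, (x_hi, x_lo), (y_hi, y_lo) = _split(x, y)
    low = karatsuba(x_lo, y_lo, cutoff_bits)
    high = karatsuba(x_hi, y_hi, cutoff_bits)
    mid = karatsuba(x_lo + x_hi, y_lo + y_hi, cutoff_bits) - low - high
    return (high << (2 * half)) + (mid << half) + low


def parallel_karatsuba(x, y):
    """Run the three independent top-level sub-products as separate tasks."""
    half, (x_hi, x_lo), (y_hi, y_lo) = _split(x, y)
    with ThreadPoolExecutor(max_workers=3) as pool:
        low_f = pool.submit(karatsuba, x_lo, y_lo)
        high_f = pool.submit(karatsuba, x_hi, y_hi)
        cross_f = pool.submit(karatsuba, x_lo + x_hi, y_lo + y_hi)
        low, high = low_f.result(), high_f.result()
        mid = cross_f.result() - low - high
    return (high << (2 * half)) + (mid << half) + low
```

Three multiplications of half-size operands replace the four of the schoolbook split, and none of the three depends on another, so each can be handed to its own thread.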

Proceedings Article•10.1109/ICDE.1990.113506•

Update propagation in distributed memory hierarchy

Matthew Bellew¹, Meichun Hsu¹, V.-O. Tam¹•Institutions (1)

Harvard University¹

5 Feb 1990

TL;DR: A DMH system is presented, the tradeoffs between conservative and aggressive update propagation strategies are defined, and promising new strategies are identified.

Abstract: A distributed memory hierarchy (DMH) is a memory system consisting of storage modules distributed over a high-bandwidth local area network. It provides transaction applications with the abstraction of a single virtual memory space to which shared data are mapped. As in a conventional memory hierarchy (MH) in a single-machine system, a DMH is responsible for locating, migrating, and caching data pages; however, unlike a conventional MH, a DMH must do so across the storage modules in a network. In addition, a DMH must propagate transaction updates while preserving the serializability of transactions. The performance of a DMH system is strongly influenced by concurrency control and update propagation. It is also crucial that performance analysis accounts for memory resources and network requirements. A DMH system is presented, the tradeoffs between conservative and aggressive update propagation strategies are defined, and promising new strategies are identified.
