Scispace (Formerly Typeset)
Showing papers on "Distributed memory" published in 1990
Proceedings Article•10.1145/285930.285997•
Memory consistency and event ordering in scalable shared-memory multiprocessors

Kourosh Gharachorloo1, Daniel E. Lenoski1, James Laudon1, Phillip B. Gibbons1, Anoop Gupta1, John L. Hennessy1 •
Stanford University1
1 May 1990
TL;DR: A new model of memory consistency, called release consistency, that allows for more buffering and pipelining than previously proposed models is introduced and is shown to be equivalent to the sequential consistency model for parallel programs with sufficient synchronization.
Abstract: Scalable shared-memory multiprocessors distribute memory among the processors and use scalable interconnection networks to provide high bandwidth and low latency communication. In addition, memory accesses are cached, buffered, and pipelined to bridge the gap between the slow shared memory and the fast processors. Unless carefully controlled, such architectural optimizations can cause memory accesses to be executed in an order different from what the programmer expects. The set of allowable memory access orderings forms the memory consistency model or event ordering model for an architecture. This paper introduces a new model of memory consistency, called release consistency, that allows for more buffering and pipelining than previously proposed models. A framework for classifying shared accesses and reasoning about event ordering is developed. The release consistency model is shown to be equivalent to the sequential consistency model for parallel programs with sufficient synchronization. Possible performance gains from the less strict constraints of the release consistency model are explored. Finally, practical implementation issues are discussed, concentrating on issues relevant to scalable architectures.

1,275 citations
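The ordering guarantee at the heart of release consistency can be illustrated with a toy simulation. This is a sketch under stated assumptions, not the paper's formal model: `SharedMemory` and `Processor` are hypothetical names, ordinary writes are drained only at a release (real hardware may perform them earlier and in any order), and the program is assumed to be properly labeled, so the consumer reads `data` only after an acquire that observes the producer's release.

```python
# Toy model of release consistency (illustrative assumption, not the
# paper's formal definition): ordinary writes may wait in a per-processor
# write buffer, but a release is performed only after every earlier write.

class SharedMemory:
    def __init__(self):
        self.cells = {}


class Processor:
    def __init__(self, memory):
        self.memory = memory
        self.write_buffer = []  # ordinary writes not yet visible to others

    def write(self, addr, value):
        # Ordinary write: buffered, so other processors may not see it yet.
        self.write_buffer.append((addr, value))

    def read(self, addr):
        # A processor always sees its own latest write (buffer first).
        for a, v in reversed(self.write_buffer):
            if a == addr:
                return v
        return self.memory.cells.get(addr)

    def release(self, addr, value):
        # Release: perform all earlier buffered writes first ...
        for a, v in self.write_buffer:
            self.memory.cells[a] = v
        self.write_buffer.clear()
        # ... and only then make the synchronizing write visible.
        self.memory.cells[addr] = value

    def acquire(self, addr):
        # Acquire: a synchronizing read; in this sequential toy model the
        # accesses that follow it trivially happen after it.
        return self.memory.cells.get(addr)


mem = SharedMemory()
producer, consumer = Processor(mem), Processor(mem)

producer.write("data", 42)
print(consumer.read("data"))      # None: the write is still buffered
producer.release("flag", 1)
if consumer.acquire("flag") == 1:
    print(consumer.read("data"))  # 42: visible once the release is observed
```

Draining only at the release is the most relaxed behaviour the model allows for ordinary writes; any earlier draining would also be legal.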

Proceedings Article•10.1145/99163.99182•
Munin: distributed shared memory based on type-specific memory coherence

John K. Bennett1, John B. Carter1, Willy Zwaenepoel1•
Rice University1
1 Feb 1990
TL;DR: This paper focuses on the design and use of Munin's memory coherence mechanisms, and compares the approach to previous work in this area.
Abstract: We are developing Munin, a system that allows programs written for shared memory multiprocessors to be executed efficiently on distributed memory machines. Munin attempts to overcome the architectural limitations of shared memory machines, while maintaining their advantages in terms of ease of programming. Our system is unique in its use of loosely coherent memory, based on the partial order specified by a shared memory parallel program, and in its use of type-specific memory coherence. Instead of a single memory coherence mechanism for all shared data objects, Munin employs several different mechanisms, each appropriate for a different class of shared data object. These type-specific mechanisms are part of a runtime system that accepts hints from the user or the compiler to determine the coherence mechanism to be used for each object. This paper focuses on the design and use of Munin's memory coherence mechanisms, and compares our approach to previous work in this area.

466 citations

Proceedings Article•10.1109/ICDCS.1990.89257•
Real-time synchronization protocols for shared memory multiprocessors

Ragunathan Rajkumar1•
IBM1
1 Jan 1990
TL;DR: A priority-based synchronization protocol that explicitly uses shared-memory primitives is defined and analyzed; the underlying priority considerations for a shared memory synchronization protocol are studied, and priority assignments to be used by the protocol are derived.
Abstract: A priority-based synchronization protocol that explicitly uses shared-memory primitives is defined and analyzed. A solution that has been proposed for bounding and minimizing synchronization delays in real-time systems is briefly reviewed. The waiting times introduced by synchronization requirements in multiple-processor environments are identified, and a set of goals for priority-based multiprocessor synchronization protocols is derived. The underlying priority considerations for a shared memory synchronization protocol are studied and priority assignments to be used by the protocol are derived.

259 citations
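The abstract reviews, without naming it, an existing solution for bounding synchronization delays. Assuming that solution is the uniprocessor priority ceiling protocol (an assumption made here only for illustration), its central locking rule can be sketched as follows. The task set `TASKS` and the helper names are hypothetical, and the multiprocessor protocol defined in the paper is not reproduced.

```python
# Sketch of the uniprocessor priority ceiling locking rule.
# Higher numbers mean higher priority.

# Hypothetical task set: task name -> (priority, semaphores it may lock)
TASKS = {
    "high": (3, {"S1"}),
    "mid":  (2, {"S2"}),
    "low":  (1, {"S1", "S2"}),
}


def ceilings(tasks):
    """Ceiling of a semaphore = highest priority of any task that may lock it."""
    ceil = {}
    for prio, sems in tasks.values():
        for s in sems:
            ceil[s] = max(ceil.get(s, 0), prio)
    return ceil


def may_lock(task, held_by, tasks):
    """A task may lock a free semaphore only if its priority is strictly
    higher than the ceilings of all semaphores held by other tasks."""
    ceil = ceilings(tasks)
    prio = tasks[task][0]
    others = [ceil[s] for s, owner in held_by.items() if owner != task]
    return all(prio > c for c in others)


print(may_lock("mid", {"S1": "low"}, TASKS))  # False: 2 is not above S1's ceiling 3
print(may_lock("mid", {}, TASKS))             # True: nothing else is locked
```

Blocking "mid" even though S2 is free is what prevents chained blocking and deadlock in the ceiling scheme.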

Journal Article•10.1109/2.53355•
Algorithms implementing distributed shared memory

Michael Stumm1, S. Zhou1•
University of Toronto1
01 May 1990-IEEE Computer
TL;DR: It is shown that the correct choice of algorithm is determined largely by the memory access behavior of the applications, and some limitations of distributed shared memory are noted.
Abstract: Four basic algorithms for implementing distributed shared memory are compared. Conceptually, these algorithms extend local virtual address spaces to span multiple hosts connected by a local area network, and some of them can easily be integrated with the hosts' virtual memory systems. The merits of distributed shared memory and the assumptions made with respect to the environment in which the shared memory algorithms are executed are described. The algorithms are then described, and a comparative analysis of their performance in relation to application-level access behavior is presented. It is shown that the correct choice of algorithm is determined largely by the memory access behavior of the applications. Two particularly interesting extensions of the basic algorithms are described, and some limitations of distributed shared memory are noted.

258 citations
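Stumm and Zhou's four algorithms are usually summarized as central-server, migration, read-replication and full-replication. A minimal sketch of the read-replication idea follows, assuming a single simulated address space, page-granularity copies, and a rough message count (2 messages per copy fetch, 1 per invalidation). `ReadReplicationDSM` is a hypothetical name, not code from the paper.

```python
# Read-replication sketch: many hosts may hold read-only copies of a page;
# a write first invalidates every other copy. A "host" is just a name here.

class ReadReplicationDSM:
    def __init__(self, num_pages):
        self.data = {pg: 0 for pg in range(num_pages)}         # latest page contents
        self.copyset = {pg: set() for pg in range(num_pages)}  # hosts with a valid copy
        self.messages = 0                                      # rough network traffic

    def _fetch_if_missing(self, host, page):
        if host not in self.copyset[page]:
            self.messages += 2            # request a copy, receive it
            self.copyset[page].add(host)

    def read(self, host, page):
        self._fetch_if_missing(host, page)
        return self.data[page]

    def write(self, host, page, value):
        self._fetch_if_missing(host, page)
        others = self.copyset[page] - {host}
        self.messages += len(others)      # one invalidation per other copy
        self.copyset[page] = {host}       # the writer now holds the only copy
        self.data[page] = value


dsm = ReadReplicationDSM(num_pages=1)
dsm.read("A", 0)
dsm.read("B", 0)
dsm.write("A", 0, 5)                      # invalidates B's copy
print(dsm.read("B", 0), dsm.messages)     # 5 7
```

Read-heavy pages favour this scheme because repeated reads are local; write-heavy pages pay for an invalidation round on every write, which matches the abstract's point that the right algorithm depends on the access behaviour.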

Journal Article•10.1016/0743-7315(90)90129-D•
Run-time scheduling and execution of loops on message passing machines

Joel H. Saltz1, Kathleen Crowley1,2, Ravi Mirchandaney1,2, Harry Berryman1 •
Langley Research Center1, Yale University2
01 Apr 1990-Journal of Parallel and Distributed Computing
TL;DR: This work examines the effectiveness of optimizations aimed at allowing distributed memory machines to efficiently compute inner loops over globally defined data structures, targeting loops in which some array references are made through a level of indirection.

200 citations
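A common run-time technique for loops whose references go through an indirection array is to split the loop into an inspector, which runs once to work out which off-processor elements are needed and from whom, and an executor, which reuses that schedule on every execution. The sketch below assumes a block distribution of `x`, simulates distributed memory with one global list, and uses hypothetical function names; it illustrates the technique rather than the paper's implementation.

```python
# Inspector/executor sketch for the loop  y[k] = x[ia[k]]  where x is
# block-distributed over p processors. Distributed memory is simulated:
# x_global stands in for the union of all processors' local pieces.

def owner(g, n, p):
    """Processor owning global index g under a block distribution."""
    block = -(-n // p)  # ceiling division: elements per processor
    return g // block


def inspector(rank, ia, n, p):
    """Run once: list the off-processor indices this loop will touch,
    grouped by owning processor and without duplicates."""
    schedule = {}
    for g in ia:
        o = owner(g, n, p)
        if o != rank:
            schedule.setdefault(o, set()).add(g)
    return {o: sorted(gs) for o, gs in schedule.items()}


def executor(rank, ia, x_global, n, p, schedule):
    """Run every time the loop executes: gather remote values with one
    batched transfer per owner, then compute from local data only."""
    ghost = {}
    for o, gs in schedule.items():
        # Stands in for a single message from processor o.
        ghost.update((g, x_global[g]) for g in gs)
    return [x_global[g] if owner(g, n, p) == rank else ghost[g] for g in ia]


n, p = 8, 2
x = [10 * g for g in range(n)]  # processor 0 owns x[0..3], processor 1 owns x[4..7]
ia = [7, 1, 6, 6, 0, 3]         # indirection array on processor 0
sched = inspector(0, ia, n, p)
print(sched)                            # {1: [6, 7]}
print(executor(0, ia, x, n, p, sched))  # [70, 10, 60, 60, 0, 30]
```

The duplicate reference to `x[6]` is fetched once, and the schedule can be reused as long as `ia` does not change.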

Proceedings Article•10.1145/99163.99183•
Supporting shared data structures on distributed memory architectures

C. Koelbel1, Piyush Mehrotra2, J. Van Rosendale2•
Purdue University1, Langley Research Center2
1 Feb 1990
TL;DR: A new programming environment for distributed memory architectures is presented that provides a global name space and allows direct access to remote parts of data values; the efficiency of the resulting code on the NCUBE/7 and iPSC/2 hypercubes is reported.
Abstract: Programming nonshared memory systems is more difficult than programming shared memory systems, since there is no support for shared data structures. Current programming languages for distributed memory architectures force the user to decompose all data structures into separate pieces, with each piece “owned” by one of the processors in the machine, and with all communication explicitly specified by low-level message-passing primitives. This paper presents a new programming environment for distributed memory architectures, providing a global name space and allowing direct access to remote parts of data values. We describe the analysis and program transformations required to implement this environment, and present the efficiency of the resulting code on the NCUBE/7 and iPSC/2 hypercubes.

178 citations
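Supporting a global name space on a distributed memory machine requires translating each global array index into an owning processor and a local offset. The sketch below assumes two common distributions, block and cyclic, purely for illustration; the paper's own distributions may differ, and the function names are hypothetical.

```python
# Global-to-local index translation for two common distributions.

def block_map(g, n, p):
    """Block: contiguous chunks of ceil(n/p) elements per processor."""
    size = -(-n // p)
    return g // size, g % size        # (owner, local index)


def block_unmap(owner, local, n, p):
    size = -(-n // p)
    return owner * size + local


def cyclic_map(g, p):
    """Cyclic: element g goes to processor g mod p."""
    return g % p, g // p


def cyclic_unmap(owner, local, p):
    return local * p + owner


n, p = 10, 4
print([block_map(g, n, p) for g in range(4)])  # [(0, 0), (0, 1), (0, 2), (1, 0)]
print([cyclic_map(g, p) for g in range(4)])    # [(0, 0), (1, 0), (2, 0), (3, 0)]
```

A reference to a remote element then becomes a message to the owner asking for its local index, which is the kind of translation such an environment has to generate or perform at run time.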

Journal Article•10.1364/AO.29.002058•
Potentials of two-photon based 3-D optical memories for high performance computing

Susan Hunter1, Fouad Kiamilev1, Sadik C. Esener1, Dimitri A. Parthenopoulos2, Peter M. Rentzepis2 •
University of California, San Diego1, University of California, Irvine2
10 May 1990-Applied Optics
TL;DR: An optical volume memory based on the two-photon effect is proposed; it allows high density and parallel access, and its high capacity and throughput may overcome the disadvantages of current memories.
Abstract: The advent of optoelectronic computers and highly parallel electronic processors has brought about a need for storage systems with enormous memory capacity and memory bandwidth. These demands cannot be met with current memory technologies (i.e., semiconductor, magnetic, or optical disk) without having the memory system completely dominate the processors in terms of the overall cost, power consumption, volume, and weight. As a solution, we propose an optical volume memory based on the two-photon effect which allows for high density and parallel access. In addition, the two-photon 3-D memory system has the advantages of having high capacity and throughput which may overcome the disadvantages of current memories.

149 citations

Patent•
Method and apparatus for independently resetting processors and cache controllers in multiple processor systems

David A. Miller, Kenneth A. Jansen, Paul R. Culley, Mark E. Taylor, Javier F. Izquierdo 
24 Oct 1990
TL;DR: A method and system are presented for independently resetting primary and secondary processors 20 and 120, respectively, under program control in a multiprocessor, cache memory system.
Abstract: A method and system for independently resetting primary and secondary processors 20 and 120 respectively under program control in a multiprocessor, cache memory system. Processors 20 and 120 are reset without causing cache memory controllers 24 and 124 to reset.

143 citations

Journal Article•10.1109/12.54839•
Recoverable distributed shared virtual memory

Kun-Lung Wu1, W.K. Fuchs1•
University of Illinois at Urbana–Champaign1
01 Apr 1990-IEEE Transactions on Computers
TL;DR: A user-transparent checkpointing recovery scheme and a new twin-page disk storage management technique are presented for implementing recoverable distributed shared virtual memory.
Abstract: The problem of rollback recovery in distributed shared virtual environments, in which the shared memory is implemented in software in a loosely coupled distributed multicomputer system, is examined. A user-transparent checkpointing recovery scheme and a new twin-page disk storage management technique are presented for implementing recoverable distributed shared virtual memory. The checkpointing scheme can be integrated with the memory coherence protocol for managing the shared virtual memory. The twin-page disk design allows checkpointing to proceed in an incremental fashion without an explicit undo at the time of recovery. The recoverable distributed shared virtual memory allows the system to restart computation from a checkpoint without a global restart.

120 citations
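The twin-page idea can be sketched under simplifying assumptions: one node, an in-memory list standing in for the disk, and a flip of the committed-slot indices that is treated as atomic. `TwinPageStore` and its methods are hypothetical names, and the integration with the coherence protocol described in the abstract is omitted.

```python
# Twin-page sketch: every page has two disk slots. Write-backs go to the
# slot that does NOT hold the last committed copy, so recovery needs no undo.

class TwinPageStore:
    def __init__(self, num_pages):
        self.slots = [[None, None] for _ in range(num_pages)]  # two slots per page
        self.committed = [0] * num_pages  # which slot holds the checkpointed copy
        self.pending = {}                 # page -> slot written since last checkpoint

    def write_back(self, page, value):
        # Write into the spare twin; the committed copy stays intact.
        spare = 1 - self.committed[page]
        self.slots[page][spare] = value
        self.pending[page] = spare

    def checkpoint(self):
        # Incremental commit: only pages written since the last checkpoint
        # change their committed slot. Treated as atomic in this sketch.
        for page, slot in self.pending.items():
            self.committed[page] = slot
        self.pending.clear()

    def recover(self):
        # Roll back by ignoring uncommitted twins; nothing has to be undone.
        self.pending.clear()
        return [s[c] for s, c in zip(self.slots, self.committed)]


store = TwinPageStore(2)
store.write_back(0, "a1")
store.checkpoint()
store.write_back(0, "a2")   # not yet checkpointed
print(store.recover())      # ['a1', None]
```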

Journal Article•10.1109/71.80128•
Parallel simulated annealing algorithms for cell placement on hypercube multiprocessors

Prithviraj Banerjee1, M.H. Jones2, J.S. Sargent•
University of Illinois at Urbana–Champaign1, AT&T2
01 Jan 1990-IEEE Transactions on Parallel and Distributed Systems
TL;DR: A discussion is presented of two ways of mapping the cells in a two-dimensional area of a chip onto processors in an n-dimensional hypercube such that both small and large cell moves can be applied.
Abstract: A discussion is presented of two ways of mapping the cells in a two-dimensional area of a chip onto processors in an n-dimensional hypercube such that both small and large cell moves can be applied. Two types of move are allowed: cell exchanges and cell displacements. The computation of the cost function in parallel among all the processors in the hypercube is described, along with a distributed data structure that needs to be stored in the hypercube to support such a parallel cost evaluation. A novel tree broadcasting strategy is presented for the hypercube that is used extensively in the algorithm for updating cell locations in the parallel environment. A dynamic parallel annealing schedule is proposed that estimates the errors due to interacting parallel moves and adapts the rate of synchronization automatically. Two novel approaches in controlling error in parallel algorithms are described: heuristic cell coloring and adaptive sequence control. The performance on an Intel iPSC-2/D4/MX hypercube is reported.

110 citations
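The paper's tree broadcasting strategy is described as novel and is not reproduced here. For orientation only, the standard binomial spanning-tree broadcast on an n-dimensional hypercube, in which the number of informed nodes doubles at each of n steps, can be sketched as:

```python
# Standard binomial spanning-tree broadcast on an n-cube (not the paper's
# own strategy; shown only as the textbook baseline).

def hypercube_broadcast(n, root=0):
    """Return the per-step (sender, receiver) pairs and the set of nodes reached."""
    have = {root}
    steps = []
    for k in range(n):
        # Every node holding the message forwards it across dimension k;
        # the receiver differs from the sender only in bit k.
        sends = [(u, u ^ (1 << k)) for u in sorted(have)]
        have |= {v for _, v in sends}
        steps.append(sends)
    return steps, have


steps, reached = hypercube_broadcast(3)
for k, sends in enumerate(steps):
    print(k, sends)
# 0 [(0, 1)]
# 1 [(0, 2), (1, 3)]
# 2 [(0, 4), (1, 5), (2, 6), (3, 7)]
```

Every send crosses a single hypercube link, so all 2**n nodes are reached in n steps without link contention.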

Patent•
Mechanism for passing messages between several processors coupled through a shared intelligent memory

Haissam Alaiwan1•
IBM1
27 Feb 1990
TL;DR: A message passing mechanism is proposed for a plurality of processors interconnected by a shared intelligent memory, providing secure passing of messages between the tasks running on those processors; each processor includes serving means for getting the messages addressed to the task it runs.
Abstract: In the environment of a plurality of processors interconnected by a shared intelligent memory, a mechanism for the secure passing of messages between tasks operated on said processors is provided. Inter-task message passing is provided by shared intelligent memory for storing the messages transmitted by sending tasks. Further, each processor includes serving means for getting the messages to be sent to the task operated by said each processor. The passing of messages from a processor to the shared intelligent memory and from the latter to another processor is made, using a set of high-level microcoded commands. A process is provided using the message passing mechanism together with redundancies built into the shared memory, to ensure fault-tolerant message passing in which the tasks operated primarily on a processor are automatically replaced by back-up tasks executed on another processor if the first processor fails.
A New Design for Distributed Systems: The Remote Memory Model

Douglas Comer, James Griffioen
1 Jan 1990
TL;DR: This paper examines the design of a highly efficient, reliable, machine-independent protocol used by the remote memory server to communicate with the client machines, and outlines the algorithms and data structures employed by the remote memory server to efficiently locate the data stored on the server.
Abstract: This paper describes a new model for constructing distributed systems called the Remote Memory Model. The remote memory model consists of several client machines, one or more dedicated machines called remote memory servers, and a communication channel interconnecting them. In the remote memory model, client machines share the memory resources located on the remote memory server. Client machines that exhaust their local memory move portions of their address space to the remote memory server and retrieve pieces as needed. Because the remote memory server uses a machine-independent protocol to communicate with client machines, the remote memory server can support multiple heterogeneous client machines simultaneously. This paper describes the remote memory model and discusses the advantages and issues of systems that use this model. It examines the design of a highly efficient, reliable, machine-independent protocol used by the remote memory server to communicate with the client machines. It also outlines the algorithms and data structures employed by the remote memory server to efficiently locate the data stored on the server. Finally, it presents measurements of a prototype implementation that clearly demonstrate the viability and competitive performance of the remote memory model.

Patent•
Checkpointing mechanism for fault-tolerant systems

Haissam Alaiwan1, Claude Basso1, Jean Calvignac1, Jacques Combes1, Francois Kermarec1, Andre Pauporte1 •
IBM1
8 Feb 1990
TL;DR: The active and backup processors are coupled asynchronously through hardware assist functions: a memory change detector, which captures the changes made to the active processor's memory, and a mirroring control circuit, which, once the changes are committed by establish-recovery-point signals generated by the active processor, causes them to be dumped into the backup processor's memory.
Abstract: A checkpointing mechanism implemented in a data processing system comprising a dual processor configuration gives the system a fault tolerance capability while minimizing the complexity of both the software and the hardware. The active and backup processors are coupled asynchronously with some hardware assist functions comprising a memory change detector which captures the memory changes in the memory of the active processor and a mirroring control circuit which causes the memory changes, when committed by establish recovery point signals generated by the active processor, to be dumped into the memory of the backup processor so that the backup processor can resume the operations of the active processor from the last established recovery point. The active and backup processors may each be connected to a dedicated memory and recovery point storing means, or to a memory including two dual sides shared by all the processors for storing data structures and recovery points.

Proceedings Article•10.1145/97444.97672•
Asynchronous shared memory parallel computation

Naomi Nishimura1•
University of Toronto1
1 May 1990
TL;DR: A new model of asynchronous shared memory parallel computation is introduced and shown to fulfil all the listed requirements, and the complexity of several fundamental parallel algorithms is analyzed in this model.
Abstract: The contributions of this paper are twofold. First, we outline criteria by which any model of asynchronous shared memory parallel computation can be judged. Previous models are considered with respect to these factors. Next, we introduce a new model, and show that this model fulfils all the listed requirements. We also analyze in our model the complexity of several fundamental parallel algorithms.
Journal Article•10.1145/255129.255179•
Pandore: a system to manage data distribution

Françoise André, Jean-Louis Pazat, Henry Thomas
1 Jun 1990
TL;DR: The goal of the Pandore system is to allow the execution of parallel algorithms on DMPCs (Distributed Memory Parallel Computers) without having to take into account the low-level characteristics of the target distributed computer to program the algorithm.
Abstract: The goal of the Pandore system is to allow the execution of parallel algorithms on DMPCs (Distributed Memory Parallel Computers) without having to take into account the low-level characteristics of the target distributed computer to program the algorithm. No explicit process definition and interprocess communications are needed. Parallelization is achieved through logical data organization. The Pandore system provides the user with a means to specify data partitioning and data distribution over a domain of virtual processors for each parallel step of his algorithm. At compile time, Pandore splits the original program into parallel processes. Each process will execute some appropriate parts of the original code, according to the given data decomposition. In order to achieve a correct utilization of the data structures distributed over the processors, the Pandore system provides an execution scheme based on a communication layer, which is an abstraction of a message-passing architecture. This intermediate level is then implemented using the effective primitives of the real architecture (in our specific case, an Intel iPSC/2).
Programming distributed memory architectures using Kali

Piyush Mehrotra1, John Van Rosendale•
Purdue University1
1 Oct 1990
TL;DR: The paper presents a new programming environment, Kali, which provides a global name space and direct access to remote data values, together with a system of annotations that lets the user control those aspects of the program critical to performance, such as data distribution and load balancing.
Abstract: Programming nonshared memory systems is more difficult than programming shared memory systems, in part because of the relatively low level of current programming environments for such machines. A new programming environment, Kali, is presented, which provides a global name space and allows direct access to remote data values. In order to retain efficiency, Kali provides a system of annotations, allowing the user to control those aspects of the program critical to performance, such as data distribution and load balancing. The primitives and constructs provided by the language are described, and some of the issues raised in translating a Kali program for execution on distributed memory systems are also discussed.
Proceedings Article•10.1109/DMCC.1990.556323•
An Automatic and Symbolic Parallelization System for Distributed Memory Parallel Computers

K. Ikudome1, Geoffrey C. Fox, A. Kolawa, J.W. Flower•
California Institute of Technology1
8 Apr 1990
TL;DR: ASPAR (Automatic and Symbolic PARallelization) is described, which consists of a source-to-source parallelizer and a set of interactive graphic tools and is designed for easy modification for other languages such as Fortran.
Abstract: This paper describes ASPAR (Automatic and Symbolic PARallelization) which consists of a source-to-source parallelizer and a set of interactive graphic tools. While the issues of data dependency have already been explored and used in many parallel computer systems such as vector and shared memory machines, distributed memory parallel computers require, in addition, explicit data decomposition. New symbolic analysis and data-dependency analysis methods are used to determine an explicit data decomposition scheme. Automatic parallelization models using high level communications are also described in this paper. The target applications are of the "regular-mesh" type typical of many scientific calculations. The system has been implemented for the language C, and is designed for easy modification for other languages such as Fortran.
Patent•
Method and apparatus for exploiting communications bandwidth as for providing shared memory

Daniel Manuel Dias1, Balakrishna R. Iyer1•
IBM1
8 Nov 1990
TL;DR: A multiprocessor system linked by a fiber optic ring network uses some of the bandwidth of the ring network as a shared memory resource: data slots on the network can carry either message packets from one processor to another or network memory packets that circulate indefinitely on the network.
Abstract: A multiprocessor system linked by a fiber optic ring network uses some of the bandwidth of the ring network as a shared memory resource. Data slots are defined on the network which can carry message packets from one processor to another or network memory packets which circulate indefinitely on the network. One use of these network memory packets is as a lock management system for controlling concurrent access to a shared database by the multiple processors. The network memory packets are treated as lock entities. A processor indicates that it wants to procure a lock entity by circulating a packet, having a first network memory type, around the network. If no conflicting packets are detected when the circulated packet returns, the type of the slot is changed to a second network memory type indicating a procured lock entity.
Journal Article•10.1016/0020-0190(90)90103-5•
The processor identity problem

Richard J. Lipton1, A. Park2•
Princeton University1, University of California, Davis2
02 Oct 1990-Information Processing Letters
TL;DR: A probabilistic protocol is presented that solves the Processor Identity Problem for asynchronous processors that communicate through a common shared memory; it simplifies shared memory processor design by eliminating the need to encode processor identifiers in system hardware or software structures.
Journal Article•10.1007/BF01901067•
A highly flexible multiprocessor solution for ray tracing

Stuart A. Green1, Derek J. Paddon1•
University of Bristol1
01 Mar 1990-The Visual Computer
TL;DR: A general-purpose multiprocessor solution for ray tracing which may be used to reduce execution time without restricting development of the ray tracing code is described.
Abstract: The ray tracing algorithm continues to attract much research and development to improve the quality of the images that are generated, and to reduce the time taken to produce them. By identifying the key requirements of a development system from the user's point of view, we describe a general-purpose multiprocessor solution for ray tracing which may be used to reduce execution time without restricting development of the ray tracing code. The solution is based upon a distributed memory multiprocessor system in which each processor addresses a small amount of memory relative to the size of the model database. Methods for exploiting the coherence of references to entries in the database are described which use a combination of dynamic and static caching techniques. This scheme allows databases of arbitrary size to be supported on multiprocessors with limited distributed memory.
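A combination of static and dynamic caching can be sketched as a small pinned set of database entries plus a least-recently-used cache for the rest. This is an illustrative assumption, not the paper's scheme: `EntryCache`, the choice of LRU replacement, and the `fetch_remote` callback (standing in for a request to the processor that holds an entry) are all hypothetical.

```python
# Static + dynamic caching of model-database entries on a processor whose
# local memory holds only part of the database.
from collections import OrderedDict


class EntryCache:
    def __init__(self, fetch_remote, static_ids, capacity):
        self.fetch_remote = fetch_remote                          # hypothetical remote fetch
        self.static = {i: fetch_remote(i) for i in static_ids}    # pinned, never evicted
        self.dynamic = OrderedDict()                              # LRU order: oldest first
        self.capacity = capacity
        self.misses = 0

    def get(self, entry_id):
        if entry_id in self.static:
            return self.static[entry_id]
        if entry_id in self.dynamic:
            self.dynamic.move_to_end(entry_id)    # mark as most recently used
            return self.dynamic[entry_id]
        self.misses += 1
        value = self.fetch_remote(entry_id)
        self.dynamic[entry_id] = value
        if len(self.dynamic) > self.capacity:
            self.dynamic.popitem(last=False)      # evict least recently used
        return value


database = {i: f"object-{i}" for i in range(10)}  # stand-in for remote memory
cache = EntryCache(database.__getitem__, static_ids=[0], capacity=2)
for entry in [0, 1, 2, 1, 3, 2]:
    cache.get(entry)
print(cache.misses, list(cache.dynamic))          # 4 [3, 2]
```

Pinning entries that nearly every ray touches (for example the top of a spatial hierarchy) keeps them from being evicted by the coherent but shifting set of entries handled by the dynamic part.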
Journal Article•10.1016/0167-8191(90)90032-5•
Finding the roots of a polynomial on an MIMD multicomputer

Michel Cosnard1, Pierre Fraigniaud1•
École normale supérieure de Lyon1
1 Sep 1990
TL;DR: It is shown that among the classical processor network topologies (ring, 2D torus or n-cube), the hypercube topology minimizes the communications.
Abstract: This paper introduces the parallelization on a distributed memory multicomputer of two iterative methods for finding all the roots of a given polynomial. The parallel algorithms share the computation of the roots among the processors and perform a total exchange of the data at each step. Since the amount of communication is the main drawback of this approach, we study the effect of the network topology on the performance of the algorithms. In particular, we show that among the different classical processor network topologies (ring, 2D torus or n-cube), the hypercube topology minimizes the communications. The optimal number of processors is computed for each topology. Experiments on the FPS T40 hypercube illustrate the results.
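The abstract does not name its two iterative methods. Assuming, for illustration, the Durand-Kerner (Weierstrass) simultaneous iteration, the sketch below shows why a total exchange is needed: each simulated processor updates only its own share of the roots, yet every update reads all current approximations. The cyclic split and the starting values (0.4 + 0.9i)^k are common choices, not taken from the paper.

```python
# Durand-Kerner iteration with the roots split among simulated processors.

def horner(coeffs, z):
    """Evaluate a monic polynomial given as [1, a_{n-1}, ..., a_0]."""
    acc = 0j
    for c in coeffs:
        acc = acc * z + c
    return acc


def durand_kerner(coeffs, procs=2, iters=200):
    n = len(coeffs) - 1
    z = [(0.4 + 0.9j) ** k for k in range(n)]             # common initial guesses
    shares = [range(r, n, procs) for r in range(procs)]   # cyclic split of the roots
    for _ in range(iters):
        new = list(z)
        for share in shares:                              # one "processor" per share
            for i in share:
                denom = 1 + 0j
                for j in range(n):                        # needs ALL current roots
                    if j != i:
                        denom *= z[i] - z[j]
                new[i] = z[i] - horner(coeffs, z[i]) / denom
        z = new                                           # stands in for the total exchange
    return z


roots = durand_kerner([1, -6, 11, -6])   # (z - 1)(z - 2)(z - 3)
print(sorted(round(r.real, 6) for r in roots))
```

Because each update uses the previous sweep's full vector, every processor must receive every other processor's share after each step, which is the communication cost whose dependence on topology the paper studies.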
Book•
The impact of vector and parallel architectures on the Gaussian elimination algorithm

Yves Robert1•
École Normale Supérieure1
1 Jan 1990
TL;DR: This book studies Gaussian elimination on vector multiprocessors, hypercubes and systolic arrays, together with models and tools such as task graph scheduling, the analysis of distributed algorithms, and design methodologies for systolic arrays (dependence mapping method, complexity results, folding).
Abstract: Introduction: background (Gaussian elimination, speedup and efficiency); vector and parallel architectures (pipeline computers, vector computers, parallel computers); three case studies.
Part 1, Parallel algorithm design. Vector multiprocessor computing: vectorization of vector-vector operations, Gaussian elimination in terms of vector-vector kernels, vector register re-use, Gaussian elimination in terms of matrix-vector kernels, cache re-use, Gaussian elimination in terms of matrix-matrix kernels, vectorization epilogue, fine-grain parallelism, parallel Gaussian elimination. Hypercube computing: topological properties of hypercubes, broadcasting, centralized Gaussian elimination, local pipelined algorithms, a word on speedup evaluation, matrices over finite fields. Systolic computing: 2D arrays, solving the triangular system on the fly, 1D arrays, matrices over finite fields.
Part 2, Models and tools. Task graph scheduling: task system for Gaussian elimination, bounds for parallel execution, an optimal schedule, with an arbitrary number of processors. Analysis of distributed algorithms: data allocation strategies, speedup evaluation on distributed memory machines. Design methodologies for systolic arrays: dependence mapping method, complexity results, folding.
Patent•
Fault-tolerant digital computing system with reduced memory redundancy

Scott Gray1, Steven R. Thompson1•
Honeywell1
10 Apr 1990
TL;DR: A linear block code error detection scheme is implemented with each shared memory, so that random memory faults are detected well enough that the inherent fault tolerance of a pair-spare architecture is not compromised.
Abstract: A highly reliable data processing system using the pair-spare architecture obviates the need for separate memory arrays for each processor. A single memory is shared between each pair of processors wherein a linear block code error detection scheme is implemented with each shared memory, wherein the effect of random memory faults is sufficiently detected such that the inherent fault tolerance of a pair-spare architecture is not compromised.
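A single parity bit is the simplest linear block code, and it is assumed here only to illustrate detection without correction; the abstract does not say which code is used. In a pair-spare arrangement, detecting the fault is presumably what matters, since recovery comes from switching to the spare pair. The functions below are hypothetical.

```python
# Single-parity-check code: detects (but cannot correct) any single-bit
# fault in a stored word.

def parity(word):
    """1 if word has an odd number of set bits, else 0."""
    return bin(word).count("1") & 1


def store(word):
    # Append an even-parity check bit as bit 0 of the stored codeword.
    return (word << 1) | parity(word)


def load(codeword):
    word, check = codeword >> 1, codeword & 1
    if parity(word) != check:
        raise ValueError("memory fault detected")
    return word


cw = store(0b1011)
print(load(cw) == 0b1011)       # True
try:
    load(cw ^ 0b100)            # flip one stored bit
except ValueError as err:
    print(err)                  # memory fault detected
```

An even number of flipped bits leaves the parity unchanged and goes unnoticed, which is why stronger linear block codes are used when multi-bit faults matter.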
Journal Article•10.2118/19804-PA•
Reservoir Simulation on a Hypercube

John A. Wheeler1, Richard A. Smith1•
ExxonMobil1
01 Nov 1990-SPE Reservoir Engineering
TL;DR: A 3D parallel implicit reservoir simulator is tested on an Intel iPSC/2 hypercube with 16 vector processors, showing that up to 96% of the available CPU time on the hypercube can be used.
Abstract: Tests of a 3D parallel implicit reservoir simulator on an Intel iPSC/2 hypercube with 16 vector processors are presented. The simulator is based on an oil/water model. A correlation of computation efficiency with problem size and the number of processors demonstrates that up to 96% of the available CPU time on the hypercube can be used. Such high efficiencies were achieved by developing special algorithms well suited for multiple processors and distributed memory.
Patent•
Distributed data driven process

Chao-Kuang Pian, Minh-Tram D. Nguyen, Theodore E. Posch, Jeffrey E. Juhre
14 Jun 1990
TL;DR: A data-driven method is presented for coordinating the processing of arithmetic tasks in a multiple computer system having a multiplicity of arithmetic processors, by determining whether an arithmetic task is in a blocked condition or in an execution-ready condition.
Abstract: A data driven method for coordinating the processing of arithmetic tasks in a multiple computer system having a multiplicity of arithmetic processors by determining whether an arithmetic task is in a blocked condition or is in an execution ready condition. A source distributed processor stores data in a local memory for processing by a local processor and then transfers the processed data to a global memory for buffering in preparation for subsequent processing by a destination distributed processor. The source distributed processor generates a produce message to a destination distributed processor to indicate that the data to be transferred is available in a buffer in the global memory. The destination distributed processor loads the data to be transferred from the buffer in the global memory and then generates a consume message to the source distributed processor to indicate that the data has been transferred from the global memory and the buffer in the global memory is now available.
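The produce/consume handshake can be modelled with two threads, a list standing in for the global-memory buffers, and two queues carrying the produce and consume messages. The buffer count, the squaring step used as the "local processing", and all names are assumptions for illustration; blocking on a queue plays the role of a task in the blocked condition.

```python
# Produce/consume handshake through buffers in a simulated global memory.
import queue
import threading

NUM_BUFFERS = 2
global_memory = [None] * NUM_BUFFERS  # buffers shared by source and destination
produce_msgs = queue.Queue()          # source -> destination: "buffer b is full"
consume_msgs = queue.Queue()          # destination -> source: "buffer b is free"


def source(items):
    free = list(range(NUM_BUFFERS))
    for item in items:
        if not free:
            free.append(consume_msgs.get())  # blocked until a buffer is freed
        b = free.pop()
        global_memory[b] = item * item       # local processing, then transfer
        produce_msgs.put(b)                  # "produce" message
    produce_msgs.put(None)                   # end of stream


def destination(results):
    while (b := produce_msgs.get()) is not None:
        results.append(global_memory[b])     # load the data from global memory
        consume_msgs.put(b)                  # "consume" message frees the buffer


results = []
t1 = threading.Thread(target=source, args=(range(5),))
t2 = threading.Thread(target=destination, args=(results,))
t1.start()
t2.start()
t1.join()
t2.join()
print(results)  # [0, 1, 4, 9, 16]
```

A buffer is reused only after its consume message arrives, so the source can never overwrite data the destination has not yet loaded.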
Proceedings Article•
A Comparison of Programming Models for Shared Memory Multiprocessors.

Calvin Lin, Lawrence Snyder
1 Jan 1990
Proceedings Article•10.1109/SPDP.1990.143621•
Multi-version memory: software cache management for concurrent B-trees

W.E. Weihl1, P. Wang1•
Massachusetts Institute of Technology1
2 Dec 1990
TL;DR: The authors describe a new concurrent B-tree algorithm designed to work well in large-scale parallel or distributed systems in which the number of processors sharing the tree is large or the communication delay between processors is large relative to the speed of local computation.
Abstract: The authors describe a new concurrent B-tree algorithm. The algorithm is designed to work well in large-scale parallel or distributed systems in which the number of processors sharing the tree is large or the communication delay between processors (or between processors and the global memory for a shared-memory system) is large relative to the speed of local computation. The basis of the algorithm is an abstraction that is similar to coherent shared memory, but provides a weaker semantics; this abstraction is called multi-version memory. Multi-version memory uses caches but weakens the semantics of ordinary shared memory by allowing a process reading data to be given an old version of the data. This semantics is adequate for the non-leaf nodes in the B-tree algorithms presented.
Patent•
Method and apparatus for circuit simulation using parallel processors including memory arrangements and matrix decomposition synchronization

Gabriel P. Bischoff, Steven S. Greenberg
23 Apr 1990
TL;DR: A digital data processing system including a plurality of processors processes a program in parallel to load process data into a two-dimensional matrix having a plurality of matrix entries; because each processor can separately generate process data for different matrix entries from the preliminary data, there is no conflict in accessing memory locations among the various processors during generation of the process data.
Abstract: A digital data processing system including a plurality of processors processes a program in parallel to load process data into a two-dimensional matrix having a plurality of matrix entries. So that the processors will not have to synchronize loading of process data into particular locations in the matrix, the matrix has a third dimension defining a plurality of memory locations, with each series of locations along the third dimension being associated with one of the matrix entries. Each processor initially loads preliminary process data into a memory location along the third dimension. After that has been completed, each processor generates process data for an entry of the two-dimensional matrix from the preliminary process data in the locations along the third dimension related thereto. Since the processors separately load preliminary process data into different memory locations along the third dimension, there is no conflict in accessing memory locations among the various processors during generation of preliminary process data. Further, since the processors can separately generate process data for different matrix entries from the preliminary data, there is no conflict in accessing memory locations among the various processors during generation of the process data.
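The two-phase, conflict-free matrix loading described above can be sketched with threads and a barrier. This is a minimal sketch, assuming that each processor's preliminary data arrives as a list of `(row, col, value)` contributions and that the final entry is the sum of the slots along the third dimension; the function name `assemble`, that input format, and the summation rule are assumptions, not details taken from the patent.

```python
import threading


def assemble(contributions, rows, cols, num_procs):
    """Hypothetical sketch: contributions[p] is a list of (i, j, value) from processor p."""
    # 3-D scratch array: one slot per processor along the third dimension.
    scratch = [[[0.0] * num_procs for _ in range(cols)] for _ in range(rows)]
    matrix = [[0.0] * cols for _ in range(rows)]
    barrier = threading.Barrier(num_procs)

    def worker(p):
        # Phase 1: processor p writes only to slot p, so no two processors
        # ever touch the same memory location and no locking is needed.
        for i, j, value in contributions[p]:
            scratch[i][j][p] += value
        barrier.wait()  # all preliminary data must be loaded before phase 2
        # Phase 2: each processor finalizes a disjoint set of entries
        # (here rows p, p + num_procs, ...), again without conflicts.
        for i in range(p, rows, num_procs):
            for j in range(cols):
                matrix[i][j] = sum(scratch[i][j])

    threads = [threading.Thread(target=worker, args=(p,)) for p in range(num_procs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return matrix


contribs = [[(0, 0, 1.0), (0, 1, 2.0)], [(0, 0, 3.0), (1, 1, 4.0)]]
print(assemble(contribs, rows=2, cols=2, num_procs=2))  # [[4.0, 2.0], [0.0, 4.0]]
```

The only synchronization point is the single barrier between the phases; the third dimension trades extra memory for the removal of per-entry locks.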
Book Chapter•10.1007/3-540-54195-0_63•
PARSAC-2: A Parallel SAC-2 Based on Threads

Wolfgang Kuechlin
Ohio State University
20 Aug 1990-Applicable Algebra in Engineering, Communication and Computing
TL;DR: It is demonstrated that S-threads permit a parallelization of SAC-2 down to the lowest algebraic level, and it is shown how a key parameter of the S-threads memory design influences parallel performance.
Abstract: We describe the design of PARSAC-2, a parallel version of the SAC-2 Computer Algebra system. In PARSAC-2, parallelism is based on multiple threads (lightweight processes) executing on a shared memory multiprocessor. The S-threads subsystem provides threads which are capable of parallel list processing on a shared heap. The S-threads heap memory is designed to allow concurrent list cell allocation by multiple threads with minimal synchronization overhead. S-threads may also perform parallel garbage collection, and a slightly weaker form of storage management called preventive garbage collection. We present an example of algorithm development in PARSAC by parallelizing the SAC-2 algorithm IPRODK, an integer multiplication routine based on Karatsuba's method. Using empirical data from this experiment, we demonstrate that S-threads permit a parallelization of SAC-2 down to the lowest algebraic level. Finally, we show how a key parameter of the S-threads memory design influences parallel performance.
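Karatsuba's method parallelizes naturally because its three sub-products are independent. The sketch below is a generic Karatsuba multiplication for non-negative integers whose three top-level sub-products are submitted as concurrent tasks; it is not IPRODK and does not model S-threads. The names `karatsuba`, `parallel_karatsuba` and the `CUTOFF_BITS` threshold are hypothetical. Because of CPython's global interpreter lock, this shows the task structure rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

CUTOFF_BITS = 64  # hypothetical threshold below which the builtin product is used


def _split(x, y):
    # Split both operands at m bits: x = x1 * 2**m + x0, y = y1 * 2**m + y0.
    m = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << m) - 1
    return m, x >> m, x & mask, y >> m, y & mask


def karatsuba(x, y):
    """Sequential Karatsuba multiplication for non-negative integers."""
    if x.bit_length() <= CUTOFF_BITS or y.bit_length() <= CUTOFF_BITS:
        return x * y
    m, x1, x0, y1, y0 = _split(x, y)
    z2 = karatsuba(x1, y1)
    z0 = karatsuba(x0, y0)
    z1 = karatsuba(x0 + x1, y0 + y1) - z2 - z0  # equals x1*y0 + x0*y1
    return (z2 << (2 * m)) + (z1 << m) + z0


def parallel_karatsuba(x, y):
    """Run the three independent top-level sub-products as concurrent tasks."""
    if x < 0 or y < 0:
        raise ValueError("this sketch handles non-negative integers only")
    if x.bit_length() <= CUTOFF_BITS or y.bit_length() <= CUTOFF_BITS:
        return x * y
    m, x1, x0, y1, y0 = _split(x, y)
    with ThreadPoolExecutor(max_workers=3) as pool:
        f2 = pool.submit(karatsuba, x1, y1)
        f0 = pool.submit(karatsuba, x0, y0)
        f1 = pool.submit(karatsuba, x0 + x1, y0 + y1)
        z2, z0 = f2.result(), f0.result()
        z1 = f1.result() - z2 - z0
    return (z2 << (2 * m)) + (z1 << m) + z0


a, b = 3 ** 500, 7 ** 400
print(parallel_karatsuba(a, b) == a * b)  # True
```

In a system like the one described, each recursive call could spawn further threads down to some grain size; that grain-size choice, and the allocation of intermediate integers on a shared heap, are where the heap design discussed in the abstract would matter.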
Proceedings Article•10.1109/ICDE.1990.113506•
Update propagation in distributed memory hierarchy

Matthew Bellew, Meichun Hsu, V.-O. Tam
Harvard University
5 Feb 1990
TL;DR: A DMH system is presented, the tradeoffs between conservative and aggressive update propagation strategies are defined, and promising new strategies are identified.
Abstract: A distributed memory hierarchy (DMH) is a memory system consisting of storage modules distributed over a high-bandwidth local area network. It provides transaction applications with the abstraction of a single virtual memory space to which shared data are mapped. As in a conventional memory hierarchy (MH) in a single-machine system, a DMH is responsible for locating, migrating, and caching data pages; however, unlike a conventional MH, a DMH must do so across the storage modules in a network. In addition, a DMH must propagate transaction updates while preserving the serializability of transactions. The performance of a DMH system is strongly influenced by concurrency control and update propagation. It is also crucial that performance analysis account for memory resources and network requirements. A DMH system is presented, the tradeoffs between conservative and aggressive update propagation strategies are defined, and promising new strategies are identified.
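The page-locating, caching, and update-propagation duties listed in this abstract can be illustrated with a single-process toy model. This is a minimal sketch under loud assumptions: each page has one home module holding the master copy, modules cache pages on first read, and at commit time updates are either pushed to every cached copy (`"push"`) or the cached copies are invalidated (`"invalidate"`). These are two generic coherence policies, not necessarily the conservative and aggressive strategies defined in the paper; the class `DMH` and its methods are hypothetical, and concurrency control and serializability are omitted.

```python
class DMH:
    """Hypothetical single-process model of a distributed memory hierarchy."""

    def __init__(self, module_names):
        self.modules = {name: {} for name in module_names}  # module -> {page_id: data}
        self.home = {}    # page_id -> module holding the master copy
        self.cached = {}  # page_id -> set of modules holding cached copies

    def create(self, page_id, data, home_module):
        self.modules[home_module][page_id] = data
        self.home[page_id] = home_module
        self.cached[page_id] = set()

    def read(self, module, page_id):
        # Locate the page: serve it locally if present, else fetch from home and cache it.
        local = self.modules[module]
        if page_id not in local:
            local[page_id] = self.modules[self.home[page_id]][page_id]
            self.cached[page_id].add(module)
        return local[page_id]

    def commit(self, writes, policy="push"):
        # writes: {page_id: new_data} from one transaction (concurrency control omitted).
        if policy not in ("push", "invalidate"):
            raise ValueError("policy must be 'push' or 'invalidate'")
        for page_id, data in writes.items():
            self.modules[self.home[page_id]][page_id] = data  # update the master copy
            for module in self.cached[page_id]:
                if policy == "push":
                    self.modules[module][page_id] = data  # propagate new contents now
                else:
                    del self.modules[module][page_id]     # drop copy; next read refetches
            if policy == "invalidate":
                self.cached[page_id].clear()


dmh = DMH(["A", "B", "C"])
dmh.create("p1", "v1", home_module="A")
print(dmh.read("B", "p1"))               # v1
dmh.commit({"p1": "v2"}, policy="push")
print(dmh.modules["B"]["p1"])            # v2
dmh.commit({"p1": "v3"}, policy="invalidate")
print("p1" in dmh.modules["B"])          # False
print(dmh.read("B", "p1"))               # v3
```

Pushing spends network bandwidth at commit time so later reads stay local, while invalidating defers the cost to the next read; this is the kind of memory-versus-network tradeoff the abstract says a performance analysis must account for.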
...