Top 142 papers published in the topic of Distributed memory in 1988

Showing papers on "Distributed memory" published in 1988

Book•

Sparse Distributed Memory

30 Nov 1988

TL;DR: Pentti Kanerva's Sparse Distributed Memory presents a mathematically elegant theory of human long-term memory, provides an overall perspective on neural systems, and describes a memory whose realization with neuronlike components resembles the cortex of the cerebellum.

Abstract: From the Publisher: Motivated by the remarkable fluidity of memory, the way in which items are pulled spontaneously and effortlessly from our memory by vague similarities to what is currently occupying our attention, Sparse Distributed Memory presents a mathematically elegant theory of human long-term memory. The book, which is self-contained, begins with background material from mathematics, computers, and neurophysiology; this is followed by a step-by-step development of the memory model. The concluding chapter describes an autonomous system that builds from experience an internal model of the world and bases its operation on that internal model. Close attention is paid to the engineering of the memory, including comparisons to ordinary computer memories. Sparse Distributed Memory provides an overall perspective on neural systems. The model it describes can aid in understanding human memory and learning, and a system based on it sheds light on outstanding problems in philosophy and artificial intelligence. Applications of the memory are expected to be found in the creation of adaptive systems for signal processing, speech, vision, motor control, and (in general) robots. Perhaps the most exciting aspect of the memory, in its implications for research in neural networks, is that its realization with neuronlike components resembles the cortex of the cerebellum. Pentti Kanerva is a scientist at the Research Institute for Advanced Computer Science at the NASA Ames Research Center and a visiting scholar at the Stanford Center for the Study of Language and Information. A Bradford Book.

1,163 citations

Journal Article•10.1007/BF00128175•

Compiling programs for distributed-memory multiprocessors

David Callahan¹, Ken Kennedy¹•Institutions (1)

Rice University¹

01 Oct 1988-The Journal of Supercomputing

TL;DR: An efficient message-passing program is derived from a sequential shared-memory program annotated with directions on how elements of shared arrays are distributed to processors; one possible input language for describing distributions is presented.

Abstract: We describe a new approach to programming distributed-memory computers. Rather than having each node in the system explicitly programmed, we derive an efficient message-passing program from a sequential shared-memory program annotated with directions on how elements of shared arrays are distributed to processors. This article describes one possible input language for describing distributions and then details the compilation process and the optimization necessary to generate an efficient program.

314 citations

Patent•

Memory address mechanism in a distributed memory architecture

Osey C. Parrish, Robert E. Peiffer, James H. Thomas, Edwin J. Hilpert

15 Dec 1988

TL;DR: Partition tables at each node translate between node physical addresses and system physical addresses; the partitioning process permits data to be duplicated throughout a distributed system architecture and permits read cycles for shared data to execute at local bus speeds.

Abstract: A computer system having plural nodes interconnected by a common broadcast bus. Each node has memory and at least one node has a processor. The system has a dynamically configurable memory which may be located within the system address space of a distributed system architecture including memory within each node having a processor and the memory resident within other nodes. The memory in the system address space is addressable by system physical addresses which are isolated from the physical addresses for memory in each node. The node physical addresses are translatable to and from the system physical addresses by partition maps located in partition tables at each node. Memory located anywhere in the distributed system architecture may be partitioned dynamically and accessed on a local basis by programming the partition tables, stored in partitioning RAMs. The use of the partitioning process permits data to be duplicated throughout a distributed system architecture and permits read cycles for shared data to execute at local bus speeds.
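The address translation described in this abstract can be illustrated with a minimal sketch. It assumes fixed-size partitions and a dictionary-based table; `PARTITION_SIZE`, `PartitionTable`, and its method names are hypothetical and are not taken from the patent.

```python
# Hypothetical sketch: the partition size and table layout are assumptions,
# not details taken from the patent.
PARTITION_SIZE = 0x1000  # assumed bytes per partition


class PartitionTable:
    """Per-node map between node-local and system-wide partition numbers."""

    def __init__(self):
        self.node_to_system = {}
        self.system_to_node = {}

    def map_partition(self, node_part, system_part):
        # "Programming the partition table": bind one local partition
        # to one partition of the system address space.
        self.node_to_system[node_part] = system_part
        self.system_to_node[system_part] = node_part

    def to_system(self, node_addr):
        # Translate a node physical address to a system physical address.
        part, offset = divmod(node_addr, PARTITION_SIZE)
        return self.node_to_system[part] * PARTITION_SIZE + offset

    def to_node(self, system_addr):
        # Translate a system physical address to a node physical address,
        # or return None if this node holds no copy of that partition.
        part, offset = divmod(system_addr, PARTITION_SIZE)
        local = self.system_to_node.get(part)
        if local is None:
            return None
        return local * PARTITION_SIZE + offset


# Two nodes map the same system partition, so the shared data is duplicated
# and each node can serve reads of it from its own memory.
node_a, node_b = PartitionTable(), PartitionTable()
node_a.map_partition(node_part=2, system_part=7)
node_b.map_partition(node_part=5, system_part=7)

system_addr = node_a.to_system(2 * PARTITION_SIZE + 0x10)
local_on_b = node_b.to_node(system_addr)
```

In this sketch a write would still have to reach every node that maps the partition (the abstract's broadcast bus), while a read resolves locally through `to_node`.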

175 citations

Proceedings Article•10.5555/62972.62977•

A processor architecture for Horizon

M. R. Thistle, Burton Smith¹•Institutions (1)

Tera Computer Company¹

1 Nov 1988

TL;DR: The architecture of the processor in the Horizon system is described; each processor will be capable of performing several hundred MFLOPS (millions of floating-point operations per second) to achieve an overall system performance target of 100 GFLOPS.

Abstract: Horizon is a scalable shared-memory Multiple Instruction stream - Multiple Data stream (MIMD) computer architecture independently under study at the Supercomputing Research Center (SRC) and Tera Computer Company. It is composed of a few hundred identical scalar processors and a comparable number of memories, sparsely embedded in a three-dimensional nearest-neighbor network. Each processor has a horizontal instruction set that can issue up to three floating point operations per cycle without resorting to vector operations. Processors will each be capable of performing several hundred Million Floating Point Operations Per Second (FLOPS) in order to achieve an overall system performance target of 100 billion ($10^{11}$) FLOPS. This paper describes the architecture of the processor in the Horizon system. In the fashion of the Denelcor HEP, the processor maintains a variable number of Single Instruction stream - Single Data stream (SISD) processes, which are called instruction streams. Memory latency introduced by the large shared memory is hidden by switching context (instruction stream) each machine cycle. The processor functional units are pipelined to achieve high computational throughput rates; however, pipeline dependencies are hidden from user code. Hardware mechanisms manage the resources to guarantee anonymity and independence of instruction streams.

172 citations

Journal Article•10.1137/0909037•

Parallel solution of triangular systems on distributed-memory multiprocessors

Michael T. Heath, Charles H. Romine

01 May 1988-SIAM Journal on Scientific and Statistical Computing

TL;DR: Several parallel algorithms are presented for solving triangular systems of linear equations on distributed-memory multiprocessors and new wavefront algorithms are developed for both row-oriented and column-oriented matrix storage.

Abstract: Several parallel algorithms are presented for solving triangular systems of linear equations on distributed-memory multiprocessors. New wavefront algorithms are developed for both row-oriented and column-oriented matrix storage. Performance of the new algorithms and several previously proposed algorithms is analyzed theoretically and illustrated empirically using implementations on commercially available hypercube multiprocessors.

172 citations

Journal Article•10.1137/0909021•

Sparse Cholesky factorization on a local-memory multiprocessor

Alan George¹, Michael T. Heath², Joseph W. H. Liu³, Esmond Ng²•Institutions (3)

University of Tennessee¹, Oak Ridge National Laboratory², York University³

01 Mar 1988-SIAM Journal on Scientific and Statistical Computing

TL;DR: This article deals with the problem of factoring a large sparse positive definite matrix on a multiprocessor system where the processors are assumed to have substantial local memory but no globally shared memory.

Abstract: This article deals with the problem of factoring a large sparse positive definite matrix on a multiprocessor system. The processors are assumed to have substantial local memory but no globally shared memory. They communicate among themselves and with a host processor through message passing. Our primary interest is in designing an algorithm which exploits parallelism, rather than in exploiting features of the underlying topology of the hardware. However, part of our study is aimed at determining, for certain sparse matrix problems, whether hardware based on the binary hypercube topology adequately supports the communication requirements for such problems. Numerical results from experiments conducted on a hypercube multiprocessor are included.

150 citations

Journal Article•10.1109/2.17•

Sequoia: a fault-tolerant tightly coupled multiprocessor for transaction processing

P.A. Bernstein

01 Feb 1988-IEEE Computer

TL;DR: It is shown that the kernel, through a combination of locking, shadowed memory, and controlled flushing of non-write-through cache, maintains a consistent main memory state recoverable from any single-point failure.

Abstract: The Sequoia computer is a tightly coupled multiprocessor that avoids most of the fault-tolerance disadvantages of tight coupling by using a fault-tolerant hardware-design approach. An overview is given of how the hardware architecture and operating system (OS) work together to provide a high degree of fault tolerance with good system performance. A description of hardware is followed by a discussion of the multiprocessor synchronization problem. Kernel support for fault recovery and the recovery process itself are examined. It is shown that the kernel, through a combination of locking, shadowed memory, and controlled flushing of non-write-through cache, maintains a consistent main memory state recoverable from any single-point failure. The user shared memory is also discussed.

145 citations

Patent•

Topologically-distributed-memory multiprocessor computer

Herbert R. Carleton¹, J. Q. Broughton¹•Institutions (1)

State University of New York System¹

22 Jan 1988

TL;DR: A modular, expandable, topologically-distributed-memory multiprocessor computer comprises a plurality of non-directly communicating slave processors under the control of a synchronizer and a master processor.

Abstract: A modular, expandable, topologically-distributed-memory multiprocessor computer comprises a plurality of non-directly communicating slave processors under the control of a synchronizer and a master processor. Memory space is partitioned into a plurality of memory cells. Dynamic variables may be mapped into the memory cells so that they depend upon processing in nearby partitions. Each slave processor is connected in a topologically well-defined way through a dynamic bi-directional switching system (gateway) to different respective ones of the memory cells. Access by the slave processors to their respective topologically similar memory cells occurs concurrently or in parallel in such a way that no data-flow conflicts occur. The topology of data distribution may be chosen to take advantage of symmetries which occur in broad classes of problems. The system may be tied to a host computer used for data storage and analysis of data not efficiently processed by the multiprocessor computer.

133 citations

Journal Article•10.1137/0909042•

$LU$ Factorization Algorithms on Distributed-Memory Multiprocessor Architectures

George A. Geist, Charles H. Romine

01 Jul 1988-SIAM Journal on Scientific and Statistical Computing

TL;DR: It is concluded that, in the absence of loop-unrolling, $LU$ factorization with partial pivoting is most efficient when pipelining is used to mask the cost of pivoting.

Abstract: In this paper, we consider the effect that the data-storage scheme and pivoting scheme have on the efficiency of $LU$ factorization on a distributed-memory multiprocessor. Our presentation will focus on the hypercube architecture, but most of our results are applicable to distributed-memory architectures in general. We restrict our attention to two commonly used storage schemes (storage by rows and by columns) and investigate partial pivoting both by rows and by columns, yielding four factorization algorithms. Our goal is to determine which of these four algorithms admits the most efficient parallel implementation. We analyze factors such as load distribution, pivoting cost, and potential for pipelining. We conclude that, in the absence of loop-unrolling, $LU$ factorization with partial pivoting is most efficient when pipelining is used to mask the cost of pivoting. The two schemes that can be pipelined are pivoting by interchanging rows when the coefficient matrix is distributed to the processors by columns, and pivoting by interchanging columns when the matrix is distributed to the processors by rows.

98 citations

Patent•

Enhanced input/output architecture for toroidally-connected distributed-memory parallel computers

Ronald S. Cok¹•Institutions (1)

Eastman Kodak Company¹

29 Sep 1988

TL;DR: A toroidally-connected distributed-memory parallel computer has rows of processors, each processor having an independent memory, and at least one common I/O channel connected to a single row of processors by buffering mechanisms, with each buffering mechanism associated with one processor of that row.

Abstract: A toroidally-connected distributed-memory parallel computer having rows of processors (12), with each processor having an independent memory. The computer includes at least one common I/O channel (26) adapted to be connected to a single row of processors (20) by buffering (24) mechanisms. Each buffering mechanism is associated with one processor of the single row of processors.

98 citations

Journal Article•10.1137/0909032•

A parallel triangular solver for a distributed-memory multiprocessor

Guangye Li¹, Thomas F. Coleman¹•Institutions (1)

Cornell University¹

01 May 1988-SIAM Journal on Scientific and Statistical Computing

TL;DR: This work considers solving triangular systems of linear equations on a distributed-memory multiprocessor which allows for a ring embedding and proposes a parallel algorithm, applicable when the triangular matrix is distributed by column in a wrap fashion.

Abstract: We consider solving triangular systems of linear equations on a distributed-memory multiprocessor which allows for a ring embedding. Specifically, we propose a parallel algorithm, applicable when the triangular matrix is distributed by column in a wrap fashion. Numerical experiments indicate that the new algorithm is very efficient in some circumstances (in particular, when the size of the problem is sufficiently large relative to the number of processors). A theoretical analysis confirms that the total running time varies linearly, with respect to the matrix order, up to a threshold value of the matrix order, after which the dependence is quadratic. Moreover, we show that total message traffic is essentially the minimum possible. Finally, we describe an analogous row-oriented algorithm.
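The column-wrap distribution mentioned in this abstract can be illustrated with a small sequential simulation. This is a sketch under stated assumptions, not the authors' algorithm: it assumes column j belongs to processor j mod p, and the names `wrap_owner` and `solve_lower_wrapped` are hypothetical. It shows only which processor owns which work in a column-oriented forward substitution; ring communication and pipelining are not modeled.

```python
def wrap_owner(j, p):
    """Processor that holds column j under a wrap (cyclic) mapping (assumed)."""
    return j % p


def solve_lower_wrapped(L, b, p):
    """Solve L x = b for lower-triangular L, simulating p processors.

    partial[q][i] accumulates sum(L[i][j] * x[j]) over the columns j held
    by processor q. Summing those partial results for row j stands in for
    the messages a real implementation would exchange.
    """
    n = len(b)
    partial = [[0.0] * n for _ in range(p)]
    x = [0.0] * n
    for j in range(n):
        owner = wrap_owner(j, p)
        # Combine the contributions to row j from every processor.
        total = sum(partial[q][j] for q in range(p))
        x[j] = (b[j] - total) / L[j][j]
        # The owner of column j updates its local partial sums.
        for i in range(j + 1, n):
            partial[owner][i] += L[i][j] * x[j]
    return x


L = [[2.0, 0.0, 0.0],
     [1.0, 3.0, 0.0],
     [4.0, 5.0, 6.0]]
b = [2.0, 7.0, 32.0]
x = solve_lower_wrapped(L, b, p=2)
owners = [wrap_owner(j, 2) for j in range(3)]
```

With two simulated processors, columns 0 and 2 fall on processor 0 and column 1 on processor 1, so consecutive columns alternate between processors and the per-column update work is spread evenly.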

Patent•

Multiprocessing system having nodes containing a processor and an associated memory module with dynamically allocated local/global storage in the memory modules

William C. Brantley¹, Kevin Patrick Mcaulifee¹, Vern Alan Norton¹, Gregory Francis Pfister¹, Joseph Weiss¹•Institutions (1)

IBM¹

16 Mar 1988

TL;DR: A multiprocessing system is presented having a plurality of processing nodes interconnected together by a communication network, each processing node including a processor, responsive to user software running on the system, and an associated memory module, and capable under user control of dynamically partitioning each memory module into a global storage efficiently accessible by a number of processors connected to the network, and local storage efficiently accessible by its associated processor.

Abstract: A multiprocessing system is presented having a plurality of processing nodes interconnected together by a communication network, each processing node including a processor, responsive to user software running on the system, and an associated memory module, and capable under user control of dynamically partitioning each memory module into a global storage efficiently accessible by a number of processors connected to the network, and local storage efficiently accessible by its associated processor.

Patent•

Computer vector multiprocessing control with multiple access memory and priority conflict resolution method

Steve S. Chen¹, Alan J. Schiffleger¹•Institutions (1)

Cray¹

16 Jun 1988

TL;DR: In this paper, a multiprocessing system and a method for multi-processing is described, in which a pair of processors are connected to a central memory through a plurality of memory reference ports, and each processor is further connected to shared registers which may be directly addressed by either processor at rates commensurate with intra-processor operation.

Abstract: A multiprocessing system and method for multiprocessing is disclosed. A pair of processors are provided, and each is connected to a central memory through a plurality of memory reference ports. The processors are further each connected to a plurality of shared registers which may be directly addressed by either processor at rates commensurate with intra-processor operation. The shared registers include registers for holding scalar and address information and registers for holding information to be used in coordinating the transfer of information through the shared registers. A multiport memory is provided and includes a conflict resolution circuit which senses and prioritizes conflicting references to the central memory. Each CPU is interfaced with the central memory through three ports, with each of the ports handling different ones of several different types of memory references which may be made. At least one I/O port is provided to be shared by the processors in transferring information between the central memory and peripheral storage devices. A vector register design is also disclosed for use in vector processing computers, and provides that each register consist of at least two independently addressable memories, to deliver data to or accept data from a functional unit. The method of multiprocessing permits multitasking in the multiprocessor, in which the shared registers allow independent tasks of different jobs or related tasks of a single job to be run concurrently, and facilitate multithreading of the operating system by permitting multiple critical code regions to be independently synchronized.

Journal Article•10.1007/BF00128176•

Compiling parallel programs by optimizing performance

Marina C. Chen¹, Young-il Choo¹, Jingke Li¹•Institutions (1)

Yale University¹

01 Oct 1988-The Journal of Supercomputing

TL;DR: This paper describes how Crystal, a language based on familiar mathematical notation and lambda calculus, addresses the issues of programmability and performance for parallel supercomputers and illustrates the power of its approach with benchmarks of compiled parallel code from Crystal source.

Abstract: This paper describes how Crystal, a language based on familiar mathematical notation and lambda calculus, addresses the issues of programmability and performance for parallel supercomputers. Some scientific programmers and theoreticians may ask, “What is new about Crystal?” or “How is it different from existing functional languages?” The answers lie in its model of parallel computation and a theory of parallel program optimization, and we examine this in the text to follow. We illustrate the power of our approach with benchmarks of compiled parallel code from Crystal source. The target machines are hypercube multiprocessors with distributed memory, on which it is considered difficult for functional programs to achieve high efficiency.

Journal Article•

Data Diffusion Machine - A Scalable Shared Virtual Memory Multiprocessor.

David H. D. Warren, Seif Haridi

01 Jan 1988-Future Generation Computer Systems

Patent•

Method of up-front load balancing for local memory parallel processors

Paul T. Baffes

12 Dec 1988

TL;DR: In this paper, a method for uniformly balancing the aggregate computational load in, and utilizing a minimal memory by, a network having identical computations to be executed at each connection therein is disclosed.

Abstract: In a parallel processing computer system with multiple processing units and shared memory, a method is disclosed for uniformly balancing the aggregate computational load in, and utilizing a minimal memory by, a network having identical computations to be executed at each connection therein. Read-only and read-write memory are subdivided into a plurality of partitions, and the computational load is subdivided into a plurality of process sets, which function like artificial processing units. Said plurality of process sets is iteratively merged and reduced to the number of processing units without exceeding the balance load. Merger is based upon the value of a partition threshold, which is a measure of the memory utilization. The turnaround time and memory savings of the instant method are functions of the number of processing units available and the number of partitions into which memory is subdivided.

Journal Article•10.1137/0909038•

Modified cyclic algorithms for solving triangular systems on distributed-memory multiprocessors

Stanley C. Eisenstat¹, Michael T. Heath, Charles S. Henkel, Charles H. Romine•Institutions (1)

Yale University¹

01 May 1988-SIAM Journal on Scientific and Statistical Computing

TL;DR: New parallel algorithms and comparative test results are given for solving triangular systems of linear equations on distributed-memory multiprocessors and the new algorithms are shown to provide substantial performance improvements.

Abstract: New parallel algorithms and comparative test results are given for solving triangular systems of linear equations on distributed-memory multiprocessors. These results supplement those given in a previous paper. All of the new algorithms are variations on the cyclic algorithms discussed previously. The new algorithms are shown to provide substantial performance improvements.

Proceedings Article•10.1145/62546.62547•

The cost of messages

Jim Gray

1 Jan 1988

TL;DR: Modeling distributed systems as processes communicating via messages abstracts the three degrees of distribution (shared memory, local network, and wide area network); the paper quantifies their differences in message transport cost and reliability for past, current, and future technologies.

Abstract: Distributed systems can be modeled as processes communicating via messages. This model abstracts the three degrees of distribution: shared memory, local network, and wide area network. Although these three forms of distribution are qualitatively the same, there are huge quantitative differences in their message transport costs and message transport reliability. This paper quantifies these differences for past, current, and future technologies.

Design, implementation, and performance evaluation of a distributed shared memory server for Mach

Alessandro Forin

1 Jan 1988

TL;DR: A new distributed algorithm is shown to outperform centralized ones and provide unrestricted sharing of read-write memory between tasks running on either strongly coupled or loosely coupled architectures, and any mixture thereof.

Abstract: This report describes the design, implementation and performance evaluation of a virtual shared memory server for the Mach operating system. The server provides unrestricted sharing of read-write memory between tasks running on either strongly coupled or loosely coupled architectures, and any mixture thereof. A number of memory coherency algorithms have been implemented and evaluated, including a new distributed algorithm that is shown to outperform centralized ones. Some of the features of the server include support for machines with multiple page sizes, for heterogeneous shared memory, and for fault tolerance. Extensive performance measures of applications are presented, and the intrinsic costs evaluated.

The architecture and implementation of MEMNET: a high-speed shared-memory computer communication network

Gary Scott Delp

1 Jun 1988

TL;DR: Analytic and experimental results confirm the viability of distributed shared memory supported in hardware at the memory controller level and the impact of distributing the memory resource is under 10% of the undistributed performance.

Abstract: A major limitation of current distributed system technology is that the overhead associated with the normal input/output paradigm of interconnection severely affects the system performance. This research has taken a new perspective on the interconnection problem based on a memory extension paradigm. Evidence to date demonstrates that the processor overhead is greatly reduced and significant additional functionality is gained. Memnet is a computer architecture in which the local network appears as memory in the physical address space of each processor on the network. Local area networking and distributed system support are two potential applications of this architecture. The Memnet principles of computer/communication interconnection are extendable to wide-area, high-speed, low-latency processor interconnection. This dissertation includes a survey of interprocess communication schemes and shared memory architectures. A description of the Memnet architecture and implementation is followed by an analysis of the behavior of the Memnet architecture over a wide range of uses. The state machines and schematics of the experimental implementation are included as appendices. Analytic and experimental results confirm the viability of distributed shared memory supported in hardware at the memory controller level. For many applications, the impact of distributing the memory resource is under 10% of the undistributed performance.

Proceedings Article•

The Uniform System: An approach to runtime support for large scale shared memory parallel processors.

Robert H. Thomas, William R. Crowther

1 Jan 1988

Journal Article•10.1109/40.521•

The Balance multiprocessor system

S. Thakkar, P. Gifford, G. Fielland

01 Jan 1988-IEEE Micro

TL;DR: A description is given of the architecture, operating system, and performance of Balance, a shared-memory, tightly coupled multiprocessor system that supports both the 4.2 BSD and System V Unix environments.

Abstract: A description is given of the architecture, operating system, and performance of Balance, a shared-memory, tightly coupled multiprocessor system. Balance can contain two to thirty 32-bit microprocessors with an aggregate performance of up to 21 million instructions per second (MIPS). Each processor has a private cache as well as a small local memory to hold frequently used kernel routines. The system features a high-bandwidth pipelined bus, up to 28 Mbytes of main memory, a diagnostic and console processor, up to four IEEE 796 (Multibus) adapters, an IEEE 802.3 (Ethernet) LAN interface, and an ANSI Small Computer System Interface (SCSI). Dynix, a multiprocessor operating system supporting both the 4.2 BSD and System V Unix environments, manages Balance, providing transparent support for multiprocessing as well as tools and libraries for developing parallel applications. The various subsystems and the Dynix operating system are examined. Applications and performance are discussed.

Patent•

[...]

Wen-Tai Lin¹, Jyh-Pin Hwang¹•Institutions (1)

General Electric¹

22 Dec 1988

TL;DR: A crossbar switch is constructed in monolithic integrated circuit form together with respective memory cells controlling each of the component crosspoint switches, which reduces the number of bits that must be provided in parallel to the integrated circuit for controlling the crosspoint switches.

Abstract: A crossbar switch is constructed in monolithic integrated circuit form together with respective memory cells controlling each of the component crosspoint switches in the crossbar switch. The memory cells permit control signals for the crosspoint switches to be supplied serially to the monolithic integrated circuit and thus permit those control signals to be supplied in coded form as orthogonal cross addressing for the memory cells. This reduces the number of bits which must be provided in parallel to the integrated circuit for controlling the crosspoint switches. In preferred embodiments of the crossbar switch, provision is made for operation as a corner-turn array for rotating bit matrices and for faster operation as a barrel shifter.

A parallel implementation of logic programs

Yow-Jian Lin

1 Jun 1988

Journal Article•10.1145/48675.48682•

Completing an MIMD multiprocessor taxonomy

Eric E. Johnson

01 Jun 1988-ACM SIGARCH Computer Architecture News

TL;DR: The classification of MIMD multiprocessor architectures as shared memory, message passing, or "hybrid" is shown to be incomplete, and an alternative complete taxonomy is suggested.

Abstract: MIMD multiprocessor architectures have been classified as shared memory, message passing, or "hybrid" architectures. This taxonomy is shown to be incomplete, and an alternative complete taxonomy is suggested. Examples of each class of the taxonomy are discussed, along with general attributes of the classes.
