Top 285 papers published in the topic of Hypercube in 1991

Showing papers on "Hypercube" published in 1991

Book•

Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes

[...]

1 Sep 1991

TL;DR: This book covers parallel algorithms and architectures for arrays and trees, meshes of trees, and hypercubes and related networks, with topics including sorting, packet routing, integer arithmetic, matrix algorithms, and graph algorithms.

Abstract: Preface Acknowledgments Notation

1 Arrays and Trees
1.1 Elementary Sorting and Counting 1.1.1 Sorting on a Linear Array Assessing the Performance of the Algorithm Sorting N Numbers with Fewer Than N Processors 1.1.2 Sorting in the Bit Model 1.1.3 Lower Bounds 1.1.4 A Counterexample-Counting 1.1.5 Properties of the Fixed-Connection Network Model
1.2 Integer Arithmetic 1.2.1 Carry-Lookahead Addition 1.2.2 Prefix Computations-Segmented Prefix Computations 1.2.3 Carry-Save Addition 1.2.4 Multiplication and Convolution 1.2.5 Division and Newton Iteration
1.3 Matrix Algorithms 1.3.1 Elementary Matrix Products 1.3.2 Algorithms for Triangular Matrices 1.3.3 Algorithms for Tridiagonal Matrices -Odd-Even Reduction -Parallel Prefix Algorithms 1.3.4 Gaussian Elimination 1.3.5 Iterative Methods -Jacobi Relaxation -Gauss-Seidel Relaxation Finite Difference Methods -Multigrid Methods
1.4 Retiming and Systolic Conversion 1.4.1 A Motivating Example-Palindrome Recognition 1.4.2 The Systolic and Semisystolic Model of Computation 1.4.3 Retiming Semisystolic Networks 1.4.4 Conversion of a Semisystolic Network into a Systolic Network 1.4.5 The Special Case of Broadcasting 1.4.6 Retiming the Host 1.4.7 Design by Systolic Conversion-A Summary
1.5 Graph Algorithms 1.5.1 Transitive Closure 1.5.2 Connected Components 1.5.3 Shortest Paths 1.5.4 Breadth-First Spanning Trees 1.5.5 Minimum Weight Spanning Trees
1.6 Sorting Revisited 1.6.1 Odd-Even Transposition Sort on a Linear Array 1.6.2 A Simple Root-N(log N + 1)-Step Sorting Algorithm 1.6.3 A (3 Root-N + o(Root-N))-Step Sorting Algorithm 1.6.4 A Matching Lower Bound
1.7 Packet Routing 1.7.1 Greedy Algorithms 1.7.2 Average-Case Analysis of Greedy Algorithms -Routing N Packets to Random Destinations -Analysis of Dynamic Routing Problems 1.7.3 Randomized Routing Algorithms 1.7.4 Deterministic Algorithms with Small Queues 1.7.5 An Off-line Algorithm 1.7.6 Other Routing Models and Algorithms
1.8 Image Analysis and Computational Geometry 1.8.1 Component-Labelling Algorithms -Levialdi's Algorithm -An O(Root-N)-Step Recursive Algorithm 1.8.2 Computing Hough Transforms 1.8.3 Nearest-Neighbor Algorithms 1.8.4 Finding Convex Hulls
1.9 Higher-Dimensional Arrays 1.9.1 Definitions and Properties 1.9.2 Matrix Multiplication 1.9.3 Sorting 1.9.4 Packet Routing 1.9.5 Simulating High-Dimensional Arrays on Low-Dimensional Arrays
1.10 Problems
1.11 Bibliographic Notes

2 Meshes of Trees
2.1 The Two-Dimensional Mesh of Trees 2.1.1 Definition and Properties 2.1.2 Recursive Decomposition 2.1.3 Derivation from KN,N 2.1.4 Variations 2.1.5 Comparison With the Pyramid and Multigrid
2.2 Elementary O(log N)-Step Algorithms 2.2.1 Routing 2.2.2 Sorting 2.2.3 Matrix-Vector Multiplication 2.2.4 Jacobi Relaxation 2.2.5 Pivoting 2.2.6 Convolution 2.2.7 Convex Hull
2.3 Integer Arithmetic 2.3.1 Multiplication 2.3.2 Division and Chinese Remaindering 2.3.3 Related Problems -Iterated Products -Root Finding
2.4 Matrix Algorithms 2.4.1 The Three-Dimensional Mesh of Trees 2.4.2 Matrix Multiplication 2.4.3 Inverting Lower Triangular Matrices 2.4.4 Inverting Arbitrary Matrices -Csanky's Algorithm -Inversion by Newton Iteration 2.4.5 Related Problems
2.5 Graph Algorithms 2.5.1 Minimum-Weight Spanning Trees 2.5.2 Connected Components 2.5.3 Transitive Closure 2.5.4 Shortest Paths 2.5.5 Matching Problems
2.6 Fast Evaluation of Straight-Line Code 2.6.1 Addition and Multiplication Over a Semiring 2.6.2 Extension to Codes with Subtraction and Division 2.6.3 Applications
2.7 Higher-Dimensional Meshes of Trees 2.7.1 Definitions and Properties 2.7.2 The Shuffle-Tree Graph
2.8 Problems
2.9 Bibliographic Notes

3 Hypercubes and Related Networks
3.1 The Hypercube 3.1.1 Definitions and Properties 3.1.2 Containment of Arrays -Higher-Dimensional Arrays -Non-Power-of-2 Arrays 3.1.3 Containment of Complete Binary Trees 3.1.4 Embeddings of Arbitrary Binary Trees -Embeddings with Dilation 1 and Load O(M over N + log N) -Embeddings with Dilation O(1) and Load O(M over N + 1) -A Review of One-Error-Correcting Codes -Embedding Plog N into Hlog N 3.1.5 Containment of Meshes of Trees 3.1.6 Other Containment Results
3.2 The Butterfly, Cube-Connected-Cycles, and Benes Network 3.2.1 Definitions and Properties 3.2.2 Simulation of Arbitrary Networks 3.2.3 Simulation of Normal Hypercube Algorithms 3.2.4 Some Containment and Simulation Results
3.3 The Shuffle-Exchange and de Bruijn Graphs 3.3.1 Definitions and Properties 3.3.2 The Diaconis Card Tricks 3.3.3 Simulation of Normal Hypercube Algorithms 3.3.4 Similarities with the Butterfly 3.3.5 Some Containment and Simulation Results
3.4 Packet-Routing Algorithms 3.4.1 Definitions and Routing Models 3.4.2 Greedy Routing Algorithms and Worst-Case Problems 3.4.3 Packing, Spreading, and Monotone Routing Problems -Reducing a Many-to-Many Routing Problem to a Many-to-One Routing Problem -Reducing a Routing Problem to a Sorting Problem 3.4.4 The Average-Case Behavior of the Greedy Algorithm -Bounds on Congestion -Bounds on Running Time -Analyzing Non-Predictive Contention-Resolution Protocols 3.4.5 Converting Worst-Case Routing Problems into Average-Case Routing Problems -Hashing -Randomized Routing 3.4.6 Bounding Queue Sizes -Routing on Arbitrary Levelled Networks 3.4.7 Routing with Combining 3.4.8 The Information Dispersal Approach to Routing -Using Information Dispersal to Attain Fault-Tolerance -Finite Fields and Coding Theory 3.4.9 Circuit-Switching Algorithms
3.5 Sorting 3.5.1 Odd-Even Merge Sort -Constructing a Sorting Circuit with Depth log N(log N + 1)/2 3.5.2 Sorting Small Sets 3.5.3 A Deterministic O(log N log log N)-Step Sorting Algorithm 3.5.4 Randomized O(log N)-Step Sorting Algorithms -A Circuit with Depth 7.45 log N that Usually Sorts
3.6 Simulating a Parallel Random Access Machine 3.6.1 PRAM Models and Shared Memories 3.6.2 Randomized Simulations Based on Hashing 3.6.3 Deterministic Simulations using Replicated Data 3.6.4 Using Information Dispersal to Improve Performance
3.7 The Fast Fourier Transform 3.7.1 The Algorithm 3.7.2 Implementation on the Butterfly and Shuffle-Exchange Graph 3.7.3 Application to Convolution and Polynomial Arithmetic 3.7.4 Application to Integer Multiplication
3.8 Other Hypercubic Networks 3.8.1 Butterflylike Networks -The Omega Network -The Flip Network -The Baseline and Reverse Baseline Networks -Banyan and Delta Networks -k-ary Butterflies 3.8.2 De Bruijn-Type Networks -The k-ary de Bruijn Graph -The Generalized Shuffle-Exchange Graph
3.9 Problems
3.10 Bibliographic Notes

Bibliography Index Lemmas, Theorems, and Corollaries Author Index Subject Index

3,130 citations

Journal Article•10.1109/71.80187•

Properties and performance of folded hypercubes

[...]

A. El-Amawy¹, Shahram Latifi²•Institutions (2)

Louisiana State University¹, University of Nevada, Las Vegas²

01 Jan 1991-IEEE Transactions on Parallel and Distributed Systems

TL;DR: A new hypercube-type structure, the folded hypercube (FHC), which is basically a standard hypercube with some extra links established between its nodes, is proposed and analyzed; it is shown to offer substantial improvement over existing hypercube-type networks in network parameters such as average distance, message traffic density, and communication time delay.

Abstract: A new hypercube-type structure, the folded hypercube (FHC), which is basically a standard hypercube with some extra links established between its nodes, is proposed and analyzed. The hardware overhead is almost 1/n, n being the dimensionality of the hypercube, which is negligible for large n. For this new design, optimal routing algorithms are developed and proven to be remarkably more efficient than those of the conventional n-cube. For one-to-one communication, each node can reach any other node in the network in at most (n/2) hops (each hop corresponds to the traversal of a single link), as opposed to n hops in the standard hypercube. One-to-all communication (broadcasting) can also be performed in only (n/2) steps, yielding a 50% improvement in broadcasting time over that of the standard hypercube. All routing algorithms are simple and easy to implement. Correctness proofs for the algorithms are given. For the proposed architecture, communication parameters such as average distance, message traffic density, and communication time delay are derived. In addition, some fault tolerance capabilities of this architecture are quantified and compared to those of the standard cube. It is shown that this structure offers substantial improvement over existing hypercube-type networks in terms of the above-mentioned network parameters.
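
The hop bound quoted above can be checked with a small sketch. It assumes the commonly described folded-hypercube construction, in which the extra link at each node goes to the node with the bitwise-complement address; the abstract itself only says "some extra links", so that choice, and the function names, are illustrative assumptions rather than the authors' code.

```python
from collections import deque


def fhc_neighbors(node: int, n: int) -> list[int]:
    """Neighbors of `node` in an n-dimensional folded hypercube (assumed construction)."""
    mask = (1 << n) - 1
    cube_links = [node ^ (1 << i) for i in range(n)]  # ordinary hypercube links
    return cube_links + [node ^ mask]                 # assumed extra complement link


def fhc_distance(src: int, dst: int, n: int) -> int:
    """Closed form: flip the differing bits, or take the complement link once
    and flip the remaining n - h bits."""
    h = bin(src ^ dst).count("1")
    return min(h, n + 1 - h)


def bfs_distances(src: int, n: int) -> dict[int, int]:
    """Breadth-first search distances from `src`, used to check the closed form."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in fhc_neighbors(u, n):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist
```

Under this assumption the largest distance from any node is the ceiling of n/2, which matches the "(n/2) hops" figure in the abstract for even n.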

485 citations

Journal Article•10.1109/12.102840•

A variation on the hypercube with lower diameter

[...]

Kemal Efe¹•Institutions (1)

University of Louisiana at Lafayette¹

01 Nov 1991-IEEE Transactions on Computers

TL;DR: A new interconnection structure is proposed as a basis for distributed-memory parallel computer architectures that is a variation of the hypercube and preserves many of its desirable properties, including regularity and large vertex connectivity.

Abstract: A new interconnection structure is proposed as a basis for distributed-memory parallel computer architectures. The network is a variation of the hypercube and preserves many of its desirable properties, including regularity and large vertex connectivity. It has the same node and link complexity, but has a diameter only about half of the hypercube's. Some of the basic properties of this topology are discussed. Efficient routing and broadcasting algorithms are presented.

309 citations

Journal Article•10.1109/12.76405•

Enhanced hypercubes

[...]

Nian-Feng Tzeng, S. Wei

01 Mar 1991-IEEE Transactions on Computers

TL;DR: A hypercube with extra connections added between pairs of nodes through otherwise unused links is investigated and achieves noticeable improvement in diameter, mean internode distance, and traffic density.

Abstract: A hypercube with extra connections added between pairs of nodes through otherwise unused links is investigated. The extra connections are made in a way that maximizes the improvement of the performance measure of interest under various traffic distributions. The resulting hypercube, called the enhanced hypercube, requires a simple routing algorithm and is guaranteed not to create any traffic-congested points or links. The enhanced hypercube achieves noticeable improvement in diameter, mean internode distance, and traffic density, and it also is more cost effective than a regular hypercube. An efficient broadcast algorithm that can considerably speed up the broadcast process in enhanced hypercubes is provided.

181 citations

Journal Article•10.1109/12.67323•

The twisted N-cube with application to multiprocessing

[...]

Abdol-Hossein Esfahanian¹, Lionel M. Ni¹, Bruce E. Sagan¹•Institutions (1)

Michigan State University¹

01 Jan 1991-IEEE Transactions on Computers

TL;DR: It is shown that by exchanging any two independent edges in any shortest cycle of the n-cube, its diameter decreases by one unit, which leads to the definition of a new class of n-regular graphs, denoted TQ_n, with 2^n vertices and diameter n-1, which has the (n-1)-cube as subgraph.

Abstract: It is shown that by exchanging any two independent edges in any shortest cycle of the n-cube (n ≥ 3), its diameter decreases by one unit. This leads to the definition of a new class of n-regular graphs, denoted TQ_n, with 2^n vertices and diameter n-1, which has the (n-1)-cube as subgraph. Other properties of TQ_n such as connectivity and the lengths of the disjoint paths are also investigated. Moreover, it is shown that the complete binary tree on 2^n - 1 vertices, which is not a subgraph of the n-cube, is a subgraph of TQ_n. How these results can be used to enhance hypercube multiprocessors is discussed.
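
The edge exchange can be tried out on a small cube. The sketch below is only an illustration: it picks one particular 4-cycle (0, 1, 3, 2), removes its two independent edges (0,1) and (2,3), and adds the diagonals (0,3) and (1,2). The choice of cycle and all function names are assumptions made here, not taken from the paper.

```python
from collections import deque


def hypercube_edges(n: int) -> set[frozenset[int]]:
    """Edge set of the n-cube on nodes 0 .. 2^n - 1."""
    return {frozenset((u, u ^ (1 << i))) for u in range(1 << n) for i in range(n)}


def twist(edges: set[frozenset[int]]) -> set[frozenset[int]]:
    """Exchange two independent edges of the 4-cycle 0-1-3-2-0 (hypothetical choice)."""
    twisted = set(edges)
    twisted -= {frozenset((0, 1)), frozenset((2, 3))}
    twisted |= {frozenset((0, 3)), frozenset((1, 2))}
    return twisted


def diameter(edges: set[frozenset[int]], num_nodes: int) -> int:
    """Largest shortest-path length, found by a BFS from every node."""
    adj: dict[int, list[int]] = {u: [] for u in range(num_nodes)}
    for edge in edges:
        u, v = tuple(edge)
        adj[u].append(v)
        adj[v].append(u)
    best = 0
    for src in range(num_nodes):
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best
```

For n = 3, `diameter(hypercube_edges(3), 8)` is 3 and `diameter(twist(hypercube_edges(3)), 8)` is 2, while every node keeps degree 3.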

156 citations

Journal Article•10.1016/0010-4655(91)90097-5•

Molecular dynamics on hypercube parallel computers

[...]

William Hayden Smith¹•Institutions (1)

Daresbury Laboratory¹

01 Mar 1991-Computer Physics Communications

TL;DR: Three particular algorithms are described: replicated data (RD), systolic loop (SLS-G), and parallelised link-cells (PLC), all of which have good load balancing. The performance characteristics of each algorithm and the factors affecting their scaling properties are discussed.

141 citations

Journal Article•10.1137/0404045•

On the existence of Hamiltonian circuits in faulty hypercubes

[...]

Mee Yee Chan, Shiang-Jen Lee

01 Sep 1991-SIAM Journal on Discrete Mathematics

TL;DR: It is shown that the problem of determining whether an n-cube with an arbitrary number of link faults has a Hamiltonian circuit is NP-complete.

Abstract: The problem of finding Hamiltonian circuits in faulty hypercubes is explored. There are many different Hamiltonian circuits in a nonfaulty hypercube. The question of interest here is the following: if a certain number of links are removed from the hypercube, will a Hamiltonian circuit still exist? In partial answer to this question are the following results. First, it is shown that for any n-cube $( n\geqq 3 )$ with $\leqq 2n - 5$ link faults in which each node is incident to at least two nonfaulty links, there exists a Hamiltonian circuit consisting of only nonfaulty links. Since as will be shown, there exists an n-cube with $2n - 4$ faulty links, in which each node is incident to at least two nonfaulty links, for which there is no Hamiltonian circuit, this result is optimal. Second, it is shown that the problem of determining whether an n-cube with an arbitrary number of link faults has a Hamiltonian circuit is NP-complete.
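
One familiar Hamiltonian circuit of a nonfaulty n-cube is the reflected Gray code. The sketch below builds it and checks it against a set of faulty links. It is an illustration only, not the paper's construction: it tests a single fixed circuit, so a `False` result does not mean that no fault-free Hamiltonian circuit exists. The function names are hypothetical.

```python
def gray_cycle(n: int) -> list[int]:
    """Reflected Gray code: visits all 2^n nodes, consecutive nodes differ in one bit."""
    return [i ^ (i >> 1) for i in range(1 << n)]


def is_hamiltonian_circuit(cycle: list[int], n: int,
                           faulty: frozenset = frozenset()) -> bool:
    """Check that `cycle` is a Hamiltonian circuit of the n-cube (n >= 2)
    that uses no link listed in `faulty` (a set of frozenset node pairs)."""
    if sorted(cycle) != list(range(1 << n)):
        return False                      # must visit every node exactly once
    for idx, u in enumerate(cycle):
        v = cycle[(idx + 1) % len(cycle)]  # wrap around to close the circuit
        if bin(u ^ v).count("1") != 1:
            return False                  # not a hypercube link
        if frozenset((u, v)) in faulty:
            return False                  # link is faulty
    return True
```

For example, with n = 3 the Gray circuit survives a fault on link (0, 2), which it does not use, but not a fault on link (0, 1).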

119 citations

Journal Article•10.1109/71.97904•

Parallel simulated annealing using speculative computation

[...]

E.E. Witte¹, Roger D. Chamberlain¹, Mark A. Franklin¹•Institutions (1)

Washington University in St. Louis¹

01 Oct 1991-IEEE Transactions on Parallel and Distributed Systems

TL;DR: In this article, a parallel simulated annealing algorithm that is problem-independent, maintains the serial decision sequence, and obtains speedup which can exceed log_2 P on P processors is discussed.

Abstract: A parallel simulated annealing algorithm that is problem-independent, maintains the serial decision sequence, and obtains speedup which can exceed log_2 P on P processors is discussed. The algorithm achieves parallelism by using the concurrency technique of speculative computation. Implementation of the parallel algorithm on a hypercube multiprocessor and application to a task assignment problem are described. The simulated annealing solutions are shown to be, on average, 28% better than the solutions produced by a random task assignment algorithm and 2% better than the solutions produced by a heuristic.

112 citations

Proceedings Article•10.1109/DMCC.1991.633357•

The Mobius Cubes

[...]

P. Cull¹, S.M. Larson•Institutions (1)

Oregon State University¹

1 Jan 1991

TL;DR: By rearranging some of the connections in the hypercube, the Mobius cubes are obtained which have smaller distances (as measured in communication links) between processors.

Abstract: The Mobius cubes are hypercube variants that give better performance with the same number of links and processors. We show that the diameter of the Mobius cubes is about one half the diameter of the equivalent hypercube, and that the average number of steps between processors for a Mobius cube is about two-thirds of the average for a hypercube. We give an efficient routing algorithm for the Mobius cubes. This routing algorithm finds a shortest path and operates in time proportional to the dimension of the cube. We also give efficient broadcast algorithms for the Mobius cubes. We show that the Mobius cubes contain ring networks and other networks. We report results of simulation studies on the dynamic message-passing performance of the hypercube, the Twisted Cube of P.A.J. Hilbers et al. (1987), and the Mobius cubes. Our results are in agreement with S. Abraham (1990), showing that the Twisted Cube has worse dynamic performance than the hypercube, but our results show that the 1-Mobius cube has dynamic performance superior to that of the hypercube. This contradicts current literature, which implies that twisted cube variants will have worse dynamic performance.

106 citations

Patent•

Improved hypercube topology for multiprocessor computer systems

[...]

Renben Shu¹, David H. C. Du¹•Institutions (1)

University of Minnesota¹

13 Feb 1991

TL;DR: A Modified Hypercube (MH) topology is described, which adds communication links between the most distant nodes of a classic hypercube.

Abstract: A hypercube system which has been modified by adding communication links between the most distant nodes of a classic hypercube topology is described herein. This improved topology is termed a Modified Hypercube topology. Such a topology contains extra links, each of which connects a node to the node that requires the greatest number of nodal hops over the shortest path. Stated another way, the node reached by the greatest number of hops along the shortest path from an originating node is the most distant processor node. In terms of Hamming distance, an extra link is added between two nodes having the greatest Hamming distance. Such a system makes a technological trade-off, reducing the diameter of a classic hypercube at the cost of incrementally increasing the number of I/O ports at each node. This trade-off has been recognized in the industry as advantageous, since a great gain in performance is achieved in exchange for an incremental impact on the hardware. The performance advantages of the present invention grow as the number of nodes in the hypercube grows and the maximum distance between nodes increases.

102 citations

Journal Article•10.1109/71.80186•

A top-down processor allocation scheme for hypercube computers

[...]

Jong Kim¹, Chita R. Das¹, Woei Lin•Institutions (1)

Pennsylvania State University¹

01 Jan 1991-IEEE Transactions on Parallel and Distributed Systems

TL;DR: It is shown that the free list policy is optimal in a static environment, as are the other policies, and it also gives better subcube recognition ability compared to the previous schemes in a dynamic environment.

Abstract: An efficient processor allocation policy is presented for hypercube computers. The allocation policy is called free list since it maintains a list of free subcubes available in the system. An incoming request of dimension k (2^k nodes) is allocated by finding a free subcube of dimension k or by decomposing an available subcube of dimension greater than k. This free list policy uses a top-down allocation rule in contrast to the bottom-up approach used by the previous bit-map allocation algorithms. This allocation scheme is compared to the buddy, gray code (GC), and modified buddy allocation policies reported for the hypercubes. It is shown that the free list policy is optimal in a static environment, as are the other policies, and it also gives better subcube recognition ability compared to the previous schemes in a dynamic environment. The performance of this policy, in terms of parameters such as average delay, system utilization, and time complexity, is compared to the other schemes to demonstrate its effectiveness. The extension of the algorithm for parallel implementation, noncubic allocation, and inclusion/exclusion allocation is also given.
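
The allocation rule described above (take a free subcube of dimension k, or decompose a larger one) can be sketched as follows. This is a simplified illustration, not the paper's algorithm: subcubes are written as strings over `0`, `1`, and `*`, the smallest sufficiently large free subcube is chosen, the split always fixes the leftmost `*`, and deallocation with merging is omitted. All of these choices and the class name are assumptions made here.

```python
class FreeListAllocator:
    """Top-down subcube allocation over per-dimension free lists (illustrative sketch)."""

    def __init__(self, n: int) -> None:
        self.n = n
        # free[d] holds the free subcubes of dimension d; '*' marks a free address bit.
        self.free: dict[int, list[str]] = {d: [] for d in range(n + 1)}
        self.free[n].append("*" * n)

    def allocate(self, k: int):
        """Return a subcube of dimension k, or None if the request cannot be met."""
        for d in range(k, self.n + 1):
            if self.free[d]:
                cube = self.free[d].pop()
                break
        else:
            return None
        # Decompose top-down: halve the subcube until it has dimension k,
        # returning the unused half to the free list each time.
        while cube.count("*") > k:
            i = cube.index("*")
            low = cube[:i] + "0" + cube[i + 1:]
            high = cube[:i] + "1" + cube[i + 1:]
            self.free[high.count("*")].append(high)
            cube = low
        return cube
```

On a 3-cube, successive requests for dimensions 1, 2, and 1 return `00*`, `1**`, and `01*` in this sketch, after which the cube is fully allocated.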

Journal Article•10.1109/12.76410•

Heuristic technique for processor and link assignment in multicomputers

[...]

S.W. Bollinger, Scott F. Midkiff

01 Mar 1991-IEEE Transactions on Computers

TL;DR: A graph-based solution to the mapping problem is developed using the simulated annealing optimization heuristic; the mapping scheme is implemented using the hypercube as a host architecture, and results for several image graphs are presented.

Abstract: A graph-based solution to the mapping problem using the simulated annealing optimization heuristic is developed. An automated two-phase mapping strategy is formulated: process annealing assigns parallel processes to processing nodes, and connection annealing schedules traffic connections on network data links so that interprocess communication conflicts are minimized. To evaluate the quality of generated mappings, cost functions suitable for simulated annealing that accurately quantify communications overhead are derived. Communication efficiency is formulated to measure the quality of assignments when the optimal mapping is unknown. The mapping scheme is implemented using the hypercube as a host architecture, and results for several image graphs are presented.

Journal Article•10.1109/59.117001•

Parallel Newton type methods for power system stability analysis using local and shared memory multiprocessors

[...]

J.S. Chai¹, N. Zhu¹, Anjan Bose¹, Daniel Tylavsky¹•Institutions (1)

Arizona State University¹

01 Nov 1991-IEEE Transactions on Power Systems

TL;DR: The main thrust is to explore the match between the algorithms, their implementation, and the machine architectures, and to present various considerations together with the results.

Abstract: Both the very dishonest Newton (VDHN) and the successive over relaxed (SOR) Newton algorithms have been implemented on the iPSC/2 and Alliant FX/8 computers for power system dynamic simulation using complex generator and nonlinear load models. The main thrust is to explore the match between the algorithms, their implementation, and the machine architectures. For example, the less parallel but sequentially faster VDHN runs faster on the hypercube (iPSC/2) whereas the more parallel SOR-Newton requires data sharing more often because of the extra iterations and does better on the Alliant. The implementation on the hypercube requires significant manual programming to schedule the processors and their communication whereas the compiler in the Alliant recognizes parallel steps but only if the software is properly coded. The authors present these various considerations together with the results.

Journal Article•10.1016/0743-7315(91)90129-W•

Designing fault-tolerant systems using automorphisms

[...]

Shantanu Dutt¹, John P. Hayes²•Institutions (2)

University of Minnesota¹, University of Michigan²

01 Jul 1991-Journal of Parallel and Distributed Computing

TL;DR: A general theory for modeling and designing fault-tolerant multiprocessor systems in a systematic and efficient manner is presented and the resulting designs are shown to be far superior to those proposed in previous work.

Journal Article•10.1016/0021-9991(91)90216-8•

Parallel spectral element solution of the Stokes problem

[...]

Paul Fischer¹, Anthony T. Patera¹•Institutions (1)

Massachusetts Institute of Technology¹

02 Jan 1991-Journal of Computational Physics

TL;DR: A high-efficiency medium-grained parallel spectral element method is presented for numerical solution of the Stokes problem in general domains, and the performance of this algorithm-architecture coupling is evaluated in a technical and economic framework that reflects the true advantages of parallel solution of partial differential equations.

Proceedings Article•10.1145/113379.113391•

Coding theory, hypercube embeddings, and fault tolerance

[...]

Bill Aiello, Tom Leighton¹•Institutions (1)

Massachusetts Institute of Technology¹

1 Jun 1991

TL;DR: This paper relates coding theory to hypercube embeddings and fault tolerance.

Journal Article•10.1016/0743-7315(91)90083-L•

Scalability of parallel algorithms for the all-pairs shortest-path problem

[...]

Vipin Kumar¹, Vineet Singh•Institutions (1)

University of Minnesota¹

01 Oct 1991-Journal of Parallel and Distributed Computing

TL;DR: In this article, the authors use the isoefficiency metric to analyze the scalability of parallel algorithms for finding shortest paths between all pairs of nodes in a densely connected graph, and examine the trade-offs among hardware cost, memory, time, and scalability.

Journal Article•10.1007/BF01759054•

Stochastic neural networks

[...]

Eugene Wong¹•Institutions (1)

University of California, Berkeley¹

01 Jun 1991-Algorithmica

TL;DR: A class of algorithms for finding the global minimum of a continuous-variable function defined on a hypercube, based on both diffusion processes and simulated annealing, is presented, and it is shown that "learning" in these networks can be achieved by a set of three interconnected diffusion machines.

Abstract: The first purpose of this paper is to present a class of algorithms for finding the global minimum of a continuous-variable function defined on a hypercube. These algorithms, based on both diffusion processes and simulated annealing, are implementable as analog integrated circuits. Such circuits can be viewed as generalizations of neural networks of the Hopfield type, and are called "diffusion machines." Our second objective is to show that "learning" in these networks can be achieved by a set of three interconnected diffusion machines: one that learns, one to model the desired behavior, and one to compute the weight changes.

Proceedings Article•10.1145/113379.113402•

The efficiency of greedy routing in hypercubes and butterflies

[...]

George D. Stamoulis¹, John N. Tsitsiklis¹•Institutions (1)

Massachusetts Institute of Technology¹

1 Jun 1991

TL;DR: It is proved that an average delay per packet of O(d) is attainable for any fixed load factor ρ < 1; the analysis is based on a stochastic comparison with a product-form network.

Abstract: We analyze the following problem: Each node of the d-dimensional hypercube independently generates packets according to a Poisson process with rate $\lambda$. Each of the packets is to be sent to a randomly chosen destination; each of the nodes at Hamming distance $k$ from a packet's origin is assigned an a priori probability $p^k (1-p)^{d-k}$. Packets are routed under a simple greedy scheme: each of them is forced to cross the hypercube dimensions required in increasing index-order, with possible queueing at the hypercube nodes. Assuming unit packet length and no other communications taking place, we show that this scheme is stable (in steady-state) if $\rho < 1$, where $\rho \overset{\text{def}}{=} \lambda p$ is the load factor of the network; this is seen to be the broadest possible range for stability. Furthermore, we prove that the average delay $T$ per packet satisfies $T \leq \frac{dp}{1-\rho}$, thus showing that an average delay of $O(d)$ is attainable for any fixed $\rho < 1$. We also establish similar results in the context of the butterfly network. Our analysis is based on a stochastic comparison with a product-form network.
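
The greedy scheme and the destination distribution described in this abstract can be sketched in a few lines. The sketch covers only the path a single packet takes and how its destination is drawn; queueing at nodes and the Poisson arrival process are not modelled. Flipping each address bit independently with probability p gives every node at Hamming distance k the probability p^k (1-p)^(d-k). The function names are hypothetical.

```python
import random


def greedy_route(src: int, dst: int, d: int) -> list[int]:
    """Nodes visited when the required dimensions are crossed in increasing index order."""
    path = [src]
    node = src
    for i in range(d):
        if ((node ^ dst) >> i) & 1:   # dimension i still differs from the destination
            node ^= 1 << i            # cross dimension i
            path.append(node)
    return path


def random_destination(src: int, d: int, p: float, rng: random.Random) -> int:
    """Draw a destination by flipping each of the d address bits with probability p."""
    dst = src
    for i in range(d):
        if rng.random() < p:
            dst ^= 1 << i
    return dst
```

For example, `greedy_route(0b000, 0b101, 3)` returns `[0, 1, 5]`: the packet crosses dimension 0 first and dimension 2 second, so the number of hops equals the Hamming distance.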

Journal Article•10.1109/59.76716•

Coarse grain scheduling in parallel triangular factorization and solution of power system matrices

[...]

K. Lau, Daniel Tylavsky, Anjan Bose

01 May 1991-IEEE Transactions on Power Systems

TL;DR: It may be concluded that a fine-grain scheduling scheme is not appropriate for parallel LU factorization using an iPSC hypercube parallel processing computer, and the parallel LU factorization implementation using factorization path scheduling was found to perform significantly better than levelwise scheduling.

Abstract: Two new coarse-grain scheduling schemes, the levelwise and factorization path scheduling schemes, are examined. These schemes differ significantly from fine-grain scheduling schemes which have been proposed in the past. If a fine-grain scheduling scheme at the floating-point-operation level is an appropriate scheduling method for the iPSC hypercube parallel processing computer, then the levelwise scheduling scheme presented should have gain comparable to that obtained using the factorization path scheduling scheme. Since this is not the case, it may be concluded that a fine-grain scheduling scheme is not appropriate for parallel LU factorization using an iPSC hypercube. Furthermore, the parallel LU factorization implementation using factorization path scheduling was found to perform significantly better than levelwise scheduling. The maximum speedup of 2.08 was obtained by using four processors on the 494 bus system. The efficiency at maximum speedup was 52.1%.
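
The efficiency figure follows from the usual definition, efficiency = speedup / number of processors. The short sketch below assumes that definition (the abstract does not state it) and uses a hypothetical function name.

```python
def parallel_efficiency(speedup: float, processors: int) -> float:
    """Parallel efficiency as speedup divided by processor count (assumed definition)."""
    return speedup / processors


print(f"{parallel_efficiency(2.08, 4):.1%}")  # prints 52.0%
```

The rounded speedup of 2.08 gives 52.0%; the reported 52.1% is presumably computed from the unrounded speedup.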

Journal Article•10.1007/BF02090402•

Fast algorithms for bit-serial routing on a hypercube

[...]

William Aiello¹, Frank Thomson Leighton², Bruce M. Maggs³, Mark Newman⁴•Institutions (4)

Telcordia Technologies¹, Massachusetts Institute of Technology², Princeton University³, Temple University⁴

01 Dec 1991-Theory of Computing Systems / Mathematical Systems Theory

TL;DR: The algorithm is adaptive, and it is shown that this is necessary to achieve the logarithmic speedup; the Borodin-Hopcroft lower bound on oblivious routing is generalized by proving that any randomized oblivious algorithm on a polylogarithmic degree network requires at least Ω(log² N / log log N) bit steps with high probability for almost all permutations.

Abstract: In this paper we describe an O(log N)-bit-step randomized algorithm for bit-serial message routing on a hypercube. The result is asymptotically optimal, and improves upon the best previously known algorithms by a logarithmic factor. The result also solves the problem of on-line circuit switching in an O(1)-dilated hypercube (i.e., the problem of establishing edge-disjoint paths between the nodes of the dilated hypercube for any one-to-one mapping). Our algorithm is adaptive and we show that this is necessary to achieve the logarithmic speedup. We generalize the Borodin-Hopcroft lower bound on oblivious routing by proving that any randomized oblivious algorithm on a polylogarithmic degree network requires at least Ω(log² N / log log N) bit steps with high probability for almost all permutations.

Journal Article•10.1016/0743-7315(91)90039-C•

Optimal matrix transposition and bit reversal on hypercubes: all-to-all personalized communication

[...]

Alan Edelman¹•Institutions (1)

University of California, Berkeley¹

02 Feb 1991-Journal of Parallel and Distributed Computing

TL;DR: An optimal algorithm for performing the communication described by exchanging the bits of the node address with that of the local address is described, typically in both matrix transposition and bit reversal for the fast Fourier transform.

Proceedings Article•10.1145/109625.109634•

A production-quality C* compiler for Hypercube multicomputers

Philip J. Hatcher, Anthony J. Lapadula, Robert R. Jones, Michael J. Quinn, Ray J. Anderson

1 Apr 1991

TL;DR: The third-generation C* compiler for hypercube multicomputers incorporates new optimizations and utilizes an improved set of communication primitives, and it allows the programmer to specify a custom mapping of data to the distributed memories of the hypercube.

Abstract: We describe our third-generation C* compiler for hypercube multicomputers. This compiler generates code suitable for execution on both the nCUBE 3200 and the Intel iPSC/2. The compiler incorporates new optimizations and utilizes an improved set of communication primitives. It supports a variety of standard domain decomposition primitives, and it also allows the programmer to specify a custom mapping of data to the distributed memories of the hypercube. The performance of this compiler on benchmark programs demonstrates that high efficiency can be achieved executing SIMD code on multicomputer architectures.

Journal Article•10.1007/BF02921309•

Minimal cones on hypercubes

Kenneth A. Brakke¹•Institutions (1)

Susquehanna University¹

01 Dec 1991-Journal of Geometric Analysis

TL;DR: In dimension greater than four, the minimal area hypersurface separating the faces of a hypercube is the cone over the edges of the hypercube. In dimensions over 6, the cone is minimal even if the area separating opposite faces is given zero weight.

Abstract: It is shown that in dimension greater than four, the minimal area hypersurface separating the faces of a hypercube is the cone over the edges of the hypercube. This contrasts with the cases of two and three dimensions, where the cone is not minimal. For example, a soap film on a cubical frame has a small rounded square in the center. In dimensions over 6, the cone is minimal even if the area separating opposite faces is given zero weight. The proof uses the maximal flow problem that is dual to the minimal surface problem.

Journal Article•10.1007/BF02090401•

A unified framework for off-line permutation routing in parallel networks

Marc Baumslag¹, Fred S. Annexstein²•Institutions (2)

The Graduate Center, CUNY¹, University of Cincinnati²

01 Dec 1991-Theory of Computing Systems / Mathematical Systems Theory

TL;DR: This paper presents a general strategy for finding efficient permutation routes in parallel networks, investigates the use of this algorithm for routing multiple permutations, and extends its applicability to a wide class of graphs, including several families of Cayley graphs.

Abstract: In this paper we present a general strategy for finding efficient permutation routes in parallel networks. Among the popular parallel networks to which the strategy applies are mesh networks, hypercube networks, hypercube-derivative networks, ring networks, and star networks. The routes produced are generally congestion-free and take a number of routing steps that is within a small constant factor of the diameter of the network. Our basic strategy is derived from an algorithm that finds (in polynomial time) efficient permutation routes for a product network, G×H, given efficient permutation routes for G and H. We investigate the use of this algorithm for routing multiple permutations and extend its applicability to a wide class of graphs, including several families of Cayley graphs. Finally, we show that our approach can be used to find efficient permutation routes among the remaining live nodes in faulty networks.

Journal Article•10.1080/00207179108934214•

Parallel algorithms for algebraic Riccati equations

Judith D. Gardiner¹, Alan J. Laub¹•Institutions (1)

University of California, Santa Barbara¹

01 Dec 1991-International Journal of Control

TL;DR: Three forms of the matrix sign function are implemented and tested on a distributed memory hypercube multiprocessor and performance results indicate that the method is an excellent means of solving large-scale problems on a parallel computer.

Abstract: The matrix sign function is the basis of a parallel algorithm for solving the generalized algebraic Riccati equation. Three forms of the algorithm were implemented and tested on a distributed memory hypercube multiprocessor. Performance results indicate that the method is an excellent means of solving large-scale problems on a parallel computer.
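
For readers unfamiliar with the matrix sign function, here is a minimal serial sketch of the classical Newton iteration $X_{k+1} = (X_k + X_k^{-1})/2$, which converges to sign(A) when A has no eigenvalues on the imaginary axis. This is an assumption chosen for illustration: the abstract does not say which three forms of the algorithm were implemented, and the sketch covers neither the Riccati solve nor any parallel distribution. The helper names `invert` and `matrix_sign` are hypothetical.

```python
def invert(a):
    """Invert a small dense matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(a)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-14:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matrix_sign(a, tol=1e-12, max_iter=100):
    """Approximate sign(a) with the Newton iteration X <- (X + X^-1) / 2."""
    x = [[float(v) for v in row] for row in a]
    for _ in range(max_iter):
        inv = invert(x)
        nxt = [[0.5 * (p + q) for p, q in zip(rx, ri)] for rx, ri in zip(x, inv)]
        change = max(abs(p - q) for rn, rx in zip(nxt, x) for p, q in zip(rn, rx))
        x = nxt
        if change < tol:
            break
    return x


if __name__ == "__main__":
    for row in matrix_sign([[2.0, 1.0], [0.0, -3.0]]):
        print(row)
```

Each step is dominated by one matrix inversion, which is the kind of dense linear algebra kernel that can be distributed across processors.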

Proceedings Article•

Multiphase Complete Exchange on a Circuit Switched Hypercube.

Shahid H. Bokhari

1 Jan 1991

TL;DR: In this article, a unified multiphase algorithm for complete exchange on a hypercube of dimension d and block size m is described, which is applicable to all circuit-switched hypercubes that use the common e-cube routing strategy.

Abstract: On a distributed memory parallel computer, the complete exchange (all-to-all personalized) communication pattern requires each of $n$ processors to send a different block of data to each of the remaining $n - 1$ processors. This pattern is at the heart of many important algorithms, most notably the matrix transpose. For a circuit switched hypercube of dimension $d$ ($n = 2^d$), two algorithms for achieving complete exchange are known. These are (1) the Standard Exchange approach that employs $d$ transmissions of size $2^{d-1}$ blocks each and is useful for small block sizes, and (2) the Optimal Circuit Switched algorithm that employs $2^d - 1$ transmissions of 1 block each and is best for large block sizes. A unified multiphase algorithm is described that includes these two algorithms as special cases. The complete exchange on a hypercube of dimension $d$ and block size $m$ is achieved by carrying out $k$ partial exchanges on subcubes of dimension $d_i$, where $\sum_{i=1}^{k} d_i = d$, with effective block size $m_i = m 2^{d - d_i}$. When $k = d$ and all $d_i = 1$, this corresponds to algorithm (1) above. For the case of $k = 1$ and $d_i = d$, this becomes the circuit switched algorithm (2). Changing the subcube dimensions $d_i$ varies the effective block size and permits a compromise between the data permutation and block transmission overhead of (1) and the startup overhead of (2). For a hypercube of dimension $d$, the number of possible combinations of subcubes is $p(d)$, the number of partitions of the integer $d$. This is an exponential but very slowly growing function and it is feasible over these partitions to discover the best combination for a given message size. The approach was analyzed for, and implemented on, the Intel iPSC-860 circuit switched hypercube. Measurements show good agreement with predictions and demonstrate that the multiphase approach can substantially improve performance for block sizes in the 0 to 160 byte range. This range, which corresponds to 0 to 40 floating point numbers per processor, is commonly encountered in practical numeric applications. The multiphase technique is applicable to all circuit-switched hypercubes that use the common e-cube routing strategy.
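
The search over partitions described in the abstract can be sketched as follows. The code enumerates the partitions of d and, for each subcube dimension d_i, reports the number of transmissions (2^{d_i} - 1, as in algorithm (2) applied to a subcube) and the effective block size m·2^{d - d_i}. The cost function is a hypothetical linear model (a fixed startup cost plus a per-byte cost for each transmission) that ignores the data permutation overhead mentioned above; it is not the paper's performance model, and the names `partitions`, `phase_plan`, `modeled_cost`, and `best_partition` are assumptions made for the example.

```python
def partitions(d, largest=None):
    """Yield every partition of the integer d as a non-increasing tuple."""
    if largest is None:
        largest = d
    if d == 0:
        yield ()
        return
    for first in range(min(d, largest), 0, -1):
        for rest in partitions(d - first, first):
            yield (first,) + rest


def phase_plan(d, m, parts):
    """Return (d_i, transmissions, effective block size) for each phase."""
    if sum(parts) != d:
        raise ValueError("subcube dimensions must sum to d")
    return [(di, 2 ** di - 1, m * 2 ** (d - di)) for di in parts]


def modeled_cost(d, m, parts, startup, per_byte):
    """Hypothetical cost: every transmission pays startup + per_byte * size."""
    return sum(
        count * (startup + per_byte * size)
        for _, count, size in phase_plan(d, m, parts)
    )


def best_partition(d, m, startup, per_byte):
    """Pick the partition of d with the lowest modeled cost."""
    return min(
        partitions(d),
        key=lambda parts: modeled_cost(d, m, parts, startup, per_byte),
    )


if __name__ == "__main__":
    # Illustrative numbers only; they are not measurements from the paper.
    d, m = 5, 16
    for parts in partitions(d):
        print(parts, modeled_cost(d, m, parts, startup=100.0, per_byte=0.5))
    print("best:", best_partition(d, m, startup=100.0, per_byte=0.5))
```

In this model the all-ones partition reproduces the Standard Exchange (d phases, one transmission each) and the single-part partition reproduces the Optimal Circuit Switched algorithm (2^d - 1 transmissions of one block each).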

Journal Article•10.1016/0743-7315(91)90028-8•

Ordered fast Fourier transforms on a massively parallel hypercube multiprocessor

Charles H. Tong¹, Paul N. Swarztrauber²•Institutions (2)

Sandia National Laboratories¹, National Center for Atmospheric Research²

01 May 1991-Journal of Parallel and Distributed Computing

TL;DR: This work examines design alternatives for ordered radix-2 DIF (decimation-in-frequency) FFT algorithms on massively parallel hypercube multiprocessors such as the Connection Machine. It combines the ordering and computational phases of the FFT and uses sequence-to-processor maps that reduce communication.

Journal Article•10.1016/0743-7315(91)90109-M•

Optimal expression evaluation for data parallel architectures

John R. Gilbert¹, Robert Schreiber²•Institutions (2)

PARC¹, Research Institute for Advanced Computer Science²

01 Sep 1991-Journal of Parallel and Distributed Computing

TL;DR: This work gives an efficient algorithm to find the minimum-cost way to evaluate an expression for several different data parallel architectures. The algorithm applies to any architecture in which the metric describing the cost of moving an array has a property the authors call “robustness”.

Proceedings Article•10.1109/FTCS.1991.146672•

Optimal broadcasting in faulty hypercubes

Bogdan S. Chlebus¹, Krzysztof Diks¹, Andrzej Pelc¹•Institutions (1)

University of Warsaw¹

25 Jun 1991

TL;DR: A broadcasting algorithm that disseminates information throughout the whole network in time $a \log n$ with probability exceeding $1 - bn^{-c}$, with positive constants $a$, $b$, $c$ depending on $p$, provided that p

Abstract: The problem of broadcasting information in an n-node hypercube in which links fail independently with fixed probability 0 >
