TL;DR: It is proved that when used for integration, the sampling scheme with OA-based Latin hypercubes offers a substantial improvement over Latin hypercube sampling.
Abstract: In this article, we use orthogonal arrays (OA's) to construct Latin hypercubes. Besides preserving the univariate stratification properties of Latin hypercubes, these strength r OA-based Latin hypercubes also stratify each r-dimensional margin. Therefore, such OA-based Latin hypercubes provide more suitable designs for computer experiments and numerical integration than do general Latin hypercubes. We prove that when used for integration, the sampling scheme with OA-based Latin hypercubes offers a substantial improvement over Latin hypercube sampling.
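As a point of reference for the univariate stratification mentioned above, the following is a minimal sketch of ordinary Latin hypercube sampling, not the OA-based construction of the article. The function name `latin_hypercube` and its parameters are illustrative assumptions.

```python
import random

def latin_hypercube(n, d, seed=0):
    """Plain Latin hypercube sample: n points in [0, 1)^d.

    Hypothetical helper for illustration only; it does NOT implement
    the OA-based construction, just the univariate stratification.
    """
    rng = random.Random(seed)
    columns = []
    for _ in range(d):
        # Assign each of the n equal-width strata to exactly one point.
        perm = list(range(n))
        rng.shuffle(perm)
        # Jitter each point uniformly inside its stratum.
        columns.append([(cell + rng.random()) / n for cell in perm])
    # Transpose the per-dimension columns into n points of dimension d.
    return [tuple(col[i] for col in columns) for i in range(n)]

points = latin_hypercube(5, 3, seed=1)
```

Each coordinate axis is covered by exactly one point per stratum; an OA-based design would additionally stratify every r-dimensional margin.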
TL;DR: A novel interconnection topology called the Fibonacci cube is shown to possess attractive recurrent structures in spite of its asymmetric and relatively sparse interconnections.
Abstract: A novel interconnection topology called the Fibonacci cube is shown to possess attractive recurrent structures in spite of its asymmetric and relatively sparse interconnections. Since it can be embedded as a subgraph in the Boolean cube (hypercube) and it is also a supergraph of other structures, the Fibonacci cube may find applications in fault-tolerant computing. For a graph with N nodes, the diameter, the edge connectivity, and the node connectivity of the Fibonacci cube are in the logarithmic order of N. It is also shown that common system communication primitives can be implemented efficiently.
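The abstract does not spell out the construction. The sketch below assumes the commonly used definition (nodes are n-bit strings with no two adjacent 1s, joined when they differ in exactly one bit), which makes the embedding into the Boolean cube explicit. The function name `fibonacci_cube` is hypothetical.

```python
from itertools import combinations

def fibonacci_cube(n):
    """Nodes and edges of a Fibonacci cube on n-bit labels.

    Assumed definition: keep only labels with no two consecutive 1 bits,
    and connect two labels when their Hamming distance is 1.
    """
    # v & (v >> 1) is nonzero exactly when v has two adjacent 1 bits.
    nodes = [v for v in range(1 << n) if v & (v >> 1) == 0]
    edges = [(a, b) for a, b in combinations(nodes, 2)
             if bin(a ^ b).count("1") == 1]
    return nodes, edges

nodes, edges = fibonacci_cube(3)
```

Under this definition the node counts for n = 1, 2, 3, ... follow the Fibonacci sequence (2, 3, 5, 8, ...), and every edge is also an edge of the n-dimensional hypercube.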
TL;DR: This paper introduces a new class of interconnection scheme based on the Cayley graph of the alternating group, and it is shown that the graphs in this class are edge symmetric and 2-transitive.
TL;DR: A class of graphs which are variants of the hypercube graph, known as the class of hypercube-like graphs/networks, is introduced and it is shown that the hypercube, the twisted n-cube and the multiply-twisted cube are members of this class of graphs.
Abstract: We introduce a class of graphs which are variants of the hypercube graph. Many of the properties of this class of graphs are similar to those of the hypercube; hence, we refer to them as the class of hypercube-like graphs/networks. We show that the hypercube, the twisted n-cube and the multiply-twisted cube are members of this class of graphs. We also propose simple strategies for distributed routing and broadcast and discuss some issues regarding embedding other graphs and reconfiguration in such networks.
TL;DR: Using the very-dishonest Newton method as the base, Gauss, Newton and relaxed-Newton type parallel algorithms are discussed and compared with solution data obtained using the iPSC-2 32 node hypercube, and the Alliant FX-8 and Sequent/Symmetry shared-memory machines.
Abstract: Using the very-dishonest Newton method as the base, Gauss, Newton and relaxed-Newton type parallel algorithms are discussed and compared with solution data obtained using the iPSC-2 32 node hypercube, and the Alliant FX-8 and Sequent/Symmetry (26 CPUs) shared-memory machines. The bottlenecks in both algorithm and implementation are described in some detail. Various techniques and in particular their potential bottlenecks when using large-scale parallel processing are also discussed. A new parallel algorithm, the Maclaurin-Newton method (MNM), is used for stability analysis for the first time. The implementation of this method for the dynamic analysis is discussed, and it is compared to other methods. The advantage of the MNM is that it is completely parallel while retaining some Newton-type convergence characteristics. The relaxed-Newton-type algorithms are shown to be the most effective. A toroidal method (or traveling window technique) is adopted for parallel-in-space and -in-time implementation. Some comments on the improvement and its limitations are provided.
TL;DR: A class of hierarchical networks that is suitable for implementation of large multi-computers in VLSI with wafer scale integration (VLSI/WSI) technology is introduced, which employ the hypercube topology as a basic cluster, connect many of these clusters using a de Bruijn graph, and maintain the node connectivity to be the same for all nodes product graph.
Abstract: Introduces a class of hierarchical networks that is suitable for implementation of large multi-computers in VLSI with wafer scale integration (VLSI/WSI) technology. These networks, which are termed dBCube, employ the hypercube topology as a basic cluster, connect many of these clusters using a de Bruijn graph, and maintain the node connectivity to be the same for all nodes product graph. The size of this class of regular networks can be easily extended by increments of a cluster size. Local communication, to be satisfied by the hypercube topology, allows easy embedding of existing parallel algorithms, while the de Bruijn graph, which was chosen for JPL's 8096-node multiprocessor, provides the shortest distance between clusters running different parts of an application. A scheme for obtaining WSI layout is introduced and used to estimate the number of tracks needed and the required area of the wafer. The exact number of tracks in the hypercube and an approximation for the de Bruijn graph are also obtained. Tradeoffs of area versus static parameters and the size of the hypercube versus that of the de Bruijn graph are also discussed.
TL;DR: The authors optimize the cost of the fault-tolerant architecture by adding exactly k spare processors (while tolerating up to k processor and/or link faults) and minimizing the maximum number of links per processor.
Abstract: This paper presents several techniques for tolerating faults in d-dimensional mesh and hypercube architectures. The approach consists of adding spare processors and communication links so that the resulting architecture will contain a fault-free mesh or hypercube in the presence of faults. The authors optimize the cost of the fault-tolerant architecture by adding exactly k spare processors (while tolerating up to k processor and/or link faults) and minimizing the maximum number of links per processor. For example, when the desired architecture is a d-dimensional mesh and k=1, they present a fault-tolerant architecture that has the same maximum degree as the desired architecture (namely, 2d) and has only one spare processor. They also present efficient layouts for fault-tolerant two- and three-dimensional meshes, and show how multiplexers and buses can be used to reduce the degree of fault-tolerant architectures. Finally, they give constructions for fault-tolerant tori, eight-connected meshes, and hexagonal meshes.
TL;DR: Recursive Diagonal Torus (RDT), a class of interconnection networks, is proposed for massively parallel computers with up to $2^{16}$ nodes; it comprises the mesh structure and emulates hypercube and tree structures easily.
Abstract: Recursive Diagonal Torus (RDT), a class of interconnection networks, is proposed for massively parallel computers with up to $2^{16}$ nodes. By adding remote links to the diagonal directions of the torus network recursively, the RDT can realize a smaller diameter (e.g., it is 11 for $2^{16}$ nodes) with a smaller number of links per node (i.e., 8 links per node) than that of the hypercube. A simple routing algorithm called vector routing, which is near-optimal and easy to implement, is also proposed. The RDT comprises the mesh structure, and emulates hypercube and tree structures easily. FFT and the bitonic sorting algorithm are also easy to implement.
TL;DR: In this paper, the problem of drawing a graph in the plane so that edges appear as straight lines and the minimum angle formed by any pair of incident edges is maximized is presented.
Abstract: This paper presents the problem of drawing a graph in the plane so that edges appear as straight lines and the minimum angle formed by any pair of incident edges is maximized. The resolution of a layout is defined to be the size of the minimum angle formed by incident edges of the graph, and the resolution of a graph to be the maximum resolution of any layout of the graph. The resolution R of a graph is characterized in terms of the maximum node degree d of the graph by proving that $\Omega (\frac{1}{{d^2 }}) \leqslant R \leqslant \frac{{2\pi }}{d}$ for any graph. Moreover, it is proved that $R = \Theta (\frac{1}{d})$ for many graphs including planar graphs, complete graphs, hypercubes, multidimensional meshes and tori, and other special networks. It is also shown that the problem of deciding if $R = \frac{{2\pi }}{d}$ for a graph is NP-hard for $d = 4$, and by using a counting argument that $R = O(\frac{{\log d}}{{d^2 }})$ for many graphs.
TL;DR: Best known lower bounds for K(n, m) and λ(n, m) are proved, several new recursive inequalities and new upper bounds are established, and their asymptotic behavior for fixed m and for fixed n − m is analyzed.
Abstract: We consider the problem of determining the minimum number of faulty processors, K(n, m), and of faulty links, λ(n, m), in an n-dimensional hypercube computer so that every m-dimensional subcube is faulty. Best known lower bounds for K(n, m) and λ(n, m) are proved, several new recursive inequalities and new upper bounds are established, their asymptotic behavior for fixed m and for fixed n − m is analyzed, and their exact values are determined for small n and m. Most of the methods employed show how to construct sets of faults attaining the bounds. An extensive survey of related work is also included, showing connections to resource allocation, k-independent sets, and exhaustive testing.
TL;DR: In this paper, an account is given of experience gained in implementing computational chemistry application software, including quantum chemistry and macromolecular refinement codes, on distributed memory parallel processors, which are used in general purpose molecular mechanics, molecular dynamics and free energy perturbation calculations.
Abstract: An account is given of experience gained in implementing computational chemistry application software, including quantum chemistry and macromolecular refinement codes, on distributed memory parallel processors. In quantum chemistry we consider the coarse-grained implementation of Gaussian integral and derivative integral evaluation, the direct-SCF computation of an uncorrelated wavefunction, the 4-index transformation of two-electron integrals and the direct-CI calculation of correlated wavefunctions. In the refinement of macromolecular conformations, we describe domain decomposition techniques used in implementing general purpose molecular mechanics, molecular dynamics and free energy perturbation calculations. Attention is focused on performance figures obtained on the Intel iPSC/2 and iPSC/860 hypercubes, which are compared with those obtained on a Cray Y-MP/464 and Convex C-220 minisupercomputer. From this data we deduce the cost effectiveness of parallel processors in the field of computational chemistry.
TL;DR: A new scheme for routing data on the star and pancake networks is described, which unifies data routing on these two networks, and makes them as powerful as the hypercube when solving a host of problems.
Abstract: A new scheme for routing data on the star and pancake networks is described. It unifies data routing on these two networks, and makes them as powerful as the hypercube when solving a host of problems. Consequently, it allows a certain class of algorithms designed for the hypercube to be implemented directly on the star and pancake networks without time loss. The new scheme is used to derive parallel (star and pancake) algorithms for computing minimum spanning forests in both sparse and dense weighted graphs. The time complexities of these algorithms match those of the equivalent hypercube algorithms. These results take added importance when one recalls the many attractive properties that the star and pancake networks possess by comparison with the hypercube, in particular their smaller degree and diameter.
TL;DR: It is shown that for two common broadcasting problems, a star graph performs better than a k-ary hypercube with a comparable number of nodes only in networks consisting of an impractically large number of nodes.
Abstract: It is shown that for two common broadcasting problems, a star graph performs better than a k-ary hypercube with a comparable number of nodes only in networks consisting of an impractically large number of nodes. This result is based on a comparison of the costs of known solutions to the one-to-all broadcast and the complete broadcast problems for each network. It is suggested that the cost of solutions to these common problems is a better indication of the expected performance of an interconnection network than is a comparison of scalar measures such as the diameter and degree.
TL;DR: A bound of n+2 is obtained for the diameter of the faulty hypercube Qn-F when |F| >2/sub n-2/, which improves the previously known bound of n+6 obtained by A.-H. Esfahanian.
Abstract: In an n-dimensional hypercube Qn, with the fault set |F| >2/sub n-2/, assuming S and D are not isolated, it is shown that there exists a path of length equal to at most their Hamming distance plus 4. An algorithm with complexity O(|F| log n) is given to find such a path. A bound of n+2 is obtained for the diameter of the faulty hypercube Qn-F when |F| >2/sub n-2/. This improves the previously known bound of n+6 obtained by A.-H. Esfahanian (1989). Worst case scenarios are constructed to show that these bounds for shortest paths and diameter are tight. It is also shown that when |F| >2n-2, the diameter bound is reduced to n+1 if every node has at least 2 nonfaulty neighbors and reduced to n if every node has at least 3 nonfaulty neighbors.
TL;DR: An algorithm is presented for reducing symmetric banded matrices to tridiagonal form via Householder transformations that is numerically stable and well suited to parallel execution on distributed memory multiple instruction multiple data (MIMD) computers.
Abstract: An algorithm is presented for reducing symmetric banded matrices to tridiagonal form via Householder transformations. The algorithm is numerically stable and is well suited to parallel execution on distributed memory multiple instruction multiple data (MIMD) computers. Numerical experiments on the iPSC/860 hypercube show that the new method yields nearly full speedup if it is run on multiple processors. In addition, even on a single processor the new method usually will be several times faster than the corresponding EISPACK and LAPACK routines.
TL;DR: Performance results from an nCUBE-2 multicomputer are given that demonstrate the advantage of the method over the traditional spanning binomial tree approach.
Abstract: A method to reduce broadcast time in wormhole routed hypercube systems is described. The method takes advantage of the distance insensitivity of wormhole routing and the presence of multiple ports between processors and their routes. Performance results from an nCUBE-2 multicomputer are given that demonstrate the advantage of the method over the traditional spanning binomial tree approach.
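For context, the traditional spanning binomial tree baseline named above can be sketched as follows. This illustrates only the baseline, not the paper's improved method, and the function name `binomial_tree_broadcast` is an assumption.

```python
def binomial_tree_broadcast(d, source=0):
    """Send schedule for a spanning-binomial-tree broadcast.

    Hypothetical helper: in step `dim`, every node that already holds
    the message forwards it across hypercube dimension `dim`.
    Returns one list of (sender, receiver) pairs per step.
    """
    informed = [source]
    schedule = []
    for dim in range(d):
        # Each informed node sends to its neighbor across dimension `dim`.
        sends = [(node, node ^ (1 << dim)) for node in informed]
        schedule.append(sends)
        informed.extend(receiver for _, receiver in sends)
    return schedule

schedule = binomial_tree_broadcast(3)
```

The number of informed nodes doubles each step, so all $2^d$ nodes are reached after d steps while each node uses only one port per step; the abstract's method aims to exploit the additional ports.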
TL;DR: A method for solving visibility-based terrain path planning problems using massively parallel hypercube machines is proposed and it is shown that the method can be applied to several realistic problems with a variety of path optimizations.
Abstract: A method for solving visibility-based terrain path planning problems using massively parallel hypercube machines is proposed. A typical example is to find a path that is hidden from moving adversaries. This kind of problem can be generalized as a time-varying constrained path planning problem and is proven to be computationally hard. An approximation based on both temporal and spatial sampling is proposed. Since a 2-D grid cell representation of terrain can be embedded into a hypercube with extra links for fast communication, the method can be very efficient when implemented on hypercube machines. The time complexity is in general O(T*E*log N) using O(N) processors, where T is the number of temporal samples, E is the number of adversary agents, and N is the number of grid cells on the terrain. It is also shown that the method can be applied to several realistic problems with a variety of path optimizations. All algorithms have been implemented on the Connection Machine CM-2 and results of experiments are presented.
TL;DR: The authors analyze the problem in which each node of the binary hypercube independently generates packets according to a Poisson process with rate lambda, and observe that the system can be stable in steady-state only if the load factor rho identical to lambda satisfies rho.
Abstract: The authors analyze the problem in which each node of the binary hypercube independently generates packets according to a Poisson process with rate λ; each of the packets is to be broadcast to all other nodes. Assuming unit packet length and no other communications taking place, it is observed that the system can be stable in steady-state only if the load factor ρ ≡ λ(2^d − 1)/d satisfies ρ >
TL;DR: The lattice structure of conventional linear congruential random number generators (LCGs) over integers was studied in this paper, where the state of the generator evolves according to a linear recursion and can be mapped to a number between 0 and 1, producing what we call a LS2 sequence.
Abstract: The lattice structure of conventional linear congruential random number generators (LCGs), over integers, is well known. In this paper, we study LCGs in the field of formal Laurent series, with coefficients in the Galois field F2. The state of the generator (a Laurent series) evolves according to a linear recursion and can be mapped to a number between 0 and 1, producing what we call a LS2 sequence. In particular, the sequences produced by simple or combined Tausworthe generators are special cases of LS2 sequences. By analyzing the lattice structure of the LCG, we obtain a precise description of how all the k-dimensional vectors formed by successive values in the LS2 sequence are distributed in the unit hypercube. More specifically, for any partition of the k-dimensional hypercube into $2^{kl}$ identical subcubes, we can quickly compute a table giving the exact number of subcubes that contain exactly n points, for each integer n. We give numerical examples and discuss the practical implications of our results.
TL;DR: A stochastic approach to the problem of packing two-dimensional figures in a rectangular area efficiently, similar to those used in genetic algorithms or in simulated annealing algorithms, and achieves a minimum of 80% efficiency or utilization based on bin length.
Abstract: This study describes a stochastic approach to the problem of packing two-dimensional figures in a rectangular area efficiently. The techniques employed are similar to those used in genetic algorithms or in simulated annealing algorithms, algorithmic methods which are grouped under the general classification of stochastic optimization. A parallel processing system, an Intel i860 hypercube, is used to speed up execution. Execution time is quite lengthy due to the costly process of evaluating the lengths of layouts. Load balancing is quite efficient and near-perfect load balancing is achieved. Four different data sets were tested, the simplest consisting of 129 figures, each of seven possible shapes and of differing sizes. The goal of a minimum of 80% efficiency or utilization based on bin length was achieved in all runs performed.
TL;DR: The results refine the previous known bound and show that Algorithm SB of Bartal et al. for the on-line File Allocation problem is O(log log N)-competitive on an N-node hypercube or butterfly network.
Abstract: We study the on-line Steiner tree problem on a general metric space. We show that a class of greedy on-line algorithms are O(log(d/zs))-competitive and no deterministic algorithm is better than Ω(log(d/zs))-competitive, where s is the number of regular nodes, d the maximum metric distance between any two revealed nodes and z the optimal off-line cost. Our results refine the previous known bound [8] and show that Algorithm SB of Bartal et al. [4] for the on-line File Allocation problem is O(log log N)-competitive on an N-node hypercube or butterfly network.
TL;DR: Experimental results on a 16-node hypercube computer show that the sorting algorithm is competitive with the previous algorithms and faster for skewed data distributions.
Abstract: A parallel sorting algorithm for sorting n elements evenly distributed over $2^d = p$ nodes of a d-dimensional hypercube is presented. The average running time of the algorithm is $O((n \log n)/p + p \log^2 n)$. The algorithm maintains a perfect load balance in the nodes by determining the (kn/p)th elements (k = 1, ..., p-1) of the final sorted list in advance. These p-1 keys are used to partition the sorted sublists in each node to redistribute data to the nodes to be merged in parallel. The nodes finish the sort with an equal number of elements (n/p) regardless of the data distribution. A parallel selection algorithm for determining the balanced partition keys in $O(p \log^2 n)$ time is presented. The speed of the sorting algorithm is further enhanced by the distance-d communication capability of the iPSC/2 hypercube computer and a novel conflict-free routing algorithm. Experimental results on a 16-node hypercube computer show that the sorting algorithm is competitive with the previous algorithms and faster for skewed data distributions.
TL;DR: This work describes an algorithm for the static load balancing of scientific computations that generalizes and improves upon spectral bisection and can divide a computation into 4 or 8 pieces at once, leading to balanced partitions that have lower communication overhead and are less expensive to compute.
Abstract: We describe an algorithm for the static load balancing of scientific computations that generalizes and improves upon spectral bisection. Through a novel use of multiple eigenvectors, our new spectral algorithm can divide a computation into 4 or 8 pieces at once. This leads to balanced partitions that have lower communication overhead and are less expensive to compute than those of spectral bisection. In addition, our approach automatically works to minimize message contention on a hypercube or mesh architecture.
TL;DR: It is proved that if each connection between two neighboring nodes consists of two pairs of links, the hypercube can handle two arbitrary permutations simultaneously and is rearrangeable if one additional pair of links is provided in any one dimension of connections.
TL;DR: The authors extend the known lower bound of n! Hamiltonian cycles in a nonfaulty hypercube to at least $2^{n-3}\,n!$, and show that with at most n-2 faulty links a faulty hypercube has at least 2(n-2)! Hamiltonian cycles.
Abstract: Hamiltonian properties of hypercube, incomplete hypercube and supercube are examined. It is known that in a nonfaulty hypercube there are at least n! Hamiltonian cycles. The authors extend this result showing that the lower bound is at least $2^{n-3}\,n!$. They show that with at most n-2 faulty links a faulty hypercube has at least 2(n-2)! Hamiltonian cycles. They establish that an incomplete hypercube with odd (even) number of nodes has (n-2)! Hamiltonian paths (cycles). They show that a supercube has at least (n-1)! Hamiltonian cycles and when the number of nodes is $2^{n-1}+2^{n-2}$, then the number of Hamiltonian cycles is at least as high as 2(n-1)!.
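To make the notion concrete, one Hamiltonian cycle of a nonfaulty hypercube can be written down with the reflected binary Gray code. This is a standard construction shown for illustration; it is not the counting argument of the paper, and the function name `gray_code_cycle` is an assumption.

```python
def gray_code_cycle(n):
    """One Hamiltonian cycle of the n-dimensional hypercube (n >= 2).

    Uses the reflected binary Gray code: consecutive labels, including
    the wrap-around from last to first, differ in exactly one bit.
    """
    return [i ^ (i >> 1) for i in range(1 << n)]

cycle = gray_code_cycle(3)
# cycle == [0, 1, 3, 2, 6, 7, 5, 4]
```

Relabeling the dimensions or changing the starting node yields further cycles, which gives some intuition for why the number of Hamiltonian cycles grows at least factorially in n.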
TL;DR: The technique proposed here is to determine the induced cycle structure of a signed permutation by the number of fixed vertices or fixed edges of a signed permutation in the cyclic group generated by a signed permutation of given type.
Abstract: The hyperoctahedral group $B_n$ is treated as the automorphism group of the n-dimensional hypercube, denoted $Q_n$, which is nowadays understood to be a graph on $2^n$ vertices. It is well-known that $B_n$ can be represented by the group of signed permutations. In other words, any signed permutation induces a permutation on the vertices of $Q_n$ which preserves adjacencies. Moreover, signed permutations also induce a permutation group on the edges of $Q_n$, denoted $H_n$. We study the cycle structures of both $B_n$ and $H_n$. The technique proposed here is to determine the induced cycle structure of a signed permutation by the number of fixed vertices or fixed edges of a signed permutation in the cyclic group generated by a signed permutation of given type. Here we directly define the type of a signed permutation by a double partition based on its signed cycle decomposition. In this way, we obtain explicit formulas for the number of induced cycles on vertices as well as edges of $Q_n$ of a signed permutation in terms of its type. By further exploring the connection between cycle indices and the structure of fixed points, we obtain the cycle indices of both $B_n$ and $H_n$. Our formula for the cycle index of $B_n$ is much more natural and considerably simpler than that of Harrison and High. Meanwhile, the cycle structure of $H_n$ seems to have been untouched before, although it is well motivated by nonisomorphic edge colorings of $Q_n$ as well as by the recent interest in symmetries of computer networks.
TL;DR: This work hopes to further stimulate interest in the hypercube graphs by introducing a tantalizing unsolved problem that is based on dominating sets for this very regularly structured family of graphs.
TL;DR: A variation on the hypercube, denoted GQn for Generalized Twisted Cube of dimension n, that contains many of the good properties of the hypercube with a smaller diameter is presented and is shown to be Hamiltonian.
TL;DR: It is proved that, if M is graphic (or cographic), the distance between any two vertices of G corresponding to disjoint bases is equal to the rank of M (generalizing a result of [10]).
Abstract: Let M be a block matroid (i.e. a matroid whose ground set E is the disjoint union of two bases). We associate with M two objects:
We prove that, if M is graphic (or cographic), the distance between any two vertices of G corresponding to disjoint bases is equal to the rank of M (generalizing a result of [10]). Concerning the polytope we prove that K is a hypercube if and only if dim(K)=rank(M). A constructive characterization of the class of matroids realizing this equality is given.