TL;DR: This paper discusses the design of a primary memory system for an array processor which allows parallel, conflict-free access to various slices of data and subsequent alignment of these data for processing. It also presents the omega network, an alignment network based on Stone's shuffle-exchange operation.
Abstract: This paper discusses the design of a primary memory system for an array processor which allows parallel, conflict-free access to various slices of data (e.g., rows, columns, diagonals, etc.), and subsequent alignment of these data for processing. Memory access requirements for an array processor are discussed in general terms and a set of common requirements is defined. The ability to meet these requirements is shown to depend on the number of independent memory units and on the mapping of the data in these memories. Next, the need to align these data for processing is demonstrated and various alignment requirements are defined. Hardware which can perform this alignment function is discussed, e.g., permutation, indexing, switching or sorting networks, and a network (the omega network) based on Stone's shuffle-exchange operation [1] is presented. Construction of this network is described and many of its useful properties are proven. Finally, as an example of these ideas, an array processor is shown which allows conflict-free access and alignment of rows, columns, diagonals, backward diagonals, and square blocks in row or column major order, as well as certain other special operations.
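As an illustration of the shuffle-exchange idea behind the omega network, the following is a minimal sketch, not the paper's own construction. It assumes the usual textbook formulation: 2^n inputs, n stages, each stage a perfect shuffle followed by a 2x2 exchange switch that is set by one bit of the destination address (most significant bit first). The function names `shuffle`, `route`, and `passes` are hypothetical and chosen only for this example.

```python
def shuffle(p, n):
    """Perfect shuffle: rotate the n-bit address p left by one bit."""
    mask = (1 << n) - 1
    return ((p << 1) | (p >> (n - 1))) & mask


def route(src, dst, n):
    """Return the line occupied after each of the n stages (destination-tag routing)."""
    path = []
    p = src
    for k in range(n):
        p = shuffle(p, n)
        # The exchange switch selects its upper or lower output
        # according to bit (n-1-k) of the destination address.
        bit = (dst >> (n - 1 - k)) & 1
        p = (p & ~1) | bit
        path.append(p)
    return path


def passes(perm, n):
    """True if perm (perm[src] = dst) is routed in one pass without conflicts."""
    paths = [route(s, d, n) for s, d in enumerate(perm)]
    # A conflict occurs when two paths need the same line after some stage.
    return all(len({path[k] for path in paths}) == len(perm) for k in range(n))


if __name__ == "__main__":
    n = 3
    size = 1 << n
    shift_by_one = [(s + 1) % size for s in range(size)]
    bit_reversal = [int(format(s, "03b")[::-1], 2) for s in range(size)]
    print(passes(shift_by_one, n))
    print(passes(bit_reversal, n))
```

Under these assumptions, a uniform shift such as `shift_by_one` passes in a single pass for eight inputs, while bit reversal does not, because two sources already compete for the same line after the first stage.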
TL;DR: This paper surveys network topologies and switching strategies for the interconnection networks that carry communication among processors and memory modules in concurrent processing.
Abstract: Concurrent processing depends on interconnection networks for communication among processors and memory modules. Various network topologies and switching strategies are covered here.
TL;DR: A set of algebraic tools is developed and is used to prove that Lawrie's inverse Omega network, Pease's indirect binary n-cube array, and a network related to the 3-stage rearrangeable switching network studied by Clos and Beneš have identical switching capabilities.
Abstract: In this paper a number of properties of Shuffle/Exchange networks are analyzed. A set of algebraic tools is developed and is used to prove that Lawrie's inverse Omega network, Pease's indirect binary n-cube array, and a network related to the 3-stage rearrangeable switching network studied by Clos and Beneš have identical switching capabilities. The approach used leads to a number of insights on the structure of the fast Fourier transform (FFT) algorithm. The inherent permuting power, or "universality," of the networks when used iteratively is then probed, leading to some nonintuitive results which have implications for the optimal control of Shuffle/Exchange-type networks for realizing permutations and broadcast connections.
TL;DR: Two graph theoretic models are introduced that provide a uniform procedure for analyzing 2^n-input/2^n-output Multistage Interconnection Networks (MIN's), implemented with 2-input/2-output Switching Elements (SE's) and satisfying a characteristic called the "buddy property."
Abstract: This paper introduces two graph theoretic models that provide a uniform procedure for analyzing 2^n-input/2^n-output Multistage Interconnection Networks (MIN's), implemented with 2-input/2-output Switching Elements (SE's) and satisfying a characteristic called the "buddy property." These models show that all such n-stage MIN's are topologically equivalent and hence prove that one MIN can be implemented from integrated circuits designed for another MIN. The proposed techniques also allow n-stage MIN's and other link-controlled networks, such as the augmented data manipulator and the SW banyan network, to be modeled identically, so that their permutation capabilities can be compared. For the case of a conflict in the MIN, an upper bound on the required number of passes is obtained.
TL;DR: An augmented data manipulator network using a modified control structure to perform more single pass interconnections than the other networks is presented.
Abstract: Four SIMD multistage networks (Feng's data manipulator, STARAN flip network, omega network, and indirect binary n-cube) are analyzed. Three parameters (topology, interchange box, and control structure) are defined. It is shown that the latter three networks use equivalent topologies and differences in their capabilities result from the other parameters. An augmented data manipulator network using a modified control structure to perform more single pass interconnections than the other networks is presented. Some problems may be solved more efficiently if the 2^n processing elements of an SIMD machine can be partitioned into submachines of size 2^r. Single and multiple control partitioning are defined. The capabilities of these multistage networks to perform in these partitioned environments are discussed.