TL;DR: Regular mesh-connected arrays are shown to be isomorphic to a class of so-called regular iterative algorithms, which include arrays for Fourier Transform, Matrix Multiplication, and Sorting.
Abstract: Regular mesh-connected arrays are shown to be isomorphic to a class of so-called regular iterative algorithms. For a wide variety of problems it is shown how to obtain appropriate iterative algorithms and then how to translate these algorithms into arrays in a systematic fashion. Several "systolic" arrays presented in the literature are shown to be specific cases of the variety of architectures that can be derived by the techniques presented here. These include arrays for Fourier Transform, Matrix Multiplication, and Sorting.
TL;DR: Primary properties of rank and approximate rank of bilinear mappings, multiplication of large matrices, and algorithm varieties are studied.
Abstract: Complexity and rank of bilinear mappings.- Elementary properties of rank and approximate rank of bilinear mappings.- Multiplication of large matrices.- Complexity and rank of finite dimensional associative algebras.- Algorithm varieties.
TL;DR: In this article, the authors considered the multiplication of certain classes of operators of fractional calculus defined in terms of the Gaussian hypergeometric function, and presented solutions of various boundary value problems involving the celebrated Euler-Darboux equation.
TL;DR: Radix-4 algorithms for square root and division are developed and are shown to be suitable for implementing as a unified hardware unit which evaluates square root, division, and multiplication.
Abstract: In this paper radix-4 algorithms for square root and division are developed. The division algorithm evaluates the more useful function xz/y. These algorithms are shown to be suitable for implementing as a unified hardware unit which evaluates square root, division, and multiplication. Cost reductions in the hardware are obtained by use of gate arrays. A design based on the Motorola MCA2500 series of Macrocell gate array (MCA) is presented. At a cost of 9 MCA's and 16 commercial ECL 100 K parts a 64-bit square root can be evaluated in 750 us using worst case delays. Division takes 710 ns and multiplication 325 ns. Redundancy in the digit set together with carry-save adders are used to achieve this high performance.
TL;DR: This paper describes how a small set of primitive instructions combined with careful frequency analysis and clever programming allows the Hewlett-Packard Precision Architecture integer multiplication and division implementation to provide adequate performance at little or no hardware cost.
Abstract: In recent years, many architectural design efforts have focused on maximizing performance for frequently executed, simple instructions. Although these efforts have resulted in machines with better average price/performance ratios, certain complex instructions and, thus, certain classes of programs which heavily depend on these instructions may suffer by comparison. Integer multiplication and division are one such set of complex instructions. This paper describes how a small set of primitive instructions combined with careful frequency analysis and clever programming allows the Hewlett-Packard Precision Architecture integer multiplication and division implementation to provide adequate performance at little or no hardware cost.
TL;DR: In this article, a multiplier array circuit is presented that includes decoders for decoding a multiplier on the basis of Booth's algorithm, cell array blocks that receive the selection signals from the decoders together with a multiplicand and perform the multiplication, and an adder that obtains the final products from the outputs of the cell array blocks.
Abstract: A multiplier array circuit including decoders for decoding a multiplier on the basis of Booth's algorithm; cell array blocks for receiving the selection signals from the decoders and a multiplicand and performing the multiplication of the multiplicand and the multiplier on the basis of Booth's algorithm; and an adder for obtaining the final products on the basis of the outputs from the cell array blocks. In order to enable the functionally divisional operation, the cell array blocks include complex cells which operate as the basic cells in the non-division mode and which operate as the code cells in the division mode. Further, the cell array blocks include selectors to supply an inactive value to the cells to perform the multiplication of the upper bits of the multiplicand and the lower bits of the multiplier and to the cells to perform the multiplication of the lower bits of the multiplicand and the upper bits of the multiplier in such a manner that the cell array blocks can supply the multiplicand and its inverted data to the cells constituting the cell array blocks in the non-division mode and can simultaneously execute two series of multiplications in the division mode.
TL;DR: It is shown that some popular pseudo-random number generators can be regarded as special cases of this method, and the period lengths of this generator type are examined and characterized.
Abstract: In this paper a method for the direct generation of pseudo-random vectors is considered. Thereby the n-th pseudo-random vector is recursively generated from the (n−1)-th pseudo-random vector by multiplication with a matrix. The period lengths of this generator type are examined and characterized. Furthermore it is shown that some popular pseudo-random number generators can be regarded as special cases of this method.
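The recursion in this abstract can be sketched in a few lines. This is a minimal illustration, not the paper's construction: the modular reduction, the function names `next_vector` and `period`, and the example matrices are all assumptions. A 1×1 matrix reduces to an ordinary multiplicative congruential generator, which is one way a scalar generator can be seen as a special case.

```python
def next_vector(matrix, vector, modulus):
    """Return (matrix * vector) mod modulus, using plain Python lists."""
    return [sum(a * x for a, x in zip(row, vector)) % modulus for row in matrix]


def period(matrix, seed, modulus):
    """Count the steps until the seed vector recurs.

    Assumes the matrix is invertible modulo `modulus`, so the map is a
    bijection on a finite set and the seed must eventually come back.
    """
    state = next_vector(matrix, seed, modulus)
    steps = 1
    while state != seed:
        state = next_vector(matrix, state, modulus)
        steps += 1
    return steps


# 1x1 case: x_n = 3 * x_(n-1) mod 7, a multiplicative congruential generator.
scalar_period = period([[3]], [1], 7)

# 2x2 case over GF(2): a Fibonacci-style recurrence on bit pairs.
fibonacci_period = period([[0, 1], [1, 1]], [0, 1], 2)
```

With these illustrative parameters the scalar generator visits all six nonzero residues modulo 7 and the 2×2 generator cycles through all three nonzero bit pairs.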
TL;DR: The current state of knowledge concerning the computation of Boolean functions by networks, with particular emphasis on the addition and multiplication of binary numbers, is surveyed.
Abstract: We survey the current state of knowledge concerning the computation of Boolean functions by networks, with particular emphasis on the addition and multiplication of binary numbers.
TL;DR: In this paper, an arithmetic unit comprising a partial product circuit for calculating a plurality of partial products for two numbers and a Wallace tree responsive to the partial products is presented, where an addend is supplied to the Wallace tree as an additional partial product.
Abstract: In an arithmetic unit comprising a partial product circuit for calculating a plurality of partial products for two numbers and a Wallace tree responsive to the partial products for producing a plurality of tree outputs which gives a total product of the two numbers when summed up, an addend is supplied to the Wallace tree as an additional partial product. The arithmetic unit produces a resultant sum of the total product plus the addend. The addend may be supplied to the Wallace tree from one or more registers therefor. Alternatively, a result register is used for the total product with the total product supplied to the Wallace tree as the addend. As a further alternative, an additional register is used for a third number which is used with bit shifts as the addend. In this last event, the arithmetic unit preferably produces a sum selected from the resultant sum.
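The idea of feeding the addend into the tree as one more partial product can be sketched in software. This is a simplified model under stated assumptions: unsigned operands, rows reduced three at a time with carry-save adders, and hypothetical names `carry_save_add` and `multiply_add`. It does not model the registers or bit-shift options the abstract mentions.

```python
def carry_save_add(a, b, c):
    """3:2 compressor: reduce three rows to a sum row and a carry row."""
    sum_bits = a ^ b ^ c
    carry_bits = ((a & b) | (a & c) | (b & c)) << 1
    return sum_bits, carry_bits


def multiply_add(x, y, addend, bits=8):
    """Return x * y + addend for unsigned inputs, with y limited to `bits` bits."""
    # One shifted copy of x for every set bit of y: the partial products.
    rows = [x << i for i in range(bits) if (y >> i) & 1]
    # The addend joins the tree as an additional partial product.
    rows.append(addend)
    while len(rows) > 2:
        reduced = []
        for i in range(0, len(rows) - 2, 3):
            reduced.extend(carry_save_add(rows[i], rows[i + 1], rows[i + 2]))
        # Rows that did not fill a group of three pass through unchanged.
        reduced.extend(rows[len(rows) - len(rows) % 3:])
        rows = reduced
    # Final carry-propagate addition of the remaining tree outputs.
    return sum(rows)
```

The multiply-add costs one extra row in the reduction tree rather than a separate addition after the product is formed.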
TL;DR: In this article, a Galois field arithmetic logic unit of a code error check/correct apparatus to be employed when recording/reproducing data on an optical disk is presented. The unit markedly reduces the amount of computation, particularly when the code system has a great code length and the degree of the error location polynomial associated with the long distance code is as high as d=17.
Abstract: The present invention relates to a Galois field arithmetic logic unit of a code error check/correct apparatus to be employed when recording/reproducing data on an optical disk. The arithmetic logic unit uses a combination including a parallel multiplication circuitry of a primitive element α of a Galois field, an EX-OR addition circuitry for the multiplication results, a 0 element decision circuitry for the results of the addition, the registers to which the multiplication results are fed back so as to accomplish a parallel computation of a polynomial, thereby enabling a root and an error value of an error location equation to be obtained at a high speed. The arithmetic logic unit develops a remarkable reduction of the amount of computation particularly when the code system has a great code length and the degree of the error location polynomial associated with the long distance code is as high as d=17.
TL;DR: In this article, a fast Fourier transform circuit is presented, including an illustrative radix-eight DFT kernel that operates on an n-bit-serial data format for efficient serial-like, pipelined operation within the DFT.
Abstract: A fast Fourier transform circuit, including an illustrative radix-eight discrete Fourier transform (DFT) kernel that operates on an n-bit-serial data format, for an efficient serial-like, pipelined operation within the DFT. The circuit performs a four-point DFT on half of the input data words at a time, stores intermediate results from the four-point DFT in a commutation stage, then combines the intermediate results in two two-point DFTs. Internal multiplication in the eight-point DFT is effected in delay registers that also serve to store the intermediate results, thereby providing an economy of timing and circuit routing. Interleaving and deinterleaving operations convert the data format between three-bit-serial and conventional bit-parallel used outside the eight-point DFT kernel, which may therefore be easily cascaded for more complex FFT operations. The DFT kernel also includes means for selectively bypassing butterfly computation modules to perform shorter-length DFTs.
TL;DR: In this article, a modified Booth algorithm is implemented in the arithmetic logic of the ALU data path to cut the number of cycles to do a multiply in half thereby improving execution time of the multiplication operation.
Abstract: A modified Booth algorithm is implemented in the arithmetic logic of the ALU data path to cut the number of cycles to do a multiply in half thereby improving execution time of the multiplication operation. A Booth Encoder examines the two least significant bits of the multiplier stored in the Q2 register and the bit which was previously shifted out on the last partial product shift cycle. Based upon the status of these three bits, the Booth Encoder causes the ALU to add or subtract one times the multiplicand to the contents of the partial product register and shift twice, add or subtract two times the multiplicand to the contents of the partial product register and shift twice, or do nothing but shift twice. A pre ALU B shifter provides a single left shift of the multiplicand to provide the multiplication by two when same is necessary.
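The three-bit recoding described above can be modelled in software. This is a sketch, not the patented data path: the function name `booth_radix4_multiply` and the `bits` parameter are illustrative, and the addend is shifted left at each step instead of shifting the partial product right, which is arithmetically equivalent.

```python
def booth_radix4_multiply(multiplicand, multiplier, bits=16):
    """Multiply two signed integers; the multiplier must fit in `bits` bits (even)."""
    if bits % 2:
        raise ValueError("bits must be even")
    q = multiplier & ((1 << bits) - 1)  # two's complement image of the multiplier
    prev = 0                            # bit shifted out on the previous cycle
    product = 0
    for step in range(bits // 2):       # half as many cycles as multiplier bits
        low_two = q & 0b11
        # Booth digit in {-2, -1, 0, 1, 2}: -2*q1 + q0 + previously shifted bit.
        digit = -2 * (low_two >> 1) + (low_two & 1) + prev
        # Add or subtract 0, 1, or 2 times the multiplicand at this position.
        product += (digit * multiplicand) << (2 * step)
        prev = low_two >> 1
        q >>= 2                         # "shift twice"
    return product
```

Each loop iteration consumes two multiplier bits, which is where the halving of the cycle count comes from.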
TL;DR: In this paper, the concept of using a self-dual normal basis to design the Massey-Omura finite field multiplier is presented, and a method to construct the product function for designing the Massey-Omura multiplier is developed.
Abstract: Finite field multiplication is central in the implementation of some error-correcting coders. Massey and Omura have presented a revolutionary design for multiplication in a finite field. In their design, a normal basis is utilized to represent the elements of the field. The concept of using a self-dual normal basis to design the Massey-Omura finite field multiplier is presented. Presented first is an algorithm to locate a self-dual normal basis for GF(2^m) for odd m. Then a method to construct the product function for designing the Massey-Omura multiplier is developed. It is shown that the construction of the product function based on a self-dual basis is simpler than that based on an arbitrary normal basis.
TL;DR: This paper shows the design of a parallel optical adder based on MSD number representation using the method of symbolic substitution originally proposed by Karl-Heinz Brenner and Alan Huang and discusses the use of this adder along with barrel shifters to efficiently implement multiplication.
Abstract: The design of a processor can vary considerably with the type of technology (optical or electronic, analog or digital), the number system, and the coding scheme used for the number representation. Binary number representation is accepted as the best suited for electronic computers. However, the delay due to carry propagation in binary arithmetic makes the binary number representation a very weak candidate for an optical processor that is inherently parallel. The modified signed digit (MSD) number representation satisfies the requirements of totally parallel addition using modular or identical units and allows the addition of any two numbers in three successive steps. In this paper, we show the design of a parallel optical adder based on MSD number representation using the method of symbolic substitution originally proposed by Karl-Heinz Brenner [Appl. Opt. 25(18), 3061-3064 (1986)] and Alan Huang [Proc. IEEE Int. Optical Computing Conf., pp. 13-17 (1983)]. Polarized light is used to code the inputs and outputs. We also discuss the use of this adder along with barrel shifters to efficiently implement multiplication.
TL;DR: A novel processor for the implementation of multiplierless FFT's in VLSI with the capability of achieving a 40 MHz throughput rate for a 1024-point FFT using 20 processing IC's is presented.
Abstract: This paper presents a novel processor for the implementation of multiplierless FFT's in VLSI. The arithmetic scheme is specially tailored for the simple binary coefficients used for these FFT's, which make multiplication trivial. (The class of coefficients dealt with are those that have a maximum of 2 nonzero digits; i.e., sum of 2 integer powers of 2 with each power in the range 0-4.) A single chip processing element for a 4-point DFT (for a radix 4 FFT) with an execution time of 400 ns using a 10 MHz clock has been realized. The chip has an estimated maximum gate count of 11 000 and pin count of 85. It has the capability of achieving a 40 MHz throughput rate for a 1024-point FFT using 20 processing IC's. The use of the 4-point chip to implement higher radix algorithms and various other issues are discussed.
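Why such coefficients make multiplication trivial can be shown with a short sketch: a coefficient that is a sum of at most two powers of 2 needs only two shifts and one addition. The function name `multiply_by_two_term_coefficient` and its argument convention are assumptions made for illustration; the chip's actual arithmetic scheme is not described in enough detail here to reproduce.

```python
def multiply_by_two_term_coefficient(x, p, q=None):
    """Return x * (2**p + 2**q), or x * 2**p when q is None.

    Uses only shifts and at most one addition. The exponents are limited
    to the range 0..4 quoted in the abstract.
    """
    for exponent in (p, q):
        if exponent is not None and not 0 <= exponent <= 4:
            raise ValueError("exponents must lie in the range 0-4")
    result = x << p
    if q is not None:
        result += x << q
    return result


# Coefficient 18 = 2**4 + 2**1 applied to the sample value 13.
scaled = multiply_by_two_term_coefficient(13, 4, 1)
```

Here `scaled` equals 13 × 18 = 234, obtained without a general multiplier.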
TL;DR: It is shown that, using the proposed architecture, these operations can be implemented with state-of-the-art technologies in holography and integrated optics.
Abstract: A cascadable residue arithmetic processor based on optical Fredkin gate arrays and page-oriented holographic memories is introduced. The implementations of residue functions and operations by this processor are described. Analytic expressions are derived for the number of holograms and waveguide channels required for the implementation of residue addition and multiplication. The practical cases of 16-bit addition and multiplication are analyzed as specific examples. It is shown that, using the proposed architecture, these operations can be implemented with state-of-the-art technologies in holography and integrated optics.
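Residue arithmetic suits table-lookup hardware because addition and multiplication act on each modulus independently, with no carries between channels. The sketch below illustrates that property only. The moduli set, the function names, and the Chinese-remainder decoding step are assumptions, and nothing here models the Fredkin gate arrays or holographic memories. It needs Python 3.8 or later for `math.prod` and the modular-inverse form of `pow`.

```python
from math import prod

# Pairwise coprime moduli, chosen for illustration; dynamic range is 720720.
MODULI = (5, 7, 9, 11, 13, 16)


def to_residues(value):
    """Encode an integer as one residue per modulus."""
    return tuple(value % m for m in MODULI)


def residue_add(x, y):
    """Add channel by channel; no carry crosses between channels."""
    return tuple((a + b) % m for a, b, m in zip(x, y, MODULI))


def residue_multiply(x, y):
    """Multiply channel by channel."""
    return tuple((a * b) % m for a, b, m in zip(x, y, MODULI))


def from_residues(residues):
    """Decode with the Chinese remainder theorem."""
    total = prod(MODULI)
    value = 0
    for r, m in zip(residues, MODULI):
        partial = total // m
        value += r * partial * pow(partial, -1, m)
    return value % total
```

Results are exact as long as the true sum or product stays below the product of the moduli.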
TL;DR: In this paper, a structure theorem about the algebra $\mathcal{A}(\Omega) = \{M_\varphi : \varphi \in H^\infty(\Omega)\}$ is proved, which parallels known results from the case in which $\Omega$ is the open unit disc in $\mathbb{C}$.
Abstract: In this paper we will prove a structure theorem about the algebra $\mathcal{A}(\Omega) = \{M_\varphi : \varphi \in H^\infty(\Omega)\}$ which parallels known results from the case in which $\Omega$ is the open unit disc in $\mathbb{C}$ (cf. [1]). We also give applications of the structure theorem to the study of the invariant subspaces of $\mathcal{A}(\Omega)$. We hope to give a more complete study of the lattice of invariant subspaces of $\mathcal{A}(\Omega)$ in a future paper. The only assumption that we make about the domain $\Omega$ is that the spaces under consideration be nontrivial, i.e., $A^2(\Omega) \neq \{0\}$, and $H^\infty(\Omega)$ contains nonconstant functions. It is known that the space $A^2(\Omega)$ can have finite nonzero dimension (cf. [7]), hence it is worthwhile to note that we will not encounter such situations.
TL;DR: In this paper, a binary multiplier architecture is presented that multiplies signed and unsigned operands as unsigned numbers and, when an operand is signed, adds a correction factor equal to the two's complement of the other operand.
Abstract: A binary multiplier architecture which multiplies signed and unsigned operands as unsigned numbers and adds a correction factor which is the two's complement of the other operand if the operand is signed. The multiplier architecture performs two's complement multiplication when the multiplier has 1's in more than half of its bits and performs unsigned binary multiplication by adding only shifted multiplicand vectors as a function of the multiplier all other times. Two's complement multiplication is performed by adding a multiplicand and a multiplier to two's complemented shifted multiplicand vectors as a function of the two's complement of the multiplier. To reduce the number of additions necessary, portions of the operands are merged with the shifted complemented vectors prior to addition to the shifted vectors.
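The correction-factor idea rests on a standard identity: if the n-bit patterns of two signed operands are multiplied as unsigned numbers, the signed product is recovered modulo 2^(2n) by adding the two's complement of the other operand, shifted into the upper half, for each operand that is negative. The sketch below shows only that identity. The function name and `bits` parameter are illustrative, and the mode selection based on the number of 1's in the multiplier, as well as the merging of operand portions, are not modelled.

```python
def signed_multiply_via_unsigned(a, b, bits=8):
    """Multiply signed `bits`-wide integers using an unsigned product plus corrections."""
    mask = (1 << bits) - 1
    wide = 2 * bits
    wide_mask = (1 << wide) - 1
    ua, ub = a & mask, b & mask          # operands treated as unsigned numbers
    product = ua * ub
    if a < 0:
        # Correction: two's complement of the other operand, in the upper half.
        product += ((-ub) & mask) << bits
    if b < 0:
        product += ((-ua) & mask) << bits
    product &= wide_mask                 # keep the 2n-bit result
    # Reinterpret the 2n-bit pattern as a signed value.
    if product >> (wide - 1):
        product -= 1 << wide
    return product
```

The corrections are plain additions, so they can be folded into the same adder array that sums the shifted multiplicand vectors.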
TL;DR: A library of parallel vector and matrix operations for hypercube multiprocessors that supports both full and sparse matrices is described and it is shown that these algorithms perform at high computational efficiency on both the Caltech and Intel hypercubes.
Abstract: We describe a library of parallel vector and matrix operations for hypercube multiprocessors that supports both full and sparse matrices. The library includes operations such as vector arithmetic, inner products, matrix transpose, matrix-vector and matrix-matrix multiplication and rank one updates. The library should be generally applicable to a wide range of architectures. Performance of the library routines depends on the ability to map various topological graphs onto the processor network. In the case of hypercubes we have used such mappings for binary trees, hierarchies of rings and rectangular grids. We describe algorithms for the solution of elliptic and hyperbolic equations on parallel computers, and present results of several implementations. The library is a fundamental tool in the development of the PDE solution algorithms and all machine dependencies of these algorithms are hidden in the linear algebra package. We show that these algorithms perform at high computational efficiency on both the Caltech and Intel hypercubes. Solution methods involved include preconditioned conjugate gradient, multigrid methods, and for hyperbolic problems, both explicit finite differences and the random choice method. These algorithms implement substantial parts of many fluid dynamics calculations.
TL;DR: A text on elementary mathematics instruction covering problem solving, numeration, operations on whole and rational numbers, geometry, measurement, data, computers, and assessment, with appendices that include a summary of the NCTM's curriculum and evaluation standards for school mathematics.
Abstract: Problem solving a way of life: planning learning experiences from the child's perspective getting ready for a good beginning learning pre-number concepts 100s, 10s, 1s - the best yet! Our base-ten numeration system addition and subtraction of whole numbers constructing meaning addition and subtraction algorithms of whole numbers - building, understanding, applying and estimating multiplication and division of whole numbers constructing meaning multiplication and division algorithms of whole numbers - building, understanding, estimating and applying some theory about numbers factors, multiples, primes and composites not all numbers are whole numbers representing, adding and subtracting rational numbers security is knowing why multiplying and dividing rational numbers believe you can, think, then solve a model for establishing a problem solving environment superstitious? not us the shape of things geometric figures and relationships seeing is believing constructing geometric ideas before you teach measurement attributes of measurement sizing it up the measurement of attributes making numbers count organizing, representing and interpreting data computers and mathematics instruction status and direction encouraging students growth assessment and diagnosis the end a your beginning toward effective instruction. Appendices: Scope and sequence chart for levels of instruction selected answers for think tank time exercises material sheets summary of the NCTM's curriculum and evaluation standards for school mathematics.
TL;DR: In this paper, the authors present a table indicating whether or not each of five positivity classes of matrices (positive definite Hermitian matrices, M-matrices, inverse M-matrices and totally positive matrices) is closed under each of seven algebraic operations (conventional multiplication, addition, powers, extraction of roots, Hadamard multiplication, the Hadamard product of one element and the inverse of another, and LU factorization).
TL;DR: The design of a fast and area-efficient multiply-divide unit, used in building a VLSI floating-point processor (FPU) conforming to the IEEE standard 754, is presented.
Abstract: This paper presents the design of a fast and area-efficient multiply-divide unit used in building a VLSI floating-point processor (FPU), conforming to the IEEE standard 754. Details of the algorithms, implementation techniques and design tradeoffs are presented. The multiplier and divider are implemented in 2 micron CMOS technology with two layers of metal, and occupy 23 square mm (23% of the entire FPU). We expect to perform extended-precision multiplication and division in 1.1 and 2.8 microseconds, respectively.
TL;DR: In this paper a method for approximate matrix-vector multiplication is described that requires much less arithmetical work and strongly reduces the storage requirements.
Abstract: In contrast to usual finite element methods the boundary element method leads to systems with full matrices. This fact seems to require much computational work for the definition of the matrix entries, for the solution of the system, and, in particular, for the matrix-vector multiplication, which always occurs as an elementary operation. In this paper a method for the approximate matrix-vector multiplication is described which requires much less arithmetical work. In addition, the storage requirements are strongly reduced.
TL;DR: In this article, a syndrome calculating apparatus is described in which each unit code of a received code is first multiplied by a power of the root α of the primitive polynomial in the Galois field, and the result is then repeatedly added to the next unit code and multiplied by the power of α to obtain the syndromes, using a plurality of flip-flops disposed in parallel to one another.
Abstract: Syndrome calculating apparatus in which an operation of the multiplication of each of the unit codes of a received code by a power of the root α of the primitive polynomial in the Galois field is carried out first, a series of operations of the addition of the result by the power of the root α to the next unit code and the multiplication of the result thus added by the power of the root α are repeatedly carried out secondly so as to finally obtain syndromes, by a plurality of flip-flops disposed in parallel to one another, a pair of XOR gates disposed in parallel to each other for each of the flip-flops and a pair of AND gates for each of the flip-flops for selectively supplying the output data of the XOR gates to the flip-flops.
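The repeated multiply-then-add sequence is Horner's rule for evaluating the received polynomial at powers of α. The sketch below illustrates this in software under several assumptions not taken from the source: the field is GF(2^4) with primitive polynomial x^4 + x + 1, symbols are ordered highest degree first, the syndromes use roots α^1 through α^count, and the function names are hypothetical. The flip-flop and gate arrangement of the apparatus is not modelled.

```python
FIELD_BITS = 4
PRIMITIVE_POLY = 0b10011  # x^4 + x + 1, assumed for illustration
ALPHA = 0b0010            # the root alpha, represented as the polynomial x


def gf_mul(a, b):
    """Multiply two field elements: shift-and-XOR with modular reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << FIELD_BITS):
            a ^= PRIMITIVE_POLY
    return result


def gf_pow(base, exponent):
    """Raise a field element to a non-negative integer power."""
    result = 1
    for _ in range(exponent):
        result = gf_mul(result, base)
    return result


def syndromes(received, count):
    """Evaluate the received word at alpha^1 .. alpha^count by Horner's rule."""
    values = []
    for j in range(1, count + 1):
        root = gf_pow(ALPHA, j)
        accumulator = 0
        for symbol in received:
            # Multiply the running result by the root, then add (XOR) the next symbol.
            accumulator = gf_mul(accumulator, root) ^ symbol
        values.append(accumulator)
    return values
```

For the word `[1, 6, 8]`, which equals (x + α)(x + α^2) in this field, both syndromes are zero; flipping the lowest bit of the last symbol gives `[1, 1]`.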
TL;DR: An analysis and evaluation of the performance of a multicomputer system (SM3) in supporting two basic matrix operations, namely multiplication and inversion, and the system is compared quantitatively and qualitatively to a hypercube architecture.
Abstract: This paper presents an analysis and evaluation of the performance of a multicomputer system (SM3) in supporting two basic matrix operations, namely multiplication and inversion. The system supports the efficient execution of the above mentioned operations by 1) achieving a high-bandwidth data transfer among computers by switching main memory modules, 2) supporting network partitioning, 3) employing a hardware communication and synchronization scheme, 4) using a distributed control technique, and 5) providing means to dynamically transfer control. Timing equations are derived and evaluated in an attempt to analyze the performance. Different cases which arise due to the relative sizes of memory modules and matrices during matrix multiplication are analyzed. The cases of partial and maximal pivoting during inversion are also analyzed. The SM3 system is compared quantitatively and qualitatively to a hypercube architecture.
TL;DR: An optical system for real-time multiplication of multiple matrices is proposed; a 2-D light source array enables the system to perform a triple matrix multiplication without the unidirectional diffuser that was essential in the authors' former system.
Abstract: An optical system for real-time multiplication of the multiple matrix is proposed. A 2-D light source array enables this system to perform a triple matrix multiplication without the unidirectional diffuser that was essential in the former system we proposed. The performance of this system is discussed based on a simplified model to show the guidelines for designing the system. The preliminary experiments of multiplying three 3 × 3 matrices are successfully carried out with a mean error of 2.0%. A method to multiply the multiple matrix with binary elements using our system is also shown.
TL;DR: In this paper, optical data processing systems for processing four $N \times N$ matrices A, B, C, D to calculate the expression $CA^{-1}B + D$ are presented.
Abstract: Optical data processing systems for processing four $N \times N$ matrices A, B, C, D to calculate the expression $CA^{-1}B + D$. Multi-cell spatial light modulators (36, 38, 40, 42, 44 and 46) are employed in conjunction with control circuits to perform matrix inversion, multiplication and addition.
TL;DR: In this paper, the MLE constitution method is employed together with a variable amplitude circuit to improve the synchronization pull-in characteristic, the response characteristic and the convergence characteristic.
Abstract: PURPOSE:To improve the synchronization pull-in characteristic, the response characteristic and the convergence characteristic by employing the MLE constitution method in which the multiplication is implemented by taking the level of a signal and an error signal into account and the input signal of an adder circuit for integration is made zero, detecting inter-code interference quantity so as to vary the level of a multiplication output inputted to the adder circuit for integration. CONSTITUTION:The correlation between an error signal and an identification signal including not only the polarity (direction) but also the level is detected. Then a variable amplitude circuit 37 comprising an error quantity detection circuit 39 and a bit changeover device 38 has a function varying the result of multiplication in response to the level of the error signal, and increases the amplitude of the result of multiplication to output a large control signal thereby increasing the control variable of a transversal filter when the error quantity is large and decreases the control variable conversely when the error quantity is small. Moreover, the MLE method making an input signal to adder circuits for integration 34-36 zero is employed together. Thus, the synchronization pull-in characteristic, the response characteristic and the convergence characteristic are improved.
TL;DR: A serial arithmetic processor for ADPCM is described in this article, which includes a first common circuit which is arranged to take advantage of the realization that a large portion of the LOG, FLOAT and ANTILOG functions can be implemented in common hardware.
Abstract: A serial arithmetic processor arranged to perform the complex arithmetic functions of the Adaptive Differential Pulse Coded Modulation (ADPCM) algorithm. The serial arithmetic processor includes a first common circuit which is arranged to take advantage of the realization that a large portion of the LOG, FLOAT, and ANTILOG functions can be implemented in common hardware. The serial arithmetic processor further includes a second common circuit which is arranged to take advantage of the realization that large portions of the MULTIPLICATION and FLOATING POINT MULTIPLICATION functions can be implemented in other common hardware. A controller is provided for controlling logic and other circuitry in the first and second common circuits depending upon the desired function to be performed. In addition, a connection of the output of the first common circuit to the input of the second common circuit is preferably provided so that the result of a FLOAT operation can be directly used as the multiplier in a FLOATING POINT MULTIPLICATION operation.