TL;DR: The massively parallel processor (MPP) system is designed to process satellite imagery at high rates; on 8-bit integer data, addition can occur at 6553 million operations per second (MOPS) and multiplication at 1861 MOPS.
Abstract: The massively parallel processor (MPP) system is designed to process satellite imagery at high rates. A large number (16,384) of processing elements (PE's) are configured in a square array. For optimum performance on operands of arbitrary length, processing is performed in a bit-serial manner. On 8-bit integer data, addition can occur at 6553 million operations per second (MOPS) and multiplication at 1861 MOPS. On 32-bit floating-point data, addition can occur at 430 MOPS and multiplication at 216 MOPS.
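A minimal sketch of bit-serial addition as described above, assuming each PE keeps a one-bit carry and all PEs work on the same bit position in the same step. The function name and the list standing in for the PE array are illustrative only and do not reflect the MPP's actual instruction set.

```python
def bit_serial_add(a_values, b_values, bits):
    """Add operand pairs one bit position at a time (least significant bit first).

    Each list index plays the role of one processing element (PE).
    Returns the per-PE sums (truncated to `bits` bits) and the final carries.
    """
    count = len(a_values)
    carries = [0] * count
    sums = [0] * count
    for k in range(bits):              # one step per bit position
        for pe in range(count):        # conceptually simultaneous across PEs
            a = (a_values[pe] >> k) & 1
            b = (b_values[pe] >> k) & 1
            c = carries[pe]
            sums[pe] |= (a ^ b ^ c) << k               # full-adder sum bit
            carries[pe] = (a & b) | (c & (a ^ b))      # full-adder carry out
    return sums, carries
```

Because the loop length is the operand width, the same routine handles operands of any length, which is the property the abstract attributes to bit-serial processing.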
TL;DR: Methods are given for finding a sequence of ‘add’, ‘subtract’ and ‘shift’ instructions to multiply the contents of a register by an integer constant.
Abstract: Methods are given for finding a sequence of ‘add’, ‘subtract’ and ‘shift’ instructions to multiply the contents of a register by an integer constant. Each method generalizes the previous one and requires only a few intermediate or scratch registers. A variation of the last method is used in the PL.8 compiler and uses an unnoticeable amount of the overall compile time. Some statistics roughly indicating the effectiveness of the methods are presented.
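A minimal sketch of the general idea, not the paper's own methods: it recodes the constant in non-adjacent form (signed binary digits) and emits one shift plus one add or subtract per nonzero digit. The function names are hypothetical, and the sketch makes no attempt to find the shortest instruction sequence.

```python
def naf_digits(c):
    """Non-adjacent-form digits of c >= 0, least significant first, each in {-1, 0, 1}."""
    digits = []
    while c > 0:
        if c & 1:
            d = 2 - (c & 3)   # +1 when c % 4 == 1, -1 when c % 4 == 3
            c -= d
        else:
            d = 0
        digits.append(d)
        c >>= 1
    return digits


def multiply_by_constant(x, c):
    """Compute x * c for a constant c >= 0 using only shifts, adds, and subtracts."""
    acc = 0
    for shift, d in enumerate(naf_digits(c)):
        if d == 1:
            acc += x << shift
        elif d == -1:
            acc -= x << shift
    return acc
```

For example, the constant 7 is recoded as 8 - 1, so the product needs one shift and one subtract rather than two shifts and two adds.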
TL;DR: An arithmetic operation portion 3 comprises a plurality of multipliers 311 and 312 connected directly with a memory portion 1 so that multiplication processing can be performed in parallel. As a result, the processing capacity for multiplication and addition can be increased and the throughput rate of data can be improved.
Abstract: An arithmetic operation portion 3 comprises a plurality of multipliers 311 and 312 connected directly with a memory portion 1 so that multiplication processing can be performed in parallel. As a result, the processing capacity for multiplication and addition can be increased and the throughput rate of data can be improved.
TL;DR: A new algorithm for performing fast multiplication in GF(2^{m} ), which is O(m) in computation time and implementation area is presented and the bit-slice architecture of a serial-in-serial-out modulo multiplier is described.
Abstract: Multiplication in the finite field GF(2^{m} ) has particular computational advantages in data encryption systems. This paper presents a new algorithm for performing fast multiplication in GF(2^{m} ), which is O(m) in computation time and implementation area. The bit-slice architecture of a serial-in-serial-out modulo multiplier is described and the circuit details given. The design is highly regular, modular, and well-suited for VLSI implementation. The resulting multiplier will have application in algorithms based on arithmetic in large finite fields of characteristic 2, and which require high throughput.
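For orientation, a minimal software sketch of bit-serial multiplication in GF(2^{m}) using the textbook shift-and-add method with polynomial reduction. This is not the paper's architecture; the function name and the integer encoding of field elements (bit k holds the coefficient of x^k) are assumptions made for illustration.

```python
def gf2m_multiply(a, b, m, poly):
    """Multiply a and b in GF(2^m).

    `poly` is the irreducible reduction polynomial as an integer, including
    its x^m term. The loop runs m times, one multiplier bit per iteration.
    """
    result = 0
    for _ in range(m):
        if b & 1:
            result ^= a          # addition in GF(2^m) is XOR
        b >>= 1
        a <<= 1                  # multiply a by x
        if (a >> m) & 1:
            a ^= poly            # reduce modulo the field polynomial
    return result
```

With m = 8 and poly = 0x11B (the AES field polynomial), gf2m_multiply(0x57, 0x83, 8, 0x11B) gives 0xC1, matching the worked example in FIPS 197.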
TL;DR: In an experiment using verification task procedures, identical structural parameters were found to model reaction time accurately for both addition and multiplication problems, and both types of complex problems were self-terminated when an error in the units column was encountered.
Abstract: In an experiment using verification task procedures, 100 subjects responded to simple and complex problems of addition and multiplication. Identical structural parameters were found to model reaction time accurately to both addition and multiplication problems. Slope estimates for a memory network parameter did not differ significantly between simple and complex problems within an operation or between addition and multiplication problems. Both complex addition and complex multiplication problems were processed columnwise, with column sums or products being retrieved from an interrelated memory network. The two types of complex problems included similar processes for carrying and for encoding of single digits, and both were self-terminated when an error in the units column was encountered. Addition and multiplication facts appear to be retrieved from a single interrelated memory network. A conceptual model for this interrelated network is discussed.
TL;DR: A new simple method for reducing multivalued functions is presented, based on an extension of the Quine-McCluskey minimization method used for binary logic functions.
Abstract: Discrete numerical values in digital processing systems may be encoded in two-level (binary) or higher-level (multilevel) representations. Multilevel coding can produce smaller and more efficient processors. In truth-table lookup processing, the number of entries (reference patterns) can be reduced using multilevel coding. Since parallel-input/parallel-output optical truth-table lookup processors can be constructed based on holographic content-addressable memories, it is essential to know the minimum storage required to implement various functions. A new simple method for reducing multivalued functions is presented. This method is based on an extension of the Quine-McCluskey minimization method used for binary logic functions. This minimization method is then applied to the truth tables representing (1) modified signed-digit addition, (2) residue addition, and (3) residue multiplication. A programmable logic array gate configuration for the modified signed-digit adder is presented.
TL;DR: The number of multiplications necessary and sufficient to compute a length-2^{n} DFT is determined and the method of derivation is shown to apply to the multiplicative complexity results of Winograd for a length-p^{n} DFT.
Abstract: The number of multiplications necessary and sufficient to compute a length-2^{n} DFT is determined. The method of derivation is shown to apply to the multiplicative complexity results of Winograd for a length-p^{n} DFT, for p an odd prime number. The multiplicative complexity of the one-dimensional DFT is summarized for many possible lengths.
TL;DR: Use of the residue system and logical minimization techniques to reduce the required number of reference patterns stored in a content-addressable memory is illustrated for 16-bit full-precision addition.
Abstract: The extension of truth-table look-up processing beyond primitive operations (such as addition) to higher-level operations (such as discrete matched filtering) is presented. Use of the residue system and logical minimization techniques to reduce the required number of reference patterns stored in a content-addressable memory is illustrated for 16-bit full-precision addition. Multilevel coding of the numbers is introduced as a method to achieve further truth-table reduction. The required numbers of reference patterns for implementing the residue addition and multiplication operations are provided for all moduli from 2 through 32 with 2-, 3-, and 5-level coding. An optical holographic implementation of a system that processes multilevel coded numbers is presented.
TL;DR: The stochastic gradient algorithm using a simplified arithmetic is analyzed; a power-of-two quantizer is used for the input of the multiplier to reduce the multiplication to at most a simple shift.
Abstract: The stochastic gradient algorithm using a simplified arithmetic is analyzed in this paper. A power-of-two quantizer is used for the input of the multiplier to reduce the multiplication to at most a simple shift. In spite of its simple implementation, the performance is shown to be comparable to the classical LMS algorithm. A linearized approximation to the quantizer is first derived, followed by the analysis of an exact nonlinear model. The derivation is based on the Gaussian assumption, and the effects of removing the Gaussian assumption are later considered. The roundoff error due to the finite-bit computation is calculated. Computer simulation results are provided to support the analysis.
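A minimal sketch of the idea, under stated assumptions: the abstract does not say which multiplier input is quantized, so this version quantizes the scaled error term, and it rounds in the log domain. The names quantize_pow2 and lms_step are hypothetical. In hardware, multiplying by the quantized value would be a shift; here it is an ordinary floating-point multiply.

```python
import math


def quantize_pow2(v):
    """Round v to a signed power of two (zero stays zero); rounding is in the log domain."""
    if v == 0:
        return 0.0
    exponent = round(math.log2(abs(v)))
    return math.copysign(2.0 ** exponent, v)


def lms_step(weights, x, desired, mu):
    """One stochastic-gradient (LMS) update with a power-of-two quantized error term.

    Returns the updated weights and the unquantized error.
    """
    y = sum(w * xi for w, xi in zip(weights, x))
    error = desired - y
    q = quantize_pow2(mu * error)   # multiplying by q is a shift in fixed-point hardware
    return [w + q * xi for w, xi in zip(weights, x)], error
```

The quantized value is always within a factor of the square root of two of the true value, so the update behaves like LMS with a step size that varies within that range.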
TL;DR: A new algorithm for implementation of radix 3, 6, and 12 FFT is introduced, derived from the fact that, if an input sequence is favorably reordered, rotating factors can be treated in pairs so that the rotating factors are conjugate to each other.
Abstract: A new algorithm for implementation of radix 3, 6, and 12 FFT is introduced. An FFT using this algorithm is computed in an ordinary (1,j) complex plane and the number of additions can be significantly reduced; the number of multiplications is also reduced. High efficiency of the algorithm is derived from the fact that, if an input sequence is favorably reordered, rotating factors can be treated in pairs so that the rotating factors are conjugate to each other.
TL;DR: It is shown that a significant improvement in both complexity and speed can be achieved in the problem of implementing a high-speed radix-4 RNS FFT.
Abstract: Recent advancements in residue arithmetic have given rise to a complex number system variant which better than halves RNS multiplication complexity. This advantage is applied to the problem of implementing a high-speed radix-4 RNS FFT. It is shown that a significant improvement in both complexity and speed can be achieved.
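The abstract does not name the number system variant. Assuming it refers to the quadratic residue number system (QRNS), the sketch below shows why complex multiplication gets cheaper: for a prime modulus p with an element j satisfying j*j = -1 (mod p), a complex residue a + ib maps to a pair of independent components, and a complex product needs two modular multiplications instead of four. The function names and the example values p = 13, j = 5 are illustrative.

```python
def to_qrns(a, b, p, j):
    """Map a + ib to the pair (a + j*b, a - j*b) mod p, where j*j = -1 (mod p)."""
    return ((a + j * b) % p, (a - j * b) % p)


def qrns_multiply(u, v, p):
    """Complex multiplication in QRNS form: two independent modular multiplications."""
    return ((u[0] * v[0]) % p, (u[1] * v[1]) % p)


def from_qrns(z, p, j):
    """Recover (real, imaginary) residues mod p from a QRNS pair."""
    inv2 = pow(2, -1, p)
    inv2j = pow(2 * j, -1, p)
    real = ((z[0] + z[1]) * inv2) % p
    imag = ((z[0] - z[1]) * inv2j) % p
    return real, imag
```

A usage example: with p = 13 and j = 5, multiplying to_qrns(3, 4, 13, 5) by to_qrns(1, 2, 13, 5) and converting back yields (8, 10), which is (3 + 4i)(1 + 2i) = -5 + 10i reduced modulo 13.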
TL;DR: An apparatus is described for determining the degree of variation of a feature in a region of an image that is divided into discrete picture elements, the feature being represented by complex valued signals, one for each picture element, the signal phase representing the feature class and the signal magnitude representing the certainty in the feature assertion.
Abstract: The invention concerns an apparatus for determining the degree of variation of a feature in a region of an image that is divided into discrete picture elements, the feature being represented by complex valued signals, one for each picture element, the signal phase representing the feature class and the signal magnitude representing the certainty in the feature assertion. The apparatus includes a unit (3) for providing the complex valued signals (8) within the region and complex valued multiplication factor signals (9) corresponding to the complex valued signals. In a first summation unit (4), a first sum signal is generated by the magnitude products of the signals (8) and corresponding multiplication factor signals (9), and a second sum signal is generated by the complex valued scalar products of the signals (8) and corresponding complex conjugate of the multiplication factor signals. In a second summation unit (5), a third sum signal is generated by the signals (8) that are weighted with the magnitude of corresponding multiplication factor signals (9), and a fourth sum signal is generated by the complex conjugate of the multiplication factor signals (9) weighted with a magnitude of corresponding signals (8). A norming unit (6) is provided for norming the output signals from the summation units in a predetermined way (FIG. 2).
TL;DR: A bit slice multiplication circuit operating to slice a multiplier, produce products for the sliced multipliers and a multiplicand and sum the products to obtain the multiplication result is described in this paper.
Abstract: A bit slice multiplication circuit operating to slice a multiplier, produce products for the sliced multipliers and a multiplicand and sum the products to obtain the multiplication result. The circuit includes a slicing unit for slicing the multiplicand, multiplying units corresponding in number to the number of sliced multiplicands, and adding units provided in correspondence to the multiplying units and implementing summation for multiplication results from corresponding multiplying units while shifting the sliced portions of the multiplicand at each multiplying operation for sliced multipliers and multiplicands by the multiplying units, the multiplication result being obtained by summing all summation results produced by the adding units.
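A minimal arithmetic sketch of the slicing idea, assuming unsigned operands and equal-width slices of both multiplier and multiplicand. The function names are hypothetical and the code models only the arithmetic, not the circuit's units or timing.

```python
def slices(value, width, count):
    """Split an unsigned value into `count` slices of `width` bits, least significant first."""
    mask = (1 << width) - 1
    return [(value >> (i * width)) & mask for i in range(count)]


def bit_slice_multiply(x, y, width, count):
    """Multiply unsigned x and y by summing shifted products of their slices."""
    total = 0
    for i, xs in enumerate(slices(x, width, count)):
        for j, ys in enumerate(slices(y, width, count)):
            # Each small product is weighted by the combined position of its slices.
            total += (xs * ys) << ((i + j) * width)
    return total
```

Each inner product involves only `width`-bit operands, so a wide multiplication reduces to many narrow multiplications plus shifted additions.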
TL;DR: In this article, a data processing system having an arithmetic unit is designed for a multiplication of n-place numbers in 2's complement according to the Booth algorithm, and for division of unsigned numerals.
Abstract: A data processing system having an arithmetic unit is designed for a multiplication of n-place numbers in 2's complement according to the Booth algorithm, and for division of unsigned numerals. A 2n-stage shift register is connected over a logical control circuit to the operation code inputs of an ALU. The control circuit automatically forms instruction code signals to the ALU as a function of informational bits derived from the shift register, whereas other operation code input signals are directly connected to the operation code inputs. The control circuit is a sequential circuit having a multiplexer for the selective through-connection of the multiplication code signals, the division code signals, or other operation code signals to the operation code inputs of the ALU.
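A minimal sketch of radix-2 Booth recoding for an n-bit two's-complement multiplier. It models the arithmetic only (add or subtract the shifted multiplicand depending on each pair of adjacent multiplier bits), not the patent's shift register and control circuit; the function name is hypothetical.

```python
def booth_multiply(multiplicand, multiplier, n):
    """Multiply using radix-2 Booth recoding of an n-bit two's-complement multiplier.

    `multiplier` must lie in the range -2**(n-1) .. 2**(n-1) - 1.
    """
    q = multiplier & ((1 << n) - 1)   # n-bit two's-complement bit pattern
    product = 0
    previous_bit = 0                  # implicit bit to the right of bit 0
    for i in range(n):
        bit = (q >> i) & 1
        if (bit, previous_bit) == (1, 0):
            product -= multiplicand << i   # start of a run of ones: subtract
        elif (bit, previous_bit) == (0, 1):
            product += multiplicand << i   # end of a run of ones: add
        previous_bit = bit
    return product
```

No separate sign correction is needed: a run of ones that reaches the sign bit is never closed by an addition, which is exactly the negative weight of the sign bit in two's complement.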
TL;DR: In the digital multiplication by analog convolution algorithm, the bits of two encoded numbers are convolved to form the product of the two numbers in mixed binary representation; this output can be easily converted to binary.
Abstract: In the digital multiplication by analog convolution algorithm, the bits of two encoded numbers are convolved to form the product of the two numbers in mixed binary representation; this output can be easily converted to binary. Attention is presently given to negative base encoding, treating base -2 initially, and then showing that the negative base system can be readily extended to any radix. In general, negative base encoding in optical linear algebra processors represents a more efficient technique than either sign magnitude or 2's complement encoding, when the additions of digitally encoded products are performed in parallel.
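A minimal sketch of multiplication by convolution with base -2 encoding, written as ordinary software rather than an optical processor. The function names are hypothetical. Convolving the two digit sequences yields digits that may exceed 1 (the mixed representation), and weighting them by powers of -2 recovers the product.

```python
def to_base_neg2(n):
    """Digits (0 or 1, least significant first) of integer n in base -2."""
    if n == 0:
        return [0]
    digits = []
    while n != 0:
        n, r = divmod(n, -2)
        if r < 0:             # force the remainder into {0, 1}
            n, r = n + 1, r + 2
        digits.append(r)
    return digits


def convolve(a, b):
    """Discrete convolution of two digit sequences."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def from_mixed_base_neg2(digits):
    """Evaluate a digit sequence (digits may exceed 1) with weights (-2)**k."""
    return sum(d * (-2) ** k for k, d in enumerate(digits))
```

Because base -2 represents positive and negative integers without a sign bit, the same convolution handles signed operands with no special cases.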
TL;DR: A new square-law multiplier is introduced that is useful for modulo P_i multiplication where P_i is any modulus and is expected to have important applications in RNS arithmetic computing hardware.
Abstract: Modulo P_i multipliers are implemented by look-up tables when P_i is small (5 bits or less) and by index calculus if P_i is larger (6 bits or more). However, index calculus only works for prime moduli P_i. In this letter, we introduce a new square-law multiplier that is useful for modulo P_i multiplication where P_i is any modulus. It is expected that this will have important applications in RNS arithmetic computing hardware.
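The letter's exact formulation is not given in the abstract. One common square-law identity is the quarter-square rule, ab = floor((a+b)^2 / 4) - floor((a-b)^2 / 4), which holds for all integers because a+b and a-b have the same parity. The sketch below assumes that identity and uses a single one-input table; the function names are hypothetical.

```python
def quarter_square_table(p):
    """Table of floor(s*s / 4) mod p for every s in 0 .. 2*(p - 1)."""
    return [(s * s // 4) % p for s in range(2 * p - 1)]


def modmul_square_law(a, b, p, table):
    """Compute (a * b) mod p for residues 0 <= a, b < p with two look-ups and a subtraction."""
    return (table[a + b] - table[abs(a - b)]) % p
```

The table has about 2p entries rather than the p*p entries of a direct two-input product table, and nothing in the identity requires p to be prime.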
TL;DR: The regular interconnection structures of the multiplier array cell elements, which are ideal for VLSI implementation, are described and the speed and hardware complexity of two new iterative array algorithms, both of which require n-cell delays, are compared.
Abstract: Algorithms for the parallel multiplication of two n-bit binary numbers by an iterative array of logic cells are discussed. The regular interconnection structures of the multiplier array cell elements, which are ideal for VLSI implementation, are described. The speed and hardware complexity of two new iterative array algorithms, both of which require n-cell delays for one n-bit × n-bit multiplication, are compared to a straightforward iterative array algorithm having a 2n-cell delay and its higher radix version having an n-cell delay.
TL;DR: In this paper, the authors investigate whether minimal realizations exist for discrete-event dynamic systems if only the input/output description is given by means of the impulse response.
Abstract: Recently an analogy between conventional linear system theory and the relatively new theory on discrete-event dynamic systems has been shown to exist. The system description in the new theory resembles the one of the conventional theory, provided that the operations addition and multiplication are replaced by maximization and addition respectively. One also speaks of a system in the max-algebra, which is a semi-ring. In this paper we investigate, by pursuing the analogy mentioned above, whether minimal realizations exist for discrete-event dynamic systems if only the input/output description is given by means of the impulse response. A construction procedure is suggested. It turns out that the characteristic equation of a matrix in the max-algebra (to be defined) plays a crucial role.
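A minimal sketch of the max-algebra operations the abstract describes, with addition replaced by maximization and multiplication replaced by addition. The function names, the list-of-lists matrix layout, and the indexing of the impulse response as C A^k B for k = 0, 1, 2, ... are assumptions made for illustration, not the paper's notation.

```python
NEG_INF = float("-inf")   # the max-algebra "zero": neutral element for max


def maxplus_matmul(a, b):
    """Matrix product in the max-algebra: entry (i, j) is max over k of a[i][k] + b[k][j]."""
    inner = len(b)
    return [[max(a[i][k] + b[k][j] for k in range(inner))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def impulse_response(a, b, c, length):
    """First `length` terms C A^k B (k = 0, 1, ...) of a max-algebra state-space model."""
    terms, state = [], b
    for _ in range(length):
        terms.append(maxplus_matmul(c, state))
        state = maxplus_matmul(a, state)
    return terms
```

The realization problem in the abstract runs the other way: given such a sequence of terms, find matrices A, B, and C of the smallest possible dimension that reproduce it.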
TL;DR: In this paper, an additional adder unit and a selection network are added to the apparatus typically performing the arithmetic floating point function, which permits certain processes forming part of arithmetic operations to be executed in parallel.
Abstract: In a floating point arithmetic execution unit, an additional adder unit and a selection network are added to the apparatus typically performing the arithmetic floating point function. The additional apparatus permits certain processes forming part of arithmetic operations to be executed in parallel. For selected arithmetic operations, the final result can be one of two values typically related by an intermediate shifting operation. By performing the processes in parallel and selecting the appropriate result, the execution time can be reduced when compared to the execution of the process in a serial implementation. The fundamental arithmetic operations of addition, subtraction, multiplication and division can each have the execution time decreased using the disclosed additional apparatus.
TL;DR: An edge-addressed optical matrix processing system based in part on outer product decomposition is presented that not only can perform matrix–matrix multiplication but also many linear signal processing functions such as correlation and convolution.
Abstract: An edge-addressed optical matrix processing system based in part on outer product decomposition is presented that not only can perform matrix–matrix multiplication but also many linear signal processing functions such as correlation and convolution, the calculation of the cross-ambiguity function, the matrix inversion, and histogram generation. Techniques for handling complex data are also presented.
TL;DR: This paper provides an introduction to the concepts (and literature) of digital arithmetic and focuses on number systems and on algorithms for addition, subtraction, multiplication, and division, since these operations are the basis for current systems.
Abstract: Since development of the first computing system, there has been a continual quest for faster systems. This is a result of successful accomplishments with a given system which lead to desires to do even more, thereby requiring better performance. In many applications, better performance requires faster arithmetic. Recent progress in optical processing suggests that digital optical arithmetic may eliminate many of today’s performance bottlenecks. This paper provides an introduction to the concepts (and literature) of digital arithmetic. The primary emphasis is on number systems and on algorithms for addition, subtraction, multiplication, and division, since these operations are the basis for current systems.
TL;DR: An algorithm for multiplying an N × N recursive block Toeplitz matrix by a vector with cost O(N log N) is presented and its application to optimal surface interpolation is discussed.
TL;DR: An apparatus is described for detecting sudden changes of a feature in a region of an image that is divided into discrete picture elements, the feature being represented by complex valued signals, one for each picture element, the signal phase representing the feature class and the signal magnitude representing the certainty in the feature assertion.
Abstract: The invention concerns an apparatus for detecting sudden changes of a feature in a region of an image that is divided into discrete picture elements, the feature being represented by complex valued signals, one for each picture element, the signal phase representing the feature class and the signal magnitude representing the certainty in the feature assertion. The apparatus includes a first unit (3) for providing the complex valued signals (10) within the region and at least a second unit for providing at least two collections of complex valued multiplication factor signals (11, 12, 13) that correspond to the complex valued signals. The apparatus further has a third unit (4) for forming a measurement signal for each collection of multiplication factors, the measurement signal consisting of the sum of the squares of the magnitudes of a first sum signal, which consists of the complex valued products of the complex valued signals (10) and corresponding complex valued multiplication factor signals (11, 12, 13), and a second sum signal, which consists of the complex valued products of the complex valued signals and corresponding conjugate of the complex valued multiplication factor signals. A fourth unit (5) is provided for a complex valued coefficient weighted summation of a function, which depends on the complex valued multiplication factor signals, of the measurement signals from the third unit (4).
TL;DR: In this article, a system of differential equations describing avalanche multiplication in an MIS structure is derived and solved numerically, assuming a uniform electric field along the interface and an infinitely thin inversion layer.
Abstract: A system of differential equations describing avalanche multiplication in an MIS structure is derived and solved numerically, assuming a uniform electric field along the interface and an infinitely thin inversion layer. The dynamic characteristic of the current, the time dependence of the multiplication coefficient, and the current-voltage characteristics are in good agreement with the experimental results.
TL;DR: In this paper, an iterative algorithm for general division in the symmetric residue number system is presented. But the algorithm is iterative in nature and requires the availability of two tables of symmetric residues representations of a certain kind of integer.
Abstract: In the residue number system, the arithmetic operations of addition, subtraction, and multiplication are executed in the same period of time without the need for interpositional carry. There is a hope for high-speed operation if residue arithmetic is used for digital computation. The division process, which is one of the difficulties of this operation, is developed in the symmetric residue number system. The method described here is iterative in nature and requires the availability of two tables of the symmetric residue representations of a certain kind of integer. An algorithm for general division is derived, and the way of choosing the entries which are used to find a quotient is discussed.
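A minimal sketch of the carry-free operations the abstract starts from, using symmetric residues for pairwise coprime odd moduli. It does not implement the paper's division algorithm. The function names and the moduli used in the usage note are illustrative, and pow(x, -1, m) requires Python 3.8 or later.

```python
from math import prod


def sym(v, m):
    """Symmetric residue of v modulo odd m: the representative in -(m-1)/2 .. (m-1)/2."""
    return ((v + m // 2) % m) - m // 2


def to_symmetric_rns(x, moduli):
    """Represent x by its symmetric residue for each modulus."""
    return [sym(x, m) for m in moduli]


def rns_apply(a, b, moduli, op):
    """Apply op (add, subtract, or multiply) digit by digit; no carries cross between moduli."""
    return [sym(op(x, y), m) for x, y, m in zip(a, b, moduli)]


def from_symmetric_rns(residues, moduli):
    """Chinese remainder reconstruction into the symmetric range of the product of the moduli."""
    big_m = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        partial = big_m // m
        total += r * partial * pow(partial, -1, m)
    return sym(total, big_m)
```

With moduli (3, 5, 7) the representable range is -52 to 52, so multiplying the representations of -6 and 7 digit by digit and converting back gives -42. Division has no such digit-wise form, which is why it needs the separate iterative method the paper develops.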
TL;DR: In this article, an optical system which performs the multiplication of binary numbers is described and proof-of-principle experiments are performed, where the simultaneous generation of all partial products, optical regrouping of bit products, and optical carry look-ahead addition are novel features of the proposed scheme which takes advantage of the parallel operations capability of optical computers.
Abstract: An optical system which performs the multiplication of binary numbers is described and proof-of-principle experiments are performed. The simultaneous generation of all partial products, optical regrouping of bit products, and optical carry look-ahead addition are novel features of the proposed scheme which takes advantage of the parallel operations capability of optical computers. The proposed processor uses liquid crystal light valves (LCLVs). By space-sharing the LCLVs one such system could function as an array of multipliers. Together with the optical carry look-ahead adders described, this would constitute an optical matrix–vector multiplier.
TL;DR: A hierarchical view of matrix operations is given and different optical architectures for implementing the basic operations of matrix algebra are surveyed.
Abstract: Matrix algebra provides a mathematical language into which different classes of problems can be formulated in a consistent manner. These problems include those encountered in signal and image processing and numeric as well as symbolic computing. The fundamental operations of matrix algebra involve the arithmetic operations of multiplication and addition/subtraction along with global interconnections between one- or two-dimensional arrays of numbers. Both of these characteristics match the advantages provided by an optical system. A hierarchical view of the matrix operations is given and different optical architectures for implementing the basic operations of matrix algebra are surveyed.
TL;DR: The basic properties of multiplication algebras of nonassociative algebras over rings are introduced in this article, along with a characterization of semisimple artinian multiplication algebras and a discussion of the simple factors of a multiplication algebra modulo its Jacobson radical.
Abstract: The basic properties of multiplication algebras of nonassociative algebras over rings are introduced, including a discussion of multiplication algebras of tensor products of algebras. A characterization of semisimple artinian multiplication algebras is given along with a discussion of the nature of the simple factors of a multiplication algebra modulo its Jacobson radical. A criterion distinguishing the multiplication algebras of certain associative algebras is proved. Examples are included to illustrate certain proved results.