TL;DR: A complex encoding architecture is suggested that incorporates notation-dependent activation of addition and multiplication facts, as well as interpenetration of number reading and number-fact retrieval processes.
TL;DR: A multilevel algorithm is applied to the solution of an integral equation by the conjugate gradient method, and the complexity of a matrix-vector multiplication is shown to be proportional to N (log N)^2.
TL;DR: The current implementation of the Davidson method for determining a few selected eigenvalues and their eigenvectors on either end of the spectrum of a large, real, symmetric matrix is presented.
TL;DR: This paper presents code sequences for division by arbitrary nonzero integer constants and run-time invariants using integer multiplication, assuming a two's complement architecture, and treats unsigned division, signed division, and division where the result is known a priori to be exact.
Abstract: Integer division remains expensive on today's processors as the cost of integer multiplication declines. We present code sequences for division by arbitrary nonzero integer constants and run-time invariants using integer multiplication. The algorithms assume a two's complement architecture. Most also require that the upper half of an integer product be quickly accessible. We treat unsigned division, signed division where the quotient rounds towards zero, signed division where the quotient rounds towards -∞, and division where the result is known a priori to be exact. We give some implementation results using the C compiler GCC.
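The idea in the abstract above, replacing division by a constant with a multiplication and a shift, can be sketched for the unsigned case as follows. This is a minimal illustration and not the paper's code sequences: the function names `magic_for_unsigned_div` and `div_by_const` are hypothetical, and the multiplier is chosen as m = ceil(2^(N+l)/d) with l = ceil(log2 d), which may need N+1 bits (Python integers hide that detail; real hardware would need an extra fix-up step).

```python
def magic_for_unsigned_div(d, nbits=32):
    """Return (m, shift) such that (n * m) >> shift == n // d
    for every unsigned nbits-wide n. Names are illustrative only."""
    if not 0 < d < (1 << nbits):
        raise ValueError("divisor must satisfy 0 < d < 2**nbits")
    extra = (d - 1).bit_length()            # l = ceil(log2(d))
    shift = nbits + extra
    m = ((1 << shift) + d - 1) // d         # ceil(2**shift / d)
    return m, shift


def div_by_const(n, m, shift):
    """Divide n by the constant encoded in (m, shift) without a division."""
    return (n * m) >> shift


# Example: precompute once for d = 10, then reuse for many dividends.
M10, S10 = magic_for_unsigned_div(10)
quotients = [div_by_const(n, M10, S10) for n in (0, 9, 10, 99, 4294967295)]
```

The precomputation is the only place a true division occurs, which is why the approach suits compile-time constants and run-time invariants.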
TL;DR: This article showed that the associative confusion effect is also present in elementary school children, although it varies with the size of the numbers in the pair and the age of the child, and that the temporal parameters that govern these effects are similar for children and adults.
TL;DR: This work designs and performs initial experiments for handling 8-bit MSD number addition and subtraction and presents the results, to confirm the underlying operational principles of the proposed optoelectronic shared content-addressable-memory MSD adder.
Abstract: Addition is the most primitive arithmetic operation in digital computation. Other arithmetic operations such as subtraction, multiplication, and division can all be performed by addition together with some logic operations. With the binary number system, addition speed is inevitably limited by the carry-propagation schemes. On the other hand, carry-free addition is possible when the modified signed-digit (MSD) number representation is used. We propose a novel optoelectronic scheme to handle the parallel MSD addition and subtraction operations. An optoelectronic shared content-addressable memory is introduced. The shared content-addressable memory uses free-space optical processing to handle the large amount of parallel memory access operations and uses electronics to postprocess and derive logic decisions. We analyze the accuracy that the required optical hardware can deliver by using a statistical cross-talk-rate model that we propose. We also evaluate other important device and system performance parameters, such as the memory capacity or the maximum number of parallel bits the adder can handle in terms of a given cross-talk rate at a certain repetition rate, the corresponding diffraction-limited memory density, and the system’s power efficiency. To confirm the underlying operational principles of the proposed optoelectronic shared content-addressable-memory MSD adder, we design and perform initial experiments for handling 8-bit MSD number addition and subtraction and present the results.
TL;DR: In this paper, a method and apparatus for decoding Reed-Solomon codes in large Galois Fields GF(2^n) is presented, where the finite field is represented as a quadratic extension field of one or more subfields.
Abstract: A method and apparatus for decoding Reed-Solomon codes in large Galois Fields GF(2^n) represents the finite field as a quadratic extension field of one or more subfields GF(2^m). This type of field representation allows embedded subfields, as well as the primary extension field, to be simultaneously represented in normal form. The basic arithmetic operations for the extension field are written solely in terms of operations performed in one or more subfields. The operations of multiplication, inverse, square, square root and conjugation are performed in GF(2^n), utilizing only operations from the subfield GF(2^m).
TL;DR: A floating point arithmetic unit that executes a single compound instruction that produces the result A+B×C with A, B and C being floating point numbers is described in this article, where the result is identical to a result obtained by executing a separate instruction for the multiply, with normalized result, followed by an add instruction with a normalized result.
Abstract: A floating point arithmetic unit executes a single compound instruction that produces the result A+B×C, with A, B and C being floating point numbers. Arithmetic on the exponents of A, B and C provides a normalized result of the multiplication before the addition takes place, producing a normalized result of the compound instruction. The final normalized result is identical to a result that would be obtained by executing a separate instruction for the multiply, with normalized result, followed by an add instruction with a normalized result.
TL;DR: It is proved that the functions computable in on-line by a finite automaton are piecewise affine functions whose coefficients are rational numbers.
Abstract: After a short introduction to on-line computing, we prove that the functions computable in on-line by a finite automaton are piecewise affine functions whose coefficients are rational numbers (i.e., the functions f(x)=ax+b, or f(x,y)=ax+by+c where a, b, and c are rational). A consequence of this study is that multiplication, division and elementary functions of operands of arbitrarily long length cannot be performed using bounded-size operators.
TL;DR: In this article, a single module for performing polynomial arithmetic operations over finite fields is provided in an error correcting system for correcting Reed-Solomon codewords with mixtures of errors and erasures in an optical disk storage.
Abstract: A single module for performing polynomial arithmetic operations over finite fields is provided in an error correcting system for correcting Reed-Solomon codewords with mixtures of errors and erasures in an optical disk storage. The module comprises two-dimensional register arrays, which serve as a working area to store initial data and intermediate results of the polynomial operations. A set of multiplier-adder units performs multiplication and addition operations on the data supplied from the register arrays based on finite field arithmetic over the Galois field GF(2^8). A set of multiplexers routes the input data to the corresponding multiplier-adder units depending on the polynomial functions to be performed. In response to initial condition signals, a control system determines what polynomial functions are to be performed by the polynomial processing module and supplies the module with control signals to provide the data control in the register arrays and selection of the multiplexers.
TL;DR: In this article, the authors focused on schemes involved in solving two-step word problems in which one of the two operations is either addition or subtraction and the other is either multiplication or division.
Abstract: This study focuses on schemes involved in solving two-step word problems in which one of the two operations is either addition or subtraction and the other is either multiplication or division. Analysis of the problems yielded three possible basic compound schemes: (1) the Hierarchical Scheme, (2) the Shared-Whole Scheme, and (3) the Shared-Part Scheme. Twenty-one problems, in four different contexts, were given to about 2000 students in Grades 3 to 6. It was found that ‘Schemes’, ‘Operations’ and their interactions affect the difficulty level of the problems and that the order of difficulty of the 21 problems remained constant at the various grade levels.
TL;DR: The systematic derivation of a static bit-level parallel algorithm to implement multiplication of integers serves both as an explanation of the circuit, and as a demonstration that it is correct ‘by construction’.
Abstract: This paper presents in some detail the systematic derivation of a static bit-level parallel algorithm to implement multiplication of integers, that is to say one which might be implemented as an electronic circuit. The circuit is well known, but the derivation shows that its design can be seen as the consequence of decisions made (and explained) in terms of the abstract algorithm. The systematic derivation serves both as an explanation of the circuit, and as a demonstration that it is correct ‘by construction’. We believe that the technique is applicable to a wide range of similar algorithms.
TL;DR: Some level-2 and level-3 Distributed Basic Linear Algebra Subroutines (DBLAS) that have been implemented on the Connection Machine system CM-200 are described.
Abstract: Some level-2 and level-3 Distributed Basic Linear Algebra Subroutines (DBLAS) that have been implemented on the Connection Machine system CM-200 are described. No assumption is made on the shape or ...
TL;DR: In this article, skill-testing, entertainment or game equipment is described in which a computer unit receives the user's physical values from non-invasive measuring units, the normal game score of the equipment, and multiplication factors for certain physical values, and display units show the scores produced by applying those multiplication factors to the measured physical values and game scores.
Abstract: A skill-testing, entertainment or game equipment operating on the basis of user's physical values comprises units for non-invasive measuring physical values of a user, a computer unit arranged to receive the physical values of a user measured by the measuring units, the normal game score of the equipment and also to receive multiplication factors for certain physical values and units for displaying the scores produced by multiplication of the physical values measured by the measuring units on game scores with the multiplication factors supplied to the computer unit.
TL;DR: Run-time optimization of array distribution and off-processor data fetching is reported to reduce both the communication and computation time of sparse matrix-vector multiplication in iterative linear solvers.
Abstract: Sparse matrix-vector multiplication forms the heart of iterative linear solvers used widely in scientific computations (e.g., finite element methods). In such solvers, the matrix-vector product is computed repeatedly, often thousands of times, with updated values of the vector until convergence is achieved. In an SIMD architecture, each processor has to fetch the updated off-processor vector elements while computing its share of the product. In this paper, we report on run-time optimization of array distribution and off-processor data fetching to reduce both the communication and computation time. The optimization is applied to a sparse matrix stored in a compressed sparse row-wise format. Actual runs on test matrices produced up to a 35 percent relative improvement over a block distribution with a naive multiplication algorithm while simulations over a wider range of processors indicate that up to a 60 percent improvement may be possible in some cases.
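For readers unfamiliar with the compressed sparse row format mentioned in the abstract above, a minimal serial sketch of the matrix-vector product is given here. It shows only the storage layout and the "naive" kernel, not the paper's SIMD distribution or run-time optimization; the array names `values`, `col_idx`, and `row_ptr` are conventional but are assumptions, not taken from the paper.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in compressed sparse row form.

    values  : nonzero entries, listed row by row
    col_idx : column index of each entry in `values`
    row_ptr : row_ptr[r] .. row_ptr[r + 1] delimits row r inside `values`
    """
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            # x[col_idx[k]] is the access that becomes an off-processor
            # fetch when x is distributed across processors.
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


# The 3x3 matrix [[4, 0, 1], [0, 3, 0], [2, 0, 5]] in CSR form.
values = [4.0, 1.0, 3.0, 2.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
y = csr_matvec(values, col_idx, row_ptr, [1.0, 2.0, 3.0])
```

The indirect read through `col_idx` is the irregular access pattern that makes data distribution matter on a parallel machine.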
TL;DR: Sharp upper bounds are obtained for the complexity of the most difficult n-variable function over certain bases and explicit nonlinear lower bounds are proved for the formula size of analog circuits over bases containing addition, subtraction, multiplication, the sign function and all real constants.
Abstract: We consider the complexity of computing Boolean functions by analog circuits of bounded fan-in, i.e. by circuits of gates computing real-valued functions, either exactly or as a sign-representation. Sharp upper bounds are obtained for the complexity of the most difficult n-variable function over certain bases (sign-representation by arithmetic circuits and exact computation by piecewise linear circuits). Bounds are given for the computational power gained by adding discontinuous gate functions and nondeterminism. We also prove explicit nonlinear lower bounds for the formula size of analog circuits over bases containing addition, subtraction, multiplication, the sign function and all real constants.
TL;DR: This paper generalizes the best of several implementations of parallel matrix-vector multiplication to mesh architectures, demonstrating that on hypercubes, the cost of communication can be overcome to a much larger extent than is often assumed.
Abstract: The critical bottlenecks in the implementation of the conjugate gradient algorithm on distributed memory computers are the communication requirements of the sparse matrix-vector multiply and of the vector recurrences. In a previous paper (G. Lewis et al., 1993), we described the data distribution and communication patterns of several implementations of parallel matrix-vector multiplication, demonstrating that on hypercubes, the cost of communication can be overcome to a much larger extent than is often assumed. In this paper, we generalize the best of those implementations to mesh architectures. We make no assumptions about the mesh being square or power-of-two. We also comment on the implications of our results for structured problems and on the scalability of our approach. Results are presented for the implementation of these algorithms on the Intel Touchstone Delta and Paragon mesh multicomputers.
TL;DR: New algorithms based on cellular automata operations for performing fast multiplication and inversion in GF(2^m) are presented; the resulting design is highly parallel, modular and well-suited for VLSI implementation.
Abstract: Finite fields have proved to be very useful in error correcting codes, combinatorial design and many cryptographic applications. Multiplying two elements of GF(2^m) and finding the inverse of an element are the most difficult and time-consuming operations. In this paper, new algorithms based on cellular automata operations for performing fast multiplication and inversion in GF(2^m) are presented. The new design is highly parallel, modular and well-suited for VLSI implementation.
TL;DR: This work describes a floating-point arithmetic unit based on the CORDIC algorithm that computes a full set of high level arithmetic and elementary functions: multiplication, division, (co)sine, hyperbolic (co)sine, square root, natural logarithm, inverse (hyperbolic) tangent, vector norm, and phase.
Abstract: This work describes a floating-point arithmetic unit based on the CORDIC algorithm. The unit computes a full set of high level arithmetic and elementary functions: multiplication, division, (co)sine, hyperbolic (co)sine, square root, natural logarithm, inverse (hyperbolic) tangent, vector norm, and phase. The chip has been integrated in 1.6 μm double-metal n-well CMOS technology and achieves a normalized peak performance of 220 MFLOPS.
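As background for the abstract above, the circular rotation mode of the textbook CORDIC iteration, which yields sine and cosine from shifts and adds, can be sketched as follows. This is a generic software illustration and makes no claim about the chip's data path or number format; the function name `cordic_sin_cos` and the iteration count are assumptions, and multiplications by 2^-i stand in for the hardware shifts.

```python
import math


def cordic_sin_cos(theta, iterations=32):
    """Approximate (sin(theta), cos(theta)) with rotation-mode CORDIC.

    The input is restricted to [-pi/2, pi/2], which lies inside the
    convergence range of the basic iteration.
    """
    if abs(theta) > math.pi / 2:
        raise ValueError("theta must lie in [-pi/2, pi/2]")

    # Elementary rotation angles atan(2**-i) and the accumulated gain.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    # Start from a vector pre-scaled by 1/gain so the result has unit length.
    x, y, z = 1.0 / gain, 0.0, theta
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0          # rotate toward z = 0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return y, x


s, c = cordic_sin_cos(0.5)
```

In hardware the angle table and the gain are constants, so each iteration costs only two shifts and three additions.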
TL;DR: A high-accuracy optical multiplier that uses an optical correlator and a symbolic substitution adder that uses the modified signed-digit number representation is described, showing it to be preferable to other optical techniques and to be competitive with digital technology.
Abstract: A high-accuracy optical multiplier that uses an optical correlator is described. A symbolic substitution adder that uses the modified signed-digit number representation is used as the basic module. Emphasis is placed on the multiplication of many long words in parallel with minimum latency. The encoding method we employ in the adders permits the use of a new optical algorithm and architecture to generate partial products in symbolic form in parallel. Our multiplication algorithm and architecture are shown to be preferable to other optical techniques and to be competitive with digital technology; they are also shown to be particularly attractive for matrix–vector multiplication applications.
TL;DR: A fast iterative modular multiplication algorithm is proposed for modular exponentiation with a large modulus, such as in the RSA cryptosystem, where partial products are kept below 2^n rather than below the n-bit modulus N.
Abstract: A fast iterative modular multiplication algorithm is proposed for modular exponentiation with a large modulus, such as the RSA cryptosystem. The limit on partial products is raised to become less than 2^n instead of modulus N with n-bit length. The naturally generated carry signifies when the computed partial product is greater than or equal to 2^n, and a subtraction is subsequently performed. No magnitude comparisons are required.
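The carry-driven reduction described in the abstract above can be illustrated with a generic interleaved (shift-and-add) modular multiplication. This is a sketch of the general idea only and is not claimed to be the paper's algorithm: the function name `modmul_carry_reduce` is hypothetical, and the subtraction of N is triggered by overflow out of the n-bit range instead of by comparing the partial product with N.

```python
def modmul_carry_reduce(a, b, modulus):
    """Return (a * b) % modulus with partial products kept below 2**n.

    n is the bit length of the modulus. Inside the loop, a subtraction of
    the modulus happens only when the partial product overflows n bits
    (a carry), never as the result of a magnitude comparison with N.
    """
    if modulus <= 0:
        raise ValueError("modulus must be positive")
    n = modulus.bit_length()
    if not (0 <= a < (1 << n) and 0 <= b < (1 << n)):
        raise ValueError("operands must fit in n bits")

    p = 0
    for i in reversed(range(n)):
        p <<= 1                      # double the partial product
        while p >> n:                # carry out of the n-bit register
            p -= modulus
        if (a >> i) & 1:
            p += b                   # add the multiplicand
            while p >> n:
                p -= modulus

    # Assumption of this sketch: one final correction brings the result
    # into [0, modulus). In a chain of multiplications it can be deferred.
    if p >= modulus:
        p -= modulus
    return p


product = modmul_carry_reduce(123456789, 987654321, 1000000007)
```

Because the modulus has exactly n bits, any value at or above 2^n stays positive after the subtraction, so the partial product remains congruent to the true product throughout.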
TL;DR: In this article, a multiplication circuit for controlling an analog input voltage by the use of a switching signal created by a digital voltage so as to either generate an analog output or to cut-off the output is presented.
Abstract: A multiplication circuit for controlling an analog input voltage by the use of a switching signal created by a digital voltage so as to either generate an analog output or to cut off the output. A digital input signal having a plurality of bits with given weights is introduced by use of capacitive coupling, and the resulting total becomes the multiplication result.
TL;DR: A modification of the Booth algorithm is presented that produces correct results when the radix is any power of 2 and the multipliers are of any size.
Abstract: The Booth multiplication algorithm produces incorrect results for some word sizes, when it is extended for higher radix, fixed-point multiplication. We present a modification of the Booth algorithm that produces correct results when the radix is any power of 2 and the multipliers are of any size.
TL;DR: A Galois field multiplication method for a set of a finite number of elements, which enables four arithmetical operations (addition, subtraction, multiplication and division), and a multiplier utilizing the multiplication method are disclosed.
Abstract: A Galois field multiplication method for a set of a finite number of elements which enables four arithmetical operations including an addition, a subtraction, a multiplication and a division, and a multiplier utilizing the multiplication method are disclosed. The Galois field multiplication method easily realizes various Galois field multipliers by ANDing respective items of a multiplicand with a corresponding one of the items of a multiplier factor in a stepwise manner, rotating left the values resulting from the AND operation at the previous step, exclusively ORing the respective values resulting from the rotation with the respective corresponding values resulting from the AND operation at the current step, and operating on the highest polynomial term generated at the previous step in accordance with a generator polynomial.
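The AND, shift, XOR, and reduce steps listed in the abstract above correspond closely to the standard bit-serial multiplication in GF(2^m), sketched here. This is the textbook procedure and not the patented circuit; the function name `gf_mul` is hypothetical, and the default field polynomial x^8 + x^4 + x^3 + x + 1 (0x11B, the one used by AES) is an assumption chosen only so the example has well-known reference values.

```python
def gf_mul(a, b, poly=0x11B, m=8):
    """Multiply a and b in GF(2**m), reducing by the field polynomial `poly`.

    Elements are integers whose bits are polynomial coefficients over GF(2).
    `poly` must include the x**m term (bit m set).
    """
    if not (0 <= a < (1 << m) and 0 <= b < (1 << m)):
        raise ValueError("operands must be m-bit field elements")
    result = 0
    for i in reversed(range(m)):
        result <<= 1                 # multiply the running sum by x
        if result & (1 << m):        # highest term overflowed: reduce it
            result ^= poly
        if (b >> i) & 1:             # AND of the multiplicand with bit b_i
            result ^= a              # addition in GF(2) is XOR
    return result


example = gf_mul(0x57, 0x83)
```

Addition and subtraction coincide in characteristic 2, so both are a single XOR, and division reduces to multiplication by an inverse.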
TL;DR: An algorithm with weighted-shifted addition, parallel-array multiplication, and a two-stage-array complex operation is proposed to carry out the multiplication of two complex numbers.
Abstract: Based on a negative binary number system, an algorithm with weighted-shifted addition, parallel-array multiplication, and a two-stage-array complex operation is proposed to carry out the multiplication of two complex numbers. The complex multiplication is performed without signs, carries, and recoding. The algorithm is suitable for optical implementation, and an optical parallel architecture is suggested. The experimental result is also given.
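To make the sign-free property in the abstract above concrete, the sketch below converts integers to and from base -2, on the assumption that "negative binary number system" refers to the negabinary representation. The function names are hypothetical, and the code illustrates only the number representation, not the optical multiplication algorithm.

```python
def to_negabinary(n):
    """Return the base -2 digits of integer n, least significant first."""
    if n == 0:
        return [0]
    digits = []
    while n != 0:
        n, r = divmod(n, -2)
        if r < 0:                    # force the digit into {0, 1}
            n += 1
            r += 2
        digits.append(r)
    return digits


def from_negabinary(digits):
    """Inverse of to_negabinary: sum of digit * (-2)**position."""
    return sum(d * (-2) ** i for i, d in enumerate(digits))


six = to_negabinary(6)               # 6 = 16 - 8 - 2, i.e. 11010 in base -2
minus_three = to_negabinary(-3)      # -3 = -8 + 4 + 1, i.e. 1101 in base -2
```

Positive and negative values share one digit set with no sign bit, which is what allows signed operands to be multiplied without separate sign handling.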
TL;DR: In this article, an arithmetic processor with nonpipeline and pipeline modes of operation is described; it combines a redundant binary multiplication part, first to third intermediate latches, a redundant binary accumulation part with an accumulation result latch, and a redundant binary/binary conversion part to achieve high-speed multiplication and product-sum instructions.
Abstract: An arithmetic processor employs two modes of nonpipeline operation and pipeline operation, and is provided with a redundant binary multiplication part for generating redundant binary multiplied results and a supplementary term, first to third intermediate latches for storing two pairs of partial product added results and the supplementary term, a redundant binary accumulation part for accumulating the results of first to third intermediate latches and a value of an accumulation result latch to store the thus accumulated result into the accumulation result latch, and a redundant binary/binary conversion part for converting into a binary numeral the result of the redundant binary multiplication part or the result of the redundant binary accumulation part. Thereby high-speed operations of multiplication instruction and product-sum operation instruction are achieved.
TL;DR: A stochastic number system that uses the ratio of the number of one pulses to the number of zero pulses in a pulse stream is studied; with it, artificial neural networks can be implemented, and simulation results for a digit recognition problem are presented.
Abstract: Stochastic computation uses pulse streams to represent numbers. In this paper, we study a novel method of implementing a number system that uses the ratio of the number of one (high) pulses to the number of zero (low) pulses in a pulse stream. With this number system, if we let P be the probability that the pulse is one in a pulse stream, then the number Y we want to deal with is defined as Y=P/(1-P). We study how to implement basic operations such as addition, multiplication and division using this number system and consider the error characteristics of such operations in stochastic computation. We also introduce an averaging circuit to reduce the error that is inherent in stochastic computation. According to the study, this number system can be used to implement artificial neural networks, and we present simulation results for the digit recognition problem.
TL;DR: A design of a multiplier for the multiplication of complex numbers that combines shared Booth encoding for the real and imaginary parts including only one combined modified Wallace tree of 4:2 adders for each part resulting in a faster multiplier circuit.
Abstract: In this article we consider a design of a multiplier for the multiplication of complex numbers. The complex numbers are packed into one 32-bit word. They are represented by two 13-bit parts with the same 6-bit exponent. Multiplication of complex numbers is examined from the perspectives of performance, complexity and silicon area. The design is unique and combines shared Booth encoding for the real and imaginary parts including only one combined modified Wallace tree of 4:2 adders for each part. The regular Wallace tree is compared with the tree of 4:2 adders. This design results in a more compact wiring structure and balanced delays resulting in a faster multiplier circuit. The number of adders used in the multiplier is also reduced. We consider VLSI CMOS technology and the relevant characteristics as they impact the implementation and performance.
TL;DR: In this paper, a design for logarithmic speed modular multiplication is given and compared quantitatively with the best implementations of existing, more standard algorithms; a 40-fold increase in speed is reported using a chip area at the limit of current technology.
Abstract: A design for logarithmic speed modular multiplication is given and a quantitative comparison is made with the best implementations of existing, more standard algorithms. A 40-fold increase in speed is reported using a chip area at the limit of current technology.