TL;DR: It is shown that the sign/logarithm approach provides improved arithmetic quantization error performance for a given word size over FFTs implemented with conventional fixed- or floating-point arithmetic, and that its implementation is faster and less complex than conventional approaches.
Abstract: Sign/logarithm arithmetic is applicable to a variety of numerical applications where wide dynamic range and small word size are required. In this paper the basic sign/logarithm arithmetic operations required for signal processing (i.e., addition, subtraction, and multiplication) are reviewed, the computational errors are analyzed for FFT realization, and simulation results are presented which serve to verify the analysis. It is shown that the sign/logarithm approach provides improved arithmetic quantization error performance for a given word size over FFTs implemented with conventional fixed- or floating-point arithmetic, and that the sign/logarithm implementation is faster and less complex than conventional approaches.
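The following is a minimal sketch of the sign/logarithm operations the abstract reviews. It makes several assumptions: logarithms are base 2 and held as ordinary Python floats rather than quantized fixed-point words, zero gets no special encoding, and all helper names are hypothetical. Multiplication reduces to adding logarithms and XOR-ing sign bits. Addition and subtraction need the functions log2(1 + 2^-d) and log2(1 - 2^-d) of the logarithm difference d.

```python
import math

def to_lns(x):
    """Encode a nonzero real number as (sign bit, log2 of its magnitude)."""
    if x == 0:
        raise ValueError("zero needs a special encoding in a sign/log system")
    return (1 if x < 0 else 0, math.log2(abs(x)))

def from_lns(v):
    """Decode (sign bit, log2 magnitude) back to a float."""
    sign, log_mag = v
    mag = 2.0 ** log_mag
    return -mag if sign else mag

def lns_mul(a, b):
    """Multiplication: XOR the sign bits and add the logarithms."""
    return (a[0] ^ b[0], a[1] + b[1])

def lns_add(a, b):
    """Addition/subtraction via log2(1 + 2**-d) or log2(1 - 2**-d),
    where d >= 0 is the difference of the logarithms."""
    if a[1] < b[1]:          # order operands so that |a| >= |b|
        a, b = b, a
    d = a[1] - b[1]
    if a[0] == b[0]:         # same signs: magnitudes add
        return (a[0], a[1] + math.log2(1.0 + 2.0 ** -d))
    if d == 0:
        raise ValueError("exact cancellation: the result is zero")
    # opposite signs: magnitudes subtract, and the larger operand's sign wins
    return (a[0], a[1] + math.log2(1.0 - 2.0 ** -d))
```

In hardware these two functions are usually read from tables indexed by d. Rounding the logarithm to a fixed word size is where quantization error enters; this sketch does not model that step.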
TL;DR: The bilinear complexity of multiplying two arbitrary elements from an nth degree extension Φ of a finite field F, and the related problem of multiplying, over F, two polynomials of degree n − 1 with indeterminate coefficients is studied.
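Here is a small example of what "bilinear complexity" counts. Multiplying two degree-1 polynomials (n = 2) directly costs four coefficient multiplications. Karatsuba's well-known bilinear algorithm needs only three, and it works over any field. The sketch below illustrates this over a prime field F_p; it is not the paper's construction. Reducing the product modulo a fixed irreducible quadratic, to obtain an element of the extension F_{p^2}, is a linear map, so that step adds no multiplications.

```python
def schoolbook_deg1(a, b, p):
    """(a0 + a1*x)(b0 + b1*x) over F_p with 4 coefficient multiplications."""
    a0, a1 = a
    b0, b1 = b
    return [(a0 * b0) % p, (a0 * b1 + a1 * b0) % p, (a1 * b1) % p]

def karatsuba_deg1(a, b, p):
    """Same product with Karatsuba's 3 bilinear multiplications."""
    a0, a1 = a
    b0, b1 = b
    m0 = a0 * b0                  # multiplication 1
    m2 = a1 * b1                  # multiplication 2
    m1 = (a0 + a1) * (b0 + b1)    # multiplication 3
    # middle coefficient a0*b1 + a1*b0 recovered with additions only
    return [m0 % p, (m1 - m0 - m2) % p, m2 % p]
```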
TL;DR: Every finite dimensional division algebra of minimal rank is a finite simple field extension and the structure of the variety of optimal algorithms for the computation of the multiplication in such fields modulo the isotropy group of the problem is investigated.
Abstract: The purpose of this paper is to show that every finite dimensional division algebra of minimal rank, i.e., of minimal complexity with respect to the noncommutative model of computation, is a finite simple field extension. Moreover, we investigate the structure of the variety of optimal algorithms for the computation of the multiplication in such fields modulo the isotropy group of the problem.
TL;DR: Some corrections are made to the original paper "A fast computational algorithm for the discrete cosine transform" [1], which contains some errors in indices and in multiplication factors.
Abstract: Some corrections are made to the original paper "A fast computational algorithm for the discrete cosine transform" [1], which contains some errors in indices and in multiplication factors.
TL;DR: Avalanche multiplication of signal charge in surface-channel charge-coupled devices is reported in this article; it occurs when the electrons are made to fall down a steep barrier of more than 8 V in an overlapping-gate structure.
Abstract: Avalanche multiplication of signal charge in surface-channel charge-coupled devices is reported in this paper. Experimental observations show that avalanche multiplication takes place when the electrons are made to fall down a steep barrier of more than 8 V in an overlapping-gate structure. For a 16-V fall, the gain in charge is about 3 percent per transfer. A simple model is developed which explains the experimental data reasonably well. The upper limit to the amplitude of clock voltages that can be applied to a CCD is likely to be determined by this avalanche multiplication mechanism rather than by the oxide breakdown criterion.
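As a purely illustrative assumption, suppose every transfer across such a barrier adds the same fractional gain g, for example the 3 percent quoted for a 16-V fall. The gain then compounds multiplicatively. The hypothetical helpers below compute the total gain after a number of transfers, and how many transfers it takes to reach a target gain.

```python
import math

def cumulative_gain(g, transfers):
    """Total charge gain (1 + g)**transfers, assuming a constant per-transfer gain g."""
    return (1.0 + g) ** transfers

def transfers_to_reach(target, g):
    """Smallest number of transfers whose compounded gain reaches `target`."""
    return math.ceil(math.log(target) / math.log1p(g))
```

With g = 0.03, `transfers_to_reach(2.0, 0.03)` returns 24, so under this assumption a charge packet would roughly double after about two dozen such transfers.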
TL;DR: The network, with its area O(N) and operation time O(√N), matches, within a constant factor, the known theoretical Ω(N²) lower bound to the area × (time)² measure of complexity in the VLSI model of computation.
Abstract: This paper describes a VLSI network for the multiplication of two N-bit integers, for very large N. The network, with its area O(N) and operation time O(√N), matches, within a constant factor, the known theoretical Ω(N²) lower bound to the area × (time)² measure of complexity in the VLSI model of computation. The network, which is based on the discrete Fourier transform, has an extremely regular mesh structure, and thus all wires have approximately the same length.
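The network itself is a hardware mesh, but the arithmetic it relies on can be sketched in software. The digits of the two integers are transformed with a DFT and multiplied pointwise. They are then transformed back, which gives a cyclic convolution zero-padded so that it does not wrap, and finally carries are propagated.

This pure-Python sketch uses a floating-point radix-2 FFT and base-10 digits. Both choices are assumptions made for readability, not details of the paper, and rounding error limits the sketch to moderately sized inputs. The complexity claim itself is simple arithmetic: with area A = O(N) and time T = O(√N), AT² = O(N)·O(N) = O(N²), which meets the Ω(N²) lower bound.

```python
import cmath

def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two.
    With invert=True it computes the unscaled inverse transform."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)  # twiddle factor
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out

def dft_multiply(x, y, base=10):
    """Multiply non-negative integers via DFT-based convolution of their digits."""
    if x == 0 or y == 0:
        return 0

    def digits(v):
        ds = []
        while v:
            v, r = divmod(v, base)
            ds.append(r)
        return ds  # least significant digit first

    dx, dy = digits(x), digits(y)
    n = 1
    while n < len(dx) + len(dy):  # pad so the cyclic convolution does not wrap
        n *= 2
    fx = fft([complex(d) for d in dx] + [0j] * (n - len(dx)))
    fy = fft([complex(d) for d in dy] + [0j] * (n - len(dy)))
    conv = fft([p * q for p, q in zip(fx, fy)], invert=True)
    coeffs = [round(c.real / n) for c in conv]  # apply the 1/n scale and round

    # Carry propagation turns convolution coefficients into ordinary digits.
    result, carry, weight = 0, 0, 1
    for c in coeffs:
        total = c + carry
        result += (total % base) * weight
        carry = total // base
        weight *= base
    return result + carry * weight
```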
TL;DR: In this paper, a digital signal processing system applies numbers to be multiplied or divided to address a memory storing a logarithm table to produce the logarithms of the numbers.
Abstract: A digital signal processing system applies numbers to be multiplied or divided to address a memory storing a logarithm table to produce the logarithms of the numbers. These logarithms are added for multiplication or subtracted for division to produce the logarithm of the result number, which is applied to address a memory storing an antilogarithm table to produce the result number. The base of the logarithms is selected in accordance with the magnitudes of the numbers to be multiplied or divided so as to utilize substantially the entire range of magnitudes of the digital representation of the logarithms.
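The table-driven scheme can be sketched with two dictionaries standing in for the logarithm and antilogarithm memories. The operand range (8-bit unsigned, 1 to 255) and the 12-bit word for the logarithm sum are hypothetical choices. Picking the scale factor is equivalent to picking the logarithm base b = 2^(1/SCALE). Here it is set so that the largest logarithm is 2047, so the largest sum of two logarithms (4094) still fits in the 12-bit word. This mirrors the abstract's idea of using the full range of the log representation.

```python
import math

# Hypothetical parameters: 8-bit unsigned operands and a 12-bit log-sum word.
MAX_INPUT = 255
WORD_BITS = 12

# SCALE plays the role of the chosen base b = 2**(1 / SCALE).
SCALE = (2 ** (WORD_BITS - 1) - 1) / math.log2(MAX_INPUT)

# Logarithm "ROM": operand -> rounded fixed-point logarithm.
LOG_TABLE = {x: round(math.log2(x) * SCALE) for x in range(1, MAX_INPUT + 1)}
MAX_LOG = LOG_TABLE[MAX_INPUT]

# Antilogarithm "ROM": covers sums (multiplication) and differences (division).
ANTILOG_TABLE = {s: 2.0 ** (s / SCALE) for s in range(-MAX_LOG, 2 * MAX_LOG + 1)}

def table_multiply(a, b):
    """Approximate a * b: two log look-ups, one addition, one antilog look-up."""
    return ANTILOG_TABLE[LOG_TABLE[a] + LOG_TABLE[b]]

def table_divide(a, b):
    """Approximate a / b the same way, subtracting the logarithms."""
    return ANTILOG_TABLE[LOG_TABLE[a] - LOG_TABLE[b]]
```

Each rounded logarithm is off by at most half a unit, so the relative error of a result here stays below about 0.3 percent. A wider log word would shrink it.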
TL;DR: In this paper, an improved technique for determining the multiplication of highly subcritical systems, using the moments of the counting distribution from a neutron detector, is presented; however, the technique is limited to the case of a single subcritical system.
TL;DR: A survey is made of different algorithms and architectures for optical matrix algebraic processors and they are compared with respect to their hardware requirements and computational efficiencies.
TL;DR: In this paper, an optical computing apparatus and method are described for high-speed multiplication of numerical arrays, in which the arrays to be multiplied are arranged according to a systolic processing or engagement processing format and the element multiplication is performed by analog convolution.
Abstract: An optical computing apparatus and method for high-speed multiplication of numerical arrays, wherein the arrays to be multiplied are arranged according to a systolic processing or engagement processing format, and wherein the element multiplication is performed by analog convolution. In a preferred embodiment of the invention, the multiplication is implemented with first and second spatial light modulator devices which provide the selected processing format in one spatial dimension and binary multiplication by analog convolution in a second spatial dimension.
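Numerically, binary multiplication by analog convolution works as follows. Convolving the bit strings of the two operands gives a "mixed binary" result. Its k-th digit can exceed 1 but still carries weight 2^k, so a weighted sum (or carry resolution) recovers the ordinary product. The sketch replaces the optical convolution with a discrete one; the operand width and the function names are hypothetical.

```python
def bits_lsb_first(x, width):
    """Unsigned bits of x, least significant first."""
    return [(x >> i) & 1 for i in range(width)]

def convolve(u, v):
    """Discrete linear convolution (stand-in for the analog optical convolution)."""
    out = [0] * (len(u) + len(v) - 1)
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            out[i + j] += ui * vj
    return out

def dmac_multiply(x, y, width=8):
    """Return (mixed-binary digits, product) for unsigned width-bit x and y.
    Each mixed digit counts coincident 1-bits, so it is at most `width`."""
    mixed = convolve(bits_lsb_first(x, width), bits_lsb_first(y, width))
    # Converting mixed binary to an ordinary number: digit k has weight 2**k.
    return mixed, sum(d << k for k, d in enumerate(mixed))
```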
TL;DR: A variant of the so-called “binary” algorithm for finding the GCD (greatest common divisor) of two numbers which requires no comparisons is investigated and it is shown that when implemented with carry-save hardware, it can be used to find the modulo B inverse of an n-bit binary integer in a time proportional to n.
Abstract: We investigate a variant of the so-called “binary” algorithm for finding the GCD (greatest common divisor) of two numbers which requires no comparisons. We show that when implemented with carry-save hardware, it can be used to find the modulo B inverse of an n-bit binary integer in a time proportional to n, using only registers of length proportional to n. Such a hardware implementation of this algorithm, set up for finding inverses with respect to a 336-bit modulus B, would have applications in the currently expanding field of secure data transmission and storage. In such an implementation, multiplication in linear time (both modulo B and ordinary) would come along as a by-product, because multiplication can be achieved by a sequence of nine inversions, some additions and negations.
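For orientation, the sketch below is the classic binary (shift-and-subtract) modular inversion algorithm, from the same family the paper starts from. Unlike the paper's variant, it still compares u and v at each step. It models neither carry-save hardware nor the 336-bit register width. It assumes an odd modulus B, which lets halving modulo B be done by adding B to an odd value before shifting.

```python
def binary_mod_inverse(a, m):
    """Inverse of a modulo an odd m > 1 by binary shift-and-subtract steps.
    Raises ValueError when a is not invertible modulo m."""
    if m <= 1 or m % 2 == 0:
        raise ValueError("this sketch requires an odd modulus m > 1")
    u, v = a % m, m
    if u == 0:
        raise ValueError("a is not invertible modulo m")
    x1, x2 = 1, 0  # invariants: x1*a ≡ u and x2*a ≡ v (mod m)
    while u != 1 and v != 1:
        while u % 2 == 0:
            u //= 2
            # halve x1 modulo odd m: make it even first by adding m if needed
            x1 = x1 // 2 if x1 % 2 == 0 else (x1 + m) // 2
        while v % 2 == 0:
            v //= 2
            x2 = x2 // 2 if x2 % 2 == 0 else (x2 + m) // 2
        if u >= v:  # the comparison the paper's variant avoids
            u -= v
            x1 -= x2
        else:
            v -= u
            x2 -= x1
        if u == 0 and v != 1:  # u == v > 1 means gcd(a, m) > 1
            raise ValueError("a is not invertible modulo m")
    return (x1 if u == 1 else x2) % m
```

For example, `binary_mod_inverse(3, 2**336 - 3)` inverts 3 modulo an odd 336-bit number. The result can be checked against Python's built-in `pow(3, -1, 2**336 - 3)` (Python 3.8+).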
TL;DR: In this article, a digital multiplication circuit for a microprocessor utilizes a modified Booth algorithm for implementing the digital multiplication of two numbers and includes a Booth recoder for recoding the multiplier into a selected number, n, of Booth operation sets where n is a positive integer that equals one-half the number of bits in the multiplier.
Abstract: A digital multiplication circuit for a microprocessor utilizes a modified Booth algorithm for implementing the digital multiplication of two numbers and includes a Booth recoder for recoding the multiplier into a selected number, n, of Booth operation sets, where n is a positive integer that equals one-half the number of bits in the multiplier. Each operation set is applied to a second plurality of n partial product selectors which are connected in a cascade arrangement according to multiplicand sets, and each partial product selector multiplicand set implements one of the recoded Booth operation sets. The outputs of the partial product selectors are summed by a summation means, and a domino circuit means provides an evaluation pulse for each member of the partial product selector at the completion of the Booth operation set that is connected to the partial product selector.
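The arithmetic behind the recoder fits in a few lines. Radix-4 (modified) Booth recoding turns a two's-complement multiplier of even width into n = width/2 digits in {-2, -1, 0, 1, 2}, one per overlapping 3-bit group. Only n partial products then need summing, each being 0, ±x or ±2x, suitably shifted. The sketch does not model the patented circuit's partial product selectors, summation network or domino timing, and the names are hypothetical.

```python
def booth_radix4_digits(y, bits):
    """Recode a bits-wide two's-complement multiplier (bits even) into
    bits // 2 Booth digits in {-2, -1, 0, 1, 2}, least significant first."""
    if bits % 2:
        raise ValueError("bits must be even")
    y &= (1 << bits) - 1          # two's-complement bit pattern
    digits, prev = [], 0          # prev is the overlapping bit y[2i-1]; y[-1] = 0
    for i in range(0, bits, 2):
        b0 = (y >> i) & 1
        b1 = (y >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits

def booth_multiply(x, y, bits=16):
    """Sum one partial product per Booth digit: digit * x shifted by 2k bits."""
    return sum(d * x * 4 ** k for k, d in enumerate(booth_radix4_digits(y, bits)))
```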
TL;DR: VLSI computing arrays for matrix multiplication and covariance matrix inversion have applications in many fields and a properly chosen configuration can significantly reduce the computing time of the multiplication array.
Abstract: VLSI computing arrays for matrix multiplication and covariance matrix inversion have applications in many fields. Under the constraint of limited I/O bandwidth of the host system or the computing array, three configurations for the interfacing and controlling of a multiplication array to achieve optimal performance under different adverse situations are examined. The three configurations are multiplexing loading, processor row loading, and processor column group loading. A properly chosen configuration can significantly reduce the computing time of the multiplication array.
TL;DR: Multiple-shoot formation and elongation were induced from stem explants of Sapium seedlings on media containing cytokinins, and rooting of isolated shoots by treatment with an auxin mixture and transfer of the plantlets to the field have been successful.
Abstract: Multiple-shoot formation and elongation were induced from stem explants of Sapium seedlings on media containing cytokinins.
TL;DR: In this paper, the mapping of matrix × matrix multiplication onto both word-level and bit-level systolic arrays is investigated, a detailed description of the circuit which emerges is given, and some details relating to its practical implementation are discussed.
Abstract: Westglen Engineering Ltd., 4 Mercia Way, Bell's Close Industrial Estate, Newcastle upon Tyne NE15 6UF, England
Abstract
The mapping of matrix × matrix multiplication onto both word-level and bit-level systolic arrays has been investigated. It has been found that well-defined word-level and bit-level data flow constraints must be satisfied within such circuits. An efficient and highly regular bit-level array has been generated by exploiting the basic compatibilities in data flow symmetries at each level of the problem. A detailed description of the circuit which emerges is given, and some details relating to its practical implementation are discussed.
Introduction
Considerable progress has been achieved in recent years in the development of algorithms and architectures which exploit the potential computational power of VLSI. In particular, the systolic array approach [1] is gaining increasing popularity and has now been applied to a wide range of problems. Most of the effort has been concentrated at the word or system level, and this has produced computational structures for a number of problems in linear algebra. These include arrays for the solution of linear equations [2], least squares problems [3] and singular value decomposition [4]. The work which we have undertaken has been concentrated mainly at the other end of the spectrum, and we have recognised that the systolic array approach applied at the bit level provides one with an extremely powerful approach to VLSI chip design. In particular, we have shown that many important signal and data processing functions can be implemented using highly repetitive patterns of simple bit-level cells having little or no long-range connectivity [5-7], and some of these ideas have since been implemented as integrated circuits [8,9]. The basis of our approach is that we treat problems from the outset at the bit level, and usually no subset of cells within the resulting arrays can be associated with a specific multiplication or addition at word level. Given the attraction of the bit-level approach (for example, the ability to completely tile a plane of silicon with simple cells), the question arises as to how one can best map a given word-level problem onto a bit-level array. In this paper we investigate this matter within the context of matrix × matrix multiplication and demonstrate for this example how maintaining similar data flow geometries at different levels of the problem produces an efficient and highly regular bit-level array.
The organisation of this paper is as follows. In section 2 we break matrix × matrix multiplication down from sums of word-level products to sums of bit-level products and then discuss the data flow constraints at each level. Details of the structure which emerges are then described in section 3, where we also discuss the envisaged implementation of the circuit. The important conclusions which can be drawn from the work are given in section 4.
Analysis of problem at word and bit level
The multiplication of two $n \times n$ matrices $A = (a_{ik})$ and $B = (b_{kj})$ to form a matrix product $C = (c_{ij})$ is defined by
$$c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}$$
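Write each unsigned b-bit entry in binary, $a_{ik} = \sum_p 2^p a_{ik}^{(p)}$ and $b_{kj} = \sum_q 2^q b_{kj}^{(q)}$. The word-level sum then becomes a sum of single-bit products, $c_{ij} = \sum_k \sum_p \sum_q 2^{p+q} a_{ik}^{(p)} b_{kj}^{(q)}$. The sketch below checks that decomposition numerically. It illustrates only the "sums of bit level products" step and says nothing about how the paper arranges those products on a systolic array. Unsigned entries and the `bits` parameter are assumptions.

```python
def matmul_word_level(A, B):
    """c_ij = sum_k a_ik * b_kj for square matrices given as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def matmul_bit_level(A, B, bits):
    """Same product for unsigned `bits`-bit entries, built only from
    single-bit products (logical AND) weighted by 2**(p + q)."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                for p in range(bits):
                    for q in range(bits):
                        a_bit = (A[i][k] >> p) & 1
                        b_bit = (B[k][j] >> q) & 1
                        C[i][j] += (a_bit & b_bit) << (p + q)
    return C
```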
TL;DR: From the range of the redundant significand and the absolute error of on-line operations, the MRRE (maximum relative representation error) is defined and analyzed for floating-point on- line addition and multiplication.
Abstract: The properties of a redundant number system in significand (mantissa) representation are studied, and the range of the redundant significand is derived. From the range of the redundant significand and the absolute error of on-line operations, the MRRE (maximum relative representation error) is defined and analyzed for floating-point on-line addition and multiplication.
TL;DR: A programmable single-chip 23K-gate digital signal processor (DSP), affording 100 ns 16b by 16b multiplication and flexible memory operation, will be described.
Abstract: A programmable single-chip 23K-gate digital signal processor (DSP), affording 100 ns 16b by 16b multiplication and flexible memory operation, will be described. The processor integrates 91,000 transistors and has been fabricated using double-aluminum-layer, 2.3 µm gate length Si-gate CMOS technology. A hierarchical design methodology based on a standard cell approach was applied for this logic VLSI development. In the communication field, a DSP LSI that can provide multiplication and addition capability is a key device for the implementation of more sophisticated and high-speed data processing systems. A block diagram of the DSP LSI is shown in Figure 1. It consists of a program counter, an instruction decoder, an address unit, an arithmetic and logic unit (ALU), a multiplier, an I/O controller and memories.
TL;DR: The hypothesis that underestimations of area and volume judgments are due primarily to errors in mental multiplication is supported.
Abstract: Area and volume judgments are usually underestimates of true size. The possibility that these underestimations are more dependent on cognitive factors than on sensory ones was investigated in two experiments. In Experiment 1, subjects judged the products of multiplication problems varying in number of multiplications and number of digits per number. Judgments were analyzed using power functions. The power function exponents for problems requiring one multiplication and two multiplications were very similar to previously obtained exponents for judgments of area and volume, respectively. In Experiment 2, subjects judged three-number multiplication problems and volume ratios of depicted, rectangular boxes. Power-function exponents for these two kinds of tasks had similar means and standard deviations and were significantly correlated. These results support the hypothesis that underestimations of area and volume judgments are due primarily to errors in mental multiplication.
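The abstract does not say how the power functions were fitted. A common approach, assumed here, is a least-squares line on log-log axes: J = a·S^b becomes log J = log a + b·log S. An exponent b below 1 means that large products are increasingly underestimated. The data in the test are synthetic, not taken from the experiments, and the function name is hypothetical.

```python
import math

def fit_power_function(stimuli, judgments):
    """Least-squares fit of J = a * S**b in log-log coordinates; returns (a, b)."""
    xs = [math.log(s) for s in stimuli]
    ys = [math.log(j) for j in judgments]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # slope of the regression line is the power-function exponent b
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = math.exp(my - b * mx)
    return a, b
```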
TL;DR: High-speed hardware function generation using table look-up in ROM and high-speed multiplication is considered; a nonlinear programming problem is solved to determine the optimal partitioning of the interval [a, b], and an estimate of the number of ROM units required is given.
Abstract: High-speed hardware function generation using table look-up in ROM and high-speed multiplication is considered. The reduced interval of interest, [a, b], is split into several large partitions. Within each large partition the function f(x) is evaluated by piecewise polynomials of the same low degree whose coefficients are stored in ROM. Four basic architectures for such a scheme are considered. A nonlinear programming problem is solved for determining the optimal partitioning of the interval [a, b]. The objective function is the average number of multiplications, which takes into account the probability distribution r(x) = 1/(x ln β) for the mantissas of normalized floating-point numbers, where β is the radix of the number system. The constraint is the available number of ROM words. The particular case of f(x) = 1/x and β = 2 is considered in detail, and results are presented including an estimate of the number of ROM units required.
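Here is a much-simplified sketch of the scheme for f(x) = 1/x with β = 2, where normalized mantissas lie in [1, 2). The leading k fraction bits select one of 2^k equal segments, and a stored straight line for that segment is evaluated with one multiplication. The uniform partition, the degree-1 pieces and the chord coefficients are all assumptions for illustration. The paper instead optimizes the partition under a ROM-size constraint and weights it by the density r(x) = 1/(x ln β), under which small mantissas are more likely.

```python
def build_reciprocal_table(k):
    """Split [1, 2) into 2**k equal segments and store, per segment,
    (left endpoint, 1/left endpoint, slope of the chord of 1/x)."""
    width = 1.0 / 2 ** k
    table = []
    for s in range(2 ** k):
        x0 = 1.0 + s * width
        x1 = x0 + width
        slope = (1.0 / x1 - 1.0 / x0) / width
        table.append((x0, 1.0 / x0, slope))
    return table

def reciprocal(x, table, k):
    """Approximate 1/x for x in [1, 2): one table look-up, one multiplication."""
    s = int((x - 1.0) * 2 ** k)   # the leading k fraction bits select the segment
    x0, c0, c1 = table[s]
    return c0 + c1 * (x - x0)
```

A chord of 1/x over a segment of width h has error at most h²/4 on [1, 2), since |f''| ≤ 2 there. So k = 6 keeps the error below about 6.1 × 10⁻⁵.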
TL;DR: In this paper, a meteorological monitoring station was operated at Blytheville, Arkansas from April 1978 to April 1980, and direct normal, global, and diffuse sky radiation were monitored.
Abstract: A meteorological monitoring station was operated at Blytheville, Arkansas, from April 1978 to April 1980. Direct normal, global, and diffuse sky radiation were monitored. From these data, models have been developed for the prediction of solar radiation, and discussions of several diffuse solar radiation models are included herein. Comparisons are made with these current diffuse models, and the correlation is quite good. In addition, instantaneous shadow band correction factors are presented which will allow a more accurate correction to be applied to the measured diffuse sky reading. The instantaneous correction factors are keyed to the global radiation measurement. Instead of applying a fixed correction factor to the diffuse measurement regardless of sky condition, a variable factor can be applied. This will correct some of the errors currently observed in diffuse measurements, because the current factors overpredict the diffuse radiation on cloudy days and underpredict it on clear days.
TL;DR: An instructional toy device for multiplication computation formed by a string of cuboid blocks having an interconnecting elastic string permitting equal segments of multiple blocks to be folded in a back and forth arrangement of aligned rows is described in this article.
Abstract: An instructional toy device for multiplication computation formed by a string of cuboid blocks having an interconnecting elastic string permitting equal segments of multiple blocks to be folded in a back and forth arrangement of aligned rows, the blocks each being individually marked with a number progressing in an ordinary arithmetic series of increasing units from an end block marked 1 to an end block marked with the number of the total blocks in the string, a multiplication computation being represented by a selected number equalling the number of blocks in a segment and a multiple equalling the number of rows, the product of which is the numerical marking of the last block in the segment of the last row.
TL;DR: It is shown that floating-point (or integer) multiplication can be reduced to the evaluation of a very large class of functions, including most of the nontrivial functions used in practice.
Abstract: It is shown that floating-point (or integer) multiplication can be reduced to the evaluation of a very large class of functions, including most of the nontrivial functions used in practice. This means that whenever any such function can be evaluated by Boolean circuits of size S(n), multiplication can also be done with circuits of size O(S(n)).
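A classical and much narrower instance of such a reduction (not the paper's general construction) is the quarter-squares identity ab = ((a + b)² - (a - b)²)/4. Given any circuit or table for squaring, multiplication costs two squarings plus a few additions and a shift. The sketch passes the squaring routine in as a parameter to make the reduction explicit. The table of squares is a hypothetical example of such a routine.

```python
def multiply_via_squaring(a, b, square):
    """Multiply integers using only a squaring routine, additions and a shift:
    (a + b)**2 - (a - b)**2 == 4*a*b exactly, so shifting right by 2 is exact."""
    return (square(a + b) - square(a - b)) >> 2

# Hypothetical squaring routine: table look-up valid for |n| < 512.
SQUARES = [n * n for n in range(512)]

def table_square(n):
    return SQUARES[abs(n)]
```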
TL;DR: The arithmetic complexity of computing the p-th Kronecker power of an n × n matrix is studied to obtain an algorithm that achieves the optimal rate of one multiplication per output at the expense of increasing the number of additions.
TL;DR: In this article, the leading zeros prefixing the highest-order significant digit in both a multiplier and a multiplicand are identified, counted and removed, and a number of zeros equal to the number originally stripped from the multiplier and multiplicand is prefixed to the resultant partial product.
Abstract: Processor apparatus is described for performing binary and decimal arithmetic operations. In performing decimal multiplication with the processor apparatus, to reduce the amount of processing to be done with the apparatus and thereby speed up the performance of the decimal multiplication, the leading zeroes prefixing the highest order significant digit in both a multiplier and a multiplicand are identified, counted and removed. Decimal multiplication is then performed using the stripped multiplier and multiplicand, and to the resultant partial product a number of zeroes are prefixed equal to the number of zeroes originally stripped from the multiplier and multiplicand. The result is the product of the original multiplier and multiplicand.
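The zero-stripping bookkeeping can be sketched on fixed-width decimal digit strings. Here the stripped operands are multiplied with ordinary integer arithmetic rather than the processor's digit-by-digit decimal multiply, and the function name is hypothetical. Leading zeros carry no value, so after the short product is padded to the combined width of the stripped operands, prefixing the stripped count restores the full product width (the sum of the two operand widths) without changing the result.

```python
def strip_multiply(multiplier, multiplicand):
    """Multiply fixed-width decimal digit strings (e.g. '000123') by stripping
    leading zeros, multiplying the shorter operands, then re-prefixing zeros."""
    a = multiplier.lstrip("0") or "0"
    b = multiplicand.lstrip("0") or "0"
    stripped = (len(multiplier) - len(a)) + (len(multiplicand) - len(b))
    # product of the stripped operands, padded to their combined width
    product = str(int(a) * int(b)).rjust(len(a) + len(b), "0")
    return "0" * stripped + product
```

For example, `strip_multiply("000123", "0045")` multiplies only 123 by 45 and returns the 10-digit string "0000005535".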
TL;DR: In this article, in floating-point multiplication the sum of the exponents of the two operands is determined using a single adder, and a carry signal of "1" is applied whenever addition is carried out.
Abstract: In floating-point multiplication, the sum of the exponents of the two operands is determined by the use of a single adder. The exponents are modified either before they are inputted to the adder or at the output of the adder. A carry signal of "1" is applied whenever addition is carried out. A signal indicative of occurrence of underflow or overflow is also obtained.
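The abstract gives neither the exponent bias nor how the exponents are modified. One plausible reading assumes an IEEE-754-style bias of 2^(k-1) - 1 for k-bit biased exponents. Under that assumption, e1 + e2 - bias = (e1 + e2 + 1) - 2^(k-1): one addition with carry-in 1, followed by a fixed correction, yields the product's biased exponent. The sketch below also flags overflow and underflow. It ignores the extra increment needed when the significand product is renormalized, and all names are hypothetical.

```python
def product_exponent(e1, e2, exp_bits=8):
    """Biased exponent of a product from biased exponents e1 and e2.
    Assumes an IEEE-754-style bias of 2**(exp_bits - 1) - 1 (127 for 8 bits)."""
    s = e1 + e2 + 1                          # single adder pass with carry-in = 1
    result = s - (1 << (exp_bits - 1))       # fixed correction: equals e1 + e2 - bias
    max_normal = (1 << exp_bits) - 2         # 254: largest normal biased exponent
    if result > max_normal:
        return result, "overflow"
    if result < 1:
        return result, "underflow"
    return result, "ok"
```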
TL;DR: In this article, the authors discuss the problem of summing a set of positive numbers in increasing order of magnitude, where x denotes the largest floating-point number that can be represented in the computer.
Abstract: This chapter discusses the evaluation of functions. As a first example of the analysis of a nontrivial numerical algorithm, the chapter discusses the process of summing a set of numbers. Even this elementary computation is seen to have unsuspected complications. An informal language for describing algorithms is also discussed. An algorithm is discussed that adds together a set of positive numbers in increasing order of magnitude. In this algorithm, xxx denotes the largest floating-point number that can be represented in the computer. Many times the purpose for computing is to see whether it is zero. One of the most common numerical computations is the evaluation of polynomials. The chapter describes two algorithms for doing this and shows how a careful analysis of accuracy and efficiency can be used to select the best algorithm. The most complicated kind of function that can be evaluated using only addition, subtraction, and multiplication is a polynomial function. A special kind of polynomial that arises very frequently in applications is a truncated power series.
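Two standard techniques related to the chapter's topics can be sketched briefly. Summing positive floating-point numbers in increasing order lets the small terms accumulate before they meet the large ones. Horner's rule evaluates a degree-n polynomial with n multiplications and n additions. Whether Horner's rule is one of the chapter's two algorithms is not stated in the abstract. The overflow threshold the chapter denotes is not modeled, and the function names are hypothetical.

```python
def sum_increasing(values):
    """Add positive floats smallest first, to reduce accumulated rounding error."""
    total = 0.0
    for v in sorted(values):
        total += v
    return total

def horner(coeffs, x):
    """Evaluate coeffs[0] + coeffs[1]*x + ... + coeffs[n]*x**n
    with n multiplications and n additions."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result
```

Take vals = [1.0] + [1e-16] * 10. A plain left-to-right loop returns exactly 1.0 in IEEE double precision, because each 1e-16 is below half a unit in the last place of 1.0. `sum_increasing(vals)` returns a value slightly above 1.0.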
TL;DR: The authors introduce a VLSI-compatible architecture called the concurrent data-loading array processor (CDLAP) that can execute the multiplication for dense matrices without data reshaping, and the utilization of processing elements is virtually the best achievable for large matrices.
Abstract: The authors introduce a VLSI-compatible architecture called the concurrent data-loading array processor (CDLAP). Many matrix operations on systolic arrays have the matrix data reshaping problem. The CDLAP can execute the multiplication for dense matrices without data reshaping. A partitioned multiplication algorithm is also presented for matrices larger than the array size. Based on the design in this paper, the utilization of processing elements is virtually the best achievable for large matrices. The CDLAP, with small variations, can be used for band matrix multiplications. The performance, taking into account the total computation time and data transfer bandwidth, is found to be better than that of systolic arrays. 11 references.
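The general idea of partitioning a product larger than the array can be sketched as tiled matrix multiplication, where each tile product stands in for one pass through a fixed block × block processor array. This is a generic blocking scheme under that assumption, not the CDLAP's partitioned algorithm or data-loading order, and the names are hypothetical.

```python
def blocked_matmul(A, B, block):
    """C = A B computed tile by tile, each tile product standing in for one
    pass through a hypothetical block x block processor array."""
    n, m, p = len(A), len(B), len(B[0])
    if any(len(row) != m for row in A):
        raise ValueError("inner dimensions do not match")
    C = [[0] * p for _ in range(n)]
    for i0 in range(0, n, block):
        for j0 in range(0, p, block):
            for k0 in range(0, m, block):
                # one "array pass": an A tile times a B tile, accumulated into a C tile
                for i in range(i0, min(i0 + block, n)):
                    for j in range(j0, min(j0 + block, p)):
                        acc = C[i][j]
                        for k in range(k0, min(k0 + block, m)):
                            acc += A[i][k] * B[k][j]
                        C[i][j] = acc
    return C
```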
TL;DR: The LSI Products Division of TRW is currently developing a third chip for its growing family of 22-bit floating point arithmetic devices, which will be the registered arithmetic logic unit (RALU), built in TRW's dual-metal one-micron bipolar "Omicron-B" process.
Abstract: The LSI Products Division of TRW is currently developing a third chip for its growing family of 22-bit floating point arithmetic devices. Joining the adder and the multiplier later this year will be the registered arithmetic logic unit (RALU), built in TRW's dual-metal one-micron bipolar "Omicron-B" process. Operating at a guaranteed (military temperature and supply voltage ranges) speed of 6 MHz, this device will be able to store, retrieve, add, subtract, and normalize 22-bit floating point numbers, convert between 22-bit floating point and 16-bit fixed point formats, and add, subtract, and perform logical operations on 16-bit fixed point numbers. With its built-in shifters and controls, it can also perform a fixed point multiplication or division or a floating point division in 16 clock cycles. The architecture of the RALU is very similar to that of the widely used 2901 four-bit microprocessor slice. The bus widths have been widened from 4 to 22 bits and the instruction set has been expanded to encompass the eight standard 2901 functions (for fixed point) and eight additional floating point and fixed-float conversion operations. The 2901's internal dual port RAM has been retained and widened for a 22-bit word size.
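The abstract does not describe the RALU's multiply sequence. As a generic illustration, unsigned shift-and-add multiplication retires one multiplier bit per iteration, so 16-bit operands take 16 iterations. That matches the quoted 16 clock cycles in count only. Signed operands and the device's actual register usage are not modeled, and the function name is hypothetical.

```python
def shift_add_multiply(x, y, bits=16):
    """Unsigned bits x bits multiply, one multiplier bit per iteration.
    (acc, q) acts as a double-width product register shifted right each step."""
    mask = (1 << bits) - 1
    x &= mask
    acc = 0             # upper half of the product register
    q = y & mask        # lower half starts out holding the multiplier
    for _ in range(bits):
        if q & 1:
            acc += x    # add the multiplicand when the current multiplier bit is 1
        # shift the (acc, q) pair right by one bit
        q = (q >> 1) | ((acc & 1) << (bits - 1))
        acc >>= 1
    return (acc << bits) | q
```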
TL;DR: An electro-optical engagement-array architecture for performing matrix-matrix multiplication using twos complement arithmetic is presented; twos complement arithmetic offers a convenient means for handling bipolar numbers and a means for improvement in accuracy over conventional optical analog techniques.
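The following sketch shows why twos complement lets bipolar matrices be multiplied without splitting them into positive and negative parts. Every product formed is a nonnegative 0/1 bit product, the kind an intensity-based system can compute. The sign is carried entirely by the weight -2^(N-1) of each sign bit, applied when the bit products are combined. This is a numerical illustration of that idea as we read it, not a model of the RUBIC Cube Processor or its optics. The 8-bit width and the names are assumptions.

```python
def twos_complement_bits(x, width):
    """Bits of x in `width`-bit two's complement, least significant first."""
    return [(x >> i) & 1 for i in range(width)]

def bit_weight(i, width):
    """Weight of bit i: the sign bit carries -2**(width - 1)."""
    return -(1 << i) if i == width - 1 else (1 << i)

def matmul_twos_complement(A, B, width=8):
    """Signed matrix product built only from nonnegative 0/1 bit products."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                a_bits = twos_complement_bits(A[i][k], width)
                b_bits = twos_complement_bits(B[k][j], width)
                for s, a_s in enumerate(a_bits):
                    for t, b_t in enumerate(b_bits):
                        # unipolar bit product, signed only through the weights
                        C[i][j] += bit_weight(s, width) * bit_weight(t, width) * (a_s & b_t)
    return C
```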
Abstract: A digital optical architecture for performing matrix algebra
Richard P. Bocker, Keith Bromley, Stanley R. Clayton
Signal Processing Technology Branch, Naval Ocean Systems Center, 271 Catalina Boulevard, San Diego, California 92152
Abstract
An electro-optical engagement-array architecture for performing matrix-matrix multiplication using twos complement arithmetic is presented. Twos complement arithmetic offers a convenient means for handling bipolar numbers, avoids the need for matrix partitioning when the matrices are real, and offers a means for improvement in accuracy over conventional optical analog techniques.
Introduction
In this paper we describe a technique for improving the accuracy of matrix multiplications performed optically through the use of the twos complement fixed-point binary number representation. The twos complement representation also offers a convenient means for handling both positive and negative numbers (without the need for matrix partitioning) with the use of sign bits. The ideas will be illustrated using an electro-optical engagement-array architecture. However, these concepts should be easily extendable to other optical architectures such as acousto-optical systolic or outer-product architectures.
Background
In recent publications, acousto-optical systolic-array architectures for performing matrix-vector and matrix-matrix multiplication (refs. 1, 2) and acousto-optic outer-product architectures for performing matrix-matrix multiplication (ref. 3) have been described. Most recently, an electro-optical engagement-array architecture (ref. 4), referred to as the RUBIC Cube Processor, for performing matrix-matrix multiplication has also been described. One variation of the RUBIC Cube Processor is depicted in Figure 1. Essential components of this processing structure include a pulsed noncoherent light source, a 2D photodetector array, two 2D spatial light modulators operating in a reflective mode, and a single polarizing beam splitter. Collimating and imaging optics, as well as polarizers and waveplates, may be required but are not shown here. The exact electro-optical configuration required would be highly dependent on the actual spatial light modulators employed in the processor. A similar engagement-array architecture employing transmissive spatial light modulators (ref. 5) eliminates the need for the polarizing beam splitter. The architecture shown in Figure 1 allows for the multiplication of two 3 by 3 matrices. That is,