TL;DR: This article describes a detailed model of how "mindless" low-level processes might lead to intelligent strategy choices in one common situation: that in which people need to choose between stating a retrieved answer and using a backup strategy.
Abstract: Many intelligent strategy choices may be accomplished through relatively low-level cognitive processes. This article describes a detailed model of how such "mindless" processes might lead to intelligent choices of strategies in one common situation: that in which people need to choose between stating a retrieved answer and using a backup strategy. Several experiments testing the model's applicability to children's single-digit multiplication are reported. These include tests of predictions about when different strategies are used and how early experience shapes later performance. Then, the sufficiency of the model to generate both performance at any one time and changes in performance over time is tested through the medium of a running computer simulation of children's multiplication. The simulation acquires a considerable amount of multiplication knowledge, and its learning and performance parallel those of children in a number of ways. Finally, several implications of the model for understanding cognitive self-regulation and cognitive development are discussed.
TL;DR: In this article, the authors present multiplication modules and theorems of Mori and Mott for algebraic multiplication modules and derive theorem 1.1.2.
Abstract: (1988). Multiplication modules and theorems of Mori and Mott. Communications in Algebra: Vol. 16, No. 4, pp. 781-796.
TL;DR: This paper found that children in both grade levels apply rules to solve problems that involved multiplication by 1 or 0; however, not all students reported using these rules, and those students who did report the use of these rules were not consistent in the application of the rules.
Abstract: Strategies used by third and fourth graders to perform simple multiplication problems were examined. Children participated in an interview and then completed a timed production task in which they solved single-digit multiplication problems. Analyses of children's verbal reports and of solution latency data were found to support the view that the acquisition of mental multiplication begins with the use of counting strategies. By the fourth grade, however, there was a marked transition toward the use of a retrieval strategy. Children in both grade levels were found to apply rules to solve problems that involved multiplication by 1 or 0; however, not all students reported using these rules, and those students who did report the use of these rules were not consistent in the application of the rules. Comparisons between groups' and individual subjects' performance revealed that some important individual differences were obscured when the group served as the unit of analysis. Although several discrepancies were...
TL;DR: In this paper, Cramer's rule and Cayley-Hamilton theorem are provided in the so-called max algebra, which consists of the set of reals provided with two operations: maximization and addition.
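As a small illustration of the max algebra mentioned above (maximization playing the role of addition, and ordinary addition playing the role of multiplication), a matrix product can be sketched as follows. The function name `maxplus_matmul` and the sample matrices are assumptions made for this sketch, not taken from the paper.

```python
def maxplus_matmul(a, b):
    """Matrix product in the max algebra: 'sum' is max, 'product' is +."""
    inner = len(b)
    cols = len(b[0])
    return [
        [max(row[k] + b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


# Hypothetical sample matrices, chosen only for illustration.
A = [[1, 3], [0, 2]]
B = [[2, 0], [1, 4]]
C = maxplus_matmul(A, B)  # entry (i, j) is max over k of A[i][k] + B[k][j]
```

Each entry of `C` is the best (largest) sum over the intermediate index, which is why this algebra appears in scheduling and longest-path problems.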
TL;DR: In this paper, a CORDIC subsystem for multiplication of two complex digital numbers B and C, where one number is the sum of real and imaginary data portions, expressed in rectangular form (say Cr or CI), and the other number can be expressed in the rectangular form or can be represented by magnitude data expressed in polar form, is presented.
Abstract: A CORDIC (COordinate Rotation DIgital Computer) subsystem for multiplication of two complex digital numbers B and C, where one number is the sum of real and imaginary data portions, expressed in rectangular form (say Cr or CI), and the other number can be expressed in the rectangular form or can be represented by magnitude data, expressed in polar form (say, |B|, φ). An N-stage CORDIC portion of either recursive or pipeline sequential form, but devoid of multipliers, is used to rotate the I and Q terms of the first number through a phase angle φ of the polar-form multiplier number of the equivalent, taken from the rectangular form. The final computed data are the real and imaginary parts of the product.
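The rotation idea can be sketched in software. The code below is a minimal floating-point sketch, assuming a rotation-mode CORDIC with a final gain correction and a separate scaling by the magnitude |B|; the function names, the stage count, and the way the magnitude is applied are assumptions for illustration, not details from the patent. In hardware the factors `2.0**-i` would be bit shifts.

```python
import math


def cordic_rotate(x, y, angle, stages=32):
    """Rotate (x, y) by `angle` radians using shift-and-add micro-rotations."""
    # Fold the angle into [-pi/2, pi/2], where the iteration converges.
    angle = math.remainder(angle, 2 * math.pi)
    if angle > math.pi / 2:
        x, y, angle = -x, -y, angle - math.pi
    elif angle < -math.pi / 2:
        x, y, angle = -x, -y, angle + math.pi
    gain = 1.0
    for i in range(stages):
        d = 1.0 if angle >= 0 else -1.0
        # Micro-rotation by +/- atan(2^-i); only shifts and adds in hardware.
        x, y = x - d * y * 2.0**-i, y + d * x * 2.0**-i
        angle -= d * math.atan(2.0**-i)
        gain *= math.sqrt(1.0 + 4.0**-i)
    # The accumulated gain is a constant for a fixed number of stages.
    return x / gain, y / gain


def complex_multiply_polar(c_re, c_im, b_mag, b_phase):
    """Multiply C (rectangular) by B (polar): rotate by the phase, scale by |B|."""
    x, y = cordic_rotate(c_re, c_im, b_phase)
    return b_mag * x, b_mag * y
```

For example, `complex_multiply_polar(3.0, 4.0, 2.0, 2.5)` should agree closely with multiplying `3+4j` by the complex number of magnitude 2 and phase 2.5.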
TL;DR: This work outlines the design of a C* compiler for a hypercube multicomputer and aims to minimize the amount of time spent synchronizing, limit the number of interprocessor communications, and make each physical processor's emulation of a set of virtual processors as efficient as possible.
Abstract: A data parallel language such as C* has a number of advantages over conventional hypercube programming languages. The algorithm design process is simpler, because (1) message passing is invisible, (2) race conditions are nonexistent, and (3) the data can be put into a one-to-one correspondence with the virtual processors. Since data are mapped to virtual processors, rather than physical processors, it is easier to move algorithms implemented on one size hypercube to a larger or smaller system. We outline the design of a C* compiler for a hypercube multicomputer. Our design goals are to minimize the amount of time spent synchronizing, limit the number of interprocessor communications, and make each physical processor's emulation of a set of virtual processors as efficient as possible. We have hand translated three benchmark programs and compared their performance with that of ordinary C programs. All three programs (matrix multiplication, LU decomposition, and hyperquicksort) achieve reasonable speedup on a commercial hypercube, even when solving problems of modest size. On a 64-processor NCUBE/7, the C* matrix multiplication program achieves a speedup of 27 when multiplying two 64 × 64 matrices, the hyperquicksort program achieves a speedup of 10 when sorting 16,384 integers, and LU decomposition attains a speedup of 7 when decomposing a 256 × 256 system of linear equations. We believe the degradation in machine performance resulting from the use of a data parallel language will be more than compensated for by the increase in programmer productivity.
TL;DR: In this article, two parallel branches are used to perform multiplication and accumulation operations for the even and odd lines of the transform coefficient matrix. Each branch includes an input circuit (SEM1, SOT1) whereby the contributions of the opposing index columns of the matrix may be added; a multiplication circuit (ERM, ORM) which performs multiplication for each matrix column by means of an addition and shifting operation for each coefficient; and an accumulation circuit for the intermediate products of each column.
Abstract: The circuit consists of two parallel branches which perform multiplication and accumulation operations for the even and odd lines of the transform coefficient matrix. Each branch includes: an input circuit (SEM1, SOT1) whereby the contributions of the opposing index columns of the matrix may be added; a multiplication circuit (ERM, ORM) which performs multiplication for each matrix column by means of an addition and shifting operation for each matrix coefficient; and an accumulation circuit for the intermediate products of each matrix column (Figure 1).
TL;DR: A systolic architecture, called systolic arrays with tags input (SATIN), is proposed, where not only data but also tags are pumped into the SATIN.
Abstract: A systolic architecture, called systolic arrays with tags input (SATIN), is proposed. Not only data but also tags are pumped into the SATIN. The SATIN can efficiently solve the stored-I/O and the time-variant functions problems often existing in conventional systolic arrays. For systematically designing a SATIN, the dependence-graph approach has to be extended to incorporate the tag-assignment procedure. Two design examples, matrix-matrix multiplication and matrix transpose, are used to illustrate this design method.
TL;DR: It is shown that they oscillate with period equal to 2, which is an extension of the convergence property concerning max-min transitive fuzzy matrices.
TL;DR: In this paper, the multiplicand is segmented into a series of 8-bit slices and the multiplier is modified-Booth recoded into 3-bit groups, and corresponding partial product terms are reduced in a regular array of small carry-save adder cells.
Abstract: In a high-speed binary multiplier circuit, the multiplicand is segmented into a series of 8-bit slices and the multiplier is modified-Booth recoded into 3-bit groups. The corresponding partial product terms are reduced in a regular array of small carry-save adder cells. Iterative use of the CSA array provides the Wallace tree function in one-seventh the chip area or number of adders of a conventional implementation. The multiplier is pipelined internally, driven by a fast, two-phase internal clock that is transparent to the user. The internal clock stops and restarts upon loading new operand and instruction data to synchronize the internal clock to the system clock. Other aspects of the invention include high-speed absolute value subtract circuitry for exponent calculations and normalizing floating point results.
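The modified-Booth recoding step can be illustrated in isolation. The sketch below assumes the standard radix-4 scheme, in which overlapping 3-bit groups of a two's-complement multiplier map to digits in {-2, -1, 0, 1, 2}; the function names and the 8-bit default width are assumptions, and the multiplicand slicing, carry-save array, and pipelining described in the abstract are not modeled.

```python
# Standard radix-4 Booth table: 3-bit group (b[i+1], b[i], b[i-1]) -> digit.
BOOTH_TABLE = {
    0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
    0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0,
}


def booth_radix4_digits(multiplier, bits=8):
    """Recode a `bits`-wide two's-complement multiplier into radix-4 digits."""
    if bits % 2:
        raise ValueError("bits must be even for radix-4 recoding")
    pattern = multiplier & ((1 << bits) - 1)  # two's-complement bit pattern
    pattern <<= 1                             # implicit 0 below the LSB
    return [BOOTH_TABLE[(pattern >> i) & 0b111] for i in range(0, bits, 2)]


def booth_multiply(multiplicand, multiplier, bits=8):
    """Sum the partial products selected by the recoded digits."""
    digits = booth_radix4_digits(multiplier, bits)
    return sum(d * multiplicand * 4**k for k, d in enumerate(digits))
```

Recoding halves the number of partial products (four digits for an 8-bit multiplier), and each digit needs only a shift and an optional negation of the multiplicand.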
TL;DR: Four new arrays for signed number multiplication and multiplication/addition are presented, and it is shown that these units achieve the same speed advantages as other similar units which use redundant representations for the results, with a cost equivalent to their counterparts based on full 2's complement representation.
Abstract: This paper presents four new arrays for signed number multiplication and multiplication/addition. In these structures, it is assumed that the factors are expressed in 2's complement while the addend and the result are expressed in redundant notation. Two arrays operate in a serial-parallel way, since one of the factors is input in parallel, while the second factor and the addend (in the case of multiplication/addition) are entered digit by digit starting from the most significant. The other two arrays are fully serial because all the input numbers are processed digit by digit, starting with the most significant one. In all the arrays presented the results are produced in a serial manner from left to right. The arithmetic units introduced in this paper can be used as basic blocks of special purpose processors performing functions such as non-recursive digital filtering, signal correlation and matrix multiplication. It is shown that our units achieve the same speed advantages as other similar units which use redundant representations for the results, with a cost equivalent to their counterparts based on full 2's complement representation.
TL;DR: In this article, the authors describe algorithms and optical processor architectures for implementing a two-dimensional truth-table look-up processor using discrete orthogonal transforms such as one of the Walsh transforms, the Rademacher-Walsh transform, the Walsh-Kaczmarz transform, or the Haar transform.
Abstract: Algorithms and optical processor architectures for implementing a two-dimensional truth-table look-up processor are disclosed. An optical holographic medium stores the spectral expansion coefficients that map two-dimensional digital inputs of a binary truth-table into two-dimensional outputs of that binary truth table. Several algorithms are described using discrete orthogonal transforms such as one of the Walsh transforms (the Walsh-Hadamard transform, the Rademacher-Walsh transform, the Walsh-Kaczmarz transform) or the Haar transform. These transforms are used to find the corresponding spectral vectors and the corresponding boolean basis vector. The inner product multiplication of the spectral vector with the boolean basis vector yields the digital outputs of the binary truth-table. Another algorithm uses the Reed-Muller expansion which is a non-orthogonal transform. Various architectures of digital optic two-dimensional truth-table look-up processors are also disclosed. They include a coded phase correlator, a matrix multiplication optical processor, a bipolar coded phase optical correlator and a bipolar matrix multiplication optical processor.
TL;DR: In this paper, it was shown that a bounded analytic function $f$ on the unit disk is in the little Bloch space if and only if the uniformly closed algebra on the disk generated by $H^\infty$ and $\bar{f}$ does not contain the complex conjugate of any interpolating Blaschke product.
Abstract: We prove that a bounded analytic function $f$ on the unit disk is in the little Bloch space if and only if the uniformly closed algebra on the disk generated by $H^\infty$ and $\bar{f}$ does not contain the complex conjugate of any interpolating Blaschke product. A version of this result is then used to prove that if $f$ and $g$ are bounded analytic functions on the unit disk such that the commutator $T_f T_g^* - T_g^* T_f$ (here $T_f$ denotes the operator of multiplication by $f$ on the Bergman space of the disk) is compact, then $(1 - |z|^2) \min\{|f'(z)|, |g'(z)|\} \to 0$ as $|z| \uparrow 1$.
TL;DR: A well-known algorithm for complex multiplication which requires three real multiplications and five real additions is observed not to require commutativity, and the resulting extension of its applicability to complex matrices is examined.
Abstract: A well-known algorithm for complex multiplication which requires three real multiplications and five real additions is observed not to require commutativity. The resulting extension of its applicability to complex matrices is examined. The computational savings are shown to approach 1/4, even if a real multiplication is not more computationally costly than a real addition. The computational cost function used is based on the number of equivalent real additions, with every real multiplication counted as equivalent to r real additions.
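One common form of the three-multiplication scheme can be written so that every product keeps its left factor from the first operand and its right factor from the second, which is why commutativity is not needed. The sketch below is an illustration under that assumption; the helper names and the pure-Python list-of-lists matrices are choices made here, not taken from the paper.

```python
def matmul(x, y):
    """Plain real matrix product on lists of lists."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def matadd(x, y, sign=1):
    """Elementwise x + sign * y."""
    return [[p + sign * q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]


def complex_matmul_3m(a, b, c, d):
    """(A + iB)(C + iD) with three real matrix products and five additions."""
    t1 = matmul(matadd(a, b), c)      # (A + B) C = AC + BC
    t2 = matmul(a, matadd(d, c, -1))  # A (D - C) = AD - AC
    t3 = matmul(b, matadd(c, d))      # B (C + D) = BC + BD
    real = matadd(t1, t3, -1)         # AC - BD
    imag = matadd(t1, t2)             # AD + BC
    return real, imag
```

Because matrix products dominate the cost for large matrices, replacing four real products with three is consistent with the savings approaching 1/4 stated in the abstract.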
TL;DR: Possibilities and limitations are considered of realizing integrated analog filters with cutoff frequencies below 10 Hz, and it is recognized that, for a given supply voltage, electronic multiplication reduces the dynamic range because of noise and DC-offset multiplication.
Abstract: Possibilities and limitations are considered of realizing integrated analog filters with cutoff frequencies below 10 Hz. To arrive at an acceptably low chip area, electronic multiplication is required to enhance the time constant(s). It is recognized that, for a given supply voltage, electronic multiplication reduces the dynamic range because of noise and DC-offset multiplication. As an example, a 10 Hz low-pass filter has been successfully integrated on 0.4 sq. mm of chip area in a BiCMOS process.
TL;DR: A programmable VLSI architecture with regular, modular, expansible features is designed for computing AB mod N, AB+C mod N, and polynomial evaluation modulo N; the circuit can be expanded to improve the security of cryptosystems without making any change to its control circuit.
Abstract: A programmable VLSI architecture with regular, modular, expansible features is designed for computing AB mod N, AB+C mod N, and polynomial evaluation modulo N. The size of the resultant circuit can be easily expanded to improve the security of cryptosystems without making any change to its control circuit. The computing procedures for all N throughout the range of 0 >
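The three operations named above are related: polynomial evaluation modulo N reduces to repeated AB+C mod N steps via Horner's rule. The following is a generic bit-serial shift-and-add sketch, assuming non-negative operands; it is not the paper's architecture, and the function names are hypothetical.

```python
def mod_mul_add(a, b, c, n):
    """Compute (a*b + c) mod n by scanning b from its most significant bit."""
    acc = 0
    for bit in bin(b)[2:]:
        acc = (acc << 1) % n      # double the running sum
        if bit == "1":
            acc = (acc + a) % n   # conditionally add the multiplicand
    return (acc + c) % n


def poly_eval_mod(coeffs, x, n):
    """Horner evaluation of coeffs[0]*x^k + ... + coeffs[k] modulo n."""
    acc = 0
    for coef in coeffs:
        acc = mod_mul_add(acc, x, coef, n)  # one AB+C mod N step per coefficient
    return acc
```

Reducing after every doubling and addition keeps intermediate values small (below 2n before each reduction when `a` is already reduced), which is the property that makes this style of computation attractive for fixed-width hardware.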
TL;DR: A simple but very-high-performance systolic architecture, the Superprocessor for Matrix Problems (S-MP), that satisfies constraints is presented, and implementation alternatives for the linear systolic array for matrix-vector multiplication, which forms the core of S-MP, are described.
Abstract: Limitations of current systolic designs are pointed out, and constraints are imposed to make systolic solutions practical. Matrix multiplication is used as an illustration, and a simple but very-high-performance systolic architecture, the Superprocessor for Matrix Problems (S-MP), that satisfies these constraints is presented. Implementation alternatives for the linear systolic array for matrix-vector multiplication, which forms the core of S-MP, are described.
TL;DR: Error detection can be accomplished by applying arithmetic codes to the multiplier hardware in different ways, and low-cost residue codes are applied to three different error detection architectures for both serial-parallel and fully bit-serial processing elements.
Abstract: Special-purpose architectures have been proposed to provide high processing rates for signal processing applications. These architectures use highly concurrent structures on VLSI circuits to achieve billions of multiply/add operations per second. Both serial-parallel and fully bit-serial multiplier elements have been proposed for highly concurrent signal processing arrays. Error detection can be accomplished by applying arithmetic codes to the multiplier hardware in different ways. Here, low-cost residue codes are applied to three different error detection architectures for both serial-parallel and fully bit-serial processing elements. The error performance of these different implementations is studied through computer simulation. The cost of using these codes in terms of silicon area and circuit complexity is also investigated.
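The principle behind residue checking of a multiply/add can be sketched briefly: the residue of the result is predicted from the residues of the operands and compared with the residue of what the data path actually produced. The sketch assumes a check base of 3 (a base of the form 2^k - 1, as low-cost residue codes use), non-negative operands, and a hypothetical `fault_mask` parameter for injecting bit flips; it does not model any of the three architectures studied in the paper.

```python
CHECK_BASE = 3  # assumed check base of the form 2**k - 1 (here k = 2)


def checked_multiply_add(a, b, c, fault_mask=0):
    """Return (result, error_detected) for a*b + c with an optional injected fault."""
    # Main data path; XOR with fault_mask simulates flipped result bits.
    result = (a * b + c) ^ fault_mask
    # Independent, much narrower check path working only on residues.
    predicted = ((a % CHECK_BASE) * (b % CHECK_BASE) + c % CHECK_BASE) % CHECK_BASE
    return result, result % CHECK_BASE != predicted
```

A single flipped bit changes the result by plus or minus a power of two, which is never a multiple of 3, so every single-bit error in the result is flagged by this check.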
TL;DR: The authors proved that $F_{20} = 2^{2^{20}} + 1$, which had been the smallest Fermat number of unknown character, is composite. The computation, written entirely in Cray Fortran and calling Cray library functions for the FFTs, would be impossible to do even on supercomputers without fast Fourier transform techniques for integer multiplication.
Abstract: The twentieth Fermat number, $F_{20} = 2^{2^{20}} + 1$, has been proven composite by machine computation. The Fermat numbers are the numbers $F_n = 2^{2^n} + 1$, originally conjectured by Fermat to be prime for all n. In fact, only for n equal to 0 through 4 are they known to be prime, and small factors of $F_9$, $F_{11}$, $F_{12}$, $F_{15}$, $F_{16}$ have been known for some time. As part of a long-term test of the hardware reliability of the Cray-2 supercomputer at the Supercomputing Research Center, the authors proved that $F_{20} = 2^{2^{20}} + 1$, which had been the smallest Fermat number of unknown character, is composite. The test for compositeness was the standard technique of Pepin [5]: For n > 1, $F_n$ is prime if and only if $3^{(F_n - 1)/2} \equiv -1 \pmod{F_n}$. This test for compositeness does not, of course, produce factors of the number, but was the test used for proving the compositeness of $F_7$, $F_8$, $F_{10}$, $F_{13}$, $F_{14}$ [2], [3], [4], [6], [7]. The result of the computation on the Cray-2 has been verified by performing the same computation on a Cray X-MP belonging to Cray Research. The total computation time on the Cray-2 was about 10 CPU days; the time on the Cray X-MP was 82 hours. Both programs ran as single-processor programs on any available CPU of the respective machines; the ability of either computer to run in parallel on multiple CPUs was not used. The time needed to test $F_n$, for n in the range 10 through 20, is just slightly more than four times the time needed to test $F_{n-1}$: The number of multiplications doubles in incrementing n, and the time required for each multiplication doubles, being dependent almost entirely on the length of the operands. Our programs would thus determine the character of $F_{22}$, which is now the smallest Fermat number of unknown character, in a little more than 16 times the time needed for our computation on $F_{20}$. The table below summarizes what is now known about the Fermat numbers for n less than or equal to 22. A status list for larger n appears in [1].
This computation, roughly one million squarings modulo a one million bit number, would be impossible to do even on supercomputers without fast Fourier transform techniques for integer multiplication. Since one reason for performing this computation was to verify hardware reliability and not to minimize the execution time, the program was written entirely in Cray Fortran and called Cray library functions for the FFTs. The program itself was very simple and only about 200 lines long, much of which was used for checkpointing and restarting the program. The program was called into execution every time the Cray-2 was restarted, and so... Received April 6, 1987; revised June 5, 1987. 1980 Mathematics Subject Classification (1985 Revision). Primary 11Y11, 11A51, 11-04. ©1988 American Mathematical Society.
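Pepin's test as stated in the abstract can be tried on small Fermat numbers. The sketch below is an illustration only: it relies on Python's built-in big-integer multiplication rather than the FFT-based multiplication the authors used, and the function names are assumptions. Since $(F_n - 1)/2 = 2^{2^n - 1}$, the exponentiation is just $2^n - 1$ successive squarings, which for n = 20 matches the "roughly one million squarings" in the text.

```python
def fermat(n):
    """The n-th Fermat number 2^(2^n) + 1."""
    return 2 ** (2 ** n) + 1


def pepin_is_prime(n):
    """Pepin's test: F_n is prime iff 3^((F_n - 1)/2) = -1 (mod F_n), for n >= 1."""
    f = fermat(n)
    r = 3
    # (F_n - 1)/2 is 2^(2^n - 1), so square 2^n - 1 times.
    for _ in range(2 ** n - 1):
        r = r * r % f
    return r == f - 1


# Only feasible for small n with this naive loop.
small_results = {n: pepin_is_prime(n) for n in range(1, 9)}
```

Each increment of n doubles both the number of squarings and the operand length, which is the source of the "slightly more than four times" growth in running time mentioned in the abstract.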
TL;DR: An error correction circuit comprises a plurality of Galois field operation units coupled in cascade through a bus but operated in parallel; each of the units includes Galois field multiplication and addition circuits, thereby generating and decoding a BCH code.
Abstract: An error correction circuit comprises a plurality of Galois field operation units coupled in cascade through a bus but operated in parallel. Each of the units includes a Galois field multiplication circuit, a Galois field addition circuit and a plurality of registers, thereby generating and decoding a BCH code.
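The two arithmetic primitives such a unit provides can be sketched in a few lines. The example assumes the field GF(2^4) defined by the polynomial x^4 + x + 1; the field size, polynomial, and function names are assumptions for illustration, since the abstract does not specify them.

```python
M = 4                     # assumed field GF(2^4)
FIELD_POLY = 0b10011      # assumed defining polynomial x^4 + x + 1


def gf_add(a, b):
    """Addition in GF(2^m) is bitwise XOR of the coefficient vectors."""
    return a ^ b


def gf_mul(a, b):
    """Shift-and-add multiplication with reduction by FIELD_POLY."""
    result = 0
    for _ in range(M):
        if b & 1:
            result ^= a           # add the current multiple of a
        b >>= 1
        a <<= 1                   # multiply a by x
        if a & (1 << M):
            a ^= FIELD_POLY       # reduce when the degree reaches M
    return result
```

BCH encoding and syndrome computation are built from repeated use of these two operations, for example evaluating a received polynomial at powers of a primitive element.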
TL;DR: In this paper, a primitive data processing operation is performed on the first and second stochastic signals to produce an output signal that may be converted back into nonstochastic form.
Abstract: A stochastic data processing technique in which a conversion signal is generated representing a pseudorandom sequence of numbers. The conversion signal is then used to convert input signals representing a pair of input operands into respective first and second stochastic signals. The same conversion signal is used to encode both operands. A primitive data processing operation is then performed on the first and second stochastic signals to produce an output signal that may be converted back into nonstochastic form. The primitive data processing operations include MAX, MIN, absolute difference, arithmetic mean and multiplication. For multiplication, one of the stochastic signals is subjected to a predetermined delay.
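The effect of sharing one conversion sequence, and of delaying one stream for multiplication, can be simulated. The sketch below assumes operands in [0, 1], a comparator-style conversion (a bit is 1 when the pseudorandom sample is below the operand), and AND, OR, and XOR gates as the primitive operations; these choices, the function name, and the use of Python's `random` module in place of a hardware pseudorandom generator are all assumptions for illustration.

```python
import random


def stochastic_ops(a, b, length=20000, delay=1, seed=0):
    """Estimate MIN, MAX, |a - b| and a*b for operands a, b in [0, 1]."""
    rng = random.Random(seed)
    # One shared pseudorandom conversion sequence encodes both operands.
    conv = [rng.random() for _ in range(length + delay)]
    sa = [r < a for r in conv[delay:]]
    sb = [r < b for r in conv[delay:]]
    # The same stream as sb but `delay` samples late, so it no longer
    # lines up with sa and the two streams are nearly uncorrelated.
    sb_delayed = [r < b for r in conv[:length]]

    def mean(bits):
        return sum(bits) / length

    return {
        "min": mean(x and y for x, y in zip(sa, sb)),        # AND, correlated
        "max": mean(x or y for x, y in zip(sa, sb)),         # OR, correlated
        "abs_diff": mean(x != y for x, y in zip(sa, sb)),    # XOR, correlated
        "product": mean(x and y for x, y in zip(sa, sb_delayed)),  # AND, decorrelated
    }
```

With fully correlated streams an AND gate yields the smaller operand, while the delayed stream makes the same gate approximate the product; the estimates are statistical and tighten as `length` grows.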
TL;DR: In this paper, necessary and sufficient conditions for a multiplication module to be distributive were proved. But they did not consider the case of a single multiplication module with two multiplication modules.
TL;DR: In this article, a division apparatus for executing division operation by means of a convergence algorithm is described, the apparatus including a ROM having a table of inverse values stored therein which respectively correspond to all possible values of a first approximation for a divisor.
Abstract: A division apparatus for executing division operation by means of a convergence algorithm, the apparatus including a ROM having a table of inverse values stored therein which respectively correspond to all possible values of a first approximation for a divisor. Division is executed by reading out a corresponding inverse value from the ROM, and performing a small number of successive multiplication operations on the dividend and this inverse value, in conjunction with simple addition or subtraction operations which are determined by the specific form of the algorithm which is used. A considerable increase in division speed is attainable, with a simple system configuration.
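A table-seeded convergence division can be sketched as follows. The abstract does not say which convergence algorithm is used, so this example assumes a Goldschmidt-style iteration, a divisor already normalized to [1, 2), and an 8-bit table index; the table size, names, and iteration count are all assumptions made for the sketch.

```python
TABLE_BITS = 8  # assumed width of the divisor's first approximation

# Stand-in for the ROM: entry i approximates 1/d for d in [1 + i/256, 1 + (i+1)/256).
ROM = [1.0 / (1.0 + (i + 0.5) / 2**TABLE_BITS) for i in range(2**TABLE_BITS)]


def divide(dividend, divisor, iterations=2):
    """Goldschmidt-style division for a divisor normalized to [1, 2)."""
    if not 1.0 <= divisor < 2.0:
        raise ValueError("divisor must be normalized to [1, 2)")
    seed = ROM[int((divisor - 1.0) * 2**TABLE_BITS)]  # table lookup
    q, d = dividend * seed, divisor * seed            # d is now close to 1
    for _ in range(iterations):
        f = 2.0 - d                                   # simple subtraction
        q, d = q * f, d * f                           # d -> 1, q -> quotient
    return q
```

The table gives roughly 9 bits of accuracy, and each iteration about doubles the number of correct bits, which is why only a small number of multiplications is needed.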
TL;DR: Study group XII was requested to assess the performance of digital circuit multiplication equipment (DCME), and the authors discuss DCME terminology, applications, testing alternatives, and issues that need to be addressed.
Abstract: The CCITT expert group on speech quality, formed in part to derive methodologies for evaluating new speech technologies, is emphasizing the assessment of digital circuit multiplication and packetized voice systems. Study group XII was requested to assess the performance of digital circuit multiplication equipment (DCME). The authors discuss DCME terminology, applications, testing alternatives, and issues that need to be addressed.
TL;DR: Methods are presented for finding reductions between the computations of certain arithmetic functions that preserve asymptotic Boolean complexities (circuit depth or size), and it is shown that, with respect to depth and size simultaneously, multiplication is reducible to any nonlinear and division to any nonpolynomial algebraic function.
Abstract: Methods are presented for finding reductions between the computations of certain arithmetic functions that preserve asymptotic Boolean complexities (circuit depth or size). They can be used to show, for example, that all nonlinear algebraic functions are as difficult as integer multiplication with respect to circuit size. As a consequence, any lower or upper bound (e.g., O(n log n log log n)) for one of them applies to the whole class. It is also shown that, with respect to depth and size simultaneously, multiplication is reducible to any nonlinear and division to any nonpolynomial algebraic function.
TL;DR: A systolic algorithm is presented which allows the parallel execution of iterative methods for solving systems of linear equations on a processor array based on a repeated matrix-vector multiplication.
Abstract: A systolic algorithm is presented which allows the parallel execution of iterative methods for solving systems of linear equations on a processor array. These methods are based on a repeated matrix-vector multiplication. In order to achieve an efficient realization on VLSI circuits, special regard is given to the sparse property of the system matrix which is found in many applications. The arising transportation problem is solved by a two-dimensional systolic sorting procedure which determines the array structure and the time complexity of one matrix-vector multiplication. Therefore, the solution of a linear system with n equations requires n times the time complexity of the sorting algorithm and an area complexity of O(e) where e denotes the number of the nonzero elements in the system matrix.
TL;DR: In this paper, a direct current level correction circuit, a sync tip detection circuit, an integrator, a subtracter, a low order bit integration circuit, and a second coefficient multiplication circuit are used to remove line flicker.
Abstract: PURPOSE: To freely set clamp precision at the time of applying PWM, and to remove line flicker by providing a direct current level correction circuit, a sync tip detection circuit, a subtracter, a first coefficient multiplication circuit, a low order bit integration circuit, a second coefficient multiplication circuit, a pulse width modulation circuit, and an integrator. CONSTITUTION: The direct current level of an input analog video signal is corrected by the output of the integrator 508 by the correction circuit 501, and the input analog video signal is A/D-converted 502. The sync tip level of the converted output is detected by the detector 503, and a reference level is subtracted 504 from it. The output of the subtracter is multiplied by a prescribed coefficient by the coefficient multiplication circuit 505, and the portion of the signal of the prescribed number of bits from the least significant bit is integrated by the low order bit integration circuit 101 before this portion is rejected by the coefficient multiplication circuit 505, and a carrier signal is added to a high order bit, and the output of the integration circuit is multiplied by the prescribed coefficient by the coefficient multiplication circuit 506. As a result, the clamp precision can be set freely and correctly, independently of clock frequency, at the time of the PWM. If the PWM input is fixed to '0' at the time of using a steady state, the direct current level of the output of the correction circuit never varies, and the flicker is never caused.