Top 195 papers published in the topic of Multiplication in 2000


Journal Article•

All Pairs Shortest Paths using Bridging Sets and Rectangular Matrix Multiplication

Uri Zwick

01 Jan 2000-Electronic Colloquium on Computational Complexity

TL;DR: This paper solves the APSP problem for weighted directed graphs with integer edge weights of small absolute value in O(n^(2+μ)) time, where μ satisfies the equation ω(1, μ, 1) = 1 + 2μ and ω(1, μ, 1) is the exponent of the multiplication of an n × n^μ matrix by an n^μ × n matrix.

Abstract: We present two new algorithms for solving the All Pairs Shortest Paths (APSP) problem for weighted directed graphs. Both algorithms use fast matrix multiplication algorithms. The first algorithm solves the APSP problem for weighted directed graphs in which the edge weights are integers of small absolute value in O(n^(2+μ)) time, where μ satisfies the equation ω(1, μ, 1) = 1 + 2μ and ω(1, μ, 1) is the exponent of the multiplication of an n × n^μ matrix by an n^μ × n matrix. Currently, the best available bounds on ω(1, μ, 1), obtained by Coppersmith, imply that μ < 0.575. … Here ϵ > 0 is an error parameter and W is the largest edge weight in the graph, after the edge weights are scaled so that the smallest non-zero edge weight in the graph is 1. The second algorithm returns estimates of all the distances in the graph with a stretch of at most 1 + ϵ. Corresponding paths can also be found efficiently.
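For orientation, the link between APSP and matrix multiplication can be sketched with the classical baseline: repeated squaring of the weight matrix under the min-plus (distance) product. This is not the algorithm of the paper, which replaces the cubic-time product below with fast rectangular matrix multiplication and bridging sets; the function names and the sample graph are illustrative assumptions.

```python
import math

INF = math.inf

def min_plus(a, b):
    """Distance product: c[i][j] = min over k of a[i][k] + b[k][j]."""
    n = len(a)
    return [[min(a[i][k] + b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def apsp(weights):
    """All-pairs distances by repeated min-plus squaring.

    Assumes the graph has no negative cycles; INF marks a missing edge.
    """
    n = len(weights)
    d = [[0 if i == j else weights[i][j] for j in range(n)] for i in range(n)]
    hops = 1  # d currently covers paths of at most `hops` edges
    while hops < n - 1:
        d = min_plus(d, d)
        hops *= 2
    return d

# Hypothetical 3-vertex graph with one negative edge.
w = [[0, 3, INF],
     [INF, 0, -2],
     [4, INF, 0]]
dist = apsp(w)
```

Each squaring doubles the number of edges a path may use, so about log2(n) distance products suffice.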


253 citations

Journal Article•

A scalable and unified multiplier architecture for finite fields GF(p) and GF(2^m)

Erkay Savas¹, Alexandre F. Tenca¹, Çetin Kaya Koç¹

Oregon State University¹

01 Jan 2000-Lecture Notes in Computer Science

TL;DR: A scalable and unified architecture for a Montgomery multiplication module that operates in both types of finite fields, GF(p) and GF(2^m), is described. The design exploits the concurrency in the Montgomery multiplication operation by employing a pipelining design methodology.

Abstract: We describe a scalable and unified architecture for a Montgomery multiplication module which operates in both types of finite fields GF(p) and GF(2^m). The unified architecture requires only slightly more area than that of the multiplier architecture for the field GF(p). The multiplier is scalable, which means that a fixed-area multiplication module can handle operands of any size, and also, the wordsize can be selected based on the area and performance requirements. We utilize the concurrency in the Montgomery multiplication operation by employing a pipelining design methodology. The upper limit on the precision of the scalable and unified Montgomery multiplier is dictated only by the available memory to store the operands and internal results, and the module is capable of performing infinite-precision Montgomery multiplication in both types of finite fields.
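As background, the operation such a module computes can be sketched in software for the GF(p) case. The following is a minimal single-word Montgomery multiplication in Python (3.8 or later, for the modular inverse via `pow`); it does not model the word-serial, pipelined, or GF(2^m) aspects of the architecture, and all function names are illustrative assumptions.

```python
def montgomery_setup(p, k):
    """Precompute constants for an odd modulus p < 2**k, with R = 2**k."""
    r = 1 << k
    p_inv_neg = (-pow(p, -1, r)) % r   # -p^(-1) mod R
    r2 = (r * r) % p                   # R^2 mod p, used to enter Montgomery form
    return r, p_inv_neg, r2

def mont_mul(a, b, p, k, p_inv_neg):
    """Return a*b*R^(-1) mod p for 0 <= a, b < p, without dividing by p."""
    mask = (1 << k) - 1
    t = a * b
    m = ((t & mask) * p_inv_neg) & mask   # chosen so that t + m*p is divisible by R
    u = (t + m * p) >> k                  # exact division by R is a shift
    return u - p if u >= p else u

def mod_mul(a, b, p, k):
    """Ordinary modular product computed through Montgomery form."""
    _, p_inv_neg, r2 = montgomery_setup(p, k)
    a_bar = mont_mul(a, r2, p, k, p_inv_neg)     # a*R mod p
    b_bar = mont_mul(b, r2, p, k, p_inv_neg)     # b*R mod p
    c_bar = mont_mul(a_bar, b_bar, p, k, p_inv_neg)
    return mont_mul(c_bar, 1, p, k, p_inv_neg)   # leave Montgomery form

product = mod_mul(123456, 654321, 1000003, 20)
```

The conversions into and out of Montgomery form only pay off when many multiplications share one modulus, as in modular exponentiation.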


154 citations

Book Chapter•10.1007/3-540-44499-8_2•

Implementation of Elliptic Curve Cryptographic Coprocessor over GF(2^m) on an FPGA

Souichi Okada¹, Naoya Torii¹, Kouichi Itoh¹, Masahiko Takenaka¹

Fujitsu¹

17 Aug 2000

TL;DR: To speed up elliptic scalar multiplication, this work developed a novel configuration of a multiplier over GF(2^m) that enables multiplication of any bit length by using a data conversion method.

Abstract: We describe the implementation of an elliptic curve cryptographic (ECC) coprocessor over GF(2^m) on an FPGA and also the result of simulations evaluating its LSI implementation. This coprocessor is suitable for server systems that require efficient ECC operations for various parameters. For speeding-up an elliptic scalar multiplication, we developed a novel configuration of a multiplier over GF(2^m), which enables the multiplication of any bit length by using our data conversion method. The FPGA implementation of the coprocessor with our multiplier, operating at 3 MHz, takes 80 ms for 163-bit elliptic scalar multiplication on a pseudo-random curve and takes 45 ms on a Koblitz curve. The 0.25 µm ASIC implementation of the coprocessor, operating at 66 MHz and having a hardware size of 165 Kgates, would take 1.1 ms for 163-bit elliptic scalar multiplication on a pseudo-random curve and would take 0.65 ms on a Koblitz curve.


105 citations

Book•

Teaching Number Sense


Julia Anghileri

1 Jan 2000

TL;DR: This book covers making sense of numbers, counting and coming to know numbers, systems of symbols, addition and subtraction, the empty number line, multiplication and division, written calculations, and teaching approaches.

Abstract: 1. Making sense of numbers 2. Counting and coming to know numbers 3. Towards a system of symbols. 4. Addition and subtraction 5. The empty number line 6. Multiplication and Division 7. Written calculations 8. Teaching approaches.


98 citations

Journal Article•

Implementation of Elliptic curve cryptographic coprocessor over GF(2^m) on an FPGA

Souichi Okada¹, Naoya Torii¹, Kouichi Itoh¹, Masahiko Takenaka¹

Fujitsu¹

01 Jan 2000-Lecture Notes in Computer Science

TL;DR: The authors describe the implementation of an elliptic curve cryptographic (ECC) coprocessor over GF(2^m) on an FPGA, together with the result of simulations evaluating its LSI implementation.

Abstract: We describe the implementation of an elliptic curve cryptographic (ECC) coprocessor over GF(2^m) on an FPGA and also the result of simulations evaluating its LSI implementation. This coprocessor is suitable for server systems that require efficient ECC operations for various parameters. For speeding-up an elliptic scalar multiplication, we developed a novel configuration of a multiplier over GF(2^m), which enables the multiplication of any bit length by using our data conversion method. The FPGA implementation of the coprocessor with our multiplier, operating at 3 MHz, takes 80 ms for 163-bit elliptic scalar multiplication on a pseudo-random curve and takes 45 ms on a Koblitz curve. The 0.25 µm ASIC implementation of the coprocessor, operating at 66 MHz and having a hardware size of 165 Kgates, would take 1.1 ms for 163-bit elliptic scalar multiplication on a pseudo-random curve and would take 0.65 ms on a Koblitz curve.


81 citations

Journal Article•10.3758/BF03213008•

Adults' strategy choices for simple addition: Effects of retrieval interference

Jamie I. D. Campbell¹, Jennifer C. Timm¹

University of Saskatchewan¹

01 Dec 2000-Psychonomic Bulletin & Review

TL;DR: It is demonstrated that adults’ use of procedural strategies for simple addition is substantially influenced by retrieval interference.


Abstract: Simple addition (e.g., 3 + 2, 7 + 9) may be performed by direct memory retrieval or by such procedures as counting or transformation. The distribution of associations (DOA) model of strategy choice (Siegler, 1988) predicts that procedure use should increase as retrieval interference increases. To test this, 100 undergraduates performed simple addition problems, either after blocks of simple multiplication (high-interference context) or after blocks of simple division problems (low-interference context). Addition took longer and was more error prone after multiplication; in particular, there were more multiplication confusion errors on the relatively easy, small-number addition problems (e.g., 3 + 2 = 6, 4 + 3 = 12), but not on the more difficult, large-number additions. Consistent with the DOA, participants reported greater use of procedures for addition after multiplication, but more so for small addition problems. The findings demonstrate that adults’ use of procedural strategies for simple addition is substantially influenced by retrieval interference.


77 citations

Proceedings Article•10.1109/ISCAS.2000.857166•

Stochastic pulse coded arithmetic

Sergio Toral¹, Jose M. Quero¹, Leopoldo G. Franquelo¹

University of Seville¹

28 May 2000

TL;DR: A division circuit and a square-root circuit are presented that extend traditional stochastic algebra and are able to process analog input signals with a simple and complete processing system.


Abstract: Among the different pulse codification techniques, stochastic pulse codification has its own arithmetic based on the similarity between Boolean algebra and statistical algebra. Summation and multiplication are the two basic arithmetic operations treated in depth in the literature. In this paper we present two digital stochastic circuits that extend traditional stochastic algebra: a division circuit and a square-root circuit, and the interfaces between the analog and stochastic domain. As a result, we are able to process analog input signals with a simple and complete processing system. These circuits can be implemented in low-cost and low-power digital programmable devices.
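The multiplication that the abstract treats as a basic stochastic operation can be sketched as follows: two values in [0, 1] are encoded as independent random bit streams, and a bitwise AND of the streams yields a stream whose density of ones approximates the product. This is a software illustration of the standard unipolar encoding only, not of the division or square-root circuits presented in the paper; stream length, seed, and function names are assumptions.

```python
import random

def encode(value, length, rng):
    """Unipolar stochastic stream: each bit is 1 with probability `value`."""
    return [1 if rng.random() < value else 0 for _ in range(length)]

def decode(stream):
    """Estimate the encoded value as the fraction of ones."""
    return sum(stream) / len(stream)

def stochastic_multiply(x, y, length=100_000, seed=0):
    """Approximate x*y with one AND gate per bit pair."""
    rng = random.Random(seed)
    a = encode(x, length, rng)
    b = encode(y, length, rng)   # drawn independently of a
    return decode([p & q for p, q in zip(a, b)])

estimate = stochastic_multiply(0.5, 0.4)   # close to 0.2, up to sampling noise
```

Accuracy improves only with the square root of the stream length, which is the usual trade of precision for very cheap hardware in this style of arithmetic.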


72 citations

Patent•

Pipelined linear array of processor elements for performing matrix computations

Alan Joel Greenberger¹

Agere Systems¹

21 Apr 2000

TL;DR: A pipelined linear array of processor elements (PEs) for performing matrix computations in an efficient manner is presented; each PE includes arithmetic circuitry for performing multiply, combine and accumulate operations and a register file for storing inputs and outputs of the arithmetic circuitry.

Abstract: A pipelined linear array of processor elements (PEs) for performing matrix computations in an efficient manner. The linear array generally includes a head PE and a set of regular PEs, the head PE being a functional superset of the regular PE, with interconnections between nearest neighbor PEs in the array and a feedback path from a non-neighbor regular PE back to the head PE. Each PE includes arithmetic circuitry for performing multiply, combine and accumulate operations, and a register file for storing inputs and outputs of the arithmetic circuitry. The head PE further includes a non-linear function generator. Each PE is pipelined such that the latency for an arithmetic operation to complete is a multiple of the period with which new operations can be initiated. A Very Large Instruction Word (VLIW) program or other type of program may be used to control the array. The array is particularly efficient at performing complex matrix operations, such as, e.g., the solution of a set of linear equations, matrix inversion, matrix-matrix multiplication, and computation of covariance and cross correlation.


69 citations

Journal Article•10.1109/12.863045•

Look-up table-based large finite field multiplication in memory constrained cryptosystems

M.A. Hasan¹

University of Waterloo¹

01 Jul 2000-IEEE Transactions on Computers

TL;DR: An algorithm for GF(2^n) multiplication is proposed that alleviates the drawbacks of earlier look-up table-based algorithms, which either require a large memory space or do not fully utilize the resources of the processor on which the software is executed.

Abstract: Many cryptographic systems use multiplication in the finite field GF(2^n) for their underlying computations. In the recent past, a number of look-up table-based algorithms have been proposed for the software implementation of GF(2^n) multiplication. Look-up table-based algorithms can provide speed advantages, but they either require a large memory space or do not fully utilize the resources of the processor on which the software is executed. In this work, an algorithm for GF(2^n) multiplication is proposed which can alleviate this problem. In each iteration of the proposed algorithm, a group of bits of one of the input operands is examined and two look-up tables are accessed. The group size determines the table sizes, but does not affect the utilization of the processor resources. It can be used for both software and hardware realizations and is particularly suitable for implementations in memory-constrained environments, such as smart cards and embedded cryptosystems.
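One way to picture a group-at-a-time scheme with two tables is the windowed sketch below. It is an assumption-laden illustration, not the algorithm of the paper: here one table holds the multiples of operand b by every w-bit polynomial, the other holds the reduction of every w-bit overflow, and field elements are Python integers whose bits are polynomial coefficients. The names `gf_mul_windowed`, `mult`, and `red` are hypothetical.

```python
def clmul(a, b):
    """Carry-less (polynomial over GF(2)) product of a and b."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def poly_mod(a, f):
    """Remainder of polynomial a modulo polynomial f over GF(2)."""
    n = f.bit_length() - 1
    while a.bit_length() > n:
        a ^= f << (a.bit_length() - 1 - n)
    return a

def gf_mul_windowed(a, b, f, w=4):
    """Multiply a and b in GF(2^n) defined by f, scanning a in w-bit groups."""
    n = f.bit_length() - 1
    low_mask = (1 << n) - 1
    mult = [poly_mod(clmul(u, b), f) for u in range(1 << w)]   # u * b mod f
    red = [poly_mod(t << n, f) for t in range(1 << w)]         # t * x^n mod f
    acc = 0
    for g in reversed(range((n + w - 1) // w)):
        acc <<= w                                  # multiply accumulator by x^w
        acc = (acc & low_mask) ^ red[acc >> n]     # fold the w overflow bits back
        acc ^= mult[(a >> (g * w)) & ((1 << w) - 1)]
    return acc

# Field used by AES: x^8 + x^4 + x^3 + x + 1.
result = gf_mul_windowed(0x57, 0x83, 0x11B)
```

With group size w, both tables hold 2^w entries, which mirrors the abstract's remark that the group size determines the table sizes.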


64 citations

Book Chapter•10.1007/3-540-44499-8_4•

Fast Implementation of Elliptic Curve Defined over GF(p^m) on CalmRISC with MAC2424 Coprocessor

Jae Wook Chung¹, Sang Gyoo Sim¹, Pil Joong Lee¹

Pohang University of Science and Technology¹

17 Aug 2000

TL;DR: Fast finite field and elliptic curve (EC) algorithms are proposed that are useful for embedding cryptographic functions on high-performance devices in which most instructions take just one cycle.

Abstract: In this paper, we propose fast finite field and elliptic curve (EC) algorithms useful for embedding cryptographic functions on high-performance devices in which most instructions take just one cycle. In such a case, integer multiplications and additions have the same computational cost, so computational cost analyses previously done in the traditional manner may be invalid, and in some cases new algorithms should be introduced for fast computation. In our implementation, the column major method for field multiplication and the BP inversion algorithm are used for fast field arithmetic, and the mixed coordinates method is used for efficient EC exponentiation. We give analyses of various algorithms that are useful for implementing EC exponentiation on the CalmRISC microcontroller with the MAC2424 coprocessor, as well as new exact analyses of the BP (Bailey-Paar) inversion algorithm and EC exponentiation. Using the techniques shown in this paper, we implemented EC exponentiation for various coordinate systems, and the best result took 122 ms, assuming a 50 ns clock cycle.


54 citations

Journal Article•10.1109/92.820767•

Two systolic architectures for modular multiplication

Wei-Chang Tsai¹, C.B. Shung¹, Sheng-Jyh Wang¹

National Chiao Tung University¹

01 Feb 2000-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: The authors present two systolic architectures to speed up the computation of modular multiplication in RSA cryptosystems: a double-layer architecture and a non-interlaced architecture that eliminates the one-clock-cycle gap between iterations by pairing off the double-layer architecture.

Abstract: The authors present two systolic architectures to speed up the computation of modular multiplication in RSA cryptosystems. In the double-layer architecture, the main operation of Montgomery's algorithm is partitioned into two parallel operations after using the precomputation of the quotient bit. In the non-interlaced architecture, we eliminate the one-clock-cycle gap between iterations by pairing off the double-layer architecture. We compare our architectures with some previously proposed Montgomery-based systolic architectures, on the basis of both modular multiplication and modular exponentiation. The comparisons indicate that our architectures offer the highest speed, lower hardware complexity, and lower power consumption.


Proceedings Article•10.1109/EURMIC.2000.874640•

Constant coefficient multiplication in FPGA structures


Kazimierz Wiatr, Ernest Jamro

5 Sep 2000

TL;DR: Investigates different architectures implementing bit-parallel constant-coefficient multiplication in FPGA structures, and a novel algorithm for the conversion from two's-complement to CSD representation is presented.


Abstract: Investigates different architectures implementing bit-parallel constant-coefficient multiplication in FPGA structures. First, multiplierless multiplication (MM) architectures employing canonic sign digit (CSD) and sub-structure sharing methods are addressed, and a novel algorithm for the conversion from two's-complement to CSD representation is presented. In the second part of this paper, lookup table-based multiplication (LM) is investigated. Correspondingly, the usage of different memory modules and finding the optimal combination of the memory and adders are considered. The LM architecture also considers reduction of the address width for each memory cell and the possibility of memory sub-structure sharing. Finally, implementation results for the Xilinx XC4000 and Virtex families are presented. As a result, MM generally surpasses the LM architecture. However, the actual choice between these two architectures is coefficient- and input parameter-dependent.
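To make the multiplierless idea concrete, the sketch below converts an integer coefficient to canonic signed digit form using the standard non-adjacent-form recurrence and then multiplies by shift-and-add/subtract. It is the textbook conversion, not the novel algorithm the paper presents, and the function names are illustrative assumptions.

```python
def to_csd(value):
    """Return CSD digits (-1, 0, +1), least significant first, for an integer."""
    digits = []
    while value != 0:
        if value & 1:
            d = 2 - (value & 3)   # +1 if value = 1 (mod 4), -1 if value = 3 (mod 4)
            value -= d            # value is now a multiple of 4, so the next digit is 0
        else:
            d = 0
        digits.append(d)
        value >>= 1
    return digits

def multiply_by_constant(x, digits):
    """Multiply x by the coefficient encoded in `digits` using shifts and adds only."""
    return sum(d * (x << i) for i, d in enumerate(digits) if d)

csd_7 = to_csd(7)                          # 7 = 8 - 1, two nonzero digits instead of three
product = multiply_by_constant(13, csd_7)
```

Because no two adjacent CSD digits are nonzero, the number of adders or subtractors needed per coefficient tends to be lower than with a plain binary expansion.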


Patent•

Method and apparatus for dot product calculation

Zhengou Gu¹

Texas Instruments¹

23 Feb 2000

TL;DR: A dot product operator (30) uses adder trees (10) of L-1 adders and no multiplication circuits, where L is the length of the parallel dot product operator.

Abstract: A dot product operator (30) uses adder trees (10) of L-1 adders and no multiplication circuits, where L is the length of the parallel dot product operator. Exclusive-or gates (12) provide the function of multiplication by ±1, with the carry-in ports of adders (14, 16, 18, 20, 32, 34, 38, 44) being used to form the two's complement, resulting in an extremely efficient design in terms of area and power.
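The arithmetic trick behind this design can be sketched in software: negating a two's-complement word is the same as inverting every bit (the XOR gates) and adding 1 (the adder's carry-in). The model below accumulates sequentially rather than through an adder tree, and the 16-bit width and all names are assumptions made for illustration.

```python
WIDTH = 16
MASK = (1 << WIDTH) - 1

def to_signed(v):
    """Interpret a WIDTH-bit pattern as a two's-complement integer."""
    return v - (1 << WIDTH) if v & (1 << (WIDTH - 1)) else v

def dot_pm1(samples, signs):
    """Dot product of integer samples with coefficients in {+1, -1}.

    signs[i] is 0 for a +1 coefficient and 1 for a -1 coefficient.
    """
    acc = 0
    for x, s in zip(samples, signs):
        flipped = (x & MASK) ^ (MASK if s else 0)   # bank of XOR gates
        acc = (acc + flipped + s) & MASK            # adder; s drives the carry-in
    return to_signed(acc)

value = dot_pm1([3, 5, -2, 7], [0, 1, 0, 1])   # 3 - 5 - 2 - 7
```

No multiplier appears anywhere: each coefficient costs one row of XOR gates plus an otherwise unused carry-in.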


Journal Article•10.1080/00927870008826980•

Some remarks on multiplication ideals, II

D. D. Anderson¹

University of Iowa¹

01 Jan 2000-Communications in Algebra

TL;DR: It is shown that if S is a commutative R-algebra and ψ: M → N is an R-module homomorphism, where M is a multiplication R-module and N is an S-module, then Sψ(M) is a multiplication S-module.

Abstract: Let R be a commutative ring with identity. An R-module (ideal of R) A is called a multiplication module (ideal) if for each submodule N of A there exists an ideal I of R with N = IA. We give several characterizations of multiplication modules. Using the method of idealization we show how to reduce questions concerning multiplication modules to multiplication ideals. For example, we show that if S is a commutative R-algebra and ψ: M → N is an R-module homomorphism where M is a multiplication R-module and N is an S-module, then Sψ(M) is a multiplication S-module.


Patent•

Apparatus and method for performing multiplication operations


Alexander Edward Nancekievill

27 Dec 2000

TL;DR: In this paper, an instruction decoder is provided which is responsive to a multiply instruction to control the multiplying circuit to generate a multiplication result for the computation M×N, where M and N are W bit data words.


Abstract: The present invention provides an apparatus and method for processing data using a multiplying circuit for performing a multiplication of a W/2 bit data value by a W bit data value. An instruction decoder is provided which is responsive to a multiply instruction to control the multiplying circuit to generate a multiplication result for the computation M×N, where M and N are W bit data words. The multiplying circuit is arranged to execute a first operation in which the data word N is multiplied by the most significant W/2 bits of the data word M to generate a first intermediate result having 3W/2 bits, and to then execute a second operation in which the data word N is multiplied by the least significant W/2 bits of the data word M to generate a second intermediate result having 3W/2 bits. The first intermediate result is shifted by W/2 with respect to the second intermediate result and added to the second intermediate result to generate the multiplication result. By performing the two parts of the multiplication in reverse order to the conventional approach, it has been found that the complexity of the circuitry can be reduced, and a reduction in power consumption can be achieved.
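The two-pass decomposition described above amounts to the identity M×N = (M_hi × N) × 2^(W/2) + M_lo × N. A minimal sketch for unsigned operands, with W = 32 and all function names chosen here purely for illustration:

```python
W = 32
HALF = W // 2

def half_by_full(half_word, n):
    """Model of the W/2-bit by W-bit multiplier; the result fits in 3W/2 bits."""
    assert 0 <= half_word < (1 << HALF) and 0 <= n < (1 << W)
    return half_word * n

def multiply(m, n):
    """Full W-bit by W-bit unsigned product from two half-width passes."""
    first = half_by_full(m >> HALF, n)                # most significant half of M first
    second = half_by_full(m & ((1 << HALF) - 1), n)   # then the least significant half
    return (first << HALF) + second                   # align and add

product = multiply(0xDEADBEEF, 0x12345678)
```

The sketch reproduces only the arithmetic; the claimed savings in circuit complexity and power come from the order of the two passes in hardware, which software cannot show.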


Proceedings Article•10.1109/ICPP.2000.876144•

Matrix-matrix multiplication on heterogeneous platforms

Olivier Beaumont¹, Vincent Boudet, Fabrice Rastello, Yves Robert

École normale supérieure de Lyon¹

21 Aug 2000

TL;DR: This paper addresses the issue of implementing matrix-matrix multiplication on heterogeneous platforms with a (polynomial) column-based heuristic, which turns out to be very satisfactory: the theoretical performance guarantee for the heuristic is derived, and its practical usefulness is assessed through MPI experiments.


Abstract: In this paper, we address the issue of implementing matrix-matrix multiplication on heterogeneous platforms. We target two different classes of heterogeneous computing resources: heterogeneous networks of workstations, and collections of heterogeneous clusters. Intuitively, the problem is to load balance the work with different-speed resources while minimizing the communication volume. We formally state this problem and prove its NP-completeness. Next we introduce a (polynomial) column-based heuristic, which turns out to be very satisfactory: we derive a theoretical performance guarantee for the heuristic, and we assess its practical usefulness through MPI experiments.


Journal Article•10.1006/JMAA.2000.6794•

Noncommutative and nonassociative pseudo-analysis and its applications on nonlinear partial differential equations

Endre Pap, Doretta Vivona¹

Sapienza University of Rome¹

15 Jun 2000-Journal of Mathematical Analysis and Applications

TL;DR: In this paper, the pseudo-linear principle is applied to solving nonlinear equations (ODEs, PDEs, difference equations, etc.) using pseudo-addition and pseudo-multiplication.

Patent•

Multi-dimensional Galois field multiplier

David Hoyle¹

Texas Instruments¹

18 Feb 2000

TL;DR: An implementation of a multi-dimensional Galois field multiplier and a method of Galois field multi-dimensional multiplication are disclosed; they support many communication standards having various symbol sizes, different GFs, and different primitive polynomials in a cost-efficient manner.

Abstract: An implementation of a multi-dimensional Galois field multiplier and a method of Galois field multi-dimensional multiplication which are able to support many communication standards having various symbol sizes, different GFs, and different primitive polynomials, in a cost-efficient manner is disclosed. The key to allow a single implementation to perform for all different GF sizes is to align the input data such that the Galois field symbols of the operands are aligned to the left most significant bit (MSB) position of the input data field. Similarly, the primitive polynomial used to create a selected Galois field is aligned to the left MSB position. A polynomial multiply is performed. The product polynomial is then conditionally divided by the primitive polynomial starting with the most significant bit, the condition being if the left most bit of the product is a 1. In other words, if the product polynomial has an MSB of 1, then divide the product with the primitive polynomial. Perform this step until the MSB is 0. In addition, for fields smaller than a maximum size Galois field, the sequence of conditional divisions is further conditioned with a predetermined mask in dependence upon the size of the GF. The resultant product is aligned to the left MSB.


Journal Article•10.1016/S0024-3795(00)00232-9•

Symmetric centrosymmetric matrix-vector multiplication

Aaron Melman¹

University of San Francisco¹

15 Nov 2000-Linear Algebra and its Applications

TL;DR: A method is presented for the multiplication of an arbitrary vector by a symmetric centrosymmetric matrix, requiring (5/4)n^2 + O(n) floating-point operations, rather than the 2n^2 operations needed in the case of an arbitrary matrix.

Journal Article•10.1016/S0732-3123(00)00050-X•

Long-term effects of building on informal knowledge in a complex content domain: the case of multiplication of fractions

Nancy K. Mack¹

National Louis University¹

01 Jul 2000-The Journal of Mathematical Behavior

TL;DR: Four students participated in a 2-year study (fifth and sixth grades) that examined the development of their understanding of multiplication of fractions. The students consistently drew on their informal knowledge of partitioning to reconceptualize and partition units to solve problems involving multiplication of fractions in meaningful ways.

Patent•

Clock control circuit

Atsushi Fujita¹

Fujitsu¹

30 Mar 2000

TL;DR: A clock state control circuit provides control to stop the output of the clock to the outside, switch the clock to a clock other than those output by the PLL oscillation circuit, change the multiplication factor in the PLL oscillation circuit, switch the clock to the clock output by the PLL oscillation circuit after the PLL output clock is stabilized, and restart output of the clock.

Abstract: In a clock control circuit, a multiplication factor setting unit outputs a multiplication factor. A buffer circuit holds a previous multiplication factor and the multiplication factor output by the multiplication factor setting unit and compares the two multiplication factors. When the multiplication factors are different from each other, a clock state control circuit provides control to stop the output of the clock to the outside, switch the clock to a clock other than those output by the PLL oscillation circuit, change the multiplication factor in the PLL oscillation circuit, switch the clock to the clock output by the PLL oscillation circuit after the PLL output clock is stabilized, and restart output of the clock to the outside.


Book Chapter•10.1007/3-540-44706-7_3•

The Software-Oriented Stream Cipher SSC2

Muxiang Zhang, Christopher R. Carroll, Agnes Hui Chan¹

Northeastern University¹

10 Apr 2000

TL;DR: Theoretical analysis demonstrates that the keystream sequences generated by SSC2 have long period, large linear complexity, and good statistical distribution.


Abstract: SSC2 is a fast software stream cipher designed for wireless handsets with limited computational capabilities. It supports various private key sizes from 4 bytes to 16 bytes. All operations in SSC2 are word-oriented; no complex operations such as multiplication, division, and exponentiation are involved. SSC2 has a very compact structure that makes it easy to implement on 8-, 16-, and 32-bit processors. Theoretical analysis demonstrates that the keystream sequences generated by SSC2 have long period, large linear complexity, and good statistical distribution.


Proceedings Article•10.1109/MWSCAS.2000.951694•

Charge-mode parallel architecture for matrix-vector multiplication

Roman Genov¹, Gert Cauwenberghs

Johns Hopkins University¹

8 Aug 2000

TL;DR: An internally analog, externally digital architecture for matrix-vector multiplication is presented, which allows for high data throughput and minimal latency, and is tailored for high-density and low power VLSI implementation.


Abstract: An internally analog, externally digital architecture for matrix-vector multiplication is presented. Fully parallel processing allows for high data throughput and minimal latency. The analog architecture incorporates an array of charge-mode analog computational cells with dynamic storage and row-parallel flash analog-to-digital converters (ADC). Each of the cells includes a dynamic storage element and a charge injection device computing the binary inner product of two arguments. The matrix elements are stored in the array of computational cells in bit-parallel fashion, and the input vector is presented bit-serially. Digital post-processing is then performed on the ADC outputs to construct the resulting product with precision higher than that of each conversion. The analog architecture is tailored for high-density and low-power VLSI implementation, and matrix dimensions of 128 × 512 and ADC resolution of 6 bits for an overall resolution in excess of 8 bits are feasible on a 3 mm × 3 mm chip in standard CMOS 0.5 µm technology.
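The bit-level decomposition this architecture relies on can be checked in software: with matrix elements split into bit planes and the input presented one bit at a time, every partial result is a binary inner product, and weighting each partial result by the matching power of two recovers the full product. The sketch assumes unsigned operands and ideal, unquantized conversions; bit widths and names are illustrative assumptions.

```python
def bits(value, width):
    """Bits of a non-negative integer, least significant first."""
    return [(value >> i) & 1 for i in range(width)]

def matvec_bit_decomposed(matrix, vector, w_bits=4, x_bits=4):
    """y[m] = sum_n W[m][n] * x[n], built only from binary inner products."""
    result = []
    for row in matrix:
        row_planes = [bits(w, w_bits) for w in row]   # matrix stored bit-parallel
        total = 0
        for j in range(x_bits):                       # input presented bit-serially
            x_plane = [(x >> j) & 1 for x in vector]
            for i in range(w_bits):
                # Binary inner product: count positions where both bits are 1.
                partial = sum(rp[i] & xb for rp, xb in zip(row_planes, x_plane))
                total += partial << (i + j)           # digital post-processing
        result.append(total)
    return result

y = matvec_bit_decomposed([[3, 5, 7], [15, 0, 9]], [2, 4, 6])
```

In the hardware described, each `partial` would be an analog quantity digitized by a flash ADC, so the final precision depends on the converter resolution, which this model ignores.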


Proceedings Article•10.1109/LEOS.2000.890647•

Avalanche photodiodes with an impact-ionization-engineered multiplication region

Shuling Wang, X. Sun, Xiaoguang Zheng, Archie L. Holmes, Joe C. Campbell, Ping Yuan, and 2 more

13 Nov 2000

TL;DR: A new multiplication region structure is described that achieves very low multiplication noise by impact ionization engineering (I²E), which utilizes heterojunctions to achieve greater localization of impact ionization than spatially uniform structures.

Abstract: The avalanche photodiode (APD) is frequently the photodetector of choice for high-bit-rate, long-haul fiber optic communications, owing to its internal gain, which provides a sensitivity margin compared to PIN photodiodes. Since the multiplication region of an APD plays a critical role in determining the gain, the multiplication noise, and the gain-bandwidth product, numerous research programs have focused on optimizing the multiplication region in order to improve the APD performance. We describe a new multiplication region structure that achieves very low multiplication noise by impact ionization engineering (I²E), which utilizes heterojunctions to achieve greater localization of impact ionization than spatially uniform structures. By placing thin layers with relatively low threshold energy (multiplication layer) on each side of a region with higher ionization coefficients (the separation layer), impact ionization is enhanced at the edges in the twin multiplication layers and is suppressed in the center, where the carriers are energized in transit.


Proceedings Article•10.1109/ASAP.2000.862384•

Implementing 1,024-bit RSA exponentiation on a 32-bit processor core

Braden J. Phillips¹, N. Burgess²

University of Adelaide¹, Cardiff University²

10 Jul 2000

TL;DR: This paper describes how long-wordlength (1024-bit) modular exponentiation may be implemented on a standard 32-bit microprocessor core with a total execution time of under 1 second.

Abstract: This paper describes how long-wordlength (1024-bit) modular exponentiation may be implemented on a standard 32-bit microprocessor core with a total execution time of under 1 second. The design does not use a long-wordlength arithmetic co-processor. Instead all arithmetic operations are reduced to 32-bit additions, subtractions and binary shifts, and the processor is augmented with a small hardware enhancement to significantly accelerate accumulation of shifted multi-precision numbers. Target performance is achieved by trading fast arithmetic hardware for extra RAM, to facilitate pre-computation of digit multiples and powers. Signed sliding window algorithms are introduced for exponentiation, multiplication and reduction operations, and attention is paid to the integration of enhanced security features such as blinding and verification.
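The RAM-for-speed trade mentioned above can be illustrated with a plain sliding-window exponentiation, in which a small table of precomputed odd powers reduces the number of multiplications. This sketch uses unsigned windows and Python's built-in big-integer multiplication, so it is a simplification of the signed sliding window algorithms and shift-and-add arithmetic the paper describes; the function name and window size are assumptions.

```python
def sliding_window_pow(base, exponent, modulus, w=4):
    """Left-to-right sliding-window modular exponentiation (unsigned windows)."""
    if modulus == 1:
        return 0
    base %= modulus
    # Precompute the odd powers base^1, base^3, ..., base^(2^w - 1).
    sq = (base * base) % modulus
    odd = {1: base}
    for k in range(3, 1 << w, 2):
        odd[k] = (odd[k - 2] * sq) % modulus
    result = 1
    i = exponent.bit_length() - 1
    while i >= 0:
        if not (exponent >> i) & 1:
            result = (result * result) % modulus   # a zero bit costs one squaring
            i -= 1
            continue
        # Take the longest window of at most w bits that ends in a 1 bit.
        j = max(i - w + 1, 0)
        while not (exponent >> j) & 1:
            j += 1
        window = (exponent >> j) & ((1 << (i - j + 1)) - 1)
        for _ in range(i - j + 1):
            result = (result * result) % modulus
        result = (result * odd[window]) % modulus   # one table lookup per window
        i = j - 1
    return result

value = sliding_window_pow(7, 65537, 1000003)
```

A larger window means a bigger table but fewer multiplications per exponent bit, which is the memory-versus-time balance the abstract refers to.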


Journal Article•10.1109/4.839928•

A low logic depth complex multiplier using distributed arithmetic

A. Berkeman¹, Viktor Öwall, Mats Torkelson

Lund University¹

01 Apr 2000-IEEE Journal of Solid-state Circuits

TL;DR: A combinatorial complex multiplier has been designed for use in a pipelined fast Fourier transform processor and a new architecture based on distributed arithmetic, Wallace-trees, and carry-lookahead adders has been developed.

Abstract: A combinatorial complex multiplier has been designed for use in a pipelined fast Fourier transform processor. The performance in terms of throughput of the processor is limited by the multiplication. Therefore, the multiplier is optimized to make the input-to-output delay as short as possible. A new architecture based on distributed arithmetic, Wallace-trees, and carry-lookahead adders has been developed. The multiplier has been fabricated using standard cells in a 0.5-μm process and verified for functionality, speed, and power consumption. Running at 40 MHz, a multiplier with input wordlengths of 16+16 times 10+10 bits consumes 54% less power compared to a distributed arithmetic array multiplier fabricated under equal conditions.

Patent•

Scheme for arithmetic operations in finite field and group operations over elliptic curves realizing improved computational speed

Kazumaro Aoki¹, Kazuo Ohta¹•Institutions (1)

Nippon Telegraph and Telephone¹

18 Jan 2000

TL;DR: A scheme is proposed for arithmetic operations in finite fields and group operations over elliptic curves that allows a very fast implementation: the multiplicative inverse calculation and the multiplication in the finite field GF(2²ⁿ) are realized as combinations of multiplications, additions, and a multiplicative inverse calculation in the subfield GF(2ⁿ).

Abstract: A scheme for arithmetic operations in finite field and group operations over elliptic curves capable of realizing a very fast implementation. According to this scheme, by using a normal basis [α α+1], the multiplicative inverse calculation and the multiplication in the finite field GF(2²ⁿ) can be realized as combinations of multiplications, additions and a multiplicative inverse calculation in the subfield GF(2ⁿ). Also, by using a standard basis [1 α], the multiplication, the square calculation, and the multiplicative inverse calculation in the finite field GF(2²ⁿ) can be realized as combinations of multiplications, additions and a multiplicative inverse calculation in the subfield GF(2ⁿ). These arithmetic operations can be utilized for calculating rational expressions expressing group operations over elliptic curves that are used in information security techniques such as elliptic curve cryptosystems.
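
To make the subfield decomposition concrete, here is a minimal sketch for the smallest interesting case, n = 2, in the standard basis [1 α]. The abstract does not give the defining polynomials, so the choices below (GF(2²) built with x² + x + 1, and α² = α + C with C = 2) are assumptions picked only because they yield a valid field. All function names are hypothetical.

```python
C = 2  # assumed constant in alpha^2 = alpha + C; x^2 + x + C is irreducible over GF(4)

def gf4_mul(a, b):
    """Multiply two GF(2^2) elements (ints 0..3) modulo x^2 + x + 1."""
    p = 0
    for i in range(2):
        if (b >> i) & 1:
            p ^= a << i          # carry-less partial product
    if p & 0b100:
        p ^= 0b111               # reduce the x^2 term
    return p

def gf4_inv(a):
    """Inverse in GF(2^2): a^3 = 1 for nonzero a, so a^-1 = a^2."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse")
    return gf4_mul(a, a)

def gf16_mul(a, b):
    """Multiply (a0 + a1*alpha) by (b0 + b1*alpha) using only GF(2^2) operations."""
    a0, a1 = a
    b0, b1 = b
    hi = gf4_mul(a1, b1)
    return (gf4_mul(a0, b0) ^ gf4_mul(C, hi),
            gf4_mul(a0, b1) ^ gf4_mul(a1, b0) ^ hi)

def gf16_inv(a):
    """Invert a0 + a1*alpha with a single inversion in the subfield."""
    a0, a1 = a
    # Norm N = a0^2 + a0*a1 + C*a1^2 lies in GF(2^2).
    norm = gf4_mul(a0, a0) ^ gf4_mul(a0, a1) ^ gf4_mul(C, gf4_mul(a1, a1))
    n_inv = gf4_inv(norm)
    # The conjugate of a is (a0 + a1) + a1*alpha; a^-1 = conjugate / N.
    return (gf4_mul(a0 ^ a1, n_inv), gf4_mul(a1, n_inv))
```

Under these assumptions the extension-field inverse costs a handful of subfield multiplications and additions plus one subfield inversion, which is the structure the abstract describes.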

Proceedings Article•10.1109/ITCC.2000.844199•

Implementation image data convolutions operations in FPGA reconfigurable structures for real-time vision systems

Kazimierz Wiatr, Ernest Jamro

27 Mar 2000

TL;DR: This paper considers different architectures for real-time image convolution with constant coefficients; the choice between them depends on the given coefficient values, but in most cases multiplierless convolution (MC) is preferable.

Abstract: In this paper different architectures for real-time image constant-coefficient convolutions are considered. Accordingly, look-up-table (LUT) based multiplication/convolution, LUT based distributed arithmetic (DA) convolution and multiplierless convolution (MC) implementations in FPGA structures have been investigated. The results show that the choice between these architectures depends on the given coefficient values; however, in most cases MC is preferable. Furthermore, the change of coefficient values in real-time systems is also considered. This work is a contribution to worldwide intense research on developing reconfigurable and user dedicated custom computing machines (CCM).
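
The LUT-based distributed arithmetic approach named above can be sketched in software as follows. This is a generic bit-serial DA inner product for unsigned samples (for example 8-bit pixels) and fixed integer coefficients, not the FPGA design from the paper. The names `build_da_lut` and `da_inner_product` are hypothetical.

```python
def build_da_lut(coeffs):
    """Table entry `addr` holds the sum of the coefficients whose bit is set in addr."""
    n = len(coeffs)
    return [
        sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
        for addr in range(1 << n)
    ]

def da_inner_product(lut, samples, bits):
    """Compute sum(c_k * x_k) for unsigned `bits`-bit samples without multiplying.

    For each bit position b, the b-th bits of all samples form a table address;
    the table output is shifted by b and accumulated.
    """
    acc = 0
    for b in range(bits):
        addr = 0
        for k, x in enumerate(samples):
            addr |= ((x >> b) & 1) << k
        acc += lut[addr] << b
    return acc

# Usage: a 4-tap constant-coefficient kernel applied to four 8-bit pixel values.
coeffs = [1, -2, 3, 4]
lut = build_da_lut(coeffs)
value = da_inner_product(lut, [200, 17, 0, 255], bits=8)
```

The table grows as 2 to the number of taps, which is one reason the best architecture depends on the kernel: larger kernels are usually split into several smaller tables whose outputs are added.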

Patent•

Hardware implementation for modular multiplication using a plurality of almost entirely identical processor elements

Chin-Long Chen¹, Vincenzo Condorelli¹, Camil Fayad¹•Institutions (1)

IBM¹

19 Dec 2000

TL;DR: In this article, the modular exponentiation function used in public key encryption and decryption systems is implemented in a standalone engine having at its core modular multiplication circuits which operate in two phases which share overlapping hardware structures.

Abstract: The modular exponentiation function used in public key encryption and decryption systems is implemented in a standalone engine having at its core modular multiplication circuits which operate in two phases which share overlapping hardware structures. The partitioning of large arrays in the hardware structure, for multiplication and addition, into smaller structures results in a multiplier design comprising a series of nearly identical processing elements linked together in a chained fashion. As a result of the two-phase operation and the chaining together of partitioned processing elements, the overall structure is operable in a pipelined fashion to improve throughput and speed. The chained processing elements are constructed so as to provide a partitionable chain with separate parts for processing factors of the modulus. In this mode, the system is particularly useful for exploiting characteristics of the Chinese Remainder Theorem to perform rapid exponentiation operations. A checksum mechanism is also provided to ensure accurate operation without impacting speed and without significantly increasing complexity. While the present disclosure is directed to a complex system which includes a number of features, the present application is particularly directed to the structure and linking of a plurality of almost identical processing elements.
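
The role of the Chinese Remainder Theorem mentioned in this abstract can be illustrated with the standard RSA-CRT recombination (Garner's formula). This is a software sketch of the general technique, not the patented hardware. It assumes p and q are distinct primes and that d is an RSA private exponent, so d is coprime to (p−1)(q−1) and the reduced exponents are nonzero. The function name `crt_pow` is hypothetical, and `pow(q, -1, p)` requires Python 3.8 or later.

```python
def crt_pow(base, d, p, q):
    """Compute base^d mod (p*q) from two half-size exponentiations.

    Assumes p, q are distinct primes and d is coprime to (p-1)*(q-1).
    """
    dp = d % (p - 1)           # reduced exponent for the mod-p half
    dq = d % (q - 1)           # reduced exponent for the mod-q half
    q_inv = pow(q, -1, p)      # q^-1 mod p (Python 3.8+)

    mp = pow(base % p, dp, p)  # result modulo p
    mq = pow(base % q, dq, q)  # result modulo q

    # Garner recombination: the unique value below p*q matching both residues.
    h = (q_inv * (mp - mq)) % p
    return mq + h * q
```

The two half-size exponentiations are independent of each other, which is why a multiplier chain that can be partitioned into two separate parts suits this mode.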

Journal Article•10.1109/12.859536•

On the design of IEEE compliant floating point units

Guy Even¹, Wolfgang J. Paul•Institutions (1)

Tel Aviv University¹

01 May 2000-IEEE Transactions on Computers

TL;DR: To the best of the authors' knowledge, this design is the first published one that deals with detecting exceptions, and with trapped overflow and underflow exceptions, as an integral part of the rounding unit in a floating-point unit.

Abstract: Engineering design methodology recommends designing a system as follows: Start with an unambiguous specification, partition the system into blocks, specify the functionality of each block, design each block separately, and glue the blocks together. Verifying the correctness of an implementation then reduces to a local verification procedure. We apply this methodology for designing a provably correct IEEE rounding unit that can be used for various operations, such as addition and multiplication. First, we provide a mathematical and, hopefully, unambiguous definition of the IEEE Standard which specifies the functionality. We give explicit and concise rules for gluing the rounding unit with a floating-point adder and multiplier. We then present floating-point addition and multiplication algorithms that use the rounding unit. To the best of our knowledge, our design is the first publication that deals with detecting exceptions and trapped overflow and underflow exceptions as an integral part of the rounding unit in a floating point unit. Our abstraction level avoids bit-level representations and arguments to help clarify the functionality of the algorithm.
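
For readers unfamiliar with what a rounding unit computes, the sketch below shows only the core step of the IEEE default mode, round to nearest with ties to even, applied to an integer significand. It is not the paper's design and leaves out exponent handling and exception detection. The function name `round_nearest_even` is hypothetical.

```python
def round_nearest_even(significand, drop):
    """Drop the low `drop` bits of a non-negative integer significand,
    rounding to nearest and breaking ties toward an even result."""
    if drop <= 0:
        return significand << -drop        # nothing to discard
    kept = significand >> drop
    remainder = significand & ((1 << drop) - 1)
    half = 1 << (drop - 1)
    if remainder > half or (remainder == half and (kept & 1)):
        kept += 1                          # may carry into a new top bit
    return kept
```

If the increment carries out of the top bit, the caller has to renormalize the significand and adjust the exponent, which is one of the places where overflow detection interacts with rounding.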
