Symposium on Computer Arithmetic

Conference Tools

Papers published on a yearly basis

Papers

Proceedings Article • DOI: 10.1109/ARITH.1987.6158699

Fast area-efficient VLSI adders

Tackdon Han¹, David A. Carlson¹

University of Massachusetts Amherst¹

18 May 1987

TL;DR: A new graph representation for prefix computation is presented that leads to the design of a fast, area-efficient binary adder, and its area is close to known lower bounds on the VLSI area of parallel prefix graphs.

Abstract: In this paper, we study area-time tradeoffs in VLSI for prefix computation using graph representations of this problem. Since the problem is intimately related to binary addition, the results we obtain lead to the design of area-time efficient VLSI adders. This is a major goal of our work: to design very low latency addition circuitry that is also area efficient. To this end, we present a new graph representation for prefix computation that leads to the design of a fast, area-efficient binary adder. The new graph is a combination of previously known graph representations for prefix computation, and its area is close to known lower bounds on the VLSI area of parallel prefix graphs. Using it, we are able to design VLSI adders having area A = O(n log n) whose delay time is the lowest possible value, i.e., the fastest possible area-efficient VLSI adder.
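
The link between prefix computation and binary addition mentioned in the abstract can be sketched in Python. This is a minimal illustration only: it uses a simple doubling-distance (Kogge-Stone-style) prefix schedule, not the combined graph proposed in the paper, and the function name `prefix_add` is an assumption made for this example.

```python
def prefix_add(a, b, n):
    """Add two n-bit integers (n >= 1) via a parallel-prefix carry computation.

    Returns (sum modulo 2**n, carry-out bit).
    """
    # Per-bit generate and propagate signals, least significant bit first.
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]

    G, P = g[:], p[:]
    d = 1
    while d < n:
        # Each level combines position i with position i - d using the
        # associative operator (G, P) o (G', P') = (G | P & G', P & P').
        # In hardware, all positions of one level are computed in parallel.
        G_next, P_next = G[:], P[:]
        for i in range(d, n):
            G_next[i] = G[i] | (P[i] & G[i - d])
            P_next[i] = P[i] & P[i - d]
        G, P = G_next, P_next
        d *= 2

    # After the prefix pass, G[i] is the carry out of bit position i.
    carries_in = [0] + G[:-1]
    total = sum((p[i] ^ carries_in[i]) << i for i in range(n))
    return total, G[n - 1]
```

The number of levels in the `while` loop corresponds to the logic depth of the carry network, and the number of operator applications per level corresponds to its size; different prefix graphs trade these two quantities against each other.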

362 citations

Proceedings Article • DOI: 10.1109/ARITH.1999.762841

Efficient VLSI implementation of modulo (2^n ± 1) addition and multiplication

R. Zimmermann¹

ETH Zurich¹

14 Apr 1999

TL;DR: It is shown that the parallel-prefix adder architecture is well suited to realize fast end-around-carry adders used for modulo addition, and a high-performance modulo multiplier-adder for the IDEA block cipher is presented.

Abstract: New VLSI circuit architectures for addition and multiplication modulo (2^n - 1) and (2^n + 1) are proposed that allow the implementation of highly efficient combinational and pipelined circuits for modular arithmetic. It is shown that the parallel-prefix adder architecture is well suited to realize fast end-around-carry adders used for modulo addition. Existing modulo multiplier architectures are improved for higher speed and regularity. These allow the use of common multiplier speed-up techniques like Wallace-tree addition and Booth recoding, resulting in the fastest known modulo multipliers. Finally, a high-performance modulo multiplier-adder for the IDEA block cipher is presented. The resulting circuits are compared qualitatively and quantitatively, i.e., in a standard-cell technology, with existing solutions and ordinary integer adders and multipliers.
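
The end-around-carry idea for addition modulo 2^n - 1 can be illustrated with a short Python sketch. It models only the arithmetic behavior (the carry out of the top bit is added back in at the bottom), not the parallel-prefix circuit structure from the paper; the function name `add_mod_2n_minus_1` is hypothetical.

```python
def add_mod_2n_minus_1(a, b, n):
    """Add a and b modulo 2**n - 1 using an end-around carry.

    Operands are n-bit values in [0, 2**n - 1]. The result is an n-bit
    value congruent to a + b; the all-ones pattern may appear as a
    second representation of zero, as is usual for this scheme.
    """
    mask = (1 << n) - 1
    s = a + b
    # Because 2**n is congruent to 1 modulo 2**n - 1, the carry out of
    # bit n - 1 can be fed back into the least significant position.
    return (s & mask) + (s >> n)
```

One fold is enough here: if the first sum overflows n bits, the folded value is at most 2^n - 1 and cannot overflow again.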

321 citations

Proceedings Article • DOI: 10.1109/ARITH.2003.1207666

Decimal floating-point: algorism for computers

M. F. Cowlishaw¹

University of Warwick¹

15 Jun 2003

TL;DR: This work introduces a new approach to decimal floating point which not only provides the strict results which are necessary for commercial applications but also meets the constraints and requirements of the IEEE 854 standard.

Abstract: Decimal arithmetic is the norm in human calculations, and human centric applications must use a decimal floating point arithmetic to achieve the same results. Initial benchmarks indicate that some applications spend 50% to 90% of their time in decimal processing, because software decimal arithmetic suffers a 100× to 1000× performance penalty over hardware. The need for decimal floating point in hardware is urgent. Existing designs, however, either fail to conform to modern standards or are incompatible with the established rules of decimal arithmetic. We introduce a new approach to decimal floating point which not only provides the strict results which are necessary for commercial applications but also meets the constraints and requirements of the IEEE 854 standard. A hardware implementation of this arithmetic is in development, and it is expected that this will significantly accelerate a wide variety of applications.
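
The gap between binary and decimal floating point that motivates this work can be seen in a few lines of Python. The sketch uses Python's standard `decimal` module as an example of software decimal arithmetic; it is not the hardware design described in the paper, and the variable names are illustrative.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1, 0.2, or 0.3 exactly,
# so the rounded sum differs from the rounded literal 0.3.
binary_sum = 0.1 + 0.2
binary_matches = (binary_sum == 0.3)

# Decimal floating point represents these values exactly.
decimal_sum = Decimal("0.1") + Decimal("0.2")
decimal_matches = (decimal_sum == Decimal("0.3"))

# Decimal arithmetic also keeps trailing zeros, which matters when a
# result is displayed as a currency amount.
price_total = Decimal("1.20") + Decimal("1.30")
price_text = str(price_total)
```

Here `binary_matches` is `False` and `decimal_matches` is `True`, while `price_text` keeps the two fractional digits of its operands.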

304 citations

Proceedings Article • DOI: 10.1109/ARITH.2001.930115

Algorithms for quad-double precision floating point arithmetic

Yozo Hida¹, Xiaoye S. Li², David H. Bailey²

University of California, Berkeley¹, Lawrence Berkeley National Laboratory²

11 Jun 2001

TL;DR: Algorithms for various arithmetic operations (including the four basic operations and various algebraic and transcendental operations) on quad-double numbers are presented, along with the performance of a C++ implementation.

Abstract: A quad-double number is an unevaluated sum of four IEEE double precision numbers, capable of representing at least 212 bits of significand. We present the algorithms for various arithmetic operations (including the four basic operations and various algebraic and transcendental operations) on quad-double numbers. The performance of the algorithms, implemented in C++, is also presented.
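
Representing a value as an unevaluated sum of doubles relies on capturing the rounding error of each floating-point operation. A minimal Python sketch of one such building block, the classical branch-free error-free addition (often called two-sum), is shown here as an illustration; it is not the paper's C++ code, and the helper `is_exact` is a hypothetical check added for this example.

```python
from fractions import Fraction

def two_sum(a, b):
    """Return (s, err) with s = fl(a + b) and s + err == a + b exactly.

    Assumes IEEE double arithmetic with round-to-nearest and no overflow.
    """
    s = a + b
    b_virtual = s - a                       # the part of b that reached s
    err = (a - (s - b_virtual)) + (b - b_virtual)
    return s, err

def is_exact(a, b):
    """Check with rational arithmetic that two_sum loses no information."""
    s, err = two_sum(a, b)
    return Fraction(s) + Fraction(err) == Fraction(a) + Fraction(b)

# The pair (s, err) is itself an unevaluated sum of two doubles.
s, err = two_sum(1.0, 1e-20)
```

In this call the small operand is entirely lost from `s` but fully recovered in `err`; chaining such pairs is how longer expansions of two or four doubles keep extra significand bits.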

279 citations

Proceedings Article • DOI: 10.1109/ARITH.1993.378085

Fast implementations of RSA cryptography

M. Shand, Jean Vuillemin

29 Jun 1993

TL;DR: The authors detail and analyze the critical techniques that may be combined in the design of fast hardware for RSA cryptography: Chinese remainders, star chains, Hensel's odd division, carry-save representation, quotient pipelining, and asynchronous carry completion adders.

Abstract: The authors detail and analyze the critical techniques that may be combined in the design of fast hardware for RSA cryptography: Chinese remainders, star chains, Hensel's odd division (also known as Montgomery modular reduction), carry-save representation, quotient pipelining, and asynchronous carry completion adders. A fully operational PAM (programmable active memory) implementation of RSA that combines all of the techniques presented here delivers an RSA secret decryption rate over 600-kb/s for 512-b keys, and 165-kb/s for 1-kb keys. This is an order of magnitude faster than any previously reported running implementation. While the implementation makes full use of the PAM's reconfigurability, it is possible to derive from the (multiple PAM designs) implementation a (single) gate-array specification with estimated size under 100 K gates and speed over 1 Mb/s for RSA 512-b keys. Matching gains in software performance are also analyzed.
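
Of the techniques listed, Montgomery modular reduction lends itself to a compact software illustration. The following Python sketch shows the textbook word-free form of the reduction for an odd modulus; it is not the hardware design from the paper, the function names are hypothetical, and `pow(n, -1, r)` assumes Python 3.8 or later.

```python
def montgomery_setup(n, k):
    """Precompute n_prime for an odd modulus n and R = 2**k > n.

    n_prime satisfies n * n_prime == -1 (mod R).
    """
    r = 1 << k
    return (-pow(n, -1, r)) % r

def redc(t, n, k, n_prime):
    """Return t * R^-1 mod n for 0 <= t < n * R, where R = 2**k.

    The division by R is an exact right shift, so no trial division
    by n is needed; one conditional subtraction finishes the job.
    """
    mask = (1 << k) - 1
    m = ((t & mask) * n_prime) & mask   # makes t + m * n divisible by R
    u = (t + m * n) >> k
    return u - n if u >= n else u

def mont_mul(a, b, n, k):
    """Multiply a * b mod n (0 <= a, b < n) through Montgomery form."""
    n_prime = montgomery_setup(n, k)
    a_bar = (a << k) % n                # a * R mod n
    b_bar = (b << k) % n                # b * R mod n
    prod_bar = redc(a_bar * b_bar, n, k, n_prime)   # a * b * R mod n
    return redc(prod_bar, n, k, n_prime)            # a * b mod n
```

Converting into and out of Montgomery form costs extra work for a single product, so the approach pays off when many multiplications share one modulus, as in the modular exponentiation at the core of RSA.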

265 citations

Year	Papers
2021	19
2020	19
2019	36
2018	21
2017	32
2016	23
