TL;DR: A recursive construction technique that extends any d point multiplier into an n=d^k point multiplier with area that is subquadratic and delay that is logarithmic in the bit-length n is presented.
Abstract: We introduce a generalized method for constructing subquadratic complexity multipliers for even characteristic field extensions. The construction is obtained by recursively extending short convolution algorithms and nesting them. To obtain the short convolution algorithms, the Winograd short convolution algorithm is reintroduced and analyzed in the context of polynomial multiplication. We present a recursive construction technique that extends any d point multiplier into an n=d^k point multiplier with area that is subquadratic and delay that is logarithmic in the bit-length n. We present a thorough analysis that establishes the exact space and time complexities of these multipliers. Using the recursive construction method, we obtain six new constructions, among which one turns out to be identical to the Karatsuba multiplier. All six algorithms have subquadratic space complexities and two of the algorithms have significantly better time complexities than the Karatsuba algorithm.
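As an illustration of the recursive idea, the following is a minimal sketch of the Karatsuba case only (d = 2), which the abstract names as one of the six constructions. It is not the paper's general Winograd-based construction. Binary polynomials are stored as Python integers (bit i is the coefficient of x^i); the function names and the base-case threshold of 8 bits are assumptions made for this example.

```python
def clmul_schoolbook(a: int, b: int) -> int:
    """Carry-less product of two GF(2)[x] polynomials by shift-and-XOR."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def clmul_karatsuba(a: int, b: int, n: int) -> int:
    """Multiply two n-bit binary polynomials; n is assumed to be a power of two."""
    if n <= 8:  # hypothetical threshold for the base case
        return clmul_schoolbook(a, b)
    half = n // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half
    # Three half-size products instead of four.
    lo = clmul_karatsuba(a_lo, b_lo, half)
    hi = clmul_karatsuba(a_hi, b_hi, half)
    mid = clmul_karatsuba(a_lo ^ a_hi, b_lo ^ b_hi, half)
    # In characteristic 2, subtraction is XOR, so the cross term is mid ^ lo ^ hi.
    return lo ^ ((mid ^ lo ^ hi) << half) ^ (hi << n)
```

Each level replaces four half-size products with three, which is where the subquadratic gate count comes from; the other five constructions in the paper start from different short convolutions.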
TL;DR: The binary algorithm invented by Stein is extended and novel iterative division algorithms over GF(2^m) are proposed for systolic VLSI realization and it is shown that algorithms EBd and EBdf can be mapped to parallel-in parallel-out systolic circuits with low area-time complexities.
Abstract: We extend the binary algorithm invented by Stein and propose novel iterative division algorithms over GF(2^m) for systolic VLSI realization. While algorithm EBg is a basic prototype with guaranteed convergence in at most 2m - 1 iterations, its variants, algorithms EBd and EBdf, are designed for reduced complexity and fixed critical path delay, respectively. We show that algorithms EBd and EBdf can be mapped to parallel-in parallel-out systolic circuits with low area-time complexities of O(m^2 log log m) and O(m^2), respectively. Compared to the systolic designs based on the extended Euclid's algorithm, our circuits exhibit significant speed and area advantages.
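For orientation, here is a minimal software sketch of a textbook binary (Stein-style) division in GF(2^m). It is not the paper's EBg, EBd, or EBdf algorithm, whose details are not given in the abstract. Field elements are integers in polynomial basis, `f` is the irreducible polynomial including its x^m term, and all function names are assumptions for this example.

```python
def gf2m_mul(a: int, b: int, f: int, m: int) -> int:
    """Shift-and-add multiplication in GF(2^m); used here only to check results."""
    result = 0
    for i in range(m):
        if (b >> i) & 1:
            result ^= a
        a <<= 1
        if (a >> m) & 1:
            a ^= f
    return result


def gf2m_divide(b: int, a: int, f: int) -> int:
    """Return b / a in GF(2)[x]/(f) using only shifts, XORs and degree comparisons."""
    if a == 0:
        raise ZeroDivisionError("division by zero in GF(2^m)")
    u, v = a, f
    g1, g2 = b, 0
    # Invariants: g1 * a = b * u (mod f) and g2 * a = b * v (mod f).
    while u != 1 and v != 1:
        while (u & 1) == 0:  # x divides u
            u >>= 1
            g1 = g1 >> 1 if (g1 & 1) == 0 else (g1 ^ f) >> 1
        while (v & 1) == 0:  # x divides v
            v >>= 1
            g2 = g2 >> 1 if (g2 & 1) == 0 else (g2 ^ f) >> 1
        if u.bit_length() > v.bit_length():
            u ^= v
            g1 ^= g2
        else:
            v ^= u
            g2 ^= g1
    return g1 if u == 1 else g2
```

With the AES polynomial x^8 + x^4 + x^3 + x + 1 (`0x11B`), `gf2m_divide(1, 0x53, 0x11B)` returns `0xCA`, the well-known inverse of `0x53` in that field.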
TL;DR: Two new hardware architectures are proposed for performing multiplication in GF(p) and GF(2^n), which are the most time-consuming operations in many cryptographic applications, and provide new alternatives that offer faster computation of multiplication and useful features.
Abstract: Two new hardware architectures are proposed for performing multiplication in GF(p) and GF(2^n), which are the most time-consuming operations in many cryptographic applications. The architectures provide very fast and efficient execution of multiplication in both GF(p) and GF(2^n), and can be mainly used in elliptic curve cryptography. Both architectures are scalable and therefore can handle operands of any size. They can be configured to the available area and/or desired performance. The algorithm implemented in the architectures is the Montgomery multiplication algorithm which proved to be very efficient in both fields. The first architecture utilises a precomputation technique that reduces the critical path delay at the expense of using extra logic, which has a limited negative impact on the silicon area for operand precisions of cryptographic interest. The second architecture computes multiplication faster in GF(2^n) than GF(p), which conforms with the premise of GF(2^n) for hardware realisations. Both architectures provide new alternatives that offer faster computation of multiplication and useful features.
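The Montgomery multiplication mentioned above can be sketched in software for the binary-field case only. This is the standard bit-serial form with R = x^n, not the scalable word-based architecture of the paper. Elements are integers in polynomial basis, `f` is the irreducible polynomial of degree n, and the function names are assumptions for this example.

```python
def poly_mod(a: int, f: int) -> int:
    """Reduce a binary polynomial a modulo f (both stored as integers)."""
    m = f.bit_length() - 1
    while a.bit_length() > m:
        a ^= f << (a.bit_length() - 1 - m)
    return a


def mont_mul(a: int, b: int, f: int, n: int) -> int:
    """Return a * b * x^(-n) mod f, one bit of a per iteration."""
    c = 0
    for i in range(n):
        if (a >> i) & 1:
            c ^= b
        if c & 1:       # make c divisible by x by adding f (f has constant term 1)
            c ^= f
        c >>= 1         # exact division by x
    return c


def gf2n_mul_via_montgomery(a: int, b: int, f: int, n: int) -> int:
    """Ordinary field product computed by going through the Montgomery domain."""
    r2 = poly_mod(1 << (2 * n), f)      # x^(2n) mod f
    a_m = mont_mul(a, r2, f, n)         # a * x^n
    b_m = mont_mul(b, r2, f, n)         # b * x^n
    c_m = mont_mul(a_m, b_m, f, n)      # a * b * x^n
    return mont_mul(c_m, 1, f, n)       # strip the factor x^n
```

In GF(2^n) the reduction step needs no carry propagation and no final comparison, which is one reason the binary-field datapath tends to be simpler than the GF(p) one.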
TL;DR: In this article, a low-cost coprocessor for smartcards is presented which supports all necessary mathematical operations for a fast calculation of the Elliptic Curve Digital Signature Algorithm (ECDSA) based on the finite field GF(2^m).
Abstract: In this article we present a low-cost coprocessor for smartcards which supports all necessary mathematical operations for a fast calculation of the Elliptic Curve Digital Signature Algorithm (ECDSA) based on the finite field GF(2^m). These ECDSA operations are GF(2^m) addition, 4-bit digit-serial multiplication in GF(2^m), inversion in GF(2^m), and inversion in GF(p). An efficient implementation of the multiplicative inversion which breaks the 11:1 limit regarding multiplications makes it possible to use affine instead of projective coordinates for point operations on elliptic curves. A bitslice architecture allows an easy adaptation for different bit lengths. A small chip area is achieved by reusing the hardware registers for different operations.
TL;DR: In this article, the authors describe a switch box (S-box) implementation in a cryptographic application, where an input to the S-box is converted from a Galois field representation GF(N^2) to a Galois subfield representation GF(N)^2 using a generating polynomial of the form x^2+Ax+B, where A and B are elements in GF(N) and A has a value other than unity.
Abstract: Methods and systems for implementing a switch box (S-box) in a cryptographic application are described. An input to the S-box is converted from a Galois field representation GF(N^2) to a Galois subfield representation GF(N)^2. The input is converted using a generating polynomial of the form x^2+Ax+B, where A and B are elements in GF(N) and where A has a value other than unity. The multiplicative inverse of the Galois subfield representation GF(N)^2 is determined. The multiplicative inverse is converted back to the Galois field representation GF(N^2). An affine transformation of the multiplicative inverse is then performed.
TL;DR: In this paper, the authors present a new sequential normal basis multiplier over GF(2^m) for m = 163, 233, 283, 409, 571, the five fields recommended by NIST for elliptic curve cryptography.
Abstract: We present a new sequential normal basis multiplier over GF(2^m). The gate complexity of our multiplier is significantly reduced from that of Agnew et al. and is comparable to that of Reyhani-Masoleh and Hasan, which is the lowest complexity normal basis multiplier of the same kind. On the other hand, the critical path delay of our multiplier is the same as that of Agnew et al. Therefore it is expected to have a critical path delay shorter than or equal to that of Reyhani-Masoleh and Hasan. Moreover, our method of using a Gaussian normal basis makes it easy to find a basic multiplication table of normal elements. So one can easily construct a circuit array for large finite fields GF(2^m) where m = 163, 233, 283, 409, 571, i.e., the five fields recommended by NIST for elliptic curve cryptography.
TL;DR: In this article, the first general multiplication algorithm in GF(2^k) with subquadratic area complexity of O(k) = O (k) was proposed, based on Montgomery's multiplication applied to the ring formed by the direct product of the trinomials.
Abstract: We propose the first general multiplication algorithm in GF(2^k) with a subquadratic area complexity of O(k) = O(k). Using the Chinese Remainder Theorem, we represent the elements of GF(2^k), i.e., the polynomials in GF(2)[X] of degree at most k − 1, by their remainder modulo a set of n pairwise prime trinomials, T1, . . . , Tn, of degree d and such that nd ≥ k. Our algorithm is based on Montgomery’s multiplication applied to the ring formed by the direct product of the trinomials.
TL;DR: This work presents an algorithm and architecture that integrates modular division and multiplication in both GF(p) and GF(2^n) fields (unified) and uses carry-save unified adders for reduced critical path delay, making the proposed architecture faster than other previously proposed designs.
Abstract: This work presents an algorithm and architecture that integrates modular division and multiplication in both GF(p) and GF(2^n) fields (unified). The algorithm is based on the extended binary GCD algorithm for modular division and on Montgomery's method for modular multiplication. For the division operation, the proposed algorithm uses a counter to keep track of the difference between two field elements and in this way eliminates the need for comparisons, which are usually expensive and time-consuming. The proposed architecture efficiently supports all the operations in the algorithm and uses carry-save unified adders for reduced critical path delay, making the proposed architecture faster than other previously proposed designs. Experimental results using synthesis for AMI 0.5 μm CMOS technology are shown and compared with other dividers and multipliers.
TL;DR: A unified algorithm to compute modular division in both GF(p) and GF(2^n) fields is presented, which uses a counter variable to keep track of the difference between two field elements, and eliminates the need for comparisons.
Abstract: A unified algorithm to compute modular division in both GF(p) and GF(2^n) fields is presented. It uses a counter variable to keep track of the difference between two field elements, and in this way eliminates the need for comparisons which are usually expensive and time-consuming. The computations in both fields are performed using additions/subtractions and bit shifts, besides using a simple control flow, which makes it suitable for hardware implementation.
TL;DR: An application of reconfigurable computers to developing a low-latency implementation of Elliptic Curve Cryptosystems, an emerging class of public key cryptosystems used in secure Internet protocols, such as IPSec, is presented.
Abstract: Reconfigurable Computers are general-purpose high-end computers based on a hybrid architecture and close system-level integration of traditional microprocessors and Field Programmable Gate Arrays (FPGAs). In this paper, we present an application of reconfigurable computers to developing a low-latency implementation of Elliptic Curve Cryptosystems, an emerging class of public key cryptosystems used in secure Internet protocols, such as IPSec. An issue of partitioning the description between C and VHDL, and the associated trade-offs are studied in detail. End-to-end speed-ups in the range of 895 to 1300 compared to the pure microprocessor execution time are demonstrated.
TL;DR: In this paper, an efficient FPGA implementation for modular multiplication in the finite field GF(2^m) that is suitable for implementing Elliptic Curve Cryptosystems is presented.
Abstract: This paper describes an efficient FPGA implementation for modular multiplication in the finite field GF(2^m) that is suitable for implementing Elliptic Curve Cryptosystems. We have developed a systolic array implementation of a Montgomery modular multiplication. Our solution is efficient for large finite fields (m=160-193), that offer a high security level, and it can be scaled easily to larger values of m. The clock frequency of the implementation is independent of the field size. In contrast to earlier work, the design is not restricted to field representations using irreducible trinomials, all-one polynomials or equally spaced polynomials.
TL;DR: This paper presents the design of a generic parallel finite-field GF(2^m) multiplier targeted at DSP and embedded processors that can take different primitive polynomials as an input, making it programmable.
Abstract: Block (cyclic) channel coding standards for third generation cellular networks require the implementation of high-performance burst-error detection and correction algorithms. Galois field (GF) arithmetic is commonly used in this architecture for encoding and decoding error codes; however, many architectures still do not support dedicated functional units. This paper presents the design of a generic parallel finite-field GF(2^m) multiplier targeted at DSP and embedded processors. As opposed to previous research, this design can take different primitive polynomials as an input, making it programmable. Moreover, a design is presented that is a combined binary and finite-field GF(2^m) multiplier. Area, delay, and power dissipation results are presented from several ASIC libraries.
TL;DR: It is shown that reconfigurability with the reduction polynomial significantly benefits from the addition of a low latency divider unit and scalar point multiplication in affine coordinates.
Abstract: This paper focuses on designing elliptic curve crypto-accelerators in GF(2^m) that are cryptographically scalable and hold some degree of reconfigurability. Previous work in elliptic curve crypto-accelerators focused on implementations using projective coordinate systems for specific field sizes. Their performance, scalar point multiplication per second (kP/s), was determined primarily by the underlying multiplier implementation. In addition, a multiplier only implementation and a multiplier plus divider implementation are compared in terms of critical path, area and area time (AT) product. Our multiplier only design, designed for high performance, can achieve 6314 kP/s for GF(2^571) and requires 47876 LUTs. Meanwhile our multiplier and divider design, with a greater degree of reconfigurability, can achieve 44 kP/s for GF(2^571). However, this design requires 27355 LUTs, and has a significantly higher AT product. It is shown that reconfigurability with the reduction polynomial significantly benefits from the addition of a low latency divider unit and scalar point multiplication in affine coordinates. In both cases the performance is limited by a critical path in the control logic.
TL;DR: A universal VLSI architecture for bit-parallel computation in GF(2^m) is presented, based on the Montgomery multiplication algorithm, which is suitable for multiple classes of GF(2^m) with arbitrary field degree m.
Abstract: A universal VLSI architecture for bit-parallel computation in GF(2^m) is presented. The proposed architecture is based on the Montgomery multiplication algorithm, which is suitable for multiple classes of GF(2^m) with arbitrary field degree m. Due to its highly regular and modular structure, our proposed universal architecture can meet VLSI design requirements. Implemented in a 0.18 μm 1P6M process, our universal architecture works successfully at a 125 MHz clock rate. For the finite field multiplier, the total gate count is 14K for GF(2^m) with any irreducible polynomial of field degree m ≤ 8, whereas the inverse operation can be achieved by the control unit with a gate count of 0.3K.
TL;DR: A generalization of the concept of scalable and unified architectures for multiplication in GF(p) and GF(2^m) that is adjustable for the silicon area available, and does not limit the precision of the operands (variable precision).
Abstract: The design of multiplication units that are reusable and scalable is of interest for cryptographic applications, where the operand size in bits is usually large, and may significantly change depending on the required level of security or the specific cryptosystem (e.g., RSA or Elliptic Curve). The use of the Montgomery multiplication (MM) method combined with techniques for time and space scheduling generates efficient and general solutions in this arena. MM has proven to be useful in both GF(p) and GF(2^m), and opened up the door for unified architectures designed to accommodate both fields. The scalable design does not rely on particular characteristics of the fields, it is adjustable for the silicon area available, and it does not limit the precision of the operands (variable precision). This way, the design lasts longer. This paper presents a generalization of the concept of scalable and unified architectures for multiplication in GF(p) and GF(2^m). A design framework is initially presented, and followed by a design example of a radix-8 processing element for a scalable and unified MM architecture. Experimental results show the potential of this method.
TL;DR: A very large-scale integration implementation of Galois field arithmetic for high-speed error-control coding applications that is based on the field GF(p^m) with m a small integer such as 2 or 3 and p a prime of sufficient value to generate the required field size.
Abstract: This paper presents a very large-scale integration implementation of Galois field arithmetic for high-speed error-control coding applications that is based on the field GF(p^m) with m a small integer such as 2 or 3 and p a prime of sufficient value to generate the required field size. In this case, the Galois field arithmetic operations of addition, multiplication, and inversion are based on architectures using blocks that perform integer arithmetic modulo p. These integer arithmetic operations modulo p have previously been implemented with low delay-power products through the use of one-hot coding and barrel shifter circuits based on transistor arrays. In this paper, the same one-hot coding and barrel shifter circuits are used to construct circuits that implement addition, multiplication, and inversion over GF(p^m). The circuits for GF(p^m) addition and multiplication with p ≠ 2 achieve a lower power-delay product than designs based on GF(2^m). Also, the architecture for GF(p^m) inversion can be efficiently implemented when m=2 or m=3.
TL;DR: In this paper two Finite Field multiplier architectures and VLSI implementations are proposed using the Montgomery Multiplication Algorithm and have more than adequate results in comparison with other known multipliers.
Abstract: Finite Field arithmetic is becoming an increasingly prominent solution for calculations in many applications. The most demanding Finite Field arithmetic operation is multiplication. In this paper two Finite Field multiplier architectures and VLSI implementations are proposed using the Montgomery Multiplication Algorithm. The first architecture (Folded) is optimized in order to minimize the silicon covered area (gate count) and the second (Pipelined) is optimized in order to reduce the multiplication time delay. Both architectures are measured in terms of gate count-chip covered area and multiplication time delay and have more than adequate results in comparison with other known multipliers.
TL;DR: A new architecture for an arithmetic unit (AU) for applications that operate over GF(2^m), in particular elliptic curve cryptography, which offers potentially large improvements when considering the area-time product and, therefore, improved efficiency.
Abstract: This paper proposes a new architecture for an arithmetic unit (AU) for applications that operate over GF(2^m), in particular elliptic curve cryptography. The AU is completely scalable, enabling it to operate over any field degree without the need to reconfigure hardware. Operands are considered as a series of w-bit words, where w can be set to meet design requirements. By transferring the complexity of control to software, whilst retaining the generic functions of division and multiplication in hardware, a low area, highly flexible implementation can be attained. A proof-of-concept AU was implemented and tested in FPGA. Theoretical results were calculated for scalar multiplication, which were compared to a less scalable implementation. Though the AU cannot achieve the computational speed attained by the other implementation, it offers potentially large improvements when considering the area-time product and, therefore, improved efficiency.
TL;DR: It is shown here that if a matroid is not representable over GF(5), then this can be verified by a short proof; here a "short proof" is a proof whose length is bounded by some polynomial in the number of elements of the matroid.
TL;DR: This work proposes a new class of combinatorially developed codes obtained by properly combining Reed-Solomon type parity-check matrices and sparse parity- check matrices based on permutation matrices, and introduces a new decoding algorithm based on matrix representations of the underlying field, which trades performance for complexity.
Abstract: It is well known that random-like low-density parity-check (LDPC) codes over the extension fields GF(2^m) of GF(2), for m>1, tend to outperform their binary counterparts of comparable length and rate. At the same time, structured LDPC codes offer the advantage of reduced implementation and storage complexity, so that it is of interest to investigate mathematical design methods for codes on graphs over fields of large order. We propose a new class of combinatorially developed codes obtained by properly combining Reed-Solomon (RS) type parity-check matrices and sparse parity-check matrices based on permutation matrices. The proposed codes have large girth and minimum distance. In order to further reduce the decoding complexity of the proposed scheme, we introduce a new decoding algorithm based on matrix representations of the underlying field, which trades performance for complexity. The particular field representation described in this abstract is based on a power basis generated by a companion matrix of a primitive polynomial of the field GF(2^m). It is observed that the choice of the primitive polynomial influences the cycle distribution of the code graph.
TL;DR: The proposed algorithm reduces the number of multiplications required to compute the multiplicative inversion by precomputing the inversion in GF(2^n) for a small value of n and then decomposing m-1 into several factors and a small remainder.
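As background for why decomposing m-1 matters, the following is a minimal sketch of the classical Itoh-Tsujii inversion, which computes a^(-1) = a^(2^m - 2) with an addition chain on m-1. It is the well-known baseline, not the precomputation-based algorithm summarized above, whose details the TL;DR does not give. The polynomial-basis representation and function names are assumptions for this example.

```python
def gf2m_mul(a: int, b: int, f: int, m: int) -> int:
    """Shift-and-add multiplication in GF(2^m) with irreducible polynomial f."""
    result = 0
    for i in range(m):
        if (b >> i) & 1:
            result ^= a
        a <<= 1
        if (a >> m) & 1:
            a ^= f
    return result


def itoh_tsujii_inverse(a: int, m: int, f: int) -> int:
    """Invert a nonzero element via beta_k = a^(2^k - 1), built up until k = m - 1."""
    if a == 0:
        raise ZeroDivisionError("zero has no inverse")
    beta, k = a, 1                      # beta_1 = a
    for bit in bin(m - 1)[3:]:          # bits of m-1 after the leading one
        # Doubling step: beta_2k = (beta_k)^(2^k) * beta_k (k squarings, 1 multiply).
        t = beta
        for _ in range(k):
            t = gf2m_mul(t, t, f, m)
        beta = gf2m_mul(t, beta, f, m)
        k *= 2
        if bit == "1":
            # Increment step: beta_(k+1) = (beta_k)^2 * a.
            beta = gf2m_mul(gf2m_mul(beta, beta, f, m), a, f, m)
            k += 1
    return gf2m_mul(beta, beta, f, m)   # (a^(2^(m-1) - 1))^2 = a^(2^m - 2)
```

The number of general multiplications depends on the chain chosen for m-1 (squarings are comparatively cheap in hardware), so a different decomposition of m-1 changes the multiplication count.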
TL;DR: Numerical results indicate that a combination of symbol-interleaving and the EF decoding offers the best performance even for imperfect interleaving, which can compensate for the superiority of perfect symbol-interleaving to perfect bit-interleaving addressed by Wicker (1992).
Abstract: We analytically compare the performance of imperfectly symbol- and bit-interleaved block codes over GF(2^m) on a first-order Markovian channel in terms of the error probability of a received word. The analytical method developed in Sakakibara (2000) is extended, so that binary transmission of block codes over GF(2^m) can be incorporated with the assumption of negligible probabilities of decoding error. Expressions are derived for two decoding strategies: independent bounded-distance (IBD) decoding and error-forecasting (EF) decoding. In the IBD decoding, channel errors up to half of the minimum distance can be decoded in each received word. On the other hand, combining an erasures-and-errors decoding algorithm, more errors may be corrected in the EF decoding. The derived expressions are examined on two typical classes of two-state Markovian channels. Numerical results indicate that a combination of symbol-interleaving and the EF decoding offers the best performance even for imperfect interleaving. This can compensate for the superiority of perfect symbol-interleaving to perfect bit-interleaving addressed by Wicker (1992). It is also found that the optimum depths of the symbol- and bit-interleavers are approximately given by 2b and 4b, respectively, where b is the average length of burst errors in bits.
TL;DR: A new construction method of bit-parallel multipliers over GF(2^m) for two classes of finite fields is presented and it is proved that the method reduces the area requirements of the multipliers with respect to other similar multipliers.
Abstract: Galois fields GF(2^m) are used in a wide number of applications such as cryptography, digital signal processing and error-control codes. The multiplication is considered the most important and one of the most complex GF(2^m) operations, so efficient multiplier architectures are highly desired. A new construction method of bit-parallel multipliers over GF(2^m) for two classes of finite fields is presented. Our approach determines groups of subexpressions that can be shared among the product coordinates. General expressions are given, and the theoretical complexity analysis proves that our multipliers reduce the best time complexities known to date. The multipliers have been implemented on Xilinx Virtex FPGAs. The experiments prove that our method reduces the area requirements of the multipliers with respect to other similar multipliers.
TL;DR: New variants of MMH and SQUARE universal hash function families over the finite field (Galois field) GF(2^n) can be utilized to provide efficient and secure message authentication.
Abstract: This paper proposes variants of MMH and SQUARE universal hash function families over the finite field (Galois field) GF(2^n). These new variants are suited for implementation on platforms where there are no built-in specialized algorithms for modular multiplication. They are especially suited to platforms that already have an embedded GF(2^n)-based cryptosystem. These variants can be utilized to provide efficient and secure message authentication.
TL;DR: The multiplier provides a fast and hardware-efficient architecture for multiplication of two elements in GF(2^m) for large m and has twice the throughput rate.
Abstract: We present new designs of low complexity and low latency systolic arrays for multiplication in GF(2^m) when there is an irreducible all-one polynomial (AOP) of degree m. Our proposed bit-parallel array has a reduced latency and hardware complexity compared with previously proposed designs. For cryptographic purposes, we derive a linear systolic array using our algorithm and show that our design has a latency of m/2+1 and a throughput rate of 1/(m/2+1). Compared with other linear systolic arrays, we find that our design has at least 50 percent lower hardware complexity and latency, and twice the throughput rate. Therefore our multiplier provides a fast and hardware-efficient architecture for multiplication of two elements in GF(2^m) for large m.
TL;DR: The ringed bit-parallel systolic multiplier over the class of GF(2^m) is free of global connections and requires fewer gates and input pins than the other related multipliers proposed in Liu et al. (2000).
TL;DR: A general algorithm to design fast parallel multipliers in any basis over GF(2^m), avoiding any basis-dependent procedure or "ad hoc" optimization, as usually proposed in the literature, is presented.
Abstract: We present a general algorithm to design fast parallel multipliers in any basis over GF(2^m), avoiding any basis-dependent procedure or "ad hoc" optimization, as usually proposed in the literature. Although the total number of gates is not guaranteed to be the absolute minimum, the algorithm is aimed at minimizing the number of XOR gates, reaching the minimum for the AND gate number. For the sake of comparison, lower and upper bounds to space and time complexities have been explicitly evaluated. As a significant example, for several m of practical interest, the algorithm has been applied to Gaussian normal basis parallel multipliers.
TL;DR: A new architecture is presented that can simultaneously process modular multiplication and squaring using the Montgomery algorithm over GF(2^m) in m clock cycles based on cellular automata, and can be utilized efficiently for the implementation of VLSI.
Abstract: Exponentiation in the Galois Field GF(2^m) is a primary operation for public key cryptography, such as the Diffie-Hellman key exchange and ElGamal. The current paper presents a new architecture that can simultaneously process modular multiplication and squaring using the Montgomery algorithm over GF(2^m) in m clock cycles based on cellular automata. The proposed architecture makes use of common-multiplicand multiplication in LSB-first modular exponentiation over GF(2^m). In addition, modular exponentiation, division, and inversion architecture can also be implemented, and since cellular automata architecture is simple, regular, modular, and cascadable, it can be utilized efficiently for the implementation of VLSI.
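To make the LSB-first structure concrete, here is a minimal software sketch of right-to-left (LSB-first) exponentiation in GF(2^m). It uses plain shift-and-add multiplication rather than the Montgomery form or the cellular-automata hardware of the paper. The function names and the polynomial-basis integer representation are assumptions for this example.

```python
def gf2m_mul(a: int, b: int, f: int, m: int) -> int:
    """Shift-and-add multiplication in GF(2^m) with irreducible polynomial f."""
    result = 0
    for i in range(m):
        if (b >> i) & 1:
            result ^= a
        a <<= 1
        if (a >> m) & 1:
            a ^= f
    return result


def gf2m_pow_lsb_first(a: int, e: int, f: int, m: int) -> int:
    """Compute a^e by scanning the exponent from its least significant bit."""
    result, base = 1, a
    while e:
        # Both products below share the multiplicand `base`, so hardware can
        # evaluate the multiplication and the squaring in the same cycles.
        if e & 1:
            result = gf2m_mul(result, base, f, m)
        base = gf2m_mul(base, base, f, m)
        e >>= 1
    return result
```

Because `result * base` and `base * base` in one iteration do not depend on each other, they can run in parallel, which is the property the common-multiplicand approach exploits.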