TL;DR: This paper presents a transformation method to implement low-complexity Montgomery multipliers for all-one polynomials and trinomials that are highly appropriate for VLSI systems because of their regular interconnection pattern, modular structure, and fully inherent parallelism.
Abstract: Recently, cryptographic applications based on finite fields have attracted much interest. This paper presents a transformation method to implement low-complexity Montgomery multipliers for all-one polynomials and trinomials. Using this method, we propose a new bit-parallel systolic architecture for computing multiplications over GF(2^m). These new multipliers have a latency of m+1 clock cycles, and each cell incorporates at most one 2-input AND gate, two 2-input XOR gates, and four 1-bit latches. Moreover, these new multipliers are shown to exhibit significantly lower latency and circuit complexity than the related systolic multipliers and are highly appropriate for VLSI systems because of their regular interconnection pattern, modular structure, and fully inherent parallelism.
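As a rough software illustration of Montgomery multiplication in GF(2^m) (not the systolic architecture of the paper), the following sketch computes a·b·x^(-m) mod f one bit at a time. Polynomials are stored as Python integers, and the function names and the choice of the all-one polynomial x^4 + x^3 + x^2 + x + 1 are assumptions made for the example.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less (GF(2)[x]) product; bit i of an int is the coefficient of x^i."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def poly_mod(a: int, f: int) -> int:
    """Remainder of a divided by f over GF(2)."""
    df = f.bit_length() - 1
    while a.bit_length() - 1 >= df:
        a ^= f << (a.bit_length() - 1 - df)
    return a


def mont_mul(a: int, b: int, f: int, m: int) -> int:
    """Return a * b * x^(-m) mod f, assuming deg f = m, f(0) = 1, and a, b of degree < m."""
    c = 0
    for i in range(m):
        if (a >> i) & 1:
            c ^= b        # accumulate a_i * b
        if c & 1:
            c ^= f        # clear the constant term so c is divisible by x
        c >>= 1           # exact division by x
    return c


# Hypothetical parameters: the all-one polynomial x^4 + x^3 + x^2 + x + 1.
M, F = 4, 0b11111
R2 = poly_mod(1 << (2 * M), F)   # x^(2m) mod f, used to enter the Montgomery domain


def to_mont(a: int) -> int:
    return mont_mul(a, R2, F, M)     # a * x^m mod f


def from_mont(a: int) -> int:
    return mont_mul(a, 1, F, M)      # a * x^(-m) mod f


a, b = 0b0110, 0b1011
product = from_mont(mont_mul(to_mont(a), to_mont(b), F, M))
print(product == poly_mod(clmul(a, b), F))
```

The division by x is exact in every iteration because f has constant term 1, which is why no trial division is needed; the same loop works for a trinomial modulus such as x^4 + x + 1.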
TL;DR: Based on a new representation of GF(2^n), the authors present two multipliers for all irreducible trinomials whose space complexities match the best known results.
Abstract: Based on a new representation of GF(2^n), we present two multipliers for all irreducible trinomials. Space complexities of the multipliers match the best results. The time complexity of one multiplier is T_A + (1 + [log_2 n])T_X for all irreducible trinomials, where T_A and T_X are the delays of one 2-input AND gate and one 2-input XOR gate, respectively.
TL;DR: Analysis shows that the computational delay time of the proposed architecture is significantly less than the previously proposed digit-serial systolic multiplier, and since the new architecture has the features of regularity, modularity, and unidirectional data flow, it is well suited to VLSI implementation.
Abstract: In this paper, an efficient digit-serial systolic array is proposed for multiplication in the finite field GF(2^m) using the standard basis representation. From the least significant bit first multiplication algorithm, we obtain a new dependence graph and design an efficient digit-serial systolic multiplier. If input data come in continuously, the proposed array can produce multiplication results at a rate of one every ⌈m/L⌉ clock cycles, where L is the selected digit size. Analysis shows that the computational delay time of the proposed architecture is significantly less than the previously proposed digit-serial systolic multiplier. Furthermore, since the new architecture has the features of regularity, modularity, and unidirectional data flow, it is well suited to VLSI implementation.
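The least-significant-bit-first algorithm mentioned above can be modeled in software as follows. This is only a behavioral sketch under assumed names (`digit_serial_mul` is hypothetical), not the dependence graph or systolic array of the paper: it consumes L bits of one operand per outer step, so the number of outer steps is ⌈m/L⌉.

```python
def digit_serial_mul(a: int, b: int, f: int, m: int, L: int):
    """LSB-first product a*b mod f over GF(2), consuming L bits of b per step.

    Polynomials are ints (bit i = coefficient of x^i); f has degree m.
    Returns (product, number_of_steps), where the step count is ceil(m / L).
    """
    c = 0
    steps = -(-m // L)                      # ceil(m / L)
    for s in range(steps):
        digit = (b >> (s * L)) & ((1 << L) - 1)
        for j in range(L):                  # modeled here as L sequential bit-steps
            if (digit >> j) & 1:
                c ^= a                      # add b_i * (a * x^i mod f)
            a <<= 1                         # a <- a * x
            if (a >> m) & 1:
                a ^= f                      # reduce back below degree m
    return c, steps


# Example with the assumed field polynomial x^4 + x + 1 and digit size 2.
result, steps = digit_serial_mul(0b0010, 0b1000, 0b10011, 4, 2)
print(bin(result), steps)
```

Changing L trades the number of steps against the amount of work per step; the product itself does not depend on L.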
TL;DR: A generalized and optimized variant of EEA is proposed which can compute division, and multiplicative inversion as its subset, with divisor in either polynomial or triangular basis representation and it is shown that assuming the requirements specified above, this proposed architecture may achieve a higher clock rate performance w.r.t. other designs.
Abstract: Systolic architectures are capable of achieving high throughput by maximizing pipelining and by eliminating global data interconnects. Recursive algorithms with regular data flows are suitable for systolization. The computation of multiplicative inversion using algorithms based on the EEA (Extended Euclidean Algorithm) is particularly suitable for systolization. Implementations based on the EEA present a high degree of parallelism and pipelinability at the bit level, which can be easily optimized to achieve local data flow and to eliminate the global interconnects that represent the most important bottleneck in today's sub-micron design process. The net result is a high clock rate and performance based on efficient systolic architectures. This thesis examines high-performance but also scalable implementations of multiplicative inversion or field division over Galois fields GF(2^m) in the specific case of cryptographic applications where the field dimension m may be very large (greater than 400) and either m or the defining irreducible polynomial may vary. For this purpose, many inversion schemes with different basis representations are studied and, most importantly, variants of the EEA and binary (Stein's) GCD computation implementations are reviewed. A set of common as well as contrasting characteristics of these variants is discussed. As a result, a generalized and optimized variant of the EEA is proposed which can compute division, and multiplicative inversion as its subset, with the divisor in either polynomial or triangular basis representation. Further results regarding Hankel matrix formation for double-basis inversion are provided. The validity of using the same architecture to compute field division with polynomial or triangular basis representation is proved. Next, a scalable unidirectional bit-serial systolic array implementation of this proposed variant of the EEA is presented. Its complexity measures are defined and compared against the best known architectures.
It is shown that, assuming the requirements specified above, the proposed architecture may achieve a higher clock rate than other designs while being more flexible and reliable and requiring a minimum number of intercell interconnects. The main contribution at the system architecture level is the substitution of all counter or adder/subtractor elements with simpler distributed structures that are free of carry-propagation delay. Further, a novel restoring mechanism for the result sequences of the EEA is proposed.
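For orientation, a textbook polynomial-basis version of EEA-style inversion over GF(2^m) can be sketched as below. It is a plain software variant and not the generalized algorithm or the systolic design proposed in the thesis; the function name and the example polynomial x^4 + x + 1 are assumptions.

```python
def gf2m_inverse(a: int, f: int) -> int:
    """Inverse of a modulo the irreducible polynomial f over GF(2).

    Polynomials are ints (bit i = coefficient of x^i); a must be nonzero
    and of lower degree than f.
    """
    if a == 0:
        raise ValueError("zero has no multiplicative inverse")
    u, v = a, f
    g1, g2 = 1, 0            # invariants: a*g1 = u (mod f) and a*g2 = v (mod f)
    while u != 1:
        j = u.bit_length() - v.bit_length()
        if j < 0:            # keep deg(u) >= deg(v) by swapping the two pairs
            u, v = v, u
            g1, g2 = g2, g1
            j = -j
        u ^= v << j          # cancel the leading term of u
        g1 ^= g2 << j        # apply the same step to the cofactor
    return g1


# Assumed example field: GF(2^4) defined by x^4 + x + 1.
print(bin(gf2m_inverse(0b0010, 0b10011)))   # inverse of x
```

Only shifts and XORs are used, so there are no carry chains; the data-dependent swap is the part that hardware designs of this kind have to regularize.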
TL;DR: This is the first paper that quantifies performance of standard NIST and SECG elliptic curves over GF(2^m) on an 8-bit microprocessor equipped with a dual field multiplier.
Abstract: We describe and analyze architectural extensions to accelerate the public key cryptosystem elliptic curve cryptography (ECC) on 8-bit microprocessors. We show that simple extensions of the data path suffice to efficiently support ECC over GF(2^m). These extensions include an extended multiplier that generates results for both integer multiplications and multiplications in fields GF(2^m) and a multiply-accumulate instruction for efficiently performing multiple precision multiplications. To our knowledge, this is the first paper that quantifies performance of standard NIST and SECG elliptic curves over GF(2^m) on an 8-bit microprocessor equipped with a dual field multiplier. On the ATmega128 microprocessor running at 8 MHz we measured an execution time of 0.29 s for a 163-bit ECC point multiplication over GF(2^m), 0.81 s for a 160-bit ECC point multiplication over GF(p), and 11 s for a 1024-bit RSA private key operation - the chosen key sizes provide equivalent security strength.
TL;DR: The main idea is to combine the redundant representation and the Karatsuba method to design an efficient bit-parallel multiplier for the finite field GF(2^m) defined by an irreducible all-one polynomial.
Abstract: This paper presents a new bit-parallel multiplier for the finite field GF(2^m) defined by an irreducible all-one polynomial. In order to reduce the complexity of the multiplier, we introduce a redundant representation and use the well-known multiplication method proposed by Karatsuba. The main idea is to combine the redundant representation and the Karatsuba method to design an efficient bit-parallel multiplier. As a result, the proposed multiplier requires about 25 percent fewer AND/XOR gates than the previously proposed multipliers using an all-one polynomial, while it has almost the same time delay as the previously proposed ones.
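The combination described above can be illustrated in software, with the caveat that this is a word-level sketch and not the gate-level multiplier of the paper. It relies on the fact that the all-one polynomial p(x) = 1 + x + ... + x^m divides x^(m+1) + 1, so a product can first be folded modulo x^(m+1) + 1 (the redundant representation) and only then brought to canonical form. The function names and the example with m = 4 are assumptions.

```python
def clmul(a: int, b: int) -> int:
    """Schoolbook carry-less product over GF(2), kept as a reference."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def karatsuba_gf2(a: int, b: int, n: int) -> int:
    """Carry-less product of two polynomials with at most n coefficients each."""
    if n <= 1:
        return a & b
    h = n // 2
    mask = (1 << h) - 1
    a0, a1 = a & mask, a >> h
    b0, b1 = b & mask, b >> h
    lo = karatsuba_gf2(a0, b0, h)
    hi = karatsuba_gf2(a1, b1, n - h)
    # Three sub-products instead of four; over GF(2) subtraction is XOR.
    mid = karatsuba_gf2(a0 ^ a1, b0 ^ b1, n - h) ^ lo ^ hi
    return lo ^ (mid << h) ^ (hi << (2 * h))


def aop_mul(a: int, b: int, m: int) -> int:
    """Multiply in GF(2^m) defined by the all-one polynomial of degree m."""
    n = m + 1
    prod = karatsuba_gf2(a, b, n)             # degree at most 2m
    mask = (1 << n) - 1
    folded = (prod & mask) ^ (prod >> n)      # uses x^(m+1) = 1
    if (folded >> m) & 1:
        folded ^= mask                        # add p(x) to clear the x^m term
    return folded


# Assumed example: m = 4, p(x) = x^4 + x^3 + x^2 + x + 1.
print(bin(aop_mul(0b0010, 0b1000, 4)))        # x * x^3
```

The fold is a pure cyclic wrap with no XOR network of its own, which is what makes the redundant representation attractive for all-one polynomials.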
TL;DR: A class of universal unidirectional bit-serial systolic architectures for multiplicative inversion and division over the Galois field GF(2^m) is presented, and the field elements are represented with the polynomial (standard) basis.
Abstract: A class of universal unidirectional bit-serial systolic architectures for multiplicative inversion and division over the Galois field GF(2^m) is presented. The field elements are represented with the polynomial (standard) basis. These systolic architectures have no carry propagation structures and are suitable for hardware implementations where the dimension of the field is large and may vary. This is the typical case for cryptographic applications. These architectures are independent of any defining irreducible polynomial of a given degree as well. The time complexity is constant and area complexity is linear (w.r.t. field dimension) and these measures are equivalent to or exceed similar proposed designs.
TL;DR: This work proposes the first general multiplication algorithm in GF(2^k) with a subquadratic area complexity of O(k^(8/5)) = O(k^1.6) using the Chinese remainder theorem.
Abstract: We propose the first general multiplication algorithm in GF(2^k) with a subquadratic area complexity of O(k^(8/5)) = O(k^1.6). Using the Chinese remainder theorem, we represent the elements of GF(2^k), i.e., the polynomials in GF(2)[X] of degree at most k-1, by their remainders modulo a set of n pairwise prime trinomials, T_1,...,T_n, of degree d and such that nd ≥ k. Our algorithm is based on Montgomery's multiplication applied to the ring formed by the direct product of the trinomials.
TL;DR: A multiplexer-based algorithm for double-exponentiation in GF(2^m) that requires only m multiplications and reduces time complexity by about 66% compared with the ordinary binary method.
TL;DR: This paper deals with an FPGA implementation of an efficient serial multiplier over the binary extension fields GF(2^193) and GF(2^239), which are included among the ones recommended by NIST standards for Elliptic Curve Cryptography.
Abstract: Arithmetic operations over finite fields GF(2^m) are widely used in cryptography, error-correcting codes and signal processing. Multiplication is especially relevant, since other arithmetic operators, such as division or exponentiation, usually use multipliers as building blocks. Hardware implementation of field multiplication may provide a great speedup that easily exceeds the performance observed on software platforms. In this paper we deal with an FPGA implementation of an efficient serial multiplier over the binary extension fields GF(2^193) and GF(2^239). Those extension fields are included among the ones recommended by NIST (National Institute of Standards and Technology) standards for Elliptic Curve Cryptography. Our multiplier is of type Serial/Parallel LSB-first and operates with a latency of m clock cycles, where m is the length of the field word. We calculate the space complexity in terms of the number of slices used in the FPGA.
TL;DR: The results in this paper help in deciding which kinds of primitive polynomials should be chosen and which should be discarded for cryptographic applications, and involve important theoretical identities in terms of t-nomial multiples which were not known earlier.
TL;DR: A new digit-serial systolic multiplier over GF(2^m) for cryptographic applications that has the features of regularity, modularity, and unidirectional data flow, and is well suited to VLSI implementations.
Abstract: This paper presents a new digit-serial systolic multiplier over GF(2^m) for cryptographic applications. When input data come in continuously, the proposed array produces multiplication results at a rate of one every [m/D] + 2 clock cycles, where D is the selected digit size. Since the inner structure of the proposed array is tree-type, the critical path increases logarithmically with D. Therefore, the computation delay of the proposed architecture is significantly less than that of previously proposed digit-serial systolic multipliers, whose critical path increases in proportion to D. Furthermore, since the new architecture has the features of regularity, modularity, and unidirectional data flow, it is well suited to VLSI implementations.
TL;DR: The new bit-serial and bit-parallel architectures proposed have the same throughput and latency but smaller hardware cost and shorter critical path delay than the best comparable architectures proposed previously.
Abstract: We propose in-place systolic bit-serial, bit-parallel, and folded bit-parallel architectures for inversion in GF(2^m). Our bit-serial architectures have the highest throughput 1/m of the three types but use more hardware than the other two types. Our bit-parallel architectures have throughput of 1/(2m − 1) with interleaved inputs and 1/(4m − 2) without interleaving. The new bit-serial and bit-parallel architectures proposed have the same throughput and latency but smaller hardware cost and shorter critical path delay than the best comparable architectures proposed previously. We also propose novel folded versions of our bit-parallel architectures which achieve 1/(4m − 2) non-interleaved throughput with even less hardware than our bit-parallel architectures. To the best of our knowledge, no comparable scheme has been proposed previously. The circuitry in each cell of our bit-serial architectures and the (folded and unfolded) bit-parallel architectures with distributed ring counters is the same for all values of m. Since there are no global control or data signals either, these architectures have excellent scalability properties and are very suitable for applications where m is large or variable. Implementation details using the TSMC Avanti 0.18 µm CMOS standard cell library are provided.
TL;DR: This paper presents ringed bit-parallel systolic multipliers for computing AB+C over a class of finite fields GF(2^m), in which all elements are represented using a root of an all-one polynomial or an equally spaced polynomial, and proposes a general rule to plan the multipliers.
TL;DR: In this paper, a Weierstrass semigroup of the point at infinity for the case q=2, r>=3 is calculated and a new record-giving [32,16,>=12]-code over GF(8) was presented.
Abstract: In this paper we study a generalization of the Hermitian function field proposed by A. Garcia and H. Stichtenoth. We calculate a Weierstrass semigroup of the point at infinity for the case q=2, r>=3. It turns out that, unlike the Hermitian case, there are already three generators for the semigroup. We then apply this result to codes constructed on generalized Hermitian function fields. Further, we apply results of C. Kirfel and R. Pellikaan to estimate a Feng-Rao designed distance for GH-codes, which improves on the Goppa designed distance. Next, we study the question of codes dual to GH-codes. We show that the duals are also GH-codes and give an explicit formula. We conclude with some computational results. In particular, a new record-giving [32,16,>=12]-code over GF(8) is presented.
TL;DR: A generalized version of the plus-minus algorithm is used for implementing dividers over GF(p^n), the results of FPGA implementations are reported, and a comparison is made between dividers in the general case and in the particular cases of GF(2^n) and GF(p).
Abstract: A generalized version of the plus-minus algorithm is used for implementing dividers over GF(p^n). Generic dividers have been synthesized in the general case of GF(p^n) and in the particular cases of GF(2^n) and GF(p). The theoretical costs are O(log N), where N is the number of field elements, and the theoretical computation times are O(log N) in the case of dividers over GF(p^n) and GF(2^n), and O((log N)^2) in the case of dividers over GF(p). Finally, the results of FPGA implementations are reported, and a comparison is made between dividers over GF(p), GF(2^n) and GF(p^n).
TL;DR: In this paper, a quaternary systolic product-sum computation circuit for GF((2^2)^m) using voltage-mode vMOSFETs is presented, which is composed of four basic cells connected in a pipelined fashion.
TL;DR: This work proposes a new modular multiplication algorithm for GF(P) which has a complexity of only n^2 + 7n; to the authors' knowledge, this is superior to the complexity values of any other modular multiplication algorithm for GF(P).
Abstract: The performance of today's public key cryptosystems depends mainly on the efficiency of the underlying finite field arithmetic, especially the modular multiplication. In this work we propose a new modular multiplication algorithm for GF(P) which has a complexity of only n^2 + 7n. To our knowledge this is superior to the complexity values of any other modular multiplication algorithm for GF(P).
TL;DR: This paper examines certain properties of the Itoh-Tsujii algorithm and elucidates alternative strategies that make it suitable for the emerging scenario of modular arithmetic operations with large word sizes.
Abstract: Modular arithmetic operations, especially modular multiplication, have extensive applications in elliptic curve cryptanalysis, error control coding and linear recurring sequences. The word size of these operations has grown steadily. Current designs and approaches may not be the most efficient for such large word sizes. Also, most approaches optimize for either area or speed, not both. In this paper, we examine certain properties of the Itoh-Tsujii algorithm (Guajardo and Paar, 2002) and elucidate alternative strategies that make it suitable for this emerging scenario. These strategies take a holistic approach to the problem and aim at optimizing both speed and area for a given word length. These claims are supported by mathematical analysis, simulation and synthesis of a prototype of the suggested strategy. We also examine various enhancements that can be effected in the given architecture.
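For readers unfamiliar with it, the basic Itoh-Tsujii idea is to compute the inverse in GF(2^m) as a^(2^m - 2), building the exponent 2^(m-1) - 1 with a short addition chain of squarings and multiplications. The sketch below shows that baseline only, not the alternative strategies studied in the paper; the helper names and the test polynomials are assumptions.

```python
def gf_mul(a: int, b: int, f: int, m: int) -> int:
    """Shift-and-add product in GF(2^m) with reduction polynomial f of degree m."""
    r = 0
    for i in range(m):
        if (b >> i) & 1:
            r ^= a
        a <<= 1
        if (a >> m) & 1:
            a ^= f
    return r


def gf_sqr_n(a: int, n: int, f: int, m: int) -> int:
    """Square a repeatedly n times, i.e. compute a^(2^n)."""
    for _ in range(n):
        a = gf_mul(a, a, f, m)
    return a


def itoh_tsujii_inverse(a: int, f: int, m: int) -> int:
    """Return a^(2^m - 2), the inverse of a nonzero a in GF(2^m), for m >= 2."""
    # beta holds a^(2^k - 1); the chain follows the binary digits of m - 1.
    beta, k = a, 1
    for bit in bin(m - 1)[3:]:          # digits after the leading 1
        beta = gf_mul(gf_sqr_n(beta, k, f, m), beta, f, m)   # k -> 2k
        k *= 2
        if bit == "1":
            beta = gf_mul(gf_sqr_n(beta, 1, f, m), a, f, m)  # k -> k + 1
            k += 1
    return gf_mul(beta, beta, f, m)     # (a^(2^(m-1) - 1))^2 = a^(2^m - 2)


# Assumed example: GF(2^8) with the polynomial x^8 + x^4 + x^3 + x + 1.
inv = itoh_tsujii_inverse(0x53, 0x11B, 8)
print(hex(inv), gf_mul(0x53, inv, 0x11B, 8))
```

The number of general multiplications grows only logarithmically with m, while the squarings dominate the count; in hardware or in a normal basis the squarings are cheap, which is the usual motivation for this method.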
TL;DR: A compact and efficient FPGA architecture for ECC over finite fields of even characteristic is presented and the implementation is balanced in order to increase the security w.r.t. simple side channel attacks.
Abstract: This paper proposes efficient algorithms for elliptic curve cryptography (ECC). As an example a compact and efficient FPGA architecture for ECC over finite fields of even characteristic is presented. The implementation is balanced in order to increase the security w.r.t. simple side channel attacks.
TL;DR: In this article, a technique for performing Galois field arithmetic to detect errors in digital data stored on disks is described, where two 12-bit or two 10-bit numbers are multiplied together using tower arithmetic.
Abstract: Techniques are provided for performing Galois field arithmetic to detect errors in digital data stored on disks. Two 12-bit numbers or two 10-bit numbers are multiplied together in a Galois field using tower arithmetic. In the 12-bit embodiment, a base field GF(2) is first extended to GF(2^3), GF(2^3) is extended to a first quadratic extension GF(2^6), and GF(2^6) is extended to a second quadratic extension GF(2^12). In the 10-bit embodiment, the base field GF(2) is first extended to GF(2^5), and GF(2^5) is extended to a quadratic extension GF(2^10). Each of the extensions for the 10-bit and 12-bit embodiments is performed using an irreducible polynomial. All of the polynomials used to generate the first and the second quadratic extensions of the Galois field are in the form x^2 + x + K, where K is an element of the ground field whose absolute trace equals 1.
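The first quadratic step of such a tower, from GF(2^3) to GF(2^6), can be sketched as follows. The cubic x^3 + x + 1 for the ground field, the search for K, and all names are assumptions made for this example; the source does not state which polynomials or constants are actually used.

```python
F3 = 0b1011   # assumed ground-field polynomial x^3 + x + 1


def mul8(a: int, b: int) -> int:
    """Product in GF(2^3) modulo x^3 + x + 1."""
    r = 0
    for i in range(3):
        if (b >> i) & 1:
            r ^= a
        a <<= 1
        if (a >> 3) & 1:
            a ^= F3
    return r


def trace8(a: int) -> int:
    """Absolute trace of a in GF(2^3): a + a^2 + a^4, which is 0 or 1."""
    a2 = mul8(a, a)
    a4 = mul8(a2, a2)
    return a ^ a2 ^ a4


# Pick some K with absolute trace 1, so that y^2 + y + K is irreducible.
K = next(k for k in range(1, 8) if trace8(k) == 1)


def mul64(a, b):
    """Product in GF(2^6) viewed as GF(2^3)[y] / (y^2 + y + K).

    Elements are pairs (hi, lo) standing for hi*y + lo.
    """
    a1, a0 = a
    b1, b0 = b
    hh = mul8(a1, b1)                            # coefficient of y^2
    y_coeff = hh ^ mul8(a1, b0) ^ mul8(a0, b1)   # y^2 = y + K feeds hh into y
    const = mul8(a0, b0) ^ mul8(hh, K)           # and hh*K into the constant
    return (y_coeff, const)


print(K, mul64((1, 0), (1, 0)))                  # y * y = y + K
```

Each GF(2^6) product costs a handful of GF(2^3) products; repeating the same construction one level up, with a trace-1 constant taken from GF(2^6), would give the 12-bit field.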
TL;DR: A new method to construct cyclic low-density parity-check codes over GF(2^m), where m > 0, is presented, which can achieve performance close to the sphere-packing bound constrained for binary transmission.
Abstract: Based on the ideas of cyclotomic cosets, idempotents and Mattson-Solomon polynomials, we present a new method to construct cyclic low-density parity-check codes over GF(2^m), where m > 0. The construction method produces the dual code idempotent which is used to define the parity-check matrix of the low-density parity-check code. An interesting feature of this construction method is the ability to increment the code dimension by adding more idempotents and so steadily decrease the sparseness of the parity-check matrix. We show that the constructed codes can achieve performance very close to the sphere-packing bound constrained for binary transmission.
TL;DR: A recursion formula and a searching algorithm are introduced by which one can find a certain number of ergodic matrices and generate an n×n ergodic matrix Q_g from only an n-dimensional vector g over GF(2^k).
Abstract: This paper discusses a class of matrices over GF(2^k) called "ergodic matrices". The analysis shows that ergodic matrices have a number of good features that can be applied to cryptography. To find the required ergodic matrices, the paper introduces a recursion formula and gives a searching algorithm, by which one can find a certain number of such matrices and generate an n×n ergodic matrix Q_g from only an n-dimensional vector g over GF(2^k). This makes it possible to express an n×n ergodic matrix with an n-dimensional vector, thereby saving storage and bandwidth.
TL;DR: A fast inversion algorithm over GF(2^m) with the polynomial basis representation is proposed that executes in about 27.5% or 45.6% fewer iterations than the extended binary gcd algorithm (EBGA) or the Montgomery inverse algorithm (MIA) over GF(2^163), respectively.
Abstract: The performance of public-key cryptosystems is mainly determined by the underlying finite field arithmetic. Among the basic arithmetic operations over finite fields, multiplicative inversion is the most time-consuming operation. In this paper, a fast inversion algorithm over GF(2^m) with the polynomial basis representation is proposed. The proposed algorithm executes in about 27.5% or 45.6% fewer iterations than the extended binary gcd algorithm (EBGA) or the Montgomery inverse algorithm (MIA) over GF(2^163), respectively. In addition, we propose a new hardware architecture suited to low-complexity systems. The proposed architecture takes approximately 48.3% or 24.9% fewer reduction operations than [4] or [8] over GF(2^239), respectively. Furthermore, it uses about 21.8% fewer addition operations than [8] over GF(2^163).
TL;DR: The proposed linear systolic arrays for multiplication in GF(2^m) for cryptographic applications using irreducible trinomials x^m + x^k + 1 have the features of regularity and modularity and are well suited to VLSI implementations.
Abstract: Many of the cryptographic schemes over small characteristic finite fields are efficiently implemented by using a trinomial basis. In this paper, we present new linear systolic arrays for multiplication in GF(2^m) for cryptographic applications using irreducible trinomials x^m + x^k + 1. It is shown that our multipliers with trinomial basis require approximately 20 percent fewer hardware resources than previously proposed linear systolic multipliers using general irreducible polynomials. The proposed linear systolic arrays have the features of regularity and modularity; therefore, they are well suited to VLSI implementations.
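One reason trinomials are attractive is that reduction modulo x^m + x^k + 1 needs only two shifted XORs per pass, since x^m can be replaced by x^k + 1. The following word-level sketch illustrates this under assumed names; it is a software model and not the linear systolic array of the paper.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less product over GF(2); bit i of an int is the coefficient of x^i."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def reduce_trinomial(c: int, m: int, k: int) -> int:
    """Reduce c modulo x^m + x^k + 1, with 0 < k < m."""
    mask = (1 << m) - 1
    while c >> m:
        high = c >> m                        # the part of degree >= m
        c = (c & mask) ^ high ^ (high << k)  # x^m -> x^k + 1 for every high term
    return c


def trinomial_mul(a: int, b: int, m: int, k: int) -> int:
    """Multiply in GF(2^m) defined by the trinomial x^m + x^k + 1."""
    return reduce_trinomial(clmul(a, b), m, k)


# Small assumed example: x^4 + x + 1, so x^6 reduces to x^3 + x^2.
print(bin(reduce_trinomial(1 << 6, 4, 1)))
# The same routine applied to the trinomial x^233 + x^74 + 1.
print(reduce_trinomial(1 << 233, 233, 74) == (1 << 74) | 1)
```

Each pass lowers the degree by at least m - k, so a product of two field elements needs very few passes when k is small relative to m.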
TL;DR: The paper focuses on the design of a new dual field divider that can achieve performance of 1/m throughput and is intended to be used in an elliptic curve crypto-accelerator for GF(2^m) and GF(p).
Abstract: The paper focuses on the design of a new dual field divider that can achieve performance of 1/m throughput. This dual field division unit can operate at 118 MHz with a latency of 7m-2 cycles and has an area requirement of 15 XOR2, 40 AND2, 29 MUX2, and 7 INV gates per processing element with a total of 2m processing elements. It is intended to be used in an elliptic curve crypto-accelerator for GF(2^m) and GF(p). The actual performance for scalar point multiplication in GF(2^571) running at 100 MHz would be 20.4 kP/s. The actual performance for scalar point multiplication in GF(p) with |p| = 521 running at 100 MHz would be 24.4 kP/s.
TL;DR: A vector multiply-accumulate (MAC) architecture over the binary extension field GF(2^m) capable of supporting multiple precisions simultaneously and utilizing an existing scalar structure for performing multiple operations in vector mode is presented.
Abstract: Finite field arithmetic is useful in the implementation of error-correcting codes as well as cryptographic protocols. Large finite field numbers are particularly important in the implementation of elliptic curve cryptography. This paper presents a vector multiply-accumulate (MAC) architecture over the binary extension field GF(2^m) capable of supporting multiple precisions simultaneously. The vector MAC can perform one GF(2^m) or two GF(2^m) multiply-accumulates using essentially the same hardware as a scalar GF(2^m) Mastrovito-type multiplier. The vector capability is enabled by inserting mode-dependent masks in the bit product and reduction arrays of the GF(2^m) MAC. This architecture leverages an existing scalar structure for performing multiple operations in vector mode. Essentially the same hardware is shared between scalar and vector modes. Although there is a slight delay and area penalty for the mode-dependent masking, this overhead is relatively insignificant. We implemented both the stand-alone scalar GF(2^m) MAC and the vector GF(2^m) MAC in structural Verilog and synthesized the designs on a 0.18 micron standard cell library to compare the area and delay for different values of m. The vector MAC can be utilized in an environment where repeated GF(2^m) multiplications that have no dependencies need to be performed. Instead of serializing these individual operations, they can be performed in pairs.
TL;DR: A new multiplier in GF(2^8) is designed using the Galois subfield GF(2^4) multiplier; it is simpler and faster than the classical GF(2^8) multiplier.
Abstract: A new RS (Reed-Solomon) encoder design method, using a Galois subfield GF(2^4) multiplier, is described. The encoder is designed using the erasure correction method. A new multiplier in GF(2^8) is designed using the Galois subfield GF(2^4) multiplier; it is simpler and faster than the classical GF(2^8) multiplier.
TL;DR: A novel VLSI architecture for division and multiplication in GF(2^m), aimed at applications in low cost elliptic curve cryptographic processors, that provides a high degree of flexibility and scalability with respect to the field size m.
Abstract: We present a novel VLSI architecture for division and multiplication in GF(2^m), aimed at applications in low cost elliptic curve cryptographic processors. A compact and fast arithmetic unit (AU) was designed which uses substructure sharing between a modified version of the binary extended greatest common divisor (GCD) and the most significant bit first (MSB-first) multiplication algorithms. This AU produces division results at a rate of one per 2m-1 clock cycles and multiplication results at a rate of one per m clock cycles. Analysis shows that the computational delay time of the proposed architecture for division is significantly less than previously proposed bit-serial dividers and has the advantage of reduced chip area requirements. Furthermore, since this novel architecture does not restrict the choice of irreducible polynomials and has the features of regularity and modularity, it provides a high degree of flexibility and scalability with respect to the field size m.
TL;DR: A new algorithm and a corresponding architecture are proposed to compute modular multiplication over GF(2^m); they are based on the standard basis representation, use the properties of an irreducible all-one polynomial as the modulus, and are suitable for VLSI implementation.
Abstract: This paper proposes a new algorithm and a corresponding architecture to compute modular multiplication over GF(2^m). They are based on the standard basis representation and use the properties of an irreducible all-one polynomial as the modulus. The architecture, named SSM (Semi-Systolic Multiplier), has a critical path of 1-D_AND + 1-D_XOR per cell and a latency of m+1. It has a lower latency and a smaller hardware complexity than previous architectures. Since the proposed architecture has regularity, modularity and concurrency, it is suitable for VLSI implementation.