TL;DR: This work investigates different solving techniques corresponding to different attacker models and eventually refines the attack when considering particular implementations of the multiplication, focusing on GF(2^128) multiplication.
Abstract: In this paper, we study the side-channel security of the field multiplication in GF(2^n). We particularly focus on GF(2^128) multiplication, which is the one used in the authentication part of AES-GCM, but the proposed attack also applies to other binary extensions. In a hardware implementation using a 128-bit multiplier, the full 128-bit secret is manipulated at once. In this context, classical DPA attacks based on the divide and conquer strategy cannot be applied. In this work, the algebraic structure of the multiplication is leveraged to recover bits of information about the secret multiplicand without having to perform any key-guess. To do so, the leakage corresponding to the writing of the multiplication output into a register is considered. It is assumed to follow a Hamming weight/distance leakage model. Under these particular, yet easily met, assumptions we exhibit a nice connection between the key recovery problem and some classical coding and Learning Parities with Noise problems with certain instance parameters. In our case, the noise is very high, but the length of the secret is rather short. In this work we investigate different solving techniques corresponding to different attacker models and eventually refine the attack when considering particular implementations of the multiplication.
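The linearity that such an attack can exploit is easy to check on a toy field. The sketch below is an illustration only and not the authors' attack: it assumes GF(2^8) with the AES polynomial x^8 + x^4 + x^3 + x + 1 as a small stand-in for GF(2^128), and the helper names gf_mul, parity and parity_mask are hypothetical. It shows that the least significant bit of the Hamming weight of a product a*k is a GF(2)-linear function of the bits of the secret k, which is the form of equation a Learning Parities with Noise solver works with once measurement noise is added.

```python
def gf_mul(a, b, m=8, poly=0x11B):
    """Multiply a and b in GF(2^m) modulo `poly` (shift-and-add)."""
    r = 0
    while b:
        if b & 1:
            r ^= a          # add (XOR) the current multiple of a
        b >>= 1
        a <<= 1             # multiply a by x
        if a >> m:
            a ^= poly       # reduce when the degree reaches m
    return r


def parity(x):
    """Least significant bit of the Hamming weight of x."""
    return bin(x).count("1") & 1


def parity_mask(a, m=8, poly=0x11B):
    """Return the mask c with parity(a*k) == parity(c & k) for every k.

    Bit i of c is the parity of a * x^i, because a*k is the XOR of
    a * x^i over the set bits i of k and parity is XOR-linear.
    """
    mask = 0
    for i in range(m):
        if parity(gf_mul(a, 1 << i, m, poly)):
            mask |= 1 << i
    return mask


if __name__ == "__main__":
    a, k = 0x57, 0x83
    c = parity_mask(a)
    print(parity(gf_mul(a, k)) == parity(c & k))
```

Each known multiplier a therefore yields one linear equation in the unknown key bits; in a real measurement the observed parity would be noisy, which is why an error-tolerant solver is needed.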
TL;DR: A new inversion algorithm based on ternary representations significantly reduces the latency of inversion for the fields recommended by NIST when hybrid-double multipliers are employed; ASIC synthesis results in 65-nm CMOS technology show that it is faster than the existing techniques.
Abstract: Fast inversion in finite fields is crucial for high-performance cryptography and codes. We present techniques to exploit the recently proposed hybrid-double multipliers for fast inversions in binary fields GF(2^m) with normal bases. A hybrid-double multiplier computes a double multiplication, the product of three elements in GF(2^m), with a latency comparable to the latency of single multiplication of two elements. Traditional approaches, such as Itoh-Tsujii, cannot utilize hybrid-double multipliers. We devise a new inversion algorithm based on ternary representations that exploits their potential. The algorithm reduces the latency of inversion significantly for the fields recommended by NIST if hybrid-double multipliers are employed. For example, the algorithm computes an inversion in GF(2^163) with only five double multiplications whereas the Itoh-Tsujii algorithm requires nine single or double multiplications. We propose a new inverter architecture using this new algorithm and a hybrid-double multiplier. We show that it is faster than the existing techniques by providing ASIC synthesis results using 65-nm CMOS technology. For example, our inverter for GF(2^163) achieves about 34 percent shorter computation time than an inverter using the Itoh-Tsujii algorithm and a single multiplier.
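For reference, the baseline Itoh-Tsujii inversion mentioned in this abstract can be sketched as follows. This is not the ternary algorithm of the paper: it is the classical binary addition-chain version, written for a polynomial basis (the paper uses normal bases, where squarings are cheap rotations), and it assumes GF(2^8) with the AES polynomial as a toy field. The function names are hypothetical. The counter reproduces the nine multiplications quoted for GF(2^163).

```python
def gf_mul(a, b, m=8, poly=0x11B):
    """Multiply a and b in GF(2^m) modulo `poly` (shift-and-add)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> m:
            a ^= poly
    return r


def gf_inv_itoh_tsujii(a, m=8, poly=0x11B):
    """Return a^(-1) = a^(2^m - 2) using a binary addition chain on m - 1."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse")
    beta, k = a, 1                      # invariant: beta = a^(2^k - 1)
    for bit in bin(m - 1)[2:][1:]:      # bits of m - 1 after the leading 1
        t = beta
        for _ in range(k):              # t = beta^(2^k), k squarings
            t = gf_mul(t, t, m, poly)
        beta = gf_mul(t, beta, m, poly) # now beta = a^(2^(2k) - 1)
        k *= 2
        if bit == "1":
            sq = gf_mul(beta, beta, m, poly)
            beta = gf_mul(sq, a, m, poly)   # now beta = a^(2^(k+1) - 1)
            k += 1
    return gf_mul(beta, beta, m, poly)  # square once: a^(2^m - 2)


def itoh_tsujii_mul_count(m):
    """Number of non-squaring multiplications used by the chain above."""
    n = m - 1
    return (n.bit_length() - 1) + (bin(n).count("1") - 1)


if __name__ == "__main__":
    print(hex(gf_inv_itoh_tsujii(0x53)))
    print(itoh_tsujii_mul_count(163))
```

The multiplication count depends only on the bit length and Hamming weight of m - 1, which is why an algorithm built on a different number representation can lower it.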
TL;DR: The paper proposes an architecture of a finite field multiplier that uses the Karatsuba-Ofman algorithm to reduce the latency of the finite field multiplication for larger key sizes; to the authors' best knowledge, the proposed scalable ECPs are the fastest ECPs that can support all 5 pseudo-random or Koblitz curves recommended by NIST.
Abstract: This paper presents the architecture of a scalable elliptic curve cryptography (ECC) processor (ECP). Two versions of scalable ECPs are presented, one for binary field pseudo-random curves and one for binary field Koblitz curves. The implementations of these designs are able to support all 5 key sizes of pseudo-random or Koblitz curves recommended by the National Institute of Standards and Technology (NIST) without reconfiguring the hardware. The paper proposes an architecture of a finite field multiplier that uses the Karatsuba-Ofman algorithm in order to reduce the latency of the finite field multiplication for larger key sizes. As a result, the latency of the overall elliptic curve point multiplication (ECPM) is reduced compared to previous designs of the scalable ECPs. To the authors' best knowledge, the proposed scalable ECPs are the fastest ECPs that can support all 5 pseudo-random or Koblitz curves recommended by NIST.
TL;DR: The squarer is based on the generalised polynomial basis of GF(2^n) and its gate delay matches the best results, whereas its XOR gate complexity is n + 1, which is only about two thirds of the current best results.
Abstract: Explicit formulae and complexities of bit-parallel GF(2^n) squarers for a new class of irreducible pentanomials x^n + x^(n-1) + x^k + x + 1, where n is odd and 1 < k < (n - 1)/2, are presented. The squarer is based on the generalised polynomial basis of GF(2^n). Its gate delay matches the best results, whereas its XOR gate complexity is n + 1, which is only about two thirds of the current best results.
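Why a squarer needs only XOR gates can be seen from the fact that squaring in GF(2^n) is GF(2)-linear: the coefficients of the operand are spread to the even positions and the result is reduced modulo the field polynomial. The sketch below illustrates this under stated assumptions: it uses the ordinary polynomial basis and the AES polynomial for GF(2^8), not the generalised polynomial basis or the pentanomial class of the paper, and all function names are hypothetical.

```python
def gf_mul(a, b, m=8, poly=0x11B):
    """Multiply a and b in GF(2^m) modulo `poly` (shift-and-add)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> m:
            a ^= poly
    return r


def spread_bits(a):
    """Map sum a_i x^i to sum a_i x^(2i): the square before reduction."""
    r, i = 0, 0
    while a:
        if a & 1:
            r |= 1 << (2 * i)
        a >>= 1
        i += 1
    return r


def poly_mod(v, m=8, poly=0x11B):
    """Reduce the binary polynomial v modulo `poly` of degree m."""
    for i in range(v.bit_length() - 1, m - 1, -1):
        if (v >> i) & 1:
            v ^= poly << (i - m)
    return v


def gf_square(a, m=8, poly=0x11B):
    """Square in GF(2^m) using only bit spreading and XOR reduction."""
    return poly_mod(spread_bits(a), m, poly)


if __name__ == "__main__":
    a, b = 0x57, 0x83
    print(gf_square(a) == gf_mul(a, a))
    print(gf_square(a ^ b) == gf_square(a) ^ gf_square(b))
```

In hardware the reduction step becomes a fixed XOR network, so the choice of irreducible polynomial and basis directly determines the gate count and delay.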
TL;DR: In this paper, the complexity of bit-parallel GF(2^n) squarers is investigated for a new class of irreducible pentanomials.
TL;DR: A high-speed, pipelined bit-parallel multiplier over binary finite fields for elliptic curve cryptosystems is presented; it is efficient for FPGA and VLSI implementation.
Abstract: This paper presents a high-speed and pipelined bit-parallel multiplier over binary finite fields for elliptic curve cryptosystems. The architecture of this multiplier is based on a parallel structure and multiplication by 2, so that the two inputs are applied to the circuit simultaneously and in parallel form. Furthermore, the structure of the proposed circuit can be reconfigured when the field size changes. Our implementation is at the gate level, using high-speed and low-cost combinational logic circuits. The pipelining technique is applied to the proposed architecture to shorten the critical path delay and to complete the computations in one clock cycle. The proposed architecture is efficient for FPGA and VLSI implementation. This work has been successfully verified and synthesized using Xilinx ISE 11 on a Virtex-4 XC4VLX200 FPGA.
TL;DR: This proposal parallelizes an algorithm coming from multivariate cryptography, makes it efficient by optimizing the algorithm for GPU, and provides fast multiplications over GF(2^32), the core operation of QUAD.
Abstract: QUAD stream cipher is a symmetric cipher based on multivariate public-key cryptography (MPKC), which uses multivariate polynomials as encryption keys. It holds the provable security property based on the computational hardness assumption. More specifically, the security of QUAD depends on the hardness of solving non-linear multivariate quadratic systems over a finite field, which is known as an NP-complete problem. However, QUAD is slower than other stream ciphers, and an efficient implementation, which has a reduced computational cost, is required. In this paper, we propose some implementations of QUAD over GF(2^32) on Graphics Processing Units (GPU) and compare them. Moreover, we provide fast multiplications over GF(2^32), the core operation of QUAD. Our implementation gives the fastest throughput of QUAD as 24.827 Mbps. We propose an efficient implementation for computing with multivariate polynomials in multivariate cryptography on GPU and evaluate the efficiency of the proposal. GPU is considered to be a commodity parallel arithmetic unit. Our proposal parallelizes an algorithm coming from multivariate cryptography, and makes it efficient by optimizing the algorithm with GPU.
TL;DR: This paper proposes how the parallel tile assembly process could be used for computing the modular square, i.e. modular multiplication with two identical inputs, over the finite field GF(2^n), and how it could obtain the final result in fewer steps than another molecular computing system designed in a previous study.
TL;DR: A modified low-power dual-field 4-to-2 carry-save adder is proposed whose internal logic structure reduces the chance of glitches; the resulting architecture is suitable for implementations where both area and performance are of concern.
Abstract: This paper presents a new low-power, high-speed, unified and scalable word-based radix-8 architecture for Montgomery modular multiplication in GF(p) and GF(2^n). This architecture has some similarities to the architecture of Huang, but it achieves more reduction in area and power consumption. To speed up the modular multiplication process, the hardware architecture employs carry-save addition to avoid carry propagation at each addition operation of the add-shift loop. To reduce power consumption, some latches called glitch blockers are employed at the outputs of some circuit modules to reduce the spurious transitions and the expected switching activities of high fan-out signals in the architecture. Also, we propose a modified low-power dual-field 4-to-2 carry-save adder whose internal logic structure reduces the chance of glitches. An ASIC implementation of the proposed architecture shows that it can perform 1,024-bit modular multiplication (for word size w = 32) in about 5.45 μs. Also, the results show that it has smaller Area × Time values compared to all unified and scalable designs by ratios ranging from 12.2 to 66.8 %, which makes it suitable for implementations where both area and performance are of concern. Also, it has higher throughput than these designs by ratios ranging from 6.0 to 80.7 %. In addition, it achieves a decrease in power consumption compared to these designs by ratios ranging from 18.8 to 52.6 %. Compared to the designs that are not unified, it has slightly higher Area × Time and lower throughput values than some of them. However, it achieves significantly lower power consumption than all of them.
TL;DR: In this paper, a method of generating an encoded data packet over GF(2) was proposed, which is based on determining a plurality of data packets in dependence on a Latin rectangle, and then bitwise XORing the determined plurality.
Abstract: Disclosed herein is a method of generating an encoded data packet over GF(2). The method comprises determining a plurality of data packets in dependence on a Latin rectangle, wherein the plurality of data packets have equal length; and generating an encoded data packet by bitwise XORing the determined plurality of data packets. The efficiency of encoding, decoding, and transmission over a network of data packets are all improved, as well as the security properties of the transmitted information.
TL;DR: Building on earlier research, an algorithm better than the Generalizations of Karatsuba Algorithm is derived, and it is expected to lead to a general form of n-Term Karatsuba-Like Formulae.
Abstract: Multiplication in finite fields requires huge resources. When it is implemented in Elliptic Curve Cryptography (ECC), the resource requirements grow further because multiplications dominate every level of ECC. Much research has been devoted to methods that reduce the number of multiplications. One well-known and widely developed method is the Karatsuba-Ofman algorithm, whose developments include the General Karatsuba Multiplier, the Efficient Multiplier in GF((2^n)^4), Generalizations of Karatsuba Algorithm, 5-6-7 Term Karatsuba-Like Formulae, and improved modulo functions. From this research a better algorithm than Generalizations of Karatsuba Algorithm is derived, and it is expected to lead to a general form of n-Term Karatsuba-Like Formulae.
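The basic two-way Karatsuba-Ofman step that these generalisations build on can be sketched for binary polynomials as follows. This is a minimal illustration of the classical recursion only, not of the n-term formulae discussed in the abstract; polynomials over GF(2) are stored as Python integers (bit i is the coefficient of x^i), and the names clmul, karatsuba and the threshold parameter are hypothetical.

```python
def clmul(a, b):
    """Schoolbook carry-less product of two binary polynomials."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def karatsuba(a, b, n, threshold=8):
    """Carry-less product of a and b, both of fewer than n bits.

    Splits each operand at x^h and uses three half-size products
    instead of four; additions and subtractions are both XOR in GF(2).
    """
    if n <= threshold:
        return clmul(a, b)
    h = n // 2
    mask = (1 << h) - 1
    a0, a1 = a & mask, a >> h
    b0, b1 = b & mask, b >> h
    lo = karatsuba(a0, b0, h, threshold)
    hi = karatsuba(a1, b1, n - h, threshold)
    # (a0 + a1)(b0 + b1) + lo + hi = a0*b1 + a1*b0
    mid = karatsuba(a0 ^ a1, b0 ^ b1, n - h, threshold) ^ lo ^ hi
    return lo ^ (mid << h) ^ (hi << (2 * h))


if __name__ == "__main__":
    a = (1 << 162) | 0x1234567890ABCDEF
    b = (1 << 160) | 0x0FEDCBA987654321
    print(karatsuba(a, b, 163) == clmul(a, b))
```

The product still has to be reduced modulo the field polynomial to obtain a GF(2^n) result; the recursion only addresses the polynomial multiplication part.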
TL;DR: Theoretical and experimental results demonstrate that the proposed method reduces the number of processing operations needed for exponentiation by more than a factor of 7 and opens wide possibilities for parallelizing the process of exponentiation.
Abstract: In this paper, a new method is proposed for fast exponentiation in Galois fields that processes several bits of the exponent at a time and uses two tables of precomputations. The proposed exponentiation technique is described, analyzed, and illustrated in detail with examples. The number of simultaneously processed bits of the exponent is chosen to optimize the computational complexity. Theoretical and experimental results demonstrate that the proposed method reduces the number of processing operations needed for exponentiation by more than a factor of 7. The obtained results show that, along with decreasing the number of operations, the proposed method opens wide possibilities for parallelizing the process of exponentiation. The proposed method of fast exponentiation in Galois fields is oriented towards use in cryptographic systems for information protection. Keywords: Galois fields, precomputations, exponentiation, cryptographic, complexity, discrete logarithm
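The general idea of processing several exponent bits at once with a precomputed table can be sketched with the standard fixed-window method. This is a swapped-in textbook technique with a single table, not the two-table method of the paper, whose details the abstract does not give; the field GF(2^8) with the AES polynomial and the function names are assumptions made for illustration.

```python
def gf_mul(a, b, m=8, poly=0x11B):
    """Multiply a and b in GF(2^m) modulo `poly` (shift-and-add)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> m:
            a ^= poly
    return r


def gf_pow_window(a, e, w=4, m=8, poly=0x11B):
    """Compute a^e in GF(2^m), consuming w exponent bits per step."""
    # Precomputation: table[j] = a^j for j = 0 .. 2^w - 1.
    table = [1]
    for _ in range((1 << w) - 1):
        table.append(gf_mul(table[-1], a, m, poly))
    result = 1
    chunks = (e.bit_length() + w - 1) // w
    for i in reversed(range(chunks)):
        for _ in range(w):                       # result = result^(2^w)
            result = gf_mul(result, result, m, poly)
        digit = (e >> (i * w)) & ((1 << w) - 1)
        result = gf_mul(result, table[digit], m, poly)
    return result


if __name__ == "__main__":
    print(gf_pow_window(0x03, 255))   # order of the multiplicative group
```

A larger window w lowers the number of table multiplications per exponent bit but grows the table exponentially, which is the trade-off behind choosing the number of simultaneously processed bits.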
TL;DR: This paper presents an efficient hardware implementation of cryptoprocessors that carry out the scalar multiplication kP over the finite field GF(2^163) using two digit-level multipliers and the Gaussian normal basis (GNB) representation.
Abstract: This paper presents an efficient hardware implementation of cryptoprocessors that perform the scalar multiplication kP over a finite field GF(2^163) using two digit-level multipliers. The finite field arithmetic operations were implemented using the Gaussian normal basis (GNB) representation, and the scalar multiplication kP was implemented using the Lopez-Dahab algorithm, the 2-non-adjacent form (2-NAF) halve-and-add algorithm and the wNAF method for Koblitz curves. The processors were designed using a VHDL description, synthesized on the Stratix-IV FPGA using Quartus II 12.0 and verified using SignalTAP II and Matlab. The simulation results show that the cryptoprocessors provide a very good performance when performing the scalar multiplication kP. In this case, the computation times of the multiplication kP using the Lopez-Dahab algorithm, 2-NAF halve-and-add algorithm and 16NAF method for Koblitz curves were 13.37 µs, 16.90 µs and 5.05 µs, respectively.
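The non-adjacent form used by these scalar multiplication methods can be illustrated with the plain integer 2-NAF recoding. This sketch is a simplification: the paper works with halve-and-add and with window methods on Koblitz curves, which use different recodings, and the function name naf is hypothetical. The recoding writes k with digits in {-1, 0, 1} so that no two neighbouring digits are both nonzero, which reduces the number of point additions in double-and-add style algorithms.

```python
def naf(k):
    """Return the 2-NAF digits of k >= 0, least significant digit first."""
    digits = []
    while k > 0:
        if k & 1:
            d = 2 - (k % 4)   # +1 if k = 1 mod 4, -1 if k = 3 mod 4
            k -= d            # k is now divisible by 4, forcing a 0 next
        else:
            d = 0
        digits.append(d)
        k //= 2
    return digits


if __name__ == "__main__":
    print(naf(7))   # 7 = 8 - 1
```

Each nonzero digit corresponds to one point addition or subtraction, so a sparser recoding translates directly into fewer curve operations.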
TL;DR: A new bit-serial Montgomery multiplier architecture is proposed using a linear feedback shift register (LFSR), and a complexity comparison shows that the proposed multiplier is comparable to or has certain advantages over the best among the existing similar works in the literature.
Abstract: Montgomery multiplication in finite fields has received more and more attention recently since it shows advantages over regular multiplication in speeding up elliptic curve cryptography based network security protocols. In this paper, a most-significant-bit first bit-serial Montgomery multiplication algorithm in GF(2^m) using weakly dual bases is proposed for the first time. Then a new bit-serial Montgomery multiplier architecture is proposed using a linear feedback shift register (LFSR). Complexity comparison has shown that the proposed multiplier is comparable to or has certain advantages over the best among the existing similar works found in the literature.
TL;DR: A DNA computing model to compute integer powers over the finite field GF(2^n), where the computation tiles performing five different functions assemble into the seed configuration with inputs to figure out the result.
Abstract: DNA-based cryptography is a newly developing interdisciplinary area which combines cryptography, mathematical modeling, biochemistry and molecular biology. How to implement the arithmetic operations used in cryptosystems based on DNA computing is still an open question. This paper proposes a DNA computing model to compute integer powers over the finite field GF(2^n). The computation tiles performing five different functions assemble into the seed configuration with inputs to figure out the result. It is shown how the computation tiles are coded in bits and how the assembly rules work. The assembly time complexity is 2n^2 + n - 1 and the space complexity is n^4 + n^3. This model requires 6436 types of computation tiles and 12 types of boundary tiles.
TL;DR: A low complexity Montgomery multiplier in GF(2^m) using Linear Feedback Shift Registers (LFSR) is proposed for the class of fields generated with an irreducible all-one polynomial, and its latency is shown to be lower than the best among existing works found in the literature.
Abstract: Montgomery multiplication (MM) in GF(2^m) is a popular technique to speed up network security protocols such as digital signatures provided by elliptic curve cryptography (ECC) and key distribution supported by ECC or Diffie-Hellman. MM in GF(2^m) is defined as ABr^(-1) mod f(x), where f(x) is the irreducible polynomial of degree m and r is a fixed element in the field. In this paper, a low complexity Montgomery multiplier in GF(2^m) using Linear Feedback Shift Registers (LFSR) is proposed for the class of fields generated with an irreducible all-one polynomial. The latency of the proposed architecture is shown to be lower than the best among existing works found in the literature. Furthermore, the highly regular architecture of the LFSR and the available LFSR-based low power techniques make our proposal more attractive than non-LFSR architectures. On the other hand, the constraint of the new multiplier is that it will not have a speed advantage when the system clock rate is higher than 2 GHz.
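The definition ABr^(-1) mod f(x) can be made concrete with a bit-serial sketch. The choices below are assumptions for illustration and are not taken from the paper: r is fixed to x^m (a common choice), the field is the small all-one-polynomial field GF(2^4) with f(x) = x^4 + x^3 + x^2 + x + 1, and the function names are hypothetical. Each iteration adds a partial product, makes the accumulator divisible by x by conditionally adding f(x), and then divides by x.

```python
def gf_mul(a, b, m=4, poly=0x1F):
    """Ordinary multiplication in GF(2^m) modulo `poly` (shift-and-add)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> m:
            a ^= poly
    return r


def mont_mul(a, b, m=4, poly=0x1F):
    """Return a * b * x^(-m) mod f(x), scanning a from its low bit."""
    c = 0
    for i in range(m):
        if (a >> i) & 1:
            c ^= b            # accumulate a_i * b
        if c & 1:
            c ^= poly         # f(0) = 1, so this clears the constant term
        c >>= 1               # exact division by x
    return c


if __name__ == "__main__":
    m, poly = 4, 0x1F
    x_m = poly ^ (1 << m)     # x^m mod f(x)
    a, b = 0b1011, 0b0110
    # Multiplying the Montgomery product by x^m recovers the plain product.
    print(gf_mul(mont_mul(a, b), x_m) == gf_mul(a, b))
```

Because the reduction only ever inspects the lowest bit and shifts right, the loop maps naturally onto a shift register with feedback taps at the nonzero coefficients of f(x).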
TL;DR: The simulation results show that the cryptoprocessors present a very good performance using low area, and the computation times for calculating the scalar multiplication for w = 2, 4, 8 and 16 were 9.88, 7.37, 6.17 and 5.05 μs, respectively.
Abstract: This paper presents the design of cryptoprocessors using two multipliers over the finite field GF(2^163) with digit-level processing. The arithmetic operations were implemented in hardware using the Gaussian Normal Basis (GNB) representation and the scalar multiplication kP was performed on Koblitz curves using the window-τNAF algorithm with w = 2, 4, 8 and 16. The cryptoprocessors were designed using a VHDL description, synthesized on the Stratix-IV FPGA using Quartus II 12.0, and verified using SignalTAP II and Matlab. The simulation results show that the cryptoprocessors present a very good performance using low area. In this case, the computation times for calculating the scalar multiplication for w = 2, 4, 8 and 16 were 9.88, 7.37, 6.17 and 5.05 μs, respectively.
TL;DR: This paper presents a measurement result of a bit-parallel multiplier over GF(2^4) using a secure dual-rail charge-sharing symmetric adiabatic logic to analyze the correlation of the current-to-data dependency with respect to the given input signal transitions for resistance against power analysis attacks.
Abstract: This paper presents a measurement result of a bit-parallel multiplier over GF(2^4) using a secure dual-rail charge-sharing symmetric adiabatic logic. The output functionality and the supply current traces of the fabricated LSI chip are measured in order to analyze the correlation of the current-to-data dependency with respect to the given input signal transitions for resistance against power analysis attacks. Furthermore, the output signals of the LSI chip are verified at dynamic power clock frequencies from 0.5 to 5 MHz.
TL;DR: A novel and efficient architecture for a versatile polynomial basis multiplier over GF(2^m) is presented, which offers flexibility on arbitrary Galois field sizes and an improved maximum clock frequency due to a shorter critical path delay.
Abstract: A novel and efficient architecture for a versatile polynomial basis multiplier over GF(2^m) is presented. The value m, the degree of the irreducible polynomial, can be changed, so the multiplier can be configured and programmed. Thus the versatility of the multiplier refers to its reconfigurable property. The architecture provides an efficient execution of the Most Significant Bit (MSB)-first, bit-serial multiplication for different operand lengths. The attractive features of the proposed architecture are (a) its flexibility on arbitrary Galois field sizes, (b) its hardware simplicity, which results in a small area implementation, (c) low power consumption by employing the gated clock technique, power gating and multi-Vth optimization techniques, and (d) an improved maximum clock frequency due to a shorter critical path delay.
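The MSB-first bit-serial multiplication that this architecture executes can be sketched in software as follows. This is a generic textbook formulation under stated assumptions, not the authors' hardware design: the field size m and the irreducible polynomial are passed as parameters to mirror the versatility described above, and the function name is hypothetical.

```python
def gf_mul_msb_first(a, b, m, poly):
    """Multiply a and b in GF(2^m), scanning b from its top bit down.

    Each step computes c = c * x mod f(x) and then adds a if the current
    bit of b is set (Horner's rule on the polynomial b). Requires a < 2^m.
    """
    c = 0
    for i in range(m - 1, -1, -1):
        c <<= 1                 # multiply the accumulator by x
        if c >> m:
            c ^= poly           # reduce modulo f(x)
        if (b >> i) & 1:
            c ^= a              # add the multiplicand
    return c


if __name__ == "__main__":
    # Same routine, two field sizes: GF(2^8) with the AES polynomial
    # and GF(2^4) with x^4 + x + 1.
    print(hex(gf_mul_msb_first(0x57, 0x83, 8, 0x11B)))
    print(bin(gf_mul_msb_first(0b0010, 0b1000, 4, 0x13)))
```

One iteration corresponds to one clock cycle of a bit-serial datapath, so the latency is m cycles while the per-cycle logic stays small.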
TL;DR: The proposed digit-serial architecture improves the performance of hardware implementations of cryptographic systems and is thus well suited to applications such as elliptic curve cryptography (ECC) and pairing computation.
Abstract: The shifted polynomial basis is a recently introduced variation of the polynomial basis representation. This kind of basis provides better performance in designing bit-parallel and subquadratic space complexity multipliers over binary extension fields. In this paper, we study a new shifted polynomial basis multiplication algorithm to implement a hybrid digit-serial multiplier. The proposed algorithm effectively integrates the classic schoolbook and Karatsuba multiplication algorithms to reduce computational complexity, together with modular multiplication using the shifted polynomial basis reduction. We note that the proposed architecture achieves lower computation time and higher bit-throughput than the best known digit-serial multipliers. Our proposed multipliers can be modular, regular, and suitable for very-large-scale integration (VLSI) implementations. The proposed digit-serial architecture improves the performance of hardware implementations of cryptographic systems and is thus well suited to applications such as elliptic curve cryptography (ECC) and pairing computation.
TL;DR: The performance evaluation shows better efficiencies of the proposed parallel algorithms compared to the traditional algorithms, and time complexities and speedup ratios of the parallel algorithms and the sequential algorithms are calculated to make the quantitative comparison.
Abstract: It is becoming more and more important to design high-speed parallel cryptographic algorithms due to a growing need for information security. Conic curve cryptography has been a newly developing direction in the field of information security in recent years, and few works have focused on parallel encryption algorithms for conic curve cryptosystems. This paper proposes four parallel algorithms for the conic curve cryptosystem over the finite field GF(2^n). One parallel algorithm for modular multiplication is designed by analyzing its data dependency and modifying several steps. In order to figure out the average runtime, we consider the probability distributions of different cases to compute the mathematical expectation. The operations of point addition, point doubling and point multiplication, three fundamental point operations in the conic curve cryptosystem over the finite field GF(2^n), are parallelized based on this parallel algorithm for modular multiplication and two parallel algorithms we proposed before. Time complexities and speedup ratios of the parallel algorithms and the sequential algorithms are calculated to make the quantitative comparison. The performance evaluation shows better efficiencies of the proposed parallel algorithms compared to the traditional algorithms.
TL;DR: Results show that serial/sequential multipliers require less area and lead to a small computational drawback, whereas parallel/combinational multipliers consume more area but are faster, thus a trade-off between area and speed should be obtained using hybrid multipliers.
Abstract: Finite field multiplication is one of the most important operations in finite field arithmetic. This paper presents a study that compares the architectures and the performances of some of the major GF(2^m) multiplication algorithms. Hardware implementation on a reconfigurable circuit (FPGA) allowed assessment of the performance of the multiplier architectures in terms of area and time complexities. Results show that serial/sequential multipliers require less area and lead to a small computational drawback, whereas parallel/combinational multipliers consume more area but are faster. Thus a trade-off between area and speed should be obtained using hybrid multipliers.
TL;DR: An efficient architecture for finite field arithmetic with a Montgomery multiplier is presented; an advantage of elliptic curve cryptography (ECC) is that it is more secure for wireless communication.
Abstract: An efficient architecture for finite field arithmetic with a Montgomery multiplier is presented. Efficient implementation of the Montgomery multiplier in finite field arithmetic yields less area, power and delay. An advantage of elliptic curve cryptography (ECC) is that it is more secure for wireless communication. The design is implemented with Xilinx ISE 13.2 and simulated with ModelSim. Keywords: finite field arithmetic, Montgomery multiplier, elliptic curve cryptography, FPGA.
TL;DR: A new fast way of computing the minimal polynomial of an element in GF(p^n) is derived by combining the structure of the minimal polynomial with the trace computation.
Abstract: The paper describes a novel method of computing the trace of an element from a given finite field GF(p^n). By applying the Newton formula, this method reveals a simple and linear structure of the traces of different elements, and the usual procedure of computing the products and sums of elements of GF(p^n) is not required. The paper also studies the structure of the minimal polynomial of an element from GF(p^n). A new fast way of computing the minimal polynomial of an element in GF(p^n) is derived by combining this structure with the trace computation.
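For orientation, the trace that this abstract refers to is Tr(a) = a + a^p + a^(p^2) + ... + a^(p^(n-1)). The sketch below evaluates this definition directly for p = 2, which is exactly the "usual procedure" of products and sums that the paper aims to avoid; it is not the Newton-formula method. The field GF(2^8) with the AES polynomial and the function names are assumptions made for illustration.

```python
def gf_mul(a, b, m=8, poly=0x11B):
    """Multiply a and b in GF(2^m) modulo `poly` (shift-and-add)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> m:
            a ^= poly
    return r


def gf_trace(a, m=8, poly=0x11B):
    """Trace of a from GF(2^m) down to GF(2), by repeated squaring."""
    t, s = 0, a
    for _ in range(m):
        t ^= s                       # add the current conjugate a^(2^i)
        s = gf_mul(s, s, m, poly)    # next conjugate
    return t


if __name__ == "__main__":
    a, b = 0x57, 0x83
    # The trace lies in GF(2) and is additive.
    print(gf_trace(a), gf_trace(b), gf_trace(a ^ b))
```

The direct evaluation costs n - 1 squarings per element, which is the cost that a structural shortcut would remove.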
TL;DR: A new PUF based on an AES S-box GF(2^4) inversion function that uses differences in power consumption at output nodes due to process variation to improve the properties of a PUF.
Abstract: As the usage of PUFs, such as in digital fingerprinting, is growing dramatically, we have developed a new PUF based on an AES S-box GF(2^4) inversion function that uses differences in power consumption at output nodes due to process variation. It has several advantages, such as being able to improve the properties of a PUF while requiring few additional resources.
TL;DR: The RTL code is compiled and synthesized using Encounter RTL Compiler tool provided by the Cadence Design Systems and the multiplier which is proposed has less area and minimum number of gates.
Abstract: Finite field GF(2^m) arithmetic plays a crucial role in applications like computer algebra, coding theory and Elliptic Curve Cryptography (ECC). GF(2^m) multiplication is considered a significant building block among the finite field arithmetic operations. A new shift-and-add polynomial basis multiplier over GF(2^m) is explained in this paper for irreducible GF(2^m) generating polynomials f(x) = x^m + x^(k_t) + x^(k_(t-1)) + ... + x^(k_1) + 1. The proposed multiplier has less area and a minimum number of gates. In this paper the RTL code is compiled and synthesized using the Encounter RTL Compiler tool provided by Cadence Design Systems. Synthesis is carried out using the TSMC 135nm, 65nm and 40nm technology files.
TL;DR: It is shown that by using parity prediction, error detection can be very simply constructed in hardware by employing a Gaussian normal basis multiplier.
Abstract: In this paper, we propose an error detection scheme for a Gaussian normal basis multiplier. It is shown that by using parity prediction, error detection can be very simply constructed in hardware. The hardware overheads are only one AND gate, n+1 XOR gates, and one 1-bit register in serial multipliers, and n AND gates and 2n-1 XOR gates in parallel multipliers. This method detects any odd number of bit faults in C = AB.
TL;DR: A new public key system based on polynomials over fields GF(2^k) with sufficiently large k is developed and has all features of ordinary public key schemes such as public key encryption and digital signatures.
Abstract: In this paper a new public key system based on polynomials over fields GF(2^k) is developed. The security of the system is based on the difficulty of finding discrete logarithms over GF(2^k) with sufficiently large k. The presented system has all features of ordinary public key schemes such as public key encryption and digital signatures. The security and implementation aspects of the presented system are also introduced along with a comparison with other well known public key systems.
TL;DR: This paper describes the implementation of multipliers over finite field with different orders which make use of proposed finite field accumulator (FFA), and the relation is reduced in two forms to obtain the two finite field multiplier architectures.
Abstract: This paper describes the implementation of multipliers over finite fields of different orders which make use of the proposed finite field accumulator (FFA). Using the relation for finite field multiplication, the various blocks that, along with the FFA, are necessary to perform multiplication are derived to form unique multipliers. The relation is reduced in two forms to obtain the two finite field multiplier architectures: the first is a bit-serial parallel multiplier and the second is a digit-serial parallel multiplier. The resulting finite field multipliers are then verified for different orders m.
TL;DR: This paper proposes an efficient scalar multiplication using the iterative Karatsuba-Ofman multiplication algorithm (KMA) over GF(2^m), with a performance comparison based on a Xilinx Virtex-6 FPGA implementation for the NIST recommended binary field.
Abstract: Public key cryptography (PKC) is highly secure against threats compared to symmetric key cryptography (SKC). One of the PKC techniques, elliptic curve cryptography, has been gaining wider attention than the popular RSA because it requires smaller keys to provide a similar security level. This paper details the hardware implementation of the modular multiplicative inverse over the binary field GF(2^m). Efficient scalar point multiplication is a crucial part of elliptic curve cryptography. A scalar point multiplication consists of point doubling and point addition operations. Both of these operations inherently depend on addition, multiplication, squaring and inversion. Among these, the inversion operation is the most time consuming one. The computation of the multiplicative inverse primarily consists of modular multiplication and modular squaring operations. This paper proposes an efficient scalar multiplication using the iterative Karatsuba-Ofman multiplication algorithm (KMA) over GF(2^m). The performance comparison is based on the Xilinx Virtex-6 FPGA implementation for the NIST recommended binary field.