TL;DR: High-performance and high-speed field-programmable gate array (FPGA) implementations of the polynomial basis Itoh–Tsujii inversion algorithm (ITA) over GF(2^m) constructed by irreducible trinomials and pentanomials are presented, and the improvement of the proposed architecture in terms of speed and performance is verified.
Abstract: In this study, high-performance and high-speed field-programmable gate array (FPGA) implementations of the polynomial basis Itoh–Tsujii inversion algorithm (ITA) over GF(2^m) constructed by irreducible trinomials and pentanomials are presented. The proposed structures are designed with one field multiplier and k-times squarer blocks (exponentiation by 2^k), where k is a small positive integer. The k-times squarer blocks have an efficient tree structure with low critical path delay, and the multiplier is based on a proposed high-speed digit-serial architecture with minimum hardware resources. Furthermore, to reduce the computation time of the ITA, the critical path of the circuit is broken into finer paths using several registers. The computation times of the structure on the Virtex-4 FPGA family are 0.262, 0.192 and 0.271 µs for GF(2^163), GF(2^193) and GF(2^233), respectively. The comparison results with other implementations of the polynomial basis Itoh–Tsujii inversion algorithm verify the improvement of the proposed architecture in terms of speed and performance.
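The structure described above (one field multiplier plus repeated squaring by 2^k) can be illustrated in software. The following is a minimal sketch, not the paper's hardware design: the function names are hypothetical, field elements are assumed to be Python integers whose bits are polynomial coefficients, and the NIST B-163 pentanomial is used only as an example modulus.

```python
# Sketch of Itoh-Tsujii inversion in GF(2^m), polynomial basis.
# All names here are illustrative assumptions, not taken from the paper.

# Example modulus (assumption): NIST B-163 pentanomial x^163 + x^7 + x^6 + x^3 + 1.
NIST_B163_POLY = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1


def gf2m_mul(a, b, m, poly):
    """Multiply a*b in GF(2^m); poly includes the x^m term, and a, b < 2^m."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> m) & 1:      # degree reached m: subtract (XOR) the modulus
            a ^= poly
    return r


def gf2m_sqr_k(a, k, m, poly):
    """Raise a to the power 2^k by k successive squarings."""
    for _ in range(k):
        a = gf2m_mul(a, a, m, poly)
    return a


def itoh_tsujii_inverse(a, m, poly):
    """Return a^(-1) = (a^(2^(m-1) - 1))^2 using a binary addition chain on m-1."""
    if a == 0:
        raise ZeroDivisionError("zero has no inverse in GF(2^m)")
    beta, k = a, 1                      # invariant: beta = a^(2^k - 1)
    for bit in bin(m - 1)[3:]:          # bits of m-1 after the leading 1
        beta = gf2m_mul(gf2m_sqr_k(beta, k, m, poly), beta, m, poly)
        k *= 2
        if bit == "1":
            beta = gf2m_mul(gf2m_sqr_k(beta, 1, m, poly), a, m, poly)
            k += 1
    return gf2m_sqr_k(beta, 1, m, poly)  # final squaring gives a^(2^m - 2)
```

The number of general multiplications grows only with the bit length and Hamming weight of m-1, which is why the squarer blocks dominate the design discussion in the abstract.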
TL;DR: A comparative overview of the recent hardware architectures of FF multipliers for polynomial bases over GF(2^m) is provided by classifying the most recent state-of-the-art research practices into three categories: bit-serial, bit-parallel and digit-serial multipliers.
Abstract: In elliptic curve cryptography (ECC), hardware architectures of finite field (FF) multipliers are frequently proposed for polynomial as well as for normal basis representations over GF(2^m). Although the polynomial bases provide efficient FF multiplication as compared to normal bases, the performance of the entire elliptic cryptosystem mainly depends upon its FF multiplier. Consequently, this paper provides a comparative overview of the recent hardware architectures of FF multipliers for polynomial bases over GF(2^m). This is achieved by classifying the most recent state-of-the-art research practices into three categories: bit-serial, bit-parallel and digit-serial multipliers. The comparison of multiple techniques in this article enables the designer to select a suitable multiplier according to different application requirements such as high speed/performance, constrained environments and high throughput/area applications.
TL;DR: Comparison results verify that the proposed architecture of Gaussian normal basis (GNB) multiplier over binary finite field GF(2^m) has better performance in terms of speed and hardware utilisation.
Abstract: In this paper, an efficient high-speed architecture of Gaussian normal basis (GNB) multiplier over binary finite field GF(2^m) is presented. The structure is constructed by using some regular modules for computation of exponentiation by powers of 2 and low-cost blocks for multiplication by normal elements of the binary field. For the powers of 2 exponents, the modules are implemented by some simple cyclic shifts in the normal basis representation. As a result, the multiplier has a simple structure with a low critical path delay. The efficiency of the proposed multiplier is examined in terms of area and time complexity based on its implementation on Virtex-4 field programmable gate array family and also its application specific integrated circuit design in 180 nm complementary metal-oxide-semiconductor technology. Comparison results with other structures of the GNB multiplier verify that the proposed architecture has better performance in terms of speed and hardware utilisation.
TL;DR: This paper proposes systolic vector m-bit GF(p) and GF(2^m) multipliers in which four m/2-bit GF multiplications can be done in parallel, and m can be varied from 2 to the maximum allowable value.
TL;DR: This paper observes that the Boyar–Peralta (BP) heuristic does not take the matrix density into account, and shows that the BP results are not optimal on dense linear systems.
Abstract: Minimizing the Boolean circuit implementation of a given cryptographic function is an important issue. A number of papers [1], [2], [3], [4] only consider cancellation-free straight-line programs for producing small circuits over GF(2). Cancellation is allowed by the Boyar–Peralta (BP) heuristic [5], [6]. This yields a valuable tool for practical applications such as building fast software and low-power circuits for cryptographic applications, e.g. AES [5], [7], HMAC-SHA-1 [8], PRESENT [9], GOST [9], and so on. However, the BP heuristic does not take into account the matrix density. In a dense linear system the rows can be computed by adding or removing a few elements from a “common path” that is “close” to almost all rows. The new heuristic described in this paper merges the ideas of “cancellation” and “common path”. Extensive testing has been performed, and the experimental results of the new heuristic and the BP heuristic were compared. They show that the Boyar–Peralta results are not optimal on dense systems.
TL;DR: An improved parallel block Lanczos (IBL) algorithm is proposed to reduce the communication cost of solving large and sparse linear systems over GF(2), which is one of the most time-consuming steps of the GNFS algorithm.
TL;DR: From the synthesis results, it is shown that, based on iterative block recombination, the proposed TBTMVP-based multiplier involves less area, less area-delay product, and higher throughput compared with the existing digit-serial multipliers.
Abstract: In this paper, we have shown that a regular Toeplitz matrix-vector product (TMVP) can be transformed into a Toeplitz block TMVP (TBTMVP) using a suitable permutation matrix. Based on the TBTMVP representation, we have proposed a new $(a,b)$-way TBTMVP decomposition algorithm for implementing a digit-serial multiplication. Moreover, it is shown that, based on iterative block recombination, we can improve the space complexity of the proposed TBTMVP decomposition. From the synthesis results, we have shown that the proposed TBTMVP-based multiplier involves less area, less area-delay product, and higher throughput compared with the existing digit-serial multipliers.
TL;DR: Building on prior work, two fully parallel polynomial n × n multipliers are presented with O(log_2 n) latency, which use lookup tables to store modular reduction terms.
Abstract: Operations over polynomial Galois fields GF(2^n) are employed in a variety of cryptographic systems. These operations include multiplication and reduction with respect to an irreducible polynomial modulus. Fast parallel multipliers can be designed but require substantial die area. Building on prior work, two fully parallel polynomial n × n multipliers are presented with O(log_2 n) latency, which use lookup tables to store modular reduction terms.
TL;DR: A uniform belief propagation (BP) processor that handles both massive MIMO detection and GF(2^n) LDPC decoding is proposed, built on a unified BP representation for detection and decoding.
Abstract: Both the massive multiple-input multiple-output (MIMO) technique and low-density parity-check (LDPC) codes have been considered indispensable parts of 5G wireless. To improve the hardware efficiency and flexibility, a uniform belief propagation (BP) processor which can take care of both massive MIMO detection and GF(2^n) LDPC decoding is proposed. To this end, the unified BP representation for BP detection and decoding is first proposed. A folding technique has been considered for further improvement of hardware efficiency and flexibility. For better illustration, a folded prototype of an (8, 4) GF(2) LDPC coded 2×2 4-QAM MIMO system is implemented on an FPGA. Numerical simulations and implementation results have demonstrated the feasibility of the proposed design methodology.
TL;DR: The new hardware architecture for the NB multiplier over GF($2^{m}$) matches the best area complexity, as presented by Reyhani, and the best time complexity, as presented by Azarderakhsh and Reyhani.
Abstract: In this paper, we propose two new algorithms and their hardware implementations for the normal basis multiplication over GF($p^{m}$), where $p \in \{2, 3\}$. In this case, the proposed multipliers are designed using serial and digit-serial hardware architectures. The normal basis multipliers over GF($2^{m}$) and GF($3^{m}$) are based on two proposed algorithms to compute the multiplication matrices $T_{k}$ in order to speed up the execution time and to reduce the area resources. It can be seen that the new hardware architecture for the NB multiplier over GF($2^{m}$) has the best characteristics of area complexity presented by Reyhani [16] and time complexity presented by Azarderakhsh and Reyhani [31]. The proposed hardware architectures for the normal basis multipliers over GF($2^{163}$), GF($2^{233}$), GF($2^{283}$), GF($2^{409}$), GF($3^{89}$) and GF($3^{233}$) were described in VHDL, and simulated and synthesized using Modelsim and Quartus Prime v16, respectively.
TL;DR: This paper presents a computer algebra based technique that extracts the irreducible polynomial P(x) used in the implementation of a multiplier in GF(2^m), with experiments covering NIST-recommended polynomials and optimal polynomials for different microprocessor architectures.
Abstract: Current techniques for formally verifying circuits implemented in Galois field (GF) arithmetic are limited to those with a known irreducible polynomial P(x). This paper presents a computer algebra based technique that extracts the irreducible polynomial P(x) used in the implementation of a multiplier in GF(2^m). The method is based on first extracting a unique polynomial in Galois field of each output bit independently. P(x) is then obtained by analyzing the algebraic expression in GF(2^m) of each output bit. We demonstrate that this method is able to reverse engineer the irreducible polynomial of an n-bit GF multiplier in n threads. Experiments were performed on Mastrovito and Montgomery multipliers with different P(x), including NIST-recommended polynomials and optimal polynomials for different microprocessor architectures.
TL;DR: A number of algorithms and optimization techniques to speed up computations in binary extension fields over GF(2) are presented, together with a table of optimal binary primitive polynomials of degree 2 ≤ d < 2048 and the class of functions for optimal modular reduction algorithms for each of the listed polynomials.
Abstract: In this paper we present a number of algorithms and optimization techniques to speed up computations in binary extension fields over GF(2). In particular, we consider multiplication and modular reduction solutions. Additionally, we provide the table of optimal binary primitive polynomials over GF(2) of degree 2 ≤ d < 2048, and the class of functions for optimal modular reduction algorithms for each of the listed polynomials. We give implementation examples targeting Intel CPU architectures, but generic results can be applied on other platforms as well.
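To make the two operations named above concrete, here is a small sketch of carry-less multiplication followed by modular reduction against a sparse modulus. It is an illustration under stated assumptions, not the paper's optimized routines: the function names are hypothetical, polynomials are Python integers, and the modulus is given as x^m plus a few low-degree terms listed in `low_exps`.

```python
# Sketch: multiply-then-reduce in GF(2^m) with a sparse reduction polynomial.
# Names and the (m, low_exps) modulus encoding are illustrative assumptions.

def clmul(a, b):
    """Carry-less (XOR) product of two binary polynomials stored as ints."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def reduce_sparse(c, m, low_exps):
    """Reduce c modulo x^m + sum(x^e for e in low_exps), all e < m.

    Uses x^m = sum(x^e): the part of c above bit m is folded back down
    by one shift-and-XOR per low-degree term, repeated until deg(c) < m.
    """
    mask = (1 << m) - 1
    while c >> m:
        high = c >> m
        c &= mask
        for e in low_exps:
            c ^= high << e
    return c


def reduce_generic(c, f):
    """Reference reduction by bitwise long division (f includes x^m)."""
    df = f.bit_length()
    while c.bit_length() >= df:
        c ^= f << (c.bit_length() - df)
    return c


def gf2_mulmod(a, b, m, low_exps):
    """Field product a*b in GF(2^m) for the sparse modulus above."""
    return reduce_sparse(clmul(a, b), m, low_exps)
```

For a trinomial or pentanomial each folding pass costs only a handful of shifts and XORs, which is the property that makes sparse moduli attractive for reduction.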
TL;DR: A modified modular division algorithm in the binary finite field GF(2^m) is presented, and a hardware implementation is designed in Verilog HDL to verify the algorithm's performance and simulated in GF(2^m) to compare clock cycle consumption with other algorithms.
Abstract: Modular inversion is the most complicated operation in elliptic curve cryptography (ECC). Based on the Extended Euclidean Algorithm (EEA), this paper presents a modified modular division algorithm in the binary finite field GF(2^m). Furthermore, this paper designs a hardware implementation in Verilog HDL to verify the algorithm's performance. We also simulate it in GF(2^m) to compare clock cycle consumption with other algorithms.
TL;DR: Wang et al. as mentioned in this paper provided GF(2 ) finite-field multi-threshold progressive secret image sharing and reconstruction methods, where a secret key is shared by combining a random participation value, and an MD5 value corresponding to the shared secret key and participation value is published to prevent cheating.
Abstract: The invention provides GF(2 ) finite-field multi-threshold progressive secret image sharing and reconstruction methods. A secret key is shared by combining a random participation value, and an MD5 value corresponding to the shared secret key and participation value is published to prevent cheating; 8*8 frequency domain transformation is carried out on a secret image, sub-block frequency domain coefficients are quantified randomly, a specific code length distribution list is represented in a binary manner, multiple division frequency bands are formed according to a similar Z-shaped scanning sequence and frequency band recombination, and a frequency band backup is formed by scrambling; and frequency band, frequency band backup and authentication information is shared in multiple thresholds in the GF(2 ) finite-field, and 2-bit authentication information is added to reconstruct a distribution shadow image. During recovery, the MD5 value is detected, a reconstructable frequency band is determined by combining the multiple thresholds, primary and secondary frequency band backup tables are reconstructed via dual authentication, a corresponding frequency band is reconstructed, and the reconstructed frequency band is used to reconstruct a secret image. Compared with existing methods, the distribution shadow image can be used fully to share the secret image progressively, and the visual reconstruction quality is improved.
TL;DR: This study presents a novel hybrid multiplier for Gaussian normal basis (GNB) in GF(2^m) which combines subquadratic and quadratic structures, and shows that the proposed hybrid multiplier can save ∼18% space complexity and 12% time complexity compared with the existing GNB multiplier with pure TMVP decomposition.
Abstract: In recent years, subquadratic and quadratic Toeplitz matrix–vector product (TMVP) computations have been widely used for the implementation of binary field multiplication in elliptic curve cryptography. A pure subquadratic TMVP structure involves significantly less space complexity but a long computational delay, while a quadratic TMVP structure involves larger space complexity and less computation delay. To optimise the tradeoff between time and space complexities, this study presents a novel hybrid multiplier for Gaussian normal basis (GNB) in GF(2^m) which combines subquadratic and quadratic structures. From the theoretical analysis, it is shown that the proposed hybrid multiplier can save ∼18% space complexity and 12% time complexity compared with the existing GNB multiplier with pure TMVP decomposition.
TL;DR: It is shown how binary sequences can be associated with automatic composition of monophonic pieces and multilevel block-codes are used in a new approach of e-music composition, engendering a particular style as an e-composer.
Abstract: It is shown how binary sequences can be associated with automatic composition of monophonic pieces. We are concerned with the composition of e-music from finite field structures. The information at the input may be either random or information from a black-and-white, grayscale or color picture. New e-compositions and music score are made available, including a new piece from the famous Lenna picture: the score of the e-music > The corresponding stretch of music score are presented. Some particular structures, including clock arithmetic (mod 12), GF(7), GF(8), GF(13) and GF(17) are addressed. Further, multilevel block-codes are also used in a new approach of e-music composition, engendering a particular style as an e-composer. As an example, Pascal multilevel block codes recently introduced are handled to generate a new style of electronic music over GF(13).
TL;DR: The polynomial basis (PB) and a modified polynomial basis (MPB) are used to define a double basis multiplication, where the MPB is obtained by transforming the PB when $F(x) = x^m + x^n + 1$, $n \ge m/2$, and it is found that the double basis multiplication can be transformed into a Toeplitz matrix-vector product (TMVP).
Abstract: Finite field multiplication plays an important role in the applications of elliptic curve cryptography. In this paper, we use the PB and a modified polynomial basis (MPB) to define a double basis multiplication, where the MPB is obtained by transforming the PB when $F(x) = x^m + x^n + 1$, $n \ge m/2$. We find that the double basis multiplication can be transformed into a Toeplitz matrix-vector product (TMVP). To reduce time and space complexities, we recursively use the two-way TMVP approach to calculate the product of the double basis multiplication. We implement both the shifted addition algorithm and the TMVP approach in hardware on a Kintex7-xc7k325T FPGA. The proposed structure can obtain a significantly lower area-delay product.
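The recursive two-way TMVP step mentioned above can be sketched as follows. This shows only the generic, well-known two-way split over GF(2), which replaces four half-size products with three; it does not reproduce the paper's PB-to-MPB transformation or its hardware structure. The function names and the encoding of an n × n Toeplitz matrix as a list `t` of 2n-1 bits with T[i][j] = t[n-1+i-j] are assumptions made for illustration.

```python
# Sketch: recursive two-way Toeplitz matrix-vector product (TMVP) over GF(2).
# Encoding assumption: T[i][j] = t[n - 1 + i - j], with len(t) == 2n - 1.

def xor_vec(x, y):
    """Element-wise XOR (addition over GF(2)) of two equal-length bit lists."""
    return [p ^ q for p, q in zip(x, y)]


def tmvp_direct(t, v):
    """Schoolbook product T*v over GF(2), used as base case and reference."""
    n = len(v)
    out = []
    for i in range(n):
        acc = 0
        for j in range(n):
            acc ^= t[n - 1 + i - j] & v[j]
        out.append(acc)
    return out


def tmvp_two_way(t, v):
    """Compute T*v with the split T = [[T1, T0], [T2, T1]], v = [v0, v1].

    P0 = (T0 + T1) v1,  P1 = T1 (v0 + v1),  P2 = (T1 + T2) v0
    result = [P0 + P1, P1 + P2]   (three half-size products instead of four)
    """
    n = len(v)
    assert len(t) == 2 * n - 1
    if n == 1 or n % 2:               # odd sizes fall back to the direct product
        return tmvp_direct(t, v)
    h = n // 2
    t0 = t[0:2 * h - 1]               # top-right block
    t1 = t[h:3 * h - 1]               # diagonal blocks
    t2 = t[2 * h:4 * h - 1]           # bottom-left block
    v0, v1 = v[:h], v[h:]
    p0 = tmvp_two_way(xor_vec(t0, t1), v1)
    p1 = tmvp_two_way(t1, xor_vec(v0, v1))
    p2 = tmvp_two_way(xor_vec(t1, t2), v0)
    return xor_vec(p0, p1) + xor_vec(p1, p2)
```

Because the sum of two Toeplitz blocks is again Toeplitz, the same split applies at every level of the recursion, which is what gives the subquadratic gate count.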
TL;DR: The proposed and existing multipliers are synthesised and compared using 45 nm CMOS technology, where the throughputs of the proposed parallel and serial vector \(GF((2^8)^2)\) multipliers are \(72.7\%\) and \(53.62\%\) greater, respectively, than the Karatsuba based multiplier design [11].
Abstract: Composite Galois Field \(GF((2^m)^n)\) multiplications denote multiplication in the extension field over the ground field \(GF(2^m)\), and are used in cryptography and error correcting codes. In this paper, composite versatile and vector \(GF((2^m)^2)\) multipliers are proposed. The proposed versatile \(GF((2^m)^2)\) multiplier design is used to perform the \(GF((2^x)^2)\) multiplication, where \(2\le x\le m\). The proposed vector \(GF((2^m)^2)\) multiplier design is used to perform \(2^k\) numbers of \(GF((2^{\frac{m}{2^k}})^2)\) multiplications in parallel, where throughput is comparatively higher than other designs and \(k\in \{0, 1, \ldots, (\log_{2}m)-1\}\). In both the works, the hardware cost is the trade-off while the flexibility is high. The proposed and existing multipliers are synthesised and compared using 45 nm CMOS technology. The throughputs of the proposed parallel and serial vector \(GF((2^8)^2)\) multipliers are \(72.7\%\) and \(53.62\%\) greater than Karatsuba based multiplier design [11] respectively.
TL;DR: This letter presents a low-complexity semi-systolic array implementation for polynomial multiplication over GF(2^m) based on a two-level parallel computing approach to reduce the cell delay, latency, and area-time (AT) complexity.
Abstract: This letter presents a low-complexity semi-systolic array implementation for polynomial multiplication over GF(2^m). We consider finite field Montgomery modular multiplication (MMM) based on a two-level parallel computing approach to reduce the cell delay, latency, and area-time (AT) complexity. Compared to related multipliers, the proposed scheme yields significantly lower AT complexity.
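For readers unfamiliar with Montgomery multiplication in binary fields, the following sketch shows the plain bit-serial form, which returns a·b·x^(-m) mod f. It is the textbook algorithm only, under the assumption that f has constant term 1; it does not model the letter's two-level parallel semi-systolic array, and the function names are hypothetical.

```python
# Sketch: bit-serial Montgomery multiplication in GF(2^m).
# Assumption: f is the field polynomial (including x^m) with constant term 1.

def gf2m_montgomery_mul(a, b, m, f):
    """Return a * b * x^(-m) mod f, processing one bit of a per iteration."""
    c = 0
    for i in range(m):
        if (a >> i) & 1:
            c ^= b            # add a_i * b
        if c & 1:
            c ^= f            # make c divisible by x (f has constant term 1)
        c >>= 1               # exact division by x
    return c


def gf2m_mul_ref(a, b, m, f):
    """Reference shift-and-add product a * b mod f, for checking results."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> m) & 1:
            a ^= f
    return r
```

Multiplying the Montgomery result by x^m mod f recovers the ordinary product; in practice operands are kept in Montgomery form so that this correction is only applied once at the end.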
TL;DR: Substitution boxes have been generated from irreducible or reducible polynomials over the Galois field GF(p^q).
Abstract: Substitution boxes (S-boxes) were generated using 4-bit Boolean functions (BFs) for the encryption and decryption algorithms of Lucifer and the Data Encryption Standard (DES) in the late sixties and late seventies, respectively. The S-box of the Advanced Encryption Standard was also generated using irreducible polynomials over the Galois field GF(2^8) with an additive constant, in the early twenty-first century. In this paper, substitution boxes have been generated from irreducible or reducible polynomials over the Galois field GF(p^q). Binary Galois fields have been used to generate substitution boxes. Since the Galois field number, that is, the number generated from the coefficients of a polynomial over a particular binary Galois field (2^q), is similar to log 2 q+1 bit BFs, generation of log 2 q+1 bit S-boxes is possible. If p is a prime or non-prime number, then generation of S-boxes is possible using the Galois field GF(p^q), where q = p-1.
TL;DR: This paper identifies the efficient performance of a concurrent algorithm using complementary recoding over GF(2 ) for scalar multiplication in the polynomial basis (PB), for use in elliptic curve cryptosystems, which enhances security.
Abstract: Since the introduction of public-key cryptography by Diffie and Hellman in 1976, the potential for the use of the discrete logarithm problem in public-key cryptosystems has been recognized. Although the discrete logarithm problem as first employed by Diffie and Hellman was defined explicitly as the problem of finding logarithms with respect to a generator in the multiplicative group of the integers modulo a prime, this idea can be extended to arbitrary groups and, in particular, to elliptic curve groups. The resulting public-key systems provide relatively small block size, high speed, and high security. This paper identifies the efficient performance of a concurrent algorithm using complementary recoding over GF(2 ) for scalar multiplication in the polynomial basis (PB), for use in elliptic curve cryptosystems, which enhances security. The scheme therefore has a lower computation cost, which is valuable in applications with limited memory, communications bandwidth or computing power.
TL;DR: An algorithm entitled the Composite Algorithm, using both multiplication and division over Galois fields, has been demonstrated to generate all monic IPs over the extended Galois field GF(p^q) for large values of both p and q.
Abstract: Irreducible polynomials (IPs) have been of utmost importance in the generation of substitution boxes in modern cryptographic ciphers. In this paper, an algorithm entitled the Composite Algorithm, using both multiplication and division over Galois fields, has been demonstrated to generate all monic IPs over the extended Galois field GF(p^q) for large values of both p and q. A slightly more efficient algorithm, entitled the Multiplication Algorithm, and a still more efficient one, the Division Algorithm, are also illustrated in this paper as ways to find all monic IPs over the extended Galois field GF(p^q) for large values of both p and q. A time complexity analysis of the three algorithms, with a comparison to Rabin's algorithm, is also presented in this research article.
TL;DR: In this paper, the number of monic irreducible polynomials of even degree over $\mathbb{F}_2$ whose first four coefficients have prescribed values is evaluated.
Abstract: In this paper we evaluate the number of monic irreducible polynomials in $\mathbb{F}_2[x]$ of even degree $n$ whose first four coefficients have prescribed values. This problem was first studied in [7], where some approximate results were obtained. Our results extend the results given in [7] in some cases.
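The quantity studied above can be checked by brute force for small degrees. The sketch below is only an enumeration aid, not the paper's counting method: it tests irreducibility over GF(2) by trial division and groups the monic irreducible polynomials of degree n by the coefficients of x^(n-1), x^(n-2), x^(n-3) and x^(n-4). Reading "first four coefficients" as those four is an assumption, as are the function names.

```python
# Sketch: brute-force count of monic irreducible polynomials over GF(2) of
# degree n, grouped by their four coefficients just below the leading term.
# Polynomials are ints whose bit i is the coefficient of x^i.
from collections import Counter


def polymod(a, b):
    """Remainder of a divided by b over GF(2)."""
    db = b.bit_length()
    while a.bit_length() >= db:
        a ^= b << (a.bit_length() - db)
    return a


def is_irreducible(f):
    """Trial division by every polynomial of degree 1 .. deg(f) // 2."""
    n = f.bit_length() - 1
    if n < 1:
        return False
    for d in range(2, 1 << (n // 2 + 1)):
        if polymod(f, d) == 0:
            return False
    return True


def count_by_first_four(n):
    """Map each 4-bit prefix (coeffs of x^(n-1)..x^(n-4)) to its count."""
    if n < 4:
        raise ValueError("need degree n >= 4")
    counts = Counter()
    for low in range(1 << n):
        f = (1 << n) | low            # monic polynomial of degree n
        if is_irreducible(f):
            counts[(f >> (n - 4)) & 0xF] += 1
    return counts
```

Trial division is exponential in n, so this is only practical for small degrees, but it gives exact tables against which closed-form counts can be compared.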
TL;DR: The proposed scalable and unified digit-serial structure, with low space complexity to perform multiplication and inversion operations in $GF(2^{m})$, is more suitable for constrained implementations of cryptographic primitives in ultra-low power devices, such as wireless sensor nodes and radio frequency identification devices.
Abstract: This paper proposes a scalable and unified digit-serial structure, with low space complexity to perform multiplication and inversion operations in $GF(2^{m})$, based on the bit-serial multiplication algorithm and the previously modified extended Euclidean inversion algorithm. In this structure, the multiplier and inverter share the data-path and thus save more area resources and power than the case of using a separate data-path for each operation. Also, this structure is suitable for fixed-size processors that only reuse the core and do not need to change the core size when the field size $m$ is modified. This structure is extracted by applying a nonlinear methodology that gives the designer more flexibility to control the processing element workload. Implementation results for the proposed scalable and unified digit-serial design and previously reported efficient designs show that the proposed scalable structure achieves a significant reduction in area ranging from 64.3% to 85.5% and also achieves a significant saving in energy ranging from 21.9% to 92.5% over them, but it has lower throughput compared with them. This makes the proposed design more suitable for constrained implementations of cryptographic primitives in ultra-low power devices, such as wireless sensor nodes and radio frequency identification devices.
TL;DR: A recursive algorithm for the multiplication is derived, and used to design a regular and localized bit-level dependence graph (DG) for systolic computation that is designed specifically for scalability of throughput and hardware-complexity to meet the area-time trade-off in resource-constrained applications.
Abstract: In this paper, an efficient recursive formulation is suggested for systolic implementation of canonical basis finite field multiplication over GF(2^m) based on an irreducible all-one polynomial (AOP). We have derived a recursive algorithm for the multiplication, and used that to design a regular and localized bit-level dependence graph (DG) for systolic computation. The bit-level regular DG is converted into a fine-grained DG by node-splitting, and mapped into a parallel systolic architecture. Unlike most of the existing structures, it does not involve any global communications for modular reduction. The proposed bit-parallel systolic structure has the same cycle time as that of the best existing bit-parallel systolic structure, but involves significantly fewer registers. The proposed bit-parallel design has a scalable latency of l + ⌈log_2 s⌉ + 1 cycles, which is considerably low compared with those of existing systolic designs. Moreover, the proposed time-multiplexed structure is designed specifically for scalability of throughput and hardware-complexity to meet the area-time trade-off in resource-constrained applications while maintaining or reducing the overall latency. The ASIC synthesis report shows that the proposed bit-parallel structures offer nearly 30% saving of area and nearly 38% saving of power consumption over the best of the existing AOP-based systolic finite field multipliers.