TL;DR: The resulting scalar multiplier is the fastest reported implementation for generic curves over binary finite fields, and its area requirements are significantly lower than those of other high-speed implementations.
Abstract: In this paper we present an FPGA implementation of a high-speed elliptic curve scalar multiplier for binary finite fields. High speeds are achieved by boosting the operating clock frequency while at the same time reducing the number of clock cycles required to do a scalar multiplication. To increase clock frequency, the design uses optimized implementations of the underlying field primitives and a mathematically analyzed pipeline design. To reduce clock cycles, a new scheduling scheme is presented that allows overlapped processing of scalar bits. The resulting scalar multiplier is the fastest reported implementation for generic curves over binary finite fields. Additionally, the optimized primitives lead to area requirements that are significantly lower than those of other high-speed implementations. Detailed implementation results are furnished to support the claims.
TL;DR: A novel image encryption algorithm is proposed that uses linear fractional transformation (LFT) substitution boxes and the tangent-delay for elliptic reflecting cavity (TD-ERCS) chaotic sequence; its correlation, UACI, and NPCR results improve on many existing techniques.
Abstract: In this article, we propose a novel image encryption algorithm that uses linear fractional transformation (LFT) substitution boxes and the tangent-delay for elliptic reflecting cavity (TD-ERCS) chaotic sequence. We apply the proposed approach to an image and find that the correlation, UACI, and NPCR results of the proposed algorithm improve considerably on many existing techniques, and that the algorithm is very easy to put into practice.
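The abstract above reports NPCR and UACI results without defining them. As a minimal sketch, assuming the commonly used definitions (NPCR as the percentage of pixel positions that differ between two cipher images, UACI as the mean absolute intensity difference normalised by the maximum grey level), the two metrics can be computed as follows. The function names and the nested-list image representation are illustrative choices, not taken from the paper.

```python
def _flatten(image):
    """Flatten a 2-D image (list of rows of integer pixel values) into a list."""
    return [pixel for row in image for pixel in row]


def npcr(c1, c2):
    """Number of Pixels Change Rate: percentage of positions where c1 and c2 differ."""
    f1, f2 = _flatten(c1), _flatten(c2)
    if len(f1) != len(f2) or not f1:
        raise ValueError("images must be non-empty and of equal size")
    differing = sum(1 for a, b in zip(f1, f2) if a != b)
    return 100.0 * differing / len(f1)


def uaci(c1, c2, max_level=255):
    """Unified Average Changing Intensity: mean |c1 - c2| relative to max_level, in percent."""
    f1, f2 = _flatten(c1), _flatten(c2)
    if len(f1) != len(f2) or not f1:
        raise ValueError("images must be non-empty and of equal size")
    total = sum(abs(a - b) for a, b in zip(f1, f2))
    return 100.0 * total / (max_level * len(f1))
```

In a typical evaluation, `c1` and `c2` would be the cipher images of two plain images that differ in a single pixel; values near 100% for NPCR indicate strong sensitivity to plaintext changes.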
TL;DR: In this paper, the authors propose a throughput-optimal triangular network coding scheme over GF(2) that can supply an unlimited number of innovative packets and whose decoding involves only simple back substitution. They show that the scheme provides an efficient solution to the index coding problem and that its lower computation and energy cost makes it suitable for practical implementation on devices with limited processing and energy capacity.
Abstract: The index coding problem is a fundamental transmission problem which occurs in a wide range of multicast networks. Network coding over a large finite field has been shown to be a theoretically efficient solution to the index coding problem. However, the high computational complexity of packet encoding and decoding over a large finite field, with its resulting penalty on encoding and decoding throughput and its higher energy cost, makes it unsuitable for practical implementation in processor- and energy-constrained devices such as mobile phones and wireless sensors. While network coding over GF(2) can alleviate these concerns, it comes at the cost of degraded throughput performance. To address this tradeoff, we propose a throughput-optimal triangular network coding scheme over GF(2). We show that such a coding scheme can supply an unlimited number of innovative packets and that decoding involves only simple back substitution. Such a coding scheme provides an efficient solution to the index coding problem, and its lower computation and energy cost makes it suitable for practical implementation on devices with limited processing and energy capacity.
TL;DR: A new modification of the ITA algorithm allows the inversion over finite fields in a minimal number of clock cycles, and the proposed implementations complete the inversion over GF(2^233) or GF(2^409) from only 10 clock cycles.
Abstract: Inversion over finite fields is the most costly basic operation for diverse cryptographic applications, such as elliptic curve cryptography and others. The Itoh-Tsujii algorithm (ITA) provides high performance implementations for the inversion operation in standard bases through diverse versions like squarer-ITA, parallel squarer-ITA or quad-ITA. A new modification of the ITA algorithm allows the inversion over finite fields in a minimal number of clock cycles. The proposed implementations complete the inversion over GF(2^233) or GF(2^409) from only 10 clock cycles.
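For orientation, the textbook Itoh-Tsujii idea can be sketched in software: the inverse is a^(2^m - 2), obtained by building a^(2^k - 1) along an addition chain for m - 1 and squaring once at the end. The sketch below is the plain ITA with a binary addition chain, not the modified algorithm of the paper. The helper names and the integer encoding (bit i holds the coefficient of x^i) are assumptions made for illustration.

```python
def gf2m_mul(a, b, poly, m):
    """Multiply two reduced elements of GF(2^m); poly includes the x^m term."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> m) & 1:          # degree reached m: subtract (XOR) the modulus
            a ^= poly
    return result


def gf2m_inv_ita(a, poly, m):
    """Invert a nonzero element via the Itoh-Tsujii addition-chain method."""
    if a == 0:
        raise ZeroDivisionError("zero has no inverse")
    beta, k = a, 1                # invariant: beta == a^(2^k - 1)
    for bit in bin(m - 1)[3:]:    # bits of m-1 after the leading one
        t = beta
        for _ in range(k):        # t = beta^(2^k), i.e. k squarings
            t = gf2m_mul(t, t, poly, m)
        beta = gf2m_mul(t, beta, poly, m)   # now beta == a^(2^(2k) - 1)
        k *= 2
        if bit == "1":            # extend the chain by one: beta^2 * a
            beta = gf2m_mul(gf2m_mul(beta, beta, poly, m), a, poly, m)
            k += 1
    # Here k == m - 1, so one more squaring gives a^(2^m - 2) = a^(-1).
    return gf2m_mul(beta, beta, poly, m)
```

For example, `gf2m_inv_ita(a, 0x11B, 8)` inverts in the AES field GF(2^8). The number of multiplications grows only logarithmically in m, which is why hardware variants concentrate on making the repeated squarings cheap.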
TL;DR: Analysis shows that the proposed architecture saves about 57 percent space complexity and 50 percent time complexity when compared with the only existing semi-systolic even-type GNB multiplier, and due to properties of regularity and modularity, the proposed multiplier is very suitable for VLSI implementation.
Abstract: Efficient finite field multiplication is crucial for implementing public-key cryptosystems. To achieve this, multipliers using Gaussian normal basis have been widely explored in previous works. In this paper, based on the proposed Gaussian normal basis Montgomery (GNBM) representation, a semi-systolic even-type GNBM multiplier is developed. Analysis shows that the proposed architecture saves about 57 percent space complexity and 50 percent time complexity when compared with the only existing semi-systolic even-type GNB multiplier. Moreover, due to properties of regularity and modularity, the proposed multiplier is very suitable for VLSI implementation.
TL;DR: This work presents a simple method for designing a Gaussian normal basis (GNB) multiplier over GF(2^m) that needs less computation power while keeping cost low, and that saves about 57% space complexity compared with the existing GNB multiplier.
Abstract: The elliptic curve cryptosystem (ECC) is very attractive for use in portable devices because of its small key size. Finite field multiplication over GF(2^m) is the most important arithmetic operation in ECC. Portable devices usually have restricted computation power and memory resources. This work presents a simple method for designing a Gaussian normal basis (GNB) multiplier over GF(2^m) that needs less computation power while keeping cost low. The proposed Gaussian NB multiplier saves about 57% space complexity compared with the existing GNB multiplier.
TL;DR: The proposed approach has a definite capability of formally verifying practical Galois-field arithmetic circuits for which the conventional techniques fail, and successfully verifies the AES data path description within 800 s.
Abstract: This paper proposes a formal approach to designing Galois-field (GF) arithmetic circuits, which are widely used in modern cryptographic processors. Our method describes GF arithmetic circuits in a hierarchical manner with high-level directed graphs associated with specific GFs and arithmetic functions. The proposed circuit description can be effectively verified by symbolic computations based on polynomial reduction using Gröbner bases. The verified description is then translated into the equivalent hardware description language (HDL) code, which is available for the conventional design flow. We first describe the proposed graph representation and present an example of the description and verification. The significant advantage of the proposed approach is demonstrated through experimental designs of parallel multipliers over GF(2^m) for different word lengths and irreducible polynomials. The result shows that the proposed approach has a definite capability of formally verifying practical GF arithmetic circuits for which the conventional techniques fail. We also propose an application of this approach to cryptographic processor design. The target considered here is a 128-bit advanced encryption standard (AES) data path with a loop architecture. To the best of the authors' knowledge, this is the first verification of this type of practical AES data path. We present a detailed description of the AES data path and its verification. The proposed approach successfully verifies the AES data path description within 800 s.
TL;DR: A novel scheme is presented for decomposing multiplication over GF(2^m) based on irreducible trinomials into several independent units, which facilitates maximal register sharing and low-latency parallel implementation.
Abstract: Systolic structures for finite field multiplication involve a large number of registers for parallel implementation, while bit-serial implementations require a large computation time, which increases along with the order of the field. In this paper, we present a novel scheme for the decomposition of the multiplication over GF(2^m) based on irreducible trinomials into several independent units that facilitates maximal register sharing and low-latency parallel implementation. It is shown that the proposed design involves significantly less area-delay complexity compared with the best of the corresponding existing systolic designs, and could be used for a wider class of trinomials.
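To make the bit-serial baseline mentioned above concrete, here is a minimal software sketch of MSB-first shift-and-add multiplication in a field defined by a trinomial x^m + x^k + 1. It is not the decomposition proposed in the paper. The function name is hypothetical, and the test field GF(2^7) with x^7 + x + 1 is chosen only because it is small.

```python
def trinomial_mul(a, b, m, k):
    """Multiply reduced elements a and b of GF(2^m) defined by x^m + x^k + 1.

    Elements are integers whose bit i is the coefficient of x^i. The loop
    consumes b from its most significant bit, one bit per iteration, which
    mirrors the m clock cycles of a bit-serial hardware multiplier.
    """
    mask = (1 << m) - 1
    acc = 0
    for i in reversed(range(m)):
        acc <<= 1                              # multiply the accumulator by x
        if acc >> m:                           # x^m appeared: replace it by x^k + 1
            acc = (acc & mask) ^ (1 << k) ^ 1
        if (b >> i) & 1:                       # add a when the current bit of b is set
            acc ^= a
    return acc
```

With a trinomial, each reduction step touches only two bit positions, which is the property that trinomial-based hardware designs exploit.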
TL;DR: An efficient FPGA implementation for modular multiplication in the finite field GF(2^m) that is suitable for implementing Elliptic Curve Cryptosystems and can be scaled easily to larger values of m.
Abstract: This paper describes an efficient FPGA implementation for modular multiplication in the finite field GF(2^m) that is suitable for implementing Elliptic Curve Cryptosystems. We have developed a systolic array implementation of a Montgomery modular multiplication. Our solution is efficient for large finite fields (m=160-193), which offer a high security level, and it can be scaled easily to larger values of m. The clock frequency of the implementation is independent of the field size. In contrast to earlier work, the design is not restricted to field representations using irreducible trinomials, all-one polynomials or equally spaced polynomials.
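As a rough software illustration of Montgomery multiplication in GF(2^m), the sketch below uses the standard bit-serial formulation that computes a·b·x^(-m) mod f(x) for any modulus with constant term 1. It says nothing about the systolic array of the paper. The function names are illustrative, and the conversion through x^(2m) mod f is one common way to enter and leave the Montgomery domain.

```python
def poly_mod(a, poly):
    """Reduce the binary polynomial a modulo poly (both encoded as integers)."""
    deg = poly.bit_length() - 1
    while a.bit_length() - 1 >= deg:
        a ^= poly << (a.bit_length() - 1 - deg)
    return a


def mont_mul(a, b, poly, m):
    """Return a * b * x^(-m) mod poly for reduced a, b; poly must have constant term 1."""
    c = 0
    for i in range(m):
        if (a >> i) & 1:      # add b when coefficient i of a is set
            c ^= b
        if c & 1:             # make c divisible by x by adding the modulus
            c ^= poly
        c >>= 1               # exact division by x
    return c


def gf_mul_via_montgomery(a, b, poly, m):
    """Ordinary field product computed through the Montgomery domain."""
    r2 = poly_mod(1 << (2 * m), poly)       # x^(2m) mod f, the conversion constant
    a_m = mont_mul(a, r2, poly, m)          # a * x^m
    b_m = mont_mul(b, r2, poly, m)          # b * x^m
    prod_m = mont_mul(a_m, b_m, poly, m)    # a * b * x^m
    return mont_mul(prod_m, 1, poly, m)     # strip the x^m factor
```

The loop never inspects the shape of the modulus, which is consistent with the abstract's point that the approach is not tied to trinomials or other special polynomials.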
TL;DR: The proposed architecture for multiplication and exponentiation based on systolic structures has low latency and total computation time and is thus suitable for high-performance implementations of the cryptographic schemes such as the elliptic curve cryptography.
Abstract: This paper presents a new super digit-serial systolic multiplier architecture for computing multiplication over GF(2^m). The proposed architecture has low latency and total computation time and is thus suitable for high-performance implementations of cryptographic schemes such as elliptic curve cryptography (ECC). Through comparisons, we show the efficiency improvements of the proposed architectures compared to the previously presented ones. The presented architectures for multiplication and exponentiation based on systolic structures make hardware implementations of cryptographic systems more efficient and high-performance.
TL;DR: The proposed design is capable of performing a field multiplication over the extension field with degree 163 in 11.92 μs with the maximum achievable frequency of 251 MHz on Xilinx Virtex-4 while 22% of the chip area is occupied.
Abstract: A new and highly efficient architecture for elliptic curve scalar point multiplication which is optimized for a binary field recommended by NIST and is well-suited for elliptic curve cryptographic (ECC) applications is presented. To achieve the maximum architectural and timing improvements we have reorganized and reordered the critical path of the Lopez-Dahab scalar point multiplication architecture such that logic structures are implemented in parallel and operations in the critical path are diverted to noncritical paths. With G=41, the proposed design is capable of performing a field multiplication over the extension field with degree 163 in 11.92 μs with the maximum achievable frequency of 251 MHz on Xilinx Virtex-4 (XC4VLX200) while 22% of the chip area is occupied, where G is the digit size of the underlying digit-serial finite field multiplier. Keywords: Elliptic Curve Cryptography, FPGA implementation, Scalar point multiplication
TL;DR: In this paper, the authors present improved Karatsuba formulae for multiplying two small binary polynomials, compare different strategies for PCLMULQDQ-based multiplication in the five GF(2^m) fields recommended by NIST, and conclude the best design approaches to software implementation of GF (2) multiplication.
TL;DR: Two low-latency systolic structures for multiplication over GF(2^m) based on general irreducible polynomials and pentanomials are presented and are suitable for many time-critical applications.
Abstract: Systolic implementation of multiplication over GF(2^m) is usually very efficient in area-time complexity, but its latency is usually very large. Thus, two low-latency systolic multipliers over GF(2^m) based on general irreducible polynomials and irreducible pentanomials are presented. First, a signal flow graph (SFG) is used to represent the algorithm for multiplication over GF(2^m). Then, the two low-latency systolic structures for multiplication over GF(2^m) based on general irreducible polynomials and pentanomials are derived from the SFG by suitable cut-set retiming. Analysis indicates that the two proposed low-latency designs involve at least one-third less area-delay product than the existing designs. To the authors' knowledge, the time complexity of the structures is the lowest found in the literature for systolic GF(2^m) multipliers based on general irreducible polynomials and pentanomials. The proposed low-latency designs are regular and modular, and therefore they are suitable for many time-critical applications.
TL;DR: A low-complexity bit-parallel DB multiplier using the multiplexer approach is introduced that saves up to 60% of space complexity; it targets ECC, which can achieve the same security level as RSA with a shorter key length.
Abstract: Information security today depends heavily on cryptosystems such as the Rivest-Shamir-Adleman algorithm (RSA) and the elliptic curve cryptosystem (ECC). RSA can provide a higher security level than ECC, but it is not suitable for resource-constrained devices such as smart phones or embedded systems. Thus, ECC is attractive for applications in resource-constrained devices because it can achieve the same security level with a shorter key length than RSA. Galois or finite field multiplication is the core arithmetic operation of ECC. There are three popular bases for the finite field GF(2^m): polynomial basis, normal basis and dual basis (DB). Each basis representation has its own advantages. In this study, the authors introduce a low-complexity bit-parallel DB multiplier using the multiplexer approach. Compared with the related work, our design saves up to 60% of space complexity.
TL;DR: VLSI simulation is performed for several existing digit-level field multipliers in the same field, GF(2^283), and with the same 0.18 μm VLSI technology so that an effective comparison of their power efficiency along with other IC features such as area and critical path delay can be made.
Abstract: Several digit-level finite field multiplier architectures have been proposed in the literature. Some of them come with power estimates obtained with different VLSI technologies and for different field sizes, which makes it difficult to compare their power efficiency. In this paper, we perform VLSI simulation for several existing digit-level field multipliers in the same field, GF(2^283), and with the same 0.18 μm VLSI technology so that an effective comparison of their power efficiency along with other IC features such as area and critical path delay can be made. Recommendations of the most efficient finite field multiplier are given for the different application constraints. Detailed discussion is provided for power-constrained mobile and wireless applications. The comparison results obtained in this paper are expected to be useful for those who design and/or implement elliptic curve cryptography for wireless and portable systems.
TL;DR: This work presents $\textrm{GF}(2^m)$ multipliers with reduced activity variations for asymmetric cryptography and proposes modified multiplication algorithms and multiplier architectures to reduce useful activity variations during an operation.
Abstract: Electrical activity variations in a circuit are one source of the information leakage exploited in side channel attacks. In this work, we present $\textrm{GF}(2^m)$ multipliers with reduced activity variations for asymmetric cryptography. Useful activity of typical multiplication algorithms is evaluated. The results show strong shapes, which can be used as a small source of information leakage. We propose modified multiplication algorithms and multiplier architectures to reduce useful activity variations during an operation.
TL;DR: This manuscript describes mathematical representations of the Camellia S-box by using composite fields such as polynomial, normal or mixed, and the theoretical design with composite normal bases allows saving gates in the critical path by using 19 XOR gates, 4 AND gates and 2 NOT gates.
Abstract: Substitution Box (S-box) is usually the most complex module in some block ciphers. Some prominent ciphers such as AES and Camellia use S-boxes, which are affine equivalents of a multiplicative inverse in small finite fields. This manuscript describes mathematical representations of the Camellia S-box by using composite fields such as polynomial, normal or mixed. An optimized hardware implementation typically aims to reduce the number of gates to be used. Our theoretical design with composite normal bases allows saving gates in the critical path by using 19 XOR gates, 4 AND gates and 2 NOT gates. With composite mixed bases, the critical path has 2 XOR gates more than the representation with composite normal bases. Redundancies found in the affine transformation matrix that form the composite fields were eliminated. For mixed bases, new Algebraic Normal Form identities were obtained to compute the inner composite multiplicative inverse, reducing the critical path of the complete implementation of the Camellia S-box. These constructions were translated into transistor-gate architectures for hardware representations by using Electric VLSI [29] under MOSIS C5 process [17], [18], thus obtaining the corresponding schematic models.
TL;DR: A class of matrices in GF(2) which are non invertible and easy to generate are proposed which can be used as multiplication matrix in Hill Cipher technique for one way hash algorithm.
Abstract: In this paper, we describe non-invertible matrices in GF(2) which can be used as the multiplication matrix in the Hill cipher technique for a one-way hash algorithm. The matrices proposed are permutation matrices with exactly one entry 1 in each row and each column and 0 elsewhere. Such matrices represent a permutation of m elements. Since its invention, the Hill cipher algorithm has been used for symmetric encryption, where the multiplication matrix is the key. The Hill cipher requires the inverse of the matrix to recover the plaintext from the ciphertext. We propose a class of matrices in GF(2) which are non-invertible and easy to generate.
TL;DR: A novel finite-field multiplier with concurrent error-detection capabilities based on the Karatsuba formula is proposed, which is able to face a wide range of fault-based attacks.
Abstract: Galois fields are widely used in cryptographic applications. The detection of an error caused by a fault in a cryptographic circuit is important to avoid undesirable behaviours of the system that could be used to reveal secret information. One of the methods used to avoid these behaviours is concurrent error detection. Multiplication in finite fields is one of the most important operations and is widely used in different cryptographic systems. The authors propose in this study an error-detection method for composite finite-field multipliers based on the use of the Karatsuba formula. The Karatsuba formula can be used in the GF((2^n)^2) field to decrease the hardware complexity of the finite-field multiplier. The authors propose a novel finite-field multiplier with concurrent error-detection capabilities based on the Karatsuba formula. How the error-detection capabilities of this multiplier are able to face a wide range of fault-based attacks is also shown.
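For reference, the saving that the Karatsuba formula brings in GF((2^n)^2) can be written out as a short worked identity. The symbols a_1, a_0, b_1, b_0 (subfield elements of GF(2^n)) and the extension variable y are notation assumed here, and the final reduction by the degree-2 field polynomial, which depends on the chosen construction, is left out.

```latex
% Elements of GF((2^n)^2): A = a_1 y + a_0, \quad B = b_1 y + b_0,
% with a_i, b_i \in GF(2^n). Addition and subtraction coincide in characteristic 2.
\begin{align*}
A \cdot B &= a_1 b_1\, y^2 + (a_1 b_0 + a_0 b_1)\, y + a_0 b_0, \\
a_1 b_0 + a_0 b_1 &= (a_1 + a_0)(b_1 + b_0) + a_1 b_1 + a_0 b_0 .
\end{align*}
% Only three subfield products are needed:
% a_1 b_1, \; a_0 b_0, \; (a_1 + a_0)(b_1 + b_0),
% instead of the four required by the schoolbook expansion.
```

The trade is one subfield multiplier for a few extra subfield additions, which are plain XOR networks in hardware.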
TL;DR: This paper presents two parallel algorithms of basic arithmetic operations concerning multiple-precision integers over finite field GF(2^n) and shows high efficiencies of the proposed parallel algorithms.
Abstract: This paper presents two parallel algorithms of basic arithmetic operations concerning multiple-precision integers over finite field GF(2^n). The parallel algorithms of reduction operation and inversion-multiplication operation are designed by analyzing their data dependencies. Time complexities of the parallel algorithms and the sequential algorithms are calculated to make the quantitative comparison. The performance evaluation shows high efficiencies of the proposed parallel algorithms.
TL;DR: A novel low-cost bit-parallel DB multiplier which employs multiplexer approach is presented and saves at least 40% space complexity.
Abstract: Information security is heavily dependent on public key cryptosystems such as RSA. However, RSA is not available for the resource-constrained devices like embedded systems. Therefore, the new elliptic curve cryptosystem with very low cost as compared to RSA is now available and suggested for information security. Galois/Finite field multiplication is the most important operation in elliptic curve cryptosystem. There are three popular types of bases for representing elements in finite field, termed polynomial basis (PB), normal basis (NB), and dual basis (DB). A novel low-cost bit-parallel DB multiplier which employs multiplexer approach is presented. As compared to traditional DB multiplier using XOR gates, the proposed design saves at least 40% space complexity.
TL;DR: In this paper, the authors provide conditions under which the group generated by Rijndael-like round functions based on operations of the finite field $\GF (p^k)$ ($p\geq 2$) is equal to the symmetric group or the alternating group on the state space.
Abstract: We provide conditions under which the set of Rijndael functions considered as permutations of the state space and based on operations of the finite field $\GF (p^k)$ ($p\geq 2$ a prime number) is not closed under functional composition. These conditions justify using a sequential multiple encryption to strengthen the AES (Rijndael block cipher with specific block sizes) in case AES became practically insecure. In Sparr and Wernsdorf (2008), R. Sparr and R. Wernsdorf provided conditions under which the group generated by the Rijndael-like round functions based on operations of the finite field $\GF (2^k)$ is equal to the alternating group on the state space. In this paper we provide conditions under which the group generated by the Rijndael-like round functions based on operations of the finite field $\GF (p^k)$ ($p\geq 2$) is equal to the symmetric group or the alternating group on the state space.
TL;DR: A methodology for incorporating Polynomial Residue Arithmetic (PRA) in the Montgomery multiplication algorithm for polynomials in GF(2^n) is presented, and performance results are given in terms of the field characteristic n, the number of moduli elements L, and the moduli word-length w.
Abstract: A methodology for incorporating Polynomial Residue Arithmetic (PRA) in the Montgomery multiplication algorithm for polynomials in GF(2^n) is presented in this paper. The mathematical conditions that need to be satisfied for this incorporation to be valid are examined, and performance results are given in terms of the field characteristic n, the number of moduli elements L, and the moduli word-length w. The proposed architecture is highly parallelizable and flexible, as it supports Polynomial-to-PRA and PRA-to-Polynomial conversions, Chinese Remainder Theorem (CRT) for polynomials, Montgomery multiplication, and Montgomery exponentiation in the same hardware.
TL;DR: This paper presents a new bit-parallel multiplier for the finite field GF (2) generated with an irreducible all-one polynomial, and a three-term Karatsuba-like formula is combined with this representation to decrease the space complexity.
Abstract: This paper presents a new bit-parallel multiplier for the finite field GF (2) generated with an irreducible all-one polynomial. Redundant representation is used to reduce the time delay of the proposed multiplier, while a three-term Karatsuba-like formula is combined with this representation to decrease the space complexity. As a result, the proposed multiplier requires about 10 percent fewer AND/XOR gates than the most efficient bit-parallel multipliers using an all-one polynomial, while it has almost the same time delay as the previously proposed ones.
TL;DR: In this algorithm, the operations required in several contiguous iterations of a previously reported algorithm based on the extended Euclid's algorithm are represented as a matrix and performed at once through the matrix by means of a polynomial multiply instruction on GF(2).
Abstract: The authors propose a fast inversion algorithm in Galois field GF(2^m). In this algorithm, the operations required in several contiguous iterations of a previously reported algorithm based on the extended Euclid's algorithm are represented as a matrix. These operations are performed at once through the matrix by means of a polynomial multiply instruction on GF(2). When the word size of a processor is 32 or 64 and m is larger than 233 for National Institute of Standards and Technology (NIST)-recommended irreducible polynomials, the proposed algorithm computes inversion with, on average, fewer polynomial multiply instructions on GF(2) and exclusive-OR instructions than previously reported inversion algorithms require.
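For context, the kind of baseline this abstract builds on can be sketched as the classical bit-level extended Euclidean inversion for binary polynomials, which performs one shift-and-XOR step per iteration. The matrix batching of several iterations described above is not reproduced here. The function names and the integer encoding (bit i holds the coefficient of x^i) are assumptions for illustration.

```python
def gf2m_mul(a, b, poly, m):
    """Multiply in GF(2^m) (helper used here only to check the inverse)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> m) & 1:
            a ^= poly
    return result


def gf2m_inv_euclid(a, poly):
    """Invert a nonzero reduced element modulo an irreducible binary polynomial."""
    if a == 0:
        raise ZeroDivisionError("zero has no inverse")
    u, v = a, poly
    g1, g2 = 1, 0                  # invariants: g1*a = u and g2*a = v (mod poly)
    while u != 1:
        if u == 0:
            raise ValueError("element and modulus are not coprime")
        j = u.bit_length() - v.bit_length()   # degree difference deg(u) - deg(v)
        if j < 0:                  # keep deg(u) >= deg(v) by swapping roles
            u, v = v, u
            g1, g2 = g2, g1
            j = -j
        u ^= v << j                # cancel the leading term of u
        g1 ^= g2 << j              # apply the same step to the cofactor
    return g1
```

Each loop iteration is a single shift and XOR on multi-word operands; batching several such steps into one matrix product is what lets a polynomial multiply instruction do the work, as the abstract describes.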
TL;DR: In this paper, a non-invertible matrix in GF(2) which can be used as the multiplication matrix in the Hill cipher technique for a one-way hash algorithm is described, and a class of such matrices that are non-invertible and easy to generate is proposed.
Abstract: In this paper, we describe non-invertible matrices in GF(2) which can be used as the multiplication matrix in the Hill cipher technique for a one-way hash algorithm. The matrices proposed are permutation matrices with exactly one entry 1 in each row and each column and 0 elsewhere. Such matrices represent a permutation of m elements. Since its invention, the Hill cipher algorithm has been used for symmetric encryption, where the multiplication matrix is the key. The Hill cipher requires the inverse of the matrix to recover the plaintext from the ciphertext. We propose a class of matrices in GF(2) which are non-invertible and easy to generate.
TL;DR: In this paper, a digit-serial dual-basis multiplier over GF(2^m) is proposed to balance low space complexity and low time complexity at the same time.
Abstract: Multiplication is one of the most important finite field arithmetic operations in cryptographic computations. Dual basis multipliers over GF(2^m) are widely applied in this kind of computation because of their small chip area. However, to date there are only a few methods that balance low space complexity and low time complexity at the same time. To achieve this aim, this study presents a novel digit-serial dual basis multiplier that differs from existing ones by using a modified cut-set method with the Karatsuba algorithm. Though this kind of multiplier loses some throughput, it needs only a small number of transistors, so it is particularly suitable for hand-held devices equipped with only limited resources. The proposed digit-serial dual basis multiplier saves 54% space complexity and 30% time complexity compared with existing similar studies, using the NIST-suggested values for the elliptic curve cryptosystem.
TL;DR: The main contributions of this paper are the improved finite field multiplier, which uses a 2-stage Karatsuba-Ofman multiplier architecture, and a revised algorithm for the projective to affine coordinate conversion, which computes 2 inversion operations simultaneously with the numerator portion.
Abstract: Improvements of the Elliptic Curve Cryptosystem (ECC) point multiplication processor are presented in this paper. The main contribution of this paper is the improved finite field multiplier, which uses a 2-stage Karatsuba-Ofman multiplier architecture. Furthermore, a revised algorithm is proposed for the projective to affine coordinate conversion, which computes 2 inversion operations simultaneously with the numerator portion, in order to make better use of parallel cores implemented in the ECC processor. The design is implemented on a Virtex 4 XC4VLX80 FPGA and the implementation results show that the ECC processor can compute a point multiplication in 6.72 μs. This time is the fastest to the authors' best knowledge. Thus, the ECC processor proposed in this paper is suitable for applications where high throughput is required, such as network servers.
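The Karatsuba-Ofman idea behind such multipliers can be illustrated in software on binary polynomials. The following is a minimal recursive sketch, not the 2-stage hardware architecture of the paper: operands are integers whose bit i is the coefficient of x^i, the product is left unreduced, and the names and the `threshold` cutoff are hypothetical choices.

```python
def clmul(a, b):
    """Schoolbook carry-less (GF(2)[x]) product of two binary polynomials."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def karatsuba_clmul(a, b, bits, threshold=8):
    """Carry-less product of a and b, each with fewer than 2**bits as an integer.

    Splits each operand into low and high halves and uses three recursive
    products instead of four; below `threshold` bits it falls back to the
    schoolbook method.
    """
    if bits <= threshold:
        return clmul(a, b)
    half = (bits + 1) // 2
    mask = (1 << half) - 1
    a0, a1 = a & mask, a >> half
    b0, b1 = b & mask, b >> half
    low = karatsuba_clmul(a0, b0, half, threshold)
    high = karatsuba_clmul(a1, b1, half, threshold)
    mid = karatsuba_clmul(a0 ^ a1, b0 ^ b1, half, threshold)
    # Over GF(2), subtraction is XOR, so the cross term is mid ^ low ^ high.
    return low ^ ((mid ^ low ^ high) << half) ^ (high << (2 * half))
```

A call such as `karatsuba_clmul(a, b, 163)` handles operands sized for a degree-163 field; a modular reduction step would follow in a full field multiplier.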
TL;DR: A parallel, power-efficient and scalable word-based crypto architecture is proposed that performs the operations required for scalar point multiplication including add, multiplication and inversion operations on GF(2) operands.
Abstract: In this paper, a parallel, power-efficient and scalable word-based crypto architecture is proposed that performs the operations required for scalar point multiplication including add, multiplication and inversion operations on GF(2) operands. The proposed architecture distinguishes itself from existing architectures, including our prior architecture, by the fact that its resource usage and power consumption are based on the input data. Hence, such an architecture might be used for various operand sizes without modifying or reconfiguring the underlying hardware. The architecture also has the ability to perform several different operations in parallel when each operation requires a small key size, which significantly increases the overall performance and throughput of the system. In the absence of parallel requests, the remaining unused modules are turned off in order to save power. The experimental results show significant improvement in timing, throughput and energy performance with a slight overhead in circuit area.