TL;DR: Two low-complexity bit-parallel systolic multipliers are presented, based on the proposed algorithm, that can be applied to computing multiplications over the class of fields GF(2^m) in which the elements are represented with the root of an irreducible equally spaced polynomial.
Abstract: Two operations, the cyclic shifting and the inner product, are defined by the properties of irreducible all one polynomials. An effective algorithm is proposed for computing multiplications over a class of fields GF(2^m) using the two operations. Then, two low-complexity bit-parallel systolic multipliers are presented based on the algorithm. The first multiplier is composed of (m+1)^2 identical cells, each consisting of one 2-input AND gate, one 2-input XOR gate, and three 1-bit latches. The other multiplier comprises (m+1)^2 identical cells and m XOR gates. Each cell consists of one 2-input AND gate, one 2-input XOR gate, and four 1-bit latches. Each multiplier exhibits very low latency and propagation delay and is thus very fast. Moreover, the architectures of the two multipliers can be applied to computing multiplications over the class of fields GF(2^m) in which the elements are represented with the root of an irreducible equally spaced polynomial of degree m.
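The role of the cyclic shifting and the inner product can be illustrated in software. The sketch below is not the paper's cell-level architecture; it only relies on the standard fact that (x + 1)(x^m + ... + x + 1) = x^(m+1) + 1, so a root α of an irreducible all one polynomial satisfies α^(m+1) = 1. Elements are assumed to be held as m+1 bits over the extended basis {1, α, ..., α^m}; the function names `aop_multiply` and `to_canonical` are hypothetical.

```python
def aop_multiply(a, b, m):
    """Multiply two GF(2^m) elements given as lists of m+1 bits over the
    extended basis {1, alpha, ..., alpha^m}, where alpha is a root of the
    all one polynomial of degree m (assumed irreducible)."""
    n = m + 1
    c = []
    for k in range(n):
        # Inner product of a with a cyclically shifted copy of b.
        # Because alpha^(m+1) = 1, reduction is just index wrap-around.
        bit = 0
        for i in range(n):
            bit ^= a[i] & b[(k - i) % n]
        c.append(bit)
    return c


def to_canonical(c):
    """Drop the alpha^m coordinate using 1 + alpha + ... + alpha^m = 0."""
    top = c[-1]
    return [bit ^ top for bit in c[:-1]]


# Example in GF(2^2) with x^2 + x + 1: alpha * alpha = alpha + 1.
product = to_canonical(aop_multiply([0, 1, 0], [0, 1, 0], 2))
```

Each output bit needs one AND and one XOR per term, which is consistent with the AND/XOR cell structure described in the abstract, although the mapping of this loop onto (m+1)^2 cells is an assumption of the sketch.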
TL;DR: A fast algorithm for multiplicative inversion in GF(2^m) using normal basis is proposed, which is an improvement on those proposed by Itoh and Tsujii and by Chang et al., which are based on Fermat's theorem and require O(log m) multiplications.
Abstract: A fast algorithm for multiplicative inversion in GF(2^m) using normal basis is proposed. It is an improvement on those proposed by Itoh and Tsujii and by Chang et al., which are based on Fermat's theorem and require O(log m) multiplications. The number of multiplications is reduced by decomposing m-1 into several factors and a small remainder.
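For orientation, the Fermat-based baseline that this abstract improves on can be sketched as follows. This is an Itoh-Tsujii-style addition chain for a^(2^m - 2), not the factor-decomposition method of the paper. As a loudly stated assumption, it uses a polynomial basis with elements stored as Python integers rather than a normal basis, so squarings are ordinary multiplications here instead of cyclic shifts. The names `gf_mul` and `gf_inv` are hypothetical.

```python
def gf_mul(a, b, poly, m):
    """Multiply a and b in GF(2^m); poly includes the x^m term."""
    r = 0
    for i in range(m - 1, -1, -1):
        r <<= 1
        if (r >> m) & 1:
            r ^= poly
        if (b >> i) & 1:
            r ^= a
    return r


def gf_inv(a, poly, m):
    """Invert a via Fermat: a^-1 = a^(2^m - 2) = (a^(2^(m-1) - 1))^2.
    beta always holds a^(2^k - 1); k follows the binary expansion of m-1,
    so the number of non-squaring multiplications is O(log m)."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse")
    beta, k = a, 1
    for bit in bin(m - 1)[3:]:          # bits of m-1 after the leading 1
        t = beta
        for _ in range(k):              # t = beta^(2^k), k squarings
            t = gf_mul(t, t, poly, m)
        beta = gf_mul(t, beta, poly, m)  # a^(2^(2k) - 1)
        k *= 2
        if bit == "1":
            beta = gf_mul(gf_mul(beta, beta, poly, m), a, poly, m)
            k += 1
    return gf_mul(beta, beta, poly, m)


# Example in GF(2^8) with the AES polynomial x^8 + x^4 + x^3 + x + 1.
inv_53 = gf_inv(0x53, 0x11B, 8)
```

In a normal basis the inner squaring loop costs only rotations, which is why such methods count multiplications alone.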
TL;DR: This paper presents a hardware architecture for a unified multiplier which operates in two types of finite fields, GF(p) and GF(2^m), and it is shown that the proposed architecture is highly regular and simple to design.
Abstract: The performance of elliptic curve cryptosystems is primarily determined by an efficient implementation of the arithmetic operations in the underlying finite field. This paper presents a hardware architecture for a unified multiplier which operates in two types of finite fields: GF(p) and GF(2^m). In both cases, the multiplication of field elements is performed by accumulation of partial products to an intermediate result according to an MSB-first shift-and-add method. The reduction modulo the prime p (or the irreducible polynomial p(t), respectively) is interleaved with the addition steps by repeated subtractions of 2p and/or p (or p(t), respectively). A bit-serial multiplier executes a multiplication in GF(p) in approximately 1.5·⌈log2(p)⌉ clock cycles, and the multiplication in GF(2^m) takes exactly m clock cycles. The unified multiplier requires only slightly more area than the multiplier for prime fields GF(p). Moreover, it is shown that the proposed architecture is highly regular and simple to design.
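The MSB-first shift-and-add method with interleaved reduction can be sketched for both field types. This is a software illustration under stated assumptions, not the paper's datapath: operands are assumed already reduced (0 <= a < p, or fewer than m+1 bits), and the function names `mul_gfp` and `mul_gf2m` are hypothetical. The two loops are deliberately written in parallel to show why one structure can serve both fields.

```python
def mul_gfp(a, b, p):
    """MSB-first shift-and-add in GF(p), assuming 0 <= a < p."""
    r = 0
    for i in range(b.bit_length() - 1, -1, -1):
        r = 2 * r                      # shift
        if (b >> i) & 1:
            r += a                     # add partial product
        # Here r < 3p, so subtracting 2p and/or p restores r < p.
        if r >= 2 * p:
            r -= 2 * p
        if r >= p:
            r -= p
    return r


def mul_gf2m(a, b, poly, m):
    """MSB-first shift-and-add in GF(2^m); poly includes the x^m term.
    Addition and subtraction are both XOR, so no carries occur."""
    r = 0
    for i in range(m - 1, -1, -1):     # exactly m iterations
        r <<= 1                        # shift
        if (r >> m) & 1:
            r ^= poly                  # "subtract" p(t)
        if (b >> i) & 1:
            r ^= a                     # add partial product
    return r


prime_result = mul_gfp(123, 456, 1009)
binary_result = mul_gf2m(0x57, 0x83, 0x11B, 8)
```

The binary-field loop always runs m times, matching the "exactly m clock cycles" figure; the 1.5·⌈log2(p)⌉ estimate for GF(p) depends on hardware details that this sketch does not model.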
TL;DR: An algorithm for computing AB^2+C over a finite field GF(2^m) is presented using the properties of the irreducible all one polynomial of degree m and a parallel-in parallel-out systolic multiplier is proposed.
Abstract: An algorithm for computing AB^2+C over a finite field GF(2^m) is presented using the properties of the irreducible all one polynomial of degree m. Based on the algorithm, a parallel-in parallel-out systolic multiplier is proposed. The architecture of the multiplier is very simple, regular, modular, and exhibits very low latency and propagation delay. Therefore, it is suitable for very large scale integration implementation of cryptosystems.
TL;DR: A bit-serial architecture for efficient addition and multiplication in binary finite fields GF(2^m) using a polynomial basis representation is presented and a low-voltage/low-power implementation of the arithmetic circuits and the registers is proposed.
Abstract: This paper presents a bit-serial architecture for efficient addition and multiplication in binary finite fields GF(2^m) using a polynomial basis representation. Moreover, a low-voltage/low-power implementation of the arithmetic circuits and the registers is proposed. The introduced multiplier operates over a variety of binary fields up to an order of 2^m. We detail that the bit-serial multiplier architecture can be implemented with only 28m gate equivalents, and that it is scalable, highly regular and simple to design.
TL;DR: A comparison of several projective point transformations of an elliptic curve defined over GF(2^n) and a ranking of their performance is provided.
Abstract: Here we provide a comparison of several projective point transformations of an elliptic curve defined over GF(2^n) and rank their performance. We provide strategies to achieve improved implementations of each. Our work shows that under certain conditions, these strategies can alter the ranking of these projective point arithmetic methods.
TL;DR: A W-bit wide word computer capable of operating on one or more sets of k-bit operands executes Galois field arithmetic by mapping arithmetic operations of the Galois field GF(2^n) to corresponding operations in subfields of lower order (m < n), one of which is selected on the basis of an appropriate cost function.
Abstract: Efficient parallel processing of algorithms involving Galois Field arithmetic uses data slicing techniques to execute arithmetic operations on computing hardware having SIMD (single-instruction, multiple-data) architectures. A W-bit wide word computer capable of operating on one or more sets of k-bit operands executes Galois Field arithmetic by mapping arithmetic operations of Galois Field GF(2^n) to corresponding operations in subfields of lower order (m
TL;DR: A systematic design is given for a generalization of XTR, a method in which a certain subgroup of the Galois field GF(p^6) is represented by elements of GF(p^2), and optimal choices of p and m with respect to performance are discussed.
Abstract: A. K. Lenstra and E. R. Verheul in [2] proposed a very efficient method called XTR in which a certain subgroup of the Galois field GF(p^6) can be represented by elements in GF(p^2). At the end of their paper [2], they briefly mentioned a method of generalizing their idea to the field GF(p^{6m}). In this paper, we give a systematic design of this generalization and discuss optimal choices for p and m with respect to performance. If we choose m large enough, we can reduce the size of p to as small as the word size of common processors. In such a case, this extended XTR is well suited for processors with optimized arithmetic on integers of word size.
TL;DR: A parallel-in parallel-out systolic division circuit over GF(2^m) based on the novel extended Stein's algorithm that provides guaranteed convergence in 2^m-1 iterations is presented.
Abstract: We present a parallel-in parallel-out systolic division circuit over GF(2^m) based on the novel extended Stein's algorithm that provides guaranteed convergence in 2^m-1 iterations. The area-time (AT) complexity of our design is O(m^2) and the achievable maximum clock rate is 1 GHz based on the 0.6 μm technology. Compared to the best systolic design known to date based on the extended Euclid's algorithm, the proposed circuit exhibits significant area and speed advantages.
TL;DR: A logic system is defined that can treat non-linear and nonconvex constraints, for describing specifications and implementations of arithmetic algorithms over the Galois field GF(2^m), and an automatic correctness proof of a Reed-Solomon ECC decoding algorithm is performed.
Abstract: The Galois field GF(2^m) is an important number system that is widely used in applications such as error correction codes (ECC), and complicated combinations of arithmetic operations are performed in those applications. However, few practical formal methods for algorithm verification at the word level have been developed. We have defined a logic system, GF2m-arithmetic, that can treat non-linear and nonconvex constraints, for describing specifications and implementations of arithmetic algorithms over GF(2^m). We have investigated various decision techniques for the GF2m-arithmetic and its subclasses, and have performed an automatic correctness proof of an (n, n-4) Reed-Solomon ECC decoding algorithm. Because the correctness criterion is in an efficient subclass of the GF2m-arithmetic (k-field-size independent), the proof is completed in significantly reduced time, less than one second for any m ≥ 3 and n ≥ 5, by using a combination of polynomial division and variable elimination over GF(2^m), without using any costly techniques such as factoring or a decision over GF(2) that can easily increase the verification time to more than a day.
TL;DR: A low latency architecture to compute the multiplicative inverse and division in a finite field GF(2^m) is presented and can be used in error-correction or cryptography to increase system throughput.
Abstract: A low latency architecture to compute the multiplicative inverse and division in a finite field GF(2^m) is presented. Compared to other proposals with the same complexity, this circuit has lower latency and can be used in error-correction or cryptography to increase system throughput. This architecture takes advantage of the simplicity of computing powers (2^l) of an element in the Galois field. The inverse of an element is computed in two stages: power calculation and multiplication. A division can be performed using only one more multiplication in the inversion circuit.
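The two-stage structure (power calculation, then multiplication) is consistent with the standard Fermat identity below. Whether the paper uses exactly this decomposition is an assumption; the identity itself holds for every nonzero a in the field, because 2^m - 2 is the sum of 2^i for i = 1, ..., m-1.

```latex
a^{-1} \;=\; a^{2^m - 2} \;=\; \prod_{i=1}^{m-1} a^{2^i},
\qquad
\frac{b}{a} \;=\; b \cdot \prod_{i=1}^{m-1} a^{2^i},
\qquad a \in GF(2^m),\ a \neq 0 .
```

Each factor is a repeated squaring of a, which is a linear operation over GF(2), so the powers can be formed cheaply before the product is taken.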
TL;DR: Results which have immediate application in synthesis of connection polynomials for stream cipher systems and algorithmic issues in getting trinomial multiples of low degree are discussed.
Abstract: Linear Feedback Shift Registers (LFSR) are important building blocks in stream cipher systems. The connection polynomials of the LFSRs need to be primitive over GF(2). Also the polynomial should have high weight and it should not have sparse multiples of moderate degree. Here we provide results which have immediate application in synthesis of connection polynomials for stream cipher systems. We show that, given any primitive polynomial f(x) of degree d, there exist 2^{d-1} - 1 distinct trinomial multiples of degree less than 2^d - 1. Among these trinomial multiples, it is known that a trinomial of the form x^{2(2^d-1)/3} + x^{(2^d-1)/3} + 1 contains all the degree d (d even) primitive polynomials as its factors. We extend this result by showing that, if d_1 (even) divides d (even) and (2^d-1)/3 ≡ 0 mod (2^{d_1} - 1), then the trinomial x^{2(2^d-1)/3} + x^{(2^d-1)/3} + 1 contains all the primitive polynomials of degree d_1 as its factors. We also discuss algorithmic issues in getting trinomial multiples of low degree. Next we present some results on t-nomial multiples of primitive polynomials which help us in choosing primitive polynomials that do not have sparse multiples.
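The count of trinomial multiples can be checked by brute force for small degrees. The sketch below is an illustration, not the authors' algorithm: it assumes f is primitive, stores polynomials over GF(2) as integers (bit k is the coefficient of x^k), and uses the fact that x^i + x^j + 1 is a multiple of f exactly when α^i + α^j + 1 = 0 for a root α of f. The function name `trinomial_multiples` is hypothetical.

```python
def trinomial_multiples(f, d):
    """Return all (i, j) with 0 < j < i < 2^d - 1 such that
    x^i + x^j + 1 is a multiple of the primitive polynomial f of degree d."""
    n = (1 << d) - 1
    powers = []                 # powers[k] = x^k mod f
    e = 1
    for _ in range(n):
        powers.append(e)
        e <<= 1
        if (e >> d) & 1:
            e ^= f
    log = {value: k for k, value in enumerate(powers)}
    result = []
    for j in range(1, n):
        # alpha^i = alpha^j + 1 has exactly one solution i for each j.
        i = log[powers[j] ^ 1]
        if i > j:
            result.append((i, j))
    return result


# x^3 + x + 1 (d = 3) and x^4 + x + 1 (d = 4) are both primitive.
multiples_d3 = trinomial_multiples(0b1011, 3)
multiples_d4 = trinomial_multiples(0b10011, 4)
```

For d = 4 the list contains (10, 5), that is x^10 + x^5 + 1, which has the exponents 2(2^d - 1)/3 and (2^d - 1)/3 singled out in the abstract.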
TL;DR: The analysis results show that the proposed architecture leads to a considerable reduction of computational delay time with a moderate increase of hardware complexity, compared to the existing digit-serial systolic multipliers.
Abstract: An efficient digit-serial systolic array is proposed for multiplication in finite fields GF(2^m) with the standard basis representation. From the least significant bit first algorithm, we obtain a new dependence graph and design an efficient digit-serial systolic multiplier. If input data come in continuously, the proposed array can produce multiplication results at a rate of one every ⌈m/L⌉ clock cycles, where L is the selected digit size. The analysis results show that the proposed architecture leads to a considerable reduction of computational delay time with a moderate increase of hardware complexity, compared to the existing digit-serial systolic multipliers. Furthermore, since the new architecture has the features of regularity, modularity, and unidirectional data flow, it is well suited to VLSI implementation with fault-tolerant design.
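The relationship between digit size and step count can be sketched in software. This is a plain least-significant-digit-first multiplication, not the systolic array of the paper; it assumes a standard (polynomial) basis with elements stored as integers, and the function name `gf_mul_digit_serial` is hypothetical. Each outer iteration consumes L bits of one operand, so the sketch takes ceil(m/L) steps.

```python
def gf_mul_digit_serial(a, b, poly, m, L):
    """Multiply a and b in GF(2^m), consuming b in L-bit digits,
    least significant digit first. poly includes the x^m term.
    Returns (product, number_of_digit_steps)."""
    steps = -(-m // L)                  # ceil(m / L)
    acc = 0
    for _ in range(steps):
        digit = b & ((1 << L) - 1)      # next L bits of b
        b >>= L
        for k in range(L):
            if (digit >> k) & 1:
                acc ^= a                # add a * x^(current bit position)
            a <<= 1                     # a <- a * x mod poly
            if (a >> m) & 1:
                a ^= poly
    return acc, steps


# GF(2^8) with x^8 + x^4 + x^3 + x + 1 and digit size L = 3.
product, steps = gf_mul_digit_serial(0x57, 0x83, 0x11B, 8, 3)
```

Choosing a larger L lowers the step count at the cost of more logic per step, which is the throughput and hardware trade-off the digit-serial abstracts describe.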
TL;DR: It is proved that each of the two generalized hexagons of order (2, 2) has generating rank 14, that the central involution geometry of the Hall-Janko sporadic group has generating rank 28, and that the dual polar space DU(6,2) has generating rank 22.
Abstract: The generating rank is determined for several GF(2)-embeddable geometries and it is demonstrated that their generating and embedding ranks are equal. Specifically, we prove that each of the two generalized hexagons of order (2, 2) has generating rank 14, that the central involution geometry of the Hall-Janko sporadic group has generating rank 28, and that the dual polar space DU(6,2) has generating rank 22. We also include a survey of all instances in which either the generating or embedding rank of an embeddable GF(2) geometry is known.
TL;DR: In this paper, the multiplicative inverse of an element of a subfield of GF(2^{2M}) is determined using a look-up table that contains multiplicative inverses of selected elements of the subfield.
Abstract: A system determines the multiplicative inverse of A∈GF(2^{2M}) by representing A using a selected basis in which basis elements are squares of one another, and performing various operations that involve raising A to powers of 2 as cyclic rotations of A. The system also performs multiplication operations over GF(2^{2M}) or subfields thereof by calculating the coefficients of the product of two elements A and B that are represented using the selected basis as combinations of the coefficients of cyclically rotated versions of A and B. The system further utilizes a relatively small look-up table that contains the multiplicative inverses of selected elements of a subfield of GF(2^{2M}). The system may then cyclically rotate the multiplicative inverse values read from the table to produce the multiplicative inverses of the remaining elements of the subfield. Thereafter, as applicable, the system further manipulates the multiplicative inverse of the subfield element, to produce the multiplicative inverse of the desired element of GF(2^{2M}). Using the selected basis, elements of GF(2^{2M}) that are elements of the subfields have m lowest-order coefficients that are duplicates of the m highest order coefficients. Each element in the look-up table can thus be represented using only m bits, and the table can be entered with m bits.
TL;DR: A low latency digit serial multiplier for GF(2^m) that can be pipelined to the bit level is presented; it does not put any restriction on the type of generator polynomial used or the digit size.
Abstract: A low latency digit serial multiplier for GF(2^m) that can be pipelined to the bit level is presented in this paper. Unlike existing structures, the new multiplier does not put any restriction on the type of generator polynomial used or the digit size. Furthermore, the latency of the new multiplier is significantly less than the latency of the existing bit-level pipelined digit-serial multiplier.
TL;DR: Experimental results establish the efficiency of the MACA scheme in terms of savings in memory space and execution time and enhanced diagnostic resolution; a GF(2^p) CA is employed to reduce the processing time.
Abstract: This paper introduces an efficient diagnosis scheme for VLSI circuits. A special class of non-group CA referred to as multiple attractor cellular automata (MACA) is introduced to diagnose the faulty block of a circuit under test (CUT). The scheme employs significantly less memory than the existing methods reported so far. Experimental results establish the efficiency of the scheme in terms of savings in memory space and execution time and enhanced diagnostic resolution. Rather than GF(2) CA, where each CA cell handles GF(2) elements (0 and 1), the GF(2^p) CA is employed to reduce the processing time.
TL;DR: An algorithm for correlation attacks on stream ciphers over GF(2^n) is proposed and the performance of the algorithm is analysed by using a random coding bound.
Abstract: An algorithm for correlation attacks on stream ciphers over GF(2^n) is proposed. Furthermore, the performance of the algorithm is analysed by using a random coding bound.
TL;DR: If the proposed digit-serial multiplier chooses the digit size L appropriately, it can meet the throughput requirement of a certain application with minimum hardware and be well suited to VLSI implementation.
Abstract: This paper presents a new digit-serial systolic multiplier for finite fields GF(2^m). The hardware requirements of the proposed multiplier are less than those of the existing multiplier of the same class, while maintaining the same cell delay. The proposed multiplier possesses the features of regularity, modularity, and unidirectional data flow. Thus, it is well suited to VLSI implementation. If the proposed digit-serial multiplier chooses the digit size L appropriately, it can meet the throughput requirement of a certain application with minimum hardware.
TL;DR: An innovative scheme based on logic folding is presented to reduce the BIST overhead and make it more effective for large circuits.
Abstract: This paper presents an efficient BIST solution for VLSI circuit testing based on GF(2^p) CA (cellular automata on an extended Galois field). The novel architecture of GF(2^p) CA permits the BIST structure to be highly customized to the circuit under test (CUT). A methodology has been proposed to optimize the design of the GF(2^p) CA structure to maximize the fault coverage in a given CUT. In addition, an innovative scheme based on logic folding is presented to reduce the BIST overhead and make it more effective for large circuits.
TL;DR: Two special operations, called the cyclic shifting and the inner product, are defined based on the properties of irreducible all one polynomials, and an effective algorithm for computing multiplication over a class of GF(2^m) is developed.
Abstract: Two special operations, called the cyclic shifting and the inner product, are defined based on the properties of irreducible all one polynomials. With the two operations, an effective algorithm for computing multiplication over a class of GF(2^m) is developed in this paper. A low-complexity bit-parallel systolic multiplier is presented. The multiplier is composed of (m+1)^2 identical cells, each consisting of one 2-input AND gate, one 2-input XOR gate and three 1-bit latches. The multiplier has very low latency and propagation delay, which makes it very fast. Moreover, the architecture of the multiplier can also be applied to compute multiplication over the class of GF(2^m) in which the elements are represented with the root of an irreducible equally spaced polynomial of degree m.
TL;DR: It is shown that if, for a certain degree, an irreducible ESP of a large degree can be obtained from a corresponding irreducible AOP of a very small degree, then from the complexity point of view, the structure of the ESP-based multiplier is beneficial for constructing a modular architecture.
Abstract: In this paper, an effective algorithm for computing multiplication over a class of GF(2^m) based on irreducible all one polynomials (AOP) and equally spaced polynomials (ESP) is presented. The design uses two special operations, called the cyclic shifting and the inner product, to construct low-latency bit-parallel systolic multipliers. The circuits are simple and modular, which is important for hardware implementation. The AOP-based multiplier is composed of (m+1)^2 identical cells, each consisting of one 2-input AND gate, one 2-input XOR gate and three 1-bit latches. This multiplier has very low latency and propagation delay, which makes it very fast. Moreover, the AOP-based multiplier of small size can also be applied to construct an ESP-based multiplier of large size, in which the elements are represented with the root of an irreducible equally spaced polynomial of degree nr. It is shown that if, for a certain degree, an irreducible ESP of a large degree can be obtained from a corresponding irreducible AOP of a very small degree, then from the complexity point of view, the structure of the ESP-based multiplier is beneficial for constructing a modular architecture.
TL;DR: An effective algorithm for computing multiplications over a class of GF(2^m) is developed and a low-complexity bit-parallel systolic multiplier is presented.
Abstract: The operations of the cyclic shifting and the inner product are defined based on the properties of irreducible all one polynomials. With the two operations, an effective algorithm for computing multiplications over a class of GF(2^m) is developed in this paper. A low-complexity bit-parallel systolic multiplier is presented. The multiplier has very low latency, which makes it very fast. Moreover, the architecture of the multiplier can also be applied to compute multiplications over the class of GF(2^m) in which the elements are represented with the root of an irreducible equally spaced polynomial of degree m.
TL;DR: The new digit-serial systolic array with unidirectional data flow is highly regular, nearest-neighbor connected, and thus is well suited for VLSI implementation.
Abstract: This paper presents a novel digit-serial-in-serial-out systolic array for performing the power-sum operation C+AB^2 in finite fields GF(2^m) with the standard basis representation. If the appropriate digit-size is selected, the proposed method can meet the throughput requirement of a specific application with minimum hardware. With the digit size of the regular square form, the latency of the array can be reduced by 20% as before in GF(2^160). The new digit-serial systolic array with unidirectional data flow is highly regular, nearest-neighbor connected, and thus is well suited for VLSI implementation.
TL;DR: A comparative study of digit-serial multiplier VLSI architectures, for fields of type GF(2^n), is carried out and figures of merit like time latency, silicon area and power consumption are evaluated.
Abstract: Multiplication in finite fields (Galois fields) is a basic operation for cryptography applications. Recent proposals for elliptic curve cryptography require efficient computation of multiplication in finite fields of type GF(2^n) for large values of n (150, 200 bits). Digit-serial multiplier VLSI architectures are an attractive solution, being a compromise between purely parallel and serial ones. A comparative study of digit-serial multiplier VLSI architectures, for fields of type GF(2^n), is carried out. Such architectures are reviewed, some further optimisations are proposed, and they are then implemented in VHDL (CMOS cell library, 0.35 μm, by ST Microelectronics). Figures of merit like time latency, silicon area and power consumption are evaluated by simulation with Synopsys tools, varying parameters like the size n of the field elements and the size k of the blocks of bits being processed in parallel by the digit-serial architectures.
TL;DR: This paper considers codes over GF(3), GF(5), GF(7), and GF(8); new codes are constructed or shown to exist over each field and the nonexistence of several codes is proved, improving the corresponding lower and upper bounds in Brouwer's table.
Abstract: Let [n, k, d]q-codes be linear codes of length n, dimension k, and minimum Hamming distance d over GF(q). In this paper we consider codes over GF(3), GF(5), GF(7), and GF(8). Over GF(3), three new linear codes are constructed. Over GF(5), eight new linear codes are constructed and the nonexistence of six codes is proved. Over GF(7), the existence of 33 new codes is proved. Over GF(8), the existence of ten new codes and the nonexistence of six codes is proved. All of these results improve the corresponding lower and upper bounds in Brouwer's table [www.win.tue.nl/~aeb/voorlincod.html].
TL;DR: The presented cellular-array circuits have the advantages of less circuit complexity, shorter latency, and lower power consumption, which makes them suitable for pipeline architecture and can be applied in wireless communications.
Abstract: Two types of cellular-array power-sum circuits over a fixed-size finite field GF(2^m) are first presented. According to the two fixed field power-sum circuits, VLSI architectures of cellular-array generalized power-sum circuits in the finite field GF(2^m) are presented. The presented cellular-array circuits have the advantages of less circuit complexity, shorter latency, and lower power consumption. Thus they are suitable for pipeline architecture and can be applied in wireless communications.
TL;DR: An alternative of the high-speed parallel multiplier based on the standard basis over GF(2^m) is designed, composed of three types of general multiplier cells (GMC) and two types of delay boxes (DB).
Abstract: We design an alternative of the high-speed parallel multiplier based on the standard basis over GF(2^m). It is composed of three types of general multiplier cells (GMC) and two types of delay boxes (DB). When we implement the proposed multiplier over GF(2^8) using a 0.8 μm CMOS standard cell library at a 185 MHz clock rate, the implemented multiplier has less complexity, i.e., a 25% reduction from that of Berlekamp (1982) and a 33% reduction from that of Jain et al. (1998). For power consumption, the implemented multiplier has a 29% reduction from that of Jain.