TL;DR: This paper shows that if a Boolean function has correlation at most ε ≤ 1/2 with either of two models (multiparty communication protocols or GF(2) polynomials), then the correlation of the parity of its values on m independent instances drops exponentially with m. It also gives a new proof that the Mod_m function on n bits, for odd m, has correlation at most exp(-n/4^d) with degree-d GF(2) polynomials.
Abstract: This paper presents a unified and simple treatment of basic questions concerning two computational models: multiparty communication complexity and GF(2) polynomials. The key is the use of (known) norms on Boolean functions, which capture their approximability in each of these models. The main contributions are new XOR lemmas. We show that if a Boolean function has correlation at most ε ≤ 1/2 with any of these models, then the correlation of the parity of its values on m independent instances drops exponentially with m. More specifically: For GF(2) polynomials of degree d, the correlation drops to exp(-m/4^d). No XOR lemma was known even for d = 2. For c-bit k-party protocols, the correlation drops to 2^c · ε^(m/2^k). No XOR lemma was known for k ≥ 3 parties. Another contribution in this paper is a general derivation of direct product lemmas from XOR lemmas. In particular, assuming that f has correlation at most ε ≤ 1/2 with any of the above models, we obtain the following bounds on the probability of computing m independent instances of f correctly: For GF(2) polynomials of degree d we again obtain a bound of exp(-m/4^d). For c-bit k-party protocols we obtain a bound of 2^(-Ω(m)) in the special case when ε ≤ exp(-c · 2^k). In this range of ε, our bound improves on a direct product lemma for two parties by Parnafes, Raz, and Wigderson (STOC '97). We also use the norms to give improved (or just simplified) lower bounds in these models. In particular we give a new proof that the Mod_m function on n bits, for odd m, has correlation at most exp(-n/4^d) with degree-d GF(2) polynomials.
TL;DR: A reconfigurable curve-based cryptoprocessor that accelerates scalar multiplication of ECC and HECC of genus 2 over GF(2^n); it can handle various curve parameters and arbitrary irreducible polynomials.
Abstract: This paper presents a reconfigurable curve-based cryptoprocessor that accelerates scalar multiplication of Elliptic Curve Cryptography (ECC) and HyperElliptic Curve Cryptography (HECC) of genus 2 over GF(2^n). By allocating a copies of processing cores that embed reconfigurable Modular Arithmetic Logic Units (MALUs) over GF(2^n), the scalar multiplication of ECC/HECC can be accelerated by exploiting Instruction-Level Parallelism (ILP). The supported field size can be arbitrary up to a(n + 1) - 1. The superscaling feature is facilitated by defining a single instruction that can be used for all field operations and point/divisor operations. In addition, the cryptoprocessor is fully programmable and it can handle various curve parameters and arbitrary irreducible polynomials. The cost, performance, and security trade-offs are thoroughly discussed for different hardware configurations and software programs. The synthesis results with a 0.13-µm CMOS technology show that the proposed reconfigurable cryptoprocessor runs at 292 MHz and supports field sizes of up to 587 bits. The compact and fastest configuration of our design is also synthesized with a fixed field size and irreducible polynomial. The results show that the scalar multiplication of ECC over GF(2^163) and HECC over GF(2^83) can be performed in 29 and 63 µs, respectively.
TL;DR: The proposed scalable systolic architecture is demonstrated to have significantly less time–area product complexity than existing digit-serial systolic architectures; its regularity, modularity and local interconnectability make it highly appropriate for VLSI implementation.
Abstract: A Montgomery algorithm in GF(2^m) based on the Hankel matrix–vector representation is proposed. The hardware architecture obtained from this algorithm yields low-complexity bit-parallel systolic multipliers with irreducible trinomials. The results reveal that the proposed multiplier saves approximately 36% of space complexity as compared to an existing systolic Montgomery multiplier for trinomials. A scalable and systolic Montgomery multiplier is also developed by applying the block-Hankel matrix–vector representation. The proposed scalable systolic architecture is demonstrated to have significantly less time–area product complexity than existing digit-serial systolic architectures. Furthermore, the proposed architectures have regularity, modularity and local interconnectability, making them highly appropriate for VLSI implementation.
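For orientation, the textbook bit-serial form of Montgomery multiplication in GF(2^m) can be sketched as below. This is not the Hankel matrix–vector formulation of the abstract above; it is the plain reference algorithm, and the function name and the bit-mask encoding of polynomials are assumptions made for illustration. It returns a · b · x^(-m) mod f(x) and assumes f has constant term 1.

```python
def montgomery_mul_gf2m(a: int, b: int, f: int, m: int) -> int:
    """Return a * b * x^(-m) mod f(x).

    Polynomials over GF(2) are encoded as ints (bit i = coefficient of x^i).
    f has degree m and constant term 1; a and b have degree < m.
    """
    c = 0
    for i in range(m):
        if (a >> i) & 1:   # add a_i * b to the accumulator
            c ^= b
        if c & 1:          # adding f clears the constant term
            c ^= f
        c >>= 1            # exact division by x
    return c
```

With f(x) = x^3 + x + 1 (0b1011) and m = 3, the factor x^(-3) equals x^2 + x, so multiplying 1 by 1 yields 0b110. A second call with x^(2m) mod f as the other operand removes the extra factor.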
TL;DR: In this paper, the authors question the need for the standardisation of irreducible polynomials in the first place, and derive the best polynomial to use depending on the underlying processor architecture.
Abstract: The irreducible polynomials recommended for use by multiple standards documents are in fact far from optimal on many platforms. Specifically they are suboptimal in terms of performance, for the computation of field square roots and in the application of the “almost inverse” field inversion algorithm. In this paper we question the need for the standardisation of irreducible polynomials in the first place, and derive the “best” polynomials to use depending on the underlying processor architecture. Surprisingly it turns out that a trinomial polynomial is in many cases not necessarily the best choice. Finally we make some specific recommendations for some particular types of architecture.
TL;DR: This paper represents the SMS4 cipher with an overdetermined, sparse multivariate quadratic equation system over GF(2^8), estimates the computational complexity of the XSL algorithm for solving the equation system, and finds that the complexity is 2^77 when solving the whole system of equations.
Abstract: SMS4 is a 128-bit block cipher which is used in the WAPI standard in China for protecting wireless transmission data. Because the functions deployed in the round transformations of SMS4 operate on two different fields, GF(2^8) and GF(2), it is difficult to analyze this cipher algebraically. In this paper we describe a new block cipher called ESMS4, which uses only algebraic operations over GF(2^8). The new cipher is an extension of SMS4 in the sense that SMS4 can be embedded into ESMS4 with restricted plaintext and key spaces. Thus, the SMS4 cipher can be investigated through this embedding over GF(2^8). Based on this new cipher, we represent the SMS4 cipher with an overdetermined, sparse multivariate quadratic equation system over GF(2^8). Furthermore, we estimate the computational complexity of the XSL algorithm for solving the equation system and find that the complexity is 2^77 when solving the whole system of equations.
TL;DR: This paper presents a unified systolic multiplication architecture, by employing Hankel matrix-vector multiplication, for various basis representations, and demonstrates that the proposed architectures perform well both in space and time complexities.
Abstract: In general, there are three popular basis representations, standard (canonical, polynomial) basis, normal basis, and dual basis, for representing elements in GF(2^m). Various basis representations have their distinct advantages and have their different associated multiplication architectures. In this paper, we will present a unified systolic multiplication architecture, by employing Hankel matrix-vector multiplication, for various basis representations. For various element representations in GF(2^m), we will show that various basis multiplications can be performed by Hankel matrix-vector multiplications. A comparison with existing and similar structures has shown that the proposed architectures perform well both in space and time complexities.
TL;DR: In this paper, an interesting one-to-one correspondence between the operators of the Mermin-Peres square and the points of the projective line over the product ring GF(2) ⊗ GF(2) is established.
Abstract: In 1993, Mermin (Rev. Mod. Phys. 65, 803–815) gave lucid and strikingly simple proofs of the Bell-Kochen-Specker (BKS) theorem in Hilbert spaces of dimensions four and eight by making use of what has since been referred to as the Mermin(-Peres) “magic square” and the Mermin pentagram, respectively. The former is a 3 × 3 array of nine observables commuting pairwise in each row and column and arranged so that their product properties contradict those of the assigned eigenvalues. The latter is a set of ten observables arranged in five groups of four lying along five edges of the pentagram and characterized by a similar contradiction. An interesting one-to-one correspondence between the operators of the Mermin-Peres square and the points of the projective line over the product ring GF(2) ⊗ GF(2) is established. Under this mapping, the concept “mutually commuting” translates into “mutually distant” and the distinguishing character of the third column’s observables has its counterpart in the distinguished properties of the coordinates of the corresponding points, whose entries are both either zero-divisors, or units. The ten operators of the Mermin pentagram answer to a specific subset of points of the line over GF(2)[x]/⟨x^3 − x⟩. The situation here is, however, more intricate as there are two different configurations that seem to serve our purpose equally well. The first one comprises the three distinguished points of the (sub)line over GF(2), their three “Jacobson” counterparts and the four points whose both coordinates are zero-divisors; the other features the neighbourhood of the point (1,0) (or, equivalently, that of (0,1)). Some other ring lines that might be relevant for BKS proofs in higher dimensions are also mentioned.
TL;DR: This paper presents a novel digit-serial architecture for finite field multiplications over GF(2^m) defined by irreducible trinomials as field polynomials, and offers considerably lower area-time complexity compared with the existing designs.
Abstract: This paper presents a novel digit-serial architecture for finite field multiplications over GF(2^m) defined by irreducible trinomials as field polynomials. The critical path of the proposed structure is reduced, and a saving of m XOR gates is achieved at the final output stage by successive finite field accumulation through T flip-flops instead of using D flip-flops and XOR gates in a sequential loop. The proposed design is highly modular, and consists of regular blocks of AND and XOR logic gates. The details of hardware requirement and computational delay of the proposed multiplier have been estimated and compared with those of the existing designs. It is found that the proposed design offers considerably lower area-time complexity compared with the existing designs. The advantage of the proposed design is mainly based on its lower critical path, optimal logic design and 100% hardware utilization efficiency.
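As background for multiplication in GF(2^m) defined by an irreducible trinomial, a minimal software sketch of the bit-serial (MSB-first) shift-and-add method follows. It is a reference model only, not the digit-serial hardware architecture of the abstract above; the function name and the bit-mask encoding are assumptions. The reduction step uses x^m ≡ x^k + 1 for the trinomial x^m + x^k + 1.

```python
def gf2m_mul_trinomial(a: int, b: int, m: int, k: int) -> int:
    """Multiply a and b in GF(2^m) defined by x^m + x^k + 1 (0 < k < m).

    Field elements are ints below 2**m (bit i = coefficient of x^i).
    """
    mask = (1 << m) - 1
    result = 0
    for i in reversed(range(m)):          # scan b from its most significant bit
        carry = (result >> (m - 1)) & 1   # coefficient that would become x^m
        result = (result << 1) & mask     # result <- result * x
        if carry:
            result ^= (1 << k) | 1        # reduce: x^m = x^k + 1 (mod f)
        if (b >> i) & 1:
            result ^= a                   # accumulate a for this bit of b
    return result
```

For example, in GF(2^3) with x^3 + x + 1, the product of x and x^2 reduces to x + 1, and (x + 1)(x^2 + x) reduces to 1.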
TL;DR: An algorithm for inversion in GF(2^m) suitable for implementation using a polynomial multiply instruction on GF(2) is proposed, based on the extended Euclid's algorithm.
Abstract: An algorithm for inversion in GF(2^m) suitable for implementation using a polynomial multiply instruction on GF(2) is proposed. It is based on the extended Euclid's algorithm. In the algorithm, operations corresponding to several contiguous iterations of the VLSI algorithm proposed by Brunner et al. are represented as a matrix. They are calculated at once through the matrix efficiently by means of a polynomial multiply instruction on GF(2). For example, in the case where the word size of a processor and m are 32 and 571, respectively, the algorithm calculates inversion with about half the number of instructions of the conventional algorithm on average.
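The baseline that such work improves on is the bit-level extended Euclidean algorithm over GF(2)[x]. A minimal sketch is given below; it is the conventional one-step-at-a-time variant, not the matrix-batched algorithm of the abstract, and the function names and bit-mask encoding are assumptions for illustration. The loop maintains g1·a ≡ u and g2·a ≡ v (mod f).

```python
def gf2_mulmod(a: int, b: int, f: int) -> int:
    """Reference product a * b mod f over GF(2) (ints as bit masks)."""
    n = f.bit_length() - 1
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if (a >> n) & 1:
            a ^= f
        b >>= 1
    return r


def gf2m_inverse(a: int, f: int) -> int:
    """Return the inverse of a modulo the irreducible polynomial f.

    Requires 0 < a < 2**deg(f). Invariant: g1*a = u and g2*a = v (mod f).
    """
    if a == 0:
        raise ZeroDivisionError("0 has no inverse")
    u, v = a, f
    g1, g2 = 1, 0
    while u != 1:
        j = u.bit_length() - v.bit_length()
        if j < 0:                 # keep deg(u) >= deg(v)
            u, v = v, u
            g1, g2 = g2, g1
            j = -j
        u ^= v << j               # cancel the leading term of u
        g1 ^= g2 << j
    return g1
```

A quick check is that `gf2_mulmod(a, gf2m_inverse(a, f), f)` equals 1 for every nonzero a, for instance with the AES polynomial f = 0x11B.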
TL;DR: The theory of which irreducible polynomials f(x) divide trinomials over GF(2) is developed and some related problems such as Artin's conjecture about primitive roots, and the conjectures of Blake, Gao, and Lambert, as well as of Tromp, Zhang, and Zhao are discussed.
Abstract: The simplest linear shift registers to generate binary sequences involve only two taps, which corresponds to a trinomial over GF(2). It is therefore of interest to know which irreducible polynomials f(x) divide trinomials over GF(2), since the output sequences corresponding to f(x) can be obtained from a two-tap linear feedback shift register (with a suitable initial state) if and only if f(x) divides some trinomial t(x) = x^m + x^a + 1 over GF(2). In this paper, we develop the theory of which irreducible polynomials do, or do not, divide trinomials over GF(2). Then some related problems such as Artin's conjecture about primitive roots, and the conjectures of Blake, Gao, and Lambert, as well as of Tromp, Zhang, and Zhao are discussed.
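For small degrees, whether a given f(x) divides some trinomial x^m + x^a + 1 can be checked by brute force. The sketch below is an illustration, not a method from the paper; the function name and encoding are assumptions. It tabulates the powers of x modulo f over one full period and looks for exponents with x^m ≡ x^a + 1 (mod f). It assumes f has constant term 1 and needs memory proportional to the multiplicative order of x, so it is practical only for small f.

```python
def find_trinomial_multiple(f: int):
    """Return (m, a) with f | x^m + x^a + 1 and m > a > 0, or None.

    f is a polynomial over GF(2) as a bit mask, with constant term 1.
    """
    n = f.bit_length() - 1
    residues = []        # residues[e] = x^e mod f
    exponent_of = {}     # inverse lookup: residue -> exponent
    r = 1
    while r not in exponent_of:        # stops after one full period of x
        exponent_of[r] = len(residues)
        residues.append(r)
        r <<= 1
        if (r >> n) & 1:
            r ^= f
    for a in range(1, len(residues)):
        target = residues[a] ^ 1       # x^a + 1 mod f
        m = exponent_of.get(target)
        if m is not None:              # x^m = x^a + 1 (mod f)
            return (max(m, a), min(m, a))
    return None
```

Restricting both exponents to one period loses nothing, because any trinomial multiple can have its exponents reduced modulo the order of x. As an example, x^4 + x^3 + x^2 + x + 1 (order 5) divides no trinomial, whereas x^3 + x + 1 trivially divides itself.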
TL;DR: This research modifies a GF(p) multiplication algorithm to make it applicable for GF(2^k) and adjusts it to offer area flexibility; the result is used as the basic block in modeling a complete projective coordinate GF(2^k) ECC coprocessor.
Abstract: Elliptic curve cryptography (ECC) is popularly defined either over GF(p) or GF(2^k). This research modifies a GF(p) multiplication algorithm to make it applicable for GF(2^k). Both algorithms, the GF(p) and the GF(2^k) one, are designed in hardware to be compared. The GF(2^k) multiplier is found to be faster and smaller. This GF(2^k) multiplier is then further optimized for speed, becoming more than 40% faster at the cost of 5% more area. The multiplier hardware is furthermore adjusted to offer area flexibility, and is used as the basic block in modeling a complete projective coordinate GF(2^k) ECC coprocessor.
TL;DR: A high performance finite field processor for elliptic curve cryptography is presented, and the modified bit-parallel word-serial (BPWS) finite field multiplication algorithm and its corresponding pipeline-fashion multiplier architecture is presented.
Abstract: A high performance finite field processor for elliptic curve cryptography is presented. One of the contributions in this work is the modified bit-parallel word-serial (BPWS) finite field multiplication algorithm and its corresponding pipeline-fashion multiplier architecture. The proposed multiplier achieves a throughput of one multiplication every N + 1 clock cycles, in contrast with at least N + 3 clock cycles required in other recent designs, where N is the ratio of field size to word size. Another contribution of this work is to explore parallelism at the instruction level in the proposed processor. Separate hardware modules for finite field multiplication, squaring and addition make it possible for up to three finite field arithmetic operations to be executed in parallel. At a higher level, data dependencies are detected at compile time by analyzing the data interdependency when performing elliptic curve point operations. The processor is implemented in 0.18-µm CMOS, runs at 125 MHz, and performs one scalar multiplication in 62 µs.
TL;DR: The proposed architecture has high order of flexibility and low hardware complexity with critical path delay independent of operand length and can find application in, for example, elliptic curve cryptographic (ECC) processors.
Abstract: This paper proposes a unified and reconfigurable Montgomery multiplier architecture which can operate in both prime fields GF(p) and binary extension fields GF(2^n). The multiplier provides efficient execution of Montgomery multiplication in either field for different operand lengths. It supports any operand length n, 1 < n ≤ N, where the upper value N is application dependent. The final result is obtained in n + 2 clock cycles for either field. The propagation delay of the design is investigated and found to be comparable with that of existing unified multiplier architectures while providing reconfigurability at the same time. The proposed architecture has a high degree of flexibility and low hardware complexity, with critical path delay independent of operand length. The multiplier can find application in, for example, elliptic curve cryptographic (ECC) processors.
TL;DR: In this article, input data of a Galois field GF(2^k) is mapped to a composite Galois field GF(2^(nm)), where k = nm, processed by two or more iterations of the AES round function in the composite field, and mapped back to GF(2^k).
Abstract: A system comprises reception of input data of a Galois field GF(2^k), mapping of the input data to a composite Galois field GF(2^(nm)), where k = nm, inputting of the mapped input data to an Advanced Encryption Standard round function, performance of two or more iterations of the Advanced Encryption Standard round function in the composite Galois field GF(2^(nm)), reception of output data of a last of the two or more iterations of the Advanced Encryption Standard round function, and mapping of the output data to the Galois field GF(2^k).
TL;DR: In this article, the authors describe the implementation of AES encryption and decryption processes in one embodiment of S-box processing and in another embodiment of inverse-columns-mixing.
Abstract: Implementations of Advanced Encryption Standard (AES) encryption and decryption processes are disclosed. In one embodiment of S-box processing, a block of 16 byte values is converted, each byte value being converted from a polynomial representation in GF(256) to a polynomial representation in GF((2^2)^4). Multiplicative inverse polynomial representations in GF((2^2)^4) are computed for each of the corresponding polynomial representations in GF((2^2)^4). Finally corresponding multiplicative inverse polynomial representations in GF((2^2)^4) are converted and an affine transformation is applied to generate corresponding polynomial representations in GF(256). In an alternative embodiment of S-box processing, powers of the polynomial representations are computed and multiplied together in GF(256) to generate multiplicative inverse polynomial representations in GF(256). In an embodiment of inverse-columns-mixing, the 16 byte values are converted from a polynomial representation in GF(256) to a polynomial representation in GF((2^4)^2). A four-by-four matrix is applied to the transformed polynomial representation in GF((2^4)^2) to implement the inverse-columns-mixing.
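The "powers multiplied together" route to the inverse in GF(256) can be illustrated in software: since every nonzero a satisfies a^255 = 1, the inverse is a^254 = a^2 · a^4 · ... · a^128. The sketch below assumes the standard AES reduction polynomial x^8 + x^4 + x^3 + x + 1 (0x11B); the function names are hypothetical, and the code models only the arithmetic, not the disclosed implementation.

```python
AES_POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1, the AES field polynomial


def gf256_mul(a: int, b: int) -> int:
    """Multiply two bytes as elements of GF(256) modulo AES_POLY."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:        # degree reached 8: reduce
            a ^= AES_POLY
        b >>= 1
    return r


def gf256_inv_by_powers(a: int) -> int:
    """Return a^254, the inverse of a for a != 0 (and 0 for a == 0)."""
    result = 1
    power = a
    for _ in range(7):
        power = gf256_mul(power, power)     # a^2, a^4, ..., a^128
        result = gf256_mul(result, power)   # exponents sum to 254
    return result
```

Mapping 0 to 0 matches the convention AES uses for the S-box input 0x00.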
TL;DR: This paper shows that a field multiplication over GF(2^m) can be implemented by the extended Stein algorithm, one of the algorithms used to realize division, and achieves area advantages in comparison with other low-cost designs.
Abstract: Using the concept of reciprocal polynomial, this paper shows that a field multiplication over GF(2^m) can be implemented by the extended Stein algorithm, one of the algorithms used to realize division. With a fundamental change at the algorithmic level, the field multiplication can be efficiently embedded into a divider so that the multiplier can be eliminated with very little hardware overhead for operand selection. When applied to elliptic curve cryptography (ECC) using affine coordinates, a reduction of about 13.8% in the area requirement can be achieved with almost no performance degradation compared with an implementation using two distinct components. Experimental results show that the combined multiplication and division circuit achieves area advantages in comparison with other low-cost designs. The area-efficient design of the ECC system also exhibits a clear improvement in area-time (AT) complexity.
TL;DR: A high-radix elliptic curve architecture over GF(2^m) is presented that merges point doubling and addition into a single step, which decreases the scalar multiplication critical path delay at the expense of a larger look-up table, and that uses an optimized processor array-based field ALU for addition, squaring, multiplication and division.
Abstract: This paper presents a high-radix elliptic curve cryptographic architecture that performs scalar multiplication of an elliptic curve point over GF(2^m). The proposed architecture is based on a new algorithm, which is a modified version of the sliding window scalar multiplication algorithm. We speed up the scalar multiplication by merging the point doubling and adding operations into a single step, which decreases the scalar multiplication critical path delay at the expense of a larger look-up table. The proposed architecture utilizes an optimized processor array-based field ALU that efficiently implements addition, squaring, multiplication and division over GF(2^m). The proposed architecture is implemented for m ∈ {163, 283, 571} on a Xilinx XC4VFX100-12 device. We achieved a frequency of 253 MHz, which allows the architecture to calculate GF(2^163) scalar multiplication for radix 2^8 in 9 µs. Our results for GF(2^163) show a speed-up that ranges from 1.5 to 326 in comparison to previous FPGA implementations and from 1.1 to 5.6 in comparison to previous ASIC implementations.
TL;DR: By the cryptographic properties of an "ergodic matrix", this paper proposes a hard problem based on the ergodic matrices over F2, and uses it to construct a public key encryption scheme.
Abstract: This paper proposes a new public key encryption scheme. It is based on the difficulty of deducing x and y from A and B = x · A · y in a specific monoid (m, ·) which is noncommutative. We therefore select and study the monoid formed by all the n×n matrices over the finite field F2 under multiplication. By the cryptographic properties of an "ergodic matrix", we propose a hard problem based on the ergodic matrices over F2, and use it to construct a public key encryption scheme.
TL;DR: The proposed AOP-based multiplier with low degree uses the modified Booth’s algorithm to develop a new multiplexer-based bit-parallel multiplier that is simple and modular and such properties are important for VLSI hardware implementation.
Abstract: This investigation presents an effective algorithm for computing multiplication over a class of GF(2^m) based on both irreducible all one polynomials (AOPs) and equally spaced polynomials (ESPs). The proposed AOP-based multiplier uses the modified Booth's algorithm to develop a new multiplexer-based bit-parallel multiplier that is simple and modular, properties that are important for VLSI hardware implementation. The multiplier requires ⌈m/4⌉(m + 1) 4×1 multiplexers (MUX 4×1) and (1.5m^2 + 0.5m - 1) XOR gates. Its time delay is not greater than T_M + (3 + log_2⌈m/4⌉)T_X, where T_M and T_X are the time delays of a MUX 4×1 and a 2-input XOR gate, respectively. For a certain degree, an irreducible ESP with a high degree can be obtained from a corresponding irreducible AOP with a relatively very low degree. Using subword parallel processing, the proposed AOP-based multiplier with low degree can also be adopted to realize ESP-based multipliers with high degrees.
TL;DR: This paper proposed a new modular inverse algorithm based on the right-shifting binary Euclidean algorithm that shows substantial reduction in computation time over Galois field GF(p).
Abstract: This paper proposed a new modular inverse algorithm based on the right-shifting binary Euclidean algorithm. For n-bit numbers, the proposed algorithm requires about 61.3% fewer operations than the classical binary extended Euclidean algorithm. An implementation of the proposed algorithm shows a substantial reduction in computation time over the Galois field GF(p).
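The classical right-shifting binary extended Euclidean algorithm, which serves as the comparison baseline in the abstract above, can be sketched as follows. This is the textbook version for an odd prime modulus p, not the improved algorithm of the paper; the function name is an assumption. It uses only halvings, additions and subtractions, and keeps x1·a ≡ u and x2·a ≡ v (mod p) throughout.

```python
def binary_mod_inverse(a: int, p: int) -> int:
    """Return a^(-1) mod p for an odd prime p and 0 < a < p.

    Invariant: x1*a = u (mod p) and x2*a = v (mod p).
    """
    u, v = a, p
    x1, x2 = 1, 0
    while u != 1 and v != 1:
        while u % 2 == 0:                 # halve u, keeping the invariant
            u //= 2
            x1 = x1 // 2 if x1 % 2 == 0 else (x1 + p) // 2
        while v % 2 == 0:                 # halve v, keeping the invariant
            v //= 2
            x2 = x2 // 2 if x2 % 2 == 0 else (x2 + p) // 2
        if u >= v:
            u, x1 = u - v, x1 - x2
        else:
            v, x2 = v - u, x2 - x1
    return x1 % p if u == 1 else x2 % p
```

On Python 3.8 or later the result can be cross-checked against the built-in `pow(a, -1, p)`.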
TL;DR: A built-in self-test (BIST) circuit is proposed for generating test patterns internally that obviates the need of having three extra pins for the control inputs and also provides public-key security in cryptography.
Abstract: This paper presents a C-testable technique for detecting transition faults with 100% fault coverage in the polynomial basis (PB) bit parallel (BP) multiplier circuits over GF(2^m). The proposed technique requires only 10 vectors, which is independent of multiplier size, at the cost of 6% (avg.) extra hardware and three control pins. The proposed constant test vectors, which are sufficient to detect both the transition and stuck-at faults in the multiplier circuits, can be derived directly without any requirement of an ATPG tool. As the GF(2^m) multipliers have found critical applications in public key cryptography and need secure internal testing, a built-in self-test (BIST) circuit is proposed for generating test patterns internally. This obviates the need of having three extra pins for the control inputs and also provides public-key security in cryptography. Area and delay of the testable circuit are analyzed using a 0.18-µm CMOS technology library from UMC.
TL;DR: XSL (eXtended Sparse Linearization) is a recent algebraic attack aimed at the Advanced Encryption Standard, and it is suggested that if a more compact representation of equation systems can be found, such as one where the variables are 8-byte blocks, or even a more generalized form of 8n-byte blocks, it may be possible to increase the speed of XSL dramatically.
Abstract: XSL (eXtended Sparse Linearization) is a recent algebraic attack aimed at the Advanced Encryption Standard. In order to shed some light on the behavior of the algorithm, which is largely unknown, we have studied XSL on equation systems with variables interpreted either as bits or bytes. The algorithm solves byte-systems much faster than it does bit-systems, which prompts us to suggest that if a more compact representation of equation systems can be found, such as one where the variables are 8-byte blocks, or even a more generalized form of 8n-byte blocks, it may be possible to increase the speed of XSL dramatically.
TL;DR: A C-testable implementation of polynomial basis (PB) bit parallel (BP) multiplier over the Galois fields of form GF(2^m) for detecting stuck-at faults in multiplier circuits has been proposed, resulting in low power testability.
Abstract: In this paper, a C-testable implementation of polynomial basis (PB) bit parallel (BP) multiplier over the Galois fields of form GF(2^m) for detecting stuck-at faults in multiplier circuits has been proposed. The length of the constant test set is only 8. The fault detection can be incorporated in the multiplier circuit with only three extra inputs for controllability. The gate count of the proposed testable multiplier as a function of degree m is also analyzed. The proposed constant test set is much smaller than an ATPG-generated or algorithmic test set, resulting in low power testability. As the GF(2^m) multipliers have found some critical field applications and need efficient online testing, a built-in self-test (BIST) circuit is proposed to generate test patterns internally. This BIST also obviates the need of having three extra pins for the control inputs. Area and delay of the testable circuits and the BIST circuit are analyzed using a 0.18-µm CMOS technology library from UMC. The proposed test pattern has the intrinsic ability to detect single bit errors in the test pattern generator (TPG) itself. The test set provides 100 percent single fault coverage.
TL;DR: Generalized multilevel constructions for binary RM(r,m) codes using projections onto GF(2^q) are presented and are readily applicable for their efficient decoding.
Abstract: Generalized multilevel constructions for binary RM(r,m) codes using projections onto GF(2^q) are presented. These constructions exploit component codes over GF(2), GF(4), ..., GF(2^q) that are based on shorter Reed-Muller codes and set partitioning using partition chains of length-2^l codes. Using these constructions we derive multilevel constructions for the Barnes-Wall Λ(r,m) family of lattices which also use component codes over GF(2), GF(4), ..., GF(2^q) and set partitioning based on partition chains of length-2^l lattices. These constructions of Reed-Muller codes and Barnes-Wall lattices are readily applicable for their efficient decoding.
TL;DR: A novel, high-speed, low-area architecture for multiplication and squaring over GF(2^m), which utilizes the most significant bit multiplication algorithm and polynomial basis and uses NIST-recommended polynomials.
Abstract: We propose a novel, high-speed, low-area architecture for multiplication and squaring over GF(2^m). The proposed architecture is processor array based and utilizes the most significant bit multiplication algorithm and polynomial basis. A design space exploration to optimize the area and speed of the proposed architecture was done. Our architecture requires only m processing elements as compared to m^2/2 for the best previous design. We use NIST-recommended polynomials, which makes our design secure and more suitable for cryptographic engines. The proposed architecture is implemented for m ∈ {163, 283, 571} on a Xilinx XC2V4000-6 device to verify its functionality and measure its performance. We achieve a frequency of 264 MHz, which allows the architecture to calculate GF(2^163) multiplication in 640 ns and squaring in 57 ns.
TL;DR: A novel, low-area, high-speed architecture for the basic operations over GF(2^m), which utilizes the most significant bit multiplication algorithm and polynomial basis and uses the National Institute of Standards and Technology recommended polynomials.
Abstract: We propose a novel, low-area, high-speed architecture for the basic operations over GF(2^m). The proposed architecture is processor array based and utilizes the most significant bit multiplication algorithm and polynomial basis. A design space exploration to optimize the area and speed of the proposed architecture was done. We use the National Institute of Standards and Technology recommended polynomials, which makes our design secure and more suitable for cryptographic applications. The proposed architecture is implemented for m ∈ {163, 283, 571} on a Xilinx XC2V4000 device to verify its functionality and measure its performance. We achieve a frequency of 264 MHz, which allows the architecture to calculate GF(2^163) multiplication in 640 ns and inversion in 14357 µs.
TL;DR: This correspondence considers constraining the length of the filter bank to be equal to N+1, as required in certain error control coding applications, and contrasts this factorization-based method with an existing trial and error approach employing the Berlekamp factoring algorithm.
Abstract: Over the real field all degree-J paraunitary (PU) multirate systems can be described by the multiplication of J degree-1 lattice blocks and a unitary matrix. Over the finite field GF(2^r) this degree-1 factorization is not complete, i.e., it only describes a subset of all possible PU systems. In the two-channel case degree-2τ blocks are also required to completely describe all PU systems over GF(2^r). Therefore, different factorizations can be considered. Each factorization generates a subset of PU systems. It is interesting to consider if these different factorizations have distinct properties. In this correspondence, we specifically consider constraining the length of the filter bank to be equal to N+1. This is required in certain error control coding applications. We contrast this factorization-based method with an existing trial and error approach employing the Berlekamp factoring algorithm. A key advantage of the proposed method is the elimination of redundant polyphase factorizations. Further simplifications over GF(2) identified by this method are also discussed.