TL;DR: A modification of belief propagation is presented that enables us to decode LDPC codes defined on high-order Galois fields with a complexity that scales as p log_2(p), p being the field order.
Abstract: We present a modification of belief propagation that enables us to decode LDPC codes defined on high-order Galois fields with a complexity that scales as p log_2(p), p being the field order. With this low-complexity algorithm, we are able to decode GF(2^q) LDPC codes up to a field order of 256. We show by simulation that ultra-sparse regular LDPC codes in GF(64) and GF(256) exhibit very good performance.
TL;DR: This paper introduces NTRUSIGN, a new family of signature schemes based on solving the approximate closest vector problem (APPR-CVP) in NTRU-type lattices and introduces the idea of using carefully chosen perturbations to limit the information that is obtainable from an analysis of a large signature transcript.
Abstract: In this paper we introduce NTRUSIGN, a new family of signature schemes based on solving the approximate closest vector problem (APPR-CVP) in NTRU-type lattices. We explore the properties of general APPR-CVP based signature schemes (e.g. GGH) and show that they are not immune to transcript attacks even in the random oracle model. We then introduce the idea of using carefully chosen perturbations to limit the information that is obtainable from an analysis of a large signature transcript. In the case of NTRUSIGN this can be achieved while maintaining attractive efficiency properties.
TL;DR: A cryptographic processor for Elliptic Curve Cryptography (ECC) is described, capable of handling arbitrary curves without requiring reconfiguration and integrated into the open source toolkit OpenSSL, which implements the Secure Sockets Layer (SSL).
Abstract: We describe a cryptographic processor for Elliptic Curve Cryptography (ECC). ECC is evolving as an attractive alternative to other public-key cryptosystems such as the Rivest-Shamir-Adleman algorithm (RSA) by offering the smallest key size and the highest strength per bit. The cryptographic processor performs point multiplication for elliptic curves over binary polynomial fields GF(2^m). In contrast to other designs that only support one curve at a time, our processor is capable of handling arbitrary curves without requiring reconfiguration. More specifically, it can handle both named curves as standardized by the National Institute of Standards and Technology (NIST) as well as any other generic curves up to a field degree of 255. Efficient support for arbitrary curves is particularly important for the targeted server applications that need to handle requests for secure connections generated by a multitude of heterogeneous client devices. Such requests may specify curves which are infrequently used or not even known at implementation time.
We have implemented the cryptographic processor in a field-programmable gate array (FPGA) running at a clock frequency of 66.4 MHz. Its performance is 6955 point multiplications per second for named curves over GF(2^163) and 3308 point multiplications per second for generic curves over GF(2^163). We have integrated the cryptographic processor into the open source toolkit OpenSSL, which implements the Secure Sockets Layer (SSL), today's dominant Internet security protocol.
This report is an extended version of a paper presented at the IEEE 14th International Conference on Application-specific Systems, Architectures and Processors, The Hague, June 2003 where it received the "Best Paper Award".
TL;DR: Two systolic architectures for inversion and division in GF(2^m) based on a modified extended Euclidean algorithm are presented, one of which uses an adder or an (m+1)-bit ring counter inside each control cell, while the other distributes the ring counters into the computing cells, thereby reducing each control cell to just two gates.
Abstract: We present two systolic architectures for inversion and division in GF(2^m) based on a modified extended Euclidean algorithm. Our architectures are similar to those proposed by others in that they consist of two-dimensional arrays of computing cells and control cells with only local intercell connections and have O(m^2) area-time product. However, in comparison to similar architectures, both our architectures have critical path delays that are smaller, gate counts that range from being considerably smaller to only slightly larger, and latencies that are identical for inversion but somewhat larger for division. One architecture uses an adder or an (m+1)-bit ring counter inside each control cell, while the other architecture distributes the ring counters into the computing cells, thereby reducing each control cell to just two gates.
TL;DR: This work designs and optimizes four high performance parallel GF(2^233) multipliers for an FPGA realization and analyzes the time and area complexities to create the first hardware realization of subquadratic arithmetic and currently the fastest and most efficient implementation of 233-bit finite field multipliers.
Abstract: For many applications from the areas of cryptography and coding, finite field multiplication is the most resource and time consuming operation. We have designed and optimized four high performance parallel GF(2^233) multipliers for an FPGA realization and analyzed the time and area complexities. One of the multipliers uses a new hybrid structure to implement the Karatsuba algorithm. For increasing performance, we make extensive use of pipelining and efficient control techniques and use a modern state-of-the-art FPGA technology. As a result we have, to our knowledge, the first hardware realization of subquadratic arithmetic and currently the fastest and most efficient implementation of 233-bit finite field multipliers.
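The subquadratic arithmetic mentioned in this abstract rests on the Karatsuba identity, which replaces four half-size products with three. The following is a minimal software sketch of that identity for binary polynomials packed into Python integers. It is not the hybrid hardware structure of the paper; the function names and the `threshold` parameter are illustrative assumptions.

```python
def clmul(a: int, b: int) -> int:
    """Schoolbook carry-less product of two GF(2)[x] polynomials packed into ints."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def karatsuba_gf2(a: int, b: int, threshold: int = 16) -> int:
    """Karatsuba product in GF(2)[x]; uses schoolbook below `threshold` bits."""
    n = max(a.bit_length(), b.bit_length())
    if n <= max(threshold, 1):  # at least 1, so the recursion always terminates
        return clmul(a, b)
    half = n // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half
    lo = karatsuba_gf2(a_lo, b_lo, threshold)
    hi = karatsuba_gf2(a_hi, b_hi, threshold)
    # In characteristic 2, addition and subtraction are both XOR.
    mid = karatsuba_gf2(a_lo ^ a_hi, b_lo ^ b_hi, threshold) ^ lo ^ hi
    return (hi << (2 * half)) ^ (mid << half) ^ lo
```

The result is the unreduced product of degree up to 2m - 2; a field multiplier would still reduce it modulo the irreducible polynomial.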
TL;DR: This work outlines that multiplication of binary polynomials can be easily integrated into a multiplier datapath for integers without significant additional hardware, and presents new algorithms for multiple-precision arithmetic in GF(2^m) based on the availability of an instruction for single-precision multiplication of binary polynomials.
Abstract: The performance of elliptic curve (EC) cryptosystems depends essentially on efficient arithmetic in the underlying finite field. Binary finite fields GF(2^m) have the advantage of "carry-free" addition. Multiplication, on the other hand, is rather costly since polynomial arithmetic is not supported by general-purpose processors. We propose a combined hardware/software approach to overcome this problem. First, we outline that multiplication of binary polynomials can be easily integrated into a multiplier datapath for integers without significant additional hardware. Then, we present new algorithms for multiple-precision arithmetic in GF(2^m) based on the availability of an instruction for single-precision multiplication of binary polynomials. The proposed hardware/software approach is considerably faster than a "conventional" software implementation and well suited for constrained devices like smart cards. Our experimental results show that an enhanced 16-bit RISC processor is able to generate a 191-bit ECDSA signature in less than 650 ms when the core is clocked at 5 MHz.
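To illustrate how multiple-precision arithmetic can be layered on a single-precision binary polynomial multiply, here is a minimal sketch. The function `mulgf2_word` is a hypothetical software stand-in for such an instruction, and the word size `W = 16` is an assumption chosen to match the 16-bit core mentioned in the abstract; the paper's actual algorithms are not reproduced here.

```python
from typing import List

W = 16                 # word size in bits (assumption)
MASK = (1 << W) - 1


def mulgf2_word(x: int, y: int) -> int:
    """Hypothetical single-precision instruction: W x W -> 2W-bit carry-less product."""
    r = 0
    for i in range(W):
        if (y >> i) & 1:
            r ^= x << i
    return r


def poly_mul_words(a: List[int], b: List[int]) -> List[int]:
    """Operand-scanning product of binary polynomials stored as little-endian word lists."""
    out = [0] * (len(a) + len(b))
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            p = mulgf2_word(ai, bj)
            out[i + j] ^= p & MASK      # low word: XOR, so no carry propagation
            out[i + j + 1] ^= p >> W    # high word
    return out
```

Because partial products are accumulated with XOR, no carry chain is needed between words, which is the "carry-free" property the abstract refers to.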
TL;DR: A novel digit-serial modular multiplier that uses a hybrid architecture to perform the reduction operation needed to reduce the multiplication result: hardwired logic is used for fast reduction of named curves and the multiplier circuit is reused for reduction of generic curves.
Abstract: We describe a cryptographic processor for elliptic curve cryptography (ECC). ECC is evolving as an attractive alternative to other public-key schemes such as RSA by offering the smallest key size and the highest strength per bit. The processor performs point multiplication for elliptic curves over binary polynomial fields GF(2^m). In contrast to other designs that only support one curve at a time, our processor is capable of handling arbitrary curves without requiring reconfiguration. More specifically, it can handle both named curves as standardized by NIST as well as any other generic curves up to a field degree of 255. Efficient support for arbitrary curves is particularly important for the targeted server applications that need to handle requests for secure connections generated by a multitude of heterogeneous client devices. Such requests may specify curves which are infrequently used or not even known at implementation time. Our processor implements 256-bit modular multiplication, division, addition and squaring. The multiplier constitutes the core function as it executes the bulk of the point multiplication algorithm. We present a novel digit-serial modular multiplier that uses a hybrid architecture to perform the reduction operation needed to reduce the multiplication result: hardwired logic is used for fast reduction of named curves and the multiplier circuit is reused for reduction of generic curves. The performance of our FPGA-based prototype, running at a clock frequency of 66.4 MHz, is 6955 point multiplications per second for named curves over GF(2^163) and 3308 point multiplications per second for generic curves over GF(2^163).
TL;DR: Using the self duality of an optimal normal basis (ONB) of type II, a bit parallel systolic multiplier over GF(2^m), which has a low hardware complexity and a low latency, is presented.
Abstract: Using the self duality of an optimal normal basis (ONB) of type II, we present a bit parallel systolic multiplier over GF(2^m), which has a low hardware complexity and a low latency. We show that our multiplier has a latency of m+1 and the basic cell of our circuit design needs 5 latches (flip-flops). On the other hand, most other multipliers of the same type have latency 3m and the basic cell of each multiplier needs 7 latches. Comparing the gate areas in each basic cell, we find that the hardware complexity of our multiplier is reduced by 25 percent relative to the multipliers with 7 latches.
TL;DR: The controlled-multiplication operation, which is the only group-specific operation in Shor's algorithms for factoring and solving the Discrete Log Problem, is described, and the detailed size, width and depth complexity of such circuits are given, which ultimately will allow us to obtain detailed upper bounds on the amount of quantum resources needed to solve instances of the DLP.
Abstract: In this paper we discuss the problem of performing elementary finite field arithmetic on a quantum computer. Of particular interest is the controlled-multiplication operation, which is the only group-specific operation in Shor's algorithms for factoring and solving the Discrete Log Problem (DLP). We describe how to build quantum circuits for performing this operation on the generic Galois fields GF($p^k$), as well as the boundary cases GF($p$) and GF($2^k$). We give the detailed size, width and depth complexity of such circuits, which ultimately will allow us to obtain detailed upper bounds on the amount of quantum resources needed to solve instances of the DLP on such fields.
TL;DR: This work presents algorithms that are especially suited to high-performance devices like large-scale server computers and shows how to perform an efficient field multiplication for operands of arbitrary size, and how to achieve efficient field reduction for dense polynomials.
Abstract: This work discusses generic arithmetic for arbitrary binary fields in the context of elliptic curve cryptography (ECC). ECC is an attractive public-key cryptosystem recently endorsed by the US government for mobile/wireless environments which are limited in terms of their CPU, power, and network connectivity. Its efficiency enables constrained, mobile devices to establish secure end-to-end connections. Hence the server side has to be enabled to perform ECC operations efficiently for a vast number of mobile devices that use variable parameters, in order to reduce cost. We present algorithms that are especially suited to high-performance devices like large-scale server computers. We show how to perform an efficient field multiplication for operands of arbitrary size, and how to achieve efficient field reduction for dense polynomials. We also give running times of our implementation for both general elliptic curves and Koblitz curves on various platforms, and analyze the results. Our new algorithms are the fastest algorithms for arbitrary binary fields in the literature.
TL;DR: This work addresses the problem of providing a finite-field arithmetic program that lets a computer further accelerate operations on the finite field by using a basis whose operation cost is small.
Abstract: PROBLEM TO BE SOLVED: To provide an arithmetic program on a finite field that lets a computer further accelerate operations on the finite field by using a basis whose operation cost is small. SOLUTION: In the arithmetic program on a finite field, a finite field having q elements is denoted GF(q) with q = p^2, v is one of the elements of GF(p), α satisfies α^2 + v = 0, and the basis is [α, α^2]. First, an extension field element input section 10 receives the elements of the finite field GF(q), represented using elements of the finite field GF(p), and sends the GF(p) elements constituting the inputted GF(q) elements to an extension field element operation section 20. The extension field element operation section 20 performs the arithmetic operation among the GF(p) elements constituting the inputted GF(q) elements and calculates the operation result on the finite field GF(q). An operation result outputting section 30 outputs the operation result using elements of the finite field GF(p). COPYRIGHT: (C)2005,JPO&NCIPI
TL;DR: A new method for degree reduction is introduced which is significantly faster than previously reported iterative techniques and shortens the critical path of the reduction circuit by a factor of between 1.36 and 3.0 for digit-sizes ranging from d=4 to 16.
Abstract: We present an architecture for digit-serial multiplication in finite fields GF(2^m) with applications to cryptography. The proposed design uses polynomial basis representation and interleaves multiplication steps with degree reduction steps. An M-bit multiplier works with arbitrary irreducible polynomials and can be used for any binary field of order 2^m ≤ 2^M. We introduce a new method for degree reduction which is significantly faster than previously reported iterative techniques. A representative example for a digit-size of d=4, illustrating the reduction circuit, is given. Experimental results show that the proposed method shortens the critical path of the reduction circuit by a factor of between 1.36 and 3.0 for digit-sizes ranging from d=4 to 16.
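The interleaving of multiplication and degree reduction described above can be sketched in software as a most-significant-digit-first loop: each step shifts the accumulator by d bits, adds one digit-by-operand partial product, and reduces. This sketch uses plain long-division reduction, not the faster reduction method introduced in the paper, and all function names are illustrative assumptions.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less (GF(2)[x]) product of two polynomials packed into ints."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def poly_mod(a: int, f: int) -> int:
    """Reduce a modulo f in GF(2)[x] by bitwise long division."""
    m = f.bit_length() - 1
    while a.bit_length() > m:
        a ^= f << (a.bit_length() - 1 - m)
    return a


def digit_serial_mul(a: int, b: int, f: int, d: int = 4) -> int:
    """Compute a*b mod f, consuming d bits of b per step, most significant digit first."""
    m = f.bit_length() - 1
    num_digits = -(-m // d)  # ceil(m / d)
    c = 0
    for i in reversed(range(num_digits)):
        digit = (b >> (i * d)) & ((1 << d) - 1)
        # Multiplication step followed immediately by a degree-reduction step.
        c = poly_mod((c << d) ^ clmul(a, digit), f)
    return c
```

For example, with the AES polynomial x^8 + x^4 + x^3 + x + 1 (`0x11B`), `digit_serial_mul(0x57, 0x83, 0x11B)` evaluates to `0xC1`, the product given in FIPS-197.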
TL;DR: A new division architecture for GF(2^m) using the standard basis representation is proposed based on a modified version of the binary extended greatest common divisor (GCD) algorithm, which provides a compact and fast divider.
Abstract: Division over a finite field GF(2^m) is the most time and area consuming operation. In this paper, a new division architecture for GF(2^m) using the standard basis representation is proposed. Based on a modified version of the binary extended greatest common divisor (GCD) algorithm, we design a compact and fast divider. The proposed divider can produce division results at a rate of one per 2m - 1 clock cycles. Analysis shows that the computational delay time of the proposed architecture is significantly less than that of previously proposed dividers, with a reduced transistor count. Furthermore, since the new architecture does not restrict the choice of irreducible polynomials and has the features of regularity and modularity, it provides high flexibility and scalability with respect to the field size m.
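As background for the approach named in this abstract, the following is a minimal software sketch of division in GF(2^m) with the textbook binary extended GCD. It is not the modified algorithm or the hardware architecture of the paper. The function name is an assumption, and the code assumes `f` is irreducible and both operands are already reduced (degree below m).

```python
def gf2m_divide(b: int, a: int, f: int) -> int:
    """Return b / a in GF(2^m) = GF(2)[x]/(f) using the binary extended GCD.

    Invariants: g1 * a = u * b (mod f) and g2 * a = v * b (mod f).
    """
    if a == 0:
        raise ZeroDivisionError("division by zero in GF(2^m)")
    u, v = a, f
    g1, g2 = b, 0
    while u != 1 and v != 1:
        while u & 1 == 0:               # x divides u
            u >>= 1
            g1 = (g1 >> 1) if g1 & 1 == 0 else ((g1 ^ f) >> 1)
        while v & 1 == 0:               # x divides v
            v >>= 1
            g2 = (g2 >> 1) if g2 & 1 == 0 else ((g2 ^ f) >> 1)
        if u.bit_length() > v.bit_length():   # deg(u) > deg(v)
            u ^= v
            g1 ^= g2
        else:
            v ^= u
            g2 ^= g1
    return g1 if u == 1 else g2
```

Passing `b = 1` yields the multiplicative inverse; in the AES field (`f = 0x11B`), `gf2m_divide(1, 0x53, 0x11B)` gives `0xCA`. Only shifts, XORs and degree comparisons are used, which is what makes this family of algorithms attractive for hardware.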
TL;DR: Specialized hardware devices for speeding up the linear algebra step of the number field sieve are proposed and whether the required hardware fits onto a single wafer when dealing with cryptographically relevant parameters is questioned.
Abstract: Bernstein [1] and Lenstra et al. [5] have proposed specialized hardware devices for speeding up the linear algebra step of the number field sieve. A key issue in the design of these devices is the question whether the required hardware fits onto a single wafer when dealing with cryptographically relevant parameters.
TL;DR: These multipliers are highly area efficient and require fewer logic gates even when compared with the most area efficient multiplier available in the open literature, which makes the proposed multipliers suitable for applications where the value of m is large but space is of concern, e.g., resource constrained cryptographic systems.
Abstract: For efficient hardware implementation of finite field arithmetic units, the use of a normal basis is advantageous. Two architectures for multipliers over the finite field GF(2^m) are proposed. Both of these multipliers are of sequential type: after receiving the coordinates of the two input field elements, they go through m iterations (or clock cycles) to finally yield all the coordinates of the product in parallel. These multipliers are highly area efficient and require fewer logic gates even when compared with the most area efficient multiplier available in the open literature. This makes the proposed multipliers suitable for applications where the value of m is large but space is of concern, e.g., resource constrained cryptographic systems. Additionally, the AND gate count for one of the multipliers is only ⌊m/2⌋ + 1. This implies that if the multiplication over GF(2^m) is performed using a suitable subfield GF(2^n), where n > 1 and n | m, then the corresponding multiplier architecture will yield a highly efficient digit or word serial multiplier.
TL;DR: This work designs and optimizes four high performance parallel GF(2^233) multipliers for an FPGA realization, and analyzes the time and area complexities to create the first hardware realization of subquadratic arithmetic.
Abstract: For many applications from the areas of cryptography and coding, finite field multiplication is the most resource and time consuming operation. We have designed and optimized four high performance parallel GF(2^233) multipliers for an FPGA realization, and analyzed the time and area complexities. One of the multipliers uses a new hybrid structure to implement the Karatsuba algorithm. For increasing performance, we make extensive use of pipelining and efficient control techniques and use a modern state-of-the-art FPGA technology. As a result we have, to our knowledge, the first hardware realization of subquadratic arithmetic and currently the fastest and most efficient implementation of 233-bit finite field multipliers.
TL;DR: This work investigates the GF(p) inversion and presents several phases in the design of efficient hardware implementations to compute the Montgomery modular inverse, and proposes a scalable and unified architecture for a Montgomery inverse hardware that operates in both GF(p) and GF(2^n) fields.
Abstract: The computation of the inverse of a number in finite fields, namely Galois Fields GF(p) or GF(2^n), is one of the most complex arithmetic operations in cryptographic applications. In this work, we investigate the GF(p) inversion and present several phases in the design of efficient hardware implementations to compute the Montgomery modular inverse. We suggest a new correction phase for a previously proposed almost Montgomery inverse algorithm to calculate the inversion in hardware. We also show how to obtain a fast hardware algorithm to compute the inverse by a multi-bit shifting method. The proposed designs have the hardware scalability feature, which means that the design can fit on constrained areas and still handle operands of any size. In order to have long-precision calculations, the module works on small precision words. The word-size, on which the module operates, can be selected based on the area and performance requirements. The upper limit on the operand precision is dictated only by the available memory to store the operands and internal results. The scalable module is in principle capable of performing infinite-precision Montgomery inverse computation of an integer, modulo a prime number. We also propose a scalable and unified architecture for a Montgomery inverse hardware that operates in both GF(p) and GF(2^n) fields. We adjust and modify a GF(2^n) Montgomery inverse algorithm to benefit from multi-bit shifting hardware features, making it very similar to the proposed best design of GF(p) inversion hardware. We compare all scalable designs with fully parallel ones based on the same basic inversion algorithm. All scalable designs consumed less area and in general showed better performance than the fully parallel ones, which makes the scalable design a very efficient solution for computing the long precision Montgomery inverse.
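For orientation, here is a minimal sketch of the classical two-phase Montgomery inverse due to Kaliski, which the "almost Montgomery inverse" in this abstract builds on. It shows the original halving-based second phase, not the new correction phase or the multi-bit shifting proposed in the paper. The function names are assumptions, and the code assumes p is an odd prime and 0 < a < p.

```python
def almost_montgomery_inverse(a: int, p: int):
    """Phase I: return (r, k) with r = a^-1 * 2^k mod p, where n <= k <= 2n."""
    u, v, r, s, k = p, a, 0, 1, 0
    while v > 0:
        if u % 2 == 0:
            u, s = u // 2, 2 * s
        elif v % 2 == 0:
            v, r = v // 2, 2 * r
        elif u > v:
            u, r, s = (u - v) // 2, r + s, 2 * s
        else:
            v, s, r = (v - u) // 2, s + r, 2 * r
        k += 1
    if r >= p:
        r -= p
    return p - r, k


def montgomery_inverse(a: int, p: int) -> int:
    """Return a^-1 * 2^n mod p, where n is the bit length of p."""
    n = p.bit_length()
    r, k = almost_montgomery_inverse(a, p)
    for _ in range(k - n):   # Phase II: divide by 2 modulo p, k - n times
        r = r // 2 if r % 2 == 0 else (r + p) // 2
    return r
```

Both phases use only additions, subtractions, parity tests and one-bit shifts, which is why the algorithm maps well onto word-serial (scalable) hardware of the kind the abstract describes.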
TL;DR: The current paper presents a new AB^2 algorithm based on the MSB-first scheme using a standard basis representation of Galois fields GF(2^m), and parallel-in parallel-out and serial-in serial-out systolic realizations for computing AB^2 and inversion/division in GF(2^m).
TL;DR: The design of a low-power multiply/accumulate (MAC) unit for efficient arithmetic in finite fields is presented, which combines integer arithmetic and polynomial arithmetic into a single functional unit which can be configured at run-time to serve both types of fields.
Abstract: Recent multi-application smart cards are equipped with powerful 32-bit RISC cores clocked at 33 MHz or even more. They are able to support a variety of public-key cryptosystems, including elliptic curve systems over prime fields GF(p) and binary fields GF(2^m) of arbitrary order. This flexibility is achieved by implementing the cryptographic primitives in software and taking advantage of dedicated instruction set extensions along with special functional units for low-level arithmetic operations. In this paper, we present the design of a low-power multiply/accumulate (MAC) unit for efficient arithmetic in finite fields. The MAC unit combines integer arithmetic and polynomial arithmetic into a single functional unit which can be configured at run-time to serve both types of fields, GF(p) and GF(2^m). Our experimental results show that a properly designed unified (dual-field) multiplier consumes significantly less power in polynomial mode than in integer mode.
TL;DR: In this paper, a scalable unified architecture for Montgomery multiplication over either of the finite fields GF(p) and GF(2^n) is described. Its main advantage is that a control signal broadcast to all cells to suppress carries under GF(2^n) is not needed; thus, larger multipliers can be synthesised whose pipelined speed is independent of the buffering required for the control signal.
Abstract: We describe a scalable unified architecture for Montgomery multiplication over either of the finite fields GF(p) and GF(2^n). This architecture has the advantage of possessing a new redundant binary adder that supports carry-save additions under either of the Galois fields without the need for an external control signal to specify which field is to be used. Its main advantage over previously reported dual-field multipliers is that a control signal which is broadcast to all cells to suppress carries under GF(2^n) is not needed. Consequently, larger multipliers can be synthesised whose pipelined speed is independent of the buffering required for the control signal.
TL;DR: A 173-bit (m = 173) Type II Optimal Normal Basis (ONBII) representation is chosen in the implementation of the Galois Field GF(2^m) arithmetic logic unit by asynchronous architecture and is especially aimed at low power consumption by reducing the switching activities in the latches.
Abstract: Elliptic curve cryptography has become popular in recent decades due to its high security strength per bit, low memory requirements and low processing power requirements, which make it attractive for energy-constrained applications such as contactless smart cards. In this paper, a 173-bit (m = 173) Type II Optimal Normal Basis (ONBII) representation is chosen in the implementation of the Galois Field GF(2^m) arithmetic logic unit by asynchronous architecture. The proposed architecture uses the advantages of asynchronous properties and is especially aimed at low power consumption by reducing the switching activities in the latches, reducing the number of cycles to complete each multiplication process and reducing the number of squaring operations in each inversion process. The simulation results show that the resulting ALU consumes only 110.8 nW in 780 ns to complete each multiplication operation.
TL;DR: This work provides the VHDL description of an architecture for exponentiation in GF(2^m) based on the square-and-multiply method, called the binary method, using two multipliers in parallel previously developed by the authors.
Abstract: Exponentiation in finite or Galois fields, GF(2^m), is a basic operation for several algorithms in areas such as cryptography, error-correction codes and digital signal processing. Nevertheless the involved calculations are very time consuming, especially when they are performed in software. For performance and security reasons, it is often more convenient to implement cryptographic algorithms in hardware. In order to overcome the well-known drawback of little or nonexistent flexibility associated with traditional application specific integrated circuit (ASIC) solutions, we propose an architecture using field programmable gate arrays (FPGA). A cheap but still flexible modular exponentiation can be implemented using these devices. We provide the VHDL description of an architecture for exponentiation in GF(2^m) based on the square-and-multiply method, called the binary method, using two multipliers in parallel previously developed by ourselves. Our structure, compared with other designs reported earlier, introduces an important saving in hardware resources.
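The square-and-multiply (binary) method named above can be sketched in software as follows. This is the right-to-left variant, chosen here because the squaring and the conditional multiplication within one iteration do not depend on each other, which is one plausible way two multipliers could work in parallel; whether the paper schedules them this way is an assumption. Function names are illustrative, and operands are assumed to be reduced modulo `f`.

```python
def gf2m_mul(a: int, b: int, f: int) -> int:
    """Shift-and-add product of a and b in GF(2)[x]/(f), reducing as it goes."""
    m = f.bit_length() - 1
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> m) & 1:   # degree reached m: subtract (XOR) the modulus
            a ^= f
    return r


def gf2m_pow(a: int, e: int, f: int) -> int:
    """Right-to-left binary exponentiation: scans the exponent from its LSB."""
    result, base = 1, a
    while e:
        # These two products are independent of each other within an iteration.
        new_result = gf2m_mul(result, base, f) if e & 1 else result
        base = gf2m_mul(base, base, f)
        result = new_result
        e >>= 1
    return result
```

In GF(2^8) with `f = 0x11B`, every nonzero element satisfies `gf2m_pow(a, 255, f) == 1`, so `gf2m_pow(a, 254, f)` is the inverse of `a` (Fermat's little theorem in the field).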
TL;DR: A hardware-software co-design approach for flexible programmable Galois Field Processing for applications which require operations over GF(2^m), such as RS and BCH codes, Elliptic Curve Cryptography and the AES, is described.
Abstract: This paper describes a hardware-software co-design approach for flexible programmable Galois Field Processing for applications which require operations over GF(2^m), such as RS and BCH codes, Elliptic Curve Cryptography and the AES. Complexities of flexible implementations of different applications on the same computation architecture can be migrated to software during design time. However, the underlying GF(2^m) arithmetic architecture needs to be designed with software programmability (or reconfigurability) in mind. We describe novel reconfigurable subword parallel GF(2^m) arithmetic architectures designed with an associated instruction set architecture for different applications over GF(2^m) and the same applications with differing parameters. Design space exploration is carried out with two simple parameters P and Q which can be changed at design time and will affect the performance of different applications and the flexibility of the final implementation. We show implementation results for an FPGA prototype of the processor, programmed for RS and BCH coding, AES and elliptic curve cryptography with differing parameters. Complexity figures and configuration overheads for subword parallel GF(2^m) arithmetic architectures are also estimated and discussed.
TL;DR: An arithmetic logic unit for elliptic curve cryptosystems over GF(2^m) with the inclusion of a hardware inversion operation, which is as fast as a multiplication, is presented.
Abstract: In this paper the authors present an arithmetic logic unit (ALU) for elliptic curve cryptosystems over GF(2^m). The novelty of this ALU is the inclusion of a hardware inversion operation, which is as fast as a multiplication, so faster algorithms can be used for the computation of kP. Although serial multiplication and inversion are used, a faster computation than other parallelized hardware implementations is achieved.
TL;DR: Whether it is bit serial or bit parallel, the multiplier has a better or comparable hardware complexity and critical path delay and has the same unidirectional data flow as the multipliers with MSB (Most Significant Bit) first scheme.
Abstract: By using a polynomial basis with LSB (Least Significant Bit) first scheme, we present new bit serial and bit parallel systolic multipliers over GF(2^m). Our bit serial systolic multiplier has only one control signal with 10 latches in each basic cell. Also, our bit parallel multiplier has unidirectional data flow with 7 latches in each basic cell. Thus, whether it is bit serial or bit parallel, our multiplier has a better or comparable hardware complexity and critical path delay and has the same unidirectional data flow as the multipliers with MSB (Most Significant Bit) first scheme.
TL;DR: In this paper, the degree distribution of the t-nomial multiples of primitive polynomials is studied in the context of the nonlinear combiner generator model for stream cipher systems.
Abstract: A standard model of a nonlinear combiner generator for a stream cipher system combines the outputs of several independent Linear Feedback Shift Register (LFSR) sequences using a nonlinear Boolean function to produce the key stream. Given such a model, cryptanalytic attacks have been proposed that find sparse multiples of the connection polynomials corresponding to the LFSRs. In this direction, a few works have recently been published on t-nomial multiples of primitive polynomials. We here provide further results on the degree distribution of the t-nomial multiples. However, getting the sparse multiples of just a single primitive polynomial does not suffice. The exact cryptanalysis of the nonlinear combiner model depends on finding sparse multiples of the products of primitive polynomials. We here make a detailed analysis of t-nomial multiples of products of primitive polynomials. We present new enumeration results for these multiples and provide some estimates of their degree distribution.
TL;DR: Various properties of these fastest LI transforms as well as some experimental results for them using standard benchmark functions and their comparison with the generalized Reed-Muller transform are discussed.
Abstract: Linearly Independent (LI) transforms in Galois Field (2) algebra that have the fastest transform calculation have been investigated recently. It was found that there are some LI transforms having smaller computational cost than the Reed-Muller transform, which was previously known as the most efficient transform over GF(2). This paper discusses various properties of these fastest LI transforms as well as some experimental results for them using standard benchmark functions and their comparison with the generalized Reed-Muller transform.
TL;DR: A high performance GF(2^k) elliptic curve crypto processor architecture suitable for multimedia security is proposed by using three separate bit-level pipelined digit serial-parallel multipliers that can operate in parallel.
Abstract: A high performance GF(2^k) elliptic curve crypto processor architecture suitable for multimedia security is proposed. To meet the high data rates of multimedia, the new architecture exploits parallelism within elliptic curve point operations after using projective coordinates. In this paper, the decision on which projective coordinate system to use is based on its efficiency with regard to its parallel implementation. Two different projective coordinate systems are compared here. This parallelism is exploited in the new architecture by using three separate bit-level pipelined digit serial-parallel multipliers that can operate in parallel. It is worth pointing out that such multipliers are ideally suited for the repetitive multiplications inherent in elliptic curve cryptography. It is believed that such high performance architectures are needed for high end servers that need to support the security of many multimedia streams at the same time.