TL;DR: This paper proposes a new and simple method for performing analog arithmetic operations in which signals are represented and stored as the memristance of the newly found circuit element, the memristor, instead of as a voltage or current.
TL;DR: A fresh, geometric interpretation of quaternions is provided, appropriate for contemporary Computer Graphics, based on insights from the algebra and geometry of multiplication in the complex plane.
Abstract: Quaternion multiplication can be applied to rotate vectors in 3-dimensions. Therefore in Computer Graphics, quaternions are sometimes used in place of matrices to represent rotations in 3-dimensions. Yet while the formal algebra of quaternions is well-known in the Graphics community, the derivations of the formulas for this algebra and the geometric principles underlying this algebra are not well understood. The goals of this paper are: (i) to provide a fresh, geometric interpretation of quaternions, appropriate for contemporary Computer Graphics; (ii) to derive the formula for quaternion multiplication from first principles; (iii) to present better ways to visualize quaternions, and the effect of quaternion multiplication on points and vectors in 3-dimensions, based on insights from the algebra and geometry of multiplication in the complex plane; (iv) to develop simple, intuitive proofs of the sandwiching formulas for rotation and reflection; (v) to show how to apply sandwiching to compute perspective projections. In Part I of this paper, we investigate the algebra of quaternion multiplication and focus in particular on topics (i) and (ii). In Part II we apply our insights from Part I to analyze the geometry of quaternion multiplication, with special emphasis on topics (iii), (iv) and (v).
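As a rough illustration of the sandwiching formula for rotation mentioned in the abstract above, the following minimal sketch rotates a vector v by computing q v q*. The function names (`qmul`, `conj`, `rotate`) and the (w, x, y, z) tuple layout are assumptions made for this example and are not taken from the paper.

```python
import math

def qmul(a, b):
    """Hamilton product of two quaternions stored as (w, x, y, z) tuples."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )

def conj(q):
    """Quaternion conjugate: negate the vector part."""
    w, x, y, z = q
    return (w, -x, -y, -z)

def rotate(v, axis, angle):
    """Rotate 3-vector v about a unit-length axis by angle (radians) via q v q*."""
    s = math.sin(angle / 2)
    q = (math.cos(angle / 2), axis[0] * s, axis[1] * s, axis[2] * s)
    # Embed v as a pure quaternion (0, v), then sandwich it between q and q*.
    _, x, y, z = qmul(qmul(q, (0.0, *v)), conj(q))
    return (x, y, z)
```

For instance, `rotate((1, 0, 0), (0, 0, 1), math.pi / 2)` turns the x-axis into the y-axis, up to floating-point rounding.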
TL;DR: The cohomology tables of vector bundles on projective spaces up to rational multiple have been studied in this paper, where the authors give an introduction and survey of these newly developed areas.
Abstract: Boij-Söderberg theory describes the Betti diagrams of graded modules over the polynomial ring up to multiplication by a rational number. Analogously, Eisenbud-Schreyer theory describes the cohomology tables of vector bundles on projective spaces up to rational multiple. We give an introduction and survey of these newly developed areas.
TL;DR: This paper presents a novel multiplication technique that increases the performance of multiplication by sophisticated caching of operands and significantly reduces the number of needed load instructions, which are usually among the most expensive operations on modern processors.
Abstract: Multi-precision multiplication is one of the most fundamental operations on microprocessors to allow public-key cryptography such as RSA and Elliptic Curve Cryptography (ECC). In this paper, we present a novel multiplication technique that increases the performance of multiplication by sophisticated caching of operands. Our method significantly reduces the number of needed load instructions, which are usually among the most expensive operations on modern processors. We evaluate our new technique on an 8-bit ATmega128 microcontroller and compare the result with existing solutions. Our implementation needs only 2,395 clock cycles for a 160-bit multiplication, which outperforms related work by 10% to 23%. The number of required load instructions is reduced from 167 (needed for the best known hybrid multiplication) to only 80. Our implementation scales very well even for larger integer sizes (required for RSA) and limited register sets. It further fully complies with existing multiply-accumulate instructions that are integrated in most of the available processors.
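The operand-caching schedule of the paper above is not reproduced here. As background only, the sketch below shows plain product-scanning (column-wise) multi-precision multiplication on 8-bit limbs, the kind of baseline that such techniques reorder to save loads. The helper names and the little-endian limb layout are assumptions for this example, and both operands are assumed to have the same number of limbs.

```python
def to_limbs(n, count, bits=8):
    """Split a non-negative integer into `count` little-endian limbs."""
    mask = (1 << bits) - 1
    return [(n >> (bits * i)) & mask for i in range(count)]

def from_limbs(limbs, bits=8):
    """Recombine little-endian limbs into an integer."""
    return sum(limb << (bits * i) for i, limb in enumerate(limbs))

def product_scanning_mul(a, b, bits=8):
    """Column-wise schoolbook multiplication of two equal-length limb arrays."""
    n = len(a)
    mask = (1 << bits) - 1
    result = [0] * (2 * n)
    acc = 0  # running accumulator, analogous to a multiply-accumulate register
    for col in range(2 * n - 1):
        # Sum all partial products a[i] * b[j] with i + j == col.
        for i in range(max(0, col - n + 1), min(col, n - 1) + 1):
            acc += a[i] * b[col - i]
        result[col] = acc & mask  # emit the lowest limb of the accumulator
        acc >>= bits              # keep the carry for the next column
    result[2 * n - 1] = acc
    return result
```

A 160-bit operand corresponds to 20 limbs of 8 bits, so `product_scanning_mul(to_limbs(x, 20), to_limbs(y, 20))` yields the 40-limb product.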
TL;DR: This paper presents a simple and efficient multiplier with the possibility to achieve an arbitrary accuracy through an iterative procedure, prior to achieving the exact result.
TL;DR: Experimental results show that the proposed modular-exponentiation and modular-multiplication design obtains the best delay performance compared with the published works and outperforms them in terms of area-time complexity as well.
Abstract: Modular exponentiation with large modulus and exponent, which is usually accomplished by repeated modular multiplications, has been widely used in public key cryptosystems. Typically, Montgomery's modular-multiplication algorithm is used since no trial division is necessary, and carry-save addition (CSA) is employed to reduce the critical path. In this paper, we optimize Montgomery multiplication and propose architectures to perform the least significant bit first and the most significant bit first algorithms. The developed architecture has the following distinctive characteristics: 1) use of a digit-serial approach for Montgomery multiplication; 2) conversion of the CSA representation of intermediate multiplication using carry-skip addition, which allows the critical path to be reduced, albeit with a small-area speed penalty; and 3) precomputation of the quotient value in Montgomery's iteration in order to speed up the operating frequency. In this paper, we present results in Xilinx Virtex 5 and in 0.18-μm application-specific integrated circuit technologies. For fair comparison with previous works, Xilinx Virtex 2 results are reported. Experimental results show that the proposed modular-exponentiation and modular-multiplication design obtains the best delay performance compared with the published works and outperforms them in terms of area-time complexity as well.
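To make the "no trial division" property concrete, here is a minimal software sketch of the textbook radix-2, least-significant-bit-first Montgomery multiplication. It is not the digit-serial hardware architecture proposed in the paper; the function name and argument order are assumptions for this example.

```python
def montgomery_multiply(a, b, n, k):
    """Return a * b * 2^(-k) mod n for an odd modulus n < 2^k and 0 <= a, b < n."""
    s = 0
    for i in range(k):
        s += ((a >> i) & 1) * b  # add b if bit i of a is set
        if s & 1:                # quotient bit: make s even by adding n
            s += n
        s >>= 1                  # exact division by 2, no trial division needed
    # The loop keeps s below 2n, so one conditional subtraction suffices.
    if s >= n:
        s -= n
    return s
```

The result carries an extra factor of 2^(-k), which is why operands are usually converted into Montgomery form before a chain of multiplications such as a modular exponentiation.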
TL;DR: A generalized conflict-free memory addressing scheme for memory-based fast Fourier transform (FFT) processors with parallel arithmetic processing units made up of radix-2^q multi-path delay commutator (MDC) is presented.
Abstract: This paper presents a generalized conflict-free memory addressing scheme for memory-based fast Fourier transform (FFT) processors with parallel arithmetic processing units made up of radix-2^q multi-path delay commutator (MDC). The proposed addressing scheme considers the continuous-flow operation with minimum shared memory requirements. To improve throughput, parallel high-radix processing units are employed. We prove that the solution to non-conflict memory access satisfying the constraints of the continuous-flow, variable-size, higher-radix, and parallel-processing operations indeed exists. In addition, a rescheduling technique for twiddle-factor multiplication is developed to reduce hardware complexity and to enhance hardware efficiency. From the results, we can see that the proposed processor has high utilization and efficiency to support flexible configurability for various FFT sizes with fewer computation cycles than the conventional radix-2/radix-4 memory-based FFT processors.
TL;DR: This work investigates the particular but important shape position case and obtains an implementation which is able to manipulate 0-dimensional ideals over a prime field of degree greater than 30000 and outperforms the Magma/Singular/FGb implementations of FGLM.
Abstract: Let I ⊂ K[x1,...,xn] be a 0-dimensional ideal of degree D where K is a field. It is well-known that obtaining efficient algorithms for change of ordering of Gröbner bases of I is crucial in polynomial system solving. Through the algorithm FGLM, this task is classically tackled by linear algebra operations in K[x1,...,xn]/I. With recent progress on Gröbner bases computations, this step turns out to be the bottleneck of the whole solving process. Our contribution is an algorithm that takes advantage of the sparsity structure of multiplication matrices appearing during the change of ordering. This sparsity structure arises even when the input polynomial system defining I is dense. As a by-product, we obtain an implementation which is able to manipulate 0-dimensional ideals over a prime field of degree greater than 30000. It outperforms the Magma/Singular/FGb implementations of FGLM. First, we investigate the particular but important shape position case. The obtained algorithm performs the change of ordering within a complexity O(D(N1 + n log(D))), where N1 is the number of nonzero entries of a multiplication matrix. This almost matches the complexity of computing the minimal polynomial of one multiplication matrix. Then, we address the general case and give corresponding complexity results. Our algorithm is dynamic in the sense that it selects automatically which strategy to use depending on the input. Its key ingredients are the Wiedemann algorithm to handle 1-dimensional linear recurrence (for the shape position case), and the Berlekamp-Massey-Sakata algorithm from Coding Theory to handle multi-dimensional linearly recurring sequences in the general case.
TL;DR: This paper discusses how the Strassen algorithm was adapted to operate within the limitations of the GPU and how it dealt with other issues encountered in the implementation process, including details of the memory layout of the authors' FFTs.
Abstract: We have improved our prior implementation of Strassen's algorithm for high performance multiplication of very large integers on a general purpose graphics processor (GPU). A combination of algorithmic and implementation optimizations results in a factor of up to 13.9 speed improvement over our previous work, running on an NVIDIA 295. We have also reoptimized the implementation for an NVIDIA 480, from which we obtain a factor of up to 19 speedup in comparison with a Core i7 processor core of the same technology generation. To provide a fairer chip to chip comparison, we also determined total GPU throughput on a set of multiplications relative to all of the cores on a multicore chip running in parallel. We find that the GTX 480 provides a factor of six higher throughput than all four cores/eight threads of the Core i7. This paper discusses how we adapted the algorithm to operate within the limitations of the GPU and how we dealt with other issues encountered in the implementation process, including details of the memory layout of our FFTs. Compared with our earlier work, which used Karatsuba's algorithm to guide multiplication of different operand sizes built on top of Strassen's algorithm being applied to fixed-size segments of the operands, we are now able to apply Strassen's algorithm directly to operands ranging in size from 255K bits to 16,320K bits.
TL;DR: A new SMT solver, STABLE, is presented for formulas of the quantifier-free logic over fixed-sized bit vectors (QF-BV); its algebraic engine can solve subproblems related to the entire arithmetic design component, and it was successfully evaluated in comparison with other state-of-the-art SMT solvers.
Abstract: This paper presents a new SMT solver, STABLE, for formulas of the quantifier-free logic over fixed-sized bit vectors (QF-BV). The heart of STABLE is a computer-algebra-based engine which provides algorithms for simplifying arithmetic problems of an SMT instance prior to bit-blasting. As the primary application domain for STABLE we target an SMT-based property checking flow for System-on-Chip (SoC) designs. When verifying industrial data path modules we frequently encounter custom-designed arithmetic components specified at the logic level of the hardware description language being used. This results in SMT problems where arithmetic parts may include non-arithmetic constraints. STABLE includes a new technique for extracting arithmetic bit-level information for these non-arithmetic constraints. Thus, our algebraic engine can solve subproblems related to the entire arithmetic design component. STABLE was successfully evaluated in comparison with other state-of-the-art SMT solvers on a large collection of SMT formulas describing verification problems of industrial data path designs that include multiplication. In contrast to the other solvers STABLE was able to solve instances with bit-widths of up to 64 bits.
TL;DR: This paper extends the one-dimensional method for cache-oblivious SpMV multiplication to two dimensions, while still allowing only row and column permutations on the sparse input matrix; the largest gain obtained is over a factor of 3 in SpMV speed compared to the natural matrix ordering.
Abstract: In earlier work, we presented a one-dimensional cache-oblivious sparse matrix-vector (SpMV) multiplication scheme which has its roots in one-dimensional sparse matrix partitioning. Partitioning is often used in distributed-memory parallel computing for the SpMV multiplication, an important kernel in many applications. A logical extension is to move towards using a two-dimensional partitioning. In this paper, we present our research in this direction, extending the one-dimensional method for cache-oblivious SpMV multiplication to two dimensions, while still allowing only row and column permutations on the sparse input matrix. This extension requires a generalisation of the compressed row storage data structure to a block-based data structure, for which several variants are investigated. Experiments performed on three different architectures show further improvements of the two-dimensional method compared to the one-dimensional method, especially in those cases where the one-dimensional method already provided significant gains. The largest gain obtained by our new reordering is over a factor of 3 in SpMV speed, compared to the natural matrix ordering.
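For readers unfamiliar with the compressed row storage (CRS) data structure that the abstract above generalises, the following minimal sketch shows a standard CRS sparse matrix-vector product. It does not implement the paper's reordering or its block-based variants; the function and array names are assumptions for this example.

```python
def spmv_crs(values, col_idx, row_ptr, x):
    """Compute y = A x for a sparse matrix A stored in compressed row storage.

    values  : nonzero entries, row by row
    col_idx : column index of each entry in `values`
    row_ptr : row_ptr[r] .. row_ptr[r + 1] delimits the entries of row r
    """
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            # Accesses to x follow col_idx, which is the irregular pattern
            # that matrix reordering tries to make more cache-friendly.
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y
```

As a usage example, the matrix [[1, 0, 2], [0, 3, 0], [4, 0, 5]] is stored as `values = [1, 2, 3, 4, 5]`, `col_idx = [0, 2, 1, 0, 2]`, `row_ptr = [0, 2, 3, 5]`.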
TL;DR: This paper presents some results concerning the dual notion of multiplication modules (i.e., comultiplication modules) over a commutative ring.
Abstract: This paper deals with some results concerning the dual notion of multiplication modules (i.e., comultiplication modules) over a commutative ring.
TL;DR: An accelerated Schoof-type point-counting algorithm for curves of genus 2 equipped with an efficiently computable real multiplication endomorphism is presented, and it is used to compute a 256-bit prime-order Jacobian suitable for cryptographic applications.
Abstract: We present an accelerated Schoof-type point-counting algorithm for curves of genus 2 equipped with an efficiently computable real multiplication endomorphism. Our new algorithm reduces the complexity of genus 2 point counting over a finite field $\mathbb{F}_q$ of large characteristic from $\widetilde{O}(\log^8 q)$ to $\widetilde{O}(\log^5 q)$. Using our algorithm we compute a 256-bit prime-order Jacobian, suitable for cryptographic applications, and also the order of a 1024-bit Jacobian.
TL;DR: Improved costs for the multiplication of matrices of small size are tabulated, and standard algorithms for small matrices due to Strassen, Winograd, Pan, Laderman, and others are exploited in an improved way.
Abstract: We tabulate improved costs for the multiplication of matrices of small size, up to 30. Following previous work by Probert & Fischer [5], Smith [4], and Mezzarobba [2], we base our approach on previous algorithms for small matrices due to Strassen, Winograd, Pan, Laderman, ..., and show how to exploit these standard algorithms in an improved way. We illustrate the use of our results by generating multiplication code for various rings, such as integers, polynomials, differential operators or linear recurrence operators.
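As a reminder of the kind of building block such cost tables rest on, the sketch below implements Strassen's classical scheme, which multiplies two 2×2 matrices with 7 multiplications instead of 8. It is only the base algorithm, not the improved combinations tabulated in the paper; the function name is an assumption for this example.

```python
def strassen_2x2(A, B):
    """Multiply two 2x2 matrices (nested lists) using 7 multiplications."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    return [
        [m1 + m4 - m5 + m7, m3 + m5],
        [m2 + m4, m1 - m2 + m3 + m6],
    ]
```

Because the scheme never assumes that entry multiplication is commutative, the entries may themselves be matrix blocks or elements of other rings, provided they support `+`, `-` and `*`.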
TL;DR: Two algorithms are described, a new co-transformation procedure and an improvement to an existing interpolation method, that reduce these tables to an extent that allows their easy synthesis in logic.
Abstract: The logarithmic number system has been proposed as an alternative to floating-point arithmetic. Multiplication, division and square-root operations are accomplished with fixed-point methods, but addition and subtraction are considerably more challenging. Recent work has demonstrated that these operations too can be done with similar speed and accuracy to their FP equivalents, but the necessary circuitry is complex. In particular, it is dominated by the need for large ROM tables for the storage of non-linear functions. This paper describes two algorithms, a new co-transformation procedure and an improvement to an existing interpolation method, that reduce these tables to an extent that allows their easy synthesis in logic. An implementation shows substantial reductions in area and delay from the previous best 32-bit realisation, with equivalent accuracy.
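The contrast drawn in the abstract above, cheap multiplication but hard addition, can be seen in a small floating-point model of a logarithmic number system. This is a sketch restricted to positive values; real designs use fixed-point exponents and look-up tables for the non-linear function, and all function names here are assumptions for this example.

```python
import math

def lns_encode(x):
    """Represent a positive real x by its base-2 logarithm."""
    return math.log2(x)

def lns_decode(e):
    """Convert a logarithmic representation back to a real value."""
    return 2.0 ** e

def lns_mul(a, b):
    """Multiplication becomes addition of exponents."""
    return a + b

def lns_div(a, b):
    """Division becomes subtraction of exponents."""
    return a - b

def lns_sqrt(a):
    """Square root becomes halving the exponent."""
    return a / 2

def lns_add(a, b):
    """Addition needs the non-linear function log2(1 + 2^d) with d <= 0.

    In hardware this function is what the large ROM tables store.
    """
    hi, lo = max(a, b), min(a, b)
    return hi + math.log2(1.0 + 2.0 ** (lo - hi))
```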
TL;DR: The generalized algorithm of type Chudnovsky with derivative evaluations on places of degree one, two and four applied on the descent of a Garcia-Stichtenoth tower of algebraic function fields defined over F_{2^4} enables the best known asymptotic bound to be obtained.
TL;DR: In this paper, the authors investigated the progress of students' learning on multiplication of fractions with natural numbers through the five activity levels based on the Realistic Mathematics Education (RME) approach proposed by Streefland.
Abstract: This study aimed at investigating the progress of students’ learning on multiplication of fractions with natural numbers through the five activity levels based on the Realistic Mathematics Education (RME) approach proposed by Streefland. Design research was chosen to achieve this research goal. In design research, the Hypothetical Learning Trajectory (HLT) plays an important role as a design and research instrument. This HLT was tested with thirty-seven grade-five primary school students (i.e. SDN 179 Palembang). The results of the classroom practices showed that a measurement (length) activity could stimulate students to produce fractions as the first level in learning multiplication of fractions with natural numbers. Furthermore, the strategies and tools used by the students in partitioning were gradually developed into more formal mathematics, in which the number line was used as the model of the measuring situation and the model for more formal reasoning. The number line then could bring the students to the last activity level, namely on the way to rules for multiplying fractions with natural numbers. Based on these findings, it is suggested that Streefland’s five activity levels can be used as a guideline in learning multiplication of fractions with natural numbers, in which the learning process becomes more progressive.
TL;DR: The present article presents practical algorithms, based on strategy iteration, for computing exact least solutions of equation systems over the reals with addition, multiplication by positive constants, minimum and maximum, and applies them to computing abstract least fixpoint semantics of affine programs over the relational template polyhedra domain.
Abstract: We present practical algorithms for computing exact least solutions of equation systems over the reals with addition, multiplication by positive constants, minimum and maximum. The algorithms are based on strategy iteration. Our algorithms can, for instance, be used for the analysis of recursive stochastic games. In the present article we apply our techniques for computing abstract least fixpoint semantics of affine programs over the relational template polyhedra domain. In particular, we thus obtain practical algorithms for computing abstract least fixpoint semantics over the abstract domains of intervals, zones, and octagons.
TL;DR: In this paper, the authors present a new SMT solver, STABLE, for formulas of the quantifier-free logic over fixed-sized bit vectors (QF-BV), which provides algorithms for simplifying arithmetic problems of an SMT instance prior to bit-blasting.
TL;DR: In this article, the authors demonstrate the phenomenon of nonlinear frequency multiplication in sub-micrometer Permalloy dots and show that the efficiency of multiplication is strongly enhanced when the harmonic is resonant with the normal dynamical modes of the dot.
Abstract: We demonstrate the phenomenon of nonlinear frequency multiplication in sub-micrometer Permalloy dots. The efficiency of multiplication is strongly enhanced when the harmonic is resonant with the normal dynamical modes of the dot. We find that the characteristics of resonant enhancement are dependent on the spatial symmetry of the dynamical mode and are different for the double- and the triple-frequency harmonics. The resonant frequency tripling is particularly efficient, providing a practical route for the implementation of microscopic integrated microwave frequency multipliers.
TL;DR: A high-speed implementation of arithmetic in Optimal Prime Fields for the ATmega128, an 8-bit processor used in a number of sensor nodes including the MICAz mote, and an optimized variant of Montgomery multiplication, based on Gura et al.'s hybrid technique, that takes the low weight of such primes into account to minimize execution time are described.
Abstract: Public-Key Cryptography (PKC) is essential to ensure the authenticity and confidentiality of communication in open computer networks such as the Internet. While RSA is still the most widely used public-key cryptosystem today, it can be expected that Elliptic Curve Cryptography (ECC) will continue to gain importance and become the de-facto standard for PKC in the emerging “Internet of Things.” ECC is particularly attractive for use in resource-restricted devices (e.g. wireless sensor nodes, RFID tags) due to its high level of security per bit, which allows for shorter keys compared to RSA. The performance of elliptic curve cryptosystems is primarily determined by the efficiency of certain arithmetic operations (especially multiplication and squaring) in the underlying finite field. In the present paper, we introduce a high-speed implementation of arithmetic in Optimal Prime Fields (OPFs) for the ATmega128, an 8-bit processor used in a number of sensor nodes including the MICAz mote. An OPF is defined by a prime of the form p = u · 2^k + v, whereby u and v are small compared to 2^k; in our implementation u is a 16-bit integer and v = 1. A special property of these primes is their low Hamming weight since only a few bits near the MSB and LSB are one. We describe an optimized variant of Montgomery multiplication, based on Gura et al.'s hybrid technique, that takes the low weight of such primes into account to minimize execution time. Our implementation for the ATmega128 is able to perform a multiplication in a 160-bit OPF in 3,532 clock cycles, which represents a new speed record for 160-bit modular multiplication on an 8-bit processor.
TL;DR: A full specification of the BGW perfect multiplication protocol is provided and its security is proved, and a new multiplication protocol that utilizes bivariate secret sharing is presented in order to achieve higher efficiency while maintaining a round complexity that is constant per multiplication.
Abstract: In the setting of secure multiparty computation, a set of n parties with private inputs wish to jointly compute some functionality of their inputs. One of the most fundamental results of information-theoretically secure computation was presented by Ben-Or, Goldwasser and Wigderson (BGW) in 1988. They demonstrated that any n-party functionality can be computed with perfect security, in the private channels model. The most technically challenging part of this result is a protocol for multiplying two shared values, with perfect security in the presence of up to t < n/3 malicious adversaries.
In this paper we provide a full specification of the BGW perfect multiplication protocol and prove its security. This includes one new step for the perfect multiplication protocol in the case of n/4 ≤ t < n/3. As in the original BGW protocol, this protocol works whenever the parties hold univariate (Shamir) shares of the input values. In addition, we present a new multiplication protocol that utilizes bivariate secret sharing in order to achieve higher efficiency while maintaining a round complexity that is constant per multiplication. Both of our protocols are presented with full proofs of security.
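The following sketch illustrates only the starting point of multiplication on univariate (Shamir) shares: multiplying two degree-t shares pointwise gives points on a degree-2t polynomial whose constant term is the product of the secrets. It omits everything that makes the BGW protocol secure against malicious parties (degree reduction, rerandomisation, verification), and the prime, function names and parameters are assumptions for this example.

```python
import random

P = 2**31 - 1  # a prime modulus chosen for illustration only

def share(secret, t, n, rng):
    """Shamir-share `secret` among parties 1..n using a random degree-t polynomial."""
    coeffs = [secret % P] + [rng.randrange(P) for _ in range(t)]
    return [
        (i, sum(c * pow(i, k, P) for k, c in enumerate(coeffs)) % P)
        for i in range(1, n + 1)
    ]

def reconstruct(points):
    """Lagrange-interpolate the polynomial through `points` and evaluate it at 0."""
    total = 0
    for xi, yi in points:
        num, den = 1, 1
        for xj, _ in points:
            if xj != xi:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        # Divide by den using Fermat's little theorem (P is prime).
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

def local_products(shares_a, shares_b):
    """Each party multiplies its two shares locally; the degree doubles to 2t."""
    return [(i, ya * yb % P) for (i, ya), (_, yb) in zip(shares_a, shares_b)]
```

With t = 2 and n = 7, any 2t + 1 = 5 of the local products suffice to interpolate the product of the two secrets, while t + 1 = 3 shares suffice for each original secret.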
TL;DR: The carry-less multiplication instruction in the latest Intel desktop processors significantly accelerates multiplication in binary fields and presents the opportunity for reevaluating algorithms for binary field arithmetic and scalar multiplication over elliptic curves.
Abstract: The availability of a new carry-less multiplication instruction in the latest Intel desktop processors significantly accelerates multiplication in binary fields and hence presents the opportunity for reevaluating algorithms for binary field arithmetic and scalar multiplication over elliptic curves. We describe how to best employ this instruction in field multiplication and the effect on performance of doubling and halving operations. Alternate strategies for implementing inversion and half-trace are examined to restore most of their competitiveness relative to the new multiplier. These improvements in field arithmetic are complemented by a study on serial and parallel approaches for Koblitz and random curves, where parallelization strategies are implemented and compared. The contributions are illustrated with experimental results improving the state-of-the-art performance of halving and doubling-based scalar multiplication on NIST curves at the 112- and 192-bit security levels, and a new speed record for side-channel resistant scalar multiplication in a random curve at the 128-bit security level.
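To show what a carry-less multiplication computes, the sketch below multiplies two binary polynomials bit by bit, using XOR in place of addition, and then reduces the result modulo a field polynomial. It is a software model of the operation, not the instruction-level code of the paper. The small field GF(2^8) with polynomial 0x11B is used purely for illustration (the curves in the paper use much larger binary fields), and the function names are assumptions.

```python
def clmul(a, b):
    """Carry-less product of two non-negative integers viewed as GF(2)[x] polynomials."""
    result = 0
    while b:
        if b & 1:
            result ^= a  # polynomial addition over GF(2) is XOR, so no carries occur
        a <<= 1
        b >>= 1
    return result

def gf2_reduce(x, poly):
    """Reduce polynomial x modulo `poly` (both encoded as bit masks)."""
    deg = poly.bit_length() - 1
    while x.bit_length() - 1 >= deg:
        # Cancel the current leading term of x with a shifted copy of poly.
        x ^= poly << (x.bit_length() - 1 - deg)
    return x

def gf2_field_mul(a, b, poly):
    """Multiply in the binary field GF(2)[x] / (poly)."""
    return gf2_reduce(clmul(a, b), poly)
```

For example, `clmul(0b11, 0b11)` gives `0b101`, since (x + 1)^2 = x^2 + 1 over GF(2).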
TL;DR: The border rank of the matrix multiplication operator for n by n matrices is a standard measure of its complexity and techniques from algebraic geometry and representation theory are used to show the border rank is at least 2n^2-n.
Abstract: The border rank of the matrix multiplication operator for n by n matrices is a standard measure of its complexity. Using techniques from algebraic geometry and representation theory, we show the border rank is at least 2n^2 - n. Our bounds are better than the previous lower bound (due to Lickteig in 1985) of (3/2)n^2 + n/2 - 1 for all n > 2. The bounds are obtained by finding new equations that bilinear maps of small border rank must satisfy, i.e., new equations for secant varieties of triple Segre products, that matrix multiplication fails to satisfy.
TL;DR: The notion of Lehrer-concave integral is generalized taking instead of the usual arithmetical operations of addition and multiplication of reals more general real operations called pseudo-addition and pseudo-multiplication.
Abstract: The notion of Lehrer-concave integral is generalized taking instead of the usual arithmetical operations of addition and multiplication of reals more general real operations called pseudo-addition and pseudo-multiplication.
TL;DR: An accelerated Schoof-type point-counting algorithm for curves of genus 2 equipped with an efficiently computable real multiplication endomorphism and a 256-bit prime-order Jacobian, suitable for cryptographic applications are presented.
Abstract: We present an accelerated Schoof-type point-counting algorithm for curves of genus 2 equipped with an efficiently computable real multiplication endomorphism. Our new algorithm reduces the complexity of genus 2 point counting over a finite field Fq of large characteristic from Õ(log^8 q) to Õ(log^5 q). Using our algorithm we compute a 256-bit prime-order Jacobian, suitable for cryptographic applications, and also the order of a 1024-bit Jacobian.
TL;DR: The developed multiplier architecture is based on vertical and crosswise structure of Ancient Indian Vedic Mathematics and nearly reaches a saturation level in its efficiency at 4×4 decomposition.
Abstract: In this paper, a high-performance, high-throughput and area-efficient architecture of a multiplier for Field Programmable Gate Arrays (FPGAs) is proposed. The most significant aspect of the proposed method is that the developed multiplier architecture is based on the vertical and crosswise structure of Ancient Indian Vedic Mathematics. In the proposed architecture, for two 8-bit numbers, the multiplier and the multiplicand are each grouped as 4-bit numbers, so that the multiplication decomposes into 4×4 multiplication modules. It is also illustrated that further hierarchical decomposition of the 4×4 modules into 2×2 modules does not significantly improve the multiplier efficiency; in other words, multiplier decomposition nearly reaches a saturation level in its efficiency at 4×4 decomposition. The coding is done in VHDL (Very High Speed Integrated Circuits Hardware Description Language) and the FPGA synthesis is done using the Xilinx library.
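The decomposition described in the abstract above can be modelled arithmetically as follows. This is a behavioural sketch in Python, not the VHDL design of the paper; `mul4x4` stands in for a hardware 4×4 multiplier module, and both function names are assumptions for this example.

```python
def mul4x4(a, b):
    """Stand-in for a 4-bit by 4-bit multiplier module."""
    assert 0 <= a < 16 and 0 <= b < 16
    return a * b

def mul8x8(a, b):
    """Multiply two 8-bit numbers from four 4x4 products (vertical and crosswise)."""
    assert 0 <= a < 256 and 0 <= b < 256
    a_hi, a_lo = a >> 4, a & 0xF
    b_hi, b_lo = b >> 4, b & 0xF
    vertical_lo = mul4x4(a_lo, b_lo)                     # low halves, "vertical"
    crosswise = mul4x4(a_hi, b_lo) + mul4x4(a_lo, b_hi)  # mixed halves, "crosswise"
    vertical_hi = mul4x4(a_hi, b_hi)                     # high halves, "vertical"
    # Align the three terms by their weights 2^0, 2^4 and 2^8, then add.
    return vertical_lo + (crosswise << 4) + (vertical_hi << 8)
```

The same split could be applied again inside `mul4x4` to obtain 2×2 modules, which is the further decomposition the abstract reports as bringing no significant efficiency gain.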
TL;DR: This work shows that one such calculus, a language that takes the GHZ and W states as its basic generators, allows one to encode standard rational calculus, with the GHZ state as multiplication, the W state as addition, and the Pauli Z gate as additive inversion.
Abstract: Graphical calculi for representing interacting quantum systems serve a number of purposes: compositionality, intuitive graphical reasoning, and a logical underpinning for automation. The power of these calculi stems from the fact that they embody generalized symmetries of the structure of quantum operations, which, for example, stretch well beyond the Choi-Jamiolkowski isomorphism. One such calculus takes the GHZ and W states as its basic generators. Here we show that this language allows one to encode standard rational calculus, with the GHZ state as multiplication, the W state as addition, the Pauli X gate as multiplicative inversion, and the Pauli Z gate as additive inversion.
TL;DR: Two new algorithms (Three-Four split and Four-Three split) based on the principle of splitting the binary partial product into two parts and computing the contributions of the two parts to the partial BCD result in parallel are proposed.
Abstract: Decimal arithmetic has received considerable attention recently due to its suitability for many financial and commercial applications. In particular, numerous algorithms have been recently proposed for decimal multiplication. A major approach to decimal multiplication shaped by these proposals is based on performing the decimal digit-by-digit multiplication in binary, converting the binary partial product back to decimal, and then adding the decimal partial products as appropriate to form the final product in decimal. With this approach, the efficiency of binary-to-BCD partial product conversion is critical for the efficiency of the overall multiplication process. A recently proposed algorithm for this conversion is based on splitting the binary partial product into two parts (i.e., two groups of bits), and then computing the contributions of the two parts to the partial BCD result in parallel. This paper proposes two new algorithms (Three-Four split and Four-Three split) based on this principle. We present our proposed architectures that implement these algorithms and compare them to existing algorithms. The synthesis results show that the Three-Four split algorithm runs 15% faster and occupies 26.1% less area than the best performing equivalent circuit found in the literature. Furthermore, the Four-Three split algorithm occupies 37.5% less area than the state of the art equivalent circuit.
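The splitting principle described in the abstract above can be illustrated arithmetically. The product of two decimal digits is at most 81 and fits in 7 bits; the sketch below assumes that "Three-Four split" means the upper 3 bits and the lower 4 bits, computes the BCD contribution of each part separately, and then combines them. The paper's actual circuits are not reproduced, and the function name is an assumption for this example.

```python
def bcd_three_four_split(p):
    """Convert a 7-bit digit product p (0..81) into (tens, units) BCD digits."""
    assert 0 <= p <= 81
    hi, lo = p >> 4, p & 0xF  # upper 3 bits (weight 16) and lower 4 bits
    # The two contributions are independent, so hardware could form them in parallel.
    hi_tens, hi_units = divmod(16 * hi, 10)
    lo_tens, lo_units = divmod(lo, 10)
    units = hi_units + lo_units
    tens = hi_tens + lo_tens
    if units >= 10:  # units is at most 8 + 9 = 17, so one correction is enough
        units -= 10
        tens += 1
    return tens, units
```

For example, 9 × 9 = 81 is `0b1010001`, which splits into hi = 5 (contributing 80) and lo = 1, giving the BCD digits (8, 1).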