Top 327 papers published in the topic of Multiplication in 2007

Showing papers on "Multiplication" published in 2007

Journal Article•10.1109/LSP.2006.882088•

Multiplication-Free One-Bit Transform for Low-Complexity Block-Based Motion Estimation

Sarp Erturk¹•Institutions (1)

15 Jan 2007-IEEE Signal Processing Letters

TL;DR: In this article, a multiplication-free one-bit transform (1BT) for low-complexity block-based motion estimation is presented, which can be implemented in integer arithmetic using addition and shifts only, reducing the computational complexity, processing time, and power consumption.

Abstract: A multiplication-free one-bit transform (1BT) for low-complexity block-based motion estimation is presented in this letter. A novel filter kernel is utilized to construct the 1BT of image frames using addition and shift operations only. It is shown that the proposed approach provides the same motion estimation accuracy at macro-block level and even better accuracy for smaller block sizes compared to previously proposed 1BT methods. Because the proposed 1BT approach does not require multiplication operations, it can be implemented in integer arithmetic using addition and shifts only, reducing the computational complexity, processing time, and power consumption.

142 citations

Journal Article•10.1109/TCSII.2007.903212•

Lower Bounds for Constant Multiplication Problems

Oscar Gustafsson¹•Institutions (1)

Linköping University¹

12 Nov 2007-IEEE Transactions on Circuits and Systems II: Express Briefs

TL;DR: Lower bounds for problems related to realizing multiplication by constants with shifts, adders, and subtracters are presented and have applications in proving the optimality of solutions obtained by heuristics.

Abstract: Lower bounds for problems related to realizing multiplication by constants with shifts, adders, and subtracters are presented. These lower bounds are straightforwardly calculated and have applications in proving the optimality of solutions obtained by heuristics.

127 citations
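
A minimal Python sketch of what realizing a constant multiplication with shifts, adders, and subtracters can look like, assuming the non-adjacent form (NAF) of the constant as the recoding. This is a generic textbook construction, not the method or the bounds from the paper above; the names `naf_digits` and `multiply_by_constant` are hypothetical. The adder count it reports is only an upper bound on the cost for a given constant.

```python
def naf_digits(c):
    """Non-adjacent form of a positive integer c.

    Returns digits in {-1, 0, 1}, least significant first.
    """
    digits = []
    while c > 0:
        if c & 1:
            d = 2 - (c & 3)  # +1 if c % 4 == 1, -1 if c % 4 == 3
            c -= d
        else:
            d = 0
        digits.append(d)
        c >>= 1
    return digits


def multiply_by_constant(x, c):
    """Compute c * x (c >= 1) with shifts, additions, and subtractions only.

    Returns (product, number_of_adders_and_subtracters_used).
    """
    nonzero = [(s, d) for s, d in enumerate(naf_digits(c)) if d != 0]
    top_shift, _ = nonzero[-1]          # most significant NAF digit is +1
    acc = x << top_shift
    for shift, d in reversed(nonzero[:-1]):
        term = x << shift
        acc = acc + term if d == 1 else acc - term
    return acc, len(nonzero) - 1


product, adders = multiply_by_constant(3, 45)
```

For the constant 45 the NAF recoding uses three adders/subtracters, while the factorization 45 = 5 × 9 needs only two (`y = (x << 2) + x`, then `(y << 3) + y`). Gaps of this kind are where a lower bound helps decide whether a heuristic solution is already optimal.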

Book Chapter•10.1007/978-3-540-71316-6_21•

Precise fixpoint computation through strategy iteration

Thomas Martin Gawlitza¹, Helmut Seidl¹•Institutions (1)

Technische Universität München¹

24 Mar 2007

TL;DR: A practical algorithm for computing least solutions of systems of equations over the integers with addition, multiplication with positive constants, maximum and minimum, based on strategy iteration is presented.

Abstract: We present a practical algorithm for computing least solutions of systems of equations over the integers with addition, multiplication with positive constants, maximum and minimum. The algorithm is based on strategy iteration. Its run-time (w.r.t. the uniform cost measure) is independent of the sizes of occurring numbers. We apply our technique to solve systems of interval equations. In particular, we show how arbitrary intersections as well as full interval multiplication in interval equations can be dealt with precisely.

100 citations

Proceedings Article•10.1109/FPT.2007.4439248•

Efficient and High-Throughput Implementations of AES-GCM on FPGAs

Gang Zhou, H. Michalik, L. Hinsenkamp

1 Dec 2007

TL;DR: By optimizing and balancing the critical delay of sub-components, two high-performance GCM implementations are presented on Xilinx Virtex-4 devices; the resource estimates from a complexity analysis based on FPGA primitives provide a good criterion to minimize the influence of technology mapping.

Abstract: This paper addresses efficient and high-throughput implementations of AES-GCM optimized for FPGAs. Two main components, the AES engine and the modular multiplication over GF(2^m), are discussed and their complexities on FPGAs are shown. Instead of discussing the complexities by using AND and XOR gates as primitives, we present the complexity analysis directly based on FPGA primitives, e.g., Look-Up-Tables (LUTs). For the modular multiplier, the straightforward multiplication is used to get a speed-efficient design while Karatsuba's algorithm is used to get an area-efficient design. For the AES engine, the composite field approach is adopted and then inner-round pipelining technology is applied. The estimated resource consumption returned by the complexity analysis provides a good criterion to minimize the influence of technology mapping. By optimizing and balancing the critical delay of sub-components, two high performance GCM implementations are presented on Xilinx Virtex-4 devices.

86 citations
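
The trade-off between straightforward and Karatsuba multiplication mentioned in the abstract above can be sketched in software. The following Python example is an illustration only, assuming binary polynomials encoded as integers (bit i is the coefficient of x^i). It is not the FPGA design from the paper, and it omits the reduction modulo the field polynomial that a GF(2^m) multiplier also needs. The function names and the `threshold` parameter are hypothetical.

```python
def clmul_schoolbook(a, b):
    """Carry-less product of two binary polynomials (shift-and-XOR)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def clmul_karatsuba(a, b, bits, threshold=8):
    """Karatsuba product of binary polynomials of degree < bits.

    Three half-size products replace the four of the schoolbook split;
    over GF(2), addition and subtraction are both XOR.
    """
    if bits <= threshold:
        return clmul_schoolbook(a, b)
    half = (bits + 1) // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half
    lo = clmul_karatsuba(a_lo, b_lo, half, threshold)
    hi = clmul_karatsuba(a_hi, b_hi, half, threshold)
    mid = clmul_karatsuba(a_lo ^ a_hi, b_lo ^ b_hi, half, threshold)
    mid ^= lo ^ hi                      # leaves a_lo*b_hi + a_hi*b_lo
    return lo ^ (mid << half) ^ (hi << (2 * half))


a = 0x0123456789ABCDEFFEDCBA9876543210
b = 0xF0E1D2C3B4A5968778695A4B3C2D1E0F
product = clmul_karatsuba(a, b, 128)
```

Both functions return the same unreduced product. In hardware terms the recursion trades fewer partial products for extra XOR stages, which is one plausible reading of why it suits an area-efficient rather than a speed-efficient design.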

Journal Article•10.1109/TPDS.2007.1068•

High-Performance Reduction Circuits Using Deeply Pipelined Operators on FPGAs

Ling Zhuo¹, Gerald R. Morris, Viktor K. Prasanna¹•Institutions (1)

University of Southern California¹

01 Oct 2007-IEEE Transactions on Parallel and Distributed Systems

TL;DR: This paper identifies two basic methods for designing serial reduction circuits: the tree-traversal method and the striding method and proposes high-performance and area-efficient designs using each method.

Abstract: Field-programmable gate arrays (FPGAs) have become an attractive option for accelerating scientific applications. Many scientific operations such as matrix-vector multiplication and dot product involve the reduction of a sequentially produced stream of values. Unfortunately, because of the pipelining in FPGA-based floating-point units, data hazards may occur during these sequential reduction operations. Improperly designed reduction circuits can adversely impact the performance, impose unrealistic buffer requirements, and consume a significant portion of the FPGA. In this paper, we identify two basic methods for designing serial reduction circuits: the tree-traversal method and the striding method. Using accumulation as an example, we analyze the design trade-offs among the number of adders, buffer size, and latency. We then propose high-performance and area-efficient designs using each method. The proposed designs reduce multiple sets of sequentially delivered floating-point values without stalling the pipeline or imposing unrealistic buffer requirements. Using a Xilinx Virtex-II Pro FPGA as the target device, we implemented our designs and present performance and area results.

81 citations

Book Chapter•10.1007/978-3-540-74735-2_19•

How to Maximize the Potential of FPGA Resources for Modular Exponentiation

Daisuke Suzuki¹•Institutions (1)

Mitsubishi Electric¹

10 Sep 2007

TL;DR: A circuit architecture is proposed that handles multiple data lengths using the same circuits and improves the Montgomery multiplication algorithm in order to maximize the performance of the multiplication unit in the FPGA.

Abstract: This paper describes a modular exponentiation processing method and circuit architecture that can exhibit the maximum performance of FPGA resources. The modular exponentiation architecture proposed by us comprises three main techniques. The first technique is to improve the Montgomery multiplication algorithm in order to maximize the performance of the multiplication unit in FPGA. The second technique is to improve and balance the circuit delay. The third technique is to ensure and make fast the scalability of the effective FPGA resource. We propose a circuit architecture that can handle multiple data lengths using the same circuits. In addition, our architecture can perform fast operations using small-scale resources; in particular, it can complete 512-bit modular exponentiation in 0.26 ms by means of XC4VF12-10SF363, which is the minimum logic resources in the Virtex-4 Series FPGAs. Also, the number of SLICEs used is approx. 4000 to make a very compact design. Moreover, 1024-, 1536- and 2048-bit modular exponentiations can be processed in the same circuit with the scalability.

80 citations
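
For orientation, here is a small Python sketch of textbook Montgomery multiplication and a square-and-multiply exponentiation built on it. It assumes an odd modulus `n` and R = 2^k with k equal to the bit length of `n`. It does not reproduce the improved algorithm or the circuit architecture of the paper above, and all function names are hypothetical.

```python
def montgomery_setup(n, k):
    """Return n' = -n^(-1) mod 2^k for an odd modulus n < 2^k."""
    r = 1 << k
    return (-pow(n, -1, r)) % r         # modular inverse needs Python 3.8+


def montgomery_multiply(a_bar, b_bar, n, k, n_prime):
    """Return a_bar * b_bar * R^(-1) mod n for inputs below n, R = 2^k."""
    r_mask = (1 << k) - 1
    t = a_bar * b_bar
    m = ((t & r_mask) * n_prime) & r_mask
    u = (t + m * n) >> k                # exact: t + m*n is divisible by R
    return u - n if u >= n else u


def montgomery_pow(base, exponent, n):
    """Left-to-right square-and-multiply in the Montgomery domain."""
    k = n.bit_length()
    n_prime = montgomery_setup(n, k)
    r = 1 << k
    x_bar = (base * r) % n              # Montgomery form of the base
    acc = r % n                         # Montgomery form of 1
    for bit in bin(exponent)[2:]:
        acc = montgomery_multiply(acc, acc, n, k, n_prime)
        if bit == "1":
            acc = montgomery_multiply(acc, x_bar, n, k, n_prime)
    return montgomery_multiply(acc, 1, n, k, n_prime)  # leave the domain


result = montgomery_pow(7, 65537, 1000003)
```

The division by R is a shift and the reduction modulo R is a mask, so the inner loop consists of multiplications and additions only. That property is what makes the method attractive for hardware multiplier blocks.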

Patent•

Optimized FFT/IFFT module

Maher Amer¹•Institutions (1)

Wilmington University¹

26 Dec 2007

TL;DR: An optimal hardware implementation of the FFT/IFFT operation is disclosed that minimizes the number of clock cycles required to compute the FFT/IFFT while at the same time minimizing the number of complex multipliers needed.

Abstract: We disclose an optimal hardware implementation of the FFT/IFFT operation that minimizes the number of clock cycles required to compute the FFT/IFFT while at the same time minimizing the number of complex multipliers needed. An input module combines a plurality of inputs after applying a multiplication factor to each of the inputs. At least one multiplicand generator generates multiplicands. At least two complex multiplier modules perform complex multiplications with at least one of the complex multiplier modules receiving an output from the input module. A map module receives outputs of the at least two complex multiplier modules, the map module selecting and applying a multiplication factor to each of the outputs received to generate multiple outputs. Finally, an accumulation module receives and performs an accumulation task on each of the multiple outputs of the map module thereby generating a corresponding number of multiple outputs.

70 citations

Patent•10.20381/RUOR-12098•

Accelerating scalar multiplication on elliptic curve cryptosystems over prime fields

Patrick Longa¹, Ali Miri¹•Institutions (1)

University of Ottawa¹

14 Sep 2007-IACR Cryptology ePrint Archive

TL;DR: A method and apparatus for accelerating scalar multiplication in an elliptic curve cryptosystem (ECC) over prime fields is provided in this article, where multiplication operations within an ECC point operation are identified and modified utilizing an equivalent point representation that inserts multiples of two.

Abstract: A method and apparatus for accelerating scalar multiplication in an elliptic curve cryptosystem (ECC) over prime fields is provided. Multiplication operations within an ECC point operation are identified and modified utilizing an equivalent point representation that inserts multiples of two. Algebraic substitutions of the multiplication operations with squaring operations and other cheaper field operations are performed. Scalar multiplication can also be protected against simple side-channel attacks by balancing the number of multiplication operations and squaring operations and providing novel atomic structures to implement the ECC operation. In addition, a new coordinate system is defined to enable more effective operation of ECC in multiprocessor environments.

67 citations

Journal Article•10.1364/OL.32.000716•

Tunable pulse repetition-rate multiplication using phase-only line-by-line pulse shaping

José Caraquitena¹, Zhi Jiang¹, Daniel E. Leaird¹, Andrew M. Weiner¹•Institutions (1)

Purdue University¹

15 Mar 2007-Optics Letters

TL;DR: A method for all-optical, tunable pulse repetition-rate multiplication of a mode-locked laser based on spectral line-by-line control is demonstrated with very high fidelity.

Abstract: We demonstrate a method for all-optical, tunable pulse repetition-rate multiplication of a mode-locked laser based on spectral line-by-line control. In particular, two-to-five-times repetition-rate multiplication of a 9 GHz source is achieved with very high fidelity.

65 citations

Journal Article•10.3758/BF03193508•

Do multiplication and division strategies rely on executive and phonological working memory resources?

Ineke Imbo¹, André Vandierendonck¹•Institutions (1)

Ghent University¹

01 Oct 2007-Memory & Cognition

TL;DR: The role of executive and phonological working memory resources in simple arithmetic was investigated in two experiments, in which participants solved simple multiplication problems (e.g., 4×8) or simple division problems (e.g., 42÷7) under no-load, phonological-load, and executive-load conditions.

Abstract: The role of executive and phonological working memory resources in simple arithmetic was investigated in two experiments. Participants had to solve simple multiplication problems (e.g., 4×8; Experiment 1) or simple division problems (e.g., 42÷7; Experiment 2) under no-load, phonological-load, and executive-load conditions. The choice/no-choice method was used to investigate strategy execution and strategy selection independently. Results for strategy execution showed that executive working memory resources were involved in direct memory retrieval of both multiplication and division facts. Executive working memory resources were also involved in the use of nonretrieval strategies. Phonological working memory resources, on the other hand, tended to be involved in nonretrieval strategies only. Results for strategy selection showed no effects of working memory load. Finally, correlation analyses showed that both strategy execution and strategy selection correlated with individual-difference variables, such as gender, math anxiety, associative strength, calculator use, arithmetic skill, and math experience.

63 citations

Journal Article•10.1109/TC.2007.1076•

Subquadratic Computational Complexity Schemes for Extended Binary Field Multiplication Using Optimal Normal Bases

Haining Fan¹, M. Anwar Hasan²•Institutions (2)

Tsinghua University¹, University of Waterloo²

01 Oct 2007-IEEE Transactions on Computers

TL;DR: Based on a recently proposed Toeplitz matrix-vector product approach, a subquadratic computational complexity scheme is presented for multiplications in binary extended finite fields using type I and II optimal normal bases.

Abstract: Based on a recently proposed Toeplitz matrix-vector product approach, a subquadratic computational complexity scheme is presented for multiplications in binary extended finite fields using type I and II optimal normal bases.

Proceedings Article•10.1109/MEMCOD.2007.371239•

Hardware Acceleration of Matrix Multiplication on a Xilinx FPGA

Nirav Dave¹, Kermin Fleming¹, Myron King¹, Michael Pellauer¹, Muralidaran Vijayaraghavan¹, +1 more•Institutions (1)

Massachusetts Institute of Technology¹

30 May 2007

TL;DR: The first MEMOCODE hardware/software co-design contest posed the following problem: optimize matrix-matrix multiplication in such a way that it is split between the FPGA and PowerPC on a Xilinx Virtex IIPro30. The solution discussed here was implemented on a Xilinx XUP development board with 256 MB of DRAM.

Abstract: The first MEMOCODE hardware/software co-design contest posed the following problem: optimize matrix-matrix multiplication in such a way that it is split between the FPGA and PowerPC on a Xilinx Virtex IIPro30. In this paper we discuss our solution, which we implemented on a Xilinx XUP development board with 256 MB of DRAM. The design was done by the five authors over a span of approximately 3 weeks, though of the 15 possible man-weeks, about 9 were actually spent working on this problem. All hardware design was done using Bluespec SystemVerilog (BSV), with the exception of an imported Verilog multiplication unit, necessary only due to the limitations of the Xilinx FPGA toolflow optimizations.

Posted Content•

Optimizing Multiprecision Multiplication for Public Key Cryptography

Michael Scott¹, Piotr Szczechowiak¹•Institutions (1)

Dublin City University¹

01 Jan 2007-IACR Cryptology ePrint Archive

TL;DR: In this paper, the authors recall the hybrid method of Gura et al. for multi-precision multiplication, which exploits the increased number of registers available on modern architectures in order to avoid duplicated loads from memory.

Abstract: In this paper we recall the hybrid method of Gura et al. for multi-precision multiplication [4] which is an improvement on the basic Comba method and which exploits the increased number of registers available on modern architectures in order to avoid duplicated loads from memory. We then show how to improve and generalise the method for application across a wide range of processor types, setting some new records in the process.
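
As background for the abstract above, the basic Comba (product-scanning) method can be sketched in Python as follows. This is only the baseline that the hybrid method improves on, not the hybrid method itself. The 32-bit word size, the little-endian word lists, and the helper names `to_words` and `from_words` are assumptions made for the example.

```python
def to_words(x, count, word_bits=32):
    """Split a non-negative integer into `count` little-endian words."""
    mask = (1 << word_bits) - 1
    return [(x >> (word_bits * i)) & mask for i in range(count)]


def from_words(words, word_bits=32):
    """Reassemble an integer from little-endian words."""
    return sum(w << (word_bits * i) for i, w in enumerate(words))


def comba_multiply(a, b, word_bits=32):
    """Product scanning: finish one result word (column) at a time.

    a and b are non-empty little-endian word lists; the result has
    len(a) + len(b) words.
    """
    mask = (1 << word_bits) - 1
    n, m = len(a), len(b)
    result = [0] * (n + m)
    acc = 0                              # running column sum plus carry
    for k in range(n + m - 1):
        for i in range(max(0, k - m + 1), min(k, n - 1) + 1):
            acc += a[i] * b[k - i]       # all partial products of column k
        result[k] = acc & mask           # store each result word once
        acc >>= word_bits                # carry into the next column
    result[n + m - 1] = acc
    return result


x = 0x0123456789ABCDEFFEDCBA9876543210
y = 0xFFFFFFFFFFFFFFFF0000000000000001
product = from_words(comba_multiply(to_words(x, 4), to_words(y, 4)))
```

Each result word is written exactly once, but the operand words are reloaded for every column. Avoiding those duplicated loads is the issue the abstract says the hybrid method addresses by keeping more operands in registers.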

Journal Article•10.1109/TVLSI.2007.893659•

On Concurrent Detection of Errors in Polynomial Basis Multiplication

Siavash Bayat-Sarmadi¹, M.A. Hasan¹•Institutions (1)

University of Waterloo¹

01 Apr 2007-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: Experimental results presented here show that due to an increase in the number of parity bits, the area overhead tends to increase linearly, but the probability of error detection approaches unity fairly quickly, e.g., for eight parity bits.

Abstract: The detection of errors in arithmetic operations is an important issue. This paper discusses the detection of multiple-bit errors due to faults in bit-serial and bit-parallel polynomial basis (PB) multipliers over binary extension fields. Our approach is based on multiple parity bits. Experimental results presented here show that due to an increase in the number of parity bits, the area overhead tends to increase linearly, but the probability of error detection approaches unity fairly quickly, e.g., for eight parity bits. In bit-serial implementation of a GF(2^163) PB multiplier using eight parity bits, the area overhead and the probability of error detection are 10.29% and 0.996, respectively. This is achieved without any increase in the computation time of the GF(2^163) PB multiplier.

Proceedings Article•10.1109/ARITH.2007.18•

Fast Modular Reduction

William C. Hasenplaugh¹, Gunnar Gaubatz¹, Vinodh Gopal¹•Institutions (1)

Intel¹

25 Jun 2007

TL;DR: This paper proposes a modification to Barrett's algorithm that leads to a significant reduction (25% to 75%) in multiplications and additions.

Abstract: It is widely acknowledged that efficient modular multiplication is a key to high-performance implementation of public-key cryptography, be it classical RSA, Diffie-Hellman, or (hyper-) elliptic curve algorithms. In the recent decade, practitioners have relied mainly on two popular methods: Montgomery Multiplication and regular long-integer multiplication in combination with Barrett's modular reduction technique. In this paper, we propose a modification to Barrett's algorithm that leads to a significant reduction (25% to 75%) in multiplications and additions.
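
The classical form of Barrett's reduction, which the abstract above takes as its starting point, can be sketched in Python as below. This is the standard textbook version, not the modified algorithm proposed in the paper, and the function names are hypothetical. It assumes 0 <= x < n^2, as is the case for a product of two reduced operands.

```python
def barrett_setup(n):
    """Precompute k = bit length of n and mu = floor(4^k / n)."""
    k = n.bit_length()
    mu = (1 << (2 * k)) // n            # one division, done once per modulus
    return k, mu


def barrett_reduce(x, n, k, mu):
    """Return x mod n for 0 <= x < n*n using multiplications and shifts."""
    q = (x * mu) >> (2 * k)             # estimate of x // n, never too large
    r = x - q * n
    while r >= n:                       # a few corrective subtractions
        r -= n
    return r


n = 1000003
k, mu = barrett_setup(n)
remainder = barrett_reduce(123456789012, n, k, mu)
```

The reduction costs two multiplications (by `mu` and by `n`) plus shifts and subtractions, which is the kind of operation count the paper's 25% to 75% figure refers to reducing.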

Proceedings Article•10.1145/1277548.1277585•

Fast arithmetic for triangular sets: from theory to practice

Xin Li¹, Marc Moreno Maza¹, Éric Schost¹•Institutions (1)

University of Western Ontario¹

29 Jul 2007

TL;DR: Arithmetic operations for triangular families of polynomials are studied; a suitable extension of fast univariate Euclidean division yields theoretical and practical improvements over a direct recursive approach and reaches quasi-linear complexity for a family of special cases.

Abstract: We study arithmetic operations for triangular families of polynomials, concentrating on multiplication in dimension zero. By a suitable extension of fast univariate Euclidean division, we obtain theoretical and practical improvements over a direct recursive approach; for a family of special cases, we reach quasi-linear complexity. The main outcome we have in mind is the acceleration of higher-level algorithms, by interfacing our low-level implementation with languages such as AXIOM or Maple. We show the potential for huge speed-ups, by comparing two AXIOM implementations of van Hoeij and Monagan's modular GCD algorithm.

Showing papers on "Multiplication" published in 2007

Comments on "Five, Six, and Seven-Term Karatsuba-Like Formulae"

Optimizing double-base elliptic-curve single-scalar multiplication

Multiplication by a Constant is Sublinear

Software Implementation of Arithmetic in

The row–column multiplication of high dimensional rhotrices

Elliptic curve scalar multiplication algorithm using complementary recoding

How to Write Fast Numerical Code: A Small Introduction

Explicit formulas for efficient multiplication in F_{3^{6m}}

Scale multiplication in odd Gabor transform domain for edge detection

A Low-Power Unified Arithmetic Unit for Programmable Handheld 3-D Graphics Systems

Interpolation of depth-3 arithmetic circuits with two multiplication gates

SPA resistant Elliptic Curve Cryptosystem using Addition Chains

Cayley-Dickson algebras and loops

Assessing Students' Levels of Understanding Multiplication through Problem Writing