Top 339 papers published in the topic of Multiplication in 2009

Showing papers on "Multiplication" published in 2009

Posted Content•

Fully Homomorphic Encryption over the Integers.

Marten van Dijk¹, Craig Gentry², Shai Halevi², Vinod Vaikuntanathan²•Institutions (2)

Massachusetts Institute of Technology¹, IBM²

01 Jan 2009-IACR Cryptology ePrint Archive

TL;DR: This paper describes a somewhat homomorphic encryption scheme that uses only elementary modular arithmetic; the main appeal of the approach is its conceptual simplicity. The security of the scheme is reduced to finding an approximate integer gcd, i.e., given a list of integers that are near-multiples of a hidden integer, output that hidden integer.

Abstract: We describe a very simple “somewhat homomorphic” encryption scheme using only elementary modular arithmetic, and use Gentry’s techniques to convert it into a fully homomorphic scheme. Compared to Gentry’s construction, our somewhat homomorphic scheme merely uses addition and multiplication over the integers rather than working with ideal lattices over a polynomial ring. The main appeal of our approach is the conceptual simplicity. We reduce the security of our somewhat homomorphic scheme to finding an approximate integer gcd – i.e., given a list of integers that are near-multiples of a hidden integer, output that hidden integer. We investigate the hardness of this task, building on earlier work of Howgrave-Graham.
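The idea of hiding a bit in a near-multiple of a secret integer can be illustrated with a toy sketch. The function names, parameter sizes, and the use of non-negative noise are assumptions made for this illustration; the parameters are far too small to be secure and are not the ones analysed in the paper.

```python
import random

def keygen(bits=32):
    # Secret key: a random odd integer p with its top bit set (toy size, NOT secure).
    return random.getrandbits(bits) | (1 << (bits - 1)) | 1

def encrypt(p, m, noise_bits=4, q_bits=64):
    # The ciphertext is a near-multiple of p; the small offset 2*r + m carries the bit m.
    q = random.getrandbits(q_bits)
    r = random.getrandbits(noise_bits)
    return p * q + 2 * r + m

def decrypt(p, c):
    # Reducing mod p leaves the noise term 2*r + m; its parity is the plaintext bit.
    return (c % p) % 2

p = keygen()
c0, c1 = encrypt(p, 0), encrypt(p, 1)
xor_bit = decrypt(p, c0 + c1)   # integer addition acts as XOR on the hidden bits
and_bit = decrypt(p, c0 * c1)   # integer multiplication acts as AND on the hidden bits
```

Decryption stays correct only while the accumulated noise is smaller than p, which is why such a scheme is "somewhat" rather than fully homomorphic: every multiplication roughly squares the noise.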

1,297 citations

Proceedings Article•10.1145/1654059.1654078•

Implementing sparse matrix-vector multiplication on throughput-oriented processors

Nathan Bell¹, Michael Garland¹•Institutions (1)

Nvidia¹

14 Nov 2009

TL;DR: This work explores SpMV methods that are well-suited to throughput-oriented architectures like the GPU and which exploit several common sparsity classes, including structured grid and unstructured mesh matrices.

Abstract: Sparse matrix-vector multiplication (SpMV) is of singular importance in sparse linear algebra. In contrast to the uniform regularity of dense linear algebra, sparse operations encounter a broad spectrum of matrices ranging from the regular to the highly irregular. Harnessing the tremendous potential of throughput-oriented processors for sparse operations requires that we expose substantial fine-grained parallelism and impose sufficient regularity on execution paths and memory access patterns. We explore SpMV methods that are well-suited to throughput-oriented architectures like the GPU and which exploit several common sparsity classes. The techniques we propose are efficient, successfully utilizing large percentages of peak bandwidth. Furthermore, they deliver excellent total throughput, averaging 16 GFLOP/s and 10 GFLOP/s in double precision for structured grid and unstructured mesh matrices, respectively, on a GeForce GTX 285. This is roughly 2.8 times the throughput previously achieved on Cell BE and more than 10 times that of a quad-core Intel Clovertown system.

1,030 citations

Book•

Hardware Implementation of Finite-Field Arithmetic

Jean-Pierre Deschamps

14 Jan 2009

TL;DR: Hardware Implementation of Finite-Field Arithmetic describes algorithms and circuits for executing finite-field operations, including addition, subtraction, multiplication, squaring, exponentiation, and division.

Abstract: Implement Finite-Field Arithmetic in Specific Hardware (FPGA and ASIC). Master cutting-edge electronic circuit synthesis and design with help from this detailed guide. Hardware Implementation of Finite-Field Arithmetic describes algorithms and circuits for executing finite-field operations, including addition, subtraction, multiplication, squaring, exponentiation, and division. This comprehensive resource begins with an overview of mathematics, covering algebra, number theory, finite fields, and cryptography. The book then presents algorithms which can be executed and verified with actual input data. Logic schemes and VHDL models are described in such a way that the corresponding circuits can be easily simulated and synthesized. The book concludes with a real-world example of a finite-field application: elliptic-curve cryptography. This is an essential guide for hardware engineers involved in the development of embedded systems.

Get detailed coverage of:
Modulo m reduction
Modulo m addition, subtraction, multiplication, and exponentiation
Operations over GF(p) and GF(p^m)
Operations over the commutative ring Zp[x]/f(x)
Operations over the binary field GF(2^m) using normal, polynomial, dual, and triangular bases

Table of contents:
Chapter 1. Mathematical background
Chapter 2. Mod m reduction
Chapter 3. Mod m operations
Chapter 4. Operations over GF(p)
Chapter 5. Operations over Zp[x]/f(x)
Chapter 6. Operations over GF(p^n)
Chapter 7. Operations over GF(2^m) - Polynomial bases
Chapter 8. Operations over GF(2^m) - Normal bases
Chapter 9. Operations over GF(2^m) - Other bases
Chapter 10. Elliptic curve cryptography
Appendix A. p = 2^192 - 2^64 - 1
Appendix B. Optimal Extension Fields
Appendix C. Binary Fields
Appendix D. Ada versus VHDL
Index

136 citations

Proceedings Article•10.1109/ACT.2009.162•

Conventional versus Vedic Mathematical Method for Hardware Implementation of a Multiplier

Parth Mehta¹, Dhanashri H. Gawali¹•Institutions (1)

Maharashtra Academy of Engineering¹

28 Dec 2009

TL;DR: This paper compares normal multiplication with Vedic multiplication (using the Urdhva Tiryakbhyam Sutra) and shows that implementing either on digital hardware requires the same number of multiplication and addition operations.

Abstract: The aim of this paper is to compare normal multiplication with Vedic multiplication (using the Urdhva Tiryakbhyam Sutra) and to show that implementing either on digital hardware requires the same number of multiplication and addition operations. The choice makes a difference only for mental calculations. Several VHDL codes have been developed for this purpose. All multipliers have been tested on 16x16 multiplications for comparison. Test vectors have been supplied through a text file. Implementation has been done for the Xilinx FPGA device Virtex XCV 300 -6PQ240. Various multiplier implementations, such as Array multiplier, Multiplier Macro, Vedic multiplier with full partitioning, Vedic multiplier using 4 bit macro, multiplier using 4 bit macro, fully Recursive Vedic multiplier, and Vedic multiplier using 8 bit macro, have been tested and compared for optimum area and speed.
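The operation-count claim can be checked with a small software sketch of the "vertically and crosswise" column scheme. The function names and the digit-list representation (least-significant digit first) are assumptions made for this illustration; this is not the authors' VHDL.

```python
def to_digits(value, base=10):
    # Least-significant digit first.
    digits = []
    while True:
        digits.append(value % base)
        value //= base
        if value == 0:
            return digits

def from_digits(digits, base=10):
    return sum(d * base**i for i, d in enumerate(digits))

def urdhva_multiply(a_digits, b_digits, base=10):
    # Column k collects every cross product a[i] * b[k - i], plus the incoming carry.
    n, m = len(a_digits), len(b_digits)
    result, carry, digit_mults = [], 0, 0
    for k in range(n + m - 1):
        column = carry
        for i in range(max(0, k - m + 1), min(n, k + 1)):
            column += a_digits[i] * b_digits[k - i]
            digit_mults += 1
        result.append(column % base)
        carry = column // base
    while carry:
        result.append(carry % base)
        carry //= base
    return result, digit_mults

product_digits, digit_mults = urdhva_multiply(to_digits(123), to_digits(456))
product = from_digits(product_digits)   # 56088, using 9 digit multiplications
```

For two n-digit operands the scheme performs n * n single-digit multiplications, the same count as the schoolbook row-by-row method; only the order in which the partial products are summed differs.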

127 citations

Journal Article•10.1007/S11227-008-0251-8•

Performance evaluation of the sparse matrix-vector multiplication on modern architectures

Georgios Goumas¹, Kornilios Kourtis¹, Nikos Anastopoulos¹, Vasileios Karakasis¹, Nectarios Koziris¹•Institutions (1)

National Technical University of Athens¹

01 Oct 2009-The Journal of Supercomputing

TL;DR: This paper revisits the performance issues of the widely used sparse matrix-vector multiplication (SpMxV) kernel on modern microarchitectures and extracts useful conclusions that can serve as guidelines for the optimization process of both single and multithreaded versions of the kernel.

Abstract: In this paper, we revisit the performance issues of the widely used sparse matrix-vector multiplication (SpMxV) kernel on modern microarchitectures. Previous scientific work reports a number of different factors that may significantly reduce performance. However, the interaction of these factors with the underlying architectural characteristics is not clearly understood, a fact that may lead to misguided, and thus unsuccessful attempts for optimization. In order to gain an insight into the details of SpMxV performance, we conduct a suite of experiments on a rich set of matrices for three different commodity hardware platforms. In addition, we investigate the parallel version of the kernel and report on the corresponding performance results and their relation to each architecture's specific multithreaded configuration. Based on our experiments, we extract useful conclusions that can serve as guidelines for the optimization process of both single and multithreaded versions of the kernel.

110 citations

Journal Article•10.1016/J.NEUROIMAGE.2008.10.025•

Flexible transfer of knowledge in mental arithmetic--an fMRI study.

Anja Ischebeck¹, Laura Zamarian², Michael Schocke², Margarete Delazer²•Institutions (2)

University of Graz¹, Innsbruck Medical University²

01 Feb 2009-NeuroImage

TL;DR: This study yields some evidence that successful transfer of knowledge between arithmetic operations is accompanied by modifications of brain activation patterns. The left angular gyrus seems to be involved not only in the retrieval of stored arithmetic facts, but also in the transfer between arithmetic operations.

98 citations

Journal Article•10.1137/080733243•

Cache-Oblivious Sparse Matrix-Vector Multiplication by Using Sparse Matrix Partitioning Methods

A. N. Yzelman, Rob H. Bisseling

01 Jun 2009-SIAM Journal on Scientific Computing

TL;DR: A cache-oblivious method to permute the rows and columns of the input matrix using a recursive hypergraph-based sparse matrix partitioning scheme so that the resulting matrix induces cache-friendly behavior during sparse matrix-vector multiplication.

Abstract: In this article, we introduce a cache-oblivious method for sparse matrix-vector multiplication. Our method attempts to permute the rows and columns of the input matrix using a recursive hypergraph-based sparse matrix partitioning scheme so that the resulting matrix induces cache-friendly behavior during sparse matrix-vector multiplication. Matrices are assumed to be stored in row-major format, by means of the compressed row storage (CRS) or its variants incremental CRS and zig-zag CRS. The zig-zag CRS data structure is shown to fit well with the hypergraph metric used in partitioning sparse matrices for the purpose of parallel computation. The separated block-diagonal (SBD) form is shown to be the appropriate matrix structure for cache enhancement. We have implemented a run-time cache simulation library enabling us to analyze cache behavior for arbitrary matrices and arbitrary cache properties during matrix-vector multiplication within a $k$-way set-associative idealized cache model. The results of these simulations are then verified by actual experiments run on various cache architectures. In all these experiments, we use the Mondriaan sparse matrix partitioner in one-dimensional mode. The savings in computation time achieved by our matrix reorderings reach up to 50 percent, in the case of a large link matrix.
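For readers unfamiliar with compressed row storage, a minimal sketch of sparse matrix-vector multiplication over plain CRS follows. The array names `row_ptr`, `col_idx`, and `values` are conventional but are assumptions here, and the sketch covers neither the incremental or zig-zag variants nor the reordering method itself.

```python
def spmv_crs(row_ptr, col_idx, values, x):
    # row_ptr[r]..row_ptr[r + 1] delimits the nonzeros of row r in col_idx/values.
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # Access to x is indirect through col_idx, which is where cache misses arise.
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y

# The 3x3 matrix [[4, 0, 1], [0, 2, 0], [3, 0, 5]] stored row by row.
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
y = spmv_crs(row_ptr, col_idx, values, [1.0, 2.0, 3.0])   # [7.0, 4.0, 18.0]
```

The matrix arrays are streamed once in order, while the input vector is read in the order given by `col_idx`. Permuting rows and columns changes that order, which is the lever a reordering method can use to make accesses to the vector more cache-friendly.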

96 citations

Journal Article•10.1016/J.IPL.2008.09.028•

The Mailman algorithm: A note on matrix--vector multiplication

Edo Liberty¹, Steven W. Zucker¹•Institutions (1)

Yale University¹

01 Jan 2009-Information Processing Letters

TL;DR: This work claims that if the matrix contains only a constant number of distinct values, then reading the matrix once in O(mn) steps is sufficient to preprocess it such that any subsequent application to vectors requires only O(mn/log(max{m,n})) operations.

94 citations

Journal Article•10.1109/TC.2008.209•

A Software Implementation of the IEEE 754R Decimal Floating-Point Arithmetic Using the Binary Encoding Format

Marius Cornea¹, John Harrison¹, Cristina S. Anderson¹, P. Tang², E. Schneider¹, E. Gvozdev¹•Institutions (2)

Intel¹, D. E. Shaw Research²

01 Feb 2009-IEEE Transactions on Computers

TL;DR: New algorithms and properties are presented in this paper, which are used in a software implementation of the IEEE 754R decimal floating-point arithmetic, with emphasis on using binary operations efficiently.

Abstract: The IEEE Standard 754-1985 for binary floating-point arithmetic [19] was revised [20], and an important addition is the definition of decimal floating-point arithmetic [8], [24]. This is intended mainly to provide a robust reliable framework for financial applications that are often subject to legal requirements concerning rounding and precision of the results, because the binary floating-point arithmetic may introduce small but unacceptable errors. Using binary floating-point calculations to emulate decimal calculations in order to correct this issue has led to the existence of numerous proprietary software packages, each with its own characteristics and capabilities. The IEEE 754R decimal arithmetic should unify the ways decimal floating-point calculations are carried out on various platforms. New algorithms and properties are presented in this paper, which are used in a software implementation of the IEEE 754R decimal floating-point arithmetic, with emphasis on using binary operations efficiently. The focus is on rounding techniques for decimal values stored in binary format, but algorithms are outlined for the more important or interesting operations of addition, multiplication, and division, including the case of nonhomogeneous operands, as well as conversions between binary and decimal floating-point formats. Performance results are included for a wider range of operations, showing promise that our approach is viable for applications that require decimal floating-point calculations. This paper extends an earlier publication [6].

81 citations

Journal Article•10.1109/TC.2009.70•

Bit-Serial and Bit-Parallel Montgomery Multiplication and Squaring over GF(2^m)

Arash Hariri¹, Arash Reyhani-Masoleh¹•Institutions (1)

University of Western Ontario¹

01 Oct 2009-IEEE Transactions on Computers

TL;DR: This paper considers the Montgomery multiplication in the binary extension fields and designs two bit-parallel multipliers which are comparable to the best finite field multipliers reported in the literature.

Abstract: Multiplication and squaring are the main finite field operations in cryptographic computations, and designing efficient multipliers and squarers affects the performance of cryptosystems. In this paper, we consider the Montgomery multiplication in the binary extension fields and study different structures of bit-serial and bit-parallel multipliers. For each of these structures, we study the role of the Montgomery factor, and then by using appropriate factors, propose new architectures. Specifically, we propose two bit-serial multipliers for general irreducible polynomials, and then derive bit-parallel Montgomery multipliers for two important classes of irreducible polynomials. In this regard, first we consider trinomials and provide a way for finding efficient Montgomery factors which results in a low time complexity. Then, we consider type-II irreducible pentanomials and design two bit-parallel multipliers which are comparable to the best finite field multipliers reported in the literature. Moreover, we consider squaring using this family of irreducible polynomials and show that this operation can be performed very fast with the time complexity of two XOR gates.
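As background, a bit-serial Montgomery multiplication over GF(2^m) can be sketched in software. This sketch assumes the textbook Montgomery factor x^m rather than the tuned factors the paper proposes, and it stores polynomials as Python integers (bit i is the coefficient of x^i); the function names are illustrative only.

```python
def gf2m_montgomery_mul(a, b, f, m):
    # Returns a * b * x^(-m) mod f, for deg(a), deg(b) < m and f of degree m with f(0) = 1.
    c = 0
    for i in range(m):
        if (a >> i) & 1:
            c ^= b          # add a_i * b (addition in GF(2) is XOR)
        if c & 1:
            c ^= f          # clear the constant term so the division by x is exact
        c >>= 1             # divide by x
    return c

def gf2m_mul(a, b, f, m):
    # Plain reference multiplication a * b mod f (most-significant bit first).
    c = 0
    for i in reversed(range(m)):
        c <<= 1
        if (c >> m) & 1:
            c ^= f
        if (a >> i) & 1:
            c ^= b
    return c

F, M = 0b10011, 4                   # f(x) = x^4 + x + 1, an irreducible trinomial
R = F ^ (1 << M)                    # x^m mod f
mont = gf2m_montgomery_mul(0b1011, 0b0110, F, M)
check = gf2m_mul(mont, R, F, M) == gf2m_mul(0b1011, 0b0110, F, M)   # True
```

Each loop iteration corresponds to one clock cycle of a bit-serial multiplier. The result carries an extra factor x^(-m), which is removed by multiplying with x^m, as the final check does.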

77 citations

Journal Article•10.1109/TC.2009.110•

Improving the Speed of Parallel Decimal Multiplication

Ghassem Jaberipur¹, Amir Kaivani¹•Institutions (1)

Shahid Beheshti University¹

01 Nov 2009-IEEE Transactions on Computers

TL;DR: To improve the speed of parallel decimal multiplication, this paper presents a new PPG method and fine-tunes the PPR method of one of the full solutions and the final addition scheme of the other, thus assembling a new full solution.

Abstract: Hardware support for decimal computer arithmetic is regaining popularity. One reason is the recent growth of decimal computations in commercial, scientific, financial, and Internet-based computer applications. Newly commercialized decimal arithmetic hardware units use radix-10 sequential multipliers that are rather slow for multiplication-intensive applications. Therefore, relevant future processors are likely to host fast parallel decimal multiplication circuits. The corresponding hardware algorithms are normally composed of three steps: partial product generation (PPG), partial product reduction (PPR), and final carry-propagating addition. The state of the art is represented by two recent full solutions with alternative designs for all three aforementioned steps. In addition, PPR by itself has been the focus of other recent studies. In this paper, we examine both of the full solutions and the impact of a PPR-only design on the appropriate one. In order to improve the speed of parallel decimal multiplication, we present a new PPG method, fine-tune the PPR method of one of the full solutions and the final addition scheme of the other, thus assembling a new full solution. Logical Effort analysis and 0.13 μm synthesis show at least 13 percent speed advantage, but at a cost of at most 36 percent additional area consumption.

Journal Article•10.1109/TVLSI.2009.2019415•

Flexible Hardware Processor for Elliptic Curve Cryptography Over NIST Prime Fields

K. Ananyi, H. Alrimeih¹, Daler Rakhmatov¹•Institutions (1)

University of Victoria¹

01 Aug 2009-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: This work describes a flexible hardware processor for performing computationally expensive modular addition, subtraction, multiplication, and inversion over prime finite fields GF(p).

Abstract: Exchange of private information over a public medium must incorporate a method for data protection against unauthorized access. Elliptic curve cryptography (ECC) has become widely accepted as an efficient mechanism to secure sensitive data. The main ECC computation is a scalar multiplication, translating into an appropriate sequence of point operations, each involving several modular arithmetic operations. We describe a flexible hardware processor for performing computationally expensive modular addition, subtraction, multiplication, and inversion over prime finite fields GF(p). The proposed processor supports all five primes p recommended by NIST, whose sizes are 192, 224, 256, 384, and 521 bits. It can also be programmed to automatically execute sequences of modular arithmetic operations. Our field-programmable gate-array implementation runs at 60 MHz and takes between 4 and 40 ms (depending on the used prime) to perform a typical scalar multiplication.
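How a scalar multiplication decomposes into point operations and then into modular arithmetic can be sketched with the double-and-add method in affine coordinates. The tiny curve y^2 = x^3 + 2x + 2 over GF(17), the base point, and all names below are assumptions chosen for readability; none of this is a NIST curve or the processor's actual instruction sequence. The modular inverse uses `pow(x, -1, p)`, which requires Python 3.8 or later.

```python
A, P_MOD = 2, 17          # toy curve y^2 = x^3 + 2x + 2 over GF(17)
G = (5, 1)                # base point on that curve

def ec_add(P, Q, a=A, p=P_MOD):
    # Points are (x, y) tuples; None stands for the point at infinity.
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                                            # P + (-P)
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow((2 * y1) % p, -1, p) % p   # tangent slope
    else:
        lam = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p          # chord slope
    x3 = (lam * lam - x1 - x2) % p
    y3 = (lam * (x1 - x3) - y1) % p
    return (x3, y3)

def scalar_mult(k, P, a=A, p=P_MOD):
    # Double-and-add, scanning the scalar from its least-significant bit.
    result, addend = None, P
    while k > 0:
        if k & 1:
            result = ec_add(result, addend, a, p)
        addend = ec_add(addend, addend, a, p)
        k >>= 1
    return result

doubled = scalar_mult(2, G)   # (6, 3)
```

Every point addition or doubling costs one modular inversion plus a handful of modular multiplications, additions, and subtractions, which are the four operations the abstract lists.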

Patent•

Complex Matrix Multiplication Operations with Data Pre-Conditioning in a High Performance Computing Architecture

Alexandre E. Eichenberger¹, Michael K. Gschwind¹, John A. Gunnels¹•Institutions (1)

IBM¹

17 Aug 2009

TL;DR: In this paper, a vector load operation is performed to load a first vector operand of the complex matrix multiplication operation to a first target vector register, and a cross multiply add operation is used to generate a partial product.

Abstract: Mechanisms for performing a complex matrix multiplication operation are provided. A vector load operation is performed to load a first vector operand of the complex matrix multiplication operation to a first target vector register. The first vector operand comprises a real and imaginary part of a first complex vector value. A complex load and splat operation is performed to load a second complex vector value of a second vector operand and replicate the second complex vector value within a second target vector register. The second complex vector value has a real and imaginary part. A cross multiply add operation is performed on elements of the first target vector register and elements of the second target vector register to generate a partial product of the complex matrix multiplication operation. The partial product is accumulated with other partial products and a resulting accumulated partial product is stored in a result vector register.

Book Chapter•10.1007/978-3-642-10628-6_17•

Software Implementation of Pairing-Based Cryptography on Sensor Networks Using the MSP430 Microcontroller

Conrado Porto Lopes Gouvêa¹, Julio López¹•Institutions (1)

State University of Campinas¹

5 Dec 2009

TL;DR: This work describes a software implementation of pairing-based cryptography and elliptic curve cryptography for the MSP430 microcontroller, which is used in some wireless sensors including the Tmote Sky and TelosB and improves the speed of both pairing computation and point multiplication.

Abstract: The software implementation of cryptographic schemes for wireless sensor networks poses a challenge due to the limited capabilities of the platform. Nevertheless, its feasibility has been shown in recent papers. In this work we describe a software implementation of pairing-based cryptography and elliptic curve cryptography for the MSP430 microcontroller, which is used in some wireless sensors including the Tmote Sky and TelosB. We have implemented the pairing computation for the MNT and BN curves over prime fields along with the ECDSA scheme. The main result of this work is a platform-specific optimization for the multiplication and reduction routines that leads to a 28% speedup in the field multiplication compared to the best known timings published. This optimization consequently improves the speed of both pairing computation and point multiplication.

Journal Article•10.1037/A0015829•

Operation-specific effects of numerical surface form on arithmetic strategy.

Jamie I. D. Campbell¹, Nicole M. Alberts¹•Institutions (1)

University of Saskatchewan¹

01 Jul 2009-Journal of Experimental Psychology: Learning, Memory and Cognition

TL;DR: It is proposed that retrieval efficiency for arithmetic connects diverse performance and strategy-related effects across key arithmetic factors, including arithmetic operation, numerical size, and numeral format, and that low retrieval efficiency promotes a shift to procedural backup strategies.

Abstract: Educated adults solve simple addition problems primarily by direct memory retrieval, as opposed to by counting or other procedural strategies, but they report using retrieval substantially less often with problems in written-word format (four + eight) compared with digit format (4 + 8). It was hypothesized that retrieval efficiency is relatively low with word operands compared with digits and that this promotes a shift to procedural backup strategies. Consistent with this hypothesis, Experiment 1 demonstrated greater word-format costs on retrieval usage for addition than subtraction, which was due to increased counting for addition but not subtraction. Experiment 2 demonstrated greater word-format costs on retrieval for division than multiplication, which was due to increased use of multiplication-fact reference to solve division problems. Format-related strategy shifts away from retrieval reflected both the efficiency of retrieval for a given operation and the availability of viable alternative strategies. The results demonstrate that calculation processes are not abstracted away from problem surface form. The authors propose that retrieval efficiency for arithmetic connects diverse performance and strategy-related effects across key arithmetic factors, including arithmetic operation, numerical size, and numeral format.

Journal Article•10.1016/J.COMPELECENG.2008.06.009•

Parallel crypto-devices for GF(p) elliptic curve multiplication resistant against side channel attacks

Santosh Ghosh¹, Monjur Alam¹, Dipanwita Roy Chowdhury¹, Indranil Sen Gupta¹•Institutions (1)

Indian Institute of Technology Kharagpur¹

01 Mar 2009-Computers & Electrical Engineering

TL;DR: Two different parallelization techniques and the corresponding architectures are proposed to speed up GF(p) elliptic curve multiplication in affine coordinates; the proposed implementations show better throughput than existing reported architectures.

Journal Article•10.1109/TVLSI.2008.2005288•

On Efficient Implementation of Accumulation in Finite Field Over $GF(2^{m})$ and its Applications

Pramod Kumar Meher¹•Institutions (1)

Nanyang Technological University¹

01 Apr 2009-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: This paper presents a simple but highly useful modification of the conventional hardware implementation of accumulation in finite field over GF(2^m) by performing the accumulation operation using m T flip-flops instead of a combination of m XOR gates with an equal number of D flip-flops in dependent loop structures.

Abstract: Finite field accumulation is the simplest of all the finite field operations, but at the same time, it is one of the most frequently encountered operations in finite field arithmetic. In this paper, we present a simple but highly useful modification of the conventional hardware implementation of accumulation in finite field over GF(2^m). Both the critical path and the hardware complexity are reduced in the proposed design by performing the accumulation operation using m T flip-flops instead of a combination of m XOR gates with an equal number of D flip-flops in dependent loop structures. The conventional design is found to involve nearly 39% more area, 53% more delay, and 40% more maximum ac power consumption compared with the proposed accumulator. The proposed finite field accumulator is used further for the implementation of serial/parallel polynomial-basis finite field multiplication and bit-serial inter-conversion between polynomial basis representation and normal basis representation over GF(2^m). The area-time complexity of the proposed bit-serial/parallel multiplier is less than half of the best of the corresponding existing structures. The structure proposed for digit-serial/parallel multiplication for trinomials is found to involve nearly 56% less area-time complexity compared with the best of the corresponding existing multipliers; and the existing design of bit-serial basis conversion is found to involve nearly twice the area-time complexity compared with the proposed design using the proposed finite field accumulator.
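The functional equivalence behind this substitution can be illustrated with a small behavioural simulation. Addition in GF(2^m) is a bitwise XOR, and a T flip-flop toggles its state exactly when its input is 1, so it computes the XOR of state and input by itself. The function names are assumptions for this sketch, which models logic values only, not timing, area, or power.

```python
def to_bits(value, m):
    return [(value >> i) & 1 for i in range(m)]

def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))

def accumulate_xor_d(words, m):
    # Conventional design: an XOR gate computes Q xor input, and a D flip-flop latches it.
    q = [0] * m
    for word in words:
        d = [qi ^ bi for qi, bi in zip(q, to_bits(word, m))]
        q = d
    return from_bits(q)

def accumulate_t(words, m):
    # T flip-flop design: each stage toggles when its input bit is 1, otherwise it holds.
    q = [0] * m
    for word in words:
        q = [1 - qi if ti else qi for qi, ti in zip(q, to_bits(word, m))]
    return from_bits(q)

words = [0b1011, 0b0110, 0b1111]
same = accumulate_xor_d(words, 4) == accumulate_t(words, 4)   # True; both give 0b0010
```

Because the toggle behaviour already contains the XOR, the separate XOR gate in the feedback loop becomes unnecessary.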

Patent•

Method and apparatus for communication efficient private information retrieval and oblivious transfer

Zulfikar Ramzan¹, Craig Gentry²•Institutions (2)

Google¹, NTT DoCoMo²

4 Feb 2009

TL;DR: A method for performing private retrieval of information from a database is presented, in which an index corresponding to the information to be retrieved is obtained and a query that does not reveal the index to the database is generated.

Abstract: A method, article of manufacture and apparatus for performing private retrieval of information from a database is disclosed. In one embodiment, the method comprises obtaining an index corresponding to information to be retrieved from the database and generating a query that does not reveal the index to the database. The query is an arithmetic function of the index and a secret value, wherein the arithmetic function includes a multiplication group specified by a modulus of a random value whose order is divisible by a prime power, such that the prime power is an order of the random value. The secret value is an arithmetic function of the index that comprises a factorization into prime numbers of the modulus. The method further comprises communicating the query to the database for execution of the arithmetic function against the entirety of the database.

Book Chapter•10.1007/978-3-642-03138-0_32•

Implementing Blocked Sparse Matrix-Vector Multiplication on NVIDIA GPUs

Alexander Monakov, Arutyun Avetisyan

21 Jul 2009

TL;DR: The authors discuss implementing blocked sparse matrix-vector multiplication for NVIDIA GPUs, outline an algorithm and various optimizations, and identify potential future improvements and challenging tasks. Compared with a previously published implementation, theirs is faster on matrices having many high fill-ratio blocks but slower on matrices with a low number of non-zero elements per row.

Abstract: We discuss implementing blocked sparse matrix-vector multiplication for NVIDIA GPUs. We outline an algorithm and various optimizations, and identify potential future improvements and challenging tasks. In comparison with a previously published implementation, our implementation is faster on matrices having many high fill-ratio blocks but slower on matrices with a low number of non-zero elements per row.

Journal Article•10.1016/J.MICPRO.2008.08.002•

Fast point multiplication on Koblitz curves: Parallelization method and implementations

Kimmo Järvinen¹, Jorma Skyttä¹•Institutions (1)

Helsinki University of Technology¹

01 Mar 2009-Microprocessors and Microsystems

TL;DR: A novel parallelization method utilizing point operation interleaving with parallel field multipliers is presented, and FPGA implementations are described showing the practical feasibility of this method.

Journal Article•10.1080/09602010802296378•

Rehabilitation of arithmetic fact retrieval via extensive practice: A combined fMRI and behavioural case-study

Luisa Zaunmüller¹, Frank Domahs¹, Katharina Dressel¹, Jan Lonnemann¹, Elise Klein¹, Anja Ischebeck², Klaus Willmes¹•Institutions (2)

RWTH Aachen University¹, Innsbruck Medical University²

16 Apr 2009-Neuropsychological Rehabilitation

TL;DR: The training led to a change in calculation strategies: prior to training, the patient used predominantly time-consuming back-up strategies; after training, he relied increasingly on the direct retrieval of arithmetic facts from long-term memory.

Abstract: The present study investigates the effects of training arithmetic fact retrieval in a patient suffering from particular difficulties with multiplication facts. Over a period of four weeks simple multiplication facts were trained extensively. The outcome of the training was assessed behaviourally and changes in cerebral activation patterns were investigated using fMRI. The training led to a change in calculation strategies: prior to training, the patient used predominantly time-consuming back-up strategies; after training, he relied increasingly on the direct retrieval of arithmetic facts from long-term memory. Regarding the fMRI results, prefrontal activations were observed for untrained problems, which can be attributed to the application of back-up strategies strongly relying on fronto-executive functions. Interestingly, significant foci of activation for both trained and untrained items were found in the angular gyrus of the right hemisphere, which, however, differed in their exact localisation. For the trained condition, activations were observed in anterior parts of the angular gyrus, which may be related to the training-based automatisation in fact retrieval. Activations in the untrained condition were found in a more posterior portion of the angular gyrus, which might be attributable to one of the patient's back-up strategies, namely reciting a whole multiplication row to get to the correct answer.

Journal Article•10.1109/TCSI.2008.2011585•

A New Algorithm for High-Speed Modular Multiplication Design

Ming-Der Shieh¹, Jun-Hong Chen¹, Wen-Ching Lin¹, Hao-Hsuan Wu¹•Institutions (1)

National Cheng Kung University¹

01 Sep 2009-IEEE Transactions on Circuits and Systems I-regular Papers

TL;DR: This paper first explores how to relax the data dependency that exists between multiplication, quotient determination, and modular reduction in the conventional Montgomery modular multiplication algorithm, and then proposes a new modular multiplication algorithm for high-speed hardware design.

Abstract: Modular exponentiation in public-key cryptosystems is usually achieved by repeated modular multiplications on large integers. Designing high-speed modular multiplication is thus very crucial to speed up the decryption/encryption process. In this paper, we first explore how to relax the data dependency that exists between multiplication, quotient determination, and modular reduction in the conventional Montgomery modular multiplication algorithm. Then, we propose a new modular multiplication algorithm for high-speed hardware design. The speed improvement is achieved by reducing the critical path delay from the 4-to-2 to 3-to-2 carry-save addition. The resulting time complexity of our development is further decreased by simultaneously performing the multiplication and modular reduction processes. Experimental results show that the developed modular multiplication can operate at speeds higher than those of related work. When the proposed modular multiplication is applied to modular exponentiation, both time and area-time advantages are obtained.
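
The data dependency mentioned in the abstract can be seen in a minimal Python sketch of the conventional radix-2 Montgomery multiplication. This is the textbook baseline, not the new algorithm the paper proposes; the function name and the parameter `k` (the bit length bound of the modulus) are illustrative choices.

```python
def montgomery_multiply(a: int, b: int, n: int, k: int) -> int:
    """Return a * b * 2**(-k) mod n for an odd modulus n < 2**k.

    Conventional bit-serial (radix-2) Montgomery multiplication,
    assuming 0 <= a, b < n.
    """
    if n % 2 == 0:
        raise ValueError("modulus must be odd")
    s = 0
    for i in range(k):
        a_i = (a >> i) & 1
        s += a_i * b           # multiplication step
        q = s & 1              # quotient determination: needs the updated s
        s = (s + q * n) >> 1   # modular reduction: s + q*n is even, so halve it
    # The loop keeps s < 2n, so one conditional subtraction suffices.
    if s >= n:
        s -= n
    return s
```

Each iteration must finish the addition before the quotient bit `q` is known, and `q` must be known before the reduction can start. That serial chain is the dependency the paper sets out to relax. To obtain an ordinary product `a * b mod n`, operands are first mapped to Montgomery form (`x * 2**k mod n`) and the result is mapped back by multiplying with 1.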

Patent•

Sparse matrix-vector multiplication on graphics processor units

Muthu Baskaran¹, Rajesh J. Bordawekar¹•Institutions (1)

IBM¹

30 Sep 2009

TL;DR: Techniques for optimizing sparse matrix-vector multiplication (SpMV) on a graphics processing unit (GPU) are provided. The techniques include receiving a sparse matrix-vector multiplication, analyzing it to identify one or more optimizations, and determining whether the sparse matrix-vector multiplication is to be reused across computation.

Abstract: Techniques for optimizing sparse matrix-vector multiplication (SpMV) on a graphics processing unit (GPU) are provided. The techniques include receiving a sparse matrix-vector multiplication, analyzing the sparse matrix-vector multiplication to identify one or more optimizations, wherein analyzing the sparse matrix-vector multiplication to identify one or more optimizations comprises analyzing a non-zero pattern for one or more optimizations and determining whether the sparse matrix-vector multiplication is to be reused across computation, optimizing the sparse matrix-vector multiplication, wherein optimizing the sparse matrix-vector multiplication comprises optimizing global memory access, optimizing shared memory access and exploiting reuse and parallelism, and outputting an optimized sparse matrix-vector multiplication.
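
For readers unfamiliar with the operation being optimized, here is a minimal sequential sketch of SpMV in Python. It assumes the compressed sparse row (CSR) layout, which is one common storage format and is not stated in the abstract; the helper names are hypothetical, and none of the patent's GPU memory optimizations are shown.

```python
def csr_from_dense(rows):
    """Build CSR arrays (values, col_idx, row_ptr) from a dense list of rows."""
    values, col_idx, row_ptr = [], [], [0]
    for row in rows:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))  # end offset of this row's non-zeros
    return values, col_idx, row_ptr


def spmv_csr(values, col_idx, row_ptr, x):
    """Compute y = A @ x, touching only the stored non-zeros of A."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]  # indirect (gathered) read of x
        y.append(acc)
    return y
```

The non-zero pattern determines both the row lengths and the indirect reads of `x`, which is why analyzing that pattern matters when mapping the loops onto GPU threads and memory.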

Journal Article•10.1109/TVLSI.2008.2003004•

Time-Efficient Single Constant Multiplication Based on Overlapping Digit Patterns

J. Thong¹, Nicola Nicolici¹•Institutions (1)

McMaster University¹

01 Sep 2009-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: By integrating ODPs within H(k), the best existing heuristic algorithm for single constant multiplication (SCM), this work can on average significantly improve the run time of the algorithm (typically by one order of magnitude) while still reducing the number of adders.

Abstract: Common subexpression elimination (CSE) algorithms try to minimize the number of adders (or subtracters) required to implement constant multiplication by searching and substituting common patterns in the CSE representation of a constant. CSE algorithms, in general, cannot find certain patterns due to inherent restrictions in the CSE representation. We propose overlapping digit patterns (ODPs) to remove some of these restrictions. We integrate ODPs into H(k), the best existing heuristic algorithm for single constant multiplication (SCM). H(k) is not applicable to the multiple constant multiplication (MCM) problem, so we cannot consider this problem. Generally, H(k) finds solutions very close to optimal, so there is a strict limitation on any further improvement which applies to any new heuristic. Instead, by integrating ODPs within H(k), we can on average significantly improve the run time of the algorithm (typically by one order of magnitude) while still reducing the number of adders.
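
As background for the adder counts discussed above, the following Python sketch shows the plain shift-and-add baseline for multiplying by a constant using a canonical signed digit (CSD) recoding. It illustrates neither H(k) nor ODPs; the function names are illustrative, and the adder count is the naive one (non-zero digits minus one) with no pattern sharing.

```python
def csd_digits(c: int) -> list[int]:
    """Return the CSD digits of c > 0, least significant first.

    Each digit is -1, 0, or +1, and no two adjacent digits are non-zero.
    """
    digits = []
    while c > 0:
        if c & 1:
            d = 2 - (c % 4)  # +1 if c = 1 (mod 4), -1 if c = 3 (mod 4)
            c -= d
        else:
            d = 0
        digits.append(d)
        c >>= 1
    return digits


def multiply_by_constant(x: int, c: int) -> int:
    """Multiply x by c using only shifts, additions, and subtractions."""
    return sum(d * (x << i) for i, d in enumerate(csd_digits(c)))


def naive_adder_count(c: int) -> int:
    """Adders/subtracters needed when every non-zero digit is summed directly."""
    return sum(1 for d in csd_digits(c) if d != 0) - 1
```

For example, 7 is recoded as 8 - 1, so `naive_adder_count(7)` is 1 instead of the 2 adders implied by the binary form 4 + 2 + 1. CSE-style methods aim to go below this baseline by reusing repeated digit patterns.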

Book Chapter•10.1007/978-3-642-03644-6_11•

Efficient Multiplication of Polynomials on Graphics Hardware

Pavel Emeliyanenko¹•Institutions (1)

Max Planck Society¹

21 Aug 2009-Lecture Notes in Computer Science

TL;DR: This work presents an algorithm to multiply univariate polynomials with integer coefficients efficiently using the number-theoretic transform (NTT) on graphics processing units (GPUs), and compares the approach with CPU-based implementations of polynomial and large integer multiplication provided by the NTL and GMP libraries.

Abstract: We present the algorithm to multiply univariate polynomials with integer coefficients efficiently using the Number Theoretic transform (NTT) on Graphics Processing Units (GPU). The same approach can be used to multiply large integers encoded as polynomials. Our algorithm exploits fused multiply-add capabilities of the graphics hardware. NTT multiplications are executed in parallel for a set of distinct primes followed by reconstruction using the Chinese Remainder theorem (CRT) on the GPU. Our benchmarking experiences show the NTT multiplication performance up to 77 GMul/s. We compared our approach with CPU-based implementations of polynomial and large integer multiplication provided by NTL and GMP libraries.
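
The pipeline described in the abstract (one NTT multiplication per prime, then CRT reconstruction) can be sketched sequentially in Python. This is a CPU illustration under assumptions, not the paper's GPU implementation: the three primes and the generator 3 are commonly used NTT-friendly choices picked here for convenience, and all function names are hypothetical.

```python
# NTT-friendly primes; 3 is a primitive root modulo each of them.
PRIMES = (998244353, 469762049, 167772161)
GENERATOR = 3


def ntt(values, root, p):
    """Recursive radix-2 NTT; root must be a primitive len(values)-th root mod p."""
    n = len(values)
    if n == 1:
        return values[:]
    even = ntt(values[0::2], root * root % p, p)
    odd = ntt(values[1::2], root * root % p, p)
    out, w = [0] * n, 1
    for i in range(n // 2):
        t = w * odd[i] % p
        out[i] = (even[i] + t) % p
        out[i + n // 2] = (even[i] - t) % p
        w = w * root % p
    return out


def multiply_mod(f, g, p):
    """Product of polynomials f and g with coefficients reduced mod p."""
    size = 1
    while size < len(f) + len(g) - 1:
        size *= 2
    root = pow(GENERATOR, (p - 1) // size, p)
    fa = ntt([c % p for c in f] + [0] * (size - len(f)), root, p)
    fb = ntt([c % p for c in g] + [0] * (size - len(g)), root, p)
    prod = ntt([a * b % p for a, b in zip(fa, fb)], pow(root, -1, p), p)
    inv_size = pow(size, -1, p)
    return [c * inv_size % p for c in prod[: len(f) + len(g) - 1]]


def crt_signed(residues, primes):
    """Combine residues into the unique integer in the symmetric range."""
    x, m = 0, 1
    for r, p in zip(residues, primes):
        t = (r - x) * pow(m, -1, p) % p  # solve x + m*t = r (mod p)
        x += m * t
        m *= p
    return x - m if x > m // 2 else x


def multiply(f, g):
    """Exact integer polynomial product, valid while |coefficients| < prod(PRIMES)/2."""
    per_prime = [multiply_mod(f, g, p) for p in PRIMES]  # independent per prime
    return [crt_signed(rs, PRIMES) for rs in zip(*per_prime)]
```

The per-prime products are independent of one another, which is what allows them to run in parallel as the abstract describes. The result is exact only while every true coefficient is smaller in absolute value than half the product of the primes; larger inputs would need more primes. `pow(x, -1, p)` requires Python 3.8 or later.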

Proceedings Article•10.1109/ARITH.2009.23•

Fully Redundant Decimal Arithmetic

Saeid Gorgin¹, Ghassem Jaberipur¹•Institutions (1)

Shahid Beheshti University¹

8 Jun 2009

TL;DR: This paper develops a framework for fully redundant decimal arithmetic, in which all operands and results belong to the same redundant decimal number system and can be stored and later used as operands of further decimal operations.

Abstract: Hardware implementation of all the basic radix-10 arithmetic operations is evolving as a new trend in the design and implementation of general purpose digital processors. Redundant representation of partial products and remainders is common in the multiplication and division hardware algorithms, respectively. Carry-free implementation of the more frequent add/subtract operations, with the byproduct of enhancing the speed of multiplication and division, is possible with redundant number representation. However, conversion of redundant results to conventional representations entails slow carry propagation that can be avoided if the results are kept in redundant format for later use as operands of other arithmetic operations. Given that redundant decimal representations, contrary to redundant binary, do not necessarily require extra storage, we are motivated to develop a framework for fully redundant decimal arithmetic, where all operands and results belong to the same redundant decimal number system and can be stored and later used as operands of further decimal operations. In this paper, we present a new faster decimal signed digit add/sub unit and show how it can be efficiently used in the design of decimal multipliers and dividers, where all operands and results are represented with the same redundant digit set [–7, 7].

Journal Article•10.3758/APP.71.3.471•

"2 x 3" primes naming "6": evidence from masked priming.

Javier García-Orza¹, Jesús Damas-López¹, Antonio J. Matas¹, José Miguel Rodríguez¹•Institutions (1)

University of Málaga¹

01 Apr 2009-Attention Perception & Psychophysics

TL;DR: This research employed a new procedure with the aim of assessing the automatic retrieval of multiplication more directly, and argues that this procedure is highly valuable for exploring the mechanisms involved in simple arithmetic solving.

Abstract: It is a common assumption for multiplication-solving models that single-digit multiplications are automatically retrieved. However, the experimental evidence for this is based on paradigms under suspicion. In this research, we employed a new procedure with the aim of assessing the automatic retrieval of multiplication more directly. In two experiments, multiplication automatism was studied using briefly presented primes (stimulus onset asynchrony = 48 msec) in a number-naming task. In Experiment 1, in the congruent conditions, the target and the prime were the same numbers (e.g., prime, 6; target, 6) or the target was the solution to the multiplication prime (e.g., prime, 2×3=; target, 6). In the incongruent conditions, no relationship existed between the primes and the targets (e.g., prime, 32; target, 6; or prime, 4×8=; target, 6). Experiment 2 explored the relevance of the equal sign for the multiplication-priming effect. Data showed that naming was faster when the solution of the multiplication prime matched the target, as compared with the incongruent condition (multiplication-priming effect), and that these effects were found irrespective of the presence of the equal sign. The fact that this priming effect was found even though the participants were unaware of the presentation of the primes supports the automatic character of single-digit multiplication. We conclude by arguing that this procedure is highly valuable for exploring the mechanisms involved in simple arithmetic solving.

Proceedings Article•10.1109/RECONFIG.2009.28•

FPGA Implementations of BCD Multipliers

Gustavo Sutter, Elías Todorovich, Géry Jean Antoine Bioul, Martín Vázquez, J-P. Deschamps

9 Dec 2009

TL;DR: A variety of algorithms for basic one-by-one-digit multiplication are proposed and FPGA implementations are presented; time and area results for sequential and combinational implementations show better figures than previously published work.

Abstract: This paper presents a number of approaches to implementing decimal multiplication algorithms on Xilinx FPGAs. A variety of algorithms for basic one-by-one-digit multiplication are proposed and FPGA implementations are presented. N-by-one-digit and N-by-M-digit multiplications are then studied. Time and area results for sequential and combinational implementations show better figures than previously published work. Comparisons against fully optimized binary multipliers emphasize the interest of the proposed design techniques.

Journal Article•10.1016/J.OPTLASENG.2009.04.015•

Encryption by using matrix-added, or matrix-multiplied input images placed in the input plane of a double random phase encoding geometry

Madan Singh, Arvind Kumar, Kehar Singh

01 Nov 2009-Optics and Lasers in Engineering

TL;DR: In this article, the authors describe image encryption by combining the images with several matrices made with letters/numerals and placed in the input plane of a double random phase encoding (DRPE) system.

Posted Content•

Multiplication of Distributions in one dimension: possible approaches and applications to $\delta$-function and its derivatives

F. Bagarello

01 Apr 2009-arXiv: Mathematical Physics

TL;DR: In this article, a new class of multiplications of distributions in one dimension, merging together two different regularizations of distributions, is introduced, and some features of these multiplications are discussed in some detail.

Abstract: We introduce a new class of multiplications of distributions in one dimension, merging together two different regularizations of distributions. Some of the features of these multiplications are discussed in some detail. We use our theory to study a number of examples involving products between the Dirac delta function and its successive derivatives.
