Top 399 papers published in the topic of Multiplication in 2012

Showing papers on "Multiplication" published in 2012

Book Chapter•10.1007/978-3-642-32009-5_38•

Multiparty Computation from Somewhat Homomorphic Encryption

[...]

Ivan Damgård¹, Valerio Pastro¹, Nigel P. Smart², Sarah Zakarias¹•Institutions (2)

Aarhus University¹, University of Bristol²

19 Aug 2012

TL;DR: A general multiparty computation protocol secure against an active adversary corrupting up to $n-1$ of the $n$ players is proposed, which may be used to compute securely arithmetic circuits over any finite field $\mathbb{F}_{p^k}$.

Abstract: We propose a general multiparty computation protocol secure against an active adversary corrupting up to $n-1$ of the $n$ players. The protocol may be used to compute securely arithmetic circuits over any finite field $\mathbb{F}_{p^k}$. Our protocol consists of a preprocessing phase that is independent of both the function to be computed and the inputs, and a much more efficient online phase where the actual computation takes place. The online phase is unconditionally secure and has total computational and communication complexity linear in $n$, the number of players, where earlier work was quadratic in $n$. Moreover, the work done by each player is only a small constant factor larger than what one would need to compute the circuit in the clear. We show this is optimal for computation in large fields. In practice, for 3 players, a secure 64-bit multiplication can be done in 0.05 ms. Our preprocessing is based on a somewhat homomorphic cryptosystem. We extend a scheme by Brakerski et al., so that we can perform distributed decryption and handle many values in parallel in one ciphertext. The computational complexity of our preprocessing phase is dominated by the public-key operations; we need $O(n^2/s)$ operations per secure multiplication, where $s$ is a parameter that increases with the security parameter of the cryptosystem. Earlier work in this model needed $\Omega(n^2)$ operations. In practice, the preprocessing prepares a secure 64-bit multiplication for 3 players in about 13 ms.
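
The split between a function-independent preprocessing phase and a cheap online phase can be illustrated with Beaver multiplication triples, which this style of protocol consumes online. The Python sketch below is a minimal illustration under loud assumptions: a hypothetical trusted dealer (`make_triple`) stands in for the somewhat homomorphic preprocessing, the prime `P` is arbitrary, and the MACs that give the real protocol active security are omitted, so it is at best passively secure and is not the paper's protocol.

```python
import random

P = 2**61 - 1  # prime modulus chosen for illustration only

def share(x, n, p=P):
    """Split x into n additive shares that sum to x mod p."""
    shares = [random.randrange(p) for _ in range(n - 1)]
    shares.append((x - sum(shares)) % p)
    return shares

def open_shares(shares, p=P):
    """Reconstruct a shared value by summing all shares."""
    return sum(shares) % p

def make_triple(n, p=P):
    """Hypothetical trusted dealer standing in for the preprocessing phase:
    returns shares of random a, b and of c = a*b."""
    a, b = random.randrange(p), random.randrange(p)
    return share(a, n, p), share(b, n, p), share(a * b % p, n, p)

def beaver_multiply(x_sh, y_sh, triple, p=P):
    """Online multiplication of shared x and y using one triple (no MAC checks)."""
    a_sh, b_sh, c_sh = triple
    # Opening d = x - a and e = y - b reveals nothing about x, y since a, b are random.
    d = open_shares([(xi - ai) % p for xi, ai in zip(x_sh, a_sh)], p)
    e = open_shares([(yi - bi) % p for yi, bi in zip(y_sh, b_sh)], p)
    # x*y = c + d*b + e*a + d*e; only party 0 adds the public term d*e.
    z_sh = [(ci + d * bi + e * ai) % p for ai, bi, ci in zip(a_sh, b_sh, c_sh)]
    z_sh[0] = (z_sh[0] + d * e) % p
    return z_sh

n = 3
z_sh = beaver_multiply(share(6, n), share(7, n), make_triple(n))
print(open_shares(z_sh))  # 42
```

Each online multiplication costs only local arithmetic plus opening two values, which is why moving triple generation into preprocessing makes the online phase cheap.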

1,570 citations

Journal Article•10.1137/110848244•

Parallel Sparse Matrix-Matrix Multiplication and Indexing: Implementation and Experiments

[...]

Aydin Buluc, John R. Gilbert

26 Jul 2012-SIAM Journal on Scientific Computing

TL;DR: It is demonstrated that the parallel SpGEMM methods, which use two-dimensional block data distributions with serial hypersparse kernels, are indeed highly flexible, scalable, and memory-efficient in the general case.

Abstract: Generalized sparse matrix-matrix multiplication (or SpGEMM) is a key primitive for many high performance graph algorithms as well as for some linear solvers, such as algebraic multigrid. Here we show that SpGEMM also yields efficient algorithms for general sparse-matrix indexing in distributed memory, provided that the underlying SpGEMM implementation is sufficiently flexible and scalable. We demonstrate that our parallel SpGEMM methods, which use two-dimensional block data distributions with serial hypersparse kernels, are indeed highly flexible, scalable, and memory-efficient in the general case. This algorithm is the first to yield increasing speedup on an unbounded number of processors; our experiments show scaling up to thousands of processors in a variety of test scenarios.
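
Sparse indexing can be expressed through SpGEMM via the identity $A(I, J) = R\,A\,Q$, where $R$ has a single 1 at $(r, I_r)$ in each row and $Q$ a single 1 at $(J_c, c)$ in each column. A serial Python sketch of that identity, assuming a dict-of-dicts sparse format and a row-by-row (Gustavson-style) product; it is not the paper's 2D-distributed hypersparse kernel, and all helper names are hypothetical.

```python
from collections import defaultdict

def spgemm(A, B):
    """Row-wise sparse product C = A @ B for {row: {col: value}} matrices."""
    C = {}
    for i, row in A.items():
        acc = defaultdict(float)
        for k, a_ik in row.items():
            for j, b_kj in B.get(k, {}).items():
                acc[j] += a_ik * b_kj
        if acc:  # keep only rows that received nonzeros
            C[i] = dict(acc)
    return C

def select_rows(I):
    """R with R[r, I[r]] = 1, so R @ A picks rows I of A."""
    return {r: {i: 1.0} for r, i in enumerate(I)}

def select_cols(J):
    """Q with Q[J[c], c] = 1, so A @ Q picks columns J of A."""
    Q = defaultdict(dict)
    for c, j in enumerate(J):
        Q[j][c] = 1.0
    return dict(Q)

def index(A, I, J):
    """Extract the submatrix A(I, J) with two SpGEMM calls."""
    return spgemm(spgemm(select_rows(I), A), select_cols(J))

A = {0: {0: 1.0, 2: 2.0}, 1: {1: 3.0}, 2: {0: 4.0, 2: 5.0}}
print(index(A, [2, 0], [0, 2]))  # {0: {0: 4.0, 1: 5.0}, 1: {0: 1.0, 1: 2.0}}
```

Because the selection matrices are ordinary sparse operands, any sufficiently general SpGEMM can perform reordering, duplication and extraction of rows and columns without a separate indexing kernel.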

265 citations

Proceedings Article•10.1109/HPEC.2012.6408660•

Accelerating fully homomorphic encryption using GPU

[...]

Wei Wang¹, Yin Hu¹, Lianmu Chen¹, Xinming Huang¹, Berk Sunar¹•Institutions (1)

Worcester Polytechnic Institute¹

1 Sep 2012

TL;DR: The GH-FHE primitives for the small setting with a dimension of 2048 on NVIDIA C2050 GPU are implemented and the experimental results show the speedup factors of 7.68, 7.4 and 6.59 for encryption, decryption and recrypt respectively, when compared with the existing CPU implementation.

Abstract: As a major breakthrough, in 2009 Gentry introduced the first plausible construction of a fully homomorphic encryption (FHE) scheme. FHE allows the evaluation of arbitrary functions directly on encrypted data on untrusted servers. In 2010, Gentry and Halevi presented the first FHE implementation on an IBM x3500 server. However, this implementation remains impractical due to the high latency of encryption and recryption. The Gentry-Halevi (GH) FHE primitives utilize multi-million-bit modular multiplications and additions, which are time-consuming tasks for a general-purpose computer. In the GH-FHE implementation, the most computationally intensive arithmetic operation is modular multiplication. In this paper, the million-bit modular multiplication is computed in two steps. For large-number multiplication, Strassen's FFT-based algorithm is employed and accelerated on a graphics processing unit (GPU) through its massive parallelism. Subsequently, the Barrett modular reduction algorithm is applied to implement modular reduction. As an experimental study, we implement the GH-FHE primitives for the small setting with a dimension of 2048 on an NVIDIA C2050 GPU. The experimental results show speedup factors of 7.68, 7.4 and 6.59 for encryption, decryption and recrypt, respectively, when compared with the existing CPU implementation.
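
Barrett reduction replaces division by the modulus with a multiplication by a precomputed constant and two shifts. A minimal Python sketch, assuming the base-2 form of the classical algorithm and letting Python's built-in big-integer product stand in for the paper's GPU FFT multiplication; the function names are illustrative.

```python
def barrett_setup(m):
    """Precompute k = bit length of m and mu = floor(4**k / m)."""
    k = m.bit_length()
    mu = (1 << (2 * k)) // m
    return k, mu

def barrett_reduce(x, m, k, mu):
    """Return x mod m for 0 <= x < m**2 using shifts and multiplications only."""
    q = ((x >> (k - 1)) * mu) >> (k + 1)  # underestimates floor(x / m) by at most 2
    r = x - q * m
    while r >= m:  # at most two corrective subtractions
        r -= m
    return r

def modmul(a, b, m, k, mu):
    """Modular multiplication: full product first, then Barrett reduction."""
    return barrett_reduce(a * b, m, k, mu)

m = (1 << 127) - 1
k, mu = barrett_setup(m)
print(modmul(m - 1, m - 2, m, k, mu) == (m - 1) * (m - 2) % m)  # True
```

Since `mu` depends only on the modulus, it is computed once, and every subsequent reduction costs two large multiplications, which is what makes the scheme attractive when the multiplier itself is heavily optimized.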

149 citations

Journal Article•10.1109/TNNLS.2012.2185059•

Fast and Efficient Second-Order Method for Training Radial Basis Function Networks

[...]

Tiantian Xie¹, Hao Yu¹, J. Hewlett¹, Pawel Rozycki, Bogdan M. Wilamowski¹•Institutions (1)

Auburn University¹

10 Feb 2012-IEEE Transactions on Neural Networks

TL;DR: This paper proposes an improved second order (ISO) algorithm for training radial basis function (RBF) networks that can normally reach smaller training/testing error with far fewer RBF units.

Abstract: This paper proposes an improved second order (ISO) algorithm for training radial basis function (RBF) networks. Besides the traditional parameters, including centers, widths and output weights, the input weights on the connections between the input layer and the hidden layer are also adjusted during the training process. More accurate results can be obtained by increasing the number of adjustable variables. Initial centers are chosen from training patterns and other parameters are generated randomly within a limited range. Taking advantage of the fast convergence and powerful search ability of second order algorithms, the proposed ISO algorithm can normally reach smaller training/testing error with far fewer RBF units. During the computation process, the quasi-Hessian matrix and gradient vector are accumulated as sums of related submatrices and subvectors, respectively. Only one Jacobian row is stored and used for multiplication, instead of storing and multiplying the entire Jacobian matrix. This memory reduction benefits computation speed and allows training on problems with a basically unlimited number of patterns. Several practical discrete and continuous classification problems are used to test the properties of the proposed ISO training algorithm.
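
The memory saving comes from building the quasi-Hessian $Q = \sum_p j_p^{\top} j_p$ and gradient $g = \sum_p j_p^{\top} e_p$ from one Jacobian row $j_p$ and error $e_p$ at a time. A minimal Python sketch of that accumulation only; how each row is computed for an RBF network is omitted, and the small `J`, `e` data are a hypothetical stand-in for rows that would normally be produced on the fly.

```python
def accumulate(rows_and_errors, n):
    """Build Q = sum(j^T j) and g = sum(j^T e) from (j_row, e) pairs,
    holding only one Jacobian row (length n) in memory at a time."""
    Q = [[0.0] * n for _ in range(n)]
    g = [0.0] * n
    for j_row, e in rows_and_errors:
        for a in range(n):
            g[a] += j_row[a] * e
            for b in range(n):
                Q[a][b] += j_row[a] * j_row[b]  # rank-1 update j^T j
    return Q, g

J = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
e = [1.0, 0.0, -1.0]
Q, g = accumulate(zip(J, e), n=2)
print(Q, g)  # [[35.0, 44.0], [44.0, 56.0]] [-4.0, -4.0]
```

The result equals $J^{\top}J$ and $J^{\top}e$, but memory grows with the number of parameters $n$ rather than with the number of training patterns.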

131 citations

Faster Algorithms for Rectangular Matrix Multiplication

[...]

François Le Gall¹•Institutions (1)

University of Tokyo¹

27 Aug 2012

TL;DR: A new algorithm for multiplying an n × n^k matrix by an n^k × n matrix is constructed; it is better than all known algorithms for rectangular matrix multiplication and, for square matrices (k = 1), recovers exactly the complexity of the algorithm by Coppersmith and Winograd.

Abstract: Let $\alpha$ be the maximal value such that the product of an $n\times n^\alpha$ matrix by an $n^\alpha\times n$ matrix can be computed with $n^{2+o(1)}$ arithmetic operations. In this paper we show that $\alpha>0.30298$, which improves the previous record $\alpha>0.29462$ by Coppersmith (Journal of Complexity, 1997). More generally, we construct a new algorithm for multiplying an $n\times n^k$ matrix by an $n^k\times n$ matrix, for any value $k\neq 1$. The complexity of this algorithm is better than all known algorithms for rectangular matrix multiplication. In the case of square matrix multiplication (i.e., for $k=1$), we recover exactly the complexity of the algorithm by Coppersmith and Winograd (Journal of Symbolic Computation, 1990). These new upper bounds can be used to improve the time complexity of several known algorithms that rely on rectangular matrix multiplication. For example, we directly obtain a $O(n^{2.5302})$-time algorithm for the all-pairs shortest paths problem over directed graphs with small integer weights, where $n$ denotes the number of vertices, and also improve the time complexity of sparse square matrix multiplication.

126 citations

Proceedings Article•10.1145/2312005.2312044•

Communication-optimal parallel algorithm for strassen's matrix multiplication

[...]

Grey Ballard¹, James Demmel¹, Olga Holtz¹, Benjamin Lipshitz¹, Oded Schwartz¹•Institutions (1)

University of California, Berkeley¹

25 Jun 2012

TL;DR: In this article, a new parallel algorithm based on Strassen's fast matrix multiplication algorithm is presented, which is communication-optimal and exhibits perfect strong scaling within the maximum possible range.

Abstract: Parallel matrix multiplication is one of the most studied fundamental problems in distributed and high performance computing. We obtain a new parallel algorithm that is based on Strassen's fast matrix multiplication and minimizes communication. The algorithm outperforms all known parallel matrix multiplication algorithms, classical and Strassen-based, both asymptotically and in practice. A critical bottleneck in parallelizing Strassen's algorithm is the communication between the processors. Ballard, Demmel, Holtz, and Schwartz (SPAA '11) prove lower bounds on these communication costs, using expansion properties of the underlying computation graph. Our algorithm matches these lower bounds, and so is communication-optimal. It exhibits perfect strong scaling within the maximum possible range. Benchmarking our implementation on a Cray XT4, we obtain speedups over classical and Strassen-based algorithms ranging from 24% to 184% for a fixed matrix dimension n=94080, where the number of processors ranges from 49 to 7203. Our parallelization approach generalizes to other fast matrix multiplication algorithms.
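
For reference, the serial recursion underlying this work is Strassen's scheme, which forms seven half-size products instead of eight. A minimal Python sketch, assuming square matrices whose size is a power of two and a small classical base case (`cutoff`, an illustrative parameter); it says nothing about how the paper distributes the work across processors to reach its communication bounds.

```python
def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def split(M):
    """Return the four quadrants M11, M12, M21, M22."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])

def strassen(A, B, cutoff=2):
    """Multiply n x n matrices (n a power of two) with Strassen's recursion."""
    n = len(A)
    if n <= cutoff:  # classical multiplication at the leaves
        return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
                for i in range(n)]
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    # Seven recursive products instead of eight.
    M1 = strassen(add(A11, A22), add(B11, B22), cutoff)
    M2 = strassen(add(A21, A22), B11, cutoff)
    M3 = strassen(A11, sub(B12, B22), cutoff)
    M4 = strassen(A22, sub(B21, B11), cutoff)
    M5 = strassen(add(A11, A12), B22, cutoff)
    M6 = strassen(sub(A21, A11), add(B11, B12), cutoff)
    M7 = strassen(sub(A12, A22), add(B21, B22), cutoff)
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)
    top = [r1 + r2 for r1, r2 in zip(C11, C12)]
    bottom = [r1 + r2 for r1, r2 in zip(C21, C22)]
    return top + bottom

A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
I4 = [[int(i == j) for j in range(4)] for i in range(4)]
print(strassen(A, I4) == A)  # True
```

Applied recursively, seven products per level give $O(n^{\log_2 7})$ arithmetic; the extra additions and irregular data dependencies are what make the communication pattern harder to parallelize than the classical algorithm.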

124 citations

Journal Article•10.1080/10508406.2011.611445•

When the Classroom Floor Becomes the Complex Plane: Addition and Multiplication as Ways of Bodily Navigation.

[...]

Ricardo Nemirovsky¹, Chris Rasmussen¹, George Sweeney¹, Megan Wawro²•Institutions (2)

San Diego State University¹, Virginia Tech²

01 Apr 2012-The Journal of the Learning Sciences

TL;DR: In this article, the authors contribute a perspective on mathematical embodied cognition consistent with a phenomenological understanding of perception and body motion, based on the analysis of 4 selected episodes in 1 session of an undergraduate mathematics class.

Abstract: In this article we contribute a perspective on mathematical embodied cognition consistent with a phenomenological understanding of perception and body motion. It is based on the analysis of 4 selected episodes in 1 session of an undergraduate mathematics class. The theme of this particular class session was the geometric interpretation of the addition and multiplication of complex numbers. On the basis of these episodes, the article examines 2 conjectures: (a) The mathematical insights developed by an individual or a group are expressed in and constituted by perceptuo-motor activity, and (b) the learning of mathematical ideas is shaped in nondeterministic ways by the setting or learning environment.

115 citations

Posted Content•

Communication-Optimal Parallel Algorithm for Strassen's Matrix Multiplication

[...]

Grey Ballard¹, James Demmel¹, Olga Holtz¹, Benjamin Lipshitz¹, Oded Schwartz¹•Institutions (1)

University of California, Berkeley¹

14 Feb 2012-arXiv: Data Structures and Algorithms

TL;DR: A new parallel algorithm that is based on Strassen's fast matrix multiplication and minimizes communication is obtained, and it exhibits perfect strong scaling within the maximum possible range.

Abstract: Parallel matrix multiplication is one of the most studied fundamental problems in distributed and high performance computing. We obtain a new parallel algorithm that is based on Strassen's fast matrix multiplication and minimizes communication. The algorithm outperforms all known parallel matrix multiplication algorithms, classical and Strassen-based, both asymptotically and in practice. A critical bottleneck in parallelizing Strassen's algorithm is the communication between the processors. Ballard, Demmel, Holtz, and Schwartz (SPAA'11) prove lower bounds on these communication costs, using expansion properties of the underlying computation graph. Our algorithm matches these lower bounds, and so is communication-optimal. It exhibits perfect strong scaling within the maximum possible range. Benchmarking our implementation on a Cray XT4, we obtain speedups over classical and Strassen-based algorithms ranging from 24% to 184% for a fixed matrix dimension n=94080, where the number of nodes ranges from 49 to 7203. Our parallelization approach generalizes to other fast matrix multiplication algorithms.

109 citations

Proceedings Article•10.1109/FOCS.2012.80•

Faster Algorithms for Rectangular Matrix Multiplication

[...]

François Le Gall¹•Institutions (1)

University of Tokyo¹

05 Apr 2012-arXiv: Data Structures and Algorithms

TL;DR: In this article, it was shown that α > 0.30298, which improves the previous record of α > 0.29462 by Coppersmith.

Abstract: Let $\alpha$ be the maximal value such that the product of an $n\times n^{\alpha}$ matrix by an $n^{\alpha}\times n$ matrix can be computed with $n^{2+o(1)}$ arithmetic operations. In this paper we show that $\alpha>0.30298$, which improves the previous record $\alpha>0.29462$ by Coppersmith (Journal of Complexity, 1997). More generally, we construct a new algorithm for multiplying an $n\times n^k$ matrix by an $n^k\times n$ matrix, for any value $k\neq 1$. The complexity of this algorithm is better than all known algorithms for rectangular matrix multiplication. In the case of square matrix multiplication (i.e., for $k=1$), we recover exactly the complexity of the algorithm by Coppersmith and Winograd (Journal of Symbolic Computation, 1990). These new upper bounds can be used to improve the time complexity of several known algorithms that rely on rectangular matrix multiplication. For example, we directly obtain a $O(n^{2.5302})$-time algorithm for the all-pairs shortest paths problem over directed graphs with small integer weights, improving over the $O(n^{2.575})$-time algorithm by Zwick (JACM 2002), and also improve the time complexity of sparse square matrix multiplication.

101 citations

Journal Article•10.1109/TVLSI.2011.2158595•

Efficient FPGA Implementations of Point Multiplication on Binary Edwards and Generalized Hessian Curves Using Gaussian Normal Basis

[...]

Reza Azarderakhsh¹, Arash Reyhani-Masoleh¹•Institutions (1)

University of Western Ontario¹

01 Aug 2012-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: This is the first FPGA implementation of point multiplication on binary Edwards and generalized Hessian curves represented by ω-coordinates, and it is demonstrated how parallelization in higher levels can be performed by full resource utilization of computing point addition and point-doubling formulas.

Abstract: Efficient implementation of point multiplication is crucial for elliptic curve cryptographic systems. This paper presents the implementation results of an elliptic curve crypto-processor over binary fields GF(2^m) on binary Edwards and generalized Hessian curves using Gaussian normal basis (GNB). We demonstrate how parallelization in higher levels can be performed by full resource utilization of computing point addition and point-doubling formulas for both binary Edwards and generalized Hessian curves. Then, we employ the ω-coordinate differential formulations for computing point multiplication. Using a lookup-table (LUT)-based pipelined and efficient digit-level GNB multiplier, we evaluate the LUT complexity and time-area tradeoffs of the proposed crypto-processor on an FPGA. We also compare the implementation results of point multiplication on these curves with the ones on the traditional binary generic curve. To the best of the authors' knowledge, this is the first FPGA implementation of point multiplication on binary Edwards and generalized Hessian curves represented by ω-coordinates.
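
Point multiplication $kP$ is built from repeated point doublings and additions. The sketch below shows the left-to-right double-and-add method in Python on a toy short-Weierstrass curve $y^2 = x^3 + 2x + 2$ over GF(17) in affine coordinates. It is an assumption-laden illustration only: it is not the paper's binary Edwards or generalized Hessian arithmetic over GF(2^m), it does not use ω-coordinates, it is not constant-time, and the curve parameters are textbook toy values.

```python
# Toy curve y^2 = x^3 + A*x + B over GF(P_MOD); parameters are illustrative only.
P_MOD, A, B = 17, 2, 2
G = (5, 1)  # a point of order 19 on this curve

def ec_add(p1, p2):
    """Affine point addition/doubling; None represents the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None  # P + (-P) = infinity
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P_MOD) % P_MOD  # tangent slope
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, P_MOD) % P_MOD  # chord slope
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)

def scalar_mult(k, point):
    """Left-to-right double-and-add for k >= 0 (not constant-time)."""
    result = None
    for bit in bin(k)[2:]:
        result = ec_add(result, result)     # double
        if bit == "1":
            result = ec_add(result, point)  # add
    return result

print(scalar_mult(2, G))   # (6, 3)
print(scalar_mult(19, G))  # None
```

Hardware designs such as the one above typically replace the data-dependent branch with a regular ladder of doubling and addition steps and avoid field inversions by working in projective-style coordinates; this sketch keeps the simplest form for readability.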

84 citations

Journal Article•10.1093/COMJNL/BXR119•

RNS-Based Elliptic Curve Point Multiplication for Massive Parallel Architectures

[...]

Samuel Antao¹, Jean-Claude Bajard², Leonel Sousa¹•Institutions (2)

Technical University of Lisbon¹, Pierre-and-Marie-Curie University²

01 May 2012-The Computer Journal

TL;DR: An experimental analysis of the scalability, based on OpenCL descriptions of the proposed algorithms, suggests that further advantage can be obtained from the proposed RNS approach on GPUs, and for EC curves over smaller underlying finite fields, relative to implementations on general-purpose multi-cores.

Abstract: Acceleration of cryptographic applications on massively parallel computing platforms, such as Graphics Processing Units (GPUs), is a real challenge for practical implementations. In this paper, we propose a parallel algorithm for Elliptic Curve (EC) point multiplication in order to compute EC cryptography on these platforms. The proposed approach relies on the Residue Number System (RNS) to extract parallelism from high-precision integer arithmetic. Results suggest a maximum throughput of 9827 EC multiplications per second and a minimum latency of 29.2 ms for a 224-bit underlying field on a commercial NVIDIA GTX 285 GPU. Performance up to an order of magnitude better in latency and up to 122% better in throughput is achieved compared with other approaches reported in the related art. An experimental analysis of the scalability, based on OpenCL descriptions of the proposed algorithms, suggests that further advantage can be obtained from the proposed RNS approach on GPUs, and for EC curves over smaller underlying finite fields, relative to implementations on general-purpose multi-cores.
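
In an RNS, an integer is held as residues modulo several small, pairwise coprime moduli, so a multiplication splits into independent channels that map naturally onto parallel threads. A minimal Python sketch, assuming an illustrative four-prime base; it shows only channel-wise multiplication and Chinese Remainder reconstruction, not the RNS modular reduction that a 224-bit field implementation also needs.

```python
from math import prod

MODULI = (13, 17, 19, 23)  # pairwise coprime base, chosen for illustration

def to_rns(x, moduli=MODULI):
    """Represent x by its residues modulo each base element."""
    return tuple(x % m for m in moduli)

def rns_mul(a, b, moduli=MODULI):
    """Channel-wise product; each channel is independent, hence parallelizable."""
    return tuple(x * y % m for x, y, m in zip(a, b, moduli))

def from_rns(residues, moduli=MODULI):
    """Chinese Remainder Theorem reconstruction into [0, prod(moduli))."""
    M = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)
    return x % M

a, b = to_rns(123), to_rns(456)
print(from_rns(rns_mul(a, b)))  # 56088
```

The result is exact only while the true product stays below the dynamic range `prod(MODULI)` (96577 here); practical RNS designs choose a base wide enough for the field size and interleave products with reductions.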

Book Chapter•10.1007/978-3-642-35999-6_11•

Efficient Implementation of Bilinear Pairings on ARM Processors

[...]

Gurleen Grewal¹, Reza Azarderakhsh¹, Patrick Longa², Shi Hu³, David Jao¹•Institutions (3)

University of Waterloo¹, Microsoft², Stanford University³

15 Aug 2012

TL;DR: This paper investigates the efficient computation of the Optimal-Ate pairing over Barreto-Naehrig curves in software at different security levels on ARM processors, exploiting state-of-the-art techniques and proposing new optimizations to speed up the computation in the tower field and curve arithmetic.

Abstract: As hardware capabilities increase, low-power devices such as smartphones represent a natural environment for the efficient implementation of cryptographic pairings. Few works in the literature have considered such platforms despite their growing importance in a post-PC world. In this paper, we investigate the efficient computation of the Optimal-Ate pairing over Barreto-Naehrig curves in software at different security levels on ARM processors. We exploit state-of-the-art techniques and propose new optimizations to speed up the computation in the tower field and curve arithmetic. In particular, we extend the concept of lazy reduction to inversion in extension fields, analyze an efficient alternative for the sparse multiplication used inside Miller's algorithm, and further reduce the cost of point/line evaluation formulas in affine and projective homogeneous coordinates. In addition, we study the efficiency of using M-type sextic twists in the pairing computation and carry out a detailed comparison between affine and projective coordinate systems. Our implementations on various mass-market smartphones and tablets significantly improve the state of the art of pairing computation on ARM-powered devices, outperforming the best previous results in the literature by at least a factor of 3.7.

Proceedings Article•10.1109/INPAR.2012.6339602•

Efficient sparse matrix-vector multiplication on cache-based GPUs

[...]

Istvan Reguly¹, Michael B. Giles²•Institutions (2)

Pázmány Péter Catholic University¹, University of Oxford²

13 May 2012

TL;DR: This paper discusses efficient implementations of sparse matrix-vector multiplication on NVIDIA's Fermi architecture, the first to introduce conventional L1 caches to GPUs, and focuses on the compressed sparse row (CSR) format for developing general purpose code.

Abstract: Sparse matrix-vector multiplication is an integral part of many scientific algorithms. Several studies have shown that it is a bandwidth-limited operation on current hardware. On cache-based architectures the main factors that influence performance are spatial locality in accessing the matrix, and temporal locality in re-using the elements of the vector.
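
The compressed sparse row (CSR) format stores a row-pointer array, column indices and values. In the kernel, matrix entries are streamed contiguously (spatial locality), while the reads `x[col_idx[k]]` are irregular, which is where reuse of vector elements (temporal locality) matters. A serial Python sketch for reference only; GPU kernels instead assign rows to threads or warps, which this does not model.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """y = A @ x for A stored in compressed sparse row (CSR) format."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        acc = 0.0
        # Nonzeros of row i occupy values[row_ptr[i]:row_ptr[i + 1]].
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]  # irregular gather from x
        y[i] = acc
    return y

# A = [[10, 0, 2], [0, 3, 0], [1, 0, 4]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [10.0, 2.0, 3.0, 1.0, 4.0]
print(csr_spmv(row_ptr, col_idx, values, [1.0, 2.0, 3.0]))  # [16.0, 6.0, 13.0]
```

Each nonzero is used exactly once, so the kernel performs two floating-point operations per value read from memory, which is why it tends to be bandwidth-limited.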

Proceedings Article•10.1109/ISCAS.2012.6272072•

Pipelined adder graph optimization for high speed multiple constant multiplication

[...]

Martin Kumm¹, Peter Zipf¹, Mathias Faust², Chip-Hong Chang²•Institutions (2)

University of Kassel¹, Nanyang Technological University²

20 May 2012

TL;DR: RPAG outperforms previous methods which are based on pipelining the solutions of conventional MCM algorithms and often produces better results compared to the prominent Hcub algorithm with minimal total AD constraint.

Abstract: This paper addresses the direct optimization of pipelined adder graphs (PAGs) for high speed multiple constant multiplication (MCM). The optimization opportunities are described and a definition of the pipelined multiple constant multiplication (PMCM) problem is given. It is shown that the PMCM problem is a generalization of the MCM problem with limited adder depth (AD). A novel algorithm to solve the PMCM problem heuristically, called RPAG, is presented. RPAG outperforms previous methods which are based on pipelining the solutions of conventional MCM algorithms. A flexible cost evaluation is used which enables the optimization for FPGA or ASIC targets on high or low abstraction levels. Results for both technologies are given and compared with the most recent methods. Even for the special case of limited AD it is shown that RPAG often produces better results compared to the prominent Hcub algorithm with minimal total AD constraint.
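
Multiple constant multiplication replaces general multipliers by a shared network of shifts and adders. The small hand-built Python example below assumes the constant set {3, 7, 21, 45}, chosen for illustration. Every value in a stage depends only on the previous stage, as in a pipelined adder graph, and plain pass-through entries stand in for pipeline registers. It is not the output of RPAG or Hcub.

```python
def mcm_adder_graph(x):
    """Compute 3x, 7x, 21x and 45x using only shifts, additions and subtractions.

    Two stages (adder depth 2): stage-2 values use only stage-1 outputs.
    """
    # Stage 1: three adders/subtractors; x itself is passed through a register.
    s1 = {
        1: x,
        3: (x << 1) + x,   # 2x + x
        5: (x << 2) + x,   # 4x + x
        7: (x << 3) - x,   # 8x - x
    }
    # Stage 2: two adders; 3x and 7x are passed through registers.
    return {
        3: s1[3],
        7: s1[7],
        21: (s1[5] << 2) + s1[1],  # 20x + x
        45: (s1[5] << 3) + s1[5],  # 40x + 5x
    }

print(mcm_adder_graph(10))  # {3: 30, 7: 70, 21: 210, 45: 450}
```

Sharing the intermediate 5x between the two stage-2 outputs is the kind of reuse that MCM heuristics search for; in a pipelined design the register cost of pass-through values also counts, which is why the paper optimizes the pipelined graph directly.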

Journal Article•10.1109/TC.2010.276•

Block Recombination Approach for Subquadratic Space Complexity Binary Field Multiplication Based on Toeplitz Matrix-Vector Product

[...]

M.A. Hasan¹, Nicolas Méloni, A. H. Namin², Christophe Negre³•Institutions (3)

University of Waterloo¹, Advanced Micro Devices², Dali University³

01 Feb 2012-IEEE Transactions on Computers

TL;DR: This paper presents a new method for parallel binary finite field multiplication which results in subquadratic space complexity and shows that block recombination can be used for efficient implementation of the GHASH function of Galois Counter Mode (GCM).

Abstract: In this paper, we present a new method for parallel binary finite field multiplication which results in subquadratic space complexity. The method is based on decomposing the building blocks of the Fan-Hasan subquadratic Toeplitz matrix-vector multiplier. We reduce the space complexity of their architecture by recombining the building blocks. In comparison to other similar schemes available in the literature, our proposal presents a better space complexity while having the same time complexity. We also show that block recombination can be used for efficient implementation of the GHASH function of Galois Counter Mode (GCM).
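
Subquadratic Toeplitz multipliers of this kind rest on splitting a Toeplitz matrix-vector product (TMVP) over GF(2) into three half-size TMVPs instead of four. Writing $T = \begin{bmatrix} T_1 & T_0 \\ T_2 & T_1 \end{bmatrix}$ and $V = [V_0; V_1]$, the products $P_0 = (T_0 + T_1)V_1$, $P_1 = (T_2 + T_1)V_0$ and $P_2 = T_1(V_0 + V_1)$ give $TV = [P_2 + P_0;\, P_2 + P_1]$, since addition is XOR. A Python sketch under stated assumptions: the $n \times n$ Toeplitz matrix is encoded as a list `d` of its $2n-1$ diagonals with $T_{ij} = d_{i-j+n-1}$, and bits are Python ints. It illustrates the split only, not the block recombination contributed by the paper.

```python
def tmvp_naive(d, v):
    """T @ v over GF(2), where T[i][j] = d[i - j + n - 1]."""
    n = len(v)
    return [sum(d[i - j + n - 1] & v[j] for j in range(n)) % 2 for i in range(n)]

def xor(a, b):
    return [x ^ y for x, y in zip(a, b)]

def tmvp_split(d, v):
    """Recursive two-way split TMVP over GF(2): three half-size products."""
    n = len(v)
    if n % 2 or n <= 2:
        return tmvp_naive(d, v)
    h = n // 2
    t0 = d[0:2 * h - 1]        # upper-right block
    t1 = d[h:3 * h - 1]        # both diagonal blocks (identical)
    t2 = d[2 * h:4 * h - 1]    # lower-left block
    v0, v1 = v[:h], v[h:]
    p0 = tmvp_split(xor(t0, t1), v1)
    p1 = tmvp_split(xor(t2, t1), v0)
    p2 = tmvp_split(t1, xor(v0, v1))
    return xor(p2, p0) + xor(p2, p1)

d = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0]  # 2n - 1 = 15 diagonals
v = [1, 1, 0, 1, 0, 0, 1, 1]
print(tmvp_split(d, v) == tmvp_naive(d, v))  # True
```

Because the sum of two Toeplitz matrices is Toeplitz, the split applies recursively, giving $O(n^{\log_2 3})$ bit operations instead of $O(n^2)$.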

Journal Article•10.1037/A0025056•

Retrieval-induced forgetting of arithmetic facts.

[...]

Jamie I. D. Campbell¹, Valerie A. Thompson¹•Institutions (1)

University of Saskatchewan¹

01 Jan 2012-Journal of Experimental Psychology: Learning, Memory and Cognition

TL;DR: The results support the view that addition and multiplication facts are stored in an interrelated semantic network and that RIF of competing addition facts is an intrinsic process of multiplication fact retrieval.

Abstract: Retrieval-induced forgetting (RIF) is a widely studied phenomenon of human memory, but RIF of arithmetic facts remains relatively unexplored. In 2 experiments, we investigated RIF of simple addition facts (2 + 3 = 5) from practice of their multiplication counterparts (2 × 3 = 6). In both experiments, robust RIF expressed in response times occurred only for high-strength small-number addition facts with sums ≤ 10, indicating that RIF from multiplication practice was interference dependent. RIF of addition-fact memory was produced by multiplication retrieval (2 × 3 = ?) but not multiplication study (2 × 3 = 6), supporting an inhibitory mechanism of RIF in arithmetic memory. Finally, RIF occurred with multiplication practiced in word format (three × four) and addition tested later in digit format (3 + 4), which provides evidence that digit and written-word formats for arithmetic accessed a common semantic retrieval network. The results support the view that addition and multiplication facts are stored in an interrelated semantic network and that RIF of competing addition facts is an intrinsic process of multiplication fact retrieval.

Patent•

Universal FPGA/ASIC matrix-vector multiplication architecture

[...]

John D. Davis¹, Eric S. Chung¹, Srinidhi Kestur¹•Institutions (1)

Microsoft¹

14 Oct 2012

TL;DR: In this patent, a hardware-optimized sparse matrix representation referred to as the Compressed Variable-Length Bit Vector (CVBV) format is used to take advantage of the capabilities of FPGAs and reduce storage and bandwidth requirements across the matrices.

Abstract: A universal single-bitstream FPGA library or ASIC implementation accelerates matrix-vector multiplication and processes multiple matrix encodings, including dense and multiple sparse formats. A hardware-optimized sparse matrix representation, referred to herein as the Compressed Variable-Length Bit Vector (CVBV) format, is used to take advantage of the capabilities of FPGAs and to reduce storage and bandwidth requirements across the matrices compared with those typically achieved when using the Compressed Sparse Row (CSR) format in typical CPU- and GPU-based approaches. Also disclosed is a class of sparse matrix formats that are better suited to FPGA implementations than existing formats, reducing storage and bandwidth requirements. A partitioned CVBV format is described to enable parallel decoding.

Journal Article•10.5430/JCT.V2N1P10•

Children’s Conceptions of Area Measurement and Their Strategies for Solving Area Measurement Problems

[...]

Hsin-Mei E. Huang¹, Klaus G. Witz²•Institutions (2)

Taipei Municipal University of Education¹, University of Illinois at Urbana–Champaign²

23 Dec 2012-Journal of Curriculum and Teaching

TL;DR: This paper investigated children's understanding of area measurement, including the concept of area and the area formula of a rectangle, as well as their strategic knowledge for solving area measurement problems, and found that the children who had a good understanding of the concepts of the area and formula exhibited competency in identifying geometric shapes, using formulas for determining areas, and self correcting mistakes.

Proceedings Article•10.1109/ACCT.2012.43•

Design and Simulation of 32-Point FFT Using Radix-2 Algorithm for FPGA Implementation

[...]

Asmita Haveliya¹•Institutions (1)

Amity University¹

7 Jan 2012

TL;DR: The synthesis results show that the computation for calculating the 32-point Fast Fourier transform is efficient in terms of speed.

Abstract: The Fast Fourier Transform (FFT) is one of the fundamental operations in the field of digital signal and image processing. Important applications of the FFT include signal analysis, sound filtering, data compression, partial differential equations, multiplication of large integers and image filtering. The FFT is an efficient implementation of the discrete Fourier transform (DFT). This paper concentrates on the development of the FFT based on the decimation-in-time (DIT) radix-2 algorithm. The design is described in VHDL and synthesized with the Xilinx Synthesis Tool on a Virtex kit. The FFT input is supplied from a PS/2 keyboard using a test bench, and the output is displayed as waveforms in the Xilinx Design Suite 12.1. The synthesis results show that the computation of the 32-point FFT is efficient in terms of speed.
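
The radix-2 decimation-in-time structure splits the input into even- and odd-indexed samples, transforms each half, and combines them with butterflies weighted by twiddle factors $e^{-2\pi i k/N}$. A recursive Python sketch, which could serve as a software reference model when checking a 32-point hardware design; it is not the paper's VHDL architecture.

```python
import cmath

def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])  # DFT of even-indexed samples
    odd = fft_radix2(x[1::2])   # DFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor times odd term
        out[k] = even[k] + tw            # butterfly: upper output
        out[k + n // 2] = even[k] - tw   # butterfly: lower output
    return out

impulse = [1.0] + [0.0] * 31  # 32-point unit impulse
print(all(abs(v - 1) < 1e-12 for v in fft_radix2(impulse)))  # True
```

A 32-point transform uses $\log_2 32 = 5$ butterfly stages of 16 butterflies each, which is the structure a hardware implementation typically unrolls or pipelines.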

Journal Article•10.1007/S10649-011-9330-5•

The inverse relation between multiplication and division: Concepts, procedures, and a cognitive framework

[...]

Katherine M. Robinson¹, Jo-Anne LeFevre²•Institutions (2)

University of Regina¹, Carleton University²

01 Mar 2012-Educational Studies in Mathematics

TL;DR: This article reviewed research on children and adults' use of shortcut procedures that make use of the inverse relation between multiplication and division on two kinds of problems: inversion problems and associativity problems.

Abstract: Researchers have speculated that children find it more difficult to acquire conceptual understanding of the inverse relation between multiplication and division than that between addition and subtraction. We reviewed research on children and adults’ use of shortcut procedures that make use of the inverse relation on two kinds of problems: inversion problems (e.g., $9 \times 24 \div 24$) and associativity problems (e.g., $9 \times 24 \div 8$). Both can be solved more easily if the division of the second and third numbers is performed before the multiplication of the first and second numbers. The findings we reviewed suggest that understanding and use of the inverse relation between multiplication and division develops relatively slowly and is difficult for both children and adults to implement in shortcut procedures if they are not flexible problem solvers. We use the findings to expand an existing model, highlight some similarities and differences in solvers’ use of conceptual knowledge across operations, and discuss educational implications of the findings.

Book Chapter•10.1007/978-3-642-30023-3_15•

Automatic Differentiation Through the Use of Hyper-Dual Numbers for Second Derivatives

[...]

Jeffrey A. Fike¹, Juan J. Alonso¹•Institutions (1)

Stanford University¹

1 Jan 2012

TL;DR: One particular number system is developed, termed hyper-dual numbers, which produces exact first- and second-derivative information, which is demonstrated on an unstructured, parallel, unsteady Reynolds-Averaged Navier-Stokes solver.

Abstract: Automatic Differentiation techniques are typically derived based on the chain rule of differentiation. Other methods can be derived based on the inherent mathematical properties of generalized complex numbers that enable first-derivative information to be carried in the non-real part of the number. These methods are capable of producing effectively exact derivative values. However, when second-derivative information is desired, generalized complex numbers are not sufficient. Higher-dimensional extensions of generalized complex numbers, with multiple non-real parts, can produce accurate second-derivative information provided that multiplication is commutative. One particular number system is developed, termed hyper-dual numbers, which produces exact first- and second-derivative information. The accuracy of these calculations is demonstrated on an unstructured, parallel, unsteady Reynolds-Averaged Navier-Stokes solver.
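
As a minimal sketch (not the authors' implementation, which is embedded in a flow solver), a hyper-dual number can be modelled as $a + b\epsilon_1 + c\epsilon_2 + d\epsilon_1\epsilon_2$ with $\epsilon_1^2 = \epsilon_2^2 = 0$ and commutative $\epsilon_1\epsilon_2 = \epsilon_2\epsilon_1 \neq 0$. Evaluating $f(x + \epsilon_1 + \epsilon_2)$ then yields $f(x)$ in the real part, $f'(x)$ in the $\epsilon_1$ and $\epsilon_2$ parts, and $f''(x)$ in the $\epsilon_1\epsilon_2$ part, with no step size involved. The names `HyperDual` and `derivatives` are hypothetical, and only addition and multiplication are implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HyperDual:
    """a + b*e1 + c*e2 + d*e1*e2, with e1**2 == e2**2 == 0 and e1*e2 == e2*e1."""
    a: float
    b: float = 0.0
    c: float = 0.0
    d: float = 0.0

    def __add__(self, other):
        o = _lift(other)
        return HyperDual(self.a + o.a, self.b + o.b, self.c + o.c, self.d + o.d)

    __radd__ = __add__

    def __mul__(self, other):
        o = _lift(other)
        # Terms containing e1**2 or e2**2 vanish; commutativity lets both
        # cross terms (b*o.c and c*o.b) land in the e1*e2 component.
        return HyperDual(
            self.a * o.a,
            self.a * o.b + self.b * o.a,
            self.a * o.c + self.c * o.a,
            self.a * o.d + self.b * o.c + self.c * o.b + self.d * o.a,
        )

    __rmul__ = __mul__


def _lift(x):
    """Treat a plain number as a hyper-dual number with zero non-real parts."""
    return x if isinstance(x, HyperDual) else HyperDual(float(x))


def derivatives(f, x):
    """Return (f(x), f'(x), f''(x)) by evaluating f at x + e1 + e2."""
    h = f(HyperDual(x, 1.0, 1.0, 0.0))
    return h.a, h.b, h.d


# f(x) = x**3 + 2x, so f(2) = 12, f'(2) = 14, f''(2) = 12
print(derivatives(lambda x: x * x * x + 2 * x, 2.0))  # prints (12.0, 14.0, 12.0)
```

Supporting other elementary functions would mean adding methods that apply the corresponding Taylor expansion to each component.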

Journal Article•10.1016/J.PHYSA.2012.03.018•

Effects of aspiration on public cooperation in structured populations

Han-Xin Yang¹, Zhihai Rong², Pei-Min Lu¹, Yong-Zhi Zeng¹•Institutions (2)

Fuzhou University¹, Donghua University²

01 Aug 2012-Physica A-statistical Mechanics and Its Applications

TL;DR: In this paper, the authors introduce a deterministic win-stay-lose-shift rule into the spatial public goods game, according to which a player will change its current strategy only if its payoff is below a predefined aspiration level.

Abstract: We introduce a deterministic win-stay-lose-shift rule into the spatial public goods game, according to which a player will change its current strategy only if its payoff is below a predefined aspiration level. Simulation results on the square lattice and scale-free network indicate that the aspiration level greatly affects the evolution of cooperation. For small multiplication factors, the frequency of cooperation increases to 0.5 as the aspiration level increases. For large multiplication factors, intermediate levels of aspiration prove optimal for the successful evolution of public cooperation. Some qualitative analyses are provided to explain the above results. In addition, we find that there is a ping-pong vibration of cooperation at some specific values of multiplication factors and aspiration levels.
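
The deterministic update rule can be sketched as follows. The sketch assumes a common setup for the spatial public goods game that the abstract does not spell out: an L x L square lattice with periodic boundaries, each player belonging to the five groups centred on itself and its four neighbours, cooperators contributing 1 to each group, each pot multiplied by the factor r and shared equally, and all players updating synchronously. The function names and parameters are hypothetical; the scale-free network case is not covered.

```python
# Offsets of a site and its four neighbours; each one is the centre of a group.
OFFSETS = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]


def pgg_payoffs(grid, r):
    """Total payoff of each player; grid[i][j] is 1 (cooperator) or 0 (defector)."""
    L, g = len(grid), len(OFFSETS)
    # Number of cooperators in the group centred on each site
    n_coop = [[sum(grid[(i + di) % L][(j + dj) % L] for di, dj in OFFSETS)
               for j in range(L)] for i in range(L)]
    payoff = [[0.0] * L for _ in range(L)]
    for i in range(L):
        for j in range(L):
            # Player (i, j) takes part in the groups centred on itself and its neighbours
            for di, dj in OFFSETS:
                share = r * n_coop[(i + di) % L][(j + dj) % L] / g
                payoff[i][j] += share - grid[i][j]  # cooperators pay 1 per group
    return payoff


def wsls_step(grid, r, aspiration):
    """Win-stay-lose-shift: switch strategy only if payoff is below the aspiration level."""
    payoff = pgg_payoffs(grid, r)
    return [[1 - s if p < aspiration else s for s, p in zip(row, prow)]
            for row, prow in zip(grid, payoff)]


all_defect = [[0] * 4 for _ in range(4)]
print(wsls_step(all_defect, r=3.0, aspiration=1.0)[0])  # [1, 1, 1, 1]
```

On an all-defector lattice every payoff is 0, so any positive aspiration level makes every player switch at once; a full simulation would iterate `wsls_step` from a random initial lattice and record the fraction of cooperators.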

Proceedings Article•10.1109/ACSSC.2012.6489164•

Imprecise arithmetic for low power image processing

Pietro Albicocco¹, Gian Carlo Cardarilli¹, Alberto Nannarelli, Massimo Petricca¹, Marco Re¹•Institutions (1)

University of Rome Tor Vergata¹

1 Nov 2012

TL;DR: With the proposed “sloppy” operations, this work obtains a reduction in delay, area and power dissipation, and the error introduced is still acceptable for applications such as image processing.

Abstract: Sometimes reducing the precision of a numerical processor, by introducing errors, can lead to significant performance (delay, area and power dissipation) improvements without compromising the overall quality of the processing. In this work, we show how to perform the two basic operations, addition and multiplication, in an imprecise manner by simplifying the hardware implementation. With the proposed “sloppy” operations, we obtain a reduction in delay, area and power dissipation, and the error introduced is still acceptable for applications such as image processing.
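
The abstract does not detail the authors' "sloppy" circuits, so the following is only an illustrative sketch of one well-known way to trade accuracy for simpler addition hardware, not necessarily the paper's scheme: the low k bits are combined with a bitwise OR instead of a carry chain, and no carry is passed into the exactly added upper part. The names `approx_add` and `max_error` are hypothetical. Because $a + b = (a \lor b) + (a \land b)$ bitwise, the error for non-negative operands equals the AND of the two low parts, so it always lies between 0 and $2^k - 1$.

```python
def approx_add(a, b, k):
    """Imprecise addition of non-negative integers.

    The low k bits are OR'ed (no carry chain); the remaining high bits are
    added exactly but receive no carry from the low part.
    """
    mask = (1 << k) - 1
    low = (a | b) & mask
    high = ((a >> k) + (b >> k)) << k
    return high | low


def max_error(width, k):
    """Largest (exact - approximate) over all pairs of width-bit operands."""
    return max(a + b - approx_add(a, b, k)
               for a in range(1 << width) for b in range(1 << width))


print(approx_add(3, 1, 2))  # prints 3 (the exact sum is 4)
print(max_error(6, 3))      # prints 7, i.e. 2**3 - 1
```

Choosing k sets the trade-off: k = 0 gives exact addition, while larger k shortens the carry chain at the cost of a larger worst-case error.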

Journal Article•10.5120/7466-0564•

Design of 4x4 bit Vedic Multiplier using EDA Tool

Pushpalata Verma

30 Jun 2012-International Journal of Computer Applications

TL;DR: A high-speed 4x4-bit Vedic Multiplier (VM) is presented, based on the Vertically and Crosswise method of Vedic mathematics, a general multiplication formula equally applicable to all cases of multiplication.

Abstract: The need for high-speed multipliers is increasing as the need for high-speed processors increases. A multiplier is one of the key hardware blocks in most fast processing systems; it is not only a high-delay block but also a major source of power dissipation. A conventional processor requires substantially more hardware resources and processing time for multiplication than for addition and subtraction. This paper presents a high-speed 4x4-bit Vedic Multiplier (VM) based on the Vertically and Crosswise method of Vedic mathematics, a general multiplication formula equally applicable to all cases of multiplication. It is based on generating all partial products and their sum in one step. The coding is done in VHDL (Very High Speed Integrated Circuit Hardware Description Language), while synthesis and simulation are done using the EDA (Electronic Design Automation) tool Xilinx ISE 12.1i. The combinational path delay of the 4x4-bit Vedic multiplier obtained after synthesis is compared with that of normal multipliers, and the proposed Vedic multiplier circuit appears to have better performance in
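
The Vertically and Crosswise (Urdhva Tiryakbhyam) method forms each column of the product from all crosswise bit products $a_i b_j$ with $i + j$ equal to the column index, then passes the excess on as a carry. The following software model is a sketch under that reading, with the hypothetical function name `vedic_multiply_4x4`; it processes the columns one after another, whereas the hardware described in the abstract generates the partial products and their sum in parallel.

```python
def vedic_multiply_4x4(a, b):
    """Multiply two 4-bit unsigned integers column by column (vertically and crosswise)."""
    if not (0 <= a < 16 and 0 <= b < 16):
        raise ValueError("operands must be 4-bit unsigned integers")
    a_bits = [(a >> i) & 1 for i in range(4)]  # least significant bit first
    b_bits = [(b >> i) & 1 for i in range(4)]
    result, carry = 0, 0
    for k in range(7):  # columns 0..6 of the 8-bit product
        # Crosswise sum: every a_i * b_j with i + j == k, plus the incoming carry
        column = carry + sum(a_bits[i] * b_bits[k - i]
                             for i in range(4) if 0 <= k - i < 4)
        result |= (column & 1) << k  # the column's low bit is product bit k
        carry = column >> 1          # the rest moves to the next column
    return result | (carry << 7)     # the final carry is the top product bit


print(vedic_multiply_4x4(13, 11))  # 143
```

An exhaustive comparison against ordinary multiplication over all 256 operand pairs is a cheap way to check such a model before writing the VHDL.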

Journal Article•10.1109/TC.2011.78•

Efficient Hardware Implementation of Fp-Arithmetic for Pairing-Friendly Curves

Junfeng Fan¹, Frederik Vercauteren¹, Ingrid Verbauwhede¹•Institutions (1)

Katholieke Universiteit Leuven¹

01 May 2012-IEEE Transactions on Computers

TL;DR: A new method is described to speed up $\mathbb{F}_p$-arithmetic in hardware for pairing-friendly curves, such as the well-known Barreto-Naehrig (BN) curves, using Montgomery reduction in a polynomial ring combined with a coefficient reduction phase using a pseudo-Mersenne number.

Abstract: This paper describes a new method to speed up $\mathbb{F}_p$-arithmetic in hardware for pairing-friendly curves, such as the well-known Barreto-Naehrig (BN) curves. We explore the characteristics of the modulus defined by these curves and choose curve parameters such that $\mathbb{F}_p$ multiplication becomes more efficient. The proposed algorithm uses Montgomery reduction in a polynomial ring combined with a coefficient reduction phase using a pseudo-Mersenne number. As an application, we show that the performance of pairings on BN curves in hardware can be significantly improved, resulting in a factor of 2.5 speedup compared with state-of-the-art hardware implementations.
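
The paper's algorithm performs Montgomery reduction on polynomials and then reduces the coefficients with a pseudo-Mersenne number; neither step is reproduced here. As a simpler sketch of the underlying idea, the code below shows ordinary integer Montgomery multiplication modulo an odd number generated from the standard BN polynomial $p(u) = 36u^4 + 36u^3 + 24u^2 + 6u + 1$. The value of `u` is an arbitrary illustrative choice, not a vetted curve parameter (and the resulting modulus is not claimed to be prime), and the names `bn_p` and `Montgomery` are hypothetical.

```python
def bn_p(u):
    """Barreto-Naehrig polynomial p(u) = 36u^4 + 36u^3 + 24u^2 + 6u + 1 (always odd)."""
    return 36 * u**4 + 36 * u**3 + 24 * u**2 + 6 * u + 1


class Montgomery:
    """Montgomery arithmetic modulo an odd p, with R = 2**k > p."""

    def __init__(self, p):
        if p % 2 == 0:
            raise ValueError("Montgomery reduction needs an odd modulus")
        self.p = p
        self.k = p.bit_length()
        self.R = 1 << self.k
        self.mask = self.R - 1
        self.p_neg_inv = (-pow(p, -1, self.R)) % self.R  # -p^{-1} mod R (Python 3.8+)

    def redc(self, t):
        """Return t * R^{-1} mod p for 0 <= t < p * R, using only shifts and masks."""
        m = ((t & self.mask) * self.p_neg_inv) & self.mask
        reduced = (t + m * self.p) >> self.k  # t + m*p is divisible by R
        return reduced - self.p if reduced >= self.p else reduced

    def to_mont(self, x):
        return (x * self.R) % self.p

    def mul(self, x_m, y_m):
        """Product of two values already in Montgomery form, kept in Montgomery form."""
        return self.redc(x_m * y_m)

    def from_mont(self, x_m):
        return self.redc(x_m)


ctx = Montgomery(bn_p(2**20 + 3))  # hypothetical u, for illustration only
a, b = 123456789, 987654321
c = ctx.from_mont(ctx.mul(ctx.to_mont(a), ctx.to_mont(b)))
assert c == a * b % ctx.p
```

The point of the representation is that `redc` replaces division by p with a multiplication, a mask, and a shift, which is what makes the reduction attractive in hardware.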

Posted Content•

Iwasawa theory of Heegner points on abelian varieties of GL_2 type

Benjamin Howard

28 Feb 2012-arXiv: Number Theory

TL;DR: In this paper, the author generalizes an earlier result, one divisibility of Perrin-Riou's Iwasawa main conjecture for Heegner points on elliptic curves, to abelian varieties of GL2-type.

Abstract: In an earlier paper, the author proved one divisibility of Perrin-Riou's Iwasawa main conjecture for Heegner points on elliptic curves. In the present paper, that result is generalized to abelian varieties of GL2-type (i.e. abelian varieties with real multiplication defined over totally real fields) under the hypothesis that the abelian variety is associated to a Hilbert modular form via a construction of Zhang.

Proceedings Article•10.1145/2159430.2159436•

High-performance sparse matrix-vector multiplication on GPUs for structured grid computations

Jeswin Godwin¹, Justin Holewinski¹, P. Sadayappan¹•Institutions (1)

Ohio State University¹

3 Mar 2012

TL;DR: A new sparse matrix storage format is presented that takes advantage of the diagonal structure of matrices for stencil operations on structured grids; it is specifically optimized for the case of higher degrees of freedom, where formats such as DIA are forced to explicitly represent many zero elements in the sparse matrix.

Abstract: In this paper, we address efficient sparse matrix-vector multiplication for matrices arising from structured grid problems with high degrees of freedom at each grid node. Sparse matrix-vector multiplication is a critical step in the iterative solution of sparse linear systems of equations arising in the solution of partial differential equations using uniform grids for discretization. With uniform grids, the resulting linear system Ax = b has a matrix A that is sparse with a very regular structure. The specific focus of this paper is on sparse matrices that have a block structure due to the large number of unknowns at each grid point. Sparse matrix storage formats such as Compressed Sparse Row (CSR) and Diagonal format (DIA) are not the most effective for such matrices. In this work, we present a new sparse matrix storage format that takes advantage of the diagonal structure of matrices for stencil operations on structured grids. In contrast to other formats such as the Diagonal storage format (DIA), we specifically optimize for the case of higher degrees of freedom, where DIA is forced to explicitly represent many zero elements in the sparse matrix. We develop efficient sparse matrix-vector multiplication for structured grid computations on GPU architectures using CUDA [25].
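
For reference, the baseline DIA format mentioned in the abstract stores each nonzero diagonal as a padded array together with its offset from the main diagonal. The sketch below (hypothetical function name `dia_spmv`, plain Python rather than CUDA) shows the resulting matrix-vector product; the authors' new format is not reproduced. The padding entries are the explicitly stored zeros that, according to the abstract, become numerous when each grid node carries many degrees of freedom.

```python
def dia_spmv(offsets, data, x):
    """y = A @ x for A stored in DIA format.

    offsets[d] is the column offset of diagonal d (0 = main diagonal);
    data[d][i] holds A[i][i + offsets[d]], with zero padding where that
    column index falls outside the matrix.
    """
    n = len(x)
    y = [0.0] * n
    for off, diag in zip(offsets, data):
        # Rows are independent, which is what a GPU kernel parallelises
        # (for example, one thread per row looping over the diagonals).
        for i in range(n):
            j = i + off
            if 0 <= j < n:
                y[i] += diag[i] * x[j]
    return y


# 1-D Laplacian stencil [-1, 2, -1] on 4 grid points
offsets = [-1, 0, 1]
data = [[0.0, -1.0, -1.0, -1.0],   # sub-diagonal (row 0 entry is padding)
        [2.0, 2.0, 2.0, 2.0],      # main diagonal
        [-1.0, -1.0, -1.0, 0.0]]   # super-diagonal (row 3 entry is padding)
print(dia_spmv(offsets, data, [1.0, 2.0, 3.0, 4.0]))  # [0.0, 0.0, 0.0, 5.0]
```

Because every diagonal has the same length and x is accessed at a fixed offset, DIA gives regular, coalescable memory access on GPUs, which is why it is a natural starting point for stencil matrices.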

Book Chapter•10.1007/978-3-642-35416-8_5•

Multi-precision Multiplication for Public-Key Cryptography on Embedded Microprocessors

Hwajeong Seo¹, Howon Kim¹•Institutions (1)

Pusan National University¹

16 Aug 2012

TL;DR: This paper proposes a novel method, “consecutive operand caching”, which reduces the number of required load instructions by caching the operands and boosts the speed of multi-precision multiplication by 3.85% compared with previous best known results.

Abstract: In this paper, we revisit the “operand caching” method for multi-precision multiplication, which reduces the number of required load instructions by caching the operands [6]. With this previous method, high multiplication performance can be achieved on modern microprocessors. However, it does not provide full operand caching when changing the row of partial products. To overcome this problem, we propose a novel method, namely “consecutive operand caching”. We divide the partial products and reconstruct them so that previous and new partial products share common operands. As a result, we reduce the number of load instructions and boost the speed of multi-precision multiplication by 3.85% compared with the previous best known results.
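
The loads that operand caching targets can be seen in a plain product-scanning (column-wise) multi-precision multiplication, sketched below over 8-bit words as on a small embedded core. In this baseline every word-by-word partial product reads both operands again; caching schemes such as the one in the paper keep some operands in registers to avoid those reloads. The function names and the simple load counter are hypothetical, and neither operand caching nor its consecutive variant is modelled.

```python
def to_words(x, n, w=8):
    """Split a non-negative integer into n little-endian w-bit words."""
    return [(x >> (w * i)) & ((1 << w) - 1) for i in range(n)]


def from_words(words, w=8):
    """Inverse of to_words."""
    return sum(word << (w * i) for i, word in enumerate(words))


def product_scanning_mul(a, b, w=8):
    """Multiply equal-length word lists column by column.

    Returns (product words, number of operand words read), where every
    partial product a[i] * b[j] reads both of its operands again.
    """
    n = len(a)
    if len(b) != n:
        raise ValueError("operands must have the same number of words")
    mask = (1 << w) - 1
    out = [0] * (2 * n)
    acc, loads = 0, 0  # acc models the multi-word column accumulator
    for k in range(2 * n - 1):
        for i in range(max(0, k - n + 1), min(k, n - 1) + 1):
            acc += a[i] * b[k - i]
            loads += 2
        out[k] = acc & mask  # emit one result word per column
        acc >>= w            # carry the rest into the next column
    out[2 * n - 1] = acc & mask
    return out, loads


x, y = 0x1234_5678_9ABC_DEF0, 0x0FED_CBA9_8765_4321
words, loads = product_scanning_mul(to_words(x, 8), to_words(y, 8))
assert from_words(words) == x * y
print(loads)  # 128 = 2 * 8 * 8 operand reads
```

With n words per operand this baseline performs 2n² operand loads, which is the cost that register caching of operands is designed to cut.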

Journal Article•10.1515/CRELLE.2011.149•

Canonical subgroups over Hilbert modular varieties

Eyal Z. Goren, Payman L Kassaei

01 Jan 2012-Crelle's Journal

TL;DR: The authors obtain new results on the geometry of Hilbert modular varieties in positive characteristic and, using methods of rigid geometry, develop a theory of canonical subgroups for abelian varieties with real multiplication.

Abstract: We obtain new results on the geometry of Hilbert modular varieties in positive characteristic and morphisms between them. Using these results and methods of rigid geometry, we develop a theory of canonical subgroups for abelian varieties with real multiplication. To cite this article: E.Z. Goren, P.L Kassaei, C. R. Acad. Sci. Paris, Ser. I 347 (2009).

Notes on the Truncated Fourier Transform

Joris van der Hoeven

1 Jan 2012

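TL;DR: In this note, the author corrects two errors in an earlier paper that introduced the Truncated Fourier Transform (TFT), a truncated version of the classical Fast Fourier Transform that eliminates the jumps in complexity at powers of two when applied to polynomial multiplication, and gives a new application to the multiplication of polynomials with real coefficients.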

Abstract: In a previous paper [vdH04], we introduced a truncated version of the classical Fast Fourier Transform. When applied to polynomial multiplication, this algorithm has the nice property of eliminating the “jumps” in the complexity at powers of two. When applied to the multiplication of multivariate polynomials or truncated multivariate power series, a non-trivial asymptotic factor was gained with respect to the best previously known algorithms. In the present note, we correct two errors which slipped into the previous paper and we give a new application to the multiplication of polynomials with real coefficients. We also give some further hints on how to implement the TFT in practice.
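
The "jumps" mentioned above come from the ordinary FFT approach, which pads the product to the next power of two, so a product with one coefficient more than a power of two costs about as much as one with nearly twice as many. The sketch below (hypothetical names, a plain recursive radix-2 FFT, integer coefficients recovered by rounding) shows that padding; it is not the TFT, whose point is to avoid paying for the full padded length.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two. The inverse is unscaled."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2], invert), fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft_poly_mul(p, q):
    """Multiply integer polynomials (coefficient lists, lowest degree first).

    Returns (product coefficients, transform length actually used).
    """
    m = len(p) + len(q) - 1  # coefficients needed in the product
    n = 1
    while n < m:             # pad to a power of two: this is the "jump"
        n *= 2
    fp = fft([complex(c) for c in p] + [0j] * (n - len(p)))
    fq = fft([complex(c) for c in q] + [0j] * (n - len(q)))
    prod = fft([u * v for u, v in zip(fp, fq)], invert=True)
    return [round(c.real / n) for c in prod[:m]], n


print(fft_poly_mul([1, 2, 3], [4, 5]))    # ([4, 13, 22, 15], 4)
print(fft_poly_mul([1] * 9, [1] * 9)[1])  # 32: 17 coefficients need a length-32 transform
```

Rounding to integers is only valid for integer inputs of modest size; with real coefficients, as in the note's new application, the results would be kept as floating-point values instead.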
