Top 439 papers published in the topic of Multiplication in 2015

Showing papers on "Multiplication" published in 2015

Posted Content•

Feed-Forward Networks with Attention Can Solve Some Long-Term Memory Problems

Colin Raffel, Daniel P. W. Ellis¹•Institutions (1)

29 Dec 2015-arXiv: Learning

TL;DR: A simplified model of attention is proposed which is applicable to feed-forward neural networks and can solve the synthetic "addition" and "multiplication" long-term memory problems for sequence lengths which are both longer and more widely varying than the best published results for these tasks.

Abstract: We propose a simplified model of attention which is applicable to feed-forward neural networks and demonstrate that the resulting model can solve the synthetic "addition" and "multiplication" long-term memory problems for sequence lengths which are both longer and more widely varying than the best published results for these tasks.

355 citations
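
The attention idea in the entry above can be sketched as a pooling step over time: score each state, normalize the scores with a softmax, and take the weighted average. This is a minimal sketch, not the authors' code. The scoring function `tanh(w . h_t + b)` and the names `attention_pool`, `w`, and `b` are assumptions made for illustration.

```python
import math

def attention_pool(states, w, b):
    """Collapse a sequence of state vectors into one context vector."""
    # Score each timestep independently (assumed form): e_t = tanh(w . h_t + b)
    scores = [math.tanh(sum(wi * hi for wi, hi in zip(w, h)) + b) for h in states]
    # Softmax over time, shifted by the maximum for numerical stability
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted average of the states
    dim = len(states[0])
    context = [sum(a * h[d] for a, h in zip(weights, states)) for d in range(dim)]
    return context, weights
```

Because the output has a fixed size regardless of sequence length, it can feed an ordinary feed-forward layer.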

Journal Article•10.1016/J.INS.2014.08.024•

The arithmetic of discrete Z-numbers

Rafik A. Aliev, Akif V. Alizadeh, Oleg H. Huseynov¹•Institutions (1)

Azerbaijan State Oil Academy¹

01 Jan 2015-Information Sciences

TL;DR: This article addresses computation with Z-numbers, the main critical problem that naturally arises in processing Z-number-based information; the Z-number is a more adequate concept for describing real-world information.

301 citations

Book Chapter•10.1007/978-3-662-46497-7_20•

Graph-Induced Multilinear Maps from Lattices

Craig Gentry¹, Sergey Gorbunov², Shai Halevi¹•Institutions (2)

IBM¹, Massachusetts Institute of Technology²

23 Mar 2015

TL;DR: In this article, a graph-induced multilinear encoding scheme from lattices was proposed, in which the arithmetic operations that are allowed are restricted through an explicitly defined directed graph (somewhat similar to the asymmetric variant of previous schemes).

Abstract: Graded multilinear encodings have found extensive applications in cryptography ranging from non-interactive key exchange protocols, to broadcast and attribute-based encryption, and even to software obfuscation. Despite seemingly unlimited applicability, essentially only two candidate constructions are known (GGH and CLT). In this work, we describe a new graph-induced multilinear encoding scheme from lattices. In a graph-induced multilinear encoding scheme the arithmetic operations that are allowed are restricted through an explicitly defined directed graph (somewhat similar to the “asymmetric variant” of previous schemes). Our construction encodes Learning With Errors (LWE) samples in short square matrices of higher dimensions. Addition and multiplication of the encodings correspond naturally to addition and multiplication of the LWE secrets. Security of the new scheme is not known to follow from LWE hardness (or any other “nice” assumption); at present it requires making new hardness assumptions.

299 citations

Proceedings Article•10.1145/2746539.2746609•

Unifying and Strengthening Hardness for Dynamic Problems via the Online Matrix-Vector Multiplication Conjecture

Monika Henzinger¹, Sebastian Krinninger¹, Danupon Nanongkai², Thatchaphol Saranurak²•Institutions (2)

University of Vienna¹, Royal Institute of Technology²

14 Jun 2015

TL;DR: In this article, the conjecture that there is no truly subcubic (O(n^(3-ε))) time algorithm for the online Boolean matrix-vector multiplication problem is shown to imply tight hardness results for many dynamic problems.

Abstract: Consider the following Online Boolean Matrix-Vector Multiplication problem: We are given an n x n matrix M and will receive n column-vectors of size n, denoted by v_1, ..., v_n, one by one. After seeing each vector v_i, we have to output the product Mv_i before we can see the next vector. A naive algorithm can solve this problem using O(n^3) time in total, and its running time can be slightly improved to O(n^3/log^2 n) [Williams SODA'07]. We show that a conjecture that there is no truly subcubic (O(n^(3-ε))) time algorithm for this problem can be used to exhibit the underlying polynomial time hardness shared by many dynamic problems. For a number of problems, such as subgraph connectivity, Pagh's problem, d-failure connectivity, decremental single-source shortest paths, and decremental transitive closure, this conjecture implies tight hardness results. Thus, proving or disproving this conjecture will be very interesting as it will either imply several tight unconditional lower bounds or break through a common barrier that blocks progress with these problems. This conjecture might also be considered as strong evidence against any further improvement for these problems since refuting it will imply a major breakthrough for combinatorial Boolean matrix multiplication and other long-standing problems if the term "combinatorial algorithms" is interpreted as "Strassen-like algorithms" [Ballard et al. SPAA'11]. The conjecture also leads to hardness results for problems that were previously based on diverse problems and conjectures -- such as 3SUM, combinatorial Boolean matrix multiplication, triangle detection, and multiphase -- thus providing a uniform way to prove polynomial hardness results for dynamic algorithms; some of the new proofs are also simpler or even become trivial. The conjecture also leads to stronger and new, non-trivial, hardness results, e.g., for the fully-dynamic densest subgraph and diameter problems.

277 citations
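
The naive cubic-time baseline mentioned in the abstract above can be sketched as follows. The function name `online_boolean_mv` and the list-of-lists matrix layout are illustrative assumptions. The generator only mimics the online constraint, where each product must be output before the next vector is seen.

```python
def online_boolean_mv(matrix, vectors):
    """Yield the Boolean product M v for each incoming vector, one at a time."""
    n = len(matrix)
    for v in vectors:  # vectors arrive one by one
        # O(n^2) work per vector, hence O(n^3) in total for n vectors
        yield [int(any(matrix[i][j] and v[j] for j in range(n))) for i in range(n)]
```

The conjecture states that no algorithm improves on this total running time by a polynomial factor.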

Journal Article•10.1145/2699470•

Optimizing Sparse Matrix—Matrix Multiplication for the GPU

Steven Dalton¹, Luke N. Olson¹, Nathan Bell²•Institutions (2)

University of Illinois at Urbana–Champaign¹, Google²

26 Oct 2015-ACM Transactions on Mathematical Software

TL;DR: The implementation is fully general and the optimization strategy adaptively processes the SpGEMM workload row-wise to substantially improve performance by decreasing the work complexity and utilizing the memory hierarchy more effectively.

Abstract: Sparse matrix-matrix multiplication (SpGEMM) is a key operation in numerous areas from information to the physical sciences. Implementing SpGEMM efficiently on throughput-oriented processors, such as the graphics processing unit (GPU), requires the programmer to expose substantial fine-grained parallelism while conserving the limited off-chip memory bandwidth. Balancing these concerns, we decompose the SpGEMM operation into three highly parallel phases: expansion, sorting, and contraction, and introduce a set of complementary bandwidth-saving performance optimizations. Our implementation is fully general and our optimization strategy adaptively processes the SpGEMM workload row-wise to substantially improve performance by decreasing the work complexity and utilizing the memory hierarchy more effectively.

146 citations
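
The three phases named in the abstract above (expansion, sorting, contraction) can be sketched sequentially as follows. This is a plain CPU illustration, not the paper's GPU implementation. The `{row: [(col, value), ...]}` layout and the name `spgemm_esc` are assumptions.

```python
def spgemm_esc(a, b):
    """Multiply sparse matrices given as {row: [(col, value), ...]} dicts."""
    # Expansion: emit one (row, col, value) triple per scalar product
    triples = [(i, j, a_ik * b_kj)
               for i, row in a.items()
               for k, a_ik in row
               for j, b_kj in b.get(k, [])]
    # Sorting: bring triples with equal (row, col) next to each other
    triples.sort(key=lambda t: (t[0], t[1]))
    # Contraction: sum each run of duplicates into a single entry
    result = {}
    for i, j, v in triples:
        row = result.setdefault(i, [])
        if row and row[-1][0] == j:
            row[-1] = (j, row[-1][1] + v)
        else:
            row.append((j, v))
    return result
```

Each phase is a bulk operation over independent items, which is what makes the decomposition attractive on throughput-oriented hardware.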

Journal Article•10.1137/15M104253X•

Exploiting Multiple Levels of Parallelism in Sparse Matrix-Matrix Multiplication

Ariful Azad¹, Grey Ballard², Aydin Buluc¹, James Demmel, Laura Grigori, Oded Schwartz³, Sivan Toledo, Samuel Williams•Institutions (3)

Lawrence Berkeley National Laboratory¹, Sandia National Laboratories², Hebrew University of Jerusalem³

03 Oct 2015-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: In this article, the authors present the first implementation of the 3D SpGEMM formulation that exploits multiple (intra-node and inter-node) levels of parallelism, achieving significant speedups over the state-of-the-art publicly available codes at all levels of concurrency.

Abstract: Sparse matrix-matrix multiplication (or SpGEMM) is a key primitive for many high-performance graph algorithms as well as for some linear solvers, such as algebraic multigrid. The scaling of existing parallel implementations of SpGEMM is heavily bound by communication. Even though 3D (or 2.5D) algorithms have been proposed and theoretically analyzed in the flat MPI model on Erdős-Rényi matrices, those algorithms had not been implemented in practice and their complexities had not been analyzed for the general case. In this work, we present the first ever implementation of the 3D SpGEMM formulation that also exploits multiple (intra-node and inter-node) levels of parallelism, achieving significant speedups over the state-of-the-art publicly available codes at all levels of concurrency. We extensively evaluate our implementation and identify bottlenecks that should be subject to further research.

131 citations

Journal Article•10.1007/S10207-014-0271-8•

Secure floating point arithmetic and private satellite collision analysis

Liina Kamm¹, Jan Willemson²•Institutions (2)

University of Tartu¹, Cybernetica²

01 Nov 2015-International Journal of Information Security

TL;DR: In this paper, it is shown that it is possible and indeed feasible to use secure multiparty computation (SMC) for calculating the probability of a collision between two satellites, using basic floating point arithmetic operators (addition and multiplication) for multiparty computations.

Abstract: In this paper, we show that it is possible and, indeed, feasible to use secure multiparty computation (SMC) for calculating the probability of a collision between two satellites. For this purpose, we first describe basic floating point arithmetic operators (addition and multiplication) for multiparty computations. The operators are implemented on the Sharemind SMC engine. We discuss the implementation details, provide methods for evaluating example elementary functions (inverse, square root, exponentiation of e, error function). Using these primitives, we implement a satellite conjunction analysis algorithm and give benchmark results for the primitives as well as the conjunction analysis itself.

116 citations

Journal Article•10.1137/130948811•

GPU-Accelerated Sparse Matrix-Matrix Multiplication by Iterative Row Merging

Felix Gremse, Andreas Höfter, Lars Ole Schwen, Fabian Kiessling, Uwe Naumann

22 Jan 2015-SIAM Journal on Scientific Computing

TL;DR: An algorithm for general sparse matrix-matrix multiplication (SpGEMM) on many-core architectures, such as GPUs, is presented, implemented by iterative row merging, similar to merge sort.

Abstract: We present an algorithm for general sparse matrix-matrix multiplication (SpGEMM) on many-core architectures, such as GPUs. SpGEMM is implemented by iterative row merging, similar to merge sort, exc...

82 citations

Patent•

Multiplication operations in memory

Sanjay Tiwari¹•Institutions (1)

Micron Technology¹

24 Aug 2015

TL;DR: In this paper, the authors provide apparatuses and methods for performing multi-variable bit-length multiplication operations in a memory array, including AND operations, OR operations and shift operations without transferring data via an input/output (IO) line.

Abstract: Examples of the present disclosure provide apparatuses and methods for performing multi-variable bit-length multiplication operations in a memory. An example method comprises performing a multiplication operation on a first vector and a second vector. The first vector includes a number of first elements stored in a group of memory cells coupled to a first access line and a number of sense lines of a memory array. The second vector includes a number of second elements stored in a group of memory cells coupled to a second access line and the number of sense lines of the memory array. The example multiplication operation can include performing a number of AND operations, OR operations and SHIFT operations without transferring data via an input/output (I/O) line.

73 citations
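
How a multiplication can be reduced to bitwise primitives, as the abstract above describes, can be illustrated with the textbook shift-and-add method. This is a software sketch only and does not model the patented in-memory circuit. The function names are hypothetical, and the XOR used for the carry-less sum is an assumption that goes beyond the AND, OR, and SHIFT operations the abstract lists.

```python
def add_bits(x, y):
    """Add two non-negative ints with bitwise operations only."""
    while y:
        carry = (x & y) << 1  # AND finds the carries, SHIFT moves them up
        x = x ^ y             # XOR (an assumption here) gives the carry-less sum
        y = carry
    return x

def shift_and_add_multiply(a, b):
    """Multiply non-negative ints a and b by shift-and-add."""
    product = 0
    while b:
        if b & 1:             # AND tests the lowest multiplier bit
            product = add_bits(product, a)
        a <<= 1               # next partial product
        b >>= 1               # next multiplier bit
    return product
```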

Patent•

Secure Computation Using a Server Module

Mariana Raykova¹, Seny Kamara¹•Institutions (1)

Microsoft¹

19 Oct 2015

TL;DR: In this article, a server module evaluates a circuit based on concealed inputs provided by respective participant modules, to provide a concealed output, so that no party to the transaction (including the server module) discovers any other party's non-concealed inputs.

Abstract: A server module evaluates a circuit based on concealed inputs provided by respective participant modules, to provide a concealed output. By virtue of this approach, no party to the transaction (including the server module) discovers any other party's non-concealed inputs. In a first implementation, the server module evaluates a garbled Boolean circuit. This implementation also uses a three-way oblivious transfer technique to provide a concealed input from one of the participant modules to the server module. In a second implementation, the server module evaluates an arithmetic circuit based on ciphertexts that have been produced using a fully homomorphic encryption technique. This implementation modifies multiplication operations that are performed in the evaluation of the arithmetic circuit by a modifier factor; this removes bounds placed on the number of the multiplication operations that can be performed.

69 citations

Journal Article•10.1109/TVLSI.2014.2355854•

High-Throughput Modular Multiplication and Exponentiation Algorithms Using Multibit-Scan–Multibit-Shift Technique

Abdalhossein Rezai¹, Parviz Keshavarzi¹•Institutions (1)

Semnan University¹

01 Sep 2015-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: A new and efficient Montgomery modular multiplication architecture based on a new digit serial computation that relaxes the high-radix partial multiplication to a binary multiplication and performs several multiplications of consecutive zero bits in one clock cycle instead of several clock cycles is presented.

Abstract: Modular exponentiation with a large modulus and exponent is a fundamental operation in many public-key cryptosystems. This operation is usually accomplished by repeating modular multiplications. Montgomery modular multiplication has been widely used to relax the quotient determination. The carry–save adder has been employed to reduce the critical path. This paper presents and evaluates a new and efficient Montgomery modular multiplication architecture based on a new digit serial computation. The proposed architecture relaxes the high-radix partial multiplication to a binary multiplication. It also performs several multiplications of consecutive zero bits in one clock cycle instead of several clock cycles. Moreover, the right-to-left and left-to-right modular exponentiation architectures have been modified to use the proposed modular multiplication architecture as their structural unit. We provide the implementation results on a Xilinx Virtex 5 FPGA demonstrating that the total computation time and throughput rate of the proposed architectures outperform most results so far in the literature.
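
For orientation, textbook Montgomery multiplication can be sketched as below. It replaces division by the modulus with masks and shifts by R = 2^bits. This is the generic word-level method, not the multibit-scan, multibit-shift architecture the paper proposes. The function names are illustrative, and Python 3.8 or later is assumed for `pow(n, -1, r)`.

```python
def montgomery_setup(n, bits):
    """Precompute constants for an odd modulus n < R = 2**bits."""
    r = 1 << bits
    n_prime = (-pow(n, -1, r)) % r  # n * n_prime is congruent to -1 mod R
    return r, n_prime

def montgomery_reduce(t, n, bits, n_prime):
    """Return t * R^-1 mod n for 0 <= t < n * R."""
    mask = (1 << bits) - 1
    m = ((t & mask) * n_prime) & mask  # chosen so that t + m*n is divisible by R
    u = (t + m * n) >> bits            # exact division by R, done as a shift
    return u - n if u >= n else u

def montgomery_multiply(a, b, n, bits):
    """Compute a * b mod n through the Montgomery domain."""
    r, n_prime = montgomery_setup(n, bits)
    a_bar = (a * r) % n  # convert operands to Montgomery form
    b_bar = (b * r) % n
    c_bar = montgomery_reduce(a_bar * b_bar, n, bits, n_prime)
    return montgomery_reduce(c_bar, n, bits, n_prime)  # convert back
```

In an exponentiation loop the conversions are paid once, so each modular multiplication costs one reduction.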

Journal Article•10.1016/J.PARCO.2015.04.004•

Speculative Segmented Sum for Sparse Matrix-Vector Multiplication on Heterogeneous Processors

Weifeng Liu¹, Brian Vinter¹•Institutions (1)

University of Copenhagen¹

24 Apr 2015-arXiv: Mathematical Software

TL;DR: In this paper, a compressed sparse row (CSR) format based SpMV algorithm utilizing both types of cores in a CPU-GPU heterogeneous processor is proposed, where the CPU part of the same chip is triggered to re-arrange the predicted partial sums for a correct resulting vector.

Abstract: Sparse matrix-vector multiplication (SpMV) is a central building block for scientific software and graph applications. Recently, heterogeneous processors composed of different types of cores attracted much attention because of their flexible core configuration and high energy efficiency. In this paper, we propose a compressed sparse row (CSR) format based SpMV algorithm utilizing both types of cores in a CPU-GPU heterogeneous processor. We first speculatively execute segmented sum operations on the GPU part of a heterogeneous processor and generate possibly incorrect results. Then the CPU part of the same chip is triggered to re-arrange the predicted partial sums for a correct resulting vector. On three heterogeneous processors from Intel, AMD and nVidia, using 20 sparse matrices as a benchmark suite, the experimental results show that our method obtains significant performance improvement over the best existing CSR-based SpMV algorithms. The source code of this work is downloadable at this https URL
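
The underlying CSR SpMV, viewed as elementwise products followed by a segmented sum, can be sketched sequentially as below. The sketch leaves out the speculative GPU execution and the CPU fix-up step that the paper contributes. The argument names follow common CSR conventions and are assumptions.

```python
def csr_spmv_segmented(row_ptr, col_idx, values, x):
    """Compute y = A x for A in CSR form via products plus a segmented sum."""
    # Step 1: one product per stored nonzero (data-parallel on a GPU)
    products = [v * x[c] for v, c in zip(values, col_idx)]
    # Step 2: segmented sum, where row_ptr marks the segment boundaries
    return [sum(products[row_ptr[i]:row_ptr[i + 1]])
            for i in range(len(row_ptr) - 1)]
```

An empty row is a zero-length segment and sums to 0, which is one of the irregular cases a parallel segmented sum has to handle.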

Journal Article•10.1109/TPDS.2014.2323062•

Parallel and High-Speed Computations of Elliptic Curve Cryptography Using Hybrid-Double Multipliers

Reza Azarderakhsh¹, Arash Reyhani-Masoleh²•Institutions (2)

Rochester Institute of Technology¹, University of Western Ontario²

01 Jun 2015-IEEE Transactions on Parallel and Distributed Systems

TL;DR: This paper proposes efficient and high speed architectures to implement point multiplication on binary Edwards and generalized Hessian curves and employs a newly proposed digit-level hybrid-double Gaussian normal basis multiplier to reduce the latency of point multiplication.

Abstract: High-performance and fast implementation of point multiplication is crucial for elliptic curve cryptographic systems. Recently, considerable research has investigated the implementation of point multiplication on different curves over binary extension fields. In this paper, we propose efficient and high speed architectures to implement point multiplication on binary Edwards and generalized Hessian curves. We perform a data-flow analysis and investigate the maximum number of parallel multipliers to be employed to reduce the latency of point multiplication on these curves. Then, we modify the addition and doubling formulations and employ a newly proposed digit-level hybrid-double Gaussian normal basis multiplier to remove the data dependencies and hence reduce the latency of point multiplication. To the best of our knowledge, this is the first time that one employs hybrid-double multiplication technique to reduce the computation time of point multiplication. Moreover, we have implemented our proposed architectures for point multiplication on FPGA and obtained the results of timing and area. Our results indicate that the proposed scheme is one step forward to improve the performance of point multiplication on binary Edwards and generalized Hessian curves.

Journal Article•

Efficient Ring-LWE Encryption on 8-bit AVR Processors

Zhe Liu¹, Hwajeong Seo¹, Sujoy Sinha Roy², Johann Großschädl², Howon Kim³, Ingrid Verbauwhede³•Institutions (3)

University of Luxembourg¹, Pusan National University², Katholieke Universiteit Leuven³

01 Jan 2015-Lecture Notes in Computer Science

TL;DR: In this paper, a carefully optimized implementation of a ring-LWE encryption scheme for 8-bit AVR processors like the ATxmega128 is presented, which needs 590 k, 672 k, and 276 k clock cycles for key generation, encryption, and decryption, respectively, at medium-term security.

Abstract: Public-key cryptography based on the “ring-variant” of the Learning with Errors (ring-LWE) problem is both efficient and believed to remain secure in a post-quantum world. In this paper, we introduce a carefully-optimized implementation of a ring-LWE encryption scheme for 8-bit AVR processors like the ATxmega128. Our research contributions include several optimizations for the Number Theoretic Transform (NTT) used for polynomial multiplication. More concretely, we describe the Move-and-Add (MA) and the Shift-Add-Multiply-Subtract-Subtract (SAMS2) technique to speed up the performance-critical multiplication and modular reduction of coefficients, respectively. We take advantage of incompletely-reduced intermediate results to minimize the total number of reduction operations and use a special coefficient-storage method to decrease the RAM footprint of NTT multiplications. In addition, we propose a byte-wise scanning strategy to improve the performance of a discrete Gaussian sampler based on the Knuth-Yao random walk algorithm. For medium-term security, our ring-LWE implementation needs 590 k, 672 k, and 276 k clock cycles for key-generation, encryption, and decryption, respectively. On the other hand, for long-term security, the execution time of key-generation, encryption, and decryption amount to 2.2 M, 2.6 M, and 686 k cycles, respectively. These results set new speed records for ring-LWE encryption on an 8-bit processor and outperform related RSA and ECC implementations by an order of magnitude.
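
The role of the NTT in polynomial multiplication can be shown with a toy example: transform both inputs, multiply pointwise, and transform back. The sketch uses a naive quadratic-time transform and a cyclic product modulo x^n - 1, whereas ring-LWE implementations typically use a fast transform and a negacyclic product. The parameters q = 17, n = 4, and root 4 in the usage note are illustrative assumptions and are unrelated to the paper's parameter sets. Python 3.8 or later is assumed for modular inverses via `pow`.

```python
def ntt(coeffs, root, q):
    """Naive O(n^2) number theoretic transform of coeffs modulo the prime q."""
    n = len(coeffs)
    return [sum(c * pow(root, i * j, q) for j, c in enumerate(coeffs)) % q
            for i in range(n)]

def ntt_multiply(f, g, root, q):
    """Cyclic product of f and g modulo (x^n - 1, q).

    root must be a primitive n-th root of unity modulo q.
    """
    n = len(f)
    fh, gh = ntt(f, root, q), ntt(g, root, q)
    prod = [(a * b) % q for a, b in zip(fh, gh)]  # pointwise multiplication
    inv_root, inv_n = pow(root, -1, q), pow(n, -1, q)
    # Inverse transform: same sum with the inverse root, scaled by 1/n
    return [(c * inv_n) % q for c in ntt(prod, inv_root, q)]
```

For example, `ntt_multiply([1, 2, 0, 0], [3, 4, 0, 0], 4, 17)` multiplies 1 + 2x by 3 + 4x modulo 17.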

Journal Article•10.1109/TVLSI.2014.2375640•

Scalable Elliptic Curve Cryptosystem FPGA Processor for NIST Prime Curves

Kung Chi Cinnati Loi¹, Seok-Bum Ko¹•Institutions (1)

University of Saskatchewan¹

05 Jan 2015-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: The architecture and the implementation of a high-performance scalable elliptic curve cryptography processor (ECP) that can support all five NIST recommended prime curves without the need to reconfigure the hardware is presented.

Abstract: The architecture and the implementation of a high-performance scalable elliptic curve cryptography processor (ECP) are presented. The proposed ECP is able to support all five prime field elliptic curves recommended by the National Institute of Standards and Technology (NIST). The design takes advantage of the high-performance capabilities of the DSP48E slices available in Xilinx field-programmable gate arrays (FPGAs) to achieve high speed and low hardware resource utilization. The proposed design parallelizes the underlying prime field operations to reduce the latency of the elliptic curve point multiplication (ECPM) operation. Prime field inversion is performed efficiently using the same arithmetic blocks as the ones used for prime field multiplication and addition/subtraction. To the best of the authors' knowledge, the proposed scalable ECP is the fastest and smallest ECP that can support all five NIST recommended prime curves without the need to reconfigure the hardware. It can compute the ECPM between 1.709 and 28.04 ms using a Xilinx Virtex-5 FPGA.
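
Elliptic curve point multiplication, the operation whose latency the processor above targets, can be sketched with the textbook double-and-add method over a prime field. The sketch uses affine coordinates on a toy curve and is neither constant-time nor related to the paper's hardware design. The function names and the curve in the usage note are illustrative assumptions, and Python 3.8 or later is assumed for modular inverses via `pow`.

```python
def point_add(p1, p2, a, p):
    """Add points on y^2 = x^3 + a*x + b over GF(p); None is the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % p == 0:
        return None  # the points are inverses of each other
    if p1 == p2:
        slope = (3 * x1 * x1 + a) * pow((2 * y1) % p, -1, p) % p  # tangent line
    else:
        slope = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p         # chord line
    x3 = (slope * slope - x1 - x2) % p
    return x3, (slope * (x1 - x3) - y1) % p

def scalar_multiply(k, point, a, p):
    """Compute k * point by left-to-right double-and-add."""
    result = None
    for bit in bin(k)[2:]:
        result = point_add(result, result, a, p)  # double for every bit
        if bit == "1":
            result = point_add(result, point, a, p)  # add when the bit is set
    return result
```

On the toy curve y^2 = x^3 + 2x + 2 over GF(17), `scalar_multiply(2, (5, 1), 2, 17)` doubles the point (5, 1).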

Posted Content•

Efficient Ring-LWE Encryption on 8-bit AVR Processors.

Zhe Liu¹, Hwajeong Seo¹, Sujoy Sinha Roy², Johann Großschädl², Howon Kim³, Ingrid Verbauwhede³•Institutions (3)

University of Luxembourg¹, Pusan National University², Katholieke Universiteit Leuven³

01 Jan 2015-IACR Cryptology ePrint Archive

TL;DR: In this article, a carefully optimized implementation of a ring-LWE encryption scheme for 8-bit AVR processors like the ATxmega128 is presented, which needs 590 k, 672 k, and 276 k clock cycles for key generation, encryption, and decryption, respectively, at medium-term security.

Proceedings Article•10.1109/BIOCAS.2015.7348414•

High-dimensional computing with sparse vectors

Mika Laiho¹, Jussi Poikonen¹, Pentti Kanerva², Eero Lehtonen¹•Institutions (2)

University of Turku¹, University of California, Berkeley²

1 Oct 2015

TL;DR: It is shown that the HRR operations of addition, multiplication, and permutation can be realized with sparse vectors, making an energy-efficient implementation possible and proposing a processor that has both data and instructions embedded in the same high-dimensional vector.

Abstract: Computing with high-dimensional vectors in a manner that resembles computing with numbers is based on Plate's Holographic Reduced Representation (HRR) and is used to model human cognition. Here we examine its hardware realization under constraints suggested by the properties of the brain's circuits. The sparseness of neural firing suggests that the vectors should be sparse. We show that the HRR operations of addition, multiplication, and permutation can be realized with sparse vectors, making an energy-efficient implementation possible. Furthermore, we propose a processor that has both data and instructions embedded in the same high-dimensional vector. The operation is highlighted with a sequence memory example.

Journal Article•10.1016/J.NEUCOM.2014.01.070•

Distributed Extreme Learning Machine with kernels based on MapReduce

Xin Bi¹, Xiangguo Zhao¹, Guoren Wang¹, Pan Zhang¹, Chao Wang¹•Institutions (1)

Northeastern University (China)¹

03 Feb 2015-Neurocomputing

TL;DR: This paper proposes a distributed solution named Distributed Kernelized ELM (DK-ELM), which realizes an implementation of ELM with kernels on MapReduce, and experimental results show that DK-ELM has good scalability for massive learning applications.

Journal Article•10.1109/TCSI.2014.2348072•

Novel Design Algorithm for Low Complexity Programmable FIR Filters Based on Extended Double Base Number System

Jiajia Chen¹, Chip-Hong Chang², Feng Feng¹, Weiao Ding¹, Jiatao Ding¹•Institutions (2)

Singapore University of Technology and Design¹, Nanyang Technological University²

01 Jan 2015-IEEE Transactions on Circuits and Systems

TL;DR: This paper presents a new design paradigm for programmable FIR filters that exploits the extended double base number system (EDBNS): owing to its sparsity and innate abstraction of the sum of binary shifted partial products, the sharing of adders can be maximized by a direct mapping from the quasi-minimum EDBNS.

Abstract: Coefficient multipliers are the stumbling blocks in programmable finite impulse response (FIR) digital filters. As the filter coefficients change either dynamically or periodically, the search for common subexpressions for multiplierless implementation needs to be performed over the entire gamut of integers of the desired precision, and the amount of shifts associated with each identified common subexpression needs to be memorized. The complexity of a quality search is thus beyond the existing design algorithms based on conventional binary and signed digit representations. This paper presents a new design paradigm for the programmable FIR filters by exploiting the extended double base number system (EDBNS). Due to its sparsity and innate abstraction of the sum of binary shifted partial products, the sharing of adders in the time-multiplexed multiple constant multiplication block of the programmable FIR filters can be maximized by a direct mapping from the quasi-minimum EDBNS. The multiplexing cost can be further reduced by merging double base terms. Logic synthesis results on more than one hundred programmable filters with filter taps ranging from 10 to 100 and coefficient word lengths of 8, 12, and 16 bits show that the average logic complexity and critical path delay of the programmable FIR filters designed by our proposed algorithm have been reduced by up to 47.81% and 14.32%, respectively over the existing design methods.

Journal Article•10.1016/J.CPC.2015.06.003•

The MIXMAX random number generator

Konstantin G. Savvidy¹, Konstantin G. Savvidy²•Institutions (2)

Nanjing University of Aeronautics and Astronautics¹, Nanjing University²

01 Nov 2015-Computer Physics Communications

TL;DR: This paper provides a solution to the problem of determining the maximal period of unimodular matrix generators of pseudo-random numbers, formulates the necessary and sufficient condition to attain the maximum period, and presents a family of specific generators in the MIXMAX family with superior performance and excellent statistical properties.

Journal Article•10.1307/MMJ/1427203284•

Compact bilinear commutators: the weighted case

Árpád Bényi¹, Wendolín Damián², Kabe Moen³, Rodolfo H. Torres⁴•Institutions (4)

Western Washington University¹, University of Seville², University of Alabama³, University of Kansas⁴

01 Mar 2015-Michigan Mathematical Journal

TL;DR: In this article, commutators of bilinear Calderón-Zygmund operators and multiplication by functions in a certain subspace of the space of functions of bounded mean oscillations are shown to be compact on appropriate products of weighted Lebesgue spaces.

Abstract: Commutators of bilinear Calderón-Zygmund operators and multiplication by functions in a certain subspace of the space of functions of bounded mean oscillations are shown to be compact on appropriate products of weighted Lebesgue spaces.

Journal Article•10.1016/J.FFA.2014.10.008•

A survey of some recent bit-parallel GF(2^n) multipliers

Haining Fan¹, M. Anwar Hasan²•Institutions (2)

Tsinghua University¹, University of Waterloo²

01 Mar 2015-Finite Fields and Their Applications

TL;DR: This paper surveys bit-parallel multipliers for the finite field GF(2^n) according to quadratic and subquadratic arithmetic complexities of the underlying algorithms, various bases used for representing the field elements, and design approaches that rely on polynomial and matrix operations.

Proceedings Article•10.1145/2755996.2756653•

Output-Sensitive Algorithms for Sumset and Sparse Polynomial Multiplication

Andrew Arnold¹, Daniel S. Roche²•Institutions (2)

University of Waterloo¹, United States Naval Academy²

24 Jun 2015

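TL;DR: In this paper, randomized algorithms are presented for computing the sumset (Minkowski sum) of two integer sets and for multiplying sparse univariate integer polynomials; the sumset algorithm, which builds on sparse interpolation and results from analytic number theory, is used as part of the sparse multiplication algorithm.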

Abstract: We present randomized algorithms to compute the sumset (Minkowski sum) of two integer sets, and to multiply two univariate integer polynomials given by sparse representations. Our algorithm for sumset has cost softly linear in the combined size of the inputs and output. This is used as part of our sparse multiplication algorithm, whose cost is softly linear in the combined size of the inputs, output, and the sumset of the supports of the inputs. As a subroutine, we present a new method for computing the coefficients of a sparse polynomial, given a set containing its support. Our multiplication algorithm extends to multivariate Laurent polynomials over finite fields and rational numbers. Our techniques are based on sparse interpolation algorithms and results from analytic number theory.
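
The link between the two problems in the abstract above is that the support of a product lies inside the sumset of the input supports. This is easy to see in a naive quadratic-time sketch. The code below is not the paper's softly linear randomized algorithm. The `{exponent: coefficient}` layout and the function names are assumptions.

```python
def sumset(a, b):
    """Return the Minkowski sum {x + y : x in a, y in b} of two integer sets."""
    return {x + y for x in a for y in b}

def sparse_multiply(f, g):
    """Multiply polynomials stored as {exponent: coefficient} dicts."""
    product = {}
    for ef, cf in f.items():
        for eg, cg in g.items():
            product[ef + eg] = product.get(ef + eg, 0) + cf * cg
    # Drop terms whose coefficients cancelled to zero
    return {e: c for e, c in product.items() if c != 0}
```

Cancellation can make the true output much smaller than the sumset, which is why an output-sensitive cost bound is stated in terms of both.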

Journal Article•10.1016/J.PARCO.2015.04.004•

Speculative segmented sum for sparse matrix-vector multiplication on heterogeneous processors

Weifeng Liu¹, Brian Vinter¹•Institutions (1)

University of Copenhagen¹

1 Nov 2015

TL;DR: In this paper, a speculative segmented sum strategy for the CSR-based SpMV algorithm is proposed, where the CPU part of the same chip is triggered to re-arrange the predicted partial sums for a correct resulting vector.

Abstract: A speculative segmented sum strategy for the CSR-based SpMV. Utilizing both GPU cores and CPU cores in a heterogeneous processor. No format conversion or tuning overhead for input sparse matrices in the CSR format. High speedup over the CSR-vector algorithm running irregular matrices. No performance penalty for most regular matrices. Sparse matrix-vector multiplication (SpMV) is a central building block for scientific software and graph applications. Recently, heterogeneous processors composed of different types of cores attracted much attention because of their flexible core configuration and high energy efficiency. In this paper, we propose a compressed sparse row (CSR) format based SpMV algorithm utilizing both types of cores in a CPU-GPU heterogeneous processor. We first speculatively execute segmented sum operations on the GPU part of a heterogeneous processor and generate a possibly incorrect result. Then the CPU part of the same chip is triggered to re-arrange the predicted partial sums for a correct resulting vector. On three heterogeneous processors from Intel, AMD and nVidia, using 20 sparse matrices as a benchmark suite, the experimental results show that our method obtains significant performance improvement over the best existing CSR-based SpMV algorithms.

Proceedings Article•10.1109/IPDPSW.2015.77•

Fast Sparse Matrix and Sparse Vector Multiplication Algorithm on the GPU

Carl Yang¹, Yangzihao Wang¹, John D. Owens¹•Institutions (1)

University of California, Davis¹

25 May 2015

TL;DR: A promising algorithm for sparse-matrix sparse-vector multiplication (SpMSpV) on the GPU is implemented and the scalability of three approaches -- no sorting, merge sorting, and radix sorting -- in solving this problem is examined.

Abstract: We implement a promising algorithm for sparse-matrix sparse-vector multiplication (SpMSpV) on the GPU. An efficient k-way merge lies at the heart of finding a fast parallel SpMSpV algorithm. We examine the scalability of three approaches -- no sorting, merge sorting, and radix sorting -- in solving this problem. For breadth-first search (BFS), we achieve a 1.26x speedup over state-of-the-art sparse-matrix dense-vector (SpMV) implementations. The algorithm appears generalizable to single-source shortest path (SSSP) and sparse-matrix sparse-matrix multiplication, and to other core graph primitives such as maximal independent set and bipartite matching.
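The role of the k-way merge can be sketched as follows, assuming a column-oriented (CSC) matrix whose row indices are sorted within each column. Each nonzero of the sparse vector selects and scales one column, and the resulting sorted lists are merged by row index. This is a sequential illustration rather than the GPU implementation; the name `spmspv_csc` and the list-of-pairs vector format are assumptions.

```python
import heapq


def spmspv_csc(col_ptr, row_idx, vals, x):
    """Compute y = A @ x where x is a sparse vector of (index, value) pairs.

    Returns y as a list of (row, value) pairs sorted by row.
    """
    # One sorted list of (row, contribution) per nonzero entry of x.
    scaled = [
        [(row_idx[p], vals[p] * xj) for p in range(col_ptr[j], col_ptr[j + 1])]
        for j, xj in x
    ]
    y = []
    # k-way merge by row index, summing contributions that share a row.
    for row, val in heapq.merge(*scaled, key=lambda t: t[0]):
        if y and y[-1][0] == row:
            y[-1] = (row, y[-1][1] + val)
        else:
            y.append((row, val))
    return y


# A = [[1, 0, 2],
#      [0, 0, 0],
#      [0, 3, 4]] stored column by column
col_ptr = [0, 1, 2, 4]
row_idx = [0, 2, 0, 2]
vals = [1, 3, 2, 4]
y = spmspv_csc(col_ptr, row_idx, vals, [(0, 1), (2, 3)])
```

Only the columns selected by nonzeros of x are touched, which is the advantage over a dense-vector SpMV when x is very sparse, as a BFS frontier often is.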

Journal Article•10.1007/S13389-015-0093-2•

Multiprecision multiplication on AVR revisited

Michael Hutter¹, Peter Schwabe²•Institutions (2)

Cryptography Research¹, Radboud University Nijmegen²

14 Apr 2015-Journal of Cryptographic Engineering

TL;DR: In this article, the authors present new speed records for multiprecision multiplication on the AVR ATmega family of 8-bit microcontrollers; for example, their software multiplies two 160-bit integers in only 1,969 cycles, more than 15 % faster than previous work.

Abstract: This paper presents new speed records for multiprecision multiplication on the AVR ATmega family of 8-bit microcontrollers. For example, our software takes only 1,969 cycles for the multiplication of two 160-bit integers; this is more than 15 % faster than that demonstrated in previous work. For 256-bit inputs, our software is not only the first to break through the 6,000-cycle barrier; with only 4,771 cycles it also breaks through the 5,000-cycle barrier and is more than 21 % faster than previous work. We achieve these speed records by carefully optimizing the Karatsuba multiplication technique for AVR ATmega. One might expect that subquadratic-complexity Karatsuba multiplication is only faster than algorithms with quadratic complexity for large inputs. This paper shows that it is in fact faster than fully unrolled product-scanning multiplication already for surprisingly small inputs, starting at 48 bits. Our results thus make Karatsuba multiplication the method of choice for high-performance implementations of elliptic-curve cryptography on AVR ATmega microcontrollers.
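For readers unfamiliar with the technique, the following is a minimal sketch of Karatsuba multiplication on Python integers. It shows only the three-multiplication recursion; the AVR-specific optimizations and cycle counts from the paper are not modeled, and the `cutoff_bits` parameter and its default of 32 are arbitrary assumptions, not values taken from the paper.

```python
def karatsuba(a, b, cutoff_bits=32):
    """Multiply two non-negative integers with Karatsuba's recursion."""
    n = max(a.bit_length(), b.bit_length())
    if n <= cutoff_bits:
        # Base case: stands in for a schoolbook (quadratic) routine.
        return a * b
    half = n // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half
    # Three recursive products instead of four.
    lo = karatsuba(a_lo, b_lo, cutoff_bits)
    hi = karatsuba(a_hi, b_hi, cutoff_bits)
    mid = karatsuba(a_lo + a_hi, b_lo + b_hi, cutoff_bits) - lo - hi
    return (hi << (2 * half)) + (mid << half) + lo


a = (1 << 160) - 12345
b = (1 << 159) + 987654321
product = karatsuba(a, b)
```

The saving comes from the identity (a_lo + a_hi)(b_lo + b_hi) - a_lo·b_lo - a_hi·b_hi = a_lo·b_hi + a_hi·b_lo, which recovers the middle term from one extra multiplication plus additions and subtractions.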

Journal Article•10.1039/C4SC02930E•

Ternary DNA computing using 3 × 3 multiplication matrices

Ron Orbach¹, Sivan Lilienthal¹, Michael Klein¹, Raphael D. Levine¹, Françoise Remacle², Itamar Willner¹•Institutions (2)

Hebrew University of Jerusalem¹, University of Liège²

19 Jan 2015-Chemical Science

TL;DR: In this paper, DNA is introduced as a functional material for ternary computing, and three-valued oligonucleotide inputs are used to construct a 3 × 3 multiplication table; the system consists of two three-valued inputs of −1, 0, +1 and a fluorophore/quencher functional hairpin acting as computational and reporter module.

Abstract: Non-Boolean computations implementing operations on multi-valued variables beyond base 2 allow enhanced computational complexity. We introduce DNA as a functional material for ternary computing, and in particular demonstrate the use of three-valued oligonucleotide inputs to construct a 3 × 3 multiplication table. The system consists of two three-valued inputs of −1; 0; +1 and a fluorophore/quencher functional hairpin acting as computational and reporter module. The interaction of the computational hairpin module with the different values of the inputs yields a 3 × 3 multiplication matrix consisting of nine nanostructures that are read out by three distinct fluorescence intensities. By combining three different hairpin computational modules, each modified with a different fluorophore/quencher pair, and using different sets of inputs, the parallel operation of three multiplication tables is demonstrated.
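The logical function described here, as opposed to its DNA implementation, is small enough to write out. The sketch below builds the 3 × 3 multiplication table over the inputs −1, 0, +1; the names `TRITS` and `table` are assumptions chosen for illustration.

```python
TRITS = (-1, 0, 1)

# Rows are indexed by the first input, columns by the second.
table = [[a * b for b in TRITS] for a in TRITS]

# The nine entries take only three distinct values.
distinct_outputs = sorted({value for row in table for value in row})
```

The nine entries collapse to three distinct outputs (−1, 0, +1), which is consistent with the abstract's nine nanostructures being read out by three distinct fluorescence intensities.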

Proceedings Article•10.1145/2755573.2755613•

Hypergraph Partitioning for Parallel Sparse Matrix-Matrix Multiplication

Grey Ballard¹, Alex Druinsky², Nicholas Knight³, Oded Schwartz⁴•Institutions (4)

Sandia National Laboratories¹, Lawrence Berkeley National Laboratory², University of California, Berkeley³, Hebrew University of Jerusalem⁴

13 Jun 2015

TL;DR: In this paper, the authors characterize the communication cost of a sparse matrix-matrix multiplication algorithm in terms of the size of a cut of an associated hypergraph that encodes the computation for a given input nonzero structure.

Abstract: The performance of parallel algorithms for sparse matrix-matrix multiplication is typically determined by the amount of interprocessor communication performed, which in turn depends on the nonzero structure of the input matrices. In this paper, we characterize the communication cost of a sparse matrix-matrix multiplication algorithm in terms of the size of a cut of an associated hypergraph that encodes the computation for a given input nonzero structure. Obtaining an optimal algorithm corresponds to solving a hypergraph partitioning problem. Our hypergraph model generalizes several existing models for sparse matrix-vector multiplication, and we can leverage hypergraph partitioners developed for that computation to improve application-specific algorithms for multiplying sparse matrices.

Proceedings Article•10.1145/2833179.2833186•

Scalable Task-Based Algorithm for Multiplication of Block-Rank-Sparse Matrices

Justus A. Calvin¹, Cannada A. Lewis¹, Edward F. Valeev¹•Institutions (1)

Virginia Tech¹

01 Sep 2015-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: In this article, a task-based formulation of Scalable Universal Matrix Multiplication Algorithm (SUMMA) is applied to the multiplication of hierarchy-free, rank-structured matrices that appear in the domain of quantum chemistry (QC).

Abstract: A task-based formulation of Scalable Universal Matrix Multiplication Algorithm (SUMMA), a popular algorithm for matrix multiplication (MM), is applied to the multiplication of hierarchy-free, rank-structured matrices that appear in the domain of quantum chemistry (QC). The novel features of our formulation are: (1) concurrent scheduling of multiple SUMMA iterations, and (2) fine-grained task-based composition. These features make it tolerant of the load imbalance due to the irregular matrix structure and eliminate all artifactual sources of global synchronization. Scalability of iterative computation of square-root inverse of block-rank-sparse QC matrices is demonstrated; for full-rank (dense) matrices the performance of our SUMMA formulation usually exceeds that of the state-of-the-art dense MM implementations (ScaLAPACK and Cyclops Tensor Framework).
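The iteration structure of SUMMA can be sketched sequentially: the product is accumulated as a sum of panel products, one per step along the inner dimension. The version below is a single-process illustration on dense lists only. It does not model the broadcasts, the task-based scheduling, or the block-rank-sparse structure discussed in the abstract, and the name `summa_like` and the `panel` parameter are assumptions.

```python
def summa_like(A, B, panel=2):
    """Compute C = A @ B by accumulating one panel product per iteration."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]
    for k0 in range(0, m, panel):
        ks = range(k0, min(k0 + panel, m))
        # In distributed SUMMA, this panel of A's columns and B's rows
        # would be broadcast along process rows and columns respectively.
        for i in range(n):
            for j in range(p):
                C[i][j] += sum(A[i][k] * B[k][j] for k in ks)
    return C


A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 8], [9, 10], [11, 12]]
C = summa_like(A, B)
```

Because each iteration only adds into C, iterations do not depend on each other's results beyond the accumulation, which is what makes scheduling several of them concurrently plausible.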

Journal Article•10.1007/S11858-015-0675-6•

Natural number bias in operations with missing numbers

Konstantinos P. Christou¹•Institutions (1)

University of Western Macedonia¹

05 Feb 2015-ZDM

TL;DR: It is argued that knowledge about operations between natural numbers needs to be inhibited for students to overcome the natural number bias and to reason with numbers beyond the scope of natural numbers.

Abstract: This study investigates the hypothesis that there is a natural number bias that influences how students understand the effects of arithmetical operations involving both Arabic numerals and numbers that are represented by symbols for missing numbers. It also investigates whether this bias correlates with other aspects of students’ understanding of the number concept beyond natural numbers. Natural number bias has been characterized as the interference of natural number knowledge in reasoning about non-natural numbers. Quantitative data is presented showing that in the case of operations between numbers and missing numbers this bias acts in two main ways. First, it shapes students’ anticipations about the expected outcome of each operation, that is, that the result of addition or multiplication “must” be bigger than the initial numbers and the result of subtraction or division “must” be smaller. Second, it causes students to think that missing numbers stand mostly for natural numbers; this tendency would lead students to make decisions about the general results of operations by substituting only natural numbers for the missing number symbols. It is argued that knowledge about operations between natural numbers needs to be inhibited for students to overcome the natural number bias and to reason with numbers beyond the scope of natural numbers.
