Top 428 papers published in the topic of Multiplication in 2016

Showing papers on "Multiplication" published in 2016

Proceedings Article•10.1145/2897937.2898010•

Dot-product engine for neuromorphic computing: programming 1T1M crossbar to accelerate matrix-vector multiplication

Miao Hu¹, John Paul Strachan¹, Zhiyong Li¹, Emmanuelle J. Merced Grafals¹, Noraica Davila¹, Catherine Graves¹, Si-Ty Lam¹, Ning Ge, Jianhua Yang², R. Stanley Williams¹•Institutions (2)

Hewlett-Packard¹, University of Massachusetts Amherst²

5 Jun 2016

TL;DR: The authors develop the Dot-Product Engine (DPE), a high-density, high-power-efficiency accelerator for approximate matrix-vector multiplication, and devise a conversion algorithm that maps arbitrary matrix values to memristor conductances in a realistic crossbar array.

Abstract: Vector-matrix multiplication dominates the computation time and energy for many workloads, particularly neural network algorithms and linear transforms (e.g., the Discrete Fourier Transform). Utilizing the natural current accumulation feature of memristor crossbars, we developed the Dot-Product Engine (DPE) as a high-density, high-power-efficiency accelerator for approximate matrix-vector multiplication. We first invented a conversion algorithm to map arbitrary matrix values appropriately to memristor conductances in a realistic crossbar array, accounting for device physics and circuit issues to reduce computational errors. Accurate device resistance programming in large arrays is enabled by closed-loop pulse tuning and access transistors. To validate our approach, we simulated and benchmarked one of the state-of-the-art neural networks for pattern recognition on the DPEs. The result shows no accuracy degradation compared to the software approach (99% pattern recognition accuracy on the MNIST data set) with only a 4-bit DAC/ADC requirement, while the DPE can achieve a speed-efficiency product of 1,000× to 10,000× compared to a custom digital ASIC.
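The underlying principle, a vector-matrix product read out as summed column currents through programmed conductances, can be illustrated with an ideal crossbar model. This is a minimal sketch, assuming perfect devices and a simple affine mapping of matrix values into a hypothetical conductance window `[g_min, g_max]`; it is not the authors' conversion algorithm (which also accounts for device physics and circuit parasitics), and it omits the 4-bit DAC/ADC quantization. Function names are illustrative.

```python
def map_to_conductances(W, g_min=1e-6, g_max=1e-4):
    """Affinely map matrix entries into [g_min, g_max] (hypothetical window, in siemens)."""
    flat = [w for row in W for w in row]
    w_lo, w_hi = min(flat), max(flat)
    if w_hi == w_lo:
        raise ValueError("matrix must contain at least two distinct values")
    scale = (g_max - g_min) / (w_hi - w_lo)
    G = [[g_min + (w - w_lo) * scale for w in row] for row in W]
    # Every cell satisfies G = scale * W + offset with the same offset
    offset = g_min - scale * w_lo
    return G, scale, offset


def crossbar_vmm(W, v, g_min=1e-6, g_max=1e-4):
    """Ideal crossbar: row voltages v, column currents I_j = sum_i v_i * G[i][j]."""
    G, scale, offset = map_to_conductances(W, g_min, g_max)
    n_rows, n_cols = len(W), len(W[0])
    currents = [sum(v[i] * G[i][j] for i in range(n_rows)) for j in range(n_cols)]
    # Undo the affine mapping: I_j = scale * (v @ W)_j + offset * sum(v)
    v_sum = sum(v)
    return [(current - offset * v_sum) / scale for current in currents]
```

For example, `crossbar_vmm([[1, -2], [3, 4]], [0.5, 1.0])` returns approximately `[3.5, 3.0]`, the exact product up to floating-point rounding. The offset correction is needed because conductances are non-negative and cannot hold signed values directly.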

778 citations

Proceedings Article•10.1145/2976749.2978357•

MASCOT: Faster Malicious Arithmetic Secure Computation with Oblivious Transfer

Marcel Keller¹, Emmanuela Orsini¹, Peter Scholl¹•Institutions (1)

University of Bristol¹

24 Oct 2016

TL;DR: This article proposes a protocol for secure multi-party computation of arithmetic circuits over a finite field using oblivious transfer; it is based on an arithmetic view of oblivious transfer, with careful consistency checks and other techniques to obtain malicious security.

Abstract: We consider the task of secure multi-party computation of arithmetic circuits over a finite field. Unlike Boolean circuits, arithmetic circuits allow natural computations on integers to be expressed easily and efficiently. In the strongest setting of malicious security with a dishonest majority (where any number of parties may deviate arbitrarily from the protocol), most existing protocols require expensive public-key cryptography for each multiplication in the preprocessing stage of the protocol, which leads to a high total cost. We present a new protocol that overcomes this limitation by using oblivious transfer to perform secure multiplications in general finite fields with reduced communication and computation. Our protocol is based on an arithmetic view of oblivious transfer, with careful consistency checks and other techniques to obtain malicious security at a cost of less than 6 times that of semi-honest security. We describe a highly optimized implementation together with experimental results for up to five parties. By making extensive use of parallelism and SSE instructions, we improve upon previous runtimes for MPC over arithmetic circuits by more than 200 times.
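The multiplication triples that MASCOT's preprocessing produces are consumed in the online phase by Beaver-style multiplication on additive secret shares. A minimal sketch of that online step follows, under loud assumptions: the triples come from a hypothetical trusted dealer (standing in for MASCOT's OT-based generation), there are no MACs or consistency checks (so this is semi-honest only), and the modulus `P = 2**61 - 1` is an arbitrary prime chosen for illustration.

```python
import random

P = 2**61 - 1  # Mersenne prime used as the field modulus (arbitrary choice for this sketch)


def share(value, n_parties, rng):
    """Split value into n additive shares that sum to value mod P."""
    parts = [rng.randrange(P) for _ in range(n_parties - 1)]
    parts.append((value - sum(parts)) % P)
    return parts


def reconstruct(shares):
    return sum(shares) % P


def dealer_triple(n_parties, rng):
    """Hypothetical trusted dealer: shares of random a, b and c = a*b.
    MASCOT instead generates such triples with oblivious transfer."""
    a, b = rng.randrange(P), rng.randrange(P)
    return share(a, n_parties, rng), share(b, n_parties, rng), share(a * b % P, n_parties, rng)


def beaver_multiply(x_sh, y_sh, triple):
    """Return shares of x*y using one triple; only d = x - a and e = y - b are opened."""
    a_sh, b_sh, c_sh = triple
    d = reconstruct([(xi - ai) % P for xi, ai in zip(x_sh, a_sh)])
    e = reconstruct([(yi - bi) % P for yi, bi in zip(y_sh, b_sh)])
    # x*y = c + d*b + e*a + d*e, computed share-wise
    z_sh = [(ci + d * bi + e * ai) % P for ai, bi, ci in zip(a_sh, b_sh, c_sh)]
    z_sh[0] = (z_sh[0] + d * e) % P  # one party adds the public term d*e
    return z_sh
```

The opened values d and e are uniformly masked by the random a and b, which is why a fresh triple is needed for every multiplication and why cheap triple generation dominates overall cost.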

454 citations

Journal Article•10.1016/J.INS.2016.08.078•

The arithmetic of continuous Z-numbers

Rafik A. Aliev¹, Oleg H. Huseynov, Lala M. Zeinalova•Institutions (1)

Georgia State University¹

10 Dec 2016-Information Sciences

TL;DR: This work develops basic arithmetic operations such as addition, subtraction, multiplication and division, as well as algebraic operations such as maximum, minimum, square and square root, for continuous Z-numbers.

178 citations

Journal Article•10.19086/DA.1245•

On cap sets and the group-theoretic approach to matrix multiplication

Jonah Blasiak, Thomas Church, Henry Cohn, Joshua A. Grochow, Eric Naslund, Will Sawin, Christopher Umans

21 May 2016-arXiv: Combinatorics

TL;DR: It is shown that a variant of tensor rank due to Tao gives a quantitative understanding of the notion of unstable tensor from geometric invariant theory.

Abstract: In 2003, Cohn and Umans described a framework for proving upper bounds on the exponent $\omega$ of matrix multiplication by reducing matrix multiplication to group algebra multiplication, and in 2005 Cohn, Kleinberg, Szegedy, and Umans proposed specific conjectures for how to obtain $\omega=2$. In this paper we rule out obtaining $\omega=2$ in this framework from abelian groups of bounded exponent. To do this we bound the size of tricolored sum-free sets in such groups, extending the breakthrough results of Croot, Lev, Pach, Ellenberg, and Gijswijt on cap sets. As a byproduct of our proof, we show that a variant of tensor rank due to Tao gives a quantitative understanding of the notion of unstable tensor from geometric invariant theory.
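The cap sets that these results extend are subsets of $\mathbb{F}_3^n$ with no three distinct points on a line, and three distinct points x, y, z of $\mathbb{F}_3^n$ are collinear exactly when x + y + z = 0. A brute-force check makes the definition concrete. This is an illustrative sketch with a hypothetical helper name; it plays no role in the paper's polynomial-method bounds.

```python
from itertools import combinations


def is_cap_set(points):
    """True if no three distinct points of F_3^n sum to zero (i.e. none are collinear)."""
    pts = [tuple(p) for p in points]
    if len(set(pts)) != len(pts):
        raise ValueError("points must be distinct")
    for x, y, z in combinations(pts, 3):
        if all((a + b + c) % 3 == 0 for a, b, c in zip(x, y, z)):
            return False
    return True
```

For instance, the four points `(0, 0), (0, 1), (1, 0), (1, 1)` form a cap set in $\mathbb{F}_3^2$, while the line `(0, 0), (1, 1), (2, 2)` does not.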

152 citations

Journal Article•10.1109/TVLSI.2016.2535398•

Design-Efficient Approximate Multiplication Circuits Through Partial Product Perforation

Georgios Zervakis¹, Kostas Tsoumanis¹, Sotirios Xydis¹, Dimitrios Soudris¹, Kiamal Pekmestzi¹•Institutions (1)

National Technical University of Athens¹

01 Oct 2016-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: This paper proves in a mathematically rigorous manner that in partial product perforation the imposed errors are bounded and predictable, depending only on the input distribution, and shows that the technique outperforms state-of-the-art approximation techniques in terms of power dissipation and error.

Abstract: Approximate computing has received significant attention as a promising strategy to decrease power consumption of inherently error tolerant applications. In this paper, we focus on hardware-level approximation by introducing the partial product perforation technique for designing approximate multiplication circuits. We prove in a mathematically rigorous manner that in partial product perforation, the imposed errors are bounded and predictable, depending only on the input distribution. Through extensive experimental evaluation, we apply the partial product perforation method on different multiplier architectures and expose the optimal architecture–perforation configuration pairs for different error constraints. We show that, compared with the respective exact design, the partial product perforation delivers reductions of up to 50% in power consumption, 45% in area, and 35% in critical delay. In addition, the product perforation method is compared with the state-of-the-art approximation techniques, i.e., truncation, voltage overscaling, and logic approximation, showing that it outperforms them in terms of power dissipation and error.
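The bounded-error property can be sketched for an unsigned shift-and-add multiplier. Perforating `k` consecutive partial products starting at position `j` drops bits b_j through b_{j+k-1} of the multiplier, so the error is exactly a times those omitted bits and never exceeds a(2^k - 1)2^j. This is a minimal sketch, assuming unsigned operands and a plain array structure; the paper evaluates several multiplier architectures that are not modeled here, and the function names are hypothetical.

```python
def perforated_multiply(a, b, j, k):
    """Unsigned shift-and-add multiply that skips partial products j .. j+k-1."""
    if a < 0 or b < 0:
        raise ValueError("this sketch handles unsigned operands only")
    result = 0
    i = 0
    while b >> i:
        perforated = j <= i < j + k
        if (b >> i) & 1 and not perforated:
            result += a << i  # partial product a * b_i * 2**i
        i += 1
    return result


def perforation_error(a, b, j, k):
    """Exact error a*b - approx: the omitted bits of b, weighted by a."""
    omitted = (b >> j) & ((1 << k) - 1)
    return a * (omitted << j)
```

For example, `perforated_multiply(13, 11, 1, 2)` returns 117 instead of 143, and the difference of 26 equals `perforation_error(13, 11, 1, 2)`. Because the error depends only on which bits of the operands fall in the perforated window, its statistics follow directly from the input distribution.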

142 citations

Journal Article•10.1038/SREP19655•

Secure Multiparty Quantum Computation for Summation and Multiplication.

Run-hua Shi¹, Yi Mu², Hong Zhong¹, Jie Cui¹, Shun Zhang¹•Institutions (2)

Anhui University¹, Information Technology University²

21 Jan 2016-Scientific Reports

TL;DR: This paper presents a novel and efficient quantum approach to securely compute the summation and the multiplication of multiparty private inputs; based on the physical principles of quantum mechanics, the approach can ensure unconditional security and perfect privacy protection.

Abstract: As a fundamental primitive, Secure Multiparty Summation and Multiplication can be used to build complex secure protocols for other multiparty computations, especially numerical computations. However, there is still a lack of systematic and efficient quantum methods to compute Secure Multiparty Summation and Multiplication. In this paper, we present a novel and efficient quantum approach to securely compute the summation and the multiplication of multiparty private inputs. Compared to classical solutions, our proposed approach can ensure unconditional security and perfect privacy protection based on the physical principles of quantum mechanics.

116 citations

Journal Article•10.1016/J.COGNITION.2015.10.002•

Running the number line: Rapid shifts of attention in single-digit arithmetic.

Romain Mathieu¹ ², Audrey Gourjon², Auriane Couderc², Catherine Thevenot¹ ³, Jérôme Prado²•Institutions (3)

University of Geneva¹, Centre national de la recherche scientifique², University of Lausanne³

01 Jan 2016-Cognition

TL;DR: The results demonstrate that solving single-digit addition and subtraction, but not multiplication, is associated with horizontal shifts of attention, and support the idea that mental movements to the left or right of a sequential representation of numbers are elicited during single-digit arithmetic.

91 citations

Journal Article•10.1109/TVLSI.2015.2391274•

A High-Speed FPGA Implementation of an RSD-Based ECC Processor

Hamad Marzouqi¹, Mahmoud Al-Qutayri¹, Khaled Salah¹, Dimitrios Schinianakis², Thanos Stouraitis²•Institutions (2)

Khalifa University¹, University of Patras²

01 Jan 2016-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: The proposed processor employs extensive pipelining of the Karatsuba–Ofman method to achieve high-throughput multiplication; it supports the recommended NIST curve P256 and is based on an extended NIST reduction scheme.

Abstract: In this paper, an exportable application-specific instruction-set elliptic curve cryptography processor based on redundant signed digit representation is proposed. The processor employs extensive pipelining techniques for Karatsuba–Ofman method to achieve high throughput multiplication. Furthermore, an efficient modular adder without comparison and a high-throughput modular divider, which results in a short datapath for maximized frequency, are implemented. The processor supports the recommended NIST curve P256 and is based on an extended NIST reduction scheme. The proposed processor performs single-point multiplication employing points in affine coordinates in 2.26 ms and runs at a maximum frequency of 160 MHz in Xilinx Virtex 5 (XC5VLX110T) field-programmable gate array.
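The Karatsuba–Ofman method that the processor pipelines replaces four half-size products with three: writing x = x_h 2^m + x_l and y = y_h 2^m + y_l, the middle term is (x_h + x_l)(y_h + y_l) - x_h y_h - x_l y_l. A minimal recursive software sketch on Python integers follows, assuming a split at half the bit length and a hypothetical `cutoff_bits` below which built-in multiplication is used; it shows only the arithmetic identity, not the paper's RSD datapath or its pipelining.

```python
def karatsuba(x, y, cutoff_bits=64):
    """Multiply non-negative integers with the Karatsuba-Ofman three-product recursion."""
    if x < 0 or y < 0:
        raise ValueError("non-negative operands only")
    if cutoff_bits < 8:
        raise ValueError("cutoff_bits must be at least 8 so the recursion shrinks")
    if x.bit_length() <= cutoff_bits or y.bit_length() <= cutoff_bits:
        return x * y  # base case: small operands
    m = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << m) - 1
    x_hi, x_lo = x >> m, x & mask
    y_hi, y_lo = y >> m, y & mask
    z2 = karatsuba(x_hi, y_hi, cutoff_bits)
    z0 = karatsuba(x_lo, y_lo, cutoff_bits)
    # One extra product yields the middle term: x_hi*y_lo + x_lo*y_hi
    z1 = karatsuba(x_hi + x_lo, y_hi + y_lo, cutoff_bits) - z2 - z0
    return (z2 << (2 * m)) + (z1 << m) + z0
```

In hardware, the three sub-products are independent, which is what makes the method attractive for pipelining.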

84 citations

Journal Article•10.1137/15M104253X•

Exploiting Multiple Levels of Parallelism in Sparse Matrix-Matrix Multiplication

Ariful Azad¹, Grey Ballard², Aydin Buluc¹, James Demmel, Laura Grigori, Oded Schwartz³, Sivan Toledo, Samuel Williams•Institutions (3)

Lawrence Berkeley National Laboratory¹, Sandia National Laboratories², Hebrew University of Jerusalem³

08 Nov 2016-SIAM Journal on Scientific Computing

TL;DR: In this paper, the authors present the first implementation of the 3D SpGEMM formulation that exploits multiple (intranode and internode) levels of parallelism, achieving significant speedups over the state-of-the-art publicly available codes at all levels of concurrency.

Abstract: Sparse matrix-matrix multiplication (or SpGEMM) is a key primitive for many high-performance graph algorithms as well as for some linear solvers, such as algebraic multigrid. The scaling of existing parallel implementations of SpGEMM is heavily bound by communication. Even though 3D (or 2.5D) algorithms have been proposed and theoretically analyzed in the flat MPI model on Erdős–Rényi matrices, those algorithms had not been implemented in practice and their complexities had not been analyzed for the general case. In this work, we present the first implementation of the 3D SpGEMM formulation that exploits multiple (intranode and internode) levels of parallelism, achieving significant speedups over the state-of-the-art publicly available codes at all levels of concurrency. We extensively evaluate our implementation and identify bottlenecks that should be subject to further research.
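For readers new to SpGEMM, a common sequential formulation is Gustavson's row-by-row algorithm: row i of C is a sparse linear combination of the rows of B selected by the nonzeros of row i of A. This is a minimal sketch, assuming a hypothetical dict-of-dicts storage format `{row: {col: value}}`; it shows only local computation, not the 3D data distribution or communication studied in the paper.

```python
def spgemm_gustavson(A, B):
    """C = A @ B for dict-of-dicts sparse matrices: {row: {col: value}}."""
    C = {}
    for i, a_row in A.items():
        acc = {}  # sparse accumulator for row i of C
        for k, a_ik in a_row.items():
            for j, b_kj in B.get(k, {}).items():
                acc[j] = acc.get(j, 0) + a_ik * b_kj
        if acc:
            C[i] = acc
    return C
```

The work is proportional to the number of scalar multiplications a_ik * b_kj, which depends on the nonzero structure of both inputs rather than on the matrix dimensions.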

71 citations

Proceedings Article•10.1109/IPDPS.2016.117•

Communication-Avoiding Parallel Sparse-Dense Matrix-Matrix Multiplication

Penporn Koanantakool¹, Ariful Azad¹, Aydin Buluc¹, Dmitriy Morozov¹, Sang-Yun Oh¹, Leonid Oliker¹, Katherine Yelick¹•Institutions (1)

Lawrence Berkeley National Laboratory¹

23 May 2016

TL;DR: This paper analyzes the communication lower bounds and compares the communication costs of various classic parallel algorithms in the context of sparse-dense matrix-matrix multiplication and presents new communication-avoiding algorithms based on a 1D decomposition, called 1.5D.

Abstract: Multiplication of a sparse matrix with a dense matrix is a building block of an increasing number of applications in many areas such as machine learning and graph algorithms. However, most previous work on parallel matrix multiplication considered only the cases where both operands are dense or both are sparse. This paper analyzes the communication lower bounds and compares the communication costs of various classic parallel algorithms in the context of sparse-dense matrix-matrix multiplication. We also present new communication-avoiding algorithms based on a 1D decomposition, called 1.5D, which, while suboptimal in dense-dense and sparse-sparse cases, outperform the 2D and 3D variants both theoretically and in practice for sparse-dense multiplication. Our analysis separates one-time costs from per-iteration costs in an iterative machine learning context. Experiments demonstrate speedups of up to 100x over a baseline 3D SUMMA implementation and show parallel scaling to over 10 thousand cores.
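The local sparse-dense kernel that any such distribution ultimately calls can be sketched over compressed sparse row (CSR) storage: each nonzero a_ik scales row k of the dense matrix and accumulates it into row i of the result. This is a minimal sequential sketch, assuming standard CSR arrays (`indptr`, `indices`, `data`) and a dense matrix given as a list of rows; the 1.5D distribution and its communication scheme are not modeled.

```python
def csr_spmm(indptr, indices, data, X):
    """Y = A @ X, with A in CSR form and X a dense matrix given as a list of rows."""
    n_rows = len(indptr) - 1
    n_cols = len(X[0])
    Y = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        y_row = Y[i]
        for p in range(indptr[i], indptr[i + 1]):
            a_ik, x_row = data[p], X[indices[p]]
            for j in range(n_cols):
                y_row[j] += a_ik * x_row[j]  # scale row k of X and accumulate
    return Y
```

For A = [[1, 0, 2], [0, 3, 0]], stored as `indptr=[0, 2, 3]`, `indices=[0, 2, 1]`, `data=[1, 2, 3]`, and X = [[1, 2], [3, 4], [5, 6]], the result is [[11, 14], [9, 12]].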

68 citations

Journal Article•10.1109/TIFS.2015.2491261•

Efficient Implementation of NIST-Compliant Elliptic Curve Cryptography for 8-bit AVR-Based Sensor Nodes

Zhe Liu¹, Hwajeong Seo², Johann Großschädl¹, Howon Kim²•Institutions (2)

University of Luxembourg¹, Pusan National University²

01 Jul 2016-IEEE Transactions on Information Forensics and Security

TL;DR: This paper presents a highly optimized software implementation of standards-compliant elliptic curve cryptography (ECC) for wireless sensor nodes equipped with an 8-bit AVR microcontroller; its implementation of scalar multiplication has a highly regular execution profile, which helps protect against certain side-channel attacks.

Abstract: In this paper, we introduce a highly optimized software implementation of standards-compliant elliptic curve cryptography (ECC) for wireless sensor nodes equipped with an 8-bit AVR microcontroller. We exploit the state-of-the-art optimizations and propose novel techniques to further push the performance envelope of a scalar multiplication on the NIST P-192 curve. To illustrate the performance of our ECC software, we develop prototype implementations of different cryptographic schemes for securing communication in a wireless sensor network, including elliptic curve Diffie–Hellman (ECDH) key exchange, the elliptic curve digital signature algorithm (ECDSA), and the elliptic curve Menezes–Qu–Vanstone (ECMQV) protocol. We obtain record-setting execution times for fixed-base, variable-base, and double-base scalar multiplication. Compared with the related work, our ECDH key exchange achieves a performance gain of roughly 27% over the best previously published result using the NIST P-192 curve on the same platform, while our ECDSA performs twice as fast as the ECDSA implementation of the well-known TinyECC library. We also evaluate the impact of Karatsuba's multiplication technique on the overall execution time of a scalar multiplication. In addition to offering high performance, our implementation of scalar multiplication has a highly regular execution profile, which helps to protect against certain side-channel attacks. Our results show that NIST-compliant ECC can be implemented efficiently enough to be suitable for resource-constrained sensor nodes.
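A Montgomery ladder is one standard way to give scalar multiplication a regular operation sequence: every scalar bit costs exactly one point addition and one doubling, whatever its value. The sketch below is not the paper's implementation (which targets NIST P-192 on AVR with its own optimizations). It assumes a hypothetical toy curve y^2 = x^3 + 2x + 3 over GF(97) with affine formulas, and Python's branches and big-integer arithmetic are not constant-time, so it illustrates only the operation pattern, not actual side-channel protection.

```python
P_MOD = 97   # toy prime field, far too small for real use (hypothetical)
A, B = 2, 3  # toy curve y^2 = x^3 + A*x + B over GF(P_MOD)
INF = None   # point at infinity (group identity)
G = (0, 10)  # 10^2 = 100 = 3 (mod 97), so (0, 10) lies on the curve


def on_curve(pt):
    if pt is INF:
        return True
    x, y = pt
    return (y * y - (x ** 3 + A * x + B)) % P_MOD == 0


def ec_add(p1, p2):
    """Affine point addition/doubling in short Weierstrass form."""
    if p1 is INF:
        return p2
    if p2 is INF:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return INF  # p2 == -p1
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1 % P_MOD, -1, P_MOD) % P_MOD
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def ladder(k, pt):
    """Montgomery ladder: one add and one double per scalar bit (invariant r1 = r0 + pt)."""
    r0, r1 = INF, pt
    for i in reversed(range(k.bit_length())):
        if (k >> i) & 1:
            r0, r1 = ec_add(r0, r1), ec_add(r1, r1)
        else:
            r0, r1 = ec_add(r0, r0), ec_add(r0, r1)
    return r0
```

The `pow(..., -1, P_MOD)` modular inverse requires Python 3.8 or later. Production code would also use projective coordinates to avoid a field inversion per step.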

Proceedings Article•10.1109/ISVLSI.2016.48•

Design of Division Circuits for Stochastic Computing

Te-Hsuan Chen¹, John P. Hayes¹•Institutions (1)

University of Michigan¹

1 Jul 2016

TL;DR: A novel division technique called CORDIV is proposed that exploits correlation between the input parameters and not only has lower cost than previous stochastic dividers, but is also significantly more accurate.

Abstract: Stochastic computing (SC) encodes data in the signal probabilities associated with pseudo-random bit-streams. It enables very low-area and low-power arithmetic operations using standard VLSI circuits, and it is also highly error-tolerant. While addition, subtraction and multiplication have extremely simple SC implementations, this is not true for division. Known stochastic dividers employ sequential logic circuits whose accuracy, convergence properties, etc., are unsatisfactory or not well understood. As a result, division is usually avoided or approximated in SC design. We first review and analyze in depth the existing design approaches to stochastic division. We then propose a novel division technique called CORDIV that exploits correlation between the input parameters. CORDIV not only has lower cost than previous stochastic dividers, but is also significantly more accurate. Area is reduced mainly because CORDIV requires less overhead for stochastic number conversion. We provide experimental data showing a typical 3x reduction in area and about a 10x improvement in accuracy.
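Stochastic multiplication and a correlation-based divider can be sketched in software. Multiplying two independent unipolar bit-streams is a single AND gate. For division, CORDIV as we understand it uses a multiplexer and a D flip-flop: when the divisor stream y emits 1, the output copies the dividend stream x; otherwise it repeats the stored previous output. With x and y maximally correlated (generated from the same random numbers, so x = 1 implies y = 1), the output probability approaches p_x / p_y. This is a behavioral sketch under those assumptions, not a gate-level model of the paper's circuit; function names and stream lengths are hypothetical.

```python
import random


def bitstream(p, randoms):
    """Stochastic number generator: emit 1 whenever the random number is below p."""
    return [1 if r < p else 0 for r in randoms]


def sc_multiply(px, py, n, rng):
    """Unipolar SC multiplication: AND of two independent streams."""
    x = bitstream(px, [rng.random() for _ in range(n)])
    y = bitstream(py, [rng.random() for _ in range(n)])
    return sum(xb & yb for xb, yb in zip(x, y)) / n


def cordiv(px, py, n, rng):
    """CORDIV-style division (our reading): MUX selected by y, D flip-flop feedback."""
    if not 0 <= px <= py <= 1 or py == 0:
        raise ValueError("requires 0 <= px <= py <= 1 and py > 0")
    shared = [rng.random() for _ in range(n)]  # same randoms -> maximally correlated
    x, y = bitstream(px, shared), bitstream(py, shared)
    state, ones = 0, 0
    for xb, yb in zip(x, y):
        out = xb if yb else state  # y = 1: pass x; y = 0: repeat stored output
        state = out
        ones += out
    return ones / n
```

With 100,000-bit streams, `sc_multiply(0.3, 0.6, ...)` lands near 0.18 and `cordiv(0.3, 0.6, ...)` near 0.5; the estimates are random, and their accuracy improves with stream length.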

Journal Article•10.1145/3015144•

Hypergraph Partitioning for Sparse Matrix-Matrix Multiplication

Grey Ballard¹, Alex Druinsky², Nicholas Knight³, Oded Schwartz⁴•Institutions (4)

Sandia National Laboratories¹, Lawrence Berkeley National Laboratory², New York University³, Hebrew University of Jerusalem⁴

26 Dec 2016

TL;DR: In this article, a fine-grained hypergraph model for sparse matrix-matrix multiplication (SpGEMM) is proposed, which correctly describes both the interprocessor communication volume along a critical path in a parallel computation and also the volume of data moving through the memory hierarchy in a sequential computation.

Abstract: We propose a fine-grained hypergraph model for sparse matrix-matrix multiplication (SpGEMM), a key computational kernel in scientific computing and data analysis whose performance is often communication bound. This model correctly describes both the interprocessor communication volume along a critical path in a parallel computation and also the volume of data moving through the memory hierarchy in a sequential computation. We show that identifying a communication-optimal algorithm for particular input matrices is equivalent to solving a hypergraph partitioning problem. Our approach is nonzero structure dependent, meaning that we seek the best algorithm for the given input matrices. In addition to our three-dimensional fine-grained model, we also propose coarse-grained one-dimensional and two-dimensional models that correspond to simpler SpGEMM algorithms. We explore the relations between our models theoretically, and we study their performance experimentally in the context of three applications that use SpGEMM as a key computation. For each application, we find that at least one coarse-grained model is as communication efficient as the fine-grained model. We also observe that different applications have affinities for different algorithms. Our results demonstrate that hypergraphs are an accurate model for reasoning about the communication costs of SpGEMM as well as a practical tool for exploring the SpGEMM algorithm design space.
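The fine-grained model can be made concrete. As we read it, each vertex is one scalar multiplication a_ik * b_kj, and each nonzero of A, B, and C defines a net connecting the multiplications that read or update it. Given an assignment of vertices to processors, the connectivity-minus-one metric counts how many extra parts each net spans, which tracks communication volume. This is a minimal sketch with a hypothetical dict-of-dicts input format and no partitioner; finding a good assignment requires hypergraph partitioning software, which is not included.

```python
from collections import defaultdict


def build_spgemm_hypergraph(A, B):
    """Fine-grained model (our reading): one vertex per product a_ik * b_kj,
    one net per nonzero of A, B and C linking the products that use it."""
    vertices = []
    nets = defaultdict(list)
    for i, a_row in A.items():
        for k in a_row:
            for j in B.get(k, {}):
                v = (i, k, j)
                vertices.append(v)
                nets[("A", i, k)].append(v)  # reads a_ik
                nets[("B", k, j)].append(v)  # reads b_kj
                nets[("C", i, j)].append(v)  # contributes to c_ij
    return vertices, dict(nets)


def connectivity_cost(nets, part):
    """Connectivity-minus-one metric: sum over nets of (parts spanned - 1)."""
    return sum(len({part[v] for v in pins}) - 1 for pins in nets.values())
```

Placing both products that contribute to the same output entry c_ij on different processors makes the net for c_ij span two parts, adding one unit of cost for the partial sum that must be communicated.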

Proceedings Article•

Tractable Operations for Arithmetic Circuits of Probabilistic Models

Yujia Shen¹, Arthur Choi¹, Adnan Darwiche¹•Institutions (1)

University of California, Los Angeles¹

1 Jan 2016

TL;DR: A recently proposed arithmetic circuit representation, the Probabilistic Sentential Decision Diagram (PSDD), is considered, and it is shown that PSDDs support a polytime multiplication operator but do not support a polytime operator for summing out variables.

Abstract: We consider tractable representations of probability distributions and the polytime operations they support. In particular, we consider a recently proposed arithmetic circuit representation, the Probabilistic Sentential Decision Diagram (PSDD). We show that PSDDs support a polytime multiplication operator, although they do not support a polytime operator for summing out variables. A polytime multiplication operator makes PSDDs suitable for a broader class of applications compared to arithmetic circuits, which do not in general support multiplication. As one example, we show that PSDD multiplication leads to a very simple but effective compilation algorithm for probabilistic graphical models: represent each model factor as a PSDD, and then multiply them.

Proceedings Article•10.1109/ICCPCT.2016.7530294•

Area efficient modified vedic multiplier

G. Challa Ram, Y. Rama Lakshmanna, D. Sudha Rani, K. Bala Sindhuri

18 Mar 2016

TL;DR: The paper presents the Urdhva Tiryagbhyam (vertical and crosswise) Vedic method for multiplication, which differs from conventional multiplication, and claims it is the most efficient algorithm, giving minimum delay for multiplication of numbers of any size.

Abstract: This paper describes the design of a high-speed Vedic multiplier that uses techniques of Vedic mathematics, based on 16 sutras (algorithms), to improve performance. In this paper, the efficiency of the Urdhva Tiryagbhyam (vertical and crosswise) Vedic method for multiplication, which differs from the process of conventional multiplication, is presented. Urdhva Tiryagbhyam is the most efficient algorithm, giving minimum delay for multiplication of all types of numbers irrespective of their size. The Vedic multiplier is coded in Verilog HDL, and simulated and synthesized using XILINX software 12.2 on a Spartan 3E kit. Further, the design of an array multiplier is compared with the proposed multiplier in terms of delay, memory and power consumption.
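Urdhva Tiryagbhyam forms each column of the product by summing all crosswise digit products whose positions add up to that column, then propagates carries. The sketch below computes those column sums explicitly for an arbitrary base (`base=2` corresponds to the binary hardware case). It is a software illustration, assuming digits are processed least significant first and using hypothetical function names. Algebraically, it yields the same column sums as long multiplication; any speed advantage of a hardware design comes from generating the column sums in parallel, which this sequential code does not model.

```python
def to_digits(n, base):
    """Digits of a non-negative integer, least significant first."""
    digits = []
    while True:
        n, r = divmod(n, base)
        digits.append(r)
        if n == 0:
            return digits


def urdhva_multiply(a, b, base=10):
    """Vertical-and-crosswise multiplication: column sums, then carry propagation."""
    if a < 0 or b < 0 or base < 2:
        raise ValueError("non-negative operands and base >= 2 required")
    da, db = to_digits(a, base), to_digits(b, base)
    columns = [0] * (len(da) + len(db) - 1)
    for i, x in enumerate(da):
        for j, y in enumerate(db):
            columns[i + j] += x * y  # crosswise product lands in column i + j
    result, carry, place = 0, 0, 1
    for col in columns:
        total = col + carry
        result += (total % base) * place
        carry = total // base
        place *= base
    return result + carry * place
```

For 123 × 456, the column sums are 18, 27, 28, 13 and 4 (least significant first), and carry propagation turns them into 56088.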

Book Chapter•10.1007/978-3-662-49890-3_15•

Reverse-Engineering the S-Box of Streebog, Kuznyechik and STRIBOBr1

Alex Biryukov¹, Léo Perrin¹, Aleksei Udovenko¹•Institutions (1)

University of Luxembourg¹

8 May 2016

TL;DR: In this article, the authors reverse-engineer the S-Box and reveal its hidden structure, which is based on a sort of 2-round Feistel Network where exclusive-or is replaced by a finite field multiplication.

Abstract: The Russian Federation's standardization agency has recently published a hash function called Streebog and a 128-bit block cipher called Kuznyechik. Both of these algorithms use the same 8-bit S-Box but its design rationale was never made public. In this paper, we reverse-engineer this S-Box and reveal its hidden structure. It is based on a sort of 2-round Feistel Network where exclusive-or is replaced by a finite field multiplication. This structure is hidden by two different linear layers applied before and after. In total, five different 4-bit S-Boxes, a multiplexer, two 8-bit linear permutations and two finite field multiplications in a field of size $2^{4}$ are needed to compute the S-Box. The knowledge of this decomposition allows a much more efficient hardware implementation by dividing the area and the delay by 2.5 and 8 respectively. However, the small 4-bit S-Boxes do not have very good cryptographic properties. In fact, one of them has a probability 1 differential. We then generalize the method we used to partially recover the linear layers used to whiten the core of this S-Box and illustrate it with a generic decomposition attack against 4-round Feistel Networks whitened with unknown linear layers. Our attack exploits a particular pattern arising in the Linear Approximations Table of such functions.
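A multiplication in the field of size $2^{4}$ can be sketched as carry-less shift-and-add (XOR replaces addition) with reduction modulo an irreducible polynomial of degree 4. This sketch assumes the polynomial x^4 + x + 1 (`0b10011`), a common choice; the field representation used in the S-Box decomposition may differ, so concrete products are illustrative only.

```python
def gf16_mul(a, b, modulus=0b10011):
    """Multiply in GF(2^4) = GF(2)[x] / (x^4 + x + 1) (assumed modulus)."""
    if not (0 <= a < 16 and 0 <= b < 16):
        raise ValueError("operands must be 4-bit field elements")
    result = 0
    for _ in range(4):
        if b & 1:
            result ^= a   # addition in characteristic 2 is XOR
        b >>= 1
        a <<= 1           # multiply a by x
        if a & 0b10000:
            a ^= modulus  # reduce when degree reaches 4
    return result
```

For example, `gf16_mul(8, 2)` computes x^3 · x = x^4, which reduces to x + 1, i.e. 3. Every nonzero element has a multiplicative inverse, which is what makes the operation usable in place of XOR in the Feistel-like structure.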

Journal Article•

Stages in Constructing and Coordinating Units Additively and Multiplicatively (Part 2).

Catherine Ulrich

01 Jan 2016-for the learning of mathematics

TL;DR: A framework of how students develop their ability to construct and coordinate arithmetical units is laid out, which explains precisely why Aiden and Emma have these difficulties and what changed in order for them to resolve them.

Posted Content•

Randomness Complexity of Private Circuits for Multiplication.

Sonia Belaïd, Fabrice Benhamouda, Alain Passelègue, Emmanuel Prouff, Adrian Thillard, Damien Vergnaud

01 Jan 2016-IACR Cryptology ePrint Archive

TL;DR: In this paper, the authors studied the randomness complexity of multiplication algorithms secure in the d-probing model and provided new theoretical characterizations and constructions, new practical constructions and a new efficient algorithmic tool to analyze the security of such schemes.

Abstract: Many cryptographic algorithms are vulnerable to side channel analysis and several leakage models have been introduced to better understand these flaws. In 2003, Ishai, Sahai and Wagner introduced the d-probing security model, in which an attacker can observe at most d intermediate values during a computation. They also proposed an algorithm that securely performs the multiplication of 2 bits in this model, using only $d(d+1)/2$ random bits to protect the computation. We study the randomness complexity of multiplication algorithms secure in the d-probing model. We propose several contributions: we provide new theoretical characterizations and constructions, new practical constructions and a new efficient algorithmic tool to analyze the security of such schemes. We start with a theoretical treatment of the subject: we propose an algebraic model for multiplication algorithms and exhibit an algebraic characterization of the security in the d-probing model. Using this characterization, we prove a linear in d lower bound and a quasi-linear non-constructive upper bound for this randomness cost. Then, we construct a new generic algorithm to perform secure multiplication in the d-probing model that only uses $d + d^2/4$ random bits. From a practical point of view, we consider the important cases $d \le 4$ that are actually used in current real-life implementations and we build algorithms with a randomness complexity matching our theoretical lower bound for these small-order cases. Finally, still using our algebraic characterization, we provide a new dedicated verification tool, based on information set decoding, which aims at finding attacks on algorithms for fixed order d at a very low computational cost.
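The Ishai–Sahai–Wagner multiplication that sets the $d(d+1)/2$ baseline can be sketched on bits split into d + 1 Boolean shares: one fresh random bit per pair of share indices protects the cross terms a_i b_j. This is a functional sketch only: Python gives no control over the order or leakage of intermediate values, so it demonstrates correctness and randomness count, not probing security, and the helper names are hypothetical.

```python
import random
from functools import reduce
from operator import xor


def bool_share(bit, d, rng):
    """Split a bit into d + 1 Boolean shares whose XOR is the bit."""
    shares = [rng.getrandbits(1) for _ in range(d)]
    shares.append(bit ^ reduce(xor, shares, 0))
    return shares


def unshare(shares):
    return reduce(xor, shares, 0)


def isw_and(a, b, rng):
    """ISW multiplication of shared bits; returns (output shares, random bits used)."""
    n = len(a)  # n = d + 1 shares
    r = [[0] * n for _ in range(n)]
    used = 0
    for i in range(n):
        for j in range(i + 1, n):
            r[i][j] = rng.getrandbits(1)
            used += 1
            # Parenthesization follows ISW: (r_ij ^ a_i b_j) ^ a_j b_i
            r[j][i] = (r[i][j] ^ (a[i] & b[j])) ^ (a[j] & b[i])
    c = []
    for i in range(n):
        ci = a[i] & b[i]
        for j in range(n):
            if j != i:
                ci ^= r[i][j]
        c.append(ci)
    return c, used
```

XOR-ing all output shares cancels each r_ij against r_ji and leaves (XOR of a_i) AND (XOR of b_i), while the counter confirms the d(d + 1)/2 random bits that the paper's constructions reduce.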

Posted Content•

A work-efficient parallel sparse matrix-sparse vector multiplication algorithm

Ariful Azad¹, Aydin Buluc¹•Institutions (1)

Lawrence Berkeley National Laboratory¹

25 Oct 2016-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: In this article, a work-efficient multithreaded algorithm for sparse matrix-sparse vector multiplication (SpMSpV) where the matrix, the input vector, and the output vector are all sparse is presented.

Abstract: We design and develop a work-efficient multithreaded algorithm for sparse matrix-sparse vector multiplication (SpMSpV) where the matrix, the input vector, and the output vector are all sparse. SpMSpV is an important primitive in the emerging GraphBLAS standard and is the workhorse of many graph algorithms including breadth-first search, bipartite graph matching, and maximal independent set. As thread counts increase, existing multithreaded SpMSpV algorithms can spend more time accessing the sparse matrix data structure than doing arithmetic. Our shared-memory parallel SpMSpV algorithm is work efficient in the sense that its total work is proportional to the number of arithmetic operations required. The key insight is to avoid having each thread individually scan the list of matrix columns. Our algorithm is simple to implement and operates on existing column-based sparse matrix formats. It performs well on diverse matrices and vectors with heterogeneous sparsity patterns. A high-performance implementation of the algorithm attains up to 15x speedup on a 24-core Intel Ivy Bridge processor and up to 49x speedup on a 64-core Intel KNL manycore processor. In contrast to implementations of existing algorithms, the performance of our algorithm is sustained on a variety of different input types, including matrices representing scale-free and high-diameter graphs.
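Work efficiency here means that only the matrix columns selected by nonzeros of the input vector are touched, so the work matches the multiplications actually performed. A sequential sketch with a column-based dict format (a hypothetical stand-in for CSC storage) and a sparse accumulator makes this concrete; the paper's contribution, splitting this work across threads without each thread scanning the list of selected columns, is not modeled.

```python
def spmspv(columns, x):
    """y = A x with A stored by columns {col: {row: value}} and sparse x {index: value}."""
    y = {}  # sparse accumulator keyed by output row
    for j, x_j in x.items():
        for i, a_ij in columns.get(j, {}).items():  # only columns selected by x are touched
            y[i] = y.get(i, 0) + a_ij * x_j
    return y
```

In a breadth-first search, x would hold the current frontier, and the rows appearing in y are candidate vertices for the next frontier.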

Journal Article•10.1109/TC.2015.2498606•

Optimised Multiplication Architectures for Accelerating Fully Homomorphic Encryption

Xiaolin Cao, Ciara Moore¹, Maire O'Neill¹, Elizabeth O'Sullivan¹, Neil Hanley¹•Institutions (1)

Queen's University Belfast¹

01 Sep 2016-IEEE Transactions on Computers

TL;DR: Two optimised multiplier architectures for large integer multiplication are proposed: a low-latency hardware architecture of an integer-FFT multiplier, and a novel hardware architecture for large integer multiplication in integer-based FHE schemes that uses low Hamming weight (LHW) parameters.

Abstract: Large integer multiplication is a major performance bottleneck in fully homomorphic encryption (FHE) schemes over the integers. In this paper two optimised multiplier architectures for large integer multiplication are proposed. The first of these is a low-latency hardware architecture of an integer-FFT multiplier. Secondly, the use of low Hamming weight (LHW) parameters is applied to create a novel hardware architecture for large integer multiplication in integer-based FHE schemes. The proposed architectures are implemented, verified and compared on the Xilinx Virtex-7 FPGA platform. Finally, the proposed implementations are employed to evaluate the large multiplication in the encryption step of FHE over the integers. The analysis shows a speed improvement factor of up to 26.2 for the low-latency design compared to the corresponding original integer-based FHE software implementation. When the proposed LHW architecture is combined with the low-latency integer-FFT accelerator to evaluate a single FHE encryption operation, the performance results show that a speed improvement by a factor of approximately 130 is possible.
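An FFT-based large-integer multiplier treats the operands as digit vectors and evaluates their convolution in a transform domain. The sketch below uses a number-theoretic transform, an FFT over a prime field, with modulus 998244353 = 119 · 2^23 + 1 and primitive root 3, a common software choice. It works on 8-bit limbs, so the convolution is exact as long as the coefficient bound n · 255^2 stays below the modulus. This is an illustration under those assumptions, not the paper's Virtex-7 architecture, whose transform parameters may differ.

```python
MOD = 998244353  # 119 * 2**23 + 1, supports transform lengths up to 2**23
ROOT = 3         # primitive root modulo MOD


def ntt(a, invert=False):
    """In-place iterative radix-2 NTT; len(a) must be a power of two."""
    n = len(a)
    j = 0
    for i in range(1, n):  # bit-reversal permutation
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:
        w_len = pow(ROOT, (MOD - 1) // length, MOD)
        if invert:
            w_len = pow(w_len, MOD - 2, MOD)
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(half):
                u = a[start + k]
                v = a[start + k + half] * w % MOD
                a[start + k] = (u + v) % MOD
                a[start + k + half] = (u - v) % MOD
                w = w * w_len % MOD
        length <<= 1
    if invert:
        n_inv = pow(n, MOD - 2, MOD)
        for i in range(n):
            a[i] = a[i] * n_inv % MOD
    return a


def ntt_multiply(x, y):
    """Multiply non-negative integers via exact convolution of 8-bit limbs."""
    if x < 0 or y < 0:
        raise ValueError("non-negative operands only")
    if x == 0 or y == 0:
        return 0
    xa = list(x.to_bytes((x.bit_length() + 7) // 8, "little"))
    ya = list(y.to_bytes((y.bit_length() + 7) // 8, "little"))
    n = 1
    while n < len(xa) + len(ya):
        n <<= 1
    if n * 255 * 255 >= MOD:
        raise ValueError("operands too large for exact convolution with this modulus")
    fa = ntt(xa + [0] * (n - len(xa)))
    fb = ntt(ya + [0] * (n - len(ya)))
    coeffs = ntt([p * q % MOD for p, q in zip(fa, fb)], invert=True)
    # Each coefficient is an exact column sum; Python ints absorb the carries.
    return sum(c << (8 * i) for i, c in enumerate(coeffs))
```

Hardware designs for FHE-sized operands typically use larger transforms and moduli; the size guard here simply rejects inputs that this particular modulus cannot convolve exactly.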

Journal Article•10.1002/NME.5119•

Parallel implementation of spectral element method for Lamb wave propagation modeling

Pawel Kudela¹•Institutions (1)

Polish Academy of Sciences¹

11 May 2016-International Journal for Numerical Methods in Engineering

TL;DR: The proposed spectral element method implementation is based on sparse matrix storage of local shape function derivatives calculated at Gauss–Lobatto–Legendre points and it has been found that computation on multicore GPU is up to 14 times faster than on single CPU.

Abstract: The proposed spectral element method implementation is based on sparse matrix storage of local shape function derivatives calculated at Gauss–Lobatto–Legendre points. The algorithm utilizes two basic operations: multiplication of a sparse matrix by a vector and element-by-element vector multiplication. Compute-intensive operations are performed for a part of the equation of motion derived at the degree-of-freedom level of 3D isoparametric spectral elements. The assembly is performed at the force vector in such a way that atomic operations are minimized. This is achieved by a new mesh coloring technique. The proposed parallel implementation of the spectral element method on GPU is applied for the first time to Lamb wave simulations. It has been found that computation on a multicore GPU is up to 14 times faster than on a single CPU.

Proceedings Article•10.1109/DSD.2016.70•

Flexible FPGA-Based Architectures for Curve Point Multiplication over GF(p)

Dorian Amiet¹, Andreas Curiger, Paul Zbinden¹•Institutions (1)

Hochschule für Technik Rapperswil¹

1 Aug 2016

TL;DR: This paper presents a novel hardware architecture for elliptic curve point multiplication (ECPM) that achieves the best performance reported so far for arbitrary prime-field curves without the use of FPGA reconfiguration.

Abstract: Elliptic curve cryptography (ECC) is widely used as an efficient mechanism to secure private data using public-key protocols. This paper focuses on ECC over prime fields (GF(p)). We present a novel hardware architecture that calculates the elliptic curve point multiplication (ECPM). Our processor supports arbitrary prime fields with sizes up to 1024 bits, and different standards that use curves in short Weierstrass form are supported. A Xilinx Virtex-7 implementation of the proposed architecture takes from 0.69 ms for a 192-bit point multiplication up to 9.7 ms for a 512-bit one. The implementation uses only 20 DSP slices and 6816 LUTs. To the authors' knowledge, this is the best performance reported so far for ECC point multiplication over arbitrary prime-field curves without the use of FPGA reconfiguration.
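For readers unfamiliar with ECPM, the operation computes kP by repeated point doubling and addition on a short Weierstrass curve y² = x³ + ax + b over GF(p). The Python sketch below uses textbook affine formulas and left-to-right double-and-add on a toy curve (a = 2, b = 3, p = 97, base point (3, 6)), all chosen for illustration only. It is not constant-time and does not reflect the paper's processor, which, like most hardware designs, would typically use projective coordinates and a side-channel-aware ladder.

```python
def ec_add(P, Q, a, p):
    """Add two points on y^2 = x^3 + a*x + b over GF(p); None is the point at infinity."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None  # P + (-P) = infinity (also covers doubling a point with y = 0)
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow(2 * y1 % p, -1, p) % p  # tangent slope (doubling)
    else:
        lam = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p       # chord slope (addition)
    x3 = (lam * lam - x1 - x2) % p
    y3 = (lam * (x1 - x3) - y1) % p
    return (x3, y3)


def scalar_mult(k, P, a, p):
    """Left-to-right double-and-add: one doubling per bit of k, one addition per 1-bit."""
    R = None
    for bit in bin(k)[2:]:
        R = ec_add(R, R, a, p)      # double
        if bit == "1":
            R = ec_add(R, P, a, p)  # add
    return R


def on_curve(P, a, b, p):
    return P is None or (P[1] ** 2 - P[0] ** 3 - a * P[0] - b) % p == 0


if __name__ == "__main__":
    a, b, p, G = 2, 3, 97, (3, 6)   # toy parameters, not a standardized curve
    Q = scalar_mult(20, G, a, p)
    assert on_curve(Q, a, b, p)
    print(Q)
```

Python 3.8 or newer is assumed, since `pow(x, -1, p)` is used for the modular inverse.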

Journal Article•10.1111/J.1931-0846.2015.12137.X•

Border as Method, or, the Multiplication of Labor

Kuan-Chi Wang¹•Institutions (1)

University of Oregon¹

01 Jul 2016-Geographical Review

TL;DR: A review of Sandro Mezzadra and Brett Neilson's book, which treats the border as a method for understanding the multiplication of labor.

Abstract: Border as Method, or, the Multiplication of Labor. By Sandro Mezzadra and Brett Neilson. xiv and 379 pp.; bibliog., index. Durham, N.C.: Duke University Press, 2013. $24.62 (paper), ISBN 0822355035...

Book Chapter•10.1007/978-3-662-53008-5_16•

On the Communication Required for Unconditionally Secure Multiplication

Ivan Damgård¹, Jesper Buus Nielsen¹, Antigoni Polychroniadou¹, Michael Raskin¹•Institutions (1)

Aarhus University¹

14 Aug 2016

TL;DR: This work shows that, in the honest majority setting and in the dishonest majority setting with preprocessing, any gate-by-gate protocol must communicate $\Omega(n)$ bits for every multiplication gate, where n is the number of players.

Abstract: Many information-theoretically secure protocols are known for general secure multi-party computation, both in the honest majority setting and in the dishonest majority setting with preprocessing. All known protocols that are efficient in the circuit size of the evaluated function follow the same "gate-by-gate" design pattern: we work through an arithmetic (or Boolean) circuit on secret-shared inputs, such that after we process a gate, the output of the gate is represented as a random secret sharing among the players. This approach usually allows non-interactive processing of addition gates but requires communication for every multiplication gate. Thus, while information-theoretically secure protocols are very efficient in terms of computational work, they seem to require more communication and more rounds than computationally secure protocols. Whether this is inherent is an open and probably very hard problem. However, in this work we show that it is indeed inherent for protocols that follow the "gate-by-gate" design pattern. We present the following results. In the honest majority setting, as well as for dishonest majority with preprocessing, any gate-by-gate protocol must communicate $\Omega(n)$ bits for every multiplication gate, where n is the number of players. In the honest majority setting, we show that one cannot obtain a bound that also grows with the field size. Moreover, for a constant number of players, amortizing over several multiplication gates does not allow us to save on the computational work, and (in a restricted setting) we show that this also holds for communication. All our lower bounds are met up to a constant factor by known protocols that follow the typical gate-by-gate paradigm. Our results imply that a fundamentally new approach must be found in order to improve the communication complexity of known protocols, such as BGW, GMW, SPDZ, etc.
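The asymmetry between addition and multiplication gates in the gate-by-gate pattern can be seen with Shamir secret sharing, the basis of BGW-style protocols. Adding shares is local, but multiplying shares yields a degree-2t sharing that must be brought back to degree t, which here is done by having every party reshare its product share: that resharing is the communication. The Python sketch below is an illustration under assumptions (field modulus 2⁶¹ − 1, t < n/2, semi-honest parties, simple resharing-based degree reduction); it is not any specific protocol from the paper and sends n(n − 1) field elements per multiplication, more than the optimal protocols the abstract refers to.

```python
import random

PRIME = 2**61 - 1  # field modulus (a Mersenne prime); an arbitrary choice for this sketch


def share(secret, t, n, rng):
    """Shamir sharing: random degree-t polynomial f with f(0) = secret; party i holds f(i)."""
    coeffs = [secret % PRIME] + [rng.randrange(PRIME) for _ in range(t)]
    return [sum(c * pow(i, j, PRIME) for j, c in enumerate(coeffs)) % PRIME
            for i in range(1, n + 1)]


def lagrange_at_zero(m):
    """Coefficients lam_i with f(0) = sum(lam_i * f(i)) for deg f < m, points 1..m."""
    lams = []
    for i in range(1, m + 1):
        num, den = 1, 1
        for j in range(1, m + 1):
            if j != i:
                num = num * (-j) % PRIME
                den = den * (i - j) % PRIME
        lams.append(num * pow(den, -1, PRIME) % PRIME)
    return lams


def reconstruct(shares):
    """Recover f(0) from the shares of parties 1..len(shares)."""
    return sum(l * s for l, s in zip(lagrange_at_zero(len(shares)), shares)) % PRIME


def add_gate(xs, ys):
    """Addition gate: purely local, no messages; the result is again a degree-t sharing."""
    return [(a + b) % PRIME for a, b in zip(xs, ys)]


def mult_gate(xs, ys, t, rng):
    """Multiplication gate with degree reduction by resharing (requires n >= 2t + 1)."""
    n = len(xs)
    h = [a * b % PRIME for a, b in zip(xs, ys)]  # local products: a degree-2t sharing
    sub = [share(hi, t, n, rng) for hi in h]     # party i sends sub[i][j] to party j
    lams = lagrange_at_zero(n)
    # Party j combines what it received into a fresh degree-t share of the product.
    return [sum(lams[i] * sub[i][j] for i in range(n)) % PRIME for j in range(n)]
```

With n = 5 and t = 2, sharing 6 and 7 and applying `mult_gate` gives shares from which any three parties reconstruct 42, whereas `add_gate` needs no messages at all.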

Journal Article•10.1007/S40509-015-0056-4•

Relativity of arithmetic as a fundamental symmetry of physics

Marek Czachor¹ ²•Institutions (2)

Gdańsk University of Technology¹, Vrije Universiteit Brussel²

1 Jun 2016

TL;DR: The article argues that a change of realization of arithmetic, made without altering the remaining structures of a given equation, plays the same role as a symmetry transformation, and that an appropriate construction of arithmetic is particularly important for dynamical systems in fractal space-times.

Abstract: Arithmetic operations can be defined in various ways, even if one assumes commutativity and associativity of addition and multiplication, and distributivity of multiplication with respect to addition. In consequence, whenever one encounters ‘plus’ or ‘times’ one has certain freedom of interpreting this operation. This leads to some freedom in definitions of derivatives, integrals and, thus, practically all equations occurring in natural sciences. A change of realization of arithmetic, without altering the remaining structures of a given equation, plays the same role as a symmetry transformation. An appropriate construction of arithmetic turns out to be particularly important for dynamical systems in fractal space-times. Simple examples from classical and quantum, relativistic and nonrelativistic physics are discussed, including the eigenvalue problem for a quantum harmonic oscillator. It is explained why the change of arithmetic is not equivalent to the usual change of variables.
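One way to obtain such an alternative realization of arithmetic, assumed here for illustration, is to transport ordinary arithmetic through a bijection f of the reals: x ⊕ y = f⁻¹(f(x) + f(y)) and x ⊗ y = f⁻¹(f(x)·f(y)). Commutativity, associativity and distributivity are then inherited automatically, even though ⊕ and ⊗ differ from the usual + and ×. The choice f = sinh below is hypothetical and not taken from the paper.

```python
import math


def f(x: float) -> float:
    return math.sinh(x)      # hypothetical generator: a bijection of the reals


def f_inv(y: float) -> float:
    return math.asinh(y)


def plus(x: float, y: float) -> float:
    """Transported addition: x (+) y = f^-1(f(x) + f(y))."""
    return f_inv(f(x) + f(y))


def times(x: float, y: float) -> float:
    """Transported multiplication: x (x) y = f^-1(f(x) * f(y))."""
    return f_inv(f(x) * f(y))


ZERO = f_inv(0.0)  # neutral element of plus (equals 0.0 here)
ONE = f_inv(1.0)   # neutral element of times (asinh(1), about 0.8814, not 1.0)

if __name__ == "__main__":
    a, b, c = 0.5, 1.2, -0.3
    # Distributivity holds by construction, up to floating-point rounding.
    print(times(a, plus(b, c)), plus(times(a, b), times(a, c)))
    print(plus(1.0, 1.0))    # differs from 2.0
```

Because every law is carried over from ordinary arithmetic by f, such a change leaves the algebraic form of equations intact while changing what "plus" and "times" mean, which is the freedom the abstract describes.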

Journal Article•10.1016/J.NEUCOM.2015.01.091•

Integer undirected graphical models for resource-constrained systems

Nico Piatkowski¹, Sangkyun Lee¹, Katharina Morik¹•Institutions (1)

Technical University of Dortmund¹

15 Jan 2016-Neurocomputing

TL;DR: A new class of probabilistic graphical models is proposed that approximates the full joint probability distribution of discrete multivariate random variables by relying only on integer addition/multiplication and binary bit shift operations.
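To illustrate how a graphical model can avoid floating point altogether, the sketch below assumes a hypothetical base-2 parameterization: each configuration of a binary chain gets a non-negative integer score, its unnormalized probability is 2^score computed with a bit shift, and marginals are returned as fixed-point integers via integer division. The scoring scheme and function names are assumptions for illustration, not the paper's exact model, and the brute-force normalizer is only practical for tiny chains.

```python
from itertools import product


def score(x, node_w, edge_w):
    """Integer score: node weights for variables set to 1, edge weights for equal neighbours."""
    s = sum(w for xi, w in zip(x, node_w) if xi)
    s += sum(w for (u, v), w in zip(zip(x, x[1:]), edge_w) if u == v)
    return s


def unnormalized(x, node_w, edge_w):
    """2**score computed with a bit shift (weights assumed to be non-negative integers)."""
    return 1 << score(x, node_w, edge_w)


def marginal_fixed_point(k, node_w, edge_w, frac_bits=16):
    """P(x_k = 1) scaled by 2**frac_bits, using only integer additions, shifts and division."""
    z = 0
    num = 0
    for x in product((0, 1), repeat=len(node_w)):  # brute-force enumeration of states
        u = unnormalized(x, node_w, edge_w)
        z += u
        if x[k]:
            num += u
    return (num << frac_bits) // z
```

For a single variable with node weight 1, the two states have weights 1 and 2, so `marginal_fixed_point(0, [1], [])` returns 43690, the 16-bit fixed-point encoding of 2/3.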

Proceedings Article•10.1109/UPCON.2016.7894719•

FPGA implementation of complex multiplier using minimum delay Vedic real multiplier architecture

K. Deergha Rao¹, Ch. Gangadhar, Praveen K. Korrai²•Institutions (2)

Osmania University¹, Indian Institute of Technology Kharagpur²

1 Jan 2016

TL;DR: Two architectures are proposed for a Vedic real multiplier based on the URDHVA TIRYAKBHYAM (vertically and crosswise) sutra of Indian Vedic mathematics, and an expression is developed for the path delay of an N×N Vedic real multiplier with the minimum-path-delay architecture.

Abstract: Complex-number multiplication is a key arithmetic operation that must be performed at high speed and with low power consumption in high-performance systems such as wireless communications. Hence, in this paper, two architectures are proposed for a Vedic real multiplier based on the URDHVA TIRYAKBHYAM (vertically and crosswise) sutra of Indian Vedic mathematics, and an expression for the path delay of an N×N Vedic real multiplier with the minimum-path-delay architecture is developed. Then, four-Vedic-real-multiplier and three-Vedic-real-multiplier architectures of the complex multiplier are presented. The Vedic real multiplier architecture with minimum path delay is used in the implementation of the complex multiplier. The four-multiplier and three-multiplier architectures of the complex multiplier for 32 × 32-bit complex-number multiplication are coded in VHDL and implemented with Xilinx ISE 13.4 Navigator and ModelSim 5.6. Finally, the results are compared with those of the four- and three-real-multiplier solutions using conventional Booth and array multipliers.
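The two ideas in this abstract can be sketched in software. The vertically-and-crosswise rule forms each column of the product from all cross products whose digit indices sum to that column, then propagates carries; and a complex product (a + bi)(c + di) can be formed from three real multiplications instead of four, at the cost of extra additions. The Python below is an illustration only: the three-multiplication formula shown is one common variant (assumed here, the paper's exact arrangement may differ), and the function names are hypothetical.

```python
def to_digits(n, base):
    """Digits of a non-negative integer, least significant first."""
    digits = []
    while True:
        n, d = divmod(n, base)
        digits.append(d)
        if n == 0:
            return digits


def urdhva_multiply(a, b, base=10):
    """Vertically-and-crosswise multiplication of non-negative integers.

    Column k collects every cross product a_i * b_j with i + j == k;
    carries are propagated afterwards, column by column.
    """
    da, db = to_digits(a, base), to_digits(b, base)
    cols = [0] * (len(da) + len(db) - 1)
    for i, x in enumerate(da):
        for j, y in enumerate(db):
            cols[i + j] += x * y
    result, carry, weight = 0, 0, 1
    for c in cols:
        carry += c
        result += (carry % base) * weight
        carry //= base
        weight *= base
    return result + carry * weight  # any remaining carry forms the top digits


def signed_mul(x, y):
    """Wrap the unsigned multiplier with sign handling."""
    sign = -1 if (x < 0) != (y < 0) else 1
    return sign * urdhva_multiply(abs(x), abs(y))


def complex_mul4(a, b, c, d):
    """(a + bi)(c + di) with four real multiplications."""
    return signed_mul(a, c) - signed_mul(b, d), signed_mul(a, d) + signed_mul(b, c)


def complex_mul3(a, b, c, d):
    """(a + bi)(c + di) with three real multiplications (one common variant)."""
    k1 = signed_mul(c, a + b)
    k2 = signed_mul(a, d - c)
    k3 = signed_mul(b, c + d)
    return k1 - k3, k1 + k2  # real: ac - bd, imaginary: ad + bc
```

For example, `urdhva_multiply(123, 456)` builds the columns 18, 27, 28, 13, 4 and, after carries, returns 56088.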

Journal Article•10.1002/SEC.1706•

Efficient arithmetic on ARM-NEON and its application for high-speed RSA implementation

Hwajeong Seo¹, Zhe Liu², Johann Großschädl², Howon Kim¹•Institutions (2)

Pusan National University¹, University of Luxembourg²

01 Dec 2016-Security and Communication Networks

TL;DR: Introduces a novel Double Operand Scanning (DOS) method to speed up multi-precision squaring with non-redundant representations on SIMD architectures; the method is compatible with separated Montgomery algorithms and highly efficient for the RSA cryptosystem.

Abstract: Advanced modern processors support Single Instruction Multiple Data (SIMD) instructions (e.g. Intel-AVX, ARM-NEON), and a large body of research has been conducted on vector-parallel implementations of modular arithmetic, which is a crucial component of modern public-key cryptography ranging from RSA, ElGamal and DSA to ECC. In this paper, we introduce a novel Double Operand Scanning (DOS) method to speed up multi-precision squaring with non-redundant representations on SIMD architectures. The DOS technique partly doubles the operands and computes the squaring operation without Read-After-Write (RAW) dependencies between source and destination variables. Furthermore, we present Karatsuba Cascade Operand Scanning (KCOS) multiplication and Karatsuba Double Operand Scanning (KDOS) squaring, which adopt the additive and subtractive Karatsuba methods, respectively. The proposed multiplication and squaring methods are compatible with separated Montgomery algorithms and are highly efficient for the RSA cryptosystem. Finally, our proposed multiplication/squaring, separated Montgomery multiplication/squaring and RSA encryption outperform the best-known results by 22/41%, 25/33% and 30%, respectively, on the Cortex-A15 platform.
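The arithmetic behind these techniques can be sketched without SIMD. In a square a·a, every cross product aᵢaⱼ with i ≠ j occurs twice, so it can be computed once and doubled; additive Karatsuba obtains the middle term from (x₀ + x₁)(y₀ + y₁), while subtractive Karatsuba squaring uses 2x₀x₁ = x₀² + x₁² − (x₀ − x₁)². The Python below illustrates those identities only; the 32-bit limb size and the 64-bit recursion cutoff are assumptions, and NEON lane layout, the DOS scanning order, and RAW-dependency avoidance are not modelled.

```python
LIMB_BITS = 32                  # hypothetical limb size
MASK = (1 << LIMB_BITS) - 1


def to_limbs(x, n):
    return [(x >> (LIMB_BITS * i)) & MASK for i in range(n)]


def from_limbs(limbs):
    return sum(l << (LIMB_BITS * i) for i, l in enumerate(limbs))


def square_limbs(a):
    """Schoolbook squaring that computes each cross product a_i*a_j (i < j) once and doubles it."""
    n = len(a)
    acc = [0] * (2 * n)
    for i in range(n):
        acc[2 * i] += a[i] * a[i]             # diagonal terms occur once
        for j in range(i + 1, n):
            acc[i + j] += 2 * a[i] * a[j]     # off-diagonal terms occur twice in a*a
    out, carry = [], 0
    for v in acc:                              # carry propagation back to LIMB_BITS limbs
        carry += v
        out.append(carry & MASK)
        carry >>= LIMB_BITS
    return out


def karatsuba(x, y, cutoff_bits=64):
    """Additive Karatsuba for non-negative integers: three half-size products."""
    if x.bit_length() <= cutoff_bits or y.bit_length() <= cutoff_bits:
        return x * y
    h = max(x.bit_length(), y.bit_length()) // 2
    x1, x0 = x >> h, x & ((1 << h) - 1)
    y1, y0 = y >> h, y & ((1 << h) - 1)
    z2 = karatsuba(x1, y1, cutoff_bits)
    z0 = karatsuba(x0, y0, cutoff_bits)
    z1 = karatsuba(x0 + x1, y0 + y1, cutoff_bits) - z2 - z0  # additive middle term
    return (z2 << (2 * h)) + (z1 << h) + z0


def karatsuba_square(x, cutoff_bits=64):
    """Subtractive Karatsuba squaring: 2*x0*x1 = x0^2 + x1^2 - (x0 - x1)^2."""
    if x.bit_length() <= cutoff_bits:
        return x * x
    h = x.bit_length() // 2
    x1, x0 = x >> h, x & ((1 << h) - 1)
    z2 = karatsuba_square(x1, cutoff_bits)
    z0 = karatsuba_square(x0, cutoff_bits)
    d = abs(x0 - x1)                          # |x0 - x1| keeps the operand non-negative
    z1 = z2 + z0 - karatsuba_square(d, cutoff_bits)
    return (z2 << (2 * h)) + (z1 << h) + z0
```

The subtractive form suits squaring because (x₀ − x₁)² is itself a square, so the recursion stays within squaring.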

Posted Content•

On the geometry of border rank algorithms for matrix multiplication and other tensors with symmetry

Joseph M. Landsberg, Mateusz Michałek

29 Jan 2016-arXiv: Algebraic Geometry

TL;DR: It is proved that border rank algorithms for tensors with symmetry come in families that include representatives with normal forms, which will be useful both to develop new efficient algorithms and to prove lower complexity bounds.

Abstract: We establish basic information about border rank algorithms for the matrix multiplication tensor and other tensors with symmetry. We prove that border rank algorithms for tensors with symmetry (such as matrix multiplication and the determinant polynomial) come in families that include representatives with normal forms. These normal forms will be useful both to develop new efficient algorithms and to prove lower complexity bounds. We derive a border rank version of the substitution method used in proving lower bounds for tensor rank. We use this border-substitution method and a normal form to improve the lower bound on the border rank of matrix multiplication by one, to $2n^2 - n + 1$. We also point out difficulties that will be formidable obstacles to future progress on lower complexity bounds for tensors because of the "wild" structure of the Hilbert scheme of points.
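To make "rank of the matrix multiplication tensor" concrete: an algorithm that computes an n×n product with r bilinear multiplications gives a decomposition of the tensor into r rank-one terms, so the rank (and hence the border rank) is at most r. Strassen's classical algorithm, shown below in Python as an illustration, uses 7 products for n = 2; the lower bound formula 2n² − n + 1 from the abstract also evaluates to 7 at n = 2. This is a rank decomposition, not a border rank algorithm of the kind studied in the paper.

```python
def strassen_2x2(A, B):
    """2x2 matrix product with 7 multiplications (Strassen); each m_i is one rank-one term."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]


def naive_2x2(A, B):
    """Definition of the product: 8 multiplications."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def border_rank_lower_bound(n):
    """The bound 2n^2 - n + 1 stated in the abstract."""
    return 2 * n * n - n + 1
```

For instance, `strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]])` returns `[[19, 22], [43, 50]]`, the same as the naive product.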

Journal Article•10.1049/IET-CDT.2015.0055•

Scalable GF(p) Montgomery multiplier based on a digit–digit computation approach

Miguel Morales-Sandoval¹, Arturo Diaz-Perez¹•Institutions (1)

Instituto Politécnico Nacional¹

19 Apr 2016-IET Computers and Digital Techniques

TL;DR: The proposed designs for IDDMM are well suited to modern FPGAs: they make use of the available dedicated multipliers and memory blocks, drastically reducing the use of the FPGA's standard logic while keeping acceptable performance compared with other implementation approaches.

Abstract: This study presents a scalable hardware architecture for modular multiplication in prime fields GF(p). A novel iterative digit-digit Montgomery multiplication (IDDMM) algorithm is proposed, and two hardware architectures that compute it are described. The input operands (multiplicand, multiplier and modulus) are represented in radix $\beta = 2^k$. Multiplication over GF(p) for different sizes of p is possible using almost the same hardware, since the complexity of the multiplier's kernel module depends mainly on k and not on p. The novel hardware architectures of GF(p) multipliers were evaluated on three Xilinx FPGA families. Design trade-offs were analysed considering different operand sizes commonly used in cryptography and different digit sizes. The proposed designs for IDDMM are well suited to modern FPGAs, making use of the available dedicated multipliers and memory blocks and drastically reducing the use of the FPGA's standard logic while keeping acceptable performance compared with other implementation approaches. In the Virtex-5 implementation, the proposed MM multiplier reaches a throughput of 242 Mbps using only 219 FPGA slices, computing a 1024-bit modular multiplication in 4.21 μs. This requires 26 times fewer area resources than similar works in the literature, with a 7× improvement in efficiency.
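A digit-level Montgomery multiplier can be sketched in software to show why its kernel depends on the digit size k rather than on p: every step multiplies and adds single radix-2^k digits, and only the number of iterations grows with the operand length. The Python below implements the well-known CIOS (Coarsely Integrated Operand Scanning) word-level Montgomery algorithm as an illustration; it is not claimed to be the paper's IDDMM algorithm, and it assumes an odd modulus p and inputs 0 ≤ a, b < p.

```python
def to_digits(x, k, s):
    """s radix-2**k digits of x, least significant first."""
    mask = (1 << k) - 1
    return [(x >> (k * i)) & mask for i in range(s)]


def mont_mul_cios(a, b, p, k):
    """Return a * b * R^-1 mod p with R = 2**(k*s), working on radix-2**k digits."""
    if p % 2 == 0 or not (0 <= a < p and 0 <= b < p):
        raise ValueError("sketch assumes odd p and 0 <= a, b < p")
    W = 1 << k
    mask = W - 1
    s = -(-p.bit_length() // k)            # number of digits of p (ceiling division)
    A, B, N = to_digits(a, k, s), to_digits(b, k, s), to_digits(p, k, s)
    n0_prime = (-pow(N[0], -1, W)) % W     # -p^-1 mod W, a single digit
    t = [0] * (s + 2)
    for i in range(s):
        # Multiplication step: t += A * B[i], digit by digit.
        c = 0
        for j in range(s):
            v = t[j] + A[j] * B[i] + c
            t[j], c = v & mask, v >> k
        v = t[s] + c
        t[s], t[s + 1] = v & mask, v >> k
        # Reduction step: add m*p so the lowest digit becomes zero, then shift by one digit.
        m = (t[0] * n0_prime) & mask
        c = (t[0] + m * N[0]) >> k
        for j in range(1, s):
            v = t[j] + m * N[j] + c
            t[j - 1], c = v & mask, v >> k
        v = t[s] + c
        t[s - 1], c = v & mask, v >> k
        t[s] = t[s + 1] + c
    result = sum(d << (k * i) for i, d in enumerate(t[: s + 1]))
    return result - p if result >= p else result  # one final conditional subtraction
```

With p = 97 and k = 4 (two digits, so R = 256), `mont_mul_cios(5, 7, 97, 4)` returns 96, which equals 35 · 256⁻¹ mod 97. Changing p only changes s, the number of digit iterations, while the per-digit operations stay k bits wide.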
