TL;DR: A simple fully homomorphic encryption scheme that uses only elementary modular arithmetic over the integers. Its security is reduced to finding an approximate integer gcd, and the hardness of this task is investigated, building on earlier work of Howgrave-Graham.
Abstract: We construct a simple fully homomorphic encryption scheme, using only elementary modular arithmetic. We use Gentry’s technique to construct a fully homomorphic scheme from a “bootstrappable” somewhat homomorphic scheme. However, instead of using ideal lattices over a polynomial ring, our bootstrappable encryption scheme merely uses addition and multiplication over the integers. The main appeal of our scheme is the conceptual simplicity.
We reduce the security of our scheme to finding an approximate integer gcd – i.e., given a list of integers that are near-multiples of a hidden integer, output that hidden integer. We investigate the hardness of this task, building on earlier work of Howgrave-Graham.
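The idea of encrypting with near-multiples of a hidden integer can be sketched in a few lines of Python. This is a toy symmetric variant and not the scheme as specified in the paper: the names `keygen`, `encrypt`, `decrypt` and the parameters `eta`, `gamma`, `rho` are assumptions chosen for illustration, the sizes are far too small to be secure, and the noise is kept non-negative so that a plain `%` suffices for decryption.

```python
import secrets

def keygen(eta=64):
    # Secret key: a random odd eta-bit integer p (toy size, NOT secure).
    return secrets.randbits(eta) | (1 << (eta - 1)) | 1

def encrypt(p, m, gamma=256, rho=8):
    # Ciphertext: a near-multiple of p; the small offset 2r + m hides the bit m.
    q = secrets.randbits(gamma - p.bit_length())
    r = secrets.randbits(rho)
    return p * q + 2 * r + m

def decrypt(p, c):
    # Reducing modulo p leaves the noise 2r + m; its parity is the message bit.
    return (c % p) % 2

p = keygen()
c0, c1 = encrypt(p, 0), encrypt(p, 1)
# Adding ciphertexts XORs the bits; multiplying ciphertexts ANDs them,
# as long as the accumulated noise stays well below p.
print(decrypt(p, c0 + c1), decrypt(p, c1 * c1))
```

Each operation grows the noise term, which is why a scheme of this kind is only "somewhat" homomorphic until a noise-refreshing step such as bootstrapping is added.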
TL;DR: This work recalls Strassen's bound ω ≤ log_2 7 < 2.8074, which improves on the trivial value of 3, and examines how cubing and raising to the fourth power Coppersmith and Winograd's “complicated” algorithm affect the value of ω, the exponent of matrix multiplication.
Abstract: The evaluation of the product of two matrices can be very computationally expensive. The multiplication of two n×n matrices, using the “default” algorithm can take O(n^3) field operations in the underlying field k. It is therefore desirable to find algorithms to reduce the “cost” of multiplying two matrices together. If multiplication of two n×n matrices can be obtained in O(n^α) operations, the least upper bound for α is called the exponent of matrix multiplication and is denoted by ω. A bound for ω < 3 was found in 1968 by Strassen in his algorithm. He found that multiplication of two 2×2 matrices could be obtained in 7 multiplications in the underlying field k, as opposed to the 8 required to do the same multiplication previously. Using recursion, we are able to show that ω ≤ log_2 7 < 2.8074, which is better than the value of 3 we had previously. In chapter 1, we look at various techniques that have been found for reducing ω. These include Pan’s Trilinear Aggregation, Bini’s Border Rank and Schönhage’s Asymptotic Sum inequality. In chapter 2, we look in detail at the current best estimate of ω found by Coppersmith and Winograd. We also propose a different method of evaluating the “value” of trilinear forms. Chapters 3 and 4 build on the work of Coppersmith and Winograd and examine how cubing and raising to the fourth power of Coppersmith and Winograd’s “complicated” algorithm affect the value of ω, if at all. Finally, in chapter 5, we look at the Group-Theoretic context proposed by Cohn and Umans, and see how we can derive some of Coppersmith and Winograd’s values using this method, as well as showing how working in this context can perhaps be more conducive to showing ω = 2.
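The 7-multiplication identity for 2×2 matrices mentioned above can be written out as a short Python sketch. The function name `strassen_2x2` and the list-of-lists representation are assumptions made for illustration; the seven products follow the commonly stated form of Strassen's identities, and the entries may themselves be matrix blocks in a recursive setting (here they are plain numbers).

```python
import math

def strassen_2x2(A, B):
    # Multiply two 2x2 matrices with 7 multiplications instead of 8.
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]

print(strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# Applying the identity recursively to n x n matrices gives the exponent log2(7).
print(math.log2(7))
```

Because the identities never rely on commutativity of the entries, the same seven products work when the entries are sub-blocks, which is what makes the recursion valid.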
TL;DR: Brent and Zimmermann present algorithms that are ready to implement in your favorite language, while keeping a high-level description and avoiding too low-level or machine-dependent details.
Abstract: Modern Computer Arithmetic focuses on arbitrary-precision algorithms for efficiently performing arithmetic operations such as addition, multiplication and division, and their connections to topics such as modular arithmetic, greatest common divisors, the Fast Fourier Transform (FFT), and the computation of elementary and special functions. Brent and Zimmermann present algorithms that are ready to implement in your favorite language, while keeping a high-level description and avoiding too low-level or machine-dependent details. The book is intended for anyone interested in the design and implementation of efficient high-precision algorithms for computer arithmetic, and more generally efficient multiple-precision numerical algorithms. It may also be used in a graduate course in mathematics or computer science, for which exercises are included. These vary considerably in difficulty, from easy to small research projects, and expand on topics discussed in the text. Solutions are available from the authors.
TL;DR: A comparative study of different multipliers is done for low power requirement and high speed, including the “Urdhva Tiryakbhyam” algorithm of Ancient Indian Vedic Mathematics, which is used to improve the speed and area parameters of multipliers.
Abstract: A typical central processing unit devotes a considerable amount of processing time to arithmetic operations, particularly multiplication. Multiplication is one of the basic arithmetic operations, and it requires substantially more hardware resources and processing time than addition and subtraction. In fact, 8.72% of all instructions in typical processing units are multiplications. In this paper, a comparative study of different multipliers is done for low power requirement and high speed. The paper describes the “Urdhva Tiryakbhyam” algorithm of Ancient Indian Vedic Mathematics, which is used for multiplication to improve the speed and area parameters of multipliers. Vedic Mathematics suggests one more formula for the multiplication of large numbers, the “Nikhilam Sutra”, which can increase the speed of a multiplier by reducing the number of iterations.
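The “vertically and crosswise” idea behind Urdhva Tiryakbhyam can be illustrated in software, under the assumption that it is modeled as forming all cross products of digits per output column and then propagating carries. The function name `urdhva_multiply` and the `base` parameter are hypothetical; a hardware multiplier would form the column sums in parallel rather than in loops, and the inputs are assumed to be non-negative integers.

```python
def urdhva_multiply(a, b, base=10):
    def to_digits(n):
        # Digits of n, least significant first.
        digits = []
        while True:
            n, d = divmod(n, base)
            digits.append(d)
            if n == 0:
                return digits

    xs, ys = to_digits(a), to_digits(b)
    # Column k collects every cross product x_i * y_j with i + j == k.
    columns = [0] * (len(xs) + len(ys) - 1)
    for i, x in enumerate(xs):
        for j, y in enumerate(ys):
            columns[i + j] += x * y
    # Propagate carries from the least significant column upward.
    result, carry, weight = 0, 0, 1
    for col in columns:
        carry, digit = divmod(col + carry, base)
        result += digit * weight
        weight *= base
    return result + carry * weight

print(urdhva_multiply(47, 76))
```

The column sums are independent of one another, which is the property that hardware designs exploit to compute them concurrently.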
TL;DR: In this paper, techniques for implementing multipliers using memory blocks in an integrated circuit (IC) are provided. The disclosed techniques may reduce the number of memory blocks required to implement various multiplication operations.
Abstract: Techniques for implementing multipliers using memory blocks in an integrated circuit (IC) are provided. The disclosed techniques may reduce the number of memory blocks required to implement various multiplication operations. A plurality of generated products is normalized. The normalized products are scaled to generate a plurality of scaled products. Scaled products with the least root mean square (RMS) error are identified. The scaled products with the least RMS error are then stored in a plurality of memory blocks in an IC. The scaled products may have a reduced number of bits compared to the plurality of generated products that have not been normalized and scaled.
TL;DR: Evidence is provided for an untrained, intuitive process of calculating multiplicative numerical relationships, providing a further foundation for formal arithmetic instruction.
TL;DR: Strategy switch costs were found when the participants executed the easiest strategy and when they solved easy problems, and the implications for models of strategy choices are discussed.
Abstract: Three experiments tested whether switching between strategies involves a cost. Participants had to give approximate products to two-digit multiplication problems (e.g., 47 × 76). They were told which strategy to use (Experiments 1 and 2) or could choose among strategies (Experiment 3). The participants showed poorer performance when they used different strategies on two consecutive trials than when they used the same strategy. They also used the same strategy over two consecutive problems more often than they used different strategies. These effects, termed strategy switch costs, were found when the participants executed the easiest strategy and when they solved easy problems. We discuss possible processes underlying these strategy switch costs and the implications of these strategy switch costs for models of strategy choices.
TL;DR: The proposed architectures of two parallel decimal multipliers have interesting area-delay figures compared to conventional Booth radix-4 and radix-8 parallel binary multipliers and outperform the figures of previous alternatives for decimal multiplication.
Abstract: The new generation of high-performance decimal floating-point units (DFUs) is demanding efficient implementations of parallel decimal multipliers. In this paper, we describe the architectures of two parallel decimal multipliers. The parallel generation of partial products is performed using signed-digit radix-10 or radix-5 recodings of the multiplier and a simplified set of multiplicand multiples. The reduction of partial products is implemented in a tree structure based on a decimal multioperand carry-save addition algorithm that uses unconventional (non-BCD) decimal-coded number systems. We further detail these techniques and present the new improvements to reduce the latency of the previous designs, which include: optimized digit recoders for the generation of 2n-tuples (and 5-tuples), decimal carry-save adders (CSAs) combining different decimal-coded operands, and carry-free adders implemented by specially designed bit counters. Moreover, we detail a design methodology that combines all these techniques to obtain efficient reduction trees with different area and delay trade-offs for any number of partial products generated. Evaluation results for 16-digit operands show that the proposed architectures have interesting area-delay figures compared to conventional Booth radix-4 and radix-8 parallel binary multipliers and outperform the figures of previous alternatives for decimal multiplication.
TL;DR: The results strongly indicate that binary curves are the most efficient alternative for the implementation of elliptic curve cryptography in the MICAz Mote, a popular sensor platform.
Abstract: The deployment of cryptography in sensor networks is a challenging task, given the limited computational power and the resource-constrained nature of the sensoring devices. This paper presents the implementation of elliptic curve cryptography in the MICAz Mote, a popular sensor platform. We present optimization techniques for arithmetic in binary fields, including squaring, multiplication and modular reduction at two different security levels. Our implementation of field multiplication and modular reduction algorithms focuses on the reduction of memory accesses and appears as the fastest result for this platform. Finite field arithmetic was implemented in C and Assembly and elliptic curve arithmetic was implemented in Koblitz and generic binary curves. We illustrate the performance of our implementation with timings for key agreement and digital signature protocols. In particular, a key agreement can be computed in 0.40 seconds and a digital signature can be computed and verified in 1 second at the 163-bit security level. Our results strongly indicate that binary curves are the most efficient alternative for the implementation of elliptic curve cryptography in this platform.
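Multiplication with modular reduction in a binary field can be sketched as a bit-serial shift-and-add loop. This is a generic textbook formulation, not the memory-access-optimized routine of the paper; the name `gf2m_mul` is hypothetical, field elements are assumed to be Python integers whose bits are polynomial coefficients, and `poly` is assumed to include the leading x^m term. The constant `F163` is an assumption that the 163-bit field uses the NIST reduction polynomial x^163 + x^7 + x^6 + x^3 + 1.

```python
def gf2m_mul(a, b, poly, m):
    # Multiply a and b in GF(2^m) = GF(2)[x] / (poly), with a, b < 2**m.
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2) is XOR
        b >>= 1
        a <<= 1                  # multiply a by x
        if (a >> m) & 1:
            a ^= poly            # reduce as soon as degree m appears
    return result

# Assumed NIST polynomial for the 163-bit binary field.
F163 = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1

# Small check in GF(2^8) with the AES polynomial x^8 + x^4 + x^3 + x + 1.
print(hex(gf2m_mul(0x57, 0x83, 0x11B, 8)))
```

Squaring is the special case `gf2m_mul(a, a, poly, m)`; dedicated implementations usually exploit the fact that squaring in characteristic 2 only spreads the bits apart before reduction.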
TL;DR: Two types of multistate Hopfield neural networks based on commutative quaternions, which are similar to Hamilton's quaternions but with commutative multiplication, are explored, and their stability is investigated, i.e., the energies decrease monotonically with respect to changes of the network states.
Abstract: This paper explores two types of multistate Hopfield neural networks, based on commutative quaternions that are similar to Hamilton's quaternions but with commutative multiplication. In one type of the networks, the state of a neuron is represented by two kinds of phases and one real number. The other type of the networks adopts the decomposed form of commutative quaternions, i.e., the state of a neuron consists of a combination of two complex values. We have investigated the stabilities of these networks, i.e., the energies monotonically decrease with respect to the changes of the network states.
TL;DR: In this paper, signals are represented by the memristance of the memristor instead of by voltages or currents, and a new circuit is designed for programming the memristance of a memristor with a predetermined analog value.
Abstract: In almost all of the currently working circuits, especially in analog circuits implementing signal processing applications, basic arithmetic operations such as multiplication, addition, subtraction and division are performed on values which are represented by voltages or currents. However, in this paper, we propose a new and simple method for performing analog arithmetic operations in which signals are represented and stored through the memristance of the newly found circuit element, i.e. the memristor, instead of voltage or current. Some of these operators, such as the divider and the multiplier, are much simpler and faster than their equivalent voltage-based circuits and they require less chip area. In addition, a new circuit is designed for programming the memristance of the memristor with a predetermined analog value. Presented simulation results demonstrate the effectiveness and the accuracy of the proposed circuits.
TL;DR: A secure spread spectrum communication scheme using multiplication modulation that multiplies the message by chaotic signal lends itself to cheap implementation and can therefore be used effectively for ensuring security and privacy in commercial consumer electronics products.
TL;DR: An overview of both analog and digital approaches offered in the literature for addition and multiplication will be described, and Memristor-based designs of an adder and a multiplier are presented.
Abstract: This paper describes strategies for performing arithmetic operations in memristor-based structures. An overview of both analog and digital approaches offered in the literature for addition and multiplication will be described. Memristor-based designs of an adder and a multiplier are presented.
TL;DR: An efficient software implementation of characteristic 2 fields making extensive use of vector instruction sets commonly found in desktop processors and follows the trend of accelerating implementations of cryptography through PTLU-style instructions is described.
Abstract: In this paper we describe an efficient software implementation of characteristic 2 fields making extensive use of vector instruction sets commonly found in desktop processors. Field elements are represented in a split form so performance-critical field operations can be formulated in terms of simple operations over 4-bit sets. In particular, we detail techniques for implementing field multiplication, squaring, square root extraction and present a constant-memory lookup-based multiplication strategy. Our representation makes extensive use of the parallel table lookup (PTLU) instruction recently introduced in popular desktop platforms and follows the trend of accelerating implementations of cryptography through PTLU-style instructions. We present timings for several binary fields commonly employed for curve-based cryptography and illustrate the presented techniques with executions of the ECDH and ECDSA protocols over binary curves at the 128-bit and 256-bit security levels standardized by NIST. Our implementation results are compared with publicly available benchmarking data.
TL;DR: BP is an E4 ring spectrum, and this E4 structure is unique up to automorphism.
Abstract: BP is an E4 ring spectrum. The E4 structure is unique up to automorphism.
TL;DR: A book on low-level algorithms covering bit wizardry, permutations, sorting and searching, combinatorial generation, fast transforms, fast arithmetic, and algorithms for finite fields.
Abstract: Low level algorithms.- Bit wizardry.- Permutations and their operations.- Sorting and searching.- Data structures.- Combinatorial generation.- Conventions and considerations.- Combinations.- Compositions.- Subsets.- Mixed radix numbers.- Permutations.- Multisets.- Gray codes for string with restrictions.- Parenthesis strings.- Integer partitions.- Set partitions.- Necklaces and Lyndon words.- Hadamard and conference matrices.- Searching paths in directed graphs.- Fast transforms.- The Fourier transform.- Convolution, correlation, and more FFT algorithms.- The Walsh transform and its relatives.- The Haar transform.- The Hartley transform.- Number theoretic transforms (NTTs).- Fast wavelet transforms.- Fast arithmetic.- Fast multiplication and exponentiation.- Root extraction.- Iterations for the inversion of a function.- The AGM, elliptic integrals, and algorithms for computing.- Logarithm and exponential function.- Computing the elementary functions with limited resources.- Numerical evaluation of power series.- Cyclotomic polynomials, product forms, and continued fractions.- Synthetic Iterations.- Algorithms for finite fields.- Modular arithmetic and some number theory.- Binary polynomials.- Shift registers.- Binary finite fields.- The electronic version of the book.- Machine used for benchmarking.- The GP language.- Bibliography.- Index.
TL;DR: The method generalizes some earlier methods and combines them with the recently introduced complexity notion M̂_q(ℓ), which denotes the minimum number of multiplications needed to obtain the coefficients of the product of two arbitrary ℓ-term polynomials modulo x^ℓ in F_q[x].
TL;DR: A minimal logic depth GD algorithm which requires no lookup table and consumes less switching power than the latest LD constrained GD methods based on the Glitch Path Count and Glitches Path Score metrics.
Abstract: Research on optimization of fixed coefficient FIR filters modeled as Multiple Constant Multiplication (MCM) has been ongoing for two decades. An analysis of Minimal Signed Digit (MSD) reveals that potential good solutions are omitted by Common Subexpression Elimination (CSE) algorithms as they are hidden in the MSD representations. Some CSE algorithms ensure that all coefficients are implemented at minimal Logic Depth (LD), which is advantageous from a power saving perspective. Imposing this requirement on a graph-dependent (GD) algorithm reduces the search space as well as the runtime. It also eliminates the long critical path of the GD algorithm. This paper presents a minimal logic depth GD algorithm which requires no lookup table. Simulation results show that it uses fewer adders than CSE algorithms while having the minimal logic depth. For all filters tested, it consumes less switching power than the latest LD constrained GD methods based on the Glitch Path Count and Glitch Path Score metrics.
TL;DR: A numerical analysis, carried out by using the Perturb and Observe MPPT technique as a benchmark reference, confirms the validity of the proposed approach.
TL;DR: This work investigates a novel optical VMM (OVMM) using five logic operations with the modified signed-digit (MSD) number system and proposes a new implementation method that can be used to realize the MSD multiplication in parallel.
Abstract: Applying the parallelism of optical computing, we present a novel method of vector-matrix multiplication (VMM) based on a new optical computing platform, the ternary optical computer, which can reconfigure any two-input trivalued logic optical processor at runtime, according to the decrease-radix design principle. In this work, we investigate a novel optical VMM (OVMM) using five logic operations with the modified signed-digit (MSD) number system. To simplify the computation process, we realize a carry-free optical addition in three steps, which is independent of the length of the operands. And a new implementation method is proposed that can be used to realize the MSD multiplication in parallel. Based on the generation of partial products in parallel and the binary-addition-tree algorithm, the multiplication can be implemented with the MSD addition. Our initial experiments have been performed to verify the proposed OVMM method. The results show that the proposed method of OVMM is feasible and correct.
TL;DR: This comprehensive reference provides researchers with the thorough understanding of number representations that is a necessary foundation for designing efficient arithmetic algorithms.
Abstract: Fundamental arithmetic operations support virtually all of the engineering, scientific, and financial computations required for practical applications, from cryptography, to financial planning, to rocket science. This comprehensive reference provides researchers with the thorough understanding of number representations that is a necessary foundation for designing efficient arithmetic algorithms. Using the elementary foundations of radix number systems as a basis for arithmetic, the authors develop and compare alternative algorithms for the fundamental operations of addition, multiplication, division, and square root with precisely defined roundings. Various finite precision number systems are investigated, with the focus on comparative analysis of practically efficient algorithms for closed arithmetic operations over these systems. Each chapter begins with an introduction to its contents and ends with bibliographic notes and an extensive bibliography. The book may also be used for graduate teaching: problems and exercises are scattered throughout the text and a solutions manual is available for instructors.
TL;DR: A review of a comprehensive volume on church planting, which quotes or alludes to nearly everyone who has written on the subject in recent years; the reviewer records thoughts from reading it start to finish.
Abstract: Wow! This book seems to have it all. Has there ever been so much written in one volume about the various aspects of church planting? Just holding the volume in my hand was a bit intimidating. Nearly everyone who has written about church planting in recent years is either quoted or alluded to in this work. OK so I was a little overwhelmed in the beginning. However, I took heart, began at the beginning and soldiered through. Below are my thoughts as I read.
TL;DR: The implementation results show that the proposed 128‐point mixed‐radix FFT architecture significantly reduces the hardware cost and power consumption in comparison to existing 128‐ point FFT architectures.
Abstract: In this paper, we present a fast Fourier transform (FFT) processor with four parallel data paths for multiband orthogonal frequency-division multiplexing ultrawideband systems. The proposed 128-point FFT processor employs both a modified radix-2^4 algorithm and a radix-2^3 algorithm to significantly reduce the numbers of complex constant multipliers and complex Booth multipliers. It also employs substructure-sharing multiplication units instead of constant multipliers to efficiently conduct multiplication operations with only addition and shift operations. The proposed FFT processor is implemented and tested using 0.18 µm CMOS technology with a supply voltage of 1.8 V. The hardware-efficient 128-point FFT processor with four data streams can support a data processing rate of up to 1 Gsample/s while consuming 112 mW. The implementation results show that the proposed 128-point mixed-radix FFT architecture significantly reduces the hardware cost and power consumption in comparison to existing 128-point FFT architectures.
TL;DR: Since the circuit uses only one DSP48E1 block and one Block RAM, the implementation is close to optimal in the sense that it has only less than 3% overhead in multiplication and no further improvement is possible as long as Montgomery multiplication based algorithm is used.
Abstract: The main contribution of this paper is to present an efficient hardware algorithm for RSA encryption/decryption based on Montgomery multiplication. Modern FPGAs have a number of embedded DSP blocks (DSP48E1) and embedded memory blocks (BRAM). Our hardware algorithm supporting 2048-bit RSA encryption/decryption is designed to be implemented using one DSP48E1, one BRAM and a few logic blocks (slices) in the Xilinx Virtex-6 family FPGA. The implementation results showed that our RSA module for 2048-bit RSA encryption/decryption runs in 277.26 ms. Quite surprisingly, the multiplier in the DSP48E1 used to compute Montgomery multiplication is busy in more than 97% of all clock cycles. Hence, our implementation is close to optimal in the sense that it has less than 3% overhead in multiplication and no further improvement is possible as long as a Montgomery multiplication based algorithm is used. Also, since our circuit uses only one DSP48E1 block and one Block RAM, we can implement a number of RSA modules in an FPGA that can work in parallel to attain high throughput RSA encryption/decryption.
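Montgomery multiplication, on which the RSA module above is based, replaces division by the modulus with shifts and masks. The following is a minimal whole-integer sketch, not the word-serial hardware algorithm of the paper; the names `montgomery_setup`, `mont_mul` and `mod_mul` are hypothetical, the modulus `n` is assumed odd with R = 2^k > n, and `pow(n, -1, r)` assumes Python 3.8 or later.

```python
def montgomery_setup(n, k):
    # R = 2**k must exceed the odd modulus n.
    r = 1 << k
    return (-pow(n, -1, r)) % r      # n_prime with n * n_prime ≡ -1 (mod R)

def mont_mul(a, b, n, k, n_prime):
    # Returns a * b * R^{-1} mod n for 0 <= a, b < n, using no division by n.
    mask = (1 << k) - 1
    t = a * b
    m = ((t & mask) * n_prime) & mask
    u = (t + m * n) >> k             # exact: t + m*n is divisible by R
    return u - n if u >= n else u

def mod_mul(a, b, n, k):
    # Ordinary modular product computed through the Montgomery domain.
    n_prime = montgomery_setup(n, k)
    r = 1 << k
    a_bar, b_bar = (a * r) % n, (b * r) % n
    prod_bar = mont_mul(a_bar, b_bar, n, k, n_prime)
    return mont_mul(prod_bar, 1, n, k, n_prime)   # leave the Montgomery domain

print(mod_mul(45, 76, 97, 8), (45 * 76) % 97)
```

In a modular exponentiation the conversions into and out of the Montgomery domain are paid once, while every intermediate product uses only `mont_mul`, which is why the multiplier can be kept busy almost continuously.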
TL;DR: This work proposes different implementations of the sparse matrix-dense vector multiplication (SpMV) for finite fields and rings Z/mZ and uses this library and a new parallelisation of the sigma-basis algorithm in a parallel block Wiedemann rank implementation over finite fields.
Abstract: We propose different implementations of the sparse matrix-dense vector multiplication (SpMV) for finite fields and rings Z/mZ. We take advantage of graphic card processors (GPU) and multi-core architectures. Our aim is to improve the speed of SpMV in the LinBox library, and hence the speed of its black-box algorithms. Besides, we use this library and a new parallelisation of the sigma-basis algorithm in a parallel block Wiedemann rank implementation over finite fields.
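A sequential reference version of SpMV over Z/mZ can be sketched with the compressed sparse row (CSR) layout. This is not LinBox code and not one of the GPU or multi-core implementations described above; the function name `spmv_csr_mod` and the argument names are hypothetical, and the reduction modulo m is delayed to once per row, which is safe here only because Python integers do not overflow.

```python
def spmv_csr_mod(values, col_idx, row_ptr, x, m):
    # y = A * x over Z/mZ, with A stored in CSR form:
    #   values[k]  : k-th nonzero entry
    #   col_idx[k] : its column index
    #   row_ptr[i] : start of row i inside values (row_ptr[-1] == len(values))
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc % m)            # one delayed reduction per row
    return y

# A = [[1, 0, 2], [0, 3, 0], [4, 0, 5]] over Z/7Z
values, col_idx, row_ptr = [1, 2, 3, 4, 5], [0, 2, 1, 0, 2], [0, 2, 3, 5]
print(spmv_csr_mod(values, col_idx, row_ptr, [1, 2, 3], 7))
```

With fixed-width machine words the delayed reduction has to be bounded by the number of products that fit without overflow, which is one of the trade-offs such implementations have to tune.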
TL;DR: This paper shows that the refinement rules of interpolating and approximating univariate subdivision schemes with odd-width masks of finite support can be derived from one another by simple operations on the mask coefficients, and provides a constructive method for the definition of novel refinement algorithms.
TL;DR: It is shown that by splitting the equations defined over a block cipher (an SP-network) into two sets, one can determine the exact number of linearly independent equations which can be generated in algebraic attacks within each of these sets of a certain degree.
Abstract: This paper is about counting linearly independent equations for so-called algebraic attacks on block ciphers. The basic idea behind many of these approaches, e.g., XL, is to generate a large set of equations from an initial set of equations by multiplication of existing equations by the variables in the system. One of the most difficult tasks is to determine the exact number of linearly independent equations one obtains in the attacks. In this paper, it is shown that by splitting the equations defined over a block cipher (an SP-network) into two sets, one can determine the exact number of linearly independent equations which can be generated in algebraic attacks within each of these sets of a certain degree. While this does not give us a direct formula for the success of algebraic attacks on block ciphers, it gives some interesting bounds on the number of equations one can obtain from a given block cipher. Our results are applied to the AES and to a variant of the AES, and the exact numbers of linearly independent equations in the two sets that one can generate by multiplication of an initial set of equations are given. Our results also indicate, in a novel way, that the AES is not vulnerable to the algebraic attacks as defined here.
TL;DR: Three optimizations are outlined: (1) optimized CSR storage format, (2) optimized thread mapping, and (3) avoiding divergence judgment.
Abstract: In recent years, GPUs have attracted the attention of many application developers as powerful massively parallel systems. CUDA, as a general-purpose parallel computing architecture, makes GPUs an appealing choice for solving many complex computational problems more efficiently. In this paper, we discuss implementing and optimizing sparse matrix-vector multiplication on NVIDIA GPUs using the CUDA programming model. We outline three optimizations: (1) an optimized CSR storage format, (2) optimized thread mapping, and (3) avoiding divergence judgment. We experimentally evaluate our optimizations on a GeForce 9600 GTX connected to a Windows XP 64-bit system. In comparison with NVIDIA's SpMV library and NVIDIA's CUDPP library, the results show that our optimized sparse matrix-vector multiplication on CUDA achieves better performance than other SpMV implementations.
TL;DR: In this paper, the authors introduce the concept of S-functions: an S-function calculates the i-th output bit using only the inputs of the i-th bit position and a finite state S[i].
Abstract: An increasing number of cryptographic primitives use operations such as addition modulo 2^n, multiplication by a constant and bitwise Boolean functions as a source of non-linearity. In NIST's SHA-3 competition, this applies to 6 out of the 14 second-round candidates. In this paper, we generalize such constructions by introducing the concept of S-functions. An S-function is a function that calculates the i-th output bit using only the inputs of the i-th bit position and a finite state S[i]. Although S-functions have been analyzed before, this paper is the first to present a fully general and efficient framework to determine their differential properties. A precursor of this framework was used in the cryptanalysis of SHA-1. We show how to calculate the probability that given input differences lead to given output differences, as well as how to count the number of output differences with non-zero probability. Our methods are rooted in graph theory, and the calculations can be efficiently performed using matrix multiplications.
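Addition modulo 2^n is the standard example of an S-function: each output bit depends only on the two input bits at that position and a one-bit state, the carry. The sketch below illustrates only that definition and none of the paper's differential analysis; the function name `add_mod_2n_sfunction` is hypothetical.

```python
def add_mod_2n_sfunction(x, y, n):
    # Compute (x + y) mod 2**n bit by bit, as an S-function.
    state = 0                        # S[0]: the initial carry
    out = 0
    for i in range(n):
        xi = (x >> i) & 1
        yi = (y >> i) & 1
        total = xi + yi + state      # uses only bit position i and S[i]
        out |= (total & 1) << i      # i-th output bit
        state = total >> 1           # S[i+1]: the carry into the next position
    return out

print(add_mod_2n_sfunction(13, 7, 4))
```

Because the state takes only finitely many values, the bitwise computation can be described by a small transition graph per bit position, which is what makes matrix-based counting over all positions possible.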