TL;DR: This paper describes a somewhat homomorphic encryption scheme that uses only elementary modular arithmetic. The main appeal of the approach is its conceptual simplicity. The security of the scheme is reduced to finding an approximate integer gcd, i.e., given a list of integers that are near-multiples of a hidden integer, output that hidden integer.
Abstract: We describe a very simple “somewhat homomorphic” encryption scheme using only elementary modular arithmetic, and use Gentry’s techniques to convert it into a fully homomorphic scheme. Compared to Gentry’s construction, our somewhat homomorphic scheme merely uses addition and multiplication over the integers rather than working with ideal lattices over a polynomial ring. The main appeal of our approach is the conceptual simplicity. We reduce the security of our somewhat homomorphic scheme to finding an approximate integer gcd – i.e., given a list of integers that are near-multiples of a hidden integer, output that hidden integer. We investigate the hardness of this task, building on earlier work of Howgrave-Graham.
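The "near-multiple" idea in the abstract above can be illustrated with a toy symmetric-key sketch. It assumes the commonly cited form c = p*q + 2*r + m for a ciphertext of bit m under an odd secret p; the function names and the parameters eta, gamma, and rho are illustrative choices, the sizes are far too small to be secure, and this is not the paper's full public-key construction.

```python
import secrets

def keygen(eta=64):
    # Secret key: a random odd integer p with exactly eta bits (toy size, NOT secure).
    return secrets.randbits(eta) | (1 << (eta - 1)) | 1

def encrypt(p, m, gamma=256, rho=8):
    # The ciphertext is a near-multiple of p: c = p*q + 2*r + m with small noise r.
    q = secrets.randbits(gamma - p.bit_length())
    r = secrets.randbits(rho) - (1 << (rho - 1))  # noise in [-2^(rho-1), 2^(rho-1))
    return p * q + 2 * r + m

def decrypt(p, c):
    # Reduce c modulo p into the centered range, then read off the parity.
    z = c % p
    if z > p // 2:
        z -= p
    return z % 2
```

Adding two ciphertexts adds the plaintext bits modulo 2, and multiplying them multiplies the bits, as long as the accumulated noise stays well below p/2. That bound on the noise is why such a scheme is only "somewhat" homomorphic.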
TL;DR: This work explores SpMV methods that are well-suited to throughput-oriented architectures like the GPU and which exploit several common sparsity classes, including structured grid and unstructured mesh matrices.
Abstract: Sparse matrix-vector multiplication (SpMV) is of singular importance in sparse linear algebra. In contrast to the uniform regularity of dense linear algebra, sparse operations encounter a broad spectrum of matrices ranging from the regular to the highly irregular. Harnessing the tremendous potential of throughput-oriented processors for sparse operations requires that we expose substantial fine-grained parallelism and impose sufficient regularity on execution paths and memory access patterns. We explore SpMV methods that are well-suited to throughput-oriented architectures like the GPU and which exploit several common sparsity classes. The techniques we propose are efficient, successfully utilizing large percentages of peak bandwidth. Furthermore, they deliver excellent total throughput, averaging 16 GFLOP/s and 10 GFLOP/s in double precision for structured grid and unstructured mesh matrices, respectively, on a GeForce GTX 285. This is roughly 2.8 times the throughput previously achieved on Cell BE and more than 10 times that of a quad-core Intel Clovertown system.
TL;DR: Hardware Implementation of Finite-Field Arithmetic describes algorithms and circuits for executing finite-field operations, including addition, subtraction, multiplication, squaring, exponentiation, and division.
Abstract: Implement Finite-Field Arithmetic in Specific Hardware (FPGA and ASIC)
Master cutting-edge electronic circuit synthesis and design with help from this detailed guide. Hardware Implementation of Finite-Field Arithmetic describes algorithms and circuits for executing finite-field operations, including addition, subtraction, multiplication, squaring, exponentiation, and division.
This comprehensive resource begins with an overview of mathematics, covering algebra, number theory, finite fields, and cryptography. The book then presents algorithms which can be executed and verified with actual input data. Logic schemes and VHDL models are described in such a way that the corresponding circuits can be easily simulated and synthesized. The book concludes with a real-world example of a finite-field application--elliptic-curve cryptography. This is an essential guide for hardware engineers involved in the development of embedded systems.
Get detailed coverage of:
Modulo m reduction
Modulo m addition, subtraction, multiplication, and exponentiation
Operations over GF(p) and GF(p^m)
Operations over the commutative ring Z_p[x]/f(x)
Operations over the binary field GF(2^m) using normal, polynomial, dual, and triangular bases
Table of contents
Chapter 1. Mathematical background
Chapter 2. Mod m reduction
Chapter 3. Mod m operations
Chapter 4. Operations over GF(p)
Chapter 5. Operations over Z_p[x]/f(x)
Chapter 6. Operations over GF(p^n)
Chapter 7. Operations over GF(2^m) - Polynomial bases
Chapter 8. Operations over GF(2^m) - Normal bases
Chapter 9. Operations over GF(2^m) - Other bases
Chapter 10. Elliptic curve cryptography
Appendix A. p = 2^192 - 2^64 - 1
Appendix B. Optimal Extension Fields
Appendix C. Binary Fields
Appendix D. Ada versus VHDL
Index
TL;DR: This paper compares normal multiplication with Vedic multiplication (using the Urdhva Tiryakbhyam Sutra) and shows that implementing either on digital hardware requires the same number of multiplication and addition operations.
Abstract: The aim of this paper is to compare normal multiplication and Vedic multiplication (using the Urdhva Tiryakbhyam Sutra) and to show that implementing either on digital hardware requires the same number of multiplication and addition operations. The choice makes a difference only for mental calculation. A few VHDL codes have been developed for this purpose. All multipliers have been tested on 16x16 multiplications for comparison. Test vectors were supplied through a text file. The implementation targets the Xilinx FPGA device Virtex XCV 300 -6PQ240. Various multiplier implementations, namely array multiplier, multiplier macro, Vedic multiplier with full partitioning, Vedic multiplier using a 4-bit macro, multiplier using a 4-bit macro, fully recursive Vedic multiplier, and Vedic multiplier using an 8-bit macro, have been tested and compared for optimum area and speed.
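The operation-count claim can be checked on a small digit-level model. The sketch below is a software illustration only, not the paper's VHDL: `urdhva_multiply` gathers the "vertical and crosswise" products column by column, `schoolbook_multiply` forms one partial-product row per multiplier digit, and both count single-digit multiplications. The function names and the least-significant-first digit lists are assumptions made for this example.

```python
def urdhva_multiply(a_digits, b_digits, base=10):
    """Column-wise (vertical and crosswise) product; digits are least significant first."""
    n, m = len(a_digits), len(b_digits)
    columns = [0] * (n + m - 1)
    digit_mults = 0
    for k in range(n + m - 1):
        # Every digit pair whose positions sum to k contributes to column k.
        for i in range(max(0, k - m + 1), min(k, n - 1) + 1):
            columns[k] += a_digits[i] * b_digits[k - i]
            digit_mults += 1
    value = sum(c * base**k for k, c in enumerate(columns))
    return value, digit_mults

def schoolbook_multiply(a_digits, b_digits, base=10):
    """Row-wise product: one shifted partial-product row per digit of b."""
    value, digit_mults = 0, 0
    for j, b in enumerate(b_digits):
        row = 0
        for i, a in enumerate(a_digits):
            row += a * b * base**i
            digit_mults += 1
        value += row * base**j
    return value, digit_mults
```

For n-digit by m-digit operands both functions perform n*m single-digit multiplications. Only the order in which the products are summed differs.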
TL;DR: This paper revisits the performance issues of the widely used sparse matrix-vector multiplication (SpMxV) kernel on modern microarchitectures and extracts useful conclusions that can serve as guidelines for the optimization process of both single and multithreaded versions of the kernel.
Abstract: In this paper, we revisit the performance issues of the widely used sparse matrix-vector multiplication (SpMxV) kernel on modern microarchitectures. Previous scientific work reports a number of different factors that may significantly reduce performance. However, the interaction of these factors with the underlying architectural characteristics is not clearly understood, a fact that may lead to misguided, and thus unsuccessful attempts for optimization. In order to gain an insight into the details of SpMxV performance, we conduct a suite of experiments on a rich set of matrices for three different commodity hardware platforms. In addition, we investigate the parallel version of the kernel and report on the corresponding performance results and their relation to each architecture's specific multithreaded configuration. Based on our experiments, we extract useful conclusions that can serve as guidelines for the optimization process of both single and multithreaded versions of the kernel.
TL;DR: The study yields some evidence that successful transfer of knowledge between arithmetic operations is accompanied by modifications of brain activation patterns, and the left angular gyrus seems to be involved not only in the retrieval of stored arithmetic facts, but also in the transfer between arithmetic operations.
TL;DR: A cache-oblivious method to permute the rows and columns of the input matrix using a recursive hypergraph-based sparse matrix partitioning scheme so that the resulting matrix induces cache-friendly behavior during sparse matrix-vector multiplication.
Abstract: In this article, we introduce a cache-oblivious method for sparse matrix-vector multiplication. Our method attempts to permute the rows and columns of the input matrix using a recursive hypergraph-based sparse matrix partitioning scheme so that the resulting matrix induces cache-friendly behavior during sparse matrix-vector multiplication. Matrices are assumed to be stored in row-major format, by means of the compressed row storage (CRS) or its variants incremental CRS and zig-zag CRS. The zig-zag CRS data structure is shown to fit well with the hypergraph metric used in partitioning sparse matrices for the purpose of parallel computation. The separated block-diagonal (SBD) form is shown to be the appropriate matrix structure for cache enhancement. We have implemented a run-time cache simulation library enabling us to analyze cache behavior for arbitrary matrices and arbitrary cache properties during matrix-vector multiplication within a $k$-way set-associative idealized cache model. The results of these simulations are then verified by actual experiments run on various cache architectures. In all these experiments, we use the Mondriaan sparse matrix partitioner in one-dimensional mode. The savings in computation time achieved by our matrix reorderings reach up to 50 percent, in the case of a large link matrix.
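For readers unfamiliar with the storage format, here is a minimal sketch of plain compressed row storage (CRS) and the matrix-vector product over it. It does not implement the incremental or zig-zag variants, the hypergraph-based reordering, or the cache simulation described above; the helper names are illustrative assumptions.

```python
def crs_from_dense(a):
    """Build the three CRS arrays from a dense list-of-rows matrix."""
    values, col_idx, row_ptr = [], [], [0]
    for row in a:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))  # index where the next row starts
    return values, col_idx, row_ptr

def spmv_crs(values, col_idx, row_ptr, x):
    """Compute y = A x, touching only the stored nonzeros."""
    y = [0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Accesses to x follow the column indices; this irregular pattern
            # is what row and column reordering tries to make cache-friendly.
            y[i] += values[k] * x[col_idx[k]]
    return y
```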
TL;DR: This work claims that if the matrix contains only a constant number of distinct values, then reading the matrix once in O(mn) steps is sufficient to preprocess it such that any subsequent application to vectors requires only O(mn/log(max{m,n})) operations.
TL;DR: New algorithms and properties are presented in this paper which are used in a software implementation of the IEEE 754R decimal floating-point arithmetic, with emphasis on using binary operations efficiently.
Abstract: The IEEE Standard 754-1985 for binary floating-point arithmetic [19] was revised [20], and an important addition is the definition of decimal floating-point arithmetic [8], [24]. This is intended mainly to provide a robust reliable framework for financial applications that are often subject to legal requirements concerning rounding and precision of the results, because the binary floating-point arithmetic may introduce small but unacceptable errors. Using binary floating-point calculations to emulate decimal calculations in order to correct this issue has led to the existence of numerous proprietary software packages, each with its own characteristics and capabilities. The IEEE 754R decimal arithmetic should unify the ways decimal floating-point calculations are carried out on various platforms. New algorithms and properties are presented in this paper, which are used in a software implementation of the IEEE 754R decimal floating-point arithmetic, with emphasis on using binary operations efficiently. The focus is on rounding techniques for decimal values stored in binary format, but algorithms are outlined for the more important or interesting operations of addition, multiplication, and division, including the case of nonhomogeneous operands, as well as conversions between binary and decimal floating-point formats. Performance results are included for a wider range of operations, showing promise that our approach is viable for applications that require decimal floating-point calculations. This paper extends an earlier publication [6].
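The "small but unacceptable errors" of binary floating point are easy to reproduce. The sketch below uses Python's standard `decimal` module purely as an illustration; it is not the paper's implementation and says nothing about how the paper stores decimal values in binary format. The helper `round_cents` is a hypothetical name.

```python
from decimal import Decimal, ROUND_HALF_EVEN

# 0.1, 0.2, and 0.3 have no exact binary representation, so the sum is slightly off.
binary_sum_is_exact = (0.1 + 0.2 == 0.3)
decimal_sum_is_exact = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))

def round_cents(amount: str) -> Decimal:
    """Round a decimal string to two places with round-half-even."""
    return Decimal(amount).quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN)

# The binary double nearest to 2.675 lies just below it, so it rounds down.
binary_rounded = round(2.675, 2)
decimal_rounded = round_cents("2.675")
```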
TL;DR: This paper considers the Montgomery multiplication in the binary extension fields and designs two bit-parallel multipliers which are comparable to the best finite field multipliers reported in the literature.
Abstract: Multiplication and squaring are main finite field operations in cryptographic computations and designing efficient multipliers and squarers affect the performance of cryptosystems. In this paper, we consider the Montgomery multiplication in the binary extension fields and study different structures of bit-serial and bit-parallel multipliers. For each of these structures, we study the role of the Montgomery factor, and then by using appropriate factors, propose new architectures. Specifically, we propose two bit-serial multipliers for general irreducible polynomials, and then derive bit-parallel Montgomery multipliers for two important classes of irreducible polynomials. In this regard, first we consider trinomials and provide a way for finding efficient Montgomery factors which results in a low time complexity. Then, we consider type-II irreducible pentanomials and design two bit-parallel multipliers which are comparable to the best finite field multipliers reported in the literature. Moreover, we consider squaring using this family of irreducible polynomials and show that this operation can be performed very fast with the time complexity of two XOR gates.
TL;DR: To improve the speed of parallel decimal multiplication, this paper presents a new PPG method and fine-tunes the PPR method of one existing full solution and the final addition scheme of the other, thereby assembling a new full solution.
Abstract: Hardware support for decimal computer arithmetic is regaining popularity. One reason is the recent growth of decimal computations in commercial, scientific, financial, and Internet-based computer applications. Newly commercialized decimal arithmetic hardware units use radix-10 sequential multipliers that are rather slow for multiplication-intensive applications. Therefore, the future relevant processors are likely to host fast parallel decimal multiplication circuits. The corresponding hardware algorithms are normally composed of three steps: partial product generation (PPG), partial product reduction (PPR), and final carry-propagating addition. The state of the art is represented by two recent full solutions with alternative designs for all the three aforementioned steps. In addition, PPR by itself has been the focus of other recent studies. In this paper, we examine both of the full solutions and the impact of a PPR-only design on the appropriate one. In order to improve the speed of parallel decimal multiplication, we present a new PPG method, fine-tune the PPR method of one of the full solutions and the final addition scheme of the other; thus, assembling a new full solution. Logical Effort analysis and 0.13 μm synthesis show at least 13 percent speed advantage, but at a cost of at most 36 percent additional area consumption.
TL;DR: This work describes a flexible hardware processor for performing computationally expensive modular addition, subtraction, multiplication, and inversion over prime finite fields GF(p).
Abstract: Exchange of private information over a public medium must incorporate a method for data protection against unauthorized access. Elliptic curve cryptography (ECC) has become widely accepted as an efficient mechanism to secure sensitive data. The main ECC computation is a scalar multiplication, translating into an appropriate sequence of point operations, each involving several modular arithmetic operations. We describe a flexible hardware processor for performing computationally expensive modular addition, subtraction, multiplication, and inversion over prime finite fields GF(p). The proposed processor supports all five primes p recommended by NIST, whose sizes are 192, 224, 256, 384, and 521 bits. It can also be programmed to automatically execute sequences of modular arithmetic operations. Our field-programmable gate-array implementation runs at 60 MHz and takes between 4 and 40 ms (depending on the prime used) to perform a typical scalar multiplication.
TL;DR: In this paper, a vector load operation is performed to load a first vector operand of the complex matrix multiplication operation to a first target vector register, and a cross multiply add operation is used to generate a partial product.
Abstract: Mechanisms for performing a complex matrix multiplication operation are provided. A vector load operation is performed to load a first vector operand of the complex matrix multiplication operation to a first target vector register. The first vector operand comprises a real and imaginary part of a first complex vector value. A complex load and splat operation is performed to load a second complex vector value of a second vector operand and replicate the second complex vector value within a second target vector register. The second complex vector value has a real and imaginary part. A cross multiply add operation is performed on elements of the first target vector register and elements of the second target vector register to generate a partial product of the complex matrix multiplication operation. The partial product is accumulated with other partial products and a resulting accumulated partial product is stored in a result vector register.
TL;DR: This work describes a software implementation of pairing-based cryptography and elliptic curve cryptography for the MSP430 microcontroller, which is used in some wireless sensors including the Tmote Sky and TelosB and improves the speed of both pairing computation and point multiplication.
Abstract: The software implementation of cryptographic schemes for wireless sensor networks poses a challenge due to the limited capabilities of the platform. Nevertheless, its feasibility has been shown in recent papers. In this work we describe a software implementation of pairing-based cryptography and elliptic curve cryptography for the MSP430 microcontroller, which is used in some wireless sensors including the Tmote Sky and TelosB. We have implemented the pairing computation for the MNT and BN curves over prime fields along with the ECDSA scheme. The main result of this work is a platform-specific optimization for the multiplication and reduction routines that leads to a 28% speedup in the field multiplication compared to the best known timings published. This optimization consequently improves the speed of both pairing computation and point multiplication.
TL;DR: It is proposed that retrieval efficiency for arithmetic connects diverse performance and strategy-related effects across key arithmetic factors, including arithmetic operation, numerical size, and numeral format; low retrieval efficiency promotes a shift to procedural backup strategies.
Abstract: Educated adults solve simple addition problems primarily by direct memory retrieval, as opposed to by counting or other procedural strategies, but they report using retrieval substantially less often with problems in written-word format (four + eight) compared with digit format (4 + 8). It was hypothesized that retrieval efficiency is relatively low with word operands compared with digits and that this promotes a shift to procedural backup strategies. Consistent with this hypothesis, Experiment 1 demonstrated greater word-format costs on retrieval usage for addition than subtraction, which was due to increased counting for addition but not subtraction. Experiment 2 demonstrated greater word-format costs on retrieval for division than multiplication, which was due to increased use of multiplication-fact reference to solve division problems. Format-related strategy shifts away from retrieval reflected both the efficiency of retrieval for a given operation and the availability of viable alternative strategies. The results demonstrate that calculation processes are not abstracted away from problem surface form. The authors propose that retrieval efficiency for arithmetic connects diverse performance and strategy-related effects across key arithmetic factors, including arithmetic operation, numerical size, and numeral format.
TL;DR: Two different parallelization techniques to speedup the GF(p) elliptic curve multiplication in affine coordinates and the corresponding architectures are proposed and show better throughput of the proposed implementations as compared to existing reported architectures.
TL;DR: This paper presents a simple but highly useful modification of the conventional hardware implementation of accumulation in finite field over GF(2^m) by performing the accumulation operation using m number of T flip-flops instead of using a combination of m number of XOR gates with an equal number of D flip-flops in dependent loop structures.
Abstract: Finite field accumulation is the simplest of all the finite field operations, but at the same time, it is one of the most frequently encountered operations in finite field arithmetic. In this paper, we present a simple but highly useful modification of the conventional hardware implementation of accumulation in finite field over GF(2^m). The critical path as well as the hardware complexity is reduced in the proposed design by performing the accumulation operation using m number of T flip-flops instead of using a combination of m number of XOR gates with an equal number of D flip-flops in dependent loop structures. The conventional design is found to involve nearly 39% more area, 53% more delay, and 40% more maximum ac power consumption compared with the proposed accumulator. The proposed finite field accumulator is used further for the implementation of serial/parallel polynomial-basis finite field multiplication and bit-serial inter-conversion between polynomial basis representation and normal basis representation over GF(2^m). The area-time complexity of the proposed bit-serial/parallel multiplier is less than half of the best of the corresponding existing structures. The structure proposed for digit-serial/parallel multiplication for trinomials is found to involve nearly 56% less area-time complexity compared with the best of the corresponding existing multipliers; and the existing design of bit-serial basis conversion is found to involve nearly twice the area-time complexity compared with the proposed design using the proposed finite field accumulator.
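The two accumulator structures are functionally equivalent, which a small behavioral model makes visible. This is a software sketch under simplifying assumptions (one input word per clock, registers reset to zero), not a model of the paper's circuit timing or area; the function names are illustrative.

```python
def accumulate_xor_d(inputs, m):
    """Conventional structure: m XOR gates feeding m D flip-flops in a loop."""
    mask = (1 << m) - 1
    q = 0
    for x in inputs:
        d = (q ^ x) & mask  # XOR gates compute the next state
        q = d               # D flip-flops latch it on the clock edge
    return q

def accumulate_t(inputs, m):
    """T flip-flop structure: bit i toggles whenever input bit i is 1."""
    q = [0] * m
    for x in inputs:
        for i in range(m):
            if (x >> i) & 1:
                q[i] ^= 1   # toggle; no separate XOR gate in the feedback loop
    return sum(bit << i for i, bit in enumerate(q))
```

Addition in GF(2^m) is a bitwise XOR, so both functions return the running field sum of the inputs.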
TL;DR: In this article, a method for performing private retrieval of information from a database is presented, in which an index corresponding to the information to be retrieved is obtained and a query that does not reveal the index to the database is generated.
Abstract: A method, article of manufacture and apparatus for performing private retrieval of information from a database is disclosed. In one embodiment, the method comprising obtaining an index corresponding to information to be retrieved from the database and generating a query that does not reveal the index to the database. The query is an arithmetic function of the index and a secret value, wherein the arithmetic function includes a multiplication group specified by a modulus of a random value whose order is divisible by a prime power, such that the prime power is an order of the random value. The secret value is an arithmetic function of the index that comprises a factorization into prime numbers of the modulus. The method further comprises communicating the query to the database for execution of the arithmetic function against the entirety of the database.
TL;DR: In this paper, the authors discuss implementing blocked sparse matrix-vector multiplication for NVIDIA GPUs, outline an algorithm and various optimizations, and identify potential future improvements and challenging tasks. Their implementation is faster than a previously published one on matrices having many high fill-ratio blocks but slower on matrices with a low number of non-zero elements per row.
Abstract: We discuss implementing blocked sparse matrix-vector multiplication for NVIDIA GPUs. We outline an algorithm and various optimizations, and identify potential future improvements and challenging tasks. In comparison with a previously published implementation, our implementation is faster on matrices having many high fill-ratio blocks but slower on matrices with a low number of non-zero elements per row.
TL;DR: A novel parallelization method based on point operation interleaving with parallel field multipliers is presented, and FPGA implementations are described showing the practical feasibility of this method.
TL;DR: The training led to a change in calculation strategies: Prior to training, the patient used predominantly time-consuming back-up strategies, after training he relied increasingly on the direct retrieval of arithmetic facts from long-term memory.
Abstract: The present study investigates the effects of a training of arithmetic fact retrieval in a patient suffering from particular difficulties with multiplication facts. Over a period of four weeks simple multiplication facts were trained extensively. The outcome of the training was assessed behaviourally and changes in cerebral activation patterns were investigated using fMRI. The training led to a change in calculation strategies: Prior to training, the patient used predominantly time-consuming back-up strategies, after training he relied increasingly on the direct retrieval of arithmetic facts from long-term memory. Regarding the fMRI results, prefrontal activations were observed for untrained problems, which can be attributed to the application of back-up strategies strongly relying on fronto-executive functions. Interestingly, significant foci of activation for both trained and untrained items were found in the angular gyrus of the right hemisphere, which, however, differed in their exact localisation. For the trained condition, activations were observed in anterior parts of the angular gyrus which may be related to the training-based automatisation in fact retrieval. Activations in the untrained condition were found in a more posterior portion of the angular gyrus, that might be attributable to one of the patient's back-up strategies, namely to recite a whole multiplication row to get to the correct answer.
TL;DR: This paper first explores how to relax the data dependency that exists between multiplication, quotient determination, and modular reduction in the conventional Montgomery modular multiplication algorithm, and proposes a new modular multiplication algorithm for high-speed hardware design.
Abstract: Modular exponentiation in public-key cryptosystems is usually achieved by repeated modular multiplications on large integers. Designing high-speed modular multiplication is thus very crucial to speed up the decryption/encryption process. In this paper, we first explore how to relax the data dependency that exists between multiplication, quotient determination, and modular reduction in the conventional Montgomery modular multiplication algorithm. Then, we propose a new modular multiplication algorithm for high-speed hardware design. The speed improvement is achieved by reducing the critical path delay from the 4-to-2 to 3-to-2 carry-save addition. The resulting time complexity of our development is further decreased by simultaneously performing the multiplication and modular reduction processes. Experimental results show that the developed modular multiplication can operate at speeds higher than those of related work. When the proposed modular multiplication is applied to modular exponentiation, both time and area-time advantages are obtained.
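The three dependent steps named in the abstract (multiplication, quotient determination, modular reduction) are visible in the conventional word-level Montgomery algorithm. The sketch below is the textbook version with R = 2^k, written for illustration only; it is not the paper's new algorithm and uses no carry-save arithmetic. The function names are assumptions, and `pow(n, -1, r)` requires Python 3.8 or later.

```python
def montgomery_setup(n, k):
    """Return (R, n_prime) for an odd modulus n < R = 2^k, where n_prime = -n^(-1) mod R."""
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r
    return r, n_prime

def mont_mul(a_bar, b_bar, n, k, n_prime):
    """Return a_bar * b_bar * R^(-1) mod n for inputs already reduced modulo n."""
    r_mask = (1 << k) - 1
    t = a_bar * b_bar                      # step 1: multiplication
    q = ((t & r_mask) * n_prime) & r_mask  # step 2: quotient determination (needs t)
    u = (t + q * n) >> k                   # step 3: reduction (needs q); the division is exact
    return u - n if u >= n else u
```

Each step consumes the result of the previous one. That chain is the data dependency the paper sets out to relax.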
TL;DR: In this paper, techniques for optimizing sparse matrix-vector multiplication (SpMV) on a graphics processing unit (GPU) are provided. The techniques include receiving a sparse matrix-vector multiplication, analyzing the sparse matrix-vector multiplication to identify one or more optimizations, and determining whether the sparse matrix-vector multiplication is to be reused across computation.
Abstract: Techniques for optimizing sparse matrix-vector multiplication (SpMV) on a graphics processing unit (GPU) are provided. The techniques include receiving a sparse matrix-vector multiplication, analyzing the sparse matrix-vector multiplication to identify one or more optimizations, wherein analyzing the sparse matrix-vector multiplication to identify one or more optimizations comprises analyzing a non-zero pattern for one or more optimizations and determining whether the sparse matrix-vector multiplication is to be reused across computation, optimizing the sparse matrix-vector multiplication, wherein optimizing the sparse matrix-vector multiplication comprises optimizing global memory access, optimizing shared memory access and exploiting reuse and parallelism, and outputting an optimized sparse matrix-vector multiplication.
TL;DR: By integrating ODPs within H(k), the best existing heuristic algorithm for single constant multiplication (SCM), this work can on average significantly improve the run time of the algorithm (typically by one order of magnitude) while still reducing the number of adders.
Abstract: Common subexpression elimination (CSE) algorithms try to minimize the number of adders (or subtracters) required to implement constant multiplication by searching and substituting common patterns in the CSE representation of a constant. CSE algorithms, in general, cannot find certain patterns due to inherent restrictions in the CSE representation. We propose overlapping digit patterns (ODPs) to remove some of these restrictions. We integrate ODPs into H(k), the best existing heuristic algorithm for single constant multiplication (SCM). H(k) is not applicable to the multiple constant multiplication (MCM) problem, so we cannot consider this problem. Generally, H(k) finds solutions very close to optimal, so there is a strict limitation on any further improvement which applies to any new heuristic. Instead, by integrating ODPs within H(k), we can on average significantly improve the run time of the algorithm (typically by one order of magnitude) while still reducing the number of adders.
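As background for the adder counts discussed above, the sketch below shows the simplest baseline: recoding a constant into signed digits and multiplying with one shift-and-add (or subtract) per nonzero digit. It assumes the canonical signed-digit (non-adjacent) form for illustration and performs no common subexpression elimination; it is neither H(k) nor the ODP technique, and the function names are hypothetical.

```python
def csd_digits(c):
    """Signed-digit recoding of a positive integer, least significant digit first."""
    digits = []
    while c > 0:
        if c & 1:
            d = 2 - (c & 3)  # +1 if c = 1 (mod 4), -1 if c = 3 (mod 4)
            c -= d
        else:
            d = 0
        digits.append(d)
        c >>= 1
    return digits

def multiply_by_constant(x, digits):
    """Multiply x by the recoded constant using only shifts, adds, and subtracts."""
    acc = 0
    for i, d in enumerate(digits):
        if d == 1:
            acc += x << i
        elif d == -1:
            acc -= x << i
    return acc

def adder_count(digits):
    """Adders/subtracters needed when every nonzero digit is a separate term."""
    return max(sum(1 for d in digits if d) - 1, 0)
```

For example, 7 recodes to 8 - 1, which needs one subtracter instead of the two adders implied by its binary form 4 + 2 + 1. Pattern-sharing methods such as CSE aim to reduce the count further.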
TL;DR: This work presents the algorithm to multiply univariate polynomials with integer coefficients efficiently using the Number Theoretic transform (NTT) on Graphics Processing Units (GPU) and compared the approach with CPU-based implementations of polynomial and large integer multiplication provided by NTL and GMP libraries.
Abstract: We present the algorithm to multiply univariate polynomials with integer coefficients efficiently using the Number Theoretic transform (NTT) on Graphics Processing Units (GPU). The same approach can be used to multiply large integers encoded as polynomials. Our algorithm exploits fused multiply-add capabilities of the graphics hardware. NTT multiplications are executed in parallel for a set of distinct primes followed by reconstruction using the Chinese Remainder theorem (CRT) on the GPU. Our benchmarking experiences show the NTT multiplication performance up to 77 GMul/s. We compared our approach with CPU-based implementations of polynomial and large integer multiplication provided by NTL and GMP libraries.
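A CPU-side sketch of NTT-based polynomial multiplication may help make the approach concrete. It is a simplification under stated assumptions: it uses a single well-known NTT-friendly prime (998244353, with primitive root 3) instead of several primes combined by CRT, it is a plain recursive transform with no GPU parallelism, and it is exact only when the true product coefficients are non-negative and smaller than the modulus. The names `ntt` and `poly_mul` are illustrative.

```python
MOD = 998244353  # 119 * 2^23 + 1, supports transform lengths up to 2^23
G = 3            # a primitive root modulo MOD

def ntt(a, invert=False):
    """Recursive radix-2 number theoretic transform; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return a[:]
    w_n = pow(G, (MOD - 1) // n, MOD)   # primitive n-th root of unity
    if invert:
        w_n = pow(w_n, MOD - 2, MOD)    # its modular inverse
    even = ntt(a[0::2], invert)
    odd = ntt(a[1::2], invert)
    out = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % MOD
        out[k] = (even[k] + t) % MOD
        out[k + n // 2] = (even[k] - t) % MOD
        w = w * w_n % MOD
    return out

def poly_mul(f, g):
    """Multiply coefficient lists (lowest degree first) modulo MOD."""
    size = 1
    while size < len(f) + len(g) - 1:
        size *= 2
    fa = ntt(f + [0] * (size - len(f)))
    ga = ntt(g + [0] * (size - len(g)))
    prod = ntt([x * y % MOD for x, y in zip(fa, ga)], invert=True)
    inv_size = pow(size, MOD - 2, MOD)  # the inverse transform needs a 1/size scaling
    return [c * inv_size % MOD for c in prod[: len(f) + len(g) - 1]]
```

Running the same transform modulo several distinct primes and recombining the residues with the Chinese Remainder theorem, as the abstract describes, lifts the coefficient-size restriction.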
TL;DR: A framework for fully redundant decimal arithmetic, where all operands and results belong to the same redundant decimal number system and can be stored and later used as operands of further decimal operations.
Abstract: Hardware implementation of all the basic radix-10 arithmetic operations is evolving as a new trend in the design and implementation of general purpose digital processors. Redundant representation of partial products and remainders is common in the multiplication and division hardware algorithms, respectively. Carry-free implementation of the more frequent add/subtract operations, with the byproduct of enhancing the speed of multiplication and division, is possible with redundant number representation. However, conversion of redundant results to conventional representations entails slow carry propagation that can be avoided if the results are kept in redundant format for later use as operands of other arithmetic operations. Given that redundant decimal representations, contrary to redundant binary, do not necessarily require extra storage, we are motivated to develop a framework for fully redundant decimal arithmetic, where all operands and results belong to the same redundant decimal number system and can be stored and later used as operands of further decimal operations. In this paper, we present a new faster decimal signed digit add/sub unit and show how it can be efficiently used in the design of decimal multipliers and dividers, where all operands and results are represented with the same redundant digit set [–7, 7].
TL;DR: This research employed a new procedure with the aim of assessing the automatic retrieval of multiplication more directly, and argues that this procedure is highly valuable for exploring the mechanisms involved in simple arithmetic solving.
Abstract: It is a common assumption for multiplication-solving models that single-digit multiplications are automatically retrieved. However, the experimental evidence for this is based on paradigms under suspicion. In this research, we employed a new procedure with the aim of assessing the automatic retrieval of multiplication more directly. In two experiments, multiplication automatism was studied using briefly presented primes (stimulus onset asynchrony = 48 msec) in a number-naming task. In Experiment 1, in the congruent conditions, the target and the prime were the same numbers (e.g., prime, 6; target, 6) or the target was the solution to the multiplication prime (e.g., prime, 2×3=; target, 6). In the incongruent conditions, no relationship existed between the primes and the targets (e.g., prime, 32; target, 6; or prime, 4×8=; target, 6). Experiment 2 explored the relevance of the equal sign for the multiplication-priming effect. Data showed that naming was faster when the solution of the multiplication prime matched the target, as compared with the incongruent condition (multiplication-priming effect), and that these effects were found irrespective of the presence of the equal sign. The fact that this priming effect was found even though the participants were unaware of the presentation of the primes supports the automatic character of single-digit multiplication. We conclude by arguing that this procedure is highly valuable for exploring the mechanisms involved in simple arithmetic solving.
TL;DR: A variety of algorithms for basic one by one digit multiplication are proposed and FPGA implementations are presented, and time and area results for sequential and combinational implementations show better figures compared with previously published work.
Abstract: This paper presents a number of approaches to implement decimal multiplication algorithms on Xilinx FPGAs. A variety of algorithms for basic one by one digit multiplication are proposed and FPGA implementations are presented. Later on N by one digit and N by M digit multiplications are studied. Time and area results for sequential and combinational implementations show better figures compared with previously published work. Comparisons against binary fully-optimized multipliers emphasize the interest of the proposed design techniques.
TL;DR: In this article, the authors describe image encryption by combining the images with several matrices made with letters/numerals and placed in the input plane of a double random phase encoding (DRPE) system.
TL;DR: In this article, a new class of multiplications of distributions in one dimension, merging together two different regularizations of distributions, is introduced, and some of the features of these multiplications are discussed in some detail.
Abstract: We introduce a new class of multiplications of distributions in one dimension merging together two different regularizations of distributions. Some of the features of these multiplications are discussed in some detail.
We use our theory to study a number of examples involving products between the Dirac delta function and its successive derivatives.