Top 445 papers published in the topic of Multiplication in 2013

Showing papers on "Multiplication" published in 2013

Book Chapter•10.1007/978-3-642-40041-4_5•

Homomorphic Encryption from Learning with Errors: Conceptually-Simpler, Asymptotically-Faster, Attribute-Based

Craig Gentry¹, Amit Sahai², Brent Waters³•Institutions (3)

IBM¹, University of California, Los Angeles², University of Texas at Austin³

18 Aug 2013

TL;DR: In this work, a comparatively simple fully homomorphic encryption (FHE) scheme based on the learning with errors (LWE) problem is described, with a new technique for building FHE schemes called the approximate eigenvector method.

Abstract: We describe a comparatively simple fully homomorphic encryption (FHE) scheme based on the learning with errors (LWE) problem. In previous LWE-based FHE schemes, multiplication is a complicated and expensive step involving “relinearization”. In this work, we propose a new technique for building FHE schemes that we call the approximate eigenvector method. In our scheme, for the most part, homomorphic addition and multiplication are just matrix addition and multiplication. This makes our scheme both asymptotically faster and (we believe) easier to understand.

1,709 citations

Proceedings Article•10.1109/IPDPS.2013.80•

Communication-Optimal Parallel Recursive Rectangular Matrix Multiplication

James Demmel¹, David Eliahu¹, Armando Fox¹, Shoaib Kamil², Benjamin Lipshitz¹, Oded Schwartz¹, Omer Spillinger¹•Institutions (2)

University of California, Berkeley¹, Massachusetts Institute of Technology²

20 May 2013

TL;DR: This work obtains the first communication-optimal algorithm for all dimensions of rectangular matrices by combining the dimension-splitting technique with the recursive BFS/DFS approach, and shows significant speedups over existing parallel linear algebra libraries both on a 32-core shared-memory machine and on a distributed-memory supercomputer.

Abstract: Communication-optimal algorithms are known for square matrix multiplication. Here, we obtain the first communication-optimal algorithm for all dimensions of rectangular matrices. Combining the dimension-splitting technique of Frigo, Leiserson, Prokop and Ramachandran (1999) with the recursive BFS/DFS approach of Ballard, Demmel, Holtz, Lipshitz and Schwartz (2012) allows for a communication-optimal as well as cache and network-oblivious algorithm. Moreover, the implementation is simple: approximately 50 lines of code for the shared-memory version. Since the new algorithm minimizes communication across the network, between NUMA domains, and between levels of cache, it performs well in practice on both shared and distributed-memory machines. We show significant speedups over existing parallel linear algebra libraries both on a 32-core shared-memory machine and on a distributed-memory supercomputer.
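The dimension-splitting idea mentioned in this abstract can be illustrated with a small sequential sketch. It is not the authors' implementation: it leaves out the parallel BFS/DFS scheduling and all communication modelling, and the names `naive_matmul`, `recursive_matmul` and the `cutoff` parameter are hypothetical. It only shows the recursion that halves whichever of the three dimensions is currently largest.

```python
def naive_matmul(a, b):
    """Reference product of a (m x k) and b (k x n), given as lists of rows."""
    k, n = len(b), len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(n)]
            for row in a]


def recursive_matmul(a, b, cutoff=2):
    """Multiply a (m x k) by b (k x n) by halving the largest dimension.

    Sequential illustration only; `cutoff` (an assumed parameter, >= 1) is
    the size below which the naive product is used.
    """
    m, k, n = len(a), len(b), len(b[0])
    if max(m, k, n) <= cutoff:
        return naive_matmul(a, b)
    if m >= k and m >= n:
        # Split the rows of a: the two results are stacked vertically.
        h = m // 2
        return recursive_matmul(a[:h], b, cutoff) + recursive_matmul(a[h:], b, cutoff)
    if n >= k:
        # Split the columns of b: the two results are joined side by side.
        h = n // 2
        left = recursive_matmul(a, [row[:h] for row in b], cutoff)
        right = recursive_matmul(a, [row[h:] for row in b], cutoff)
        return [l + r for l, r in zip(left, right)]
    # Split the shared dimension k: the two partial products are added.
    h = k // 2
    first = recursive_matmul([row[:h] for row in a], b[:h], cutoff)
    second = recursive_matmul([row[h:] for row in a], b[h:], cutoff)
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(first, second)]
```

Only the split along the shared dimension requires adding partial results; the other two splits produce independent output blocks.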

152 citations

Proceedings Article•10.1145/2486159.2486196•

Communication optimal parallel multiplication of sparse random matrices

Grey Ballard¹, Aydin Buluc², James Demmel¹, Laura Grigori³, Benjamin Lipshitz¹, Oded Schwartz¹, Sivan Toledo⁴•Institutions (4)

University of California, Berkeley¹, Lawrence Berkeley National Laboratory², French Institute for Research in Computer Science and Automation³, Tel Aviv University⁴

23 Jul 2013

TL;DR: Two new parallel algorithms are obtained and it is proved that they match the expected communication cost lower bound, and hence they are optimal.

Abstract: Parallel algorithms for sparse matrix-matrix multiplication typically spend most of their time on inter-processor communication rather than on computation, and hardware trends predict the relative cost of communication will only increase. Thus, sparse matrix multiplication algorithms must minimize communication costs in order to scale to large processor counts. In this paper, we consider multiplying sparse matrices corresponding to Erdős-Rényi random graphs on distributed-memory parallel machines. We prove a new lower bound on the expected communication cost for a wide class of algorithms. Our analysis of existing algorithms shows that, while some are optimal for a limited range of matrix density and number of processors, none is optimal in general. We obtain two new parallel algorithms and prove that they match the expected communication cost lower bound, and hence they are optimal.

133 citations

Journal Article•10.1134/S0965542513120129•

The bilinear complexity and practical algorithms for matrix multiplication

A. V. Smirnov

27 Dec 2013-Computational Mathematics and Mathematical Physics

TL;DR: A practical algorithm for the exact multiplication of square n × n matrices is proposed; its asymptotic arithmetic complexity is $O(n^{2.7743})$.

Abstract: A method for deriving bilinear algorithms for matrix multiplication is proposed. New estimates for the bilinear complexity of a number of problems of the exact and approximate multiplication of rectangular matrices are obtained. In particular, the estimate for the boundary rank of multiplying 3 × 3 matrices is improved and a practical algorithm for the exact multiplication of square n × n matrices is proposed. The asymptotic arithmetic complexity of this algorithm is $O(n^{2.7743})$.

130 citations

Journal Article•

Two Interpretations of Multidimensional RDM Interval Arithmetic-Multiplication and Division

Andrzej Piegat¹, Marek Landowski•Institutions (1)

West Pomeranian University of Technology¹

01 Dec 2013-International Journal of Fuzzy Systems

TL;DR: The paper presents two possible interpretations and realization ways of interval multiplication and division: the possibilistic, unconditional interpretation that is of great meaning for fuzzy arithmetic and fuzzy systems, and the probabilistic, conditional interpretation that requires either knowledge of probability density distributions or assumptions concerning these distributions.

Abstract: The paper presents two possible interpretations and realization ways of interval multiplication and division: the possibilistic, unconditional interpretation that is of great meaning for fuzzy arithmetic and fuzzy systems, and the probabilistic, conditional interpretation that requires either knowledge of probability density distributions or assumptions concerning these distributions. The possibilistic interpretation has a great significance not only for fuzzy arithmetic but also for other sciences that use it such as Computing with Words, Grey Systems, etc. These two interpretations are explained in frame of a new, multidimensional RDM interval-arithmetic. The possibility of realization of interval-arithmetic operations in two ways is an argument for reconciliation of two competing scientific groups that propagate two approaches to uncertainty modeling: the probabilistic and possibilistic one. For many years Professor Zadeh has been claiming in his publications that both approaches are not contradictory but rather complementary.

70 citations

Journal Article•10.1109/TCSII.2013.2251958•

Low-Cost FIR Filter Designs Based on Faithfully Rounded Truncated Multiple Constant Multiplication/Accumulation

Shen-Fu Hsiao¹, Jun-Hong Zhang Jian¹, Ming-Chih Chen²•Institutions (2)

National Sun Yat-sen University¹, National Kaohsiung First University of Science and Technology²

03 Apr 2013-IEEE Transactions on Circuits and Systems II: Express Briefs

TL;DR: Low-cost finite impulse response (FIR) designs are presented using the concept of faithfully rounded truncated multipliers, and nonuniform coefficient quantization with proper filter order is proposed to minimize total area cost.

Abstract: Low-cost finite impulse response (FIR) designs are presented using the concept of faithfully rounded truncated multipliers. We jointly consider the optimization of bit width and hardware resources without sacrificing the frequency response and output signal precision. Nonuniform coefficient quantization with proper filter order is proposed to minimize total area cost. Multiple constant multiplication/accumulation in a direct FIR structure is implemented using an improved version of truncated multipliers. Comparisons with previous FIR design approaches show that the proposed designs achieve the best area and power results.

69 citations

Patent•

Apparatus for performing matrix vector multiplication approximation using crossbar arrays of resistive memory devices

Richard Linderman, Qing Wu, Garrett S. Rose, Hai Li, Yi Chen, Miao Hu

13 Aug 2013

TL;DR: A crossbar array formed by resistive memory devices serves as a memory array that stores the coefficients of a matrix; combined with input and output analog circuits, it is used to perform matrix-vector multiplication approximation operations.

Abstract: An apparatus that performs the mathematical matrix-vector multiplication approximation operations using crossbar arrays of resistive memory devices (e.g. memristor, resistive random-access memory, spintronics, etc.). A crossbar array formed by resistive memory devices serves as a memory array that stores the coefficients of a matrix. Combined with input and output analog circuits, the crossbar array system realizes the method of performing matrix-vector multiplication approximation operations with significant performance, area and energy advantages over existing methods and designs. This invention also includes an extended method that realizes the auto-associative neural network recall function using the resistive memory crossbar architecture.

54 citations

Journal Article•10.1109/TC.2012.35•

High-Speed Parallel Decimal Multiplication with Redundant Internal Encodings

Liu Han¹, Seok-Bum Ko¹•Institutions (1)

University of Saskatchewan¹

01 May 2013-IEEE Transactions on Computers

TL;DR: By considering the design tradeoffs among three components, the proposed 16 × 16-digit multiplier achieves about 11 percent less delay with 2 percent less area compared to the current fastest design.

Abstract: The decimal multiplication is one of the most important decimal arithmetic operations which have a growing demand in the area of commercial, financial, and scientific computing. In this paper, we propose a parallel decimal multiplication algorithm with three components, which are a partial product generation, a partial product reduction, and a final digit-set conversion. First, a redundant number system is applied to recode not only the multiplier, but also multiples of the multiplicand in signed-digit (SD) numbers. Furthermore, we present a multioperand SD addition algorithm to reduce the partial product array. Finally, a digit-set conversion algorithm with a hybrid prefix network to decrease the number of the logic gates on the critical path is discussed. An analysis of the timing delay and an HDL model synthesized under 90 nm technology show that by considering the tradeoff of designs among three components, the overall delay of the proposed 16 × 16-digit multiplier takes about 11 percent less timing delay with 2 percent less area compared to the current fastest design.

52 citations

Journal Article•10.1109/TCSI.2013.2244434•

Resistive Computing: Memristors-Enabled Signal Multiplication

Sangho Shin¹, Kyungmin Kim¹, Sung-Mo Kang¹•Institutions (1)

University of California, Santa Cruz¹

08 Apr 2013-IEEE Transactions on Circuits and Systems I: Regular Papers

TL;DR: A memristive multiplier circuit demonstrates a fast and highly sensitive pattern recognition for highly complex inputs.

Abstract: Memristors-based resistive logic computation units are introduced. By controlling the memristors' conditional set operation adaptively to one of the input polarities, bipolar signal multiplication of an input and a stored reference bit is performed by unipolar memristor devices and control switches. The multiplication result is registered in an output nonvolatile memristor so that the computed output can be accessed anytime later on by reading the output memristor's state. A memristive multiplier circuit demonstrates a fast and highly sensitive pattern recognition for highly complex inputs.

51 citations

Journal Article•10.1088/1742-6596/473/1/012002•

Free products of large random matrices - a short review of recent developments

Zdzislaw Burda¹•Institutions (1)

Jagiellonian University¹

16 Dec 2013

TL;DR: In this paper, a generalization of the law of free multiplication to non-Hermitian matrices is discussed, and a couple of examples illustrate how to use these methods in practice.

Abstract: We review methods to calculate eigenvalue distributions of products of large random matrices. We discuss a generalization of the law of free multiplication to non-Hermitian matrices and give a couple of examples illustrating how to use these methods in practice. In particular we calculate eigenvalue densities of products of Gaussian Hermitian and non-Hermitian matrices including combinations of GUE and Ginibre matrices.

49 citations

Journal Article•10.1103/PHYSREVA.87.012310•

Faster quantum number factoring via circuit synthesis

Igor L. Markov, Mehdi Saeedi¹•Institutions (1)

University of Southern California¹

14 Jan 2013-Physical Review A

TL;DR: A circuit-synthesis procedure exploits spectral properties of multiplication operators and constructs optimized circuits from the traces of the execution of an appropriate GCD algorithm, reducing gate counts and circuit latency by up to 4-5 times.

Abstract: A major obstacle to implementing Shor's quantum number-factoring algorithm is the large size of modular-exponentiation circuits. We reduce this bottleneck by customizing reversible circuits for modular multiplication to individual runs of Shor's algorithm. Our circuit-synthesis procedure exploits spectral properties of multiplication operators and constructs optimized circuits from the traces of the execution of an appropriate GCD algorithm. Empirically, gate counts are reduced by 4-5 times, and circuit latency is reduced by larger factors.

Patent•

Method and system for homomorphicly randomizing an input

Aviad Kipnis¹, Eliphaz Hibshoosh¹•Institutions (1)

Cisco Systems, Inc.¹

25 Jul 2013

TL;DR: A fully homomorphic method and system for randomizing an input, wherein all computations are over a commutative ring, is described; it can also be used for verifying that a returned result of a calculation performed by a third party is valid.

Abstract: A fully homomorphic method and system for randomizing an input, wherein all computations are over a commutative ring, is described. Equivalent methods for performing the randomization using matrices and polynomials are detailed, as well as ways to mix the matrix and polynomial functions. Addition, multiplication, and division of the matrix and polynomial functions are further described. By performing computations of the functions modulo N over a ring $\mathbb{Z}_N$, the functions are usable as encryption functions. The method and system can also be used for verifying that a returned result of a calculation performed by a third party is valid for any of the calculations described herein. Related methods, systems, and apparatus are also described.

Book Chapter•10.1007/978-3-642-40588-4_10•

Fast Software Polynomial Multiplication on ARM Processors Using the NEON Engine

Danilo F. Câmara¹, Conrado Porto Lopes Gouvêa¹, Julio López¹, Ricardo Dahab¹•Institutions (1)

State University of Campinas¹

2 Sep 2013

TL;DR: A novel software multiplier for performing a polynomial multiplication of two 64-bit binary polynomials based on the VMULL instruction included in the NEON engine supported in many ARM processors is described, obtaining a fast software multiplication in the binary field $\mathbb{F}_{2^m}$, which is up to 45% faster compared to the best known algorithm.

Abstract: Efficient algorithms for binary field operations are required in several cryptographic operations such as digital signatures over binary elliptic curves and encryption. The main performance-critical operation in these fields is the multiplication, since most processors do not support instructions to carry out a polynomial multiplication. In this paper we describe a novel software multiplier for performing a polynomial multiplication of two 64-bit binary polynomials based on the VMULL instruction included in the NEON engine supported in many ARM processors. This multiplier is then used as a building block to obtain a fast software multiplication in the binary field $\mathbb{F}_{2^m}$, which is up to 45% faster compared to the best known algorithm. We also illustrate the performance improvement in point multiplication on binary elliptic curves using the new multiplier, improving the performance of standard NIST curves at the 128- and 256-bit levels of security. The impact on the GCM authenticated encryption scheme is also studied, with new speed records. We present timing results of our software implementation on the ARM Cortex-A8, A9 and A15 processors.
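For readers unfamiliar with the operation being accelerated, the sketch below shows what a multiplication of binary polynomials computes, followed by reduction into a small binary field. It is a plain bit-by-bit reference, not the NEON/VMULL-based multiplier described in the abstract; the function names `clmul` and `poly_mod` and the integer encoding (bit i holds the coefficient of x^i) are assumptions made for illustration.

```python
def clmul(a, b):
    """Carry-less product of two binary polynomials encoded as integers.

    Bit i of an operand is the coefficient of x**i; additions are XORs,
    so no carries propagate between bit positions.
    """
    result = 0
    while b:
        if b & 1:
            result ^= a  # add (XOR) the current shifted copy of a
        a <<= 1
        b >>= 1
    return result


def poly_mod(p, modulus):
    """Reduce the binary polynomial p modulo the binary polynomial modulus."""
    deg = modulus.bit_length() - 1
    while p.bit_length() - 1 >= deg:
        # Cancel the leading term of p with a shifted copy of the modulus.
        p ^= modulus << (p.bit_length() - 1 - deg)
    return p


def gf2m_mul(a, b, modulus):
    """Multiply in GF(2^m) defined by an irreducible polynomial `modulus`."""
    return poly_mod(clmul(a, b), modulus)
```

For example, with the irreducible polynomial x^3 + x + 1 (encoded as `0b1011`), `gf2m_mul(0b010, 0b100, 0b1011)` multiplies x by x^2 and reduces x^3 to x + 1.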

Proceedings Article•10.1109/RECOSOC.2013.6581517•

Dynamically reconfigurable FIR filter architectures with fast reconfiguration

Martin Kumm¹, Konrad Moller¹, Peter Zipf¹•Institutions (1)

University of Kassel¹

10 Jul 2013

TL;DR: This work compares two finite impulse response (FIR) filter architectures for FPGAs for which the coefficients can be reconfigured during run-time, and finds that if the input word size is greater than approximately half the number of coefficients, the LUT-based multiplication scheme needs fewer resources than the DA architecture, and vice versa.

Abstract: This work compares two finite impulse response (FIR) filter architectures for FPGAs for which the coefficients can be reconfigured during run-time. One is a recently proposed filter architecture based on distributed arithmetic (DA) and the other is based on a LUT multiplication scheme. Instead of using the common internal configuration access port (ICAP) for reconfiguration, which is able to change the logic as well as the routing, it is sufficient to reconfigure only the logic in the regarded architectures. This is realized by using the configurable look-up table (CFGLUT) primitive of Xilinx, which allows reconfiguration times that are orders of magnitude faster than using ICAP. The resulting FIR filter architectures achieve reconfiguration times of typically less than 100 ns. They can be reconfigured with arbitrary coefficients that are only limited by their length and word size. As their resource consumption depends on different parameters of the filter, a detailed comparison is done. It turns out that if the input word size is greater than approximately half the number of coefficients, the LUT-based multiplication scheme needs fewer resources than the DA architecture, and vice versa.

Journal Article•10.7763/IJCCE.2013.V2.183•

Optimized Multi-Precision Multiplication for Public-Key Cryptography on Embedded Microprocessors

Hwajeong Seo, Howon Kim

01 Jan 2013-International Journal of Computer and Communication Engineering

TL;DR: A novel method, "Carry-Once", is proposed, which reduces the number of intermediate result computations by the size of the result accumulation; it improves all multi-precision multiplication techniques that have an intermediate result computation and shows a speed improvement of up to 2.5% compared with the best known results.

Abstract: In this paper, we revisit the previous multi-precision multiplication techniques including "operand-scanning", "hybrid-scanning", "operand-caching", "consecutive operand-caching" and "product-scanning." In particular, the former four methods execute an intermediate result computation, which is the process of updating the results with a newly computed result by performing a number of addition operations. This operation is expensive, so an efficient implementation is required to boost the performance. For this reason, we propose a novel method, i.e., "Carry-Once", which reduces the number of intermediate result computations by the size of the result accumulation. The main idea is gathering carry values and updating the values at once. This method improves all multi-precision multiplication techniques that have an intermediate result computation and shows a performance enhancement in terms of speed of up to 2.5%, compared with the best known results.

Journal Article•10.1137/100813956•

Hypergraph Partitioning Based Models and Methods for Exploiting Cache Locality in Sparse Matrix-Vector Multiplication

Kadir Akbudak, Enver Kayaaslan, Cevdet Aykanat

06 Jun 2013-SIAM Journal on Scientific Computing

TL;DR: Sparse matrix-vector multiplication (SpMxV) is a kernel operation widely used in iterative linear solvers, in which the same sparse matrix is multiplied by a dense vector repeatedly.

Abstract: Sparse matrix-vector multiplication (SpMxV) is a kernel operation widely used in iterative linear solvers. The same sparse matrix is multiplied by a dense vector repeatedly in these solvers. Matric...

Posted Content•

Pointwise multiplication on vector-valued function spaces with power weights

Martin Meyries¹, Mark Veraar²•Institutions (2)

Martin Luther University of Halle-Wittenberg¹, Delft University of Technology²

28 Nov 2013-arXiv: Functional Analysis

TL;DR: In this article, the authors investigated pointwise multipliers on vector-valued function spaces with Muckenhoupt weights and proved that the characteristic function of the half-space is a pointwise multiplier on Bessel-potential spaces with values in a UMD Banach space.

Abstract: We investigate pointwise multipliers on vector-valued function spaces over $\mathbb{R}^d$, equipped with Muckenhoupt weights. The main result is that in the natural parameter range, the characteristic function of the half-space is a pointwise multiplier on Bessel-potential spaces with values in a UMD Banach space. This is proved for a class of power weights, including the unweighted case, and extends the classical result of Shamir and Strichartz. The multiplication estimate is based on the paraproduct technique and a randomized Littlewood-Paley decomposition. An analogous result is obtained for Besov and Triebel-Lizorkin spaces.

Proceedings Article•10.1109/FPL.2013.6645543•

Multiple constant multiplication with ternary adders

Martin Kumm¹, Martin Hardieck¹, Jens Willkomm¹, Peter Zipf¹, Uwe Meyer-Baese²•Institutions (2)

University of Kassel¹, Florida State University²

1 Sep 2013

TL;DR: This work investigates the optimization of pipelined MCM circuits which include ternary adders, and shows experimentally that 27% fewer operations are needed on average by using ternary adders, resulting in 15% slice (Xilinx) and 10% ALM (Altera) reductions, respectively.

Abstract: The scaling operation, i.e., the multiplication with a single constant, is a frequently used operation in many kinds of numeric algorithms. The multiple constant multiplication (MCM) is a generalization where a variable is multiplied by several constants. This kind of operation is heavily used, e.g., in digital filters or discrete transforms. It was shown in recent work that small, fast and power-efficient MCM implementations can be realized by using the fast carry chains of FPGAs rather than wasting specialized embedded multipliers. However, in the work so far, only common two-input adders were used. As FPGAs today support ternary adders, i.e., adders with three inputs, this work investigates the optimization of pipelined MCM circuits which include ternary adders. It is shown experimentally that 27% fewer operations are needed on average by using ternary adders, resulting in 15% slice (Xilinx) and 10% ALM (Altera) reductions, respectively.
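A small software model may help make the adder-count argument concrete. The sketch below multiplies one input by the constants 11 and 21 using only shifts and additions, first with two-input adders and then with a three-input (ternary) adder. The constants, the function names, and the particular adder graphs are hypothetical choices for illustration; they are not taken from the paper and are not claimed to be optimal.

```python
def add3(a, b, c):
    """Software stand-in for a ternary adder: one three-input addition."""
    return a + b + c


def mcm_two_input(x):
    """Compute (11x, 21x) with shifts and two-input additions only.

    This particular adder graph uses three additions and shares 5x.
    """
    x5 = (x << 2) + x     # adder 1: 5x  = 4x + x
    x11 = (x5 << 1) + x   # adder 2: 11x = 10x + x
    x21 = (x << 4) + x5   # adder 3: 21x = 16x + 5x
    return x11, x21


def mcm_ternary(x):
    """Compute (11x, 21x) with two ternary additions and no shared term."""
    x11 = add3(x << 3, x << 1, x)   # adder 1: 11x = 8x + 2x + x
    x21 = add3(x << 4, x << 2, x)   # adder 2: 21x = 16x + 4x + x
    return x11, x21
```

Both functions return the same products; the difference is only in how many adder operations the corresponding circuit would need (three versus two in this toy case).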

Journal Article•10.1016/J.JMATHB.2012.09.002•

Preservice Elementary Teachers' Knowledge for Teaching the Associative Property of Multiplication: A Preliminary Analysis.

Meixia Ding¹, Xiaobao Li², Mary Margaret Capraro³•Institutions (3)

Temple University¹, Widener University², Texas A&M University³

01 Mar 2013-The Journal of Mathematical Behavior

TL;DR: This paper examined preservice elementary teachers' knowledge for teaching the associative property (AP) of multiplication and found that most elementary teachers were unable to use concrete contexts (e.g., pictorial representations and word problems) to illustrate AP of multiplication conceptually, particularly due to a fragile understanding of the meaning of multiplication.

Journal Article•10.3389/FNHUM.2013.00189•

The neural bases of the multiplication problem-size effect across countries.

Jérôme Prado¹, Jiayan Lu², Li Liu², Qi Dong², Xinlin Zhou², James R. Booth³•Institutions (3)

Centre national de la recherche scientifique¹, Beijing Normal University², Northwestern University³

13 May 2013-Frontiers in Human Neuroscience

TL;DR: The results indicate that differences in educational practices might affect the neural bases of symbolic arithmetic, and suggest that the multiplication problem-size effect might be a verbal retrieval effect in Chinese as compared to American participants.

Abstract: Multiplication problems involving large numbers (e.g., 9 × 8) are more difficult to solve than problems involving small numbers (e.g., 2 × 3). Behavioral research indicates that this problem-size effect might be due to different factors across countries and educational systems. However, there is no neuroimaging evidence supporting this hypothesis. Here, we compared the neural correlates of the multiplication problem-size effect in adults educated in China and the United States. We found a greater neural problem-size effect in Chinese than American participants in bilateral superior temporal regions associated with phonological processing. However, we found a greater neural problem-size effect in American than Chinese participants in right intra-parietal sulcus (IPS) associated with calculation procedures. Therefore, while the multiplication problem-size effect might be a verbal retrieval effect in Chinese as compared to American participants, it may instead stem from the use of calculation procedures in American as compared to Chinese participants. Our results indicate that differences in educational practices might affect the neural bases of symbolic arithmetic.

Journal Article•10.1109/TVLSI.2011.2181434•

Low-Complexity Multiplier for $GF(2^{m})$ Based on All-One Polynomials

Jiafeng Xie¹, Pramod Kumar Meher², Jianjun He¹•Institutions (2)

Central South University¹, Institute for Infocomm Research Singapore²

01 Jan 2013-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: From the application-specific integrated circuit and field-programmable gate array synthesis results, it is found that the proposed design provides significantly less area-delay and power-delay complexities over the best of the existing designs.

Abstract: This paper presents an area-time-efficient systolic structure for multiplication over $GF(2^{m})$ based on irreducible all-one polynomial (AOP). We have used a novel cut-set retiming to reduce the duration of the critical-path to one XOR gate delay. It is further shown that the systolic structure can be decomposed into two or more parallel systolic branches, where the pair of parallel systolic branches has the same input operand, and they can share the same input operand registers. From the application-specific integrated circuit and field-programmable gate array synthesis results we find that the proposed design provides significantly less area-delay and power-delay complexities over the best of the existing designs.

Proceedings Article•10.1109/ICASSP.2013.6637825•

A multiplication-free framework for signal processing and applications in biomedical image analysis

Alexander Suhre¹, Furkan Keskin¹, Tulin Ersahin¹, Rengul Cetin-Atalay¹, Rashid Ansari², A. Enis Cetin¹•Institutions (2)

Bilkent University¹, University of Illinois at Chicago²

26 May 2013

TL;DR: An application to the problem of cancer cell line image classification is presented that uses the notion of a co-difference matrix that is analogous to a covariance matrix except that the vector products are based on the new proposed framework.

Abstract: A new framework for signal processing is introduced based on a novel vector product definition that permits a multiplier-free implementation. First a new product of two real numbers is defined as the sum of their absolute values, with the sign determined by product of the hard-limited numbers. This new product of real numbers is used to define a similar product of vectors in $\mathbb{R}^N$. The new vector product of two identical vectors reduces to a scaled version of the $\ell_1$ norm of the vector. The main advantage of this framework is that it yields multiplication-free computationally efficient algorithms for performing some important tasks in signal processing. An application to the problem of cancer cell line image classification is presented that uses the notion of a co-difference matrix that is analogous to a covariance matrix except that the vector products are based on our new proposed framework. Results show the effectiveness of this approach when the proposed co-difference matrix is compared with a covariance matrix.
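The product defined in this abstract is simple enough to write down directly. The following is a minimal sketch based only on the abstract's wording: the magnitude is the sum of the absolute values and the sign is that of the ordinary product. The names `sign`, `mf_product`, and `mf_vector_product` are hypothetical, and returning zero when either operand is zero is an assumption, since the abstract does not say how the hard limiter treats zero.

```python
def sign(x):
    """Hard limiter: -1, 0, or +1 (mapping 0 to 0 is an assumption)."""
    return (x > 0) - (x < 0)


def mf_product(a, b):
    """Multiplication-free 'product': sum of magnitudes with the product's sign."""
    s_a, s_b = sign(a), sign(b)
    if s_a == 0 or s_b == 0:
        return 0  # assumed convention for zero operands
    magnitude = abs(a) + abs(b)
    # Equal signs give a positive result, opposite signs a negative one;
    # only comparisons and additions are used.
    return magnitude if s_a == s_b else -magnitude


def mf_vector_product(x, y):
    """Sum of element-wise multiplication-free products of two vectors."""
    return sum(mf_product(a, b) for a, b in zip(x, y))
```

Applied to a vector and itself, this sketch returns twice the sum of absolute values, which matches the abstract's remark that the product of two identical vectors is a scaled l1 norm.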

Journal Article•

Computation and Communication Efficient Key Distribution Protocol for Secure Multicast Communication

Pandi Vijayakumar, S. Bose, Arputharaj Kannan, L. Jegatha Deborah

29 Apr 2013-KSII Transactions on Internet and Information Systems

TL;DR: A new key distribution protocol is proposed that reduces computation complexity by performing fewer multiplication operations, using a ternary-tree approach during key updating, and that reduces the amount of information communicated to the group members during key update operations.

Abstract: Secure multimedia multicast applications involve group communications where group membership requires secured dynamic key generation and updating operations. Such operations usually consume high computation time and therefore designing a key distribution protocol with reduced computation time is necessary for multicast applications. In this paper, we propose a new key distribution protocol that focuses on two aspects. The first one aims at the reduction of computation complexity by performing lesser numbers of multiplication operations using a ternary-tree approach during key updating. Moreover, it aims to optimize the number of multiplication operations by using the existing Karatsuba divide and conquer approach for fast multiplication. The second aspect aims at reducing the amount of information communicated to the group members during the update operations in the key content. The proposed algorithm has been evaluated based on computation and communication complexity and a comparative performance analysis of various key distribution protocols is provided. Moreover, it has been observed that the proposed algorithm reduces the computation and communication time significantly.
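The abstract refers to the existing Karatsuba divide-and-conquer approach without describing how the protocol applies it. As background only, here is a textbook sketch of Karatsuba multiplication for non-negative integers; the function name and the base-case threshold of 16 are arbitrary illustrative choices and are not taken from the paper.

```python
def karatsuba(x, y):
    """Multiply two non-negative integers with three recursive products.

    Textbook Karatsuba on a binary split; not the protocol from the paper.
    """
    if x < 16 or y < 16:
        return x * y  # small operands: fall back to the built-in product
    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    x_hi, x_lo = x >> half, x & mask
    y_hi, y_lo = y >> half, y & mask
    z0 = karatsuba(x_lo, y_lo)                              # low parts
    z2 = karatsuba(x_hi, y_hi)                              # high parts
    z1 = karatsuba(x_lo + x_hi, y_lo + y_hi) - z0 - z2      # cross terms
    return (z2 << (2 * half)) + (z1 << half) + z0
```

The saving comes from computing the cross terms with one recursive product instead of two, at the cost of a few extra additions and subtractions.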

Journal Article•10.1016/J.IPL.2013.02.011•

An algorithm for fast multiplication of sedenions

Aleksandr Cariow¹, Galina Cariowa¹•Institutions (1)

West Pomeranian University of Technology¹

01 May 2013-Information Processing Letters

TL;DR: A rationalized algorithm for calculating the product of sedenions is presented which reduces the number of underlying multiplications and can compute the same result in only 122 multiplications (or multipliers - in hardware implementation case) and 298 additions.
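
For context, the baseline that such an algorithm improves on can be sketched with the recursive Cayley-Dickson construction, which needs 16 × 16 = 256 real multiplications for two sedenions. The code below is that naive baseline only, not the 122-multiplication algorithm from the paper; the function names and the particular sign convention, (a, b)(c, d) = (ac − d*b, da + bc*), are assumptions of this sketch.

```python
def conj(x):
    """Cayley-Dickson conjugate: negate every imaginary component."""
    return [x[0]] + [-v for v in x[1:]]


def cd_mul(x, y, counter=None):
    """Multiply two Cayley-Dickson numbers given as lists of length 2^k.

    If counter is a one-element list, it accumulates the number of
    real multiplications performed.
    """
    n = len(x)
    if n == 1:
        if counter is not None:
            counter[0] += 1
        return [x[0] * y[0]]
    h = n // 2
    a, b = x[:h], x[h:]
    c, d = y[:h], y[h:]
    ac = cd_mul(a, c, counter)
    db = cd_mul(conj(d), b, counter)
    da = cd_mul(d, a, counter)
    bc = cd_mul(b, conj(c), counter)
    first = [p - q for p, q in zip(ac, db)]
    second = [p + q for p, q in zip(da, bc)]
    return first + second


def unit(i, n=16):
    """Basis element e_i of an n-dimensional Cayley-Dickson algebra."""
    return [1 if k == i else 0 for k in range(n)]


count = [0]
result = cd_mul(unit(1), unit(1), count)
print(result[0], count[0])  # e_1 squared and the number of real multiplications
```

Each level of the recursion performs four half-size products, which gives 4^4 = 256 real multiplications for 16 components.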

10.15866/IREIT.V1I6.6427•

Improving Matrix Multiplication Using Parallel Computing

Haitham A. Alasha'ary¹, Khaled Matrouk¹, Abdullah Alhasanat¹, Ziad A. Alqadi¹, Hasan Al-Shalabi¹•Institutions (1)

Al-Hussein Bin Talal University¹

30 Nov 2013

TL;DR: In this paper, a matrix multiplication method is chosen and its performance analyzed; the method proves very effective for large matrices when implemented with parallel computing based on the OpenMP libraries.

Abstract: Multiplication of large matrices requires a lot of computation time, as its complexity is O(n^3). Because most image processing applications require high computational throughput in minimal time, many sequential and parallel algorithms have been developed. In this paper, a method of matrix multiplication was chosen and analyzed. A performance analysis was carried out, and the chosen method proved very effective for large matrices when implemented with parallel computing based on the OpenMP libraries.

Journal Article•10.1112/JTOPOL/JTS032•

The multiplication on BP

Maria Basterra, Michael A. Mandell¹•Institutions (1)

Indiana University¹

01 Jun 2013-Journal of Topology

Proceedings Article•10.1109/ITW.2013.6691274•

Enabling multiplication in lattice codes via Construction A

Frederique Oggier¹, Jean-Claude Belfiore²•Institutions (2)

Nanyang Technological University¹, Télécom ParisTech²

23 Dec 2013

TL;DR: This paper shows how some of these ideal lattices can be constructed from polynomial codes (generalization of cyclic codes) via Construction A, and illustrates how these lattices enable multiplication.

Abstract: As a first step towards distributed computations in a wireless network, we introduce ideal lattices, that is, lattices built over an ideal of a ring of integers in a number field, as a tool for constructing lattice codes at the physical layer. Like all lattices, these are additive groups, but they are also equipped with a multiplication, which enables polynomial operations at each node of the wireless network. In this paper, we show how some of these ideal lattices can be constructed from polynomial codes (a generalization of cyclic codes) via Construction A, and illustrate how these lattices enable multiplication.
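
A toy version of the idea can be sketched as follows. It uses the simplest setting, a binary cyclic code of length 3 lifted to the integers by Construction A, rather than the number-field construction of the paper; all function names and the chosen generator 1 + x are assumptions of this example. Because a cyclic code is an ideal in F_2[x]/(x^n − 1), the lifted lattice is closed under cyclic polynomial multiplication by any integer vector.

```python
from itertools import product


def cyclic_mul(a, b):
    """Multiply two polynomials modulo x^n - 1 (cyclic convolution)."""
    n = len(a)
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[(i + j) % n] += ai * bj
    return out


def cyclic_code(gen, p=2):
    """All multiples of gen in F_p[x]/(x^n - 1): the ideal it generates."""
    n = len(gen)
    return {
        tuple(v % p for v in cyclic_mul(msg, gen))
        for msg in product(range(p), repeat=n)
    }


def in_lattice(v, code, p=2):
    """Construction A: v is a lattice point iff (v mod p) is a codeword."""
    return tuple(x % p for x in v) in code


code = cyclic_code([1, 1, 0])   # generated by 1 + x: even-weight words
point = [3, -1, 2]              # reduces to (1, 1, 0) modulo 2
other = [5, 7, -2]              # an arbitrary integer vector
print(in_lattice(point, code), in_lattice(cyclic_mul(point, other), code))
```

The lattice here is the preimage of the code under reduction modulo 2, so sums of lattice points and cyclic products with arbitrary integer vectors both stay inside it.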

A reconstruction of Joncourt's table of triangular numbers (1762)

Denis Roegel

1 Jan 2013

TL;DR: This article presents an analysis and reconstruction of Joncourt's table of triangular numbers, which offered an alternative to other methods for computing squares, extracting square roots, and even multiplying.

Abstract: This is an analysis and reconstruction of Joncourt's table of triangular numbers, one of only very few such tables, which was an alternative to other methods for the computation of squares, the extraction of square roots, and even multiplication.
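
How a table of triangular numbers T(n) = n(n + 1)/2 can replace multiplication can be sketched with two standard identities: ab = T(a + b) − T(a) − T(b) and n² = T(n) + T(n − 1). Whether these are the exact forms used with Joncourt's table is not stated here, so the code below is an illustration of the principle only, with hypothetical function names.

```python
def triangular_table(limit):
    """Build T(0..limit) by running sums, using additions only."""
    table = [0]
    for n in range(1, limit + 1):
        table.append(table[-1] + n)
    return table


def multiply(a, b, table):
    """a * b from three table lookups: T(a + b) - T(a) - T(b)."""
    return table[a + b] - table[a] - table[b]


def square(n, table):
    """n^2 from two table lookups: T(n) + T(n - 1), for n >= 1."""
    return table[n] + table[n - 1]


T = triangular_table(200)
print(multiply(37, 58, T), square(12, T))
```

With the table in hand, a product costs three lookups and two subtractions, and a square costs two lookups and one addition.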

Proceedings Article•10.1109/HIPC.2013.6799135•

Efficient sparse matrix multiple-vector multiplication using a bitmapped format

Ramaseshan Kannan¹•Institutions (1)

University of Manchester¹

1 Dec 2013

TL;DR: The mapped blocked row format is proposed: a bitmapped sparse matrix format that stores entries as blocks without a fill overhead, thereby offering blocking without additional storage and bandwidth overheads.

Abstract: The problem of obtaining high computational throughput from sparse matrix multiple-vector multiplication routines is considered. Current sparse matrix formats and algorithms have high bandwidth requirements and poor reuse of cache and register loaded entries, which restrict their performance. We propose the mapped blocked row format: a bitmapped sparse matrix format that stores entries as blocks without a fill overhead, thereby offering blocking without additional storage and bandwidth overheads. An efficient algorithm decodes bitmaps using de Bruijn sequences and minimizes the number of conditionals evaluated. Performance is compared with that of popular formats, including vendor implementations of sparse BLAS. Our sparse matrix multiple-vector multiplication algorithm achieves high throughput on all platforms and is implemented using platform neutral optimizations.

Journal Article•10.1186/1687-6180-2013-111•

FIR filter optimization for video processing on FPGAs

Martin Kumm¹, Diana Fanghänel¹, Konrad Moller¹, Peter Zipf¹, Uwe Meyer-Baese²•Institutions (2)

University of Kassel¹, Florida State University²

25 May 2013-EURASIP Journal on Advances in Signal Processing

TL;DR: This work proposes two optimization techniques for high-speed implementations of the required multiplications with the least possible number of FPGA components, including a formulation for the pipelined multiple constant multiplication problem.

Abstract: Two-dimensional finite impulse response (FIR) filters are an important component in many image and video processing systems. The processing of complex video applications in real time requires high computational power, which can be provided using field programmable gate arrays (FPGAs) due to their inherent parallelism. The most resource-intensive components in computing FIR filters are the multiplications of the folding operation. This work proposes two optimization techniques for high-speed implementations of the required multiplications with the least possible number of FPGA components. Both methods use integer linear programming formulations which can be optimally solved by standard solvers. In the first method, a formulation for the pipelined multiple constant multiplication problem is presented. In the second method, multiplication structures based on look-up tables are also taken into account. Due to the low coefficient word size in video processing filters of typically 8 to 12 bits, an optimal solution is found for most of the filters in the benchmark used. A complexity reduction of 8.5% for a Xilinx Virtex 6 FPGA could be achieved compared to state-of-the-art heuristics.
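
The multiple constant multiplication problem can be illustrated with a tiny hand-built example: multiplying one input by several constants using only shifts and additions, and sharing intermediate results between the constants. The constants 5 and 21, the adder structure, and the function name are chosen for illustration only; the paper finds such structures with integer linear programming, which this sketch does not attempt.

```python
def times_5_and_21(x):
    """Compute 5*x and 21*x with two adders and no multiplier.

    The intermediate 5*x is shared: 21*x = 4*(5*x) + x.
    """
    x5 = (x << 2) + x      # 5x  = 4x + x        (adder 1)
    x21 = (x5 << 2) + x    # 21x = 4*(5x) + x    (adder 2, reuses 5x)
    return x5, x21


for sample in (-3, 0, 7):
    print(sample, times_5_and_21(sample))
```

Computing 21x independently from its binary form, 16x + 4x + x, would need two adders on its own, so sharing 5x brings the total for both constants down from three adders to two. In hardware the shifts are free wiring, which is why the adder count is the quantity to minimize.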
