TL;DR: INTLAB is a Matlab toolbox supporting real and complex intervals, and vectors, full matrices and sparse matrices over those; it is designed to be very fast.
Abstract: INTLAB is a toolbox for Matlab supporting real and complex intervals, and vectors, full matrices and sparse matrices over those. It is designed to be very fast. In fact, it is not much slower than the fastest pure floating point algorithms using the fastest compilers available (the latter, of course, without verification of the result). Besides the basic arithmetical operations, rigorous input and output, rigorous standard functions, gradients, slopes and multiple precision arithmetic are included in INTLAB. Portability is assured by implementing all algorithms in Matlab itself, with the exception of exactly one routine for switching the rounding mode downwards, upwards and to nearest. Timing comparisons show that the concept used achieves the anticipated speed with identical code on a variety of computers, ranging from PCs to parallel computers. INTLAB is freeware and may be copied from our home page.
TL;DR: Introduces Optimal Extension Fields (OEFs), a class of Galois fields that achieve fast finite field arithmetic and are well suited for implementing public-key cryptosystems based on elliptic and hyperelliptic curves.
Abstract: This contribution introduces a class of Galois field used to achieve fast finite field arithmetic which we call an Optimal Extension Field (OEF). This approach is well suited for implementation of public-key cryptosystems based on elliptic and hyperelliptic curves. Whereas previously reported optimizations focus on finite fields of the form GF(p) and GF(2^m), an OEF is the class of fields GF(p^m), for p a prime of special form and m a positive integer. Modern RISC workstation processors are optimized to perform integer arithmetic on integers of size up to the word size of the processor. Our construction employs well-known techniques for fast finite field arithmetic which fully exploit the fast integer arithmetic found on these processors. In this paper, we describe our methods to perform the arithmetic in an OEF and the methods to construct OEFs. We provide a list of OEFs tailored for processors with 8, 16, 32, and 64 bit word sizes. We report on our application of this approach to construction of elliptic curve cryptosystems and demonstrate a substantial performance improvement over all previously reported software implementations of Galois field arithmetic for elliptic curves.
TL;DR: Optimal Extension Fields (OEFs) as discussed by the authors are a class of Galois fields used to achieve fast finite field arithmetic which are well suited for implementation of public-key cryptosystems based on elliptic and hyperelliptic curves.
Abstract: This contribution introduces a class of Galois field used to achieve fast finite field arithmetic which we call an Optimal Extension Field (OEF). This approach is well suited for implementation of public-key cryptosystems based on elliptic and hyperelliptic curves. Whereas previously reported optimizations focus on finite fields of the form GF(p) and GF(2^m), an OEF is the class of fields GF(p^m), for p a prime of special form and m a positive integer. Modern RISC workstation processors are optimized to perform integer arithmetic on integers of size up to the word size of the processor. Our construction employs well-known techniques for fast finite field arithmetic which fully exploit the fast integer arithmetic found on these processors. In this paper, we describe our methods to perform the arithmetic in an OEF and the methods to construct OEFs. We provide a list of OEFs tailored for processors with 8, 16, 32, and 64 bit word sizes. We report on our application of this approach to construction of elliptic curve cryptosystems and demonstrate a substantial performance improvement over all previously reported software implementations of Galois field arithmetic for elliptic curves.
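Illustrative sketch (not from the paper): the abstract says only that p is "a prime of special form". One form that fits a processor word is a pseudo-Mersenne prime p = 2^n - c with small c, for which reduction modulo p needs only shifts, masks and a small multiplication. The function name and the example primes below are assumptions chosen for illustration.

```python
def reduce_pseudo_mersenne(x: int, n: int, c: int) -> int:
    """Reduce a non-negative x modulo p = 2**n - c (assumes 0 < c <= 2**(n-1))."""
    p = (1 << n) - c
    mask = (1 << n) - 1
    # Since 2**n is congruent to c modulo p, the high part can be folded
    # into the low part with one small multiplication per pass.
    while x >> n:
        x = (x >> n) * c + (x & mask)
    # Now x < 2**n, so at most one subtraction of p remains.
    return x - p if x >= p else x


# Example: multiply two subfield elements modulo p = 2**31 - 1 (n = 31, c = 1).
a, b = 123456789, 987654321
product = reduce_pseudo_mersenne(a * b, 31, 1)
```

Under this assumption a subfield multiplication costs one word-sized integer multiply followed by a few shifts and additions, which is the kind of operation the abstract says RISC processors handle well.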
TL;DR: The main building blocks used in the VLP arithmetic circuits are presented, the similarities of each arithmetic operator are shown and area/time estimates of these circuits in Xilinx FPGAs are presented.
Abstract: This paper presents the organization of an arithmetic unit for variable long-precision (VLP) operands suitable for reconfigurable computing. The reconfigurable arithmetic coprocessor (RAC) cooperates with the host computer in the VLP tasks. The main design issues addressed in the paper are: (a) mapping of the most frequent and time consuming operations of the VLP arithmetic algorithms to the RAC, and (b) design of VLP algorithms that allow reduced reconfiguration time between arithmetic operations. The VLP arithmetic algorithms proposed cover multiplication, division and square root. In this paper we present the main building blocks used in the VLP arithmetic circuits, show the similarities of each arithmetic operator and present area/time estimates of these circuits in Xilinx FPGAs.
TL;DR: The main idea is to treat the output not as a binary number, but as a base 256 (or other) number, allowing a speedup of arithmetic coding by a factor of up to 2.
Abstract: Summary form only given. All integer based arithmetic coding consists of two steps: proportional range restriction and range expansion (renormalisation). Here a method is presented that significantly reduces the complexity of renormalisation, allowing a speedup of arithmetic coding by a factor of up to 2. The main idea is to treat the output not as a binary number, but as a base 256 (or other) number. This requires less renormalisation and no bitwise operations.
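Illustrative sketch (not from the paper): the claim that base-256 output "requires less renormalisation" can be seen by counting how many shift steps bring a shrunken range register back above its threshold. The 32-bit register width, the thresholds and the function name are assumptions; this models only the renormalisation loop, not a complete coder.

```python
def renorm_steps(rng: int, unit_bits: int, word_bits: int = 32) -> int:
    """Count the shifts needed until rng is back above the renormalisation threshold.

    unit_bits=1 models bit-by-bit output, unit_bits=8 models base-256 output.
    """
    if rng <= 0:
        raise ValueError("range must be positive")
    threshold = 1 << (word_bits - unit_bits)
    steps = 0
    while rng < threshold:
        rng <<= unit_bits  # one output digit (bit or byte) is emitted per shift
        steps += 1
    return steps


bitwise = renorm_steps(1 << 20, unit_bits=1)   # 11 single-bit steps
bytewise = renorm_steps(1 << 20, unit_bits=8)  # 1 whole-byte step
```

In this simplified model the byte-oriented loop runs far fewer iterations, at the price of letting the range drop to 24 significant bits before it is expanded again.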
TL;DR: Experimental results from a set of typical arithmetic computations found in industry designs indicate that automating CSA optimization with the established algorithm produces designs with significantly faster timing and less area.
Abstract: The carry-save adder (CSA) is the most often used type of operation in implementing fast arithmetic computations in register-transfer level designs in industry. This paper establishes a relationship between the properties of arithmetic computations and several optimizing transformations using CSAs to derive consistently better quality of results than that of manual implementations. In particular, we introduce two important concepts, operation-duplication and operation-split, which are the main driving techniques of our algorithm for achieving an extensive utilization of CSAs. Experimental results from a set of typical arithmetic computations found in industry designs indicate that automating CSA optimization with our algorithm produces designs with significantly faster timing and less area.
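Illustrative sketch (not from the paper): a carry-save adder compresses three operands into a sum word and a carry word without propagating carries, so a multi-operand sum needs only one carry-propagate addition at the end. The code below models this on non-negative Python integers; the function names are assumptions, and the paper's operation-duplication and operation-split transformations are not modelled.

```python
def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 compression: return (sum_word, carry_word) with a + b + c == sum_word + carry_word."""
    sum_word = a ^ b ^ c                              # per-bit sum, no carry propagation
    carry_word = ((a & b) | (a & c) | (b & c)) << 1   # per-bit majority, moved up one position
    return sum_word, carry_word


def sum_with_csa(operands: list[int]) -> int:
    """Add non-negative integers using CSA stages and one final carry-propagate add."""
    ops = list(operands)
    while len(ops) > 2:
        a, b, c = ops.pop(), ops.pop(), ops.pop()
        ops.extend(carry_save_add(a, b, c))           # three operands become two
    return sum(ops)                                   # the only carry-propagate addition


total = sum_with_csa([13, 7, 22, 5, 9])
```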
TL;DR: The FPGA contains a routing framework and logic cell structure that is suitable for implementing digital systems for computer arithmetic, image processing, digital signal processing and similar computationally intensive applications.
Abstract: In this paper, we present the design of a novel Field Programmable Gate Array (FPGA) which contains the necessary logic elements to support high performance computer arithmetic. The FPGA contains a routing framework and logic cell structure that is suitable for implementing digital systems for computer arithmetic, image processing, digital signal processing and similar computationally intensive applications. The proposed architecture is flexible, reconfigurable and will support operands of various sizes for fixed point parallel and serial binary computations.
TL;DR: A robust implementation of the Beneath-Beyond algorithm for computing convex hulls in arbitrary dimension is presented and it is suggested that probabilistic modular arithmetic may be of wide interest, as it combines the advantages of modular arithmetic with the speed of randomization.
Abstract: We present a robust implementation of the Beneath-Beyond algorithm for computing convex hulls in arbitrary dimension. Certain techniques used are of independent interest in the implementation of geometric algorithms. In particular, two important, and often complementary, issues are studied, namely exact arithmetic and degeneracy. We focus on integer arithmetic and propose a general and efficient method for its implementation based on modular arithmetic. We suggest that probabilistic modular arithmetic may be of wide interest, as it combines the advantages of modular arithmetic with the speed of randomization. The use of perturbations as a method to cope with input degeneracy is also illustrated. A computationally efficient scheme is implemented which, moreover, greatly simplifies the task of programming. We concentrate on postprocessing, often perceived as the Achilles' heel of perturbations. Experimental results illustrate the dependence of running time on the various input parameters and attempt a comparison with existing programs. Lastly, we discuss the visualization capabilities of our software and illustrate them for problems in computational algebraic geometry. All code is publicly available.
TL;DR: This article describes a collection of Fortran routines for multiple-precision complex arithmetic and elementary functions that provide good exception handling, flexible input and output, trace features, and results that are almost always correctly rounded.
Abstract: This article describes a collection of Fortran routines for multiple-precision complex arithmetic and elementary functions. The package provides good exception handling, flexible input and output, trace features, and results that are almost always correctly rounded. For best efficiency on different machines, the user can change the arithmetic type used to represent the multiple-precision numbers.
TL;DR: The model of arithmetic branching programs, which is a generalization of modular branching programs and is equivalent to complements of dependency programs, was considered in this paper, and it was shown that dependency programs are closed under conjunction over every field.
Abstract: We consider the model of arithmetic branching programs, which is a generalization of modular branching programs. We show that, up to a polynomial factor in size, arithmetic branching programs are equivalent to complements of dependency programs. Using this equivalence we prove that dependency programs are closed under conjunction over every field. Furthermore, we show that span programs, an algebraic model of computation introduced by M. Karchmer and A. Wigderson (1993), are at least as strong as arithmetic programs; every arithmetic program can be simulated by a span program of size not more than twice the size of the arithmetic program. Using the above results we give a new proof that NL/poly ⊆ ⊕L/poly, first proved by A. Wigderson (1995). Our simulation of NL/poly is more efficient, and it holds for logspace counting classes over every field.
TL;DR: Cost×delay comparisons with the more conventional approach show a significant improvement, demonstrating that the presented algorithms are attractive for VLSI systems demanding complex number operations.
Abstract: A class of on-line algorithms for complex number arithmetic is presented. These algorithms adopt a redundant complex number system (RCNS) to represent complex numbers as a single number. Such a scheme simplifies the specification of the design, and has the additional effect that single-precision complex arithmetic can be easily reconfigured for double-precision real arithmetic. We present cost×delay comparisons with the more conventional approach to show a significant improvement, demonstrating that the presented algorithms are attractive for VLSI systems demanding complex number operations.
TL;DR: A simple one-step fully parallel high-radix signed-digit arithmetic is proposed for parallel optical computing based on new joint spatial encodings that reduces hardware requirements and improves throughput by reducing the space-bandwidth product needed.
Abstract: High-radix number systems enable higher information storage density, less complexity, fewer system components, and fewer cascaded gates and operations. A simple one-step fully parallel high-radix signed-digit arithmetic is proposed for parallel optical computing based on new joint spatial encodings. This reduces hardware requirements and improves throughput by reducing the space-bandwidth product needed. The high-radix signed-digit arithmetic operations are based on classifying the neighboring input digit pairs into various groups to reduce the computation rules. A new joint spatial encoding technique is developed to present both the operands and the computation rules. This technique increases the spatial bandwidth product (SBWP) of the spatial light modulators (SLMs) of the system. An optical implementation of the proposed high-radix signed-digit arithmetic operations is also presented. It is shown that our one-step trinary signed-digit (TSD) and quaternary signed-digit (QSD) arithmetic units are much simpler and better than all previously reported high-radix signed-digit techniques.
TL;DR: This paper presents the design of a divider that performs either interval or floating-point division, requires only slightly more area than a conventional floating-point divider, and provides a significant performance improvement over software implementations of interval division.
Abstract: Interval arithmetic provides an efficient method for monitoring and controlling errors in numerical calculations. However, existing software packages for interval arithmetic are often too slow for numerically intensive calculations. This paper presents the design of a divider that performs either interval or floating-point division. This divider requires only slightly more area than a conventional floating-point divider and provides a significant performance improvement over software implementations of interval division.
TL;DR: In this article, the authors define theories of bounded arithmetic characterizing classes of functions computable by constant-depth threshold circuits of polynomial and quasipolynomial size.
Abstract: We define theories of Bounded Arithmetic characterizing classes of functions computable by constant-depth threshold circuits of polynomial and quasipolynomial size. Then we define certain second-order theories and show that they characterize the functions in the Counting Hierarchy. Finally we show that the former theories are isomorphic to the latter via the so-called RSUV-isomorphism.
TL;DR: By asking afresh exactly what it is the arithmetic coder must do, it is shown how much of the complexity of current coders can be dispensed with and an analysis shows the average loss caused by the revised coder to be bounded in an expected sense.
Abstract: By asking afresh exactly what it is the arithmetic coder must do, we show how much of the complexity of current coders can be dispensed with. In particular, we eliminate all multiplicative operations in both the encoder and decoder, replacing them by comparisons and additions. The essence of the proposal is a simple piecewise integer mapping. Graf (1997) has made use of a similar integer mapping in his proposal for a fast entropy coder. Our work is related to but independent of his. As in all non-exact coders, some inefficiency is introduced. We give an analysis that shows the average loss caused by the revised coder to be bounded in an expected sense by 0.0861 bits per symbol, which for most compression applications is just one or two percent. As an additional modification, we discuss a mechanism that allows multi-bit output of codewords without compromising the precision of the probability estimates that may be employed. Finally, we give performance results that show that in combination the two improvements yield a coder as much as 40% faster than previous benchmark arithmetic coders.
TL;DR: This paper shows that the systolic control flow can be used for an efficient implementation of arithmetic operations on long operands, e.g. 1024 bits, in the field of cryptography.
Abstract: Instruction systolic arrays have been developed in order to combine the speed and simplicity of systolic arrays with the flexibility of MIMD parallel computer systems. Instruction systolic arrays are available as square arrays of small RISC processors capable of performing integer and floating point arithmetic. In this paper we show that the systolic control flow can be used for an efficient implementation of arithmetic operations on long operands, e.g. 1024 bits. The demand for long operand arithmetic arises in the field of cryptography. It is shown how the new arithmetic leads to a high-speed implementation for RSA encryption and decryption.
TL;DR: In this paper, the number to be squared is subdivided into sub-numbers having a number of digits compatible with the arithmetic circuit, the individual subnumbers being successively processed.
Abstract: In order to enable calculation of the square of a number comprising many digits by means of an arithmetic circuit which is arranged for the parallel processing of numbers having a substantially smaller number of digits, the number to be squared is subdivided into sub-numbers having a number of digits which is compatible with the arithmetic circuit, the individual sub-numbers being successively processed. For faster processing in the case of squaring operations, the multiplier circuit provided in the arithmetic circuit includes a position shift circuit capable of performing a shift of one position to the left in the case of multiplication of given pairs of sub-numbers, which shift corresponds to a multiplication by the factor 2. As a result, squaring can be performed while using fewer technical means. A method operating on the basis thereof so as to form the square of a large number modulo another large number is also disclosed.
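Illustrative sketch (not the disclosed circuit): splitting the number into sub-numbers x_i of limb_bits bits gives x^2 = sum of x_i^2 * B^(2i) plus 2 * sum over i < j of x_i * x_j * B^(i+j), with B = 2^limb_bits. Each cross product is therefore computed once and shifted one position to the left, which is the doubling the abstract describes. The 16-bit sub-number width and the function names are assumptions for illustration.

```python
def split_limbs(x: int, limb_bits: int) -> list[int]:
    """Split a non-negative integer into little-endian sub-numbers of limb_bits bits."""
    mask = (1 << limb_bits) - 1
    limbs = []
    while x:
        limbs.append(x & mask)
        x >>= limb_bits
    return limbs or [0]


def square_by_limbs(x: int, limb_bits: int = 16) -> int:
    """Square x from sub-number products, doubling each cross product by a 1-bit shift."""
    limbs = split_limbs(x, limb_bits)
    total = 0
    for i, xi in enumerate(limbs):
        total += (xi * xi) << (2 * i * limb_bits)            # diagonal term x_i^2
        for j in range(i + 1, len(limbs)):
            cross = (xi * limbs[j]) << 1                      # shift left by one = times 2
            total += cross << ((i + j) * limb_bits)
    return total


value = 0x123456789ABCDEF0
squared = square_by_limbs(value)
```

With k sub-numbers this uses k(k+1)/2 sub-multiplications instead of the k^2 needed by a general multiplication of the same size.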
TL;DR: This work investigates how the CAD algorithm can be adapted to the situation when the coefficients are inexact, or, more precisely, Mathematica arbitrary-precision floating point numbers.
Abstract: We study the problem of deciding whether a system of real polynomial equations and inequalities has solutions, and if so, finding a sample solution. For polynomials with exact rational number coefficients the problem can be solved using a variant of the cylindrical algebraic decomposition (CAD) algorithm. We investigate how the CAD algorithm can be adapted to the situation when the coefficients are inexact, or, more precisely, Mathematica arbitrary-precision floating point numbers. We investigate what changes need to be made in the algorithms used by CAD, and how reliable the results we get are.
TL;DR: The new relations allow one to calculate directly arithmetic spectrum from unnormalized Haar spectrum and vice versa without the necessity of obtaining the original function.
Abstract: Mutual relations between arithmetic and unnormalized Haar functions are stated. The new relations allow one to calculate directly arithmetic spectrum from unnormalized Haar spectrum and vice versa without the necessity of obtaining the original function. Since both arithmetic and Haar spectra are used widely in many applications, the presented equations should further enhance the scope of their applications.
TL;DR: It is demonstrated that the proposed hardware algorithms of residue arithmetic are useful to implement the reconfigurable current-mode multiple-valued residue arithmetic circuits, which are comparable to the conventional ones.
Abstract: This paper presents new reconfigurable multiple-valued residue arithmetic circuits, in which multiplication and addition can be performed alternatively. In order to construct the reconfigurable arithmetic circuits, we develop shifting-based hardware algorithms for both mod m_i multipliers and mod m_i adders. The proposed algorithms utilize three-valued one-hot coding for the representation of each residue digit effectively. By the coding, mod m_i multiplication can be simply performed by a shift operation and sign inversion. In mod m addition, the operation is decomposed into several operations, which include an inverse operation, two multiplications and an increment operation. It is demonstrated that the proposed hardware algorithms of residue arithmetic are useful to implement the reconfigurable current-mode multiple-valued residue arithmetic circuits, which are comparable to the conventional ones.
TL;DR: This paper motivates the need for arbitrary precision packed arithmetic wherein the width of the sub-datatypes is programmable by the user and proposes an implementation for arithmetic on such packed datatypes.
Abstract: Current general purpose processors have been enhanced with what is called a "media instruction set" to achieve performance gains in applications that are media processing intensive. The added instruction set exploits two facts: media applications have small native datatypes whose widths are much less than those supported by commercial processors, and such applications offer abundant data parallelism. Current processors enhanced with the "media instruction set" support arithmetic on sub-datatypes of only 8-bit, 16-bit, 32-bit and 64-bit precision. In this paper we motivate the need for arbitrary precision packed arithmetic wherein the width of the sub-datatypes is programmable by the user and propose an implementation for arithmetic on such packed datatypes. The proposed scheme has marginal hardware overhead over conventional implementations of arithmetic on processors incorporating a multimedia extended instruction set.
TL;DR: Divisibility; number theoretic function; congruence arithmetic; solving congruences; continued fractions; seeking prime numbers; advanced factoring; Diophantine equations; number curios; some unsolved problems in number theory; multiple precision arithmetic; Internet web sites with information and number theory.
Abstract: Divisibility; number theoretic function; congruence arithmetic; solving congruences; continued fractions; seeking prime numbers; advanced factoring; Diophantine equations; number curios; some unsolved problems in number theory; multiple precision arithmetic; Internet web sites with information and number theory.
TL;DR: An iterative improvement scheme that can be put around any network flow algorithm for integer capacities such that all integers arising can be handled exactly using floating point arithmetic is described.
TL;DR: This paper explores the potential of using multiple precision arithmetic units to effectively support implementation of image and video processing applications as application specific integrated circuits and proposes a new architectural scheme for collaborate addition of sets of variable precision data.
Abstract: Modern image and video processing applications are characterized by a unique combination of arithmetic and computational features: fixed point arithmetic, a variety of short data types, high degree of instruction-level parallelism, strict timing constraints, high computational requirements, and high cost sensitivity. The current generation of behavioral synthesis tools does not address well this type of application. In this paper we explore the potential of using multiple precision arithmetic units to effectively support implementation of image and video processing applications as application specific integrated circuits. A new architectural scheme for collaborate addition of sets of variable precision data is proposed as well as an allocation and assignment methodology for multiple precision arithmetic units. Experimental results indicate the strong advantages of the proposed approach.
TL;DR: A truncation introduced within the merged arithmetic reduction process provides equivalent throughput with substantially less complexity; analysis suggests that 24-bit merged arithmetic is required for the EZW algorithm to handle up to a level-6 wavelet transform.
Abstract: A variation of merged arithmetic is applied to the implementation of the wavelet transform. This approach offers a simple design trade-off between the computational accuracy and the complexity. Our analysis shows that the trade-off is a function of the input data resolution, the number of filter taps, the arithmetic precision, and the level of the wavelet transform. The design parameter can also be fixed for a given number of taps and used to determine the minimum word size for the wavelet coefficients of the transform. The key element of this approach is to introduce a "truncation" within the merged arithmetic reduction process which provides equivalent throughput with substantially less complexity. An experiment has been conducted to verify the analysis, which suggests that 24-bit merged arithmetic is required for the EZW algorithm to handle up to a level-6 wavelet transform.
TL;DR: This dissertation presents several new bit-parallel hardware architectures with low space and time complexity, an analysis and refinement of the complexity of an existing hardware algorithm, and a highly efficient software method suitable for implementation on many 32-bit processor architectures.
Abstract: Today's computer and network communication systems rely on authenticated and secure transmission of information, which requires computationally efficient and low bandwidth cryptographic algorithms. Among these cryptographic algorithms are the elliptic curve cryptosystems which use the arithmetic of finite fields. Furthermore, the fields of characteristic two are preferred since they provide carry-free arithmetic and at the same time a simple way to represent field elements on current processor architectures.
Arithmetic in a finite field is analogous to the arithmetic of integers. When performing the multiplication operation, finite field arithmetic uses reduction modulo the generating polynomial. The generating polynomial is an irreducible polynomial over GF(2), and the degree of this polynomial determines the size of the field, and thus the bit-lengths of the operands.
The fundamental arithmetic operations in finite fields are addition, multiplication, and inversion operations. The sum of two field elements is computed very easily. However, multiplication operation requires considerably more effort compared to addition. On the other hand, the inversion of a field element requires much more computational effort in terms of time and space. Therefore, we are mainly interested in obtaining implementations of field multiplication and inversion.
In this dissertation, we present several new bit-parallel hardware architectures with low space and time complexity. Furthermore, an analysis and refinement of the complexity of an existing hardware algorithm and a software method highly efficient and suitable for implementation on many 32-bit processor architectures are also described.
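Illustrative sketch (not one of the dissertation's architectures): in GF(2^m), elements can be held as bit vectors, addition is XOR, and multiplication is a carry-free shift-and-add followed by reduction modulo the generating polynomial. The example below uses the degree-8 polynomial x^8 + x^4 + x^3 + x + 1 (0x11B, the AES field) purely as a familiar test case; the function names and the Fermat-style inversion are assumptions chosen for brevity.

```python
def gf2m_mul(a: int, b: int, poly: int, m: int) -> int:
    """Multiply a and b in GF(2^m); poly is the generating polynomial including its x^m term."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in characteristic two is XOR (carry-free)
        b >>= 1
        a <<= 1                  # multiply a by x
        if a >> m:
            a ^= poly            # reduce modulo the generating polynomial
    return result


def gf2m_inv(a: int, poly: int, m: int) -> int:
    """Invert a nonzero element as a^(2^m - 2) by square-and-multiply."""
    if a == 0:
        raise ZeroDivisionError("zero has no multiplicative inverse")
    result, exponent = 1, (1 << m) - 2
    while exponent:
        if exponent & 1:
            result = gf2m_mul(result, a, poly, m)
        a = gf2m_mul(a, a, poly, m)
        exponent >>= 1
    return result


product = gf2m_mul(0x57, 0x83, 0x11B, 8)   # 0xC1 in the AES field
inverse = gf2m_inv(0x53, 0x11B, 8)
```

The inversion here costs many multiplications, which matches the abstract's remark that inversion is far more expensive than multiplication and that addition is nearly free.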
TL;DR: It has been known for more than thirty years that the degree of a non-standard model of true arithmetic is a subuniform upper bound for the arithmetic sets (suub); here a notion of generic enumeration is presented whose degree is an suub but not the degree of a non-standard model of true arithmetic.
Abstract: It has been known for more than thirty years that the degree of a non-standard model of true arithmetic is a subuniform upper bound for the arithmetic sets (suub). Here a notion of generic enumeration is presented with the property that the degree of such an enumeration is an suub but not the degree of a non-standard model of true arithmetic. This answers a question posed in the literature.
TL;DR: In this paper, a geometric processing processor is provided with first and second interfaces 80 and 86, respectively connected with a host and a renderer and a geometric arithmetic core 70 for processing a geometric operation applied from a host.
Abstract: PROBLEM TO BE SOLVED: To quickly process geometric arithmetic operation, and to quickly input and output data. SOLUTION: A geometric processing processor is provided with first and second interfaces 80 and 86, respectively connected with a host and a renderer and a geometric arithmetic core 70 for processing a geometric arithmetic operation applied from a host. The geometric arithmetic core 70 is provided with plural SIMD(single instruction stream and multiple data stream) type floating point arithmetic units 138-144, floating point power calculation unit 114, integer arithmetic unit 106, sequencer 94 and address generator 96 for controlling them, and for processing data from a host processor in response to an instruction from the host processor, and output controlling part for outputting the processed data through an interface 86 to the renderer. COPYRIGHT: (C)2000,JPO