TL;DR: In this paper, a floating point processor for performing arithmetic operations on floating point numbers includes a first arithmetic operation unit configured to operate on normalized numbers and a second arithmetic operation unit which includes a denormalizer for denormalizing normalized numbers and a normalizer for normalizing denormalized numbers.
Abstract: A floating point processor for performing arithmetic operations on floating point numbers includes a first arithmetic operation unit configured to operate on normalized numbers and a second arithmetic operation unit which includes a denormalizer for denormalizing normalized numbers and a normalizer for normalizing denormalized numbers. Each arithmetic operation unit has first and second inputs for receiving first and second operands, respectively, and an output for transmitting a result of the arithmetic operation. When a denormalized operand is presented as an input to the arithmetic operation unit configured to operate on normalized numbers, the denormalized input operand is redirected through the second arithmetic operation unit for normalization. The first arithmetic operation unit then performs its arithmetic operation using the normalized input operands. The result of the arithmetic operation is then analyzed to determine whether it has a zero or negative exponent. If the result has a zero or negative exponent, the result is directed through the second arithmetic operation unit a second time so that the result is denormalized. The denormalized result is then output.
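The normalize, operate, denormalize flow described above can be imitated in software. The following is only a rough sketch under stated assumptions, not the patented circuit: `multiply_via_normalized` and `MIN_NORMAL_EXP` are hypothetical names, `math.frexp` stands in for the normalizer, `math.ldexp` stands in for the denormalizer, and the comparison against `MIN_NORMAL_EXP` is assumed to play the role of the "zero or negative exponent" test.

```python
import math
import sys

# Smallest frexp-style exponent of a normalized double (assumed stand-in
# for the hardware's "exponent is zero or negative" check).
MIN_NORMAL_EXP = sys.float_info.min_exp

def multiply_via_normalized(a, b):
    """Multiply a and b using only normalized mantissas internally."""
    ma, ea = math.frexp(a)          # normalizer: works for subnormal inputs too
    mb, eb = math.frexp(b)
    m, e = math.frexp(ma * mb)      # normalized-only multiply, then renormalize
    e += ea + eb
    went_subnormal = e < MIN_NORMAL_EXP
    # ldexp acts as the denormalizer when the exponent is below the normal range
    return math.ldexp(m, e), went_subnormal

tiny = math.ldexp(1.0, -1040)       # a subnormal double
print(multiply_via_normalized(tiny, 3.0))
print(multiply_via_normalized(2.0, 3.0))
```

The second return value reports whether the result had to take the extra pass through the denormalizing step.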
TL;DR: Algorithms and implementation details for the logarithm functions in both single and double precision of IEEE 754 arithmetic are presented here.
Abstract: Algorithms and implementation details for the logarithm functions in both single and double precision of IEEE 754 arithmetic are presented here. With a table of moderate size, the implementations need only working-precision arithmetic and are provably accurate to within 0.57 ulp.
TL;DR: A number system that offers advantages in some situations over conventional floating point and sign/logarithmic number systems is described, and can be implemented with online arithmetic, which would be impractical for a conventional sign logarithm number system.
Abstract: A number system that offers advantages in some situations over conventional floating point and sign/logarithmic number systems is described. Redundant logarithmic arithmetic, like conventional logarithmic arithmetic, relies on table lookups to make the arithmetic unit simpler than an equivalent floating point unit. The cost of 32 bit subtraction in a redundant logarithmic number system is lower than that of previously published logarithmic subtraction methods. The total memory requirement for a 29-bit redundant logarithmic unit is 16 K words, compared to 22 K words for the best previously published conventional sign logarithm unit, assuming similar addition techniques are employed. A redundant logarithmic number system can be implemented with online arithmetic, which would be impractical for a conventional sign logarithm number system. The disadvantages of redundant arithmetic are typical of redundant number systems. First, the redundancy doubles the storage requirements for data values. Second, the representation can become ill-conditioned, especially as a result of iterated multiplications. Third, division and square root operations are more difficult to implement in redundant logarithmic arithmetic.
TL;DR: In this paper, several algorithms for an implementation of the scalar product are sketched, problems are discussed and solutions are suggested; finally, typical designs and implementations are summarized and illustrated.
Abstract: While the four usual floating-point operations are the basis of real floating-point arithmetic, the scalar product is the basis of the operations in higher numerical spaces, such as matrices, vectors, etc. In addition, an exact scalar product is an invaluable tool for the verified solution of numerical problems by means of enclosure methods. Therefore, computer arithmetic including such an exact scalar product is a significant extension of IEEE arithmetic. In this paper, several algorithms for an implementation are sketched, problems are discussed and solutions are suggested; finally, typical designs and implementations are summarized and illustrated.
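To illustrate what "exact scalar product" means in practice, here is a minimal sketch that accumulates the products without rounding and rounds only once at the end. It is an assumption-laden stand-in: it uses Python's rational arithmetic rather than any of the algorithms or hardware designs the paper surveys, and the names `exact_dot` and `naive_dot` are hypothetical.

```python
from fractions import Fraction

def exact_dot(xs, ys):
    """Dot product of two float sequences with a single final rounding."""
    # Every finite float is an exact rational, so this sum has no rounding error.
    total = sum((Fraction(x) * Fraction(y) for x, y in zip(xs, ys)), Fraction(0))
    return float(total)  # the only rounding step

def naive_dot(xs, ys):
    """Ordinary floating-point accumulation, rounding after every operation."""
    acc = 0.0
    for x, y in zip(xs, ys):
        acc += x * y
    return acc

xs = [1e16, 1.0, -1e16]
ys = [1.0, 1.0, 1.0]
print(naive_dot(xs, ys))  # 0.0: the 1.0 is absorbed by 1e16 and lost
print(exact_dot(xs, ys))  # 1.0
```

The cancellation example shows why a single rounding matters for enclosure methods: the naive sum loses the entire result.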
TL;DR: In this paper, the effects of finite precision arithmetic in three uniformly hyperbolic chaotic dynamical systems: Bernoulli shifts, cat maps, and pseudorandom number generators are explored.
TL;DR: A new class of arithmetic circuits, called feasible-size-magnitude, is introduced and used to show a feasible version of the Weierstrass approximation theorem, which means that a real function is feasible if and only if it can be sup-approximated by a division-free uniform family of feasible-size-magnitude arithmetic circuits over R.
Abstract: The connection between computable analysis and computational complexity is investigated by asking what it means to feasibly compute a real function. A new class of arithmetic circuits, called feasible-size-magnitude, is introduced and used to show a feasible version of the Weierstrass approximation theorem. That is, a real function is feasible if and only if it can be sup-approximated by a division-free uniform family of feasible-size-magnitude arithmetic circuits over R. This result involves a counter-intuitive simulation of Boolean circuits by arithmetic ones. It also has implications for algebraic complexity theory.
TL;DR: It is shown that rounding the coefficients of each plane (or line) equation without altering the combinatorial information is NP-complete; this rounding had been proposed to keep exact arithmetic, which avoids the numerical problems of floating-point operations on solids, efficient after rotations.
Abstract: A standard technique in solid modeling is to represent planes (or lines) by explicit equations and to represent vertices and edges implicitly by means of combinatorial information. Numerical problems that arise from using floating-point arithmetic to implement operations on solids can be avoided by using exact arithmetic. Since the execution time of exact arithmetic operators increases with the number of bits required to represent the operands, it is important to avoid increasing the number of bits required to represent the plane (or line) equation coefficients. Set operations on solids do not increase the number of bits required. However, rotating a solid greatly increases the number of bits required, thus adversely affecting efficiency. One proposed solution to this problem is to round the coefficients of each plane (or line) equation without altering the combinatorial information. We show that such rounding is NP-complete.
TL;DR: In this paper, a processor for use in CAT X-ray systems and NMR systems performs floating point arithmetic operations in parallel to shorten processing time, where a single program memory and program sequencing unit operates a set of floating-point arithmetic units to carry out parallel operations.
Abstract: A processor for use in CAT X-ray systems and NMR systems performs floating point arithmetic operations in parallel to shorten processing time. A single program memory and program sequencing unit operates a set of floating point arithmetic units to carry out parallel operations on data sets stored in respective data memories. An integer processor unit executes logical operations, and a shared data memory stores constants and other data which is required by the integer processor unit and which is common to the operations performed by all of the floating point arithmetic units.
TL;DR: Alternate formulations of Horner's rule which partition the algorithm into inner-product computations are studied, and it is concluded that each has advantages depending on problem size and target technology.
Abstract: Alternate formulations of Horner's rule which partition the algorithm into inner-product computations are studied. Fixed-point inner products may be implemented with distributed arithmetic structures that use table-lookup in place of multiplication. Distributed arithmetic can be smaller and faster than lumped arithmetic in technologies where memory is cheaper than logic. The partitioned algorithms may be mapped to mesh-connected or tree-connected VLSI architectures. The partitions may be chosen to optimize cost measures and constraints that are functions of area, latency, period, and arithmetic precision. These structures are compared with a tree structure for polynomial evaluation. It is concluded that each has advantages depending on problem size and target technology.
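One way to partition Horner's rule into inner products can be sketched as follows. This is a software illustration only, assuming a fixed block size `k`; it says nothing about the distributed arithmetic or VLSI mappings the paper studies, and the function names are hypothetical.

```python
def horner(coeffs, x):
    """Classic Horner's rule; coeffs are ordered from lowest to highest degree."""
    acc = 0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc

def partitioned_horner(coeffs, x, k):
    """Horner's rule in x**k, where each step consumes a k-term inner product."""
    powers = [x ** i for i in range(k)]  # shared vector [1, x, ..., x^(k-1)]
    blocks = [coeffs[i:i + k] for i in range(0, len(coeffs), k)]
    xk = x ** k
    acc = 0
    for block in reversed(blocks):
        # Inner product of one coefficient block with the shared power vector.
        inner = sum(c * p for c, p in zip(block, powers))
        acc = acc * xk + inner
    return acc

coeffs = [3, 0, -2, 5, 1, 7, 4]
print(horner(coeffs, 3), partitioned_horner(coeffs, 3, 3))
```

With `k = 1` this reduces to ordinary Horner's rule, and with `k` equal to the number of coefficients it becomes a single inner product, so `k` trades sequential steps against inner-product width.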
TL;DR: The study shows that on-line arithmetic is effective and feasible in the implementation of numerical computations where the critical path contains long sequences of arithmetic operations.
Abstract: On-line arithmetic algorithms introduce parallelism between sequential operations by overlapping these operations in a digit-pipelined fashion. They can reduce the computation time of long sequences of arithmetic operations. Thus on-line arithmetic complements other approaches such as parallel processing and pipelining which exploit parallelism at the numerical algorithms level.
The implementation characteristics of on-line arithmetic algorithms and their applications are investigated. First the model of on-line computation is defined, and a systematic and unified derivation of on-line algorithms is given. Parameters that affect the implementation efficiency of an on-line computation are identified and discussed. Gate array implementations of on-line arithmetic units are presented with estimates of their cost and performance. A case study is conducted and the on-line approach is compared with conventional schemes. The study shows that on-line arithmetic is effective and feasible in the implementation of numerical computations where the critical path contains long sequences of arithmetic operations.
TL;DR: A technique based on program slicing is presented that automates the transformation from a program operating on the constructive reals to a reasonable program using iterated interval arithmetic, and that reduces the amount of reexecution.
Abstract: The constructive reals provide programmers with a useful mechanism for prototyping numerical programs, and for experimenting with numerical algorithms. Unfortunately, the performance of current implementations is inadequate for some potential applications. In particular, these implementations tend to be space inefficient, in that they essentially require a complete computation history to be maintained. Some numerical analysts (cf. [3]) propose that the programmer instead be provided with variable precision interval arithmetic, and then be required to write code to restart a computation when the intervals become too inaccurate. Though this model is no doubt appropriate at times, it is not an adequate replacement for exact arithmetic. The correct transformation from a program operating on the constructive reals to a reasonable program using iterated interval arithmetic can be nontrivial and error prone. Here we present a technique based on program slicing to both automate this process and reduce the amount of reexecution. Thus the programmer is still free to use the simpler abstraction of exact real arithmetic, but we can provide a more efficient interval arithmetic based implementation. Some preliminary empirical results are presented.
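The hand-written restart pattern mentioned above (recompute at higher precision when the interval becomes too wide) might look like the sketch below. It illustrates only the manual model that the paper aims to automate, not the slicing technique itself; the helper names, the starting precision of 8 bits, and the precision-doubling policy are all assumptions.

```python
from fractions import Fraction
import math

def round_out(lo, hi, bits):
    """Widen [lo, hi] so both endpoints are multiples of 2**-bits."""
    scale = 1 << bits
    return (Fraction(math.floor(lo * scale), scale),
            Fraction(math.ceil(hi * scale), scale))

def mul(a, b, bits):
    """Interval product with outward rounding to the working precision."""
    products = [x * y for x in a for y in b]
    return round_out(min(products), max(products), bits)

def square_of_third(bits):
    """Enclose (1/3)**2 using intervals held to `bits` fractional bits."""
    third = round_out(Fraction(1, 3), Fraction(1, 3), bits)
    return mul(third, third, bits)

def evaluate(tol, bits=8):
    """Rerun the whole computation at doubled precision until it is tight enough."""
    while True:
        lo, hi = square_of_third(bits)
        if hi - lo <= tol:
            return lo, hi, bits
        bits *= 2  # restart from scratch at higher precision

lo, hi, bits = evaluate(Fraction(1, 10**9))
print(bits, float(lo), float(hi))
```

Every restart repeats all earlier work, which is the reexecution cost the abstract refers to.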
TL;DR: In this paper, the sensitivity of transfer functions with respect to finite wordlength effect errors in the implementation of the coefficients of both shift operator and delta operator parametrizations was analyzed.
Abstract: The sensitivity of transfer functions is analyzed with respect to finite wordlength effect errors in the implementation of the coefficients of both shift operator and delta operator parametrizations. Both the absolute sensitivity, naturally connected to fixed point arithmetic, and the relative sensitivity, naturally connected to floating point arithmetic, are analyzed. In both cases, but particularly the latter, the delta operator parametrizations are shown to produce better sensitivity properties.
TL;DR: The FastHull algorithm runs faster than any currently known 2D convex hull algorithm for many input point patterns and has linear time performance for many kinds of input patterns.
Abstract: An efficient and numerically correct program called FastHull for computing the convex hulls of finite point sets in the plane is presented. It is based on the Akl-Toussaint algorithm and the MergeHull algorithm. Numerical correctness of the FastHull procedure is ensured by using special routines for interval arithmetic and multiple precision arithmetic. The FastHull algorithm guarantees O(N log N) running time in the worst case and has linear time performance for many kinds of input patterns. It appears that the FastHull algorithm runs faster than any currently known 2D convex hull algorithm for many input point patterns.
TL;DR: A modular arithmetic circuit using a radix-4 minimum-redundant signed-digit number system is proposed and a performance estimation using SPICE simulation shows that the speed of the proposed arithmetic circuits is comparable to that of the fastest binary circuits.
Abstract: A modular arithmetic circuit using a radix-4 minimum-redundant signed-digit number system is proposed. Any arithmetic circuit can be constructed using a single kind of module with a high degree of parallelism. This modularity is very useful for realizing high-performance semicustom VLSI such as a data-driven arithmetic circuit. The module is composed of an adder, a partial product generator, and a quotient digit generator, which are mainly implemented by multiple-valued bidirectional current-mode circuits. It is easy to design any complex arithmetic circuit using the modules. A performance estimation using SPICE simulation shows that the speed of the proposed arithmetic circuits is comparable to that of the fastest binary circuits.
TL;DR: Two different approaches to solving systems of polynomial equations with Gröbner basis techniques are presented, namely the vectorization of the arbitrary precision integer arithmetic and the usage of decomposition techniques.
Abstract: The attempt to solve systems of polynomial equations with Gröbner basis techniques often leads to large problems whose requirements for CPU time or storage exceed the available computer resources. The well-known reason for that is the swell of intermediate polynomials, which are generated during the basis calculation and which are in most cases not included in either the given set of polynomials or the resulting Gröbner basis. In this paper two different approaches to overcome the problem are presented which benefit from the usage of parallel computers, namely the vectorization of the arbitrary precision integer arithmetic and the usage of decomposition techniques. Especially the decomposition approach, where applicable, leads to massive parallelism in the problem solution, which results in a breakthrough for several problems.
TL;DR: This work describes several integer factorisation algorithms, and considers their suitability for implementation on vector processors and parallel machines.
Abstract: The problem of finding the prime factors of large composite numbers has been of practical importance since the advent of public key cryptosystems whose security depends on the presumed difficulty of this problem. In recent years the best known integer factorisation algorithms have improved greatly. It is now routine to factor 60-decimal digit numbers, and possible to factor numbers of more than 110 decimal digits. We describe several integer factorisation algorithms, and consider their suitability for implementation on vector processors and parallel machines.
TL;DR: A Connection Machine with 32K processors has been used to carry out calculations in finite fields with as many as 2^21 elements and of various characteristics; a typical calculation is to determine the number of roots of a large family of polynomials.
Abstract: A Connection Machine (model CM-2) with 32K processors has been used to carry out calculations in finite fields with as many as 2^21 elements and of various characteristics; a typical calculation is to determine the number of roots of a large family of polynomials. The programs use discrete logarithms, employing a table of "successor" logarithms to perform addition. The table is computed in advance, in parallel. The system can evaluate some 4 × 10^6 polynomial terms per second; performance is limited by the general communication time needed for table lookup. Orbits of the p-th power bijection (also calculated in parallel) are used to deal with common symmetries arising in the calculations. The techniques are illustrated by calculations to determine the number of rational points of a polynomial surface over several fields, quantities which are useful in analyzing certain cyclic codes.
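Addition through a table of "successor" logarithms (often called Zech logarithms) can be sketched as follows. This is a sequential toy over the prime field GF(13), chosen only for illustration; the paper works with much larger fields on a parallel machine, and the names `P`, `G`, `succ_log`, and `add_logs` are hypothetical.

```python
P = 13  # small prime field GF(13), an assumption made for illustration
G = 2   # 2 is a primitive root modulo 13

# log_table[a] = n such that G**n == a (mod P), for every nonzero a
log_table = {pow(G, n, P): n for n in range(P - 1)}

# succ_log[n] = log(1 + G**n), or None when 1 + G**n is zero
succ_log = [log_table.get((1 + pow(G, n, P)) % P) for n in range(P - 1)]

def add_logs(i, j):
    """Return log(G**i + G**j), or None if the sum is zero.

    Uses G**i + G**j = G**i * (1 + G**(j - i)), so one table lookup
    replaces the field addition.
    """
    z = succ_log[(j - i) % (P - 1)]
    return None if z is None else (i + z) % (P - 1)

# 3 = G**4 and 5 = G**9 in GF(13); their sum is 8 = G**3
print(add_logs(log_table[3], log_table[5]), log_table[8])
```

Multiplication in this representation is just addition of logarithms modulo `P - 1`, which is why only addition needs the precomputed table.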
TL;DR: This paper investigates the implementation of the basic arithmetic operations (addition, subtraction, multiplication, and division) in a digital optical processor.
Abstract: Symbolic substitution logic is a powerful tool for realizing optical arithmetic in a digital optical processor, because it is matched to the parallelism of optics and to the properties of optical switching devices. Working with such a processor enables the use of very efficient parallel algorithms. We have investigated the implementation of the basic arithmetic operations: addition, subtraction, multiplication, and division.
TL;DR: This paper presents an efficient realization of the arithmetic Fourier-transform algorithm on an optical parallel processor consisting of fiber-optic tapped delay lines and observes that the performance is affected minimally by the errors of the optical system.
Abstract: The arithmetic Fourier transform is a number-theoretic method for calculating the Fourier coefficients of continuous-time signals. In this paper we present an efficient realization of the arithmetic Fourier-transform algorithm on an optical parallel processor consisting of fiber-optic tapped delay lines. The performance of the algorithm in the presence of optical errors and noise is analyzed. From the results of numerical experiments on a simple test signal, we observe that the performance of the algorithm is affected minimally by the errors of the optical system.
TL;DR: In this paper, a register file (444) is employed to store data words and an arithmetic logic unit (450) processes the data words by means of two parallel arithmetic logic units (450a, 450b) that provide fixed point and floating point arithmetic processing operations.
Abstract: REGISTER AND ARITHMETIC LOGIC UNIT. Register and arithmetic logic apparatus for use in processing digital data provides a split pipeline architecture that operates on multiple data formats. A register file (444) is employed to store data words. An arithmetic logic unit (450) processes the data words by means of two parallel arithmetic logic units (450a, 450b) that provide fixed point and floating point arithmetic processing operations, respectively. The two parallel arithmetic logic units (450a, 450b) permit processing of a plurality of predetermined data processing formats, including dual 16 bit fixed point, 32 bit fixed point, 32 bit floating point and logical data processing formats. Post-processing registers (454, 456) provide for a limiter/shifter register, a length selectable first in, first out buffer for controlling the length of the register pipeline, and logic which provides for queuing of the processed data words. The register file (444) and the fixed point arithmetic logic unit (450a) may be selectively coupled together to function as an accumulator. This function permits processing of the data words such that two 32 bit data words are accumulated into a 64 bit data word, or the dual 16 bit data words are accumulated into two 32 bit data words. Processing using the dual 16 bit format employs a potential overflow scheme that permits a variety of signal processing algorithms to function with relatively compact code.
TL;DR: It is shown that the above model of on-line computation can be easily implemented by means of a table look-up system and can be applied to chained on-line computations.
TL;DR: Performance and complexity characteristics of the implementations of on-line arithmetic units for radix-2 floating-point addition, multiplication and division operations are discussed and compared with those of the compatible conventional floating-point algorithms implemented in the same technology.
Abstract: We present gate array designs of on-line arithmetic units for radix-2 floating-point addition, multiplication and division operations. Performance and complexity characteristics of the implementations of on-line arithmetic units are discussed and compared with those of the compatible conventional floating-point algorithms implemented in the same technology.
TL;DR: This paper discusses the appropriate system of representation for the arguments and values of the functions and the regions in which their evaluation is meaningful and shows that the internal calculations can be performed with fixed absolute precisions.
Abstract: In order to render the level-index, li, system of number representation and computer arithmetic a realistic scientific computing environment, it is necessary to develop and analyse algorithms for the (approximate) evaluation of the elementary functions with li arguments and/or outputs. In this paper the basic approaches to this problem are outlined and analysed. For some of the routines, operation times will turn out to be faster than for the basic arithmetic algorithms. It is shown that (as for those arithmetic operations) the internal calculations can be performed with fixed absolute precisions. We also discuss the appropriate system of representation for the arguments and values of the functions and the regions in which their evaluation is meaningful.
TL;DR: A survey of certain characterizations of complexity classes in an algebraic, machine-independent manner is presented, together with some applications to weak theories of arithmetic and higher-order functionals.
Abstract: A survey of certain characterizations of complexity classes in an algebraic, machine-independent manner is presented, together with some applications to weak theories of arithmetic and higher-order functionals. Function algebras are examined, and hierarchies are defined. Bounded arithmetic theories are presented.
TL;DR: This paper presents a nodal (arbitrary precision) integer arithmetic package, discusses the fast division algorithm that was implemented, and reports experiences with this approach.
Abstract: PAC is a parallel environment, based on a MIMD distributed computing model, which is intended to aid in the development of computer algebra algorithms. It uses parallelism as a tool for processing large problems. This paper discusses the general relationship between computer algebra and parallelism. The general features of the PAC project are described and some of the results obtained with PAC are presented. One of the crucial elements of symbolic computation on parallel architectures is efficient implementation of fast arbitrary precision arithmetic. This paper presents a nodal (arbitrary precision) integer arithmetic package and discusses the fast division algorithm which we have implemented. The representation used is designed to take advantage of a vectorized floating point unit. Our experiences with this approach are also discussed.
TL;DR: A high speed arithmetic processor includes an array of arithmetic cells which operate on digits internally represented in a signed-digit binary format; certain cells perform subtraction operations on two ordinary binary digits and produce the difference in a 2-bit signed-digit binary format, without requiring a separate ordinary binary to signed-digit binary converter.
Abstract: A high speed arithmetic processor includes an array of arithmetic cells which operate on digits internally represented in a signed-digit binary format. Certain of these cells perform subtraction operations on two ordinary binary digits, and produce the difference in a 2-bit signed-digit binary format, without requiring a separate ordinary binary to signed-digit binary converter.
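The reason no separate converter is needed can be seen in a small sketch: the difference of two ordinary binary digits always lies in {-1, 0, 1}, which is already a valid radix-2 signed digit, so subtraction can be done position by position with no borrow chain. The code below is an illustration under assumptions, not the patented cell design; in particular the `(plus, minus)` bit-pair encoding in `encode_2bit` is hypothetical.

```python
def sd_subtract(a_bits, b_bits):
    """Digit-wise difference of two equal-length binary digit lists (MSB first).

    Each result digit is in {-1, 0, 1}; nothing propagates between positions.
    """
    return [a - b for a, b in zip(a_bits, b_bits)]

def sd_value(digits):
    """Integer value of a radix-2 signed-digit list (MSB first)."""
    value = 0
    for d in digits:
        value = 2 * value + d
    return value

def encode_2bit(digit):
    """One possible 2-bit encoding (hypothetical): a (plus, minus) bit pair."""
    return {1: (1, 0), 0: (0, 0), -1: (0, 1)}[digit]

a = [1, 0, 1, 1]  # 11
b = [0, 1, 1, 0]  # 6
diff = sd_subtract(a, b)
print(diff, sd_value(diff), [encode_2bit(d) for d in diff])
```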
TL;DR: A survey of characterizations of complexity classes in an algebraic, machine-independent manner, together with some applications to weak theories of arithmetic and higher order functionals can be found in this article.
Abstract: We give a survey of certain characterizations of complexity classes in an algebraic, machine-independent manner, together with some applications to weak theories of arithmetic and higher order functionals.
TL;DR: The task of verifying a result in computational number theory became a study in the parallel computation of high precision arithmetic, providing information that can be used to intelligently make decisions about the implementation of an arbitrary-precision integer arithmetic package for shared memory multiprocessors.
Abstract: The task of verifying a result in computational number theory became a study in the parallel computation of high precision arithmetic, providing information that can be used to intelligently make decisions about the implementation of an arbitrary-precision integer arithmetic package for shared memory multiprocessors.