TL;DR: A method and circuitry are proposed for computing scalar products and sums of floating point numbers with maximum accuracy, by means of a summing unit (ALU) and one or more accumulator registers (ARC1, ARC2) with cells (Ai, j) that store codes of a base b having a length (2l + 2e1 + 2e2) for the fixed point representation, plus certain overflow positions.
Abstract: Circuitry and a method for electronic computers for generating scalar products and sums of floating point numbers with maximum accuracy, by which scalar products of floating point numbers pi, qi ∈ S(b,l,e1,e2) are summed with full precision in a fixed point representation by means of a summing unit (ALU) and one or more accumulator registers (ARC1, ARC2) with cells (Ai, j) for storing codes of a base b having a length (2l + 2e1 + 2e2) for the fixed point representation and certain overflow positions. By control means (SHR, E, Contr), the mantissas of the products are delivered into the summing unit (ALU) depending on the value of the respective exponents. By control means (RD, Contro), the rounding operations ( ○ , , ∇, Δ) demanded by the higher level computer are performed, and a rounded floating point number (□c ∈ S(b,l,e1,e2)) and overflow (OF) and underflow (UF) criteria are delivered. Parallel, serial and word organized summing units (ALU) and accumulator registers (ACR) are usable, and in another embodiment the multiplication of the factors (pi, qi) is performed using a table of multiples store.
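The idea of summing products exactly in a long fixed point accumulator and rounding only once can be sketched in software. The following is a minimal illustration, not the patented circuit: it assumes IEEE double inputs, uses Python's unbounded integers as a stand-in for the accumulator register, and the names `to_fixed`, `exact_dot` and `naive_dot` are hypothetical.

```python
# Sketch only: a big integer plays the role of the long accumulator.
SCALE_BITS = 1074            # every finite double is a multiple of 2**-1074
SCALE = 1 << SCALE_BITS


def to_fixed(x: float) -> int:
    """Return x * 2**1074 as an exact integer (fixed point code of x)."""
    numerator, denominator = x.as_integer_ratio()  # denominator is a power of two
    return numerator * (SCALE // denominator)


def exact_dot(p, q) -> float:
    """Accumulate all products exactly, then round once to the nearest double."""
    acc = 0
    for a, b in zip(p, q):
        acc += to_fixed(a) * to_fixed(b)       # exact, scaled by 2**2148
    return acc / (1 << (2 * SCALE_BITS))       # single correctly rounded division


def naive_dot(p, q) -> float:
    """Ordinary floating point accumulation, rounding after every step."""
    total = 0.0
    for a, b in zip(p, q):
        total += a * b
    return total


p = [1e16, 1.0, -1e16]
q = [1.0, 1.0, 1.0]
print(exact_dot(p, q), naive_dot(p, q))
```

In this example the naive loop loses the middle term to cancellation, while the accumulator version keeps it because no intermediate rounding takes place.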
TL;DR: The problem of analysing data near the critical temperature is investigated using simulated data (with particular reference to specific heat data), and the possibilities of quantitatively comparing competing models are discussed.
Abstract: The problem of analysing data near the critical temperature is investigated using simulated data (with particular reference to specific heat data). Attention is paid to the fitting problems posed by the many adjustable parameters inherent to theoretically predicted models, and to the statistical significance of resulting parameters. Consideration is given to non-ideal or rounding behaviour near T_c, and the possibilities of quantitatively comparing competing models are discussed.
TL;DR: A rounding unit for use in arithmetic processing of register floating point data is presented. The rounding unit comprises a mantissa part register, an exponent part register and a judging circuit for judging whether the rounding operation is raising or truncating.
Abstract: A rounding unit for use in arithmetic processing of register floating point data. The rounding unit comprises a mantissa part register for storing the mantissa part of the floating point data, an exponent part register for storing the exponent part of the floating point data, a judging circuit for judging whether the rounding operation is raising or truncating, a mantissa part incrementer for incrementing the mantissa part of the floating point data and outputting a carry signal when it is overflowed, an exponent part incrementer for incrementing the exponent part of the floating point data and a selection circuit which, in response to the carry signal from the mantissa part incrementer and the judging signal from the judging circuit, orders the mantissa part register to store a constant data of which the most significant bit is "1" and the other bits are "0", when the rounding operation is raising and the carry signal is present. For generating the constant data, a constant register or an adjusting means is employed. The rounding unit executes the rounding operation of floating point data by hardware and thus it operates at a very high speed.
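The carry handling described above can be mimicked in a few lines. This is a sketch under assumptions: the mantissa is an unsigned integer of `width` bits with its most significant bit set, the boolean `raising` stands in for the judging circuit's output, and the function name `round_unit` is hypothetical.

```python
def round_unit(mantissa: int, exponent: int, width: int, raising: bool):
    """Return the (mantissa, exponent) pair after the rounding step.

    raising=False models truncation: the stored parts are left unchanged.
    raising=True increments the mantissa; if the incrementer overflows,
    the mantissa is replaced by the constant 100...0 and the exponent
    is incremented.
    """
    if not raising:
        return mantissa, exponent
    incremented = mantissa + 1
    carry = incremented >> width          # 1 when the mantissa overflowed
    if carry:
        constant = 1 << (width - 1)       # most significant bit 1, others 0
        return constant, exponent + 1
    return incremented, exponent


print(round_unit(0b1111, 3, 4, True))
print(round_unit(0b1010, 3, 4, True))
```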
TL;DR: In this paper, the use of magnitude truncation instead of controlled rounding for the elimination of zero-input and constant-input oscillations in the wave digital biquad derived from the feedforward RC-active configuration is described.
Abstract: The use of magnitude truncation instead of controlled rounding for the elimination of zero-input and constant-input oscillations in the wave digital biquad derived from the feedforward RC-active configuration is described. We also describe how the structure could be used for the simultaneous realization of all types of second-order digital filters.
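Why magnitude truncation can suppress zero-input oscillations is easiest to see on a much simpler structure than the wave digital biquad. The sketch below uses a first-order recursion y[n] = Q(c * y[n-1]) as a stand-in; the coefficient -9/10, the start value 10 and all function names are assumptions chosen for illustration.

```python
from fractions import Fraction
import math


def round_half_up(x):
    """Rounding quantizer: nearest integer, ties upward."""
    return math.floor(x + Fraction(1, 2))


def magnitude_truncate(x):
    """Magnitude truncation: drop the fraction, moving toward zero."""
    return math.trunc(x)


def zero_input_response(quantize, coeff, y0, steps):
    """Iterate y <- quantize(coeff * y) with no input signal."""
    y = y0
    history = [y]
    for _ in range(steps):
        y = quantize(coeff * y)
        history.append(y)
    return history


coeff = Fraction(-9, 10)
print(zero_input_response(round_half_up, coeff, 10, 12))
print(zero_input_response(magnitude_truncate, coeff, 10, 12))
```

With rounding, the state settles into a sustained oscillation between 4 and -4, whereas magnitude truncation never increases the magnitude of the state and lets it decay to zero.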
TL;DR: The effects of finite precision in two-dimensional recursive digital filters are considered for various forms of fixed-point arithmetic, and some examples are given.
Abstract: The effects of finite precision are considered in two-dimensional recursive digital filters. Errors are introduced in quantizing the input, the coefficients and in the evaluation of the multiplications of the filter implementations. Finite register length models of the filters are developed and block diagrams are given for various forms of realization. A statistical approach is used, which permits the computation of the statistics of the errors at the output of the filters. Rounding and truncation are compared as methods of quantization. Various forms of fixed-point arithmetic are considered and some examples are given.
TL;DR: A model of the relative error in floating point multiplication is developed and is analyzed stochastically for various choices of computer design parameters, which include the base, the type of rounding rule, the number of guard digits, and whether the post-arithmetic normalization shift is done before or after rounding.
Abstract: A model of the relative error in floating point multiplication is developed and is analyzed stochastically for various choices of computer design parameters. These parameters include the base, the type of rounding rule, the number of guard digits, and whether the post-arithmetic normalization shift (if needed) is done before or after rounding. Under the assumption of logarithmic distribution for the fraction (mantissa), the major stochastic conclusions are:
TL;DR: The methods to estimate the integration errors, including the effects of truncation, rounding off, and instability of the solutions, are discussed in this article, where the authors consider the case of exponentially diverging orbits.
Abstract: The methods to estimate the integration errors, including the effects of truncation, rounding off, and instability of the solutions, are discussed. Polynomial error accumulation depends upon numerical method, stepsize, orbital period and also eccentricity; it is also machine dependent. Comets correspond to the most difficult case of exponentially diverging orbits; however they can be very close to resonant ordered regions.
TL;DR: In this article, a central processing unit (CPU) reads the reference weight, which is determined by dividing the result of the detection of the weight of ten pieces and the like by the number of pieces, out of a built-in memory part.
Abstract: PURPOSE: To eliminate the error in counting the number of parts by a scale by updating the memory and a reference weight every time the number of the parts is counted, and judging the critical region at the time of rounding the counted number. CONSTITUTION: When an operating part 5 specifies parts numbers, a central processing unit CPU 3 reads the reference weight, which is determined by dividing the result of the detection of the weight of ten pieces and the like by the number of pieces, out of a built-in memory part (c). The reference weight is applied to an operating part (a). The weight of articles 6, whose number is unknown, is detected by a weight detector 1 and applied to said operating part (a) through an A/D converter and an I/O interface part (b). The detected weight is divided by the reference weight and the number of pieces is judged. When the fraction of the division is larger than 0.3 and smaller than 0.7, the CPU judges that the region is the critical region at the time of rounding, and displays the magnitude of dispersion on a display device 4. The change of the number of articles measured by the detector 1 is instructed. Meanwhile the CPU updates the content of the memory part, with the weight, which is obtained by dividing the detected weight by the number determined by the operation, as the new reference weight. Thus, errors in the total weight of parts, errors in the reference weight of one part, and errors in rounding in division are eliminated, and the errors in counting the number of parts by a scale become zero.
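The counting rule with its critical region can be written down directly. The sketch below is an interpretation under stated assumptions: the 0.3 and 0.7 thresholds come from the abstract, but skipping the reference update inside the critical region, rounding to nearest outside it, and the name `count_pieces` are illustrative choices that the abstract does not confirm.

```python
import math


def count_pieces(detected_weight: float, reference_weight: float):
    """Return (count, critical, new_reference_weight).

    critical is True when the fractional part of the quotient lies strictly
    between 0.3 and 0.7, i.e. when rounding the count is unreliable.
    """
    quotient = detected_weight / reference_weight
    fraction = quotient - math.floor(quotient)
    critical = 0.3 < fraction < 0.7
    count = math.floor(quotient + 0.5)           # round to the nearest piece count
    if critical or count == 0:
        # Assumption: keep the old reference when the count is doubtful.
        return count, critical, reference_weight
    # Update the per-piece reference from the latest measurement.
    return count, critical, detected_weight / count


print(count_pieces(50.5, 10.0))
print(count_pieces(45.0, 10.0))
```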
TL;DR: An efficient algorithm for the transformation of the input-output model into the state-space model is presented and a comparison of the proposed algorithm with a usual one is given and a numerical example is presented.
Abstract: An efficient algorithm for the transformation of the input-output model into the state-space model is presented. The algorithm is based on a direct connection of the model parameters in both spaces. This connection can be expressed by a recursive formula. To ensure the consistency of the formula, the parameters must be calculated in the correct sequence. This is achieved by using a corresponding index sequence table. The method requires only a small number of numerical operations, therefore the computational time is short and the effect of rounding errors on the accuracy is small. A comparison of the proposed algorithm with a usual one is also given and a numerical example is presented at the end.
TL;DR: In this article, the effect of data perturbations and rounding errors for some algorithms, using the ideas of Stummel's perturbation theory [3] which is a forward error analysis, is investigated.
Abstract: There exist several algorithms for the calculation of convergents of a continued fraction. We will investigate the effect of data perturbations and rounding errors for some algorithms, using the ideas of Stummel's perturbation theory [3] which is a forward error analysis. In Section 1 we briefly repeat the forward a priori error analysis which we shall use. In Section 2 we present three forward recurrence algorithms (including a method which we believe to be new) and the well-known backward recurrence algorithm for the calculation of a convergent of a given continued fraction. The next four sections are devoted to the a priori error analysis of the four algorithms. The theoretical results are applied to numerical examples in Section 7. As far as the rounding errors were concerned no algorithm was better or worse than the others for all the examples and no error bounds were especially more accurate than the other ones.
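For orientation, the two basic ways of evaluating a convergent can be sketched as follows. This uses the classical three-term forward recurrence and the nested backward recurrence; it is an assumption that these correspond to two of the algorithms analysed in the paper, and the function names and list layout (with `a[0]` unused) are hypothetical.

```python
def forward_convergent(a, b):
    """Evaluate b[0] + a[1]/(b[1] + a[2]/(b[2] + ...)) by forward recurrence.

    A_k = b[k]*A_{k-1} + a[k]*A_{k-2}, and likewise for B_k; a[0] is unused.
    """
    A_prev, A = 1.0, float(b[0])
    B_prev, B = 0.0, 1.0
    for k in range(1, len(b)):
        A_prev, A = A, b[k] * A + a[k] * A_prev
        B_prev, B = B, b[k] * B + a[k] * B_prev
    return A / B


def backward_convergent(a, b):
    """Evaluate the same convergent from the innermost term outward."""
    value = float(b[-1])
    for k in range(len(b) - 1, 0, -1):
        value = b[k - 1] + a[k] / value
    return value


# Continued fraction with all partial numerators and denominators equal to 1.
a = [0, 1, 1, 1, 1]
b = [1, 1, 1, 1, 1]
print(forward_convergent(a, b), backward_convergent(a, b))
```

Both routines compute the same mathematical value, but they perform different sequences of rounded operations, which is why their error behaviour has to be analysed separately.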
TL;DR: In this paper, a data processing device for processing a data sequence obtained by sampling an information signal is arranged to compute a plurality of data before and/or after an incorrect data among the data sequence, to have data obtained by rounding up or rounding off the results of computation at about the same rate, and to replace the incorrect data with the computed data.
Abstract: A data processing device for processing a data sequence obtained by sampling an information signal is arranged to compute a plurality of data before and/or after an incorrect data among the data sequence, to have data obtained by rounding up or rounding off the results of computation at about the same rate, and to replace the incorrect data with the computed data.
TL;DR: This paper demonstrates how one can simulate a hyperbolic rational number system in any high level language that supports floating point computation and infers that hyperbolic rational number systems form viable alternatives to traditional binary floating point number systems.
Abstract: One can naively view a computer number system as a pair (F, P) consisting of a finite set F of real numbers and a rounding rule P. One such number system is a hyperbolic rational number system which has as F a finite set of rational numbers and as P the so-called mediant rounding rule. In this paper we demonstrate how one can simulate a hyperbolic rational number system in any high level language that supports floating point computation. From this simulation we infer that hyperbolic rational number systems form viable alternatives to traditional binary floating point number systems. Many properties of hyperbolic rational number systems are derived from the relationship of their rounding rule to the well-developed theory of best rational approximation.
TL;DR: In this paper, the mantissa of two operands to be added or subtracted are set in operand registers 1 and 2, and the low-order bits of the least significant digit of the operand register 2 are shifted and inputted to a protection digit register 5 from the side of a G bit.
Abstract: PURPOSE: To decrease arithmetic steps of firmware and to shorten an execution time by making a protection bit correction through hardware. CONSTITUTION: Mantissas of two operands to be added or subtracted are set in operand registers 1 and 2. Exponents of the two operands are compared for digit matching and when it is decided that, for example, the mantissa in the operand register 2 is shifted to right, a selector 4 is so controlled to select the low-order digit input. The low-order bits of the mantissa shifted and outputted from the least significant digit of the operand register 2 are shifted and inputted to a protection digit register 5 from the side of a G bit, so a rounding carry logic circuit 6 makes correction of (00)+(G,R) in addition and (00)-(G,R) in subtraction as to G, R, and S bits in the protection digit register 5 to generate a rounding carry on the basis of the results, thereby applying the carry to an ALU.
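The role of the G, R and S bits during the alignment shift can be illustrated in software. The sketch below shows only the generic collection of guard, round and sticky bits and a round-to-nearest-even carry; it does not model the patent's specific (00)+(G,R) correction, and the function names are hypothetical.

```python
def shift_right_with_grs(mantissa: int, shift: int):
    """Shift mantissa right by `shift` bits and return (kept, G, R, S).

    G is the first bit shifted out, R the second, and S (sticky) is the
    logical OR of every bit shifted out after those two.
    """
    extended = mantissa << 2                 # make room for G and R
    kept = extended >> shift
    lost = extended & ((1 << shift) - 1)     # bits that fall below R
    guard = (kept >> 1) & 1
    rnd = kept & 1
    sticky = 1 if lost else 0
    return kept >> 2, guard, rnd, sticky


def rounding_carry_nearest_even(kept: int, guard: int, rnd: int, sticky: int) -> int:
    """Return 1 if round-to-nearest-even requires adding one to `kept`."""
    above_half = guard and (rnd or sticky)
    exact_tie_to_odd = guard and not (rnd or sticky) and (kept & 1)
    return 1 if (above_half or exact_tie_to_odd) else 0


kept, g, r, s = shift_right_with_grs(0b101101, 3)
print(bin(kept), g, r, s, rounding_carry_nearest_even(kept, g, r, s))
```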
TL;DR: Necessary and sufficient conditions for the optimality of filter realizations expressed in the factored state variable form are derived, as a simple extension of important earlier work with the usual state variable form.
Abstract: Studying the effects of roundoff errors in digital filters requires specialized study of each implementation of each of various filter structures. Even the study of these special structures, however, is fraught with difficulties; different implementations of the same structure can have different roundoff behavior, because rounding is done at different points in the structure. The limitations of practical VLSI architectures suggest two models of computation that accurately reflect the vast majority of filter implementations. Such implementations can be accurately described in a factored state variable form that represents the actual computations in the implementations. The quantization noise behavior of different filter structures can be studied under this unified framework. Necessary and sufficient conditions for the optimality of filter realizations expressed in the factored state variable form are derived, as a simple extension of important earlier work with the usual state variable form.
TL;DR: The object of this paper is to specify the new arithmetic in ADA in a convenient operator form for all usual numerical data types so that no computer representable element lies between the actual and the computer generated result of an operation.
Abstract: Usually, higher programming languages provide a floating point arithmetic without specifying the accuracy of the operations. In contrast, ADA defines the operations by means of model numbers and rounding (see [2], [11]). Nevertheless, this definition is not strong enough to satisfy the modern requirement of maximum accuracy in all spaces of scientific computation (see [7]-[9]). By this we mean that no computer representable element lies between the actual and the computer generated result of an operation. The object of this paper is to specify the new arithmetic in ADA for all usual numerical data types. The new arithmetic is made available in a convenient operator form.
TL;DR: A statistical technique for the coefficient word-length determination in microcomputer based digital compensators is proposed, which uses as design specification the variation on the system output signal when implemented with finite precision arithmetic.
Abstract: A statistical technique for the coefficient word-length determination in microcomputer based digital compensators is proposed. This technique uses as design specification the variation on the system output signal when implemented with finite precision arithmetic. Two's complement fixed-point arithmetic with rounding is used. The computer-aided, microcomputer based compensator design and implementation are also presented.
TL;DR: The paper establishes explicit analytical representations of the errors and residuals of the solutions of linear algebraic systems as functions of the data errors and of the rounding errors of a high-accuracy floating-point arithmetic.
Abstract: The paper establishes explicit analytical representations of the errors and residuals of the solutions of linear algebraic systems as functions of the data errors and of the rounding errors of a high-accuracy floating-point arithmetic. On this basis, strict, componentwise, and in first order optimal error and residual estimates are obtained. The stability properties of the elimination methods of Doolittle, Crout, and Gauss are compared with each other. The results are applied to three numerical examples arising in difference approximations, boundary and finite element approximations of elliptic boundary value problems. In these examples, only a modest increase of the accuracy of the solutions is achieved by high-accuracy arithmetic.
TL;DR: In this paper, the authors propose a method to speed up rounding processing by detecting, before the mantissa is incremented, whether incrementing the mantissa data will output a carry.
Abstract: PURPOSE: To speed up rounding processing by detecting whether mantissa data is increased to output a carry before a mantissa is increased. CONSTITUTION: Mantissa part data, exponent part data, and a sign before rounding processing are stored in registers 101 and 115 and a sign bit 104 in a period phi2 of a period T1. A specific rounding mode is stored in a rounding mode register 103 until the period phi2 of the period T1. A round-up/round-off decision circuit 105 and a carry look-ahead circuit 107 operate in a period phi1 of the period T1. Further, the mantissa part data in the register 101 is passed through a mantissa data bus 108 and latched by a mantissa part adder 109 in the period phi1 of the period T1, and the exponent part data in the register 115 is latched by an exponent part adder 117 through an exponent data bus 116. The mantissa part adder 109 and exponent part adder 117 operate in the period phi2 of the period T2. At this time, the mantissa part adder 109 uses the decision signal 106 of the round-up/round-off decision circuit 105 as the least significant digit bit.
TL;DR: The optimal design procedure relies upon generalized Fourier transforms, and it is shown that the encoder part of the optimum pair of rules can be taken as a linear function when the input space of symbols is viewed in a natural algebraic setting.
Abstract: The k input and output digits of a rate (k/n) linear convolutional code over a finite field GF (q) are related to a finite set of integers by a q -ary expansion. The mean-square error criterion is used to simultaneously select the optimum encoder and decoder rules. This optimization is performed over all one-to-one generalized encoding rules and all decoding functions that map into the real numbers. The optimal design procedure relies upon generalized Fourier transforms, and it is shown that the encoder part of the optimum pair of rules can be taken as a linear function when the input space of symbols is viewed in a natural algebraic setting. The decoder part is a conditional mean estimator coupled with a rounding operation. One method of implementing the decoder uses the nonlinear combination of filter functions defined in the generalized frequency domain.
TL;DR: In this paper, limit cycles due to product rounding in a fixed-point implementation of a first-order two-dimensional digital filter are considered, and a number of lemmas and a theorem are presented, proving that only row or column limit cycles can exist where two product rounding quantisers are employed.
Abstract: Limit cycles due to product rounding in a fixed-point implementation of a first-order two-dimensional digital filter are considered. A number of lemmas and a theorem are presented, proving that only row or column limit cycles can exist where two product rounding quantisers are employed.
TL;DR: In this paper, the authors present a method to attain the multiplication of a mantissa part with the same multiplier as the multiplication of a decimal floating point by dividing the mantissa part of the floating point into two parts on bits and executing in time division the addition and rounding processes of a partial product.
Abstract: PURPOSE: To attain the multiplication of a mantissa part with the same multiplier as the multiplication of a decimal floating point, by dividing a mantissa part of the floating point into two parts on bits, and executing in time division the addition and rounding processes of a partial product. CONSTITUTION: The partial product data supplied to input latches 11-14 are sent to adders 11 and 12 via a bus 20 for addition. The input data of specific bits only are supplied to adders 11 and 12, and a sum of partial products is obtained and stored to a feedback latch 14. Then this latch data is fed back again to the input terminal at one side of the adder, while another partial product or "0" is supplied to the other input terminal of the same adder. Such a process of addition is repeated by shifting the addition in timing until a total sum is obtained. The carry of the adder 12 is applied to the adder 11 by a controller 16 via a carry selection circuit 13. When the total sum is obtained, the data on a rounding subject bit corresponding to the total sum is supplied to a rounding circuit 15 from the latch 14. Thus the rounding data is obtained and supplied to the adder 11 to perform the rounding addition.
TL;DR: A floating point arithmetic system with rounding anticipation is described, including an arithmetic unit (22) for arithmetically combining two mantissas and a carry circuit (24) for determining whether the sum will overflow upon the addition of two mantissas and whether the difference will have a leading zero upon the subtraction of two mantissas; the subtrahend in subtraction and the augend in addition include guard, round, and sticky digits.
Abstract: A floating point arithmetic system (10) with rounding anticipation including an arithmetic unit (22) for arithmetically combining two mantissas; a carry circuit (24) for determining whether the sum will overflow upon the addition of two mantissas and whether the difference will have a leading zero upon the subtraction of two mantissas; the subtrahend in subtraction and the augend in addition include guard, round, and sticky digits; a rounding circuit (24) is responsive to the carry circuit for rounding the least significant digit of the sum when the sum will overflow and for designating for rounding the guard digit of the sum when the sum will not overflow, for designating for rounding the round digit of the difference when the difference will have a leading zero, and for designating for rounding the guard digit of the difference when the difference will not have a leading zero; and means (24) for introducing to the arithmetic unit at the designated digit during the arithmetic combining of the two mantissas an amount equal to one-half the radix to effect the rounding during the arithmetic operation.
TL;DR: The chopping arithmetic is shown to match with the simplest software, which calculates round-off errors, and the effective length of digits of the accumulator is detected theoretically to this software.
Abstract: Models of algorithms of floating-point addition are designed for chopping, correctly rounding and augmentation rounding arithmetics with finite-length accumulators. The chopping arithmetic is shown to match with the simplest software, which calculates round-off errors. The effective length of digits of the accumulator is detected theoretically to this software.
TL;DR: The first method transforms products to sums and applies one of the known methods for rounding exact summation in time complexity O( n 2 ) with n processors ( n denoting the “length” of the expression).
Abstract: We propose two parallel algorithms for the rounding exact evaluation of sums of products. The first method transforms products to sums and applies one of the known methods for rounding exact summation in time complexity O( n 2 ) with n processors ( n denoting the “length” of the expression). The second method approximates the products as well as the sum and has average time complexity O( ld ( n )) for n /2 processors and has average time complexity O( n ) viewed as a sequential algorithm.
TL;DR: In this article, a head zero bit counter 45 counts the number of zeros continuous from the highest digit of an input mantissa and delivers the count result in parallel to the actuation of a rounding circuit 44.
Abstract: PURPOSE: To obtain a floating point adder circuit which works at a high speed to attain the denormalization, by realizing the parallel actuations of circuit component elements and therefore decreasing the number of signal transmission lines between an input and an output. CONSTITUTION: A head zero bit counter 45 counts the number of zeros continuous from the highest digit of an input mantissa and delivers the count result in parallel to the actuation of a rounding circuit 44. The exponent part replacement circuits 46 and 47 perform subtractions from an input exponent by means of the number of head zero bits, i.e., the output of the counter 45 and the value less than said zero bit number by 1, and deliver two types of replacement exponent value. In this subtraction mode, the correction of the exponent part due to an overflow phenomenon of a mantissa part produced at a preceding stage is also carried out at a time. A selector 48 usually selects the output of the counter 45 and then selects an input exponent 42 owing to the denormalization in case an underflow phenomenon occurs with one of both subtractors 46 and 47 that delivers smaller replacement exponent value. Then a selector 49 selects value 0 as the final output exponent.
TL;DR: Mr Miller is right, of course, to emphasise the need for improving the service provided for many of the authors' patients (not only those with ingrowing toenails) and to suggest that its best siting is in general practice.
Abstract: of lesions amenable to liquid nitrogen cryosurgical treatment in general practice in 1973.4 Previously I had treated many of my long suffering patients with ingrowing toenails with various traditional methods, with the poor results and high recurrence rate described by Mr Miller. The difference in the results obtained when I started to use the liquid nitrogen cryospray was remarkable. Thirty eight patients with 41 ingrowing toenails have been treated since then. At first I simply froze the infected area around the ingrowing toenail, allowing the typical cryolesion to develop and slough off. The recurrence rate was about one in three, not much better than I expected from the traditional methods of treatment. Seven years ago I started to treat the recurrences more ruthlessly after freezing the affected tissues, at first by curetting and later by excising the thawing, infected paronychial fold together with a narrow strip of the ingrowing nail, as well as part of the infected pulp; the results improved considerably. This has been the treatment of choice for the last 12 patients who have been followed up for at least 18 months, with no recurrences to date. Twenty nine patients have been followed up for over 18 months; five were lost to follow up and four were treated less than a year ago so their results cannot be assessed yet. This follow up and refinement of the technique used are in keeping with Mr Miller's expressed requirements. Mr Miller is right, of course, to emphasise the need for improving the service provided for many of our patients (not only those with ingrowing toenails) and to suggest that its best siting is in general practice. This would not only benefit the patient and the doctor but also ease the pressure on the hospital service and allow economic savings. Many patients whose conditions are amenable are already treated in general practice. 
Undoubtedly, many more patients would be treated by their own doctors if only the present disincentives to doing more for our patients could be removed.
TL;DR: This chapter presents a parallel sorting algorithm for the shared memory SIMD computer that uses n^(1−e) processors, where 0 < e < 1, to sort a sequence of n integers in O(n^e log n) time, for a cost of O(n log n) that is optimal.
Abstract: This chapter describes the shared-memory SIMD computers. SIMD computers are divided into two broad categories according to the way used by the processors to communicate and exchange data. In one category, the processors communicate through an interconnection network such as the linear array, the perfect shuffle, the mesh, the tree, and the cube. The other category comprises those computers in which the processors communicate through a shared memory. The chapter presents a parallel sorting algorithm for the shared memory SIMD computer. It uses n^(1−e) processors, where 0 < e < 1, to sort a sequence of n integers in O(n^e log n) time, for a cost of O(n log n) that is optimal. The parameter e is quite important as it depends on the number of available processors on a given parallel computer. The rounding should be done pessimistically. The real n^(1−e) representing the number of processors used by an algorithm should be rounded down to ensure that the resulting integer does not exceed the actual number of available processors. By contrast, the real n^e representing the worst-case running time of an algorithm should be rounded up to ensure that the resulting integer is not smaller than the true worst-case running time.
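The pessimistic rounding rule amounts to a floor for the processor count and a ceiling for the running-time factor. A minimal sketch follows, with the function name `pessimistic_bounds` and the sample values n = 1000 and e = 0.5 chosen purely for illustration.

```python
import math


def pessimistic_bounds(n: int, e: float):
    """Return (processors, time_factor) rounded in the safe direction.

    processors  = floor(n**(1 - e)), never more than are really available
    time_factor = ceil(n**e), never less than the true worst-case factor
    Note: the powers are computed in floating point and are themselves
    subject to rounding error.
    """
    processors = math.floor(n ** (1 - e))
    time_factor = math.ceil(n ** e)
    return processors, time_factor


print(pessimistic_bounds(1000, 0.5))
```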