TL;DR: In this article, the improvement achieved by using integer programming over simple coefficient rounding in the design of finite impulse response (FIR) filters with discrete coefficients is most significant when the discrete coefficient space is the powers-of-two space or when a specification is to be met with a given coefficient word length by increasing the filter length.
Abstract: It is demonstrated that the improvement achieved by using integer programming over simple coefficient rounding in the design of finite impulse response (FIR) filters with discrete coefficients is most significant when the discrete coefficient space is the powers-of-two space or when a specification is to be met with a given coefficient word length by increasing the filter length. Both minimax and least square error criteria are considered.
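As a point of reference for the baseline named in this abstract, the following minimal Python sketch shows what "simple coefficient rounding" might look like for the two coefficient spaces mentioned: a uniform grid set by a word length, and the powers-of-two space. The function names, the word-length convention (one sign bit, remaining bits fractional), and the sample coefficients are illustrative assumptions, not taken from the paper. The integer-programming design itself is not shown.

```python
import math

def round_to_wordlength(c, bits):
    """Round c to the nearest multiple of 2**-(bits-1).

    Assumed convention: 1 sign bit, the remaining bits fractional.
    """
    step = 2.0 ** -(bits - 1)
    return round(c / step) * step

def round_to_power_of_two(c):
    """Round c to the nearest signed power of two (linear distance); 0 stays 0."""
    if c == 0:
        return 0.0
    mag = abs(c)
    exponent = math.floor(math.log2(mag))
    lower, upper = 2.0 ** exponent, 2.0 ** (exponent + 1)
    nearest = lower if (mag - lower) <= (upper - mag) else upper
    return math.copysign(nearest, c)

# Hypothetical coefficients, chosen only to show the two grids side by side.
coeffs = [0.7, -0.3, 0.05]
uniform = [round_to_wordlength(c, 4) for c in coeffs]
pow2 = [round_to_power_of_two(c) for c in coeffs]
```

The powers-of-two grid is much coarser near the large coefficients, which is consistent with the abstract's observation that plain rounding leaves the most room for improvement in that space.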
TL;DR: In this article, a floating point, integrated, arithmetic circuit is organized around a file format having a floating-point numeric domain exceeding that of any single or double precision floating point numbers, long or short integer words of BCD data upon which it must operate.
Abstract: A floating point, integrated, arithmetic circuit is organized around a file format having a floating point numeric domain exceeding that of any single or double precision floating point numbers, long or short integer words of BCD data upon which it must operate. As a result the circuit has a greater reliability, range and precision than ever previously achieved without entailing additional circuit complexity. Reliability is further enhanced by a systematic three bit rounding field, and by including means for detecting every error or exception condition with an optional expected response provided thereto by hardware. As a result of such organization, an unexpected increase of capacity is achieved wherein transcendental functions can be computed totally in hardware, and whereby mixed mode arithmetic can be implemented without difficulty. The numeric processor also includes a programmable shifter capable of arbitrary numbers of bit and byte shifts in a single clock cycle, as well as an arithmetic unit capable of implementing multiplication, division, modulo reduction and square roots directly in hardware.
TL;DR: In this paper, the deformation of the contact area between the grinding and regulating wheels is identified as an important factor in the rounding process and in the stability of the system; the diagram not only defines chatter-free conditions analytically but also shows the rounding effect for operational settings, and it resolves the difference between analysis and experiment.
TL;DR: An attempt is made to analyze the fixed point error performance of the normalized ladder algorithm for autoregressive system identification, assuming rounding arithmetic, and a simplified theoretical expression is given for predicting the average bias in the estimated reflection coefficients at any stage.
Abstract: In this paper, an attempt is made to analyze the fixed point error performance of the normalized ladder algorithm for autoregressive system identification, assuming rounding arithmetic. The paper contains two main results: (i) a simplified theoretical expression for predicting the average bias in the estimated reflection coefficients at any stage, and (ii) a recursive relation for the average error, arising from finite precision arithmetic, in the squared residuals. The second result illustrates how the errors made in one stage affect the errors in the succeeding stages. Simulations are performed to check the theoretical predictions.
TL;DR: The digital computer is lifted from a setting corresponding to the real numbers to a setting corresponding to function spaces, and the algorithms of ultra-arithmetic are given in an explicitly implementable form for the cases both of the Fourier basis and the Chebyshev basis.
TL;DR: In this paper, an algorithm for approximating dominated solutions of linear recursions with initial conditions is given, and the stability of this algorithm is investigated and expressions for the truncation and rounding errors are derived.
Abstract: An algorithm is given for approximating dominated solutions of linear recursions, when some initial conditions are given. The stability of this algorithm is investigated and expressions for the truncation and rounding errors are derived. A number of practical questions concerning the algorithm are considered, and several numerical examples support the theory.
TL;DR: It is shown that the integer round-up and round-down properties can be checked through a finite process, and the method of proof motivates a new and elementary proof of Fulkerson's Pluperfect Graph Theorem.
Abstract: Let A be a nonnegative integral matrix with no zero columns. The integer round-up property holds for A if for each nonnegative integral vector w, the solution value to the integer programming problem min{1·y : yA ≥ w, y ≥ 0, y integer} is obtained by rounding up to the nearest integer the solution value to the corresponding linear programming problem min{1·y : yA ≥ w, y ≥ 0}. The integer round-down property is similarly defined for a nonnegative integral matrix B with no zero rows by considering max{1·y : yB ≤ w, y ≥ 0, y integer} and its linear programming correspondent. It is shown that the integer round-up and round-down properties can be checked through a finite process. The method of proof motivates a new and elementary proof of Fulkerson's Pluperfect Graph Theorem.
TL;DR: A technique for the estimation of correlation coefficients between truncation errors in the flow graphs of Winograd short-length DFT algorithms is presented and results obtained are in close agreement with the corresponding simulation results compared to those predicted by Patterson and McClellan and hence justify the validity of assumptions made in the analysis.
Abstract: A technique for the estimation of correlation coefficients between truncation errors in the flow graphs of Winograd short-length DFT algorithms is presented. The fixed-point error analysis of the basic modules (corresponding to the Winograd short-length algorithms) is carried out in sign-magnitude (or 1's complement) arithmetic by assuming correlation between truncation errors. The errors introduced by coefficient quantization and rounding after multiplication are also studied. The results obtained are in closer agreement with the corresponding simulation results than those predicted by Patterson and McClellan, and hence justify the validity of the assumptions made in the analysis.
TL;DR: In this article, the error spectrum caused by rounding off the coefficients is shaped through discrete optimization so as to be effectively cancelled, in the L2 norm sense, by other factors connected in cascade.
Abstract: This paper suggests a discrete optimization method which can solve high order FIR filter problems within a practically reasonable computing time. The error spectrum caused by rounding off the coefficients is shaped through discrete optimization so as to be effectively cancelled, in the L2 norm sense, by other factors connected in cascade. In order to save computing time, the error spectrum is evaluated in the time domain, and the parameters are divided into small groups during the search for the optimum solution. LPF and BPF design examples of length 200 show that the proposed approach can reduce coefficient wordlengths by 2 or 3 bits compared with results obtained by rounding alone. The execution time on the general purpose computer, ACOS System 900, is 97 seconds.
TL;DR: A vernier address scale reduces the number of addressable memory locations required for numerical look-up tables by dropping least significant bits as the vernier scale moves from one ROM table to another.
Abstract: A vernier address scale reduces the number of addressable memory locations required for numerical look-up tables. Read-only memories (ROMs) store the data of linear or non-linear functions. Decoders determine which ROM is selected and advantage is taken of accuracy improvement as numbers become large by dropping least significant bits as the vernier address scale moves from one ROM table to another. Accuracy is further improved by using a method of one-half level quantization step for rounding. This reduces the size of numerical tables for math processing of reciprocals, roots of numbers, powers of numbers, logarithms, trigonometric and exponential functions.
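The abstract does not spell out the "one-half level quantization step" method. One common reading is that half of the discarded step is added before the least significant bits are dropped, so the result is rounded rather than truncated. The Python sketch below illustrates that reading only; the function names and the restriction to non-negative integers are assumptions, not details from the patent.

```python
def drop_bits_truncate(value, k):
    """Drop the k least significant bits of a non-negative integer (truncation)."""
    return value >> k

def drop_bits_round(value, k):
    """Drop k least significant bits after adding half of the discarded step.

    Adding 2**(k-1) first turns truncation into round-half-up.
    """
    if k == 0:
        return value
    half_step = 1 << (k - 1)
    return (value + half_step) >> k

# 14 / 4 = 3.5: truncation gives 3, half-step rounding gives 4.
truncated = drop_bits_truncate(14, 2)
rounded = drop_bits_round(14, 2)
```

With truncation the error is always in one direction and can approach a full step; with the half-step offset it is bounded by half a step in either direction.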
TL;DR: In this article, the error estimates and stability theorems for Cramer's rule and Gaussian elimination for solving two linear equations in two unknowns under data perturbations and rounding errors of floating-point arithmetic are established.
Abstract: New condition numbers and stability constants for the numerical behaviour of Cramer's rule and Gaussian elimination for solving two linear equations in two unknowns under data perturbations and rounding errors of floating-point arithmetic are established. By these means fundamental error estimates and stability theorems are proved. The error estimates are illustrated by a series of numerical examples.
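For readers who want the two methods side by side, here is a minimal Python sketch of Cramer's rule and Gaussian elimination for a 2×2 system. The function names, argument order, and the use of partial pivoting are choices made for this illustration; the paper's condition numbers and stability constants are not reproduced.

```python
def solve_cramer(a11, a12, a21, a22, b1, b2):
    """Solve [[a11, a12], [a21, a22]] x = [b1, b2] by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ZeroDivisionError("matrix is singular")
    x1 = (b1 * a22 - a12 * b2) / det
    x2 = (a11 * b2 - a21 * b1) / det
    return x1, x2

def solve_gauss(a11, a12, a21, a22, b1, b2):
    """Solve the same system by Gaussian elimination with partial pivoting."""
    # Swap rows so the larger first-column entry becomes the pivot.
    if abs(a21) > abs(a11):
        a11, a12, b1, a21, a22, b2 = a21, a22, b2, a11, a12, b1
    if a11 == 0:
        raise ZeroDivisionError("matrix is singular")
    m = a21 / a11                 # elimination multiplier
    a22_reduced = a22 - m * a12   # eliminate x1 from the second equation
    b2_reduced = b2 - m * b1
    if a22_reduced == 0:
        raise ZeroDivisionError("matrix is singular")
    x2 = b2_reduced / a22_reduced
    x1 = (b1 - a12 * x2) / a11    # back substitution
    return x1, x2

# 2*x1 + x2 = 3 and x1 + 3*x2 = 5 have the solution x1 = 0.8, x2 = 1.4.
cramer = solve_cramer(2.0, 1.0, 1.0, 3.0, 3.0, 5.0)
gauss = solve_gauss(2.0, 1.0, 1.0, 3.0, 3.0, 5.0)
```

On a well-conditioned system like this one the two results agree to within a few units in the last place; differences between the methods only become visible for nearly singular data.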
TL;DR: In this article, a comparison is made between the usual one-step methods for time dependent problems, which work with the total value of the nodal unknowns vector, and a variant that calculates only the increment of the vector over the time interval Δt.
Abstract: From the point of view of the generation of rounding errors, a comparison is made between: (1) the usual algorithm for one-step methods for time dependent problems, working with the total value of the nodal unknowns vector; and (2) a variant of this, which consists in calculating only the increment of the vector for the time interval Δt. The superiority of the second algorithm is concluded on the basis of both theoretical and empirical arguments.
TL;DR: Close agreement between theoretically predicted results and those of simulations justifies the validity of assumptions made in the analysis.
Abstract: point arithmetic with the accuracy provided by the machine. A built-in program was used to generate white complex noise (with its real and imaginary parts being distributed uniformly between +l/fi) for feeding the two programs. A stable estimate of the average NSR of each module was obtained for two different word lengths using the results of the two programs. V. RESULTS AND DISCUSSIONS. For the purpose of calculating the NSR, t1 and t2 are assumed to be the same. It is evident from Table III that the correlated model predicts the NSR much closer to the experimental results (for both 8 and 12 bits) than the uncorrelated model does. It is observed that the predicted output NSR falls short by at most 10.888 percent, while the values predicted by the uncorrelated model fall short by as much as 41.98 percent from the corresponding simulation results. While analyzing the variances due to coefficient quantization and rounding due to multiplication, the earlier authors [6] are not consistent in assuming multiplying coefficients like ±1 and ±j to be noiseless. However, the present analysis considers these coefficients as noiseless for all the modules studied. The values of b and c thus obtained here are slightly different from those of Patterson and McClellan [6]. Two truncation errors connected through more than one path, though correlated, have very small correlation coefficients. Since the analytical computation of such a correlation coefficient is more involved, it is neglected. The slight difference between simulation results and those of some correlated models could be due to this assumption. Overscaling a module has been adopted to facilitate the scaling procedure, which, in turn, has introduced extra noise variance. Long-length GW DFT algorithms are derived using these short-length DFT algorithms. Patterson and McClellan [6] have shown that the order in which the component algorithms are used affects the error performance. For minimum error performance, they should be used in such a way that the numbers (o;Ni)/(Ni - 1) are in increasing order. With respect to the output noise variance per unit, the modules are arranged in descending order as 7, 5, 9, 16, 3, 8, 4, and 2, whereas the order according to the uncorrelated model was 5, 7, 9, 3, 16, 8, 4, and 2. Close agreement between theoretically predicted results and those of simulations justifies the validity of assumptions made in the analysis. The error performance of large-N GW algorithms under the assumption of correlation between truncation errors has been studied elsewhere [8].
TL;DR: The method described permits any set of numbers within an allowable range to be rounded to any number of significant figures, especially useful as a subroutine in programs that generate large quantities of figures that require rounding.
Abstract: The method described permits any set of numbers within an allowable range to be rounded to any number of significant figures. This is accomplished by generation of an array (rounding position) data bank, which is unique for each combination of the following variables: (1) Chosen decimal position; (2) number range of the number to be rounded; and (3) number of significant figures to which this number is to be rounded. It is especially useful as a subroutine in programs that generate large quantities of figures (such as reserve or resource tonnages) that require rounding.
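The abstract's method relies on a precomputed "rounding position" data bank, which is not described in enough detail here to reproduce. For comparison, the sketch below rounds to a given number of significant figures with a different and simpler technique: it derives the decimal position from the base-10 logarithm and calls Python's built-in `round`. The function name `round_sig` is hypothetical.

```python
import math

def round_sig(x, figures):
    """Round x to `figures` significant figures (logarithm-based, not the data-bank method)."""
    if figures < 1:
        raise ValueError("figures must be at least 1")
    if x == 0:
        return 0.0
    # Position of the leading digit: 5 for 123456, -3 for 0.004567.
    leading = math.floor(math.log10(abs(x)))
    return round(float(x), figures - 1 - leading)

tonnage = round_sig(123456.0, 2)   # a large figure, such as a reserve tonnage
small = round_sig(0.004567, 2)
```

A negative second argument to `round` rounds to the left of the decimal point, which is what handles large values.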
TL;DR: In this paper, an iterative method is described for improving the accuracy of the solution of linear operator equations when there are errors in computing the operator and errors in reading the right-hand side (with cut-off or rounding) in computing systems with contracted place mesh.
Abstract: An iterative method is described for improving the accuracy of the solution of linear operator equations when there are errors in computing the operator and errors in reading the right-hand side (with cut-off or rounding) in computing systems with contracted place mesh.
TL;DR: In this paper, a data processing system uses improved procedures for handling various arithmetic operations, such as floating point arithmetic mantissa calculations, where a look-ahead carry bit generator stage (13) is used for such purpose to reduce the overall mantissa calculation time.
Abstract: A data processing system uses improved procedures for handling various arithmetic operations. Thus, in floating point arithmetic mantissa calculations the system uses a novel technique for inserting a round bit ROUND into the appropriate bit (bit 23) of the floating point result wherein a look-ahead carry bit generator stage (13) is used for such purpose to reduce the overall mantissa calculation time. Further, the system utilizes logic which operates in parallel with the floating point exponent calculation logic for effectively predicting whether or not an overflow or underflow condition will be present in the final exponent result and for informing the system which such conditions have occurred. Moreover, the system utilizes a simplified technique for computing the extension bits which are required in multiply and divide computations, wherein a programmable array logic unit and a four-bit adder unit are combined for such purposes.
TL;DR: A numerically controlled machine tool is described that produces tangential entry into, and tangential leaving from, a compensated contour, and that also produces the interior or exterior rounding of intersections between two contours.
Abstract: A numerically controlled machine tool for producing tangential entry into a compensated contour and for the tangential leaving from a compensated contour. The numerically controlled machine tool also produces the interior or exterior rounding of intersections between two contours.
TL;DR: A carry-lookahead adder for fast addition of two operands in generalized binary numbers is developed and truncation errors for this type of representation are examined and rounding algorithms are presented to reduce these errors.
Abstract: In this correspondence we derive algorithms for multioperand addition of Koren's generalized number system. A carry-lookahead adder for fast addition of two operands in generalized binary numbers is developed. Truncation errors for this type of representation are examined and rounding algorithms are presented to reduce these errors.
TL;DR: The proposed design technique provides a simple calculation that ensures that the data sampling rate is consistent with the control system's accuracy specification or the fatigue life of its actuators, and enables the selection of a suitable machine wordlength or machine.
Abstract: As a result of the enormous impact of microprocessors, electronic engineers, with sometimes only a cursory background in control theory, are being involved in direct digital-control (d.d.c.) system design. There appears to be a real need for an easily understood and simply implemented comprehensive design technique for single-input d.d.c. systems. The proposed design technique provides, first of all, a simple calculation that ensures that the data sampling rate is consistent with the control system's accuracy specification or the fatigue life of its actuators. Pulsed transfer-function design for a plant controller is based on two simple rules and a few standard frequency response curves, which are easily computed once and for all time. Structural resonances are eliminated by digital notch filters, the pole-zero locations of which are directly related to the frequency and bandwidth of an oscillatory mode; this is exactly as with analogue networks. In addition, a computationally simple formula gives an upper bound on the amplitude of the control error (deviation) component due to multiplicative rounding effects in the digital computer; this thereby enables the selection of a suitable machine wordlength or machine. A distinct advantage of the proposed design technique is that its implementation does not necessarily involve a complex computer-aided-design facility.
TL;DR: It is shown that the Jordan method gives solutions comparable in accuracy with solutions by Gauss's method.
Abstract: A new method of analyzing rounding errors for Jordan-type methods of solving linear algebraic systems is presented. It is shown that the Jordan method gives solutions comparable in accuracy with solutions by Gauss's method.
TL;DR: Three parallel versions of this algorithm for the rounding exact summation of floating point numbers are proposed, namely a pipeline version, an algorithm similar to the exchange methods for sorting and a tree-like algorithm, associating a tree to the sum.
Abstract: Pichat and Bohlender studied an algorithm for the rounding exact summation of floating point numbers which can be executed on any floating point arithmetic unit. We propose parallel versions of this algorithm, namely a pipeline version, an algorithm similar to the exchange methods for sorting, and a tree-like algorithm, associating a tree to the sum. For all these algorithms we discuss the properties a multiprocessor architecture should have for an efficient implementation, without restricting ourselves to a special architecture.
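The abstract does not give the algorithm itself. As a rough illustration of the ingredients involved, the Python sketch below combines the standard error-free `two_sum` transformation with a tree-shaped reduction in which the rounding error of every addition is kept and added back at the end. It is not the Pichat and Bohlender algorithm and does not guarantee a correctly rounded result; the function names are hypothetical.

```python
def two_sum(a, b):
    """Return (s, e) with s = fl(a + b) and a + b = s + e exactly (barring overflow)."""
    s = a + b
    b_virtual = s - a
    a_virtual = s - b_virtual
    error = (a - a_virtual) + (b - b_virtual)
    return s, error

def tree_sum_with_errors(values):
    """Pairwise (tree-like) summation that collects the rounding error of each addition."""
    level = list(values)
    errors = []
    while len(level) > 1:
        next_level = []
        # The pairs within one level are independent, so they could run in parallel.
        for i in range(0, len(level) - 1, 2):
            s, e = two_sum(level[i], level[i + 1])
            next_level.append(s)
            errors.append(e)
        if len(level) % 2 == 1:
            next_level.append(level[-1])  # an unpaired element moves up unchanged
        level = next_level
    total = level[0] if level else 0.0
    # The collected errors are added back with ordinary floating point addition.
    return total + sum(errors)

data = [1e16, 1.0, -1e16]
naive = sum(data)
compensated = tree_sum_with_errors(data)
```

In this example the naive left-to-right sum loses the 1.0 entirely, while the compensated tree recovers it from the stored error term.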
TL;DR: In this article, the authors present a simple multiplicative congruential generator that produces numbers rectangularly distributed between 0 and 1, excluding the end points, with a cycle length exceeding 2.78 × 10^13 so that even using 1000 random numbers per second continuously, the sequence would not repeat for over 880 years.
Abstract: Schrage (1979) has pointed out the advantages of pseudo-random generators that can be written in a high-level language and produce the same results on any machine. The generator that he presents, however, has the disadvantages: (1) like all simple multiplicative congruential generators, it does not work well at the extremes of the distribution: for any number produced that is less than 5.9499 × 10^-5 the next number will simply be 16807 times as much, and similarly at the top end; (2) on a 16-bit machine it has to use double precision arithmetic instead of integer arithmetic, which makes it very slow, and also uncertain that rounding errors could not occur. Our algorithm does not have these difficulties. We claim that it is reasonably short, reasonably fast, machine-independent, easily programmed in any language, and statistically sound. It has a cycle length exceeding 2.78 × 10^13 so that even using 1000 random numbers per second continuously, the sequence would not repeat for over 880 years. Consequently we have tested only small parts of it, consisting of many millions of numbers nevertheless. However, there are theoretical grounds for expecting good results, and the results of the tests we have made have been so satisfactory, that we are prepared to extrapolate our experience and infer that the sequence is satisfactory throughout. The algorithm produces numbers rectangularly distributed between 0 and 1, excluding the end points.
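The abstract does not list the generator's constants. The Python sketch below assumes the three-component form usually published as the Wichmann-Hill generator (multipliers 171, 172, 170 with prime moduli 30269, 30307, 30323); the class name and default seeds are hypothetical. Each component needs only small integer arithmetic, which is the property the abstract contrasts with Schrage's generator.

```python
class WichmannHill:
    """Combined generator built from three small multiplicative congruential generators.

    Constants are assumed from the commonly published form, not from this abstract.
    """

    def __init__(self, seed1=1, seed2=2, seed3=3):
        # Each seed should lie between 1 and its modulus minus 1 (never 0).
        self.s1, self.s2, self.s3 = seed1, seed2, seed3

    def random(self):
        # Advance the three component generators; every product stays below 2**23.
        self.s1 = (171 * self.s1) % 30269
        self.s2 = (172 * self.s2) % 30307
        self.s3 = (170 * self.s3) % 30323
        # Combine the three fractions and keep only the fractional part.
        return (self.s1 / 30269.0 + self.s2 / 30307.0 + self.s3 / 30323.0) % 1.0

generator = WichmannHill()
samples = [generator.random() for _ in range(1000)]
```

The product of the three moduli is about 2.78 × 10^13, which matches the figure quoted in the abstract.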
TL;DR: This paper examined the influence of rounding errors on the two most popular types of nonparametric density estimator (i.e., kernels and generalized Fourier series) and showed that rounding introduces errors into the estimation of the parent density and its derivatives.
Abstract: All “continuous” data is rounded to some extent, and this rounding introduces errors into the estimation of the parent density and its derivatives. In this paper we examine the influence of rounding errors on the two most popular types of nonparametric density estimator—those based on kernels and those based on generalized Fourier series. Our results lead directly to several important conclusions which are very relevant to the practical problems of density estimation.