TL;DR: In this article, the cumulative effect of rounding or truncation errors on the basic arithmetic operations on digital computers is discussed, assuming that the reader has read the article on MATRIX COMPUTATIONS, since the results will be illustrated by examples from that area.
Abstract: In general, the basic arithmetic operations on digital computers are not exact but are subject to rounding or truncation errors. This article is concerned with the cumulative effect of these errors. It will be assumed that the reader has read the article on MATRIX COMPUTATIONS, since the results will be illustrated by examples from that area.
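As a minimal illustration of how individual rounding errors accumulate (a Python sketch, not taken from the article), repeatedly adding the double-precision approximation of 0.1 drifts away from the exact result. The standard-library `math.fsum` keeps track of the lost low-order bits. The helper name `naive_sum` is hypothetical.

```python
import math

def naive_sum(values):
    """Left-to-right summation; every addition rounds to the nearest double."""
    total = 0.0
    for v in values:
        total += v
    return total

values = [0.1] * 10
naive = naive_sum(values)          # accumulates one rounding error per addition
compensated = math.fsum(values)    # correctly rounded sum of the stored doubles
drift = abs(naive - compensated)   # size of the accumulated error
```

The drift here is tiny, but it grows with the number of operations, which is the effect the article studies.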
TL;DR: It is proved that if a d-regular multigraph does not contain more than ⌊d/2⌋ copies of any 2-cycle, then a decomposition into O(n^2) pairs of cycle covers can be found, which comes in handy in rounding a fractional solution of an LP relaxation of the maximum and minimum TSP problems.
Abstract: A directed multigraph is said to be d-regular if the indegree and outdegree of every vertex are exactly d. By Hall's theorem one can represent such a multigraph as a combination of at most n^2 cycle covers, each taken with an appropriate multiplicity. We prove that if the d-regular multigraph does not contain more than ⌊d/2⌋ copies of any 2-cycle then we can find a similar decomposition into O(n^2) pairs of cycle covers where each 2-cycle occurs in at most one component of each pair. Our proof is constructive and gives a polynomial algorithm to find such a decomposition. Since our applications only need one such pair of cycle covers whose weight is at least the average weight of all pairs, we also give a simpler algorithm to extract a single such pair. This combinatorial theorem then comes in handy in rounding a fractional solution of an LP relaxation of the maximum and minimum TSP problems. For maximum TSP, we obtain a tour whose weight is at least 2/3 of the weight of the longest tour, improving a previous 5/8 approximation. For minimum TSP we obtain a tour whose weight is at most 0.842 log_2 n times the optimal, improving a previous 0.999 log_2 n approximation. Utilizing a reduction from maximum TSP to the shortest superstring problem we obtain a 2.5-approximation algorithm for the latter problem which is again much simpler than the previous one. Other applications of the rounding procedure are approximation algorithms for maximum 3-cycle cover (factor 2/3, previously 3/5) and maximum asymmetric TSP with triangle inequality (factor 10/13, previously 3/4).
TL;DR: This work presents a linear programming (LP) based approach for solving the data association problem (DAP) in multiple target tracking using an iterated K-scan sliding window technique and presents a compact formulation of the DAP.
TL;DR: Application of the proposed procedure to adaptive filters realized in a Xilinx Virtex FPGA (field programmable gate array) has resulted in area reductions and power reductions and speed-up of up to 36% over common alternative design strategies.
Abstract: This paper introduces a design tool and its associated procedures for determining the sensitivity of outputs in a digital signal processing design to small errors introduced by rounding or truncation of internal variables. The proposed approach can be applied to both linear and nonlinear designs. By analyzing the resulting sensitivity values, the proposed procedure is able to determine an appropriate distinct word-length for each internal variable. Also in this paper, the power optimizing capabilities of word-length optimization are studied for the first time. Application of the proposed procedure to adaptive filters realized in a Xilinx Virtex FPGA (field programmable gate array) has resulted in area reductions of up to 80% combined with power reductions of up to 98% and speed-up of up to 36% over common alternative design strategies.
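To make the notion of sensitivity to word-length concrete, the following Python sketch rounds a set of coefficients to a chosen number of fractional bits and measures how much a simple dot-product output changes. This is only an illustration under assumed conditions, not the paper's tool or procedure. The names `quantize` and `output_error` are hypothetical.

```python
def quantize(x, frac_bits):
    """Round x to the nearest multiple of 2**-frac_bits (fixed-point rounding)."""
    scale = 1 << frac_bits
    return round(x * scale) / scale

def output_error(coeffs, samples, frac_bits):
    """Change in a dot-product output when the coefficients are stored
    with frac_bits fractional bits instead of full precision."""
    exact = sum(c * s for c, s in zip(coeffs, samples))
    rounded = sum(quantize(c, frac_bits) * s for c, s in zip(coeffs, samples))
    return abs(exact - rounded)

coeffs = [0.3, -0.45, 0.2]
samples = [1.0, 1.0, 1.0]
coarse = output_error(coeffs, samples, 4)    # short word length
fine = output_error(coeffs, samples, 12)     # longer word length
```

Each rounded coefficient is off by at most half a quantization step, so the output error for unit-magnitude samples is bounded by the number of coefficients times 2**-(frac_bits + 1).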
TL;DR: A unified presentation of a variety of results on the lifting of valid inequalities, as well as a standard procedure combining mixed integer rounding with lifting for the development of strong valid inequalities for knapsack and single node flow sets are presented.
Abstract: In this survey we attempt to give a unified presentation of a variety of results on the lifting of valid inequalities, as well as a standard procedure combining mixed integer rounding with lifting for the development of strong valid inequalities for knapsack and single node flow sets. Our hope is that the latter can be used in practice to generate cutting planes for mixed integer programs. The survey contains essentially two parts. In the first we present lifting in a very general way, emphasizing superadditive lifting which allows one to lift simultaneously different sets of variables. In the second, our procedure for generating strong valid inequalities consists of reduction to a knapsack set with a single continuous variable, construction of a mixed integer rounding inequality, and superadditive lifting. It is applied to several generalizations of the 0-1 single node flow set.
TL;DR: A hybrid method is presented that combines five of the previous methods and, for a given z, detects the number k of roots near z and computes an including disc whose radius is, in most cases, of the order of the numerical sensitivity of the root cluster.
TL;DR: The analysis is applied to show that a few bits of precision can be saved in the floating-point division (FP-DIV) microarchitecture of the AMD-K7™ microprocessor.
Abstract: Back in the 60's Goldschmidt presented a variation of Newton-Raphson iterations for division that is well suited for pipelining. The problem in using Goldschmidt's division algorithm is to present an error analysis that enables one to save hardware by using just the right amount of precision for intermediate calculations while still providing correct rounding. Previous implementations relied on combining formal proof methods (that span thousands of lines) with millions of test vectors. These techniques yield correct designs but the analysis is hard to follow and is not quite tight. We present a simple parametric error analysis of Goldschmidt's division algorithm. This analysis sheds more light on the effect of the different parameters on the error. In addition, we derive closed error formulae that allow one to determine optimal parameter choices in four practical settings. We apply our analysis to show that a few bits of precision can be saved in the floating-point division (FP-DIV) microarchitecture of the AMD-K7™ microprocessor. These reductions in precision apply to the initial approximation and to the lengths of the multiplicands in the multiplier. When translated to cost, the reductions reflect a savings of 10.6% in the overall cost of the FP-DIV microarchitecture.
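For readers unfamiliar with the iteration being analyzed, here is a minimal Python sketch of Goldschmidt's division in ordinary double precision. It shows only the basic recurrence: numerator and denominator are multiplied by the same factor until the denominator approaches 1. It does not model the reduced-precision intermediate values or the correct-rounding step that the paper's error analysis addresses. The function name, the iteration count, and the restriction to positive divisors are assumptions of this sketch.

```python
import math

def goldschmidt_divide(n, d, iterations=6):
    """Approximate n / d for d > 0 with Goldschmidt's iteration."""
    if d <= 0:
        raise ValueError("this sketch only handles a positive divisor")
    # Scale so the divisor lies in [0.5, 1): d = m * 2**e.
    m, e = math.frexp(d)
    num = math.ldexp(n, -e)   # n / 2**e, so num / m == n / d
    den = m
    for _ in range(iterations):
        f = 2.0 - den         # correction factor
        num *= f              # numerator converges to the quotient
        den *= f              # denominator converges to 1
    return num

q = goldschmidt_divide(1.0, 3.0)
```

Because the distance of the denominator from 1 is squared at every step, a handful of iterations suffices in this setting. The two multiplications per step are independent of each other, which is what makes the scheme attractive for pipelining.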
TL;DR: A novel recursive-in-width depth-first tree search technique is presented for the design of lattice structure PR orthogonal filter banks subject to discrete coefficient value constraint and a frequency-response deterioration measure is developed to serve as a branching criterion.
Abstract: The lattice structure two-channel orthogonal filter bank structurally guarantees the perfect reconstruction (PR) property. Thus, it is eminently suitable for hardware realization even under severe coefficient quantization condition. Nevertheless, its frequency response is still adversely affected by coefficient quantization. In this paper, a novel recursive-in-width depth-first tree search technique is presented for the design of lattice structure PR orthogonal filter banks subject to discrete coefficient value constraint. A frequency-response deterioration measure is developed to serve as a branching criterion. At any node, the coefficient which will cause the largest deterioration in the frequency response of the filter when quantized is selected for branching. The improvement in the frequency response ripple magnitude achieved by our algorithm over that by simple rounding of coefficient values differs widely from example to example ranging from a fraction of a decibel to over 10 dB.
TL;DR: In this article, the problem of rounding a real-valued matrix into an integer-valued matrix to minimize an Lp-discrepancy measure between them is studied, and it is shown that the problem is solvable in polynomial time if the region family is the union of two laminar families.
Abstract: We study the problem of rounding a real-valued matrix into an integer-valued matrix to minimize an Lp-discrepancy measure between them. To define the Lp-discrepancy measure, we introduce a family ${\cal F}$ of regions (rigid submatrices) of the matrix and consider a hypergraph defined by the family. The difficulty of the problem depends on the choice of the region family ${\cal F}$. We first investigate the rounding problem by using integer programming problems with convex piecewise-linear objective functions and give some nontrivial upper bounds for the Lp discrepancy. We propose "laminar family" for constructing a practical and well-solvable class of ${\cal F}$. Indeed, we show that the problem is solvable in polynomial time if ${\cal F}$ is the union of two laminar families. Finally, we show that the matrix rounding using L1 discrepancy for the union of two laminar families is suitable for developing a high-quality digital-halftoning software.
TL;DR: This article shows that rounding should not be used indiscriminately, and thus some caution should be exercised when rounding imputed values, particularly for dichotomous variables.
Abstract: With the advent of general purpose packages that support multiple imputation for analyzing datasets with missing data (e.g., Solas, SAS PROC MI, and S-Plus 6.0), we expect much greater use of multiple imputation in the future. For simplicity, some imputation packages assume the joint distribution of the variables in the multiple imputation model is multivariate normal, and impute the missing data from the conditional normal distribution for the missing data given the observed data. If the possibly missing data are not multivariate normal (say, binary), imputing a normal random variable can yield implausible values. To circumvent this problem, a number of methods have been developed, including rounding the imputed normal to the closest observed value in the dataset. We show that this rounding can cause biased estimates of parameters, whereas if the imputed value is not rounded, no bias would occur. This article shows that rounding should not be used indiscriminately, and thus some caution should be exercised when rounding imputed values, particularly for dichotomous variables.
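The bias can be seen in a deliberately simplified setting, sketched below in Python. Assume a binary variable with success probability p is imputed as a normal draw with mean p and variance p(1 - p), then rounded to the closer of 0 and 1. The unrounded draw has mean p by construction, whereas the rounded value equals 1 only when the draw exceeds 0.5. This setup and the function name `mean_after_rounding` are assumptions for illustration and do not reproduce the article's derivation.

```python
from statistics import NormalDist

def mean_after_rounding(p):
    """Expected value of a N(p, p(1-p)) draw after rounding it to 0 or 1."""
    sigma = (p * (1.0 - p)) ** 0.5
    # Rounding to the closest of {0, 1} gives 1 exactly when the draw exceeds 0.5.
    return 1.0 - NormalDist(mu=p, sigma=sigma).cdf(0.5)

true_p = 0.05
rounded_mean = mean_after_rounding(true_p)
bias = rounded_mean - true_p   # negative: rounding underestimates a rare outcome
```

In this sketch the bias vanishes at p = 0.5 by symmetry and grows as p moves toward 0 or 1.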
TL;DR: This work introduces the concept of dew-point rounding that allows efficient implementation and reduced requirements for the quotient approximation and proposes the implementation of different versions of Goldschmidt's division algorithm with different pipeline depths.
Abstract: We propose optimized pipelined implementations for Goldschmidt's division algorithm with IEEE rounding based on Booth radix-8 multiplication. Compared to other FP-division algorithms, our implementations require fewer clock cycles and admit shorter clock periods. The considered optimizations for the quotient approximation are based on a careful general analysis of tight error bounds for the implementation and are accompanied by the utilization of redundant representations, partial compressions, injection-based rounding, and rectangular multipliers for the internal computations. To efficiently achieve IEEE compliant rounding, we introduce the concept of dew-point rounding that allows efficient implementation and reduced requirements for the quotient approximation. On this basis, we propose the implementation of different versions of Goldschmidt's division algorithm with different pipeline depths. None of these implementations requires a full-sized multiplier at any stage of the computations. In this way we reduce latency, cost, and enable increased throughput at a reasonable cost. We suggest a full range of pipelining depths: On one extreme is a 3-stage pipeline with a restart time that simply equals the latency minus the number of pipeline stages. On the other extreme is a fully pipelined design.
TL;DR: A floating point unit, a central processing unit, and a method for adjusting the exponent of a floating point number are provided, in which the exponent adjustment due to normalization or renormalization is combined with the significand rounding operation.
Abstract: A floating point unit, a central processing unit, and a method are provided for adjusting the exponent of a floating point number. During an addition or subtraction of two floating point numbers, the significand of the floating point result is rounded, and the exponent of the result may be adjusted due to normalization or renormalization. The exponent adjustment due to renormalization or the exponent adjustment due to normalization and renormalization is combined with the significand rounding operation.
TL;DR: A method and system determine the correct rounding of a floating point function by computing it to a higher precision than required and examining the extra-precision portion of the result, known as the discriminant; a critical pattern in the discriminant indicates that standard rounding may give an incorrect result and that further calculation is needed.
Abstract: A method and system is used to determine the correct rounding of a floating point function. The method involves performing the floating point function to a higher precision than required and examining the portion of extra precision in the result known as the discriminant. If a critical pattern is found in the discriminant, this indicates that standard rounding may give an incorrect result and further calculation is needed. The method can work for various rounding modes and types of floating point representations. The method can be implemented in a system as part of a processor instruction set or any combination of hardware, microcode, and software.
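One way to picture such a check is the Python sketch below, written for round-to-nearest only. It assumes the result is available with k extra bits and that its error is bounded by a known number of units in the last extra bit. Under those assumptions, the "critical pattern" is taken to be any value of the extra bits that lies within the error bound of the halfway point. This interpretation, the function name, and the parameters are assumptions of the sketch and are not taken from the source.

```python
def rounding_is_safe(extra_bits, k, max_error=1):
    """Return True if round-to-nearest can be decided from k extra bits.

    extra_bits: integer value of the k bits beyond the target precision.
    max_error:  assumed bound on the error, in units of the last extra bit.
    """
    halfway = 1 << (k - 1)   # pattern 100...0 marks the rounding midpoint
    # If the true value could lie on either side of (or exactly at) the
    # midpoint, the rounding direction is undecided and more work is needed.
    return abs(extra_bits - halfway) > max_error

needs_recompute = not rounding_is_safe(0b01111, 5)
```

When the check fails, the function would be re-evaluated at a higher precision before rounding.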
TL;DR: The possibility of using computer algebra tools and interval methods to compute solutions which have guarantees on accuracy, e.g. which are not subject to unknown errors due to rounding or approximation, is explored.
Abstract: This paper is concerned with the problem of validation in the context of numerical computations in control. We explore the possibility of using computer algebra tools and interval methods to compute solutions which have guarantees on accuracy, e.g. which are not subject to unknown errors due to rounding or approximation. We demonstrate that this is possible for two common norms of a linear system (L 2 and L ∞ ) and H 2 -optimal controller synthesis. We further discuss some of the issues involved in achieving a validation property for other problems of controller synthesis.
TL;DR: This paper presents basic notions and main ideas of interval calculus and two examples of useful algorithms.
Abstract: Interval calculus is a relatively new branch of mathematics. Initially understood as a set of tools to assess the quality of numerical calculations (rigorous control of rounding errors), it has become a discipline in its own right. Interval methods are useful whenever we have to deal with uncertainties that can be rigorously bounded. Fuzzy sets, rough sets and probability calculus can perform similar tasks, yet only the interval methods are able to (dis)prove, with mathematical rigor, the (non)existence of desired solution(s). Several problems are known, not presented here, which cannot be effectively solved by any other means. This paper presents basic notions and main ideas of interval calculus and two examples of useful algorithms.
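The core idea of rigorous control of rounding errors can be sketched in a few lines of Python. Each operation returns an interval whose endpoints are pushed one floating-point number outward, so that the exact result is guaranteed to lie inside. The class below is a hypothetical minimal example, not one of the paper's algorithms. It relies on `math.nextafter`, available from Python 3.9, and ignores overflow.

```python
import math
from fractions import Fraction

class Interval:
    """Closed interval [lo, hi] with outward-rounded addition and multiplication."""

    def __init__(self, lo, hi=None):
        self.lo = lo
        self.hi = lo if hi is None else hi

    @staticmethod
    def _widen(lo, hi):
        # Step one float outward on each side to cover the rounding error.
        return Interval(math.nextafter(lo, -math.inf), math.nextafter(hi, math.inf))

    def __add__(self, other):
        return Interval._widen(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval._widen(min(products), max(products))

    def contains(self, exact):
        """Check membership of an exact rational value."""
        return Fraction(self.lo) <= exact <= Fraction(self.hi)

s = Interval(0.1) + Interval(0.2)
exact_sum = Fraction(0.1) + Fraction(0.2)   # exact sum of the two stored doubles
```

Widening by a full step on each side is cruder than switching the hardware rounding mode, but it keeps the enclosure property with standard-library tools only.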
TL;DR: A high-speed, area-efficient floating-point multiplier is presented, and the effect of rounding on the area, speed, and accuracy for three different configurations is examined.
Abstract: In this paper, a high-speed and area-efficient floating-point multiplier is presented. The multiplier is designed, optimized, and implemented on an FPGA based system. A comparison between the results of the proposed design and a previously reported one is provided. The effect of rounding on the area, speed, and accuracy for three different configurations is examined.
TL;DR: This article considers interval estimation from rounded data in the balanced normal one-way random effects model, building on earlier results that the nominal confidence levels of standard intervals for μ and σ are much larger than actual coverage probabilities when the rounding is severe.
Abstract: In standard statistical analyses, data are assumed to be essentially exact. But indeed they are often obtained from a relatively crude gaging method and are thus intrinsically “rounded” to some nearest unit. The discussions in Lee and Vardeman (Lee, Chiang-Sheng, Vardeman, Stephen B. (2001). Interval estimation of a normal process mean from rounded data. Journal of Quality Technology 33:335–348.) and Lee and Vardeman (Lee, Chiang-Sheng, Vardeman, Stephen B. (2002). Interval estimation of a normal process standard deviation from rounded data. Communications in Statistics 31:13–34.) for a rounded sample from a single normal distribution established that nominal confidence levels of standard intervals for μ and σ are much larger than actual coverage probabilities when the rounding is severe. In this article we consider interval estimation in the balanced normal one-way random effects model. We demonstrate the deficiency of standard interval estimators in the presence of rounding and show how likelih...
TL;DR: In this article, a new SNR sensitivity was defined as an indicator of how the word length truncation of multiplier coefficients affects quality of the decoded image, and also proposed a new word length allocation method based on the sensitivity.
Abstract: Recently, the integer DCT (Int-DCT), which has rounding operations in the lifting structure, has been attracting many researchers' attention as an effective method for DCT based lossy / lossless unified coding. So far, the previous reports relevant to the Int-DCT have been limited to a few topics such as how to reduce the number of multipliers with the four point lossless Hadamard transform and the non-separable two dimensional LDCT. What seems to be lacking, however, is how to express the multipliers' word length as short as possible for reduction of hardware complexity. This report defines a new "SNR sensitivity" as an indicator of how the word length truncation of multiplier coefficients affects quality of the decoded image, and also proposes a new word length allocation method based on the sensitivity. As a result, a word length shorter by two bits on average is attained under equivalent quality of the decoded image.
TL;DR: The proposed filter design method is based on integer programming (IP) and can be directly applied to any fixed-point FIR design specifications which can be formulated in a form of linear constraints on the filter coefficients.
TL;DR: Numerical results are presented to illustrate that it is possible to get very sharp error bounds of computed solutions of systems of linear equations whose coefficient matrices are symmetric and positive definite.
Abstract: This paper is concerned with the problem of verifying the accuracy of approximate solutions of systems of linear equations. Recently, fast algorithms for calculating guaranteed error bounds of computed solutions of systems of linear equations have been proposed using the rounding mode controlled verification method and the residual iterative verification method. In this paper, a new verification method for systems of linear equations is proposed. Using this verification method, componentwise verified error bounds of approximate solutions of systems of linear equations can be calculated. Numerical results are presented to illustrate that it is possible to get very sharp error bounds of computed solutions of systems of linear equations whose coefficient matrices are symmetric and positive definite.
TL;DR: This work points out how this analysis of reciprocals can be useful in analyzing certain reciprocal algorithms, and shows how the approach can be trivially adapted to the reciprocal square root function.
Abstract: One approach to testing and/or proving correctness of a floating-point algorithm computing a function f is based on finding input floating-point numbers a such that the exact result f(a) is very close to a "rounding boundary", i.e. a floating-point number or a midpoint between them. We show how to do this for the reciprocal function by utilizing prime factorizations. We present the method and show examples, as well as making a fairly detailed study of its expected and worst-case behavior. We point out how this analysis of reciprocals can be useful in analyzing certain reciprocal algorithms, and also show how the approach can be trivially adapted to the reciprocal square root function.
TL;DR: A new formulation of the side-chain positioning problem as a semidefinite program is investigated and two novel rounding schemes are introduced, which provide theoretical justifications for their effectiveness under various input conditions.
Abstract: Side-chain positioning is a central component of the protein structure prediction problem and has been the focus of a large body of research. The problem is NP-complete; in fact, it is even inapproximable. In practice, it is tackled by a variety of general search techniques and specialized heuristics. We investigate a new formulation of the problem as a semidefinite program. We introduce two novel rounding schemes and provide theoretical justifications for their effectiveness under various input conditions. We also present computational results on simulated data that show that our method outperforms a recently introduced linear programming approach on a wide range of inputs. Beyond the context of side-chain positioning, we are hopeful that our rounding schemes, which are very general, will be applicable elsewhere.
TL;DR: The concept of social capital refers to the ways in which the authors' lives are made more productive through social ties, and the value that accrues from social networks and contacts can enhance the productivity of individuals and groups.
TL;DR: In this article, the authors consider the problem of minimizing the sum of ordering cost and inventory cost over an infinite time horizon, where orders are of equal size and the reorder interval is an integer number.
TL;DR: In order to reduce the required chip area for the sequential processing of 8 K complex data, a DRAM-based pipelined commutator architecture is used, and this brings 60% chip size reduction over the flip-flop approach.
Abstract: In this paper, we propose an implementation method for a single-chip 8192 complex point FFT/IFFT in terms of sequential data processing. In order to reduce the required chip area for the sequential processing of 8 K complex data, a DRAM-based pipelined commutator architecture is used, and this brings 60% chip size reduction over the flip-flop approach. The 8192-point FFT/IFFT consists of cascaded blocks with six stages of radix-4 and one stage of radix-2. Since each stage requires rounding of the resulting bits while maintaining the proper S/N ratio, the convergent block floating point (CBFP) algorithm is used for the effective internal bit rounding. For the IFFT operation, we use a control signal that changes the radix-4 butterfly operator and coefficients saved in ROM without additional blocks. The proposed FFT/IFFT architecture was fabricated using a 0.35 μm standard CMOS process.
TL;DR: In this paper, a method for the hydro-erosive rounding of an edge of a part, particularly an edge in a duct of a high pressure-resistant part, and a use thereof is described.
Abstract: The invention relates to a method for the hydro-erosive rounding of an edge of a part, particularly an edge in a duct of a high pressure-resistant part, and a use thereof. According to said method, a liquid to which abrasive elements are added is directed along the edge that is to be rounded. In order to optimize the result of the rounding process, a high-viscosity liquid is used as a liquid (10). The inventive method is used for rounding parts of a fuel injection system.
TL;DR: A sequential implementation of the algorithm is proposed, and the execution times and hardware requirements are estimated for single and double-precision floating-point computations, for radix r=128, showing that powering can be computed with similar performance as high-radix CORDIC algorithms.
Abstract: A high-radix composite algorithm for the computation of the powering function (X^Y) is presented. The algorithm consists of a sequence of overlapped operations: (i) digit-recurrence logarithm, (ii) left-to-right carry-free (LRCF) multiplications, and (iii) online exponential. A redundant number system is used, and the selection in (i) and (iii) is done by rounding except in the first iteration, when selection by table look-up is necessary to guarantee the convergence of the recurrences. A sequential implementation of the algorithm is proposed, and the execution times and hardware requirements are estimated for single and double-precision floating-point computations, for radix r=128, showing that powering can be computed with similar performance as high-radix CORDIC algorithms.
TL;DR: On-line testing methods are shown to have a new property, the rejection of authentic results, which reduces the reliability of a check; ways of raising the reliability of an on-line testing method when processing approximate data are offered.
Abstract: Processing of the approximate data has the features essentially changing conditions for the on-line testing of computing circuits. A significant part of the errors produced by faults of the computing circuits does not reduce reliability of calculation results. The on-line testing methods show a new property that consists in rejection of authentic results. It reduces reliability of a checking. The on-line testing methods reducing probability of rejection of authentic results are offered.
1. Introduction
In the on-line testing there were uniform requirements to checking methods of computing and control circuits. These requirements were formulated in the theory of self-checking circuits. The main thesis consists in detection of each fault of the given class of faults at occurrence of the first error [1, 2]. The self-checking circuits of logical and arithmetic devices using parity and residue check techniques were developed [3-7]. These methods provide diagnosing of a fault “stuck-at 0” or “stuck-at 1” on the first error and, as a whole, have high detection probability of typical faults for computing circuits. However, processing of the approximate data essentially distinguishes computing circuits from control schemes. On the one hand, rounding of the data may lead to calculation of non-authentic results on the faultless circuit. On the other hand, authentic results may be obtained on the faulty circuit because of loss of errors at the rounding of numbers. These conditions reduce efficiency of the on-line testing methods satisfying traditional requirements.
In section 2 the approximate calculations features that influence the on-line testing methods of computing circuits are examined. The factors lowering influence of computing circuit faults on reliability of calculations results are defined in section 3. The appearance probability of an essential error is estimated in section 4. In section 5 the problem of reliability estimation for an on-line testing method is solved. In section 6 ways of rise of reliability for an on-line testing method are offered at processing the approximate data. The on-line testing methods with the increased reliability are considered in section 7.
TL;DR: A conjecture of Muller is proved according to which the proportion of numbers in F_k with no FP-reciprocal approaches 1/2 − (3/2) log(4/3) ≈ 0.06847689 as k → ∞.
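The limiting constant can be checked numerically. The short Python sketch below assumes that "log" denotes the natural logarithm and groups the expression as one half minus three halves of log(4/3). Both readings are assumptions, although the decimal value quoted in the summary agrees with them.

```python
import math

# Assumption: "log" is the natural logarithm and the expression is
# 1/2 - (3/2) * log(4/3).
limit = 0.5 - 1.5 * math.log(4.0 / 3.0)
```

Under this reading the result agrees with the quoted 0.06847689 to the digits given.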