TL;DR: This paper presents an algorithm for calculating a faithful rounding of the sum of a vector of floating-point numbers. The algorithm adapts to the condition number of the sum, and certain constants used in it are proved to be optimal.
Abstract: Given a vector of floating-point numbers with exact sum $s$, we present an algorithm for calculating a faithful rounding of $s$, i.e., the result is one of the immediate floating-point neighbors of $s$. If the sum $s$ is a floating-point number, we prove that this is the result of our algorithm. The algorithm adapts to the condition number of the sum, i.e., it is fast for mildly conditioned sums, and its computing time increases only slowly, proportionally to the logarithm of the condition number. All statements are also true in the presence of underflow. The algorithm does not depend on the exponent range. Our algorithm is fast in terms of measured computing time because it allows good instruction-level parallelism, requires no special operations such as access to the mantissa or exponent, contains no branch in the inner loop, and needs no extra precision: the only operations used are standard floating-point addition, subtraction, and multiplication in one working precision, for example, double precision. Certain constants used in the algorithm are proved to be optimal.
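The definition of a faithful rounding can be checked directly with exact rational arithmetic. The sketch below is a hypothetical helper, `is_faithful`, and not the paper's algorithm: it computes the exact sum with Python's `fractions.Fraction` and tests whether a candidate equals it or is one of its two floating-point neighbors. `math.fsum`, which returns a correctly rounded (hence faithful) sum, stands in for an accurate summation routine. The sketch assumes finite inputs and results away from the overflow threshold.

```python
from fractions import Fraction
import math


def is_faithful(values, result):
    """Return True if `result` is a faithful rounding of the exact sum of `values`.

    If the exact sum s is a float, only s itself qualifies; otherwise either of
    the two floats adjacent to s qualifies. Assumes finite inputs and a result
    below the overflow threshold.
    """
    s = sum((Fraction(v) for v in values), Fraction(0))  # exact rational sum
    r = Fraction(result)
    if r == s:
        return True
    if r < s:
        # r is the lower neighbour iff the next float above r already exceeds s
        return Fraction(math.nextafter(result, math.inf)) > s
    # r is the upper neighbour iff the next float below r is already under s
    return Fraction(math.nextafter(result, -math.inf)) < s


values = [1e16, 1.0, -1e16, 1.0]
naive = 0.0
for v in values:
    naive += v                                  # plain left-to-right float additions
print(naive, is_faithful(values, naive))        # 1.0 False
print(math.fsum(values), is_faithful(values, math.fsum(values)))  # 2.0 True
```

The naive loop fails because the exact sum, 2, is itself a float, so nothing but `2.0` is faithful.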
TL;DR: This paper shows that under a strict feasibility assumption, an approximate solution of the semidefinite program is sufficient to obtain a rational decomposition, and quantifies the relation between the numerical error versus the rounding tolerance needed.
TL;DR: An algorithm for calculating the rounded-to-nearest result of $s:=\sum p_i$ for a given vector of floating-point numbers $p_i$, as well as algorithms for directed rounding, working for huge dimensions.
Abstract: In Part II of this paper we first refine the analysis of error-free vector transformations presented in Part I. Based on that, we present an algorithm for calculating the rounded-to-nearest result of $s:=\sum p_i$ for a given vector of floating-point numbers $p_i$, as well as algorithms for directed rounding. A special algorithm for computing the sign of $s$ is given, also working for huge dimensions. Assume a floating-point working precision with relative rounding error unit $\mathtt{eps}$. We define and investigate a $K$-fold faithful rounding of a real number $r$. Basically, the result is stored in a vector $\mathtt{Res}_{\nu}$ of $K$ nonoverlapping floating-point numbers such that $\sum\mathtt{Res}_{\nu}$ approximates $r$ with relative accuracy $\mathtt{eps}^K$, and replacing $\mathtt{Res}_K$ by its floating-point neighbors in $\sum\mathtt{Res}_{\nu}$ forms a lower and an upper bound for $r$. For a given vector of floating-point numbers with exact sum $s$, we present an algorithm for calculating a $K$-fold faithful rounding of $s$ using solely the working precision. Furthermore, an algorithm for calculating a faithfully rounded result of the sum of a vector of huge dimension is presented. Our algorithms are fast in terms of measured computing time because they allow good instruction-level parallelism, require no special operations such as access to the mantissa or exponent, contain no branch in the inner loop, and need no extra precision. The only operations used are standard floating-point addition, subtraction, and multiplication in one working precision, for example, double precision. Certain constants used in the algorithms are proved to be optimal.
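The sketch below makes the notion of a $K$-fold result concrete by building one by reference. It uses exact rational arithmetic (`fractions.Fraction`), not the working-precision-only algorithm of the paper: each term is the nearest float to the part of $r$ not yet represented, and the last term is then replaced by its float neighbors to bracket $r$. The helpers `k_fold_reference` and `bounds` are hypothetical, reading "nonoverlapping" as "each term is at most half an ulp of the previous one" is an assumption, and no overflow or underflow is assumed.

```python
from fractions import Fraction
import math


def k_fold_reference(r, K):
    """Split the rational r into K floats, each the nearest float to what is
    still left, and return them together with the exact final remainder."""
    res = []
    rest = Fraction(r)
    for _ in range(K):
        f = float(rest)            # float(Fraction) rounds to nearest
        res.append(f)
        rest -= Fraction(f)        # exact remainder, passed to the next term
    return res, rest


def bounds(res):
    """Replace the last term by its float neighbours to bracket the exact value."""
    head = sum((Fraction(x) for x in res[:-1]), Fraction(0))
    lo = head + Fraction(math.nextafter(res[-1], -math.inf))
    hi = head + Fraction(math.nextafter(res[-1], math.inf))
    return lo, hi


r = Fraction(1, 3)
res, rest = k_fold_reference(r, 3)
lo, hi = bounds(res)
print(res)
print(lo <= r <= hi)   # True
```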
TL;DR: In this paper, the authors study two common situations where the flexibility of FPGAs allows one to design application-specific floating-point operators which are more efficient and more accurate than those offered by processors and GPUs.
Abstract: This article studies two common situations where the flexibility of FPGAs allows one to design application-specific floating-point operators which are more efficient and more accurate than those offered by processors and GPUs. First, for applications involving the addition of a large number of floating-point values, an ad-hoc accumulator is proposed. By tailoring its parameters to the numerical requirements of the application, it can be made arbitrarily accurate, at an area cost comparable to that of a standard floating-point adder, and at a higher frequency. The second example is the sum-of-product operation, which is the building block of matrix computations. A novel architecture is proposed that feeds the previous accumulator out of a floating-point multiplier whose rounding logic has been removed, again improving the area/accuracy tradeoff. These architectures are implemented within the FloPoCo generator, freely available under the LGPL.
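The idea of an accumulator whose accuracy is set by its parameters can be modeled in software. This sketch is an assumption-laden model, not FloPoCo's design or API: a hypothetical `FixedPointAccumulator` keeps the running sum as one integer in units of `2**lsb`, so each addition is exact once an input has been aligned, while `msb` bounds the range. Choosing `lsb` and `msb` plays the role of tailoring the accumulator to the application. Input bits below `2**lsb` are rounded away here; a hardware design might truncate them instead.

```python
from fractions import Fraction


class FixedPointAccumulator:
    """Software model of a wide fixed-point accumulator for float inputs.

    msb and lsb are hypothetical tuning parameters: the register counts in
    units of 2**lsb and must stay below 2**msb in magnitude.
    """

    def __init__(self, msb, lsb):
        self.msb, self.lsb = msb, lsb
        self.unit = Fraction(2) ** lsb   # weight of the least significant bit
        self.acc = 0                     # integer register, in units of 2**lsb

    def add(self, x):
        # Align x to the accumulator's LSB; the addition itself is exact.
        self.acc += round(Fraction(x) / self.unit)
        if abs(self.acc) >= 2 ** (self.msb - self.lsb):
            raise OverflowError("accumulator range exceeded; increase msb")

    def result(self):
        # One final rounding back to a float.
        return float(self.acc * self.unit)


values = [1e16, 1.0, -1e16, 1.0]
acc = FixedPointAccumulator(msb=60, lsb=-10)
for v in values:
    acc.add(v)
print(acc.result())   # 2.0
```

A left-to-right float loop over the same four values returns `1.0`, because the first `1.0` is absorbed by `1e16`.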
TL;DR: A new method based on an analytical approach is presented for the case of linear-time-invariant (LTI) systems that allows the automatic determination of the signal-to-quantization-noise-ratio (SQNR) expression at the system output according to the fixed-point data format.
Abstract: One of the most important stages of floating-point to fixed-point conversion is the evaluation of the accuracy of the fixed-point specification. This evaluation is required to optimize the data word-length according to accuracy constraints. Classical methods for accuracy evaluation are based on fixed-point simulations, but they lead to very long optimization times. A new method based on an analytical approach is presented for the case of linear-time-invariant (LTI) systems. Using this method in data word-length minimization processes significantly reduces the optimization time compared to simulation-based methods. Our approach allows the automatic determination of the signal-to-quantization-noise-ratio (SQNR) expression at the system output according to the fixed-point data format. This method is valid for recursive and non-recursive LTI systems and takes into account the quantization modes (truncation or rounding). The theoretical concepts and the different methodology stages are explained. Then, the ability to efficiently evaluate the fixed-point specification accuracy is demonstrated through examples.
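The classical additive-noise model is one common basis for analytical SQNR estimates of LTI systems. The sketch below assumes that model rather than reproducing the paper's method. Each quantizer with step q contributes white noise of variance q^2/12, with mean 0 for rounding and -q/2 for two's-complement truncation. That noise reaches the output through the impulse response h from the quantizer to the output; for a recursive system, h is assumed truncated to a finite length. The function names are hypothetical.

```python
import math


def noise_moments(q, mode):
    """Mean and variance of the error made when quantizing to step q, under the
    classical uniform-noise model (two's-complement truncation assumed)."""
    if mode == "rounding":
        return 0.0, q * q / 12.0
    if mode == "truncation":
        return -q / 2.0, q * q / 12.0
    raise ValueError(f"unknown quantization mode: {mode}")


def output_noise_power(sources):
    """sources: iterable of (q, mode, h), h being the impulse response from that
    quantizer to the output (truncated, for a recursive system). Sources are
    assumed mutually uncorrelated; their means add coherently."""
    variance, mean = 0.0, 0.0
    for q, mode, h in sources:
        m, v = noise_moments(q, mode)
        variance += v * sum(c * c for c in h)   # white noise through h
        mean += m * sum(h)                      # DC gain of h applied to the bias
    return variance + mean * mean


def sqnr_db(signal_power, sources):
    return 10.0 * math.log10(signal_power / output_noise_power(sources))


# full-scale sine (power 0.5) quantized once with step 2**-15, i.e. 16 bits in [-1, 1)
print(round(sqnr_db(0.5, [(2.0 ** -15, "rounding", [1.0])]), 2))   # 98.09
```

The single-quantizer case reproduces the familiar figure of about 6.02 dB per bit plus 1.76 dB for a full-scale sine.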
TL;DR: A system and method for unbiased rounding away from, or toward, zero, comprising apparatus for truncating N bits from an original M-bit input number, and apparatus for adding the equivalent value of '½' to the (M - N)-bit number unless the input number is negative, or positive, respectively, and the N truncated bits represent exactly ½.
Abstract: A system and method for unbiased rounding away from, or toward, zero, comprising apparatus for truncating N bits from an original M-bit input number, thereby providing an (M - N)-bit number, and apparatus for adding the equivalent value of '½' to the (M - N)-bit number unless the input number is negative, or positive, respectively, and the N truncated bits represent exactly ½.
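A bit-level software model of this rounding rule follows. Python's unbounded integers stand in for the M-bit register; this is an assumption, and the register width is not enforced. `x >> n` plays the role of truncating n bits, since it is a floor shift like two's-complement truncation. Adding `1 << (n - 1)` before the shift is "adding ½", and that addition is skipped for an exact half on negative inputs (ties away from zero) or positive inputs (ties toward zero). The name `round_shift` and its `ties` parameter are hypothetical.

```python
def round_shift(x, n, ties="away"):
    """Drop the n low bits of integer x (a fixed-point value with n fraction
    bits) and round to nearest, resolving exact halves away from zero
    (ties="away") or toward zero (ties="toward")."""
    if n < 1:
        raise ValueError("n must be at least 1")
    half = 1 << (n - 1)
    tie = (x & ((1 << n) - 1)) == half    # truncated bits represent exactly 1/2
    if ties == "away":
        skip_half = x < 0 and tie
    elif ties == "toward":
        skip_half = x > 0 and tie
    else:
        raise ValueError(f"unknown tie mode: {ties}")
    # x >> n is floor(x / 2**n), i.e. plain truncation of the low bits
    return (x >> n) if skip_half else ((x + half) >> n)


print(round_shift(-5, 1, "away"), round_shift(-5, 1, "toward"))   # -3 -2
```

Because exact halves are resolved symmetrically, `round_shift(-x, n, mode) == -round_shift(x, n, mode)` holds for every x, so inputs distributed symmetrically about zero incur no systematic drift.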
TL;DR: A floating-point fused dot-product unit is presented that performs single-precision floating-point multiplication and addition operations on two pairs of data in only 150% of the time required for a conventional floating-point multiplication.
Abstract: A floating-point fused dot-product unit is presented that performs single-precision floating-point multiplication and addition operations on two pairs of data in only 150% of the time required for a conventional floating-point multiplication. When placed and routed in a 45 nm process, the fused dot-product unit occupied about 70% of the area needed to implement a parallel dot-product unit using conventional floating-point adders and multipliers. The fused dot-product unit is 27% faster than the conventional parallel approach. Its numerical result is also more accurate, because it needs one rounding operation versus at least three for other approaches.
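The accuracy argument (one rounding instead of three) can be reproduced numerically. This sketch models a fused two-term dot product as the exact value of a*b + c*d rounded once, with `fractions.Fraction` standing in for the wide internal datapath. It uses Python's double-precision floats rather than the single precision of the unit described, and the function names are hypothetical. The inputs are chosen so that the low-order part of a*b, lost when that product is rounded, is the entire answer.

```python
from fractions import Fraction


def dot2_conventional(a, b, c, d):
    """a*b + c*d with a rounding after each product and after the addition."""
    p = a * b            # rounding 1
    q = c * d            # rounding 2
    return p + q         # rounding 3


def dot2_fused(a, b, c, d):
    """a*b + c*d computed exactly and rounded once (Fraction models the wide datapath)."""
    return float(Fraction(a) * Fraction(b) + Fraction(c) * Fraction(d))


a = b = 1.0 + 2.0 ** -30
c, d = 1.0, -(1.0 + 2.0 ** -29)
print(dot2_conventional(a, b, c, d), dot2_fused(a, b, c, d) == 2.0 ** -60)   # 0.0 True
```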
TL;DR: Practical rounding rules are proposed to be used with the existing imputation methods to obtain usable imputations with small biases for estimation of means and correlations and asymptotic biases of marginal means and slope coefficients under plausible models are calculated.
Abstract: Since the 1990s, imputation methods that typically assume a multivariate normal (MVN) distribution for incompletely observed variables have become increasingly accessible in standard software. When these variables are not normally distributed but rather categorical (binary or ordinal), practitioners are often advised to round the MVN imputations to the nearest integer, but this simple procedure can lead to biased estimates. We propose practical rounding rules to be used with the existing imputation methods (e.g., under MVN) to obtain usable imputations with small biases for estimation of means and correlations. The rounding rules are calibrated in the sense that values reimputed for observed data have distributions similar to those of the observed data. Calibration in this sense is a form of posterior predictive check that can be used to evaluate any imputation procedure. It is readily implemented by duplicating the data and comparing the distributions of observed and imputed data. We calculate asymptotic...
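A simplified version of a calibrated rounding rule for one binary variable is sketched below. It assumes that calibration only needs to match the proportion of 1s, whereas the abstract speaks of comparing whole distributions. The observed cases are re-imputed as if missing (the MVN imputation step itself is not shown), and the cutoff is chosen so that the same share of re-imputed values rounds to 1 as was actually observed. The helper names and data are hypothetical.

```python
def calibrated_cutoff(reimputed_observed, observed_binary):
    """Choose a cutoff c so that the share of re-imputed observed values above c
    equals the share of 1s actually observed (assumes distinct imputed values)."""
    n = len(observed_binary)
    ones = sum(observed_binary)
    ranked = sorted(reimputed_observed)
    k = n - ones                       # how many values should round to 0
    if k == 0:
        return ranked[0] - 1.0         # every value rounds to 1
    return ranked[k - 1]               # values strictly above this round to 1


def round_with_cutoff(imputed, c):
    return [1 if v > c else 0 for v in imputed]


z = [0.9, 0.1, 0.4, 0.3, 0.2]   # MVN re-imputations of five observed cases
y = [1, 0, 1, 0, 0]             # their actual binary values
c = calibrated_cutoff(z, y)
print(c, round_with_cutoff(z, 0.5), round_with_cutoff(z, c))
# 0.3 [1, 0, 0, 0, 0] [1, 0, 1, 0, 0]
```

Rounding at 0.5 would produce one 1 where two were observed; the calibrated cutoff would then be applied to the imputations for the genuinely missing cases.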
TL;DR: An algorithm for emulating the fused multiply-and-add operator and an iterative algorithm for computing the correctly rounded sum of a set of floating-point numbers under mild assumptions are presented.
Abstract: Rounding to odd is a nonstandard rounding on floating-point numbers. By using it for some intermediate values instead of rounding to nearest, correctly rounded results can be obtained at the end of computations. We present an algorithm for emulating the fused multiply-and-add operator. We also present an iterative algorithm for computing the correctly rounded sum of a set of floating-point numbers under mild assumptions. A variation on both previous algorithms is the correctly rounded sum of any three floating-point numbers. This leads to efficient implementations, even when this rounding is not available. In order to guarantee the correctness of these properties and algorithms, we formally proved them by using the Coq proof checker.
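The sketch below illustrates why rounding to odd helps. It models binary rounding on exact rationals (`fractions.Fraction`) with an unbounded exponent range; `round_odd` and `round_nearest_even` are hypothetical helpers, and p is the number of significant bits. It demonstrates the property that makes the approach work: rounding to odd with at least two extra bits and then rounding to nearest gives the same result as rounding to nearest directly, whereas two successive round-to-nearest steps can go wrong.

```python
from fractions import Fraction


def _split(x, p):
    """For rational x > 0 return (m, e) with x = m * 2**e and 2**(p-1) <= m < 2**p."""
    k = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** k > x:
        k -= 1                                  # now 2**k <= x < 2**(k+1)
    e = k - (p - 1)
    return x / Fraction(2) ** e, e


def round_odd(x, p):
    """Round x to p significant bits: truncate, and if the result is inexact,
    force its last bit to 1 (round to odd)."""
    if x == 0:
        return Fraction(0)
    sign = 1 if x > 0 else -1
    m, e = _split(abs(Fraction(x)), p)
    q = m.numerator // m.denominator            # truncated significand
    if q != m and q % 2 == 0:
        q += 1                                  # inexact: take the odd neighbour
    return sign * q * Fraction(2) ** e


def round_nearest_even(x, p):
    """Round x to p significant bits, ties to even."""
    if x == 0:
        return Fraction(0)
    sign = 1 if x > 0 else -1
    m, e = _split(abs(Fraction(x)), p)
    q = round(m)                                # Fraction rounds half to even
    return sign * q * Fraction(2) ** e


x = Fraction(9, 8) + Fraction(1, 1024)  # just above a 3-bit midpoint
p = 3
print(round_nearest_even(x, p))                              # 5/4 (correct)
print(round_nearest_even(round_nearest_even(x, p + 2), p))   # 1 (double-rounding error)
print(round_nearest_even(round_odd(x, p + 2), p))            # 5/4 (correct again)
```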
TL;DR: It is proved that the upper complexity bound for both schemes is O((√(n ln m)/δ)ln n) iterations of a gradient-type method, where n and m are the sizes of the corresponding linear programming problems.
Abstract: In this paper, we propose new efficient gradient schemes for two non-trivial classes of linear programming problems. These schemes are designed to compute approximate solutions with relative accuracy δ. We prove that the upper complexity bound for both schemes is O((√(n ln m)/δ)ln n) iterations of a gradient-type method, where n and m (n
TL;DR: A rounding of the natural LP relaxation is conducted to show that the full-information budgeted-allocation problem can be approximated to within 4/3: the known lower-bound on the integrality gap is matched.
Abstract: We build on the work of Andelman & Mansour and Azar, Birnbaum, Karlin, Mathieu & Thach Nguyen to show that the full-information (i.e., offline) budgeted-allocation problem can be approximated to within 4/3: we conduct a rounding of the natural LP relaxation, for which our algorithm matches the known lower-bound on the integrality gap.
TL;DR: This paper develops a family of super-linearly convergent LP solvers based on proximal minimization schemes using Bregman divergences that exploit the underlying graphical structure, and so scale well to large problems.
Abstract: A large body of past work has focused on the first-order tree-based LP relaxation for the MAP problem in Markov random fields. This paper develops a family of super-linearly convergent LP solvers based on proximal minimization schemes using Bregman divergences that exploit the underlying graphical structure, and so scale well to large problems. All of our algorithms have a double-loop character, with the outer loop corresponding to the proximal sequence, and an inner loop of cyclic Bregman divergences used to compute each proximal update. The inner loop updates are distributed and respect the graph structure, and thus can be cast as message-passing algorithms. We establish various convergence guarantees for our algorithms, illustrate their performance, and also present rounding schemes with provable optimality guarantees.
TL;DR: The flow cost functions that are used in these formulations result in providing integer optimal solutions despite the absence of integrality constraints for a large subset of RWA input instances, while also minimizing the total number of used wavelengths.
Abstract: We design and implement various algorithms for solving the static RWA problem with the objective of minimizing the maximum number of requested wavelengths based on LP relaxation formulations. We present a link formulation, a path formulation and a heuristic that breaks the problem into its two constituent subproblems and solves them individually and sequentially. The flow cost functions that are used in these formulations result in providing integer optimal solutions despite the absence of integrality constraints for a large subset of RWA input instances, while also minimizing the total number of used wavelengths. We present a random perturbation technique that is shown to increase the number of instances for which we find integer solutions, and we also present appropriate iterative fixing and rounding methods to be used when the algorithms do not yield integer solutions. We comment on the number of variables and constraints these formulations require and perform extensive simulations to compare their performance to that of a typical min-max congestion formulation.
TL;DR: A floating-point fused FFT Butterfly unit is presented that performs a single-precision floating-point butterfly operation in only 87% of the time required for a conventional floating-point butterfly.
Abstract: This paper extends the consideration of fused floating-point arithmetic to operations that are frequently encountered in DSP. The fast Fourier transform is a case in point: it uses a complex butterfly operation. For a radix-2 implementation, the butterfly consists of a complex multiply followed by the complex addition and subtraction of the same pair of data. These butterfly operations can be implemented with two fused primitives: a fused two-term inner product and a fused add-subtract unit. A floating-point fused FFT Butterfly unit is presented that performs a single-precision floating-point butterfly operation in only 87% of the time required for a conventional floating-point butterfly. When placed and routed in a 45 nm process, the fused FFT Butterfly unit occupied about 72% of the area needed to implement a floating-point butterfly using conventional floating-point adders and multipliers. The numerical result of the fused butterfly unit is more accurate because fewer rounding operations are needed.
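The butterfly decomposition described in this abstract can be sketched in software. The complex multiply inside a radix-2 butterfly becomes two real two-term inner products, followed by a paired addition and subtraction of the same operands. In this Python sketch, `two_term_dot` only marks where a fused unit would sit: it rounds after every operation, so it does not reproduce the accuracy gain of the fused hardware. All function names are hypothetical.

```python
import cmath


def two_term_dot(a, b, c, d):
    """a*b + c*d: stand-in for a fused two-term inner-product unit (ordinary floats here)."""
    return a * b + c * d


def butterfly(a, b, w):
    """Radix-2 decimation-in-time butterfly returning (a + w*b, a - w*b)."""
    # complex multiply w*b as two real two-term inner products
    tr = two_term_dot(w.real, b.real, -w.imag, b.imag)   # Re(w*b)
    ti = two_term_dot(w.real, b.imag, w.imag, b.real)    # Im(w*b)
    t = complex(tr, ti)
    # paired add/subtract of the same operands (the add-subtract step)
    return a + t, a - t


def fft(x):
    """Recursive radix-2 FFT built from butterflies; len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)            # twiddle factor
        out[k], out[k + n // 2] = butterfly(even[k], odd[k], w)
    return out


print(fft([1.0, 0.0, 0.0, 0.0]))   # a unit impulse has an all-ones spectrum
```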
TL;DR: A 4-approximation algorithm for the problem of placing the fewest guards on a 1.5D terrain so that every point of the terrain is seen by at least one guard is presented in this paper.
Abstract: We present a 4-approximation algorithm for the problem of placing the fewest guards on a 1.5D terrain so that every point of the terrain is seen by at least one guard. This improves on the currently best approximation factor of 5. Our method is based on rounding the linear programming relaxation of the corresponding covering problem. Besides the simplicity of the analysis, which mainly relies on decomposing the constraint matrix of the LP into totally balanced matrices, our algorithm, unlike previous work, generalizes to the weighted and partial versions of the basic problem.
TL;DR: This paper shows by experiment that, based on a word-length-reduced integral image, the Viola and Jones face detector for VGA resolution can work on a 16-bit CPU (instead of 27 bits, which becomes 32 bits on byte-oriented CPUs), enabling face detection on a wider range of platforms.
Abstract: The integral image is an image containing accumulated sums of pixel values taken from an input image. It is an important concept for multi-scale image processing algorithms, because it provides a very economical way to compute the sum of pixel values in any rectangular region of the input image. Unfortunately, the integral image requires a large binary word length to represent the accumulated sums. This is an issue for platforms with limited memory, power, and bandwidth, such as mobile devices. Our paper deals with two methods for word-length reduction, involving computation through the overflow and rounding with error diffusion. We show by experiment that, based on a word-length-reduced integral image, the Viola and Jones face detector for VGA resolution can work on a 16-bit CPU (instead of 27 bits, which becomes 32 bits on byte-oriented CPUs), enabling face detection on a wider range of platforms.
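One way to read "computation through the overflow" is modular arithmetic. The sketch below assumes that reading: if every integral-image entry is stored in an unsigned 16-bit word that silently wraps around, the usual four-corner formula still returns the exact rectangle sum whenever that sum is below 2^16, even though the full integral image overflows. The helper names are hypothetical, and the error-diffusion rounding method is not modeled.

```python
import random


def integral_image(img, bits):
    """Integral image whose entries are kept modulo 2**bits, as if stored in
    unsigned `bits`-wide words that wrap around on overflow."""
    mask = (1 << bits) - 1
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]   # zero border row and column
    for y in range(h):
        row = 0                                   # running sum of row y, wrapped
        for x in range(w):
            row = (row + img[y][x]) & mask
            ii[y + 1][x + 1] = (ii[y][x + 1] + row) & mask
    return ii


def rect_sum(ii, x0, y0, x1, y1, bits):
    """Sum over rows y0..y1-1 and columns x0..x1-1, computed modulo 2**bits.
    The result is exact whenever the true rectangle sum is below 2**bits."""
    mask = (1 << bits) - 1
    return (ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]) & mask


rng = random.Random(0)
img = [[rng.randrange(256) for _ in range(40)] for _ in range(40)]  # 8-bit pixels
ii16 = integral_image(img, 16)
exact = sum(img[y][x] for y in range(5, 15) for x in range(7, 17))
print(rect_sum(ii16, 7, 5, 17, 15, 16) == exact)   # True
```

A detector only needs the word to be wide enough for its largest rectangle, not for the sum of the whole image.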
TL;DR: In this paper, the propagation of rounding errors in large-eddy simulation is studied, and it is shown that instantaneous flowfields produced by large-eddy simulation are partially controlled by these rounding errors and depend on multiple parameters.
Abstract: This paper studies the propagation of rounding errors in large-eddy simulation and shows that instantaneous flowfields produced by large-eddy simulation are partially controlled by these rounding errors and depend on multiple parameters: number of processors used for parallel simulation (even in an explicit code), changes in initial conditions (even of the order of machine accuracy), machine precision (single, double, or quadruple), etc. Using a laminar Poiseuille pipe flow, a fully developed turbulent channel flow, and a complex burner geometry as test cases, results show that only turbulent flows exhibit a high sensitivity to these parameters. These results confirm that large-eddy simulation reflects the true nature of turbulence insofar as it may exponentially amplify infinitely small perturbations on initial conditions in time. However, they highlight an often overlooked limitation of large-eddy simulation in terms of validation and prediction of unsteady phenomena.
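The number of processors, one of the parameters listed in this abstract, can change results even without any physics, because floating-point addition is not associative. This minimal sketch, a hypothetical `chunked_sum` rather than code from the paper, splits a sum across a given number of workers and shows that the same data yields different totals. A turbulent simulation could then amplify such differences over time, as the abstract reports.

```python
def chunked_sum(values, nproc):
    """Add `values` as nproc workers might: each worker sums one contiguous chunk
    left to right, then the partial sums are added left to right. Explicit loops
    are used because the built-in sum() compensates float rounding errors since
    Python 3.12."""
    size = -(-len(values) // nproc)           # ceiling division: chunk length
    partials = []
    for start in range(0, len(values), size):
        acc = 0.0
        for v in values[start:start + size]:
            acc += v
        partials.append(acc)
    total = 0.0
    for part in partials:
        total += part
    return total


data = [1e16, 1.0, -1e16, 1.0]           # exact sum: 2
print(chunked_sum(data, 1), chunked_sum(data, 2))   # 1.0 0.0
```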
TL;DR: Optimal schemes are presented for allocating bits of fine-grained scalable video sequences among multiple senders streaming to a single receiver, together with a heuristic algorithm that produces near-optimal solutions for the multiple-frame case and runs an order of magnitude faster than the optimal one.
Abstract: We present optimal schemes for allocating bits of fine-grained scalable video sequences among multiple senders streaming to a single receiver. This allocation problem is critical in optimizing the perceived quality in peer-to-peer and distributed multi-server streaming environments. Senders in such environments are heterogeneous in their outgoing bandwidth and they hold different portions of the video stream. We first formulate and optimally solve the problem for individual frames, then we generalize to the multiple frame case. Specifically, we formulate the allocation problem as an optimization problem, which is nonlinear in general. We use rate-distortion models in the formulation to achieve the minimum distortion in the rendered video, constrained by the outgoing bandwidth of senders, availability of video data at senders, and incoming bandwidth of receiver. We show how the adopted rate-distortion models transform the nonlinear problem into an integer linear programming (ILP) problem. We then design a simple rounding scheme that transforms the ILP problem into a linear programming (LP) one, which can be solved efficiently using common optimization techniques such as the Simplex method. We prove that our rounding scheme always produces a feasible solution, and the solution is within a negligible margin from the optimal solution. We also propose a new algorithm (FGSAssign) for the single-frame allocation problem that runs in O(n log n) steps, where n is the number of senders. We prove that FGSAssign is optimal. Furthermore, we propose a heuristic algorithm (mFGSAssign) that produces near-optimal solutions for the multiple-frame case, and runs an order of magnitude faster than the optimal one. Because of its short running time, mFGSAssign can be used in real time. Our experimental study validates our analytical results and shows the effectiveness of our allocation algorithms in improving the video quality.
TL;DR: The main result is a construction with surface area $O(\sqrt{d})$, matching the lower bound up to a constant factor of $2\sqrt{2\pi/e} \approx 3$, and it is shown that the bounds are optimal within constant factors for rectangular lattices.
Abstract: What is the least surface area of a shape that tiles $\mathbb{R}^d$ under translations by $\mathbb{Z}^d$? Any such shape must have volume 1 and hence surface area at least that of the volume-1 ball, namely $\Omega(\sqrt{d})$. Our main result is a construction with surface area $O(\sqrt{d})$, matching the lower bound up to a constant factor of $2\sqrt{2\pi/e} \approx 3$. The best previous tile known was only slightly better than the cube, having surface area on the order of $d$. We generalize this to give a construction that tiles $\mathbb{R}^d$ by translations of any full rank discrete lattice $\Lambda$ with surface area $2\pi\|V^{-1}\|_{\mathrm{FB}}$, where $V$ is the matrix of basis vectors of $\Lambda$, and $\|\cdot\|_{\mathrm{FB}}$ denotes the Frobenius norm. We show that our bounds are optimal within constant factors for rectangular lattices. Our proof is via a random tessellation process, following recent ideas of Raz in the discrete setting. Our construction gives an almost optimal noise-resistant rounding scheme to round points in $\mathbb{R}^d$ to rectangular lattice points.
TL;DR: In this paper, the problem of finding tight affine lower bound functions for multivariate polynomials, which may be employed when global optimisation problems involving polynomial are solved with a branch and bound method, is addressed.
Abstract: This paper addresses the problem of finding tight affine lower bound functions for multivariate polynomials, which may be employed when global optimisation problems involving polynomials are solved with a branch and bound method. These bound functions are constructed by using the expansion of the given polynomial into Bernstein polynomials. The coefficients of this expansion over a given box yield a control point structure whose convex hull contains the graph of the given polynomial over the box. We introduce a new method for computing tight affine lower bound functions based on these control points, using a linear least squares approximation of the entire control point structure. This is demonstrated to have superior performance to previous methods based on a linear interpolation of certain specially chosen control points. The problem of how to obtain a verified affine lower bound function in the presence of uncertainty and rounding errors is also considered. Numerical results with error bounds for a series of randomly-generated polynomials are given.
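A univariate version of the least-squares idea on [0, 1] is sketched below; the paper treats multivariate polynomials over boxes. The sketch computes the Bernstein coefficients $b_i$, fits a least-squares line through the control points $(i/n, b_i)$, and shifts it down until it lies below every control point. By the convex hull property, the shifted line is then a valid affine lower bound for the polynomial. The shift-down step is an assumption about how the fitted line is made valid, not necessarily the paper's exact construction. Exact rationals sidestep the rounding-error question the abstract handles separately, and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def bernstein_coefficients(a):
    """Bernstein coefficients on [0, 1] of p(x) = sum(a[j] * x**j), degree n = len(a) - 1 >= 1."""
    n = len(a) - 1
    return [sum(Fraction(comb(i, j), comb(n, j)) * a[j] for j in range(i + 1))
            for i in range(n + 1)]


def affine_lower_bound(a):
    """Fit a least-squares line to the control points (i/n, b_i), then shift it down
    until it lies below every control point. Returns (c0, c1) such that
    c0 + c1*x <= p(x) on [0, 1], by the convex hull property."""
    b = bernstein_coefficients(a)
    n = len(b) - 1
    xs = [Fraction(i, n) for i in range(n + 1)]
    xm, ym = sum(xs) / len(xs), sum(b) / len(b)
    c1 = (sum((x - xm) * (y - ym) for x, y in zip(xs, b))
          / sum((x - xm) ** 2 for x in xs))
    c0 = ym - c1 * xm
    shift = max(c0 + c1 * x - y for x, y in zip(xs, b))   # largest violation
    return c0 - shift, c1


a = [Fraction(0), Fraction(3), Fraction(-5), Fraction(1)]   # p(x) = 3x - 5x^2 + x^3
c0, c1 = affine_lower_bound(a)
print(c0, c1)
```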
TL;DR: A signal consisting of a sequence of discrete values, with a first frequency band of high signal energy and a second frequency band of low signal energy, is first manipulated to obtain a sequence of manipulated values, which is then rounded so that the resulting rounding error has a spectrum with higher energy in the first frequency band than in the second.
Abstract: FIELD: physics. SUBSTANCE: the invention relates to processing signals in the form of successive values, e.g., audio or video signal samples, and is especially suitable for lossless coding applications. When processing a signal containing a sequence of discrete values with a first frequency band of high signal energy and a second frequency band of low signal energy, the sequence of discrete values is first manipulated (202) to obtain a sequence of manipulated values such that at least one of the manipulated values is not an integer. The sequence of manipulated values is then rounded (204) to obtain a sequence of rounded manipulated values. The rounding is performed so that the resulting rounding error has a spectrum with higher energy in the first frequency band than in the second frequency band. EFFECT: particularly efficient coding. 19 cl, 24 dwg
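The general idea of steering where rounding error lands in frequency can be illustrated with a classic technique that is not claimed to be the patent's own: first-order error-feedback rounding (noise shaping). The hypothetical helper `shaped_round` feeds back the previous rounding error scaled by `h`. With `h = +1`, the total error is shaped by 1 + z^-1, which vanishes at the Nyquist frequency and so concentrates the error at low frequencies, where this test signal's energy lies. The manipulation step (202) is not modeled.

```python
import math


def shaped_round(x, h=1.0):
    """Round a sequence to integers with first-order error feedback.

    The previous rounding error, scaled by h, is added before each rounding, so
    the total error is e[n] + h*e[n-1], i.e. white error filtered by 1 + h*z^-1.
    h = +1 concentrates the error at low frequencies, h = -1 at high ones."""
    y, prev_err = [], 0.0
    for v in x:
        u = v + h * prev_err
        q = round(u)
        prev_err = q - u        # e[n], at most 0.5 in magnitude
        y.append(q)
    return y


N = 256
# test signal whose energy sits at low frequencies
x = [40.0 * math.sin(2 * math.pi * 3 * n / N) + 0.3 for n in range(N)]
y = shaped_round(x, h=1.0)
err = [yi - xi for yi, xi in zip(y, x)]
# error component at the Nyquist frequency (alternating sum): telescopes to |e[N-1]| <= 0.5
print(abs(sum(e * (-1) ** n for n, e in enumerate(err))))
```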
TL;DR: The authors' approximation ratio extends to the minimum cardinality Steiner network problem, where k denotes the average vertex demand, and the algorithm exploits rounding properties of the first two linear programs in iterated rounding.
Abstract: We present the best known algorithms for approximating the minimum cardinality undirected $k$-edge connected spanning subgraph. For simple graphs our approximation ratio is $1 + \frac{1}{2k} + O(1/k^2)$. The more precise version of our bound requires $k \ge 7$, and for all such $k$ it improves the longstanding bound of Cheriyan and Thurimella, $1 + \frac{2}{k+1}$ [2]. The improvement comes in two steps: first we show that for simple $k$-edge connected graphs, any laminar family of degree-$k$ sets is smaller than the general bound ($n(1 + 3/k + O(1/(k\sqrt{k})))$ versus $2n$). This immediately implies that iterated rounding improves the bound of [2]. Our second step improves iterated rounding by finding good edges for rounding. For multigraphs our approximation ratio is $1 + \frac{21}{11k}$. This improves the previous bound $1 + 2/k$ [6]. It is of interest since it is known that for some constant $c > 0$, an approximation ratio $\le 1 + c/k$ implies P = NP. Our approximation ratio extends to the minimum cardinality Steiner network problem, where $k$ denotes the average vertex demand. The algorithm exploits rounding properties of the first two linear programs in iterated rounding.
TL;DR: A novel fragile watermarking scheme based on the discrete cosine transform (DCT) using the particle swarm optimization (PSO) algorithm is presented, and results show the feasibility of employing PSO for watermarking and the accuracy of this novel method.
Abstract: In this paper, a novel fragile watermarking scheme based on the discrete cosine transform (DCT) using the particle swarm optimization (PSO) algorithm is presented. Embedding watermarks in the frequency domain can usually be achieved by modifying the least significant bits (LSBs) of the transform coefficients. After the embedding process is completed, a number of rounding errors appear because real numbers are converted into integers when the image is transformed from the frequency domain back to the spatial domain. A population-based stochastic optimization technique (PSO) is proposed to correct these rounding errors. Simulation results show the feasibility of employing PSO for watermarking and the accuracy of this novel method.
TL;DR: In this paper, an active-set line-search Newton method is presented for solving large-scale instances of a class of multiple material minimum compliance problems, which are modeled with a convex objective function and linear constraints.
Abstract: This paper presents an implementation of an active-set line-search Newton method intended for solving large-scale instances of a class of multiple material minimum compliance problems. The problem is modeled with a convex objective function and linear constraints. At each iteration of the Newton method, one or two linear saddle point systems are solved. These systems involve the Hessian of the objective function, which is both expensive to compute and completely dense. Therefore, the linear algebra is arranged such that the Hessian is not explicitly formed. The main concern is to solve a sequence of closely related problems appearing as the continuous relaxations in a nonlinear branch and bound framework for solving discrete minimum compliance problems. A test-set consisting of eight discrete instances originating from the design of laminated composite structures is presented. Computational experiments with a branch and bound method indicate that the proposed Newton method can, on most instances in the test-set, take advantage of the available starting point information in an enumeration tree and resolve the relaxations after branching with few additional function evaluations. Discrete feasible designs are obtained by a rounding heuristic. Designs with provably good objective functions are presented.
TL;DR: A new algorithm for reducing the effect of coordination constraints on relays' operating times by optimal choice of pickup currents using linear programming is presented; it yields a substantial reduction in operating times compared with a linear programming approach that does not consider this effect.
Abstract: We present a new algorithm for reducing the effect of coordination constraints on relays' operating times by optimal choice of pickup currents using linear programming. Consequently, the time multiplier settings were significantly reduced. This algorithm was tested on the 8-bus system and the IEEE 14-bus network. Its efficiency was investigated on a coordination problem solved in three different cases. In the first, the linear programming technique was used, but the effect of coordination constraints on operating times was not considered. In the second, a nonlinear method was used to optimize the relay settings. In the third, the effect of coordination constraints was reduced by applying the proposed algorithm, which resulted in a substantial reduction in operating times compared with the first method. Compared with nonlinear methods, the new algorithm gives almost the same objective function value; however, it does not require rounding the pickup currents to their nearest available values.
TL;DR: The experiments show that diffusion rounding has an asymmetric characteristic for $I_{\mathrm{off}}$ due to the differing significance of source/drain junctions on device threshold voltage, so simple weighting function models for $I_{\mathrm{on}}$ and $I_{\mathrm{off}}$ are proposed to account for the diffusion rounding effects.
Abstract: Due to aggressive scaling of device feature size to improve circuit performance in the sub-wavelength lithography regime, both diffusion and poly gate shapes are no longer rectilinear. Diffusion rounding occurs most notably where the diffusion shapes are not perfectly rectangular, including common L and T-shaped diffusion layouts to connect to power rails. This paper investigates the impact of the non-rectilinear shape of diffusion (i.e., sloped diffusion or diffusion rounding) on circuit performance (delay and leakage). Simple weighting function models for $I_{\mathrm{on}}$ and $I_{\mathrm{off}}$ to account for the diffusion rounding effects are proposed, and compared with TCAD simulation. Our experiments show that diffusion rounding has an asymmetric characteristic for $I_{\mathrm{off}}$ due to the differing significance of source/drain junctions on device threshold voltage. Therefore, we can model $I_{\mathrm{on}}$ and $I_{\mathrm{off}}$ as a function of slope angle and direction. The proposed models match well with TCAD simulation results, with less than 2% and 6% error in $I_{\mathrm{on}}$ and $I_{\mathrm{off}}$, respectively.
TL;DR: In this article, the authors studied integer rounding properties of various systems of linear inequalities to gain insight about the algebraic properties of Rees algebras of monomial ideals and monomial subrings.
Abstract: The aim of this paper is to study integer rounding properties of various systems of linear inequalities to gain insight about the algebraic properties of Rees algebras of monomial ideals and monomial subrings. We study the normality and Gorenstein property—as well as the canonical module and the a-invariant—of Rees algebras and subrings arising from systems with the integer rounding property. We relate the algebraic properties of Rees algebras and monomial subrings with integer rounding properties and present a duality theorem.
TL;DR: The R22SDF was more efficient than the R4SDC in terms of throughput per area due to a simpler controller and an easier balanced rounding scheme, and it is shown that balanced stage rounding is an appropriate rounding scheme for pipeline FFT processors.
Abstract: This paper presents optimized implementations of two different pipeline FFT processors on Xilinx Spartan-3 and Virtex-E FPGAs. Different optimization techniques and rounding schemes were explored. The implementation results achieved better performance with lower resource usage than prior art. The 16-bit 1024-point FFT with the R22SDF architecture had a maximum clock frequency of 95.2 MHz and used 2802 slices on the Spartan-3, a throughput per area ratio of 0.034 Msamples/s/slice. The R4SDC architecture ran at 123.8 MHz and used 4409 slices on the Spartan-3, a throughput per area ratio of 0.028 Msamples/s/slice. The R22SDF was more efficient than the R4SDC in terms of throughput per area due to a simpler controller and an easier balanced rounding scheme. This paper also shows that balanced stage rounding is an appropriate rounding scheme for pipeline FFT processors.
TL;DR: In this article, an approximate solution method based on the particle swarm optimization proposed by Kennedy et al. is proposed for nonlinear integer programming problems; it is made applicable to discrete optimization problems by incorporating a new method for finding initial search points, the rounding of values obtained by the move scheme, and the revision of move methods.
Abstract: In this research, focusing on nonlinear integer programming problems, we propose an approximate solution method based on the particle swarm optimization proposed by Kennedy et al. To be more specific, we develop a new particle swarm optimization method that is applicable to discrete optimization problems by incorporating a new method for generating initial search points, the rounding of values obtained by the move scheme, and the revision of move methods. Furthermore, we show the efficiency of the proposed particle swarm optimization method by comparing it with an existing method on numerical examples.
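The rounding step can be shown in a plain particle swarm for integer problems. This sketch rounds every position to the nearest integer (and clamps it to the box) after each move. It does not reproduce the paper's method for generating initial search points or its revised move schemes. The function `integer_pso`, its parameter defaults, and the toy objective are hypothetical.

```python
import random


def integer_pso(f, lo, hi, dim, particles=20, iters=100, seed=0,
                w=0.7, c1=1.5, c2=1.5):
    """Minimise f over integer vectors in [lo, hi]**dim with a particle swarm
    whose positions are rounded to integers after every move. The inertia w and
    acceleration constants c1, c2 are illustrative defaults."""
    rng = random.Random(seed)

    def clamp(z):
        return min(hi, max(lo, z))

    xs = [[rng.randint(lo, hi) for _ in range(dim)] for _ in range(particles)]
    vs = [[0.0] * dim for _ in range(particles)]
    pbest = [x[:] for x in xs]                 # best position seen by each particle
    pval = [f(x) for x in xs]
    g = min(range(particles), key=lambda i: pval[i])
    gbest, gval = pbest[g][:], pval[g]         # best position seen by the swarm
    for _ in range(iters):
        for i in range(particles):
            for d in range(dim):
                vs[i][d] = (w * vs[i][d]
                            + c1 * rng.random() * (pbest[i][d] - xs[i][d])
                            + c2 * rng.random() * (gbest[d] - xs[i][d]))
                # rounding step: keep every position on the integer lattice
                xs[i][d] = clamp(round(xs[i][d] + vs[i][d]))
            val = f(xs[i])
            if val < pval[i]:
                pbest[i], pval[i] = xs[i][:], val
                if val < gval:
                    gbest, gval = xs[i][:], val
    return gbest, gval


def objective(x):
    return (x[0] - 3) ** 2 + (x[1] + 2) ** 2


best, value = integer_pso(objective, -10, 10, dim=2)
print(best, value)
```

Because the swarm's best value is only ever replaced by a smaller one, the result is never worse than the best initial particle; nothing in this sketch guarantees the global optimum.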
TL;DR: A new method for power-of-two quantization is presented that uses a companded delta modulation structure to perform the quantization and achieves performance comparable to that of full-precision adaptive filters.