TL;DR: The results show that deep learning training using BFLOAT16 tensors achieves the same state-of-the-art (SOTA) results across domains as FP32 tensors in the same number of iterations and with no changes to hyper-parameters.
Abstract: This paper presents the first comprehensive empirical study demonstrating the efficacy of the Brain Floating Point (BFLOAT16) half-precision format for Deep Learning training across image classification, speech recognition, language modeling, generative networks and industrial recommendation systems. BFLOAT16 is attractive for Deep Learning training for two reasons: the range of values it can represent is the same as that of IEEE 754 floating-point format (FP32) and conversion to/from FP32 is simple. Maintaining the same range as FP32 is important to ensure that no hyper-parameter tuning is required for convergence; e.g., IEEE 754 compliant half-precision floating point (FP16) requires hyper-parameter tuning. In this paper, we discuss the flow of tensors and various key operations in mixed precision training, and delve into details of operations, such as the rounding modes for converting FP32 tensors to BFLOAT16. We have implemented a method to emulate BFLOAT16 operations in Tensorflow, Caffe2, IntelCaffe, and Neon for our experiments. Our results show that deep learning training using BFLOAT16 tensors achieves the same state-of-the-art (SOTA) results across domains as FP32 tensors in the same number of iterations and with no changes to hyper-parameters.
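As a rough illustration of the FP32-to-BFLOAT16 conversion the abstract mentions, the sketch below emulates BFLOAT16 by keeping only the upper 16 bits of an FP32 bit pattern. It shows two possible rounding choices, truncation and round-to-nearest-even; the abstract does not say which mode the authors adopt, and the helper names (`to_bf16_truncate`, `to_bf16_rne`) are hypothetical and not taken from the paper's emulation code.

```python
import struct

def fp32_bits(x: float) -> int:
    """Return the 32-bit pattern of x after conversion to FP32."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_to_fp32(bits: int) -> float:
    """Interpret a 32-bit pattern as an FP32 value."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]

def to_bf16_truncate(x: float) -> float:
    """Emulate BFLOAT16 by dropping the low 16 bits (round toward zero)."""
    return bits_to_fp32(fp32_bits(x) & 0xFFFF0000)

def to_bf16_rne(x: float) -> float:
    """Emulate BFLOAT16 with round-to-nearest, ties to even."""
    bits = fp32_bits(x)
    is_nan = (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF) != 0
    if is_nan:
        # Keep NaN a NaN: force a mantissa bit that survives truncation.
        return bits_to_fp32((bits | 0x00400000) & 0xFFFF0000)
    lsb = (bits >> 16) & 1           # lowest bit that will be kept
    rounded = bits + 0x7FFF + lsb    # adding the bias implements ties-to-even
    return bits_to_fp32(rounded & 0xFFFF0000)

if __name__ == "__main__":
    x = 1.0 + 3 * 2.0 ** -8          # exactly halfway between two BF16 values
    print(to_bf16_truncate(x), to_bf16_rne(x))
```

Because BFLOAT16 shares the FP32 exponent field, the conversion is a pure mantissa operation, which is the simplicity the abstract refers to.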
TL;DR: This paper proposes a possible implementation of a BF16 multiply-accumulation operation that relaxes several IEEE Floating-Point Standard features to afford low-cost hardware implementations and shows that this approach achieves the same network-level accuracy as using IEEE single-precision arithmetic ("FP32") for less than half the datapath area cost and with greater throughput.
Abstract: Bfloat16 ("BF16") is a new floating-point format tailored specifically for high-performance processing of Neural Networks and will be supported by major CPU and GPU architectures as well as Neural Network accelerators. This paper proposes a possible implementation of a BF16 multiply-accumulation operation that relaxes several IEEE Floating-Point Standard features to afford low-cost hardware implementations. Specifically, subnorms are flushed to zero; only one non-standard rounding mode (Round-Odd) is supported; NaNs are not propagated; and IEEE exception flags are not provided. The paper shows that this approach achieves the same network-level accuracy as using IEEE single-precision arithmetic ("FP32") for less than half the datapath area cost and with greater throughput.
TL;DR: The approach is in the spirit of other continuous computation packages such as Chebfun, and yields an algorithm which requires the computation of "continuous" matrix factorizations such as the LU and QR decompositions of vector-valued functions.
TL;DR: An optimized 16-bit format that has 6 exponent bits and 9 fraction bits, derived from a study of the range of values encountered in DL applications, that preserves the accuracy of DL networks and enables realization of a compact power-efficient computation engine.
Abstract: The resilience of Deep Learning (DL) training and inference workloads to low-precision computations, coupled with the demand for power- and area-efficient hardware accelerators for these workloads, has led to the emergence of 16-bit floating point formats as the precision of choice for DL hardware accelerators. This paper describes our optimized 16-bit format that has 6 exponent bits and 9 fraction bits, derived from a study of the range of values encountered in DL applications. We demonstrate that our format preserves the accuracy of DL networks, and we compare its ease-of-use for DL against IEEE-754 half-precision (5 exponent bits and 10 fraction bits) and bfloat16 (8 exponent bits and 7 fraction bits). Further, our format eliminates sub-normals and simplifies rounding modes and handling of corner cases. This streamlines floating-point unit logic and enables realization of a compact power-efficient computation engine.
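To make the trade-off between the three 16-bit layouts concrete, the following sketch derives the relative precision and approximate dynamic range from the exponent and fraction widths quoted in the abstract. It assumes an IEEE-style bias and a reserved top exponent code for every format; the proposed 6/9 format simplifies corner cases, so its true limits may differ from what this assumption gives. The function name `format_summary` is hypothetical.

```python
def format_summary(exp_bits: int, frac_bits: int) -> dict:
    """Summarize a binary floating-point layout under IEEE-style assumptions."""
    bias = 2 ** (exp_bits - 1) - 1
    # Assumption: the all-ones exponent code is reserved (as in IEEE 754).
    max_exp = (2 ** exp_bits - 2) - bias
    eps = 2.0 ** -frac_bits                  # spacing of values just above 1.0
    return {
        "eps": eps,
        "max_normal": (2.0 - eps) * 2.0 ** max_exp,
        "min_normal": 2.0 ** (1 - bias),
    }

if __name__ == "__main__":
    layouts = {
        "IEEE half (5/10)": (5, 10),
        "bfloat16 (8/7)": (8, 7),
        "proposed (6/9)": (6, 9),
    }
    for name, (e, f) in layouts.items():
        s = format_summary(e, f)
        print(f"{name:18s} eps={s['eps']:.3e} "
              f"min={s['min_normal']:.3e} max={s['max_normal']:.3e}")
```

Under these assumptions the 6/9 layout sits between the other two: one fewer fraction bit than IEEE half precision in exchange for a wider exponent range, and two more fraction bits than bfloat16.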
TL;DR: In this article, a quasi-polynomial-time O(log^2 k / log log k)-approximation algorithm for the Directed Steiner Tree (DST) problem is presented.
Abstract: In the Directed Steiner Tree (DST) problem we are given an n-vertex directed edge-weighted graph, a root r, and a collection of k terminal nodes. Our goal is to find a minimum-cost subgraph that contains a directed path from r to every terminal. We present an O(log^2 k / log log k)-approximation algorithm for DST that runs in quasi-polynomial time, i.e., in time n^polylog(k). By making standard complexity assumptions, we show the matching lower bound of Omega(log^2 k / log log k) for the class of quasi-polynomial-time algorithms, meaning that our approximation ratio is asymptotically the best possible. This is the first improvement on the DST problem since the classical quasi-polynomial-time O(log^3 k)-approximation algorithm by Charikar et al. [SODA’98; J. Algorithms’99]. (The paper erroneously claims an O(log^2 k) approximation due to a mistake in prior work.) Our approach is based on two main ingredients. First, we derive an approximation-preserving reduction to the Group Steiner Tree on Trees with Dependency Constraint (GSTTD) problem. Compared to the classic Group Steiner Tree on Trees problem, in GSTTD we are additionally given some dependency constraints among the nodes in the output tree that must be satisfied. The GSTTD instance has quasi-polynomial size and logarithmic height. We remark that, in contrast, Zelikovsky’s height-reduction theorem [Algorithmica’97] used in all prior work on DST achieves a reduction to a tree instance of the related Group Steiner Tree (GST) problem of similar height, however losing a logarithmic factor in the approximation ratio. Our second ingredient is an LP-rounding algorithm to approximately solve GSTTD instances, which is inspired by the framework developed by [Rothvoß, Preprint’11; Friggstad et al., IPCO’14]. We consider a Sherali-Adams lifting of a proper LP relaxation of GSTTD. Our rounding algorithm proceeds level by level from the root to the leaves, rounding and conditioning each time on a proper subset of label variables. The limited height of the tree and small number of labels on root-to-leaf paths guarantee that a small enough (namely, polylogarithmic) number of Sherali-Adams lifting levels is sufficient to condition up to the leaves. We believe that our basic strategy of combining label-based reductions with a round-and-condition type of LP-rounding over hierarchies might find applications to other related problems.
TL;DR: In this paper, the Izhikevich neuron model is used to demonstrate that rounding has an important role in producing accurate spike timings from explicit ODE solution algorithms, and fixed-point arithmetic with stochastic rounding consistently results in smaller errors compared to single-precision floating-point and fixed-point arithmetic with round-to-nearest across a range of neuron behaviours and ODE solvers.
Abstract: Although double-precision floating-point arithmetic currently dominates high-performance computing, there is increasing interest in smaller and simpler arithmetic types. The main reasons are potential improvements in energy efficiency and memory footprint and bandwidth. However, simply switching to lower-precision types typically results in increased numerical errors. We investigate approaches to improving the accuracy of reduced-precision fixed-point arithmetic types, using examples in an important domain for numerical computation in neuroscience: the solution of Ordinary Differential Equations (ODEs). The Izhikevich neuron model is used to demonstrate that rounding has an important role in producing accurate spike timings from explicit ODE solution algorithms. In particular, fixed-point arithmetic with stochastic rounding consistently results in smaller errors compared to single precision floating-point and fixed-point arithmetic with round-to-nearest across a range of neuron behaviours and ODE solvers. A computationally much cheaper alternative is also investigated, inspired by the concept of dither that is a widely understood mechanism for providing resolution below the least significant bit (LSB) in digital signal processing. These results will have implications for the solution of ODEs in other subject areas, and should also be directly relevant to the huge range of practical problems that are represented by Partial Differential Equations (PDEs).
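The difference between round-to-nearest and stochastic rounding can be seen in a toy accumulation, sketched below. This is not the paper's ODE setup: it assumes a simple fixed-point grid with `frac_bits` fractional bits and repeatedly adds an increment smaller than half the grid spacing, which is the situation where round-to-nearest silently loses every update. All function names are hypothetical.

```python
import math
import random

def round_nearest(x: float, frac_bits: int) -> float:
    """Round x to the nearest multiple of 2**-frac_bits (ties round up)."""
    scale = 1 << frac_bits
    return math.floor(x * scale + 0.5) / scale

def round_stochastic(x: float, frac_bits: int, rng: random.Random) -> float:
    """Round x up with probability equal to its fractional residue."""
    scale = 1 << frac_bits
    scaled = x * scale
    low = math.floor(scaled)
    p_up = scaled - low              # distance to the lower grid point, in [0, 1)
    return (low + (1 if rng.random() < p_up else 0)) / scale

def accumulate(step: float, n: int, frac_bits: int, rounder) -> float:
    """Add `step` n times, rounding the running sum after every addition."""
    acc = 0.0
    for _ in range(n):
        acc = rounder(acc + step, frac_bits)
    return acc

if __name__ == "__main__":
    rng = random.Random(0)
    n, step, bits = 10_000, 0.001, 8     # grid spacing 2**-8, larger than step
    rn = accumulate(step, n, bits, round_nearest)
    sr = accumulate(step, n, bits, lambda x, b: round_stochastic(x, b, rng))
    print("exact:", n * step, "round-to-nearest:", rn, "stochastic:", sr)
```

Stochastic rounding is unbiased in expectation, so the accumulated sum stays close to the exact value, whereas round-to-nearest stalls at zero in this setting.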
TL;DR: QPyTorch is a low-precision arithmetic simulation framework built natively in PyTorch that supports a variety of combinations of precisions, number formats, and rounding options.
Abstract: Low-precision training reduces computational cost and produces efficient models. Recent research in developing new low-precision training algorithms often relies on simulation to empirically evaluate the statistical effects of quantization while avoiding the substantial overhead of building specific hardware. To support this empirical research, we introduce QPyTorch, a low-precision arithmetic simulation framework. Built natively in PyTorch, QPyTorch provides a convenient interface that minimizes the efforts needed to reliably convert existing codes to study low-precision training. QPyTorch is general, and supports a variety of combinations of precisions, number formats, and rounding options. Additionally, it leverages an efficient fused-kernel approach to reduce simulator overhead, which enables simulation of large-scale, realistic problems. QPyTorch is publicly available at https://github.com/Tiiiger/QPyTorch.
TL;DR: This work considers an orthogonal frequency division multiple access (OFDMA)-based Multi-access Edge Computing (MEC) system, consisting of one serving node and multiple users, each with an inelastic computation task of a non-negligible task processing duration and a non-negligible computation result size.
Abstract: We consider an orthogonal frequency division multiple access (OFDMA)-based Multi-access Edge Computing (MEC) system, consisting of one serving node and multiple users each with an inelastic computation task of a non-negligible task processing duration and a non-negligible computation result size. A joint uplink/downlink sub-channel, bit and time allocation problem is investigated to minimize the energy consumption, which happens to be a very challenging non-convex mixed integer nonlinear programming (MINLP) problem. We equivalently convert it into a convex MINLP problem by using the McCormick envelope, and develop two low-complexity algorithms to obtain two suboptimal solutions. Specifically, one is based on continuous relaxation with greedy rounding and the other is based on the penalty convex–concave procedure. Simulation results show the advantages of our suboptimal solutions.
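For reference, the standard McCormick envelope replaces a bilinear term $w = xy$ with four linear inequalities. The abstract does not say which products in the allocation problem are treated this way, so the symbols $x$, $y$, $w$ and the bounds $x^L, x^U, y^L, y^U$ below are generic placeholders rather than the paper's notation.

```latex
% McCormick envelope of w = x y on the box x^L <= x <= x^U, y^L <= y <= y^U
\begin{aligned}
w &\ge x^L y + x\, y^L - x^L y^L, \\
w &\ge x^U y + x\, y^U - x^U y^U, \\
w &\le x^U y + x\, y^L - x^U y^L, \\
w &\le x^L y + x\, y^U - x^L y^U .
\end{aligned}
```

When one factor is binary, say $x \in \{0, 1\}$ with $x^L = 0$ and $x^U = 1$, the four inequalities force $w = 0$ at $x = 0$ and $w = y$ at $x = 1$, so the reformulation is exact rather than a relaxation. This is one common way such a conversion can be equivalent, as the abstract claims.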
TL;DR: This work considers an assortment optimization problem where a customer chooses a single item from a sequence of sets shown to her, while limited inventories constrain the items offered to customers over time, and derives a polynomial-time approximation algorithm which earns at least 1-ln(2-1/e), or 0.51, of the optimum.
Abstract: We consider an assortment optimization problem where a customer chooses a single item from a sequence of sets shown to her, while limited inventories constrain the items offered to customers over time. In the special case where all of the assortments have size one, our problem captures the online stochastic matching with timeouts problem. For this problem, we derive a polynomial-time approximation algorithm which earns at least 1-ln(2-1/e), or 0.51, of the optimum. This improves upon the previous-best approximation ratio of 0.46, and furthermore, we show that it is tight. For the general assortment problem, we establish the first constant-factor approximation ratio of 0.09 for the case that different types of customers value items differently, and an approximation ratio of 0.15 for the case that different customers value each item the same. Our algorithms are based on rounding an LP relaxation for multi-stage assortment optimization, and improve upon previous randomized rounding schemes to derive the tight ratio of 1-ln(2-1/e).
TL;DR: The experimental results show that the proposed scheme achieved performance that was superior to the state-of-the-art schemes, resulting in embedding rates up to 1.40 bpp and average PSNRs of approximately 47.60 dB.
TL;DR: This work performs an in-depth analysis of the parameters of quantization-aware training (the process of simulating precision loss in the forward pass by quantizing and dequantizing tensors), including the locations of precision loss simulation, to evaluate how they affect the accuracy of deep neural networks aimed at efficient computation on resource-constrained devices.
Abstract: This paper focuses on the convolutional neural network quantization problem. Quantization has a distinct stage of data conversion from floating-point into integer numbers. In general, the process of quantization is associated with the reduction of the matrix dimension via limited precision of the numbers. However, the training and inference stages of deep learning neural networks are limited by memory space and a variety of factors including programming complexity and even reliability of the system. On the whole, quantization is becoming more and more popular due to its significant impact on performance and minimal accuracy loss. Various techniques for network quantization have already been proposed, including quantization-aware training and integer-arithmetic-only inference. Yet a detailed comparison of various quantization configurations, combining all proposed methods, has not been presented. This comparison is important for understanding how to select quantization hyperparameters during training to optimize networks for inference while preserving their robustness. In this work, we perform an in-depth analysis of parameters in quantization-aware training, the process of simulating precision loss in the forward pass by quantizing and dequantizing tensors. Specifically, we modify rounding modes, input preprocessing, output data signedness, bitwidth of the quantization and locations of precision loss simulation to evaluate how they affect the accuracy of deep neural networks aimed at performing efficient calculations on resource-constrained devices.
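The quantize-then-dequantize step described above can be sketched in a few lines. The version below is a minimal illustration, not the authors' implementation: it assumes a symmetric per-tensor scale taken from the largest absolute value, and the function name `fake_quantize` and its parameters (`num_bits`, `signed`, `rounding`) are hypothetical stand-ins for the bitwidth, signedness and rounding-mode knobs the abstract lists.

```python
import math

def fake_quantize(values, num_bits=8, signed=True, rounding="nearest_even"):
    """Quantize to an integer grid, then map back to floats (precision loss only)."""
    if signed:
        qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    else:
        qmin, qmax = 0, 2 ** num_bits - 1
    # Assumption: symmetric scale chosen from the largest magnitude in the tensor.
    max_abs = max(abs(v) for v in values) or 1.0
    scale = max_abs / qmax
    rounders = {
        "nearest_even": round,                       # Python's round: ties to even
        "half_up": lambda t: math.floor(t + 0.5),    # ties toward +infinity
        "floor": math.floor,                         # always round down
    }
    rnd = rounders[rounding]
    out = []
    for v in values:
        q = min(max(rnd(v / scale), qmin), qmax)     # round, then clamp to range
        out.append(q * scale)                        # dequantize
    return out

if __name__ == "__main__":
    x = [-1.0, -0.3, 0.0, 0.26, 0.9]
    for mode in ("nearest_even", "half_up", "floor"):
        print(mode, fake_quantize(x, num_bits=4, rounding=mode))
```

With an unsigned range, negative inputs clamp to zero, which is one reason output signedness interacts with where the precision-loss simulation is placed in a network.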
TL;DR: This article reformulates the problem so that convergence of the state vector can be traded against increasing switching costs, which allows known convergence properties of previous approaches for Mixed-Integer Optimal Control approximations to be conserved.
Abstract: This article investigates a class of Mixed-Integer Optimal Control Problems (MIOCPs) with switching costs. We introduce the problem class of Minimal-Switching-Cost Optimal Control Problems (MSCP) with an objective function that consists of two summands, a continuous term depending on the state vector and an encoding of the discrete switching costs. State vectors of Mixed-Integer Optimal Control problems can be approximated by means of sequences of roundings of appropriate relaxations, which often result in a switching cost blow-up. We reformulate the problem so that convergence of the state vector can be traded against increasing switching costs, which allows known convergence properties of previous approaches for Mixed-Integer Optimal Control approximations to be conserved. To demonstrate the findings and applicability, we present validating numerical results and the trade-off capability of our approach for a benchmark problem.
TL;DR: It is shown that whenever iterated rounding can be applied to a problem with some slack, there is a randomized procedure that returns an integral solution that satisfies the guarantees of iterated rounding and also has concentration properties.
Abstract: We give a general method for rounding linear programs that combines the commonly used iterated rounding and randomized rounding techniques. In particular, we show that whenever iterated rounding can be applied to a problem with some slack, there is a randomized procedure that returns an integral solution that satisfies the guarantees of iterated rounding and also has concentration properties. We use this to give new results for several classic problems where iterated rounding has been useful.
TL;DR: This paper gives the first Polynomial-Time Approximation Scheme (PTAS) for the case where the allowable subsets of served customers are characterized by a laminar matroid with constant depth, a special case of the well-known matroid Bayesian online selection problem.
Abstract: In the Bayesian online selection problem, the goal is to find a pricing algorithm for serving a sequence of arriving buyers that maximizes the expected social-welfare (or revenue) subject to different types of structural constraints. The focus of this paper is on the case where the allowable subsets of served customers are characterized by a laminar matroid with constant depth. This problem is a special case of the well-known matroid Bayesian online selection problem studied in [Kleinberg & Weinberg, 2012], when the underlying matroid is laminar. We give the first Polynomial-Time Approximation Scheme (PTAS) for the above problem. Our approach is based on rounding the solution of a hierarchy of linear programming relaxations that can approximate the optimum online solution with any degree of accuracy as well as a concentration argument that shows our rounding does not have a considerable loss in the expected social welfare. We also introduce the production constrained problem, for which the allowable subsets of served customers are characterized by joint production/shipping constraints that can be modeled by a special case of laminar matroids. We show that by leveraging the special structure of this problem, and using a similar approach as before, we can design a PTAS for this problem too even in the case where the depth of the laminar matroid is not constant. To achieve our result we exploit the negative dependence property of the selection rule in the lower-levels of the laminar family.
TL;DR: In this article, a primal-dual technique was applied to solve the k-means problem with penalties by a different rounding method, i.e., employing a deterministic rounding algorithm, instead of using the randomized rounding algorithm used in the previous approximation schemes.
Abstract: The clustering problem has received much attention in various fields of computer science. However, in many applications, the existence of noisy data poses a big challenge for the clustering problem. As one way to deal with the clustering problem with noisy data, clustering with penalties has been studied extensively, such as the k-median problem with penalties and the facility location problem with penalties. As far as we know, there is only one approximation algorithm for the k-means problem with penalties, with ratio \(25+\epsilon \). All the previous related results for clustering with penalties problems were based on the techniques of local search, LP-rounding, or primal-dual, which cannot be applied directly to the k-means problem with penalties to get a better approximation ratio than \(25+\epsilon \). In this paper, we apply the primal-dual technique to solve the k-means problem with penalties by a different rounding method, i.e., employing a deterministic rounding algorithm instead of the randomized rounding algorithm used in the previous approximation schemes. Based on the above method, an approximation algorithm with ratio \(19.849+\epsilon \) is presented for the k-means problem with penalties.
TL;DR: A novel LP formulation is presented for the problem of computing personalized reserve prices in eager second price auctions without any assumption on valuation distributions, together with a rounding procedure which achieves a (1+2(√2-1)e^{√2-2})^{-1} ≅ 0.684-approximation, improving over the 1/2-approximation algorithm due to Roughgarden and Wang.
Abstract: We study the problem of computing personalized reserve prices in eager second price auctions without having any assumption on valuation distributions. Here, the input is a dataset that contains the submitted bids of n buyers in a set of auctions and the goal is to return personalized reserve prices r that maximize the revenue earned on these auctions by running eager second price auctions with reserve r. We present a novel LP formulation to this problem and a rounding procedure which achieves a (1+2(√2-1)e^{√2-2})^{-1} ≅ 0.684-approximation. This improves over the 1/2-approximation algorithm due to Roughgarden and Wang. We show that our analysis is tight for this rounding procedure. We also bound the integrality gap of the LP, which bounds the performance of any algorithm based on this LP.
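The closed-form ratio quoted above is easy to check numerically. The snippet below assumes the expression reads as (1 + 2(√2 − 1)·e^(√2 − 2))^(−1), which is the reading consistent with the stated value of roughly 0.684; the function name is hypothetical.

```python
import math

def reserve_price_ratio() -> float:
    """Evaluate (1 + 2(sqrt(2) - 1) * exp(sqrt(2) - 2)) ** -1."""
    root2 = math.sqrt(2.0)
    return 1.0 / (1.0 + 2.0 * (root2 - 1.0) * math.exp(root2 - 2.0))

if __name__ == "__main__":
    ratio = reserve_price_ratio()
    print(f"approximation ratio = {ratio:.4f}")
    print("improves on 1/2:", ratio > 0.5)
```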
TL;DR: A polynomial-time algorithm is obtained for the two-criterion dimensionality reduction problem when the two criteria are increasing concave functions and new low-rank properties of extreme point solutions to semi-definite programs are proved.
Abstract: We model "fair" dimensionality reduction as an optimization problem. A central example is the fair PCA problem: the input data is divided into $k$ groups, and the goal is to find a single $d$-dimensional representation for all groups for which the maximum variance (or minimum reconstruction error) is optimized for all groups in a fair (or balanced) manner, e.g., by maximizing the minimum variance over the $k$ groups of the projection to a $d$-dimensional subspace. This problem was introduced by Samadi et al. (2018) who gave a polynomial-time algorithm which, for $k=2$ groups, returns a $(d+1)$-dimensional solution of value at least the best $d$-dimensional solution. We give an exact polynomial-time algorithm for $k=2$ groups. The result relies on extending results of Pataki (1998) regarding rank of extreme point solutions to semi-definite programs. This approach applies more generally to any monotone concave function of the individual group objectives. For $k>2$ groups, our results generalize to give a $(d+\sqrt{2k+0.25}-1.5)$-dimensional solution with objective value as good as the optimal $d$-dimensional solution for arbitrary $k,d$ in polynomial time. Using our extreme point characterization result for SDPs, we give an iterative rounding framework for general SDPs which generalizes the well-known iterative rounding approach for LPs. It returns low-rank solutions with bounded violation of constraints. We obtain a $d$-dimensional projection where the violation in the objective can be bounded additively in terms of the top $O(\sqrt{k})$-singular values of the data matrices. We also give an exact polynomial-time algorithm for any fixed number of groups and target dimension via the algorithm of Grigoriev and Pasechnik (2005). In contrast, when the number of groups is part of the input, even for target dimension $d=1$, we show this problem is NP-hard.
TL;DR: The authors propose an energy-efficient scheme for multiplying 2's-complement binary numbers with two least significant bits (LSBs), and demonstrate how the DLSB multipliers can be effectively used as a building block for the implementation of larger multiplications, delivering area and energy savings.
Abstract: Multiplication is an arithmetic operation that has a significant impact on the performance of various real-life applications, such as digital signal processing, image processing and computer vision. In this study, aiming to exploit the efficiency of alternative number representation formats, the authors propose an energy-efficient scheme for multiplying 2's-complement binary numbers with two least significant bits (LSBs). The double-LSB (DLSB) arithmetic delivers several benefits, such as the symmetric representation range, the number negation performed only by bitwise inversion, and the facilitation of the rounding process in the results of floating point architectures. The hardware overhead of the proposed circuit, when implemented at 45 nm, is negligible in comparison with the conventional Modified Booth multiplier for ordinary 2's-complement numbers (3.1% area and 3.3% energy average overhead across different multiplier bit-widths). Moreover, the proposed DLSB multiplier outperforms the previous state-of-the-art implementation by providing 10.2% energy and 7.8% area average gains. Finally, they demonstrate how the DLSB multipliers can be effectively used as a building block for the implementation of larger multiplications, delivering area and energy savings.
TL;DR: It is proved that the problem of finding a path which satisfies two bounds, one for each criterion, is NP-complete, even in the acyclic case.
Abstract: We study a bi-criteria path problem on a directed multigraph with cycles, where each arc is associated with two parameters. The first is the survival probability of moving along the arc, and the second is the length of the arc. We evaluate the quality of a path by two independent criteria. The first is to maximize the survival probability along the entire path, which is the product of the arc probabilities, and the second is to minimize the total path length, which is the sum of the arc lengths. We prove that the problem of finding a path which satisfies two bounds, one for each criterion, is NP-complete, even in the acyclic case. We further develop approximation algorithms for the optimization versions of the studied problem. One algorithm is based on approximate computing of logarithms of arc probabilities, and the other two are fully polynomial time approximation schemes (FPTASes). One FPTAS is based on scaling and rounding of the input, while the other FPTAS is derived via the method of K-approximation sets and functions, introduced by Halman et al. (Math Oper Res 34:674–685, 2009).
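The role of logarithms here comes from a standard identity: maximizing a product of arc probabilities is the same as minimizing the sum of their negative logarithms. The sketch below uses that identity to find the most reliable path with Dijkstra's algorithm. It handles only the survival criterion, uses exact floating-point logarithms rather than the approximate ones the paper analyzes, and assumes every probability lies in (0, 1]. The function name and the example graph are hypothetical.

```python
import heapq
import math

def most_reliable_path(arcs, source, target):
    """Return (path, survival probability) maximizing the product of arc probabilities."""
    graph = {}
    for u, v, p in arcs:                     # parallel arcs are allowed (multigraph)
        graph.setdefault(u, []).append((v, -math.log(p)))   # weight >= 0 since p <= 1
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue                         # stale heap entry
        if u == target:
            break
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None, 0.0
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return path, math.exp(-dist[target])     # undo the log transform

if __name__ == "__main__":
    arcs = [("s", "a", 0.9), ("a", "t", 0.9), ("s", "t", 0.5)]
    print(most_reliable_path(arcs, "s", "t"))
```

Adding the second criterion (a bound on total length) is what makes the decision problem NP-complete, which is why the paper turns to approximation schemes based on scaling and rounding.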
TL;DR: In this paper, an experimental program has been developed along with supplementary archaeological analysis to define and characterize degrees of rounding among lithic artifacts, based on three criteria: surface alteration, edge alteration, and width of the ridges.
Abstract: Many Paleolithic lithic collections are found in contexts where post-depositional alterations, such as those made by water streams or sedimentary displacement, have affected the surface of most of the lithic artifacts. A major alteration often observed is the rounding of lithic artifacts. Although there have been some proposals on how to classify degrees of rounding (usually by employing naked-eye classifications), there is a lack of consensus among lithic analysts. The aim of this study is to define and characterize degrees of rounding among lithic artifacts. This characterization also takes into consideration the differential development of alterations and rounding stages, depending on the raw materials. Here, an experimental program has been developed along with supplementary archaeological analysis to define and characterize degrees of rounding. Degrees of rounding are characterized according to three criteria: surface alteration, edge alteration, and width of the ridges. A preliminary characterization and proposal of degrees of rounding is presented. This characterization also takes into consideration the raw materials and the sensitivity and resolution of the criteria to establish degrees of rounding. Results show how, after microscopic analysis, lithic artifacts that appear fresh to the naked eye present different degrees of rounding. The conclusions explain that, although rounding is a continuous process, it is possible to establish degrees of rounding, a main initial goal for any lithic analyst.
TL;DR: A new two-threshold rounding scheme, tailored for multistage problems, is introduced and it is shown that this rounding scheme gives a 2f-approximation algorithm for the multistage variant of the f-Set Cover problem, where each element belongs to at most f sets.
Abstract: We consider a multistage framework introduced recently where, given a time horizon t=1,2,...,T, the input is a sequence of instances of a (static) combinatorial optimization problem I_1,I_2,...,I_T (one for each time step), and the goal is to find a sequence of solutions S_1,S_2,...,S_T (one for each time step) reaching a tradeoff between the quality of the solutions in each time step and the stability/similarity of the solutions in consecutive time steps. For several polynomial-time solvable problems, such as Minimum Cost Perfect Matching, the multistage variant becomes hard to approximate (even for two time steps for Minimum Cost Perfect Matching). In this paper, we study the multistage variants of some important discrete minimization problems (including Minimum Cut, Vertex Cover, Set Cover, Prize-Collecting Steiner Tree, Prize-Collecting Traveling Salesman). We focus on the natural question of whether linear-programming-based methods may help in developing good approximation algorithms in this framework. We first show that Min Cut remains polytime solvable in its multistage variant, and Vertex Cover remains 2-approximable, as a particular case of a more general statement which easily follows from the work of (Hochbaum, EJOR 2002) on monotone and IP2 problems. Then, we tackle other problems and for this we introduce a new two-threshold rounding scheme, tailored for multistage problems. As a first application, we show that this rounding scheme gives a 2f-approximation algorithm for the multistage variant of the f-Set Cover problem, where each element belongs to at most f sets. More interestingly, we are able to use our rounding scheme in order to propose a 3.53-approximation algorithm for the multistage variant of the Prize-Collecting Steiner Tree problem, and a 3.034-approximation algorithm for the multistage variant of the Prize-Collecting Traveling Salesman problem.
TL;DR: This paper adopts the grid model, and states that the unique binary solution can be explicitly and exactly retrieved from the minimum Euclidean norm solution by means of a rounding method based on some special entries, which are precisely determined.
TL;DR: This work provides a framework of analysis that is derived by duality properties, does not rely on potential functions and is applicable to a variety of scheduling problems, which yields improved competitive ratios.
Abstract: We study online scheduling problems on a single processor that can be viewed as extensions of the well-studied problem of minimizing total weighted flow time. In particular, we provide a framework of analysis that is derived by duality properties, does not rely on potential functions and is applicable to a variety of scheduling problems. A key ingredient in our approach is bypassing the need for “black-box” rounding of fractional solutions, which yields improved competitive ratios.
TL;DR: It is proved the existence of a slowed-down sticky Brownian motion whose induced rounding for MAXCUT attains the Goemans--Williamson approximation ratio.
Abstract: Answering a question of Abbasi-Zadeh, Bansal, Guruganesh, Nikolov, Schwartz and Singh (2018), we prove the existence of a slowed-down sticky Brownian motion whose induced rounding for MAXCUT attains the Goemans--Williamson approximation ratio. This is an especially simple particular case of the general rounding framework of Krivine diffusions that we investigate elsewhere.
TL;DR: This work presents a numerical stability analysis that describes and quantifies the impact of local rounding error propagation on the maximal attainable accuracy of the multi-term recurrences in the preconditioned pipelined BiCGStab method.
Abstract: Pipelined Krylov subspace methods avoid communication latency by reducing the number of global synchronization bottlenecks and by hiding global communication behind useful computational work. In exact arithmetic, pipelined Krylov subspace algorithms are equivalent to classic Krylov subspace methods and generate identical series of iterates. However, as a consequence of the reformulation of the algorithm to improve parallelism, pipelined methods may suffer from severely reduced attainable accuracy in a practical finite precision setting. This work presents a numerical stability analysis that describes and quantifies the impact of local rounding error propagation on the maximal attainable accuracy of the multi-term recurrences in the preconditioned pipelined BiCGStab method. Theoretical expressions for the gaps between the true and computed residual as well as other auxiliary variables used in the algorithm are derived, and the elementary dependencies between the gaps on the various recursively computed vector variables are analyzed. The norms of the corresponding propagation matrices and vectors provide insight into the possible amplification of local rounding errors throughout the algorithm. Stability of the pipelined BiCGStab method is compared numerically to that of pipelined CG on a symmetric benchmark problem. Furthermore, numerical evidence supporting the effectiveness of employing a residual replacement type strategy to improve the maximal attainable accuracy for the pipelined BiCGStab method is provided.
TL;DR: This work introduces granularity as a sufficient condition for the consistency of a mixed-integer optimization problem, and shows how to exploit it for the computation of feasible points, which can improve the CPU time needed to solve problems from practice.
Abstract: We introduce granularity as a sufficient condition for the consistency of a mixed-integer optimization problem, and show how to exploit it for the computation of feasible points: For optimization problems which are granular, solving certain linear problems and rounding their optimal points always leads to feasible points of the original mixed-integer problem. Thus, the resulting feasible rounding approach is deterministic and even efficient, i.e., it computes feasible points in polynomial time. The optimization problems appearing in the feasible rounding approaches have a structure that is similar to that of the continuous relaxation, and thus our approach has significant advantages over heuristics, as long as the problem is granular. For instance, the computational cost of our approach always corresponds to merely a single step of the feasibility pump. A computational study on optimization problems from the MIPLIB libraries demonstrates that granularity may be expected in various real world applications. Moreover, a comparison with Gurobi indicates that state of the art software does not always exploit granularity. Hence, our algorithms do not only possess a worst-case complexity advantage, but can also improve the CPU time needed to solve problems from practice.
TL;DR: This article studies the problem of computing personalized reserve prices in eager second price auctions without any assumption on valuation distributions, and presents a novel LP formulation and a rounding procedure that achieve a $(1+2(\sqrt{2}-1)e^{\sqrt{2}-2})^{-1} \approx 0.684$-approximation.
Abstract: We study the problem of computing data-driven personalized reserve prices in eager second price auctions without having any assumption on valuation distributions. Here, the input is a data-set that contains the submitted bids of $n$ buyers in a set of auctions and the problem is to return personalized reserve prices $\textbf r$ that maximize the revenue earned on these auctions by running eager second price auctions with reserve $\textbf r$. For this problem, which is known to be APX-hard, we present a novel LP formulation and a rounding procedure which achieves a $(1+2(\sqrt{2}-1)e^{\sqrt{2}-2})^{-1} \approx 0.684$-approximation. This improves over the $\frac{1}{2}$-approximation algorithm due to Roughgarden and Wang. We show that our analysis is tight for this rounding procedure. We also bound the integrality gap of the LP, which shows that it is impossible to design an algorithm that yields an approximation factor larger than $0.828$ with respect to this LP.
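To make the objective concrete, here is a minimal sketch of the revenue that a vector of personalized reserves earns on a data-set of bids. It assumes the usual definition of an eager second price auction: bidders below their own reserve are removed first, then the highest remaining bidder wins and pays the larger of their reserve and the highest other remaining bid. The function names, the tie-breaking by bidder order, and the toy data are assumptions for illustration. The last function only evaluates the constant stated in the abstract.

```python
import math


def eager_revenue(bids, reserves):
    """Revenue of one eager second price auction with personalized reserves."""
    # Eager step: drop every bidder whose bid is below their own reserve.
    alive = [(b, r) for b, r in zip(bids, reserves) if b >= r]
    if not alive:
        return 0.0  # nobody clears their reserve, so the item is not sold
    # Highest remaining bid wins (ties broken by bidder order, an assumption).
    alive.sort(key=lambda br: br[0], reverse=True)
    _, top_reserve = alive[0]
    runner_up = alive[1][0] if len(alive) > 1 else 0.0
    # The winner pays the larger of their reserve and the runner-up bid.
    return max(top_reserve, runner_up)


def total_revenue(auctions, reserves):
    """Objective to maximize over reserves: revenue summed over all auctions."""
    return sum(eager_revenue(bids, reserves) for bids in auctions)


def stated_ratio():
    """Evaluate (1 + 2(sqrt(2) - 1) e^(sqrt(2) - 2))^(-1) from the abstract."""
    s = math.sqrt(2.0)
    return 1.0 / (1.0 + 2.0 * (s - 1.0) * math.exp(s - 2.0))


# Toy data-set (hypothetical): two auctions, three buyers.
auctions = [[10, 6, 3], [2, 8, 5]]
reserves = [4, 7, 1]
revenue = total_revenue(auctions, reserves)
ratio = stated_ratio()
```

In the first auction the second buyer is removed (6 < 7) and the winner pays their reserve of 4. In the second auction the first buyer is removed and the winner pays their reserve of 7.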
TL;DR: The proposed formation control system with static and dynamic target rounding up may inspire future underwater multi-robot cooperation that offers high efficiency, wide range, and multi-tasking with low environmental interference in aquatic environments.
Abstract: Aiming at the multi-robot cooperation requirements of our small-scale underwater spherical robots, a cooperative formation control system with static and dynamic target rounding up was proposed and studied in this paper. Considering the complex underwater environment and the kinematic modeling of the robot, a simplified kinematic model of the underwater spherical robot with horizontal and vertical motion was established. Given the environmental disturbances in practical underwater scenarios, an adaptive control algorithm with good robustness and adaptability was designed to control the underwater spherical robot. To handle the application requirement of static and dynamic target rounding up with multiple robots, a multi-robot path planning strategy was proposed, the modeling of static and dynamic target rounding up was analyzed and optimized, and a controller based on the linear quadratic regulator method was designed to realize static/dynamic target rounding up with multiple robots. In addition, underwater target rounding up experiments with two or three spherical robots were conducted to test the feasibility and performance of the proposed formation control system, and the experimental results confirmed the validity of the proposed system. This study may inspire future underwater multi-robot cooperation that offers high efficiency, wide range, and multi-tasking with low environmental interference in aquatic environments.
TL;DR: This paper proposes a novel single-precision floating-point (SPFP) multiplication algorithm and architecture that approximates only one of the operands to reduce the number of logic blocks and iteratively compensates for the approximation error to achieve acceptable error ranges in applications.
Abstract: Approximate multipliers have been widely used in critical applications, such as machine learning and multimedia, which are tolerant to approximation errors. This paper proposes a novel single-precision floating-point (SPFP) multiplication algorithm and its architecture. The proposed work approximates only one of the operands to reduce the number of logic blocks and iteratively compensates for the approximation error to achieve acceptable error ranges in applications. To reduce the accuracy degradation caused by the single operand approximation, a rounding scheme and an operand selection scheme are additionally introduced. Compared with the widely known previous iterative Mitchell design, our proposed SPFP multiplier design reduces the number of look-up tables (LUTs) and flip-flops (FFs) by 55% and 59%, respectively, and has a latency two cycles shorter. The accuracy of our design becomes close to that of the iterative Mitchell design as the number of iterations increases, and it always meets the error tolerance of 1% when the number of iterations is four.
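As background for the baseline mentioned above, the sketch below shows one common formulation of iterative Mitchell-style (logarithmic) multiplication on unsigned integers, such as significands. It is not the proposed single-operand design, whose details the abstract does not give. Each pass writes an operand as a leading power of two plus a residue, adds the three shift-only terms of the product, and leaves the residue product as the error to be corrected by the next pass. The function name `iterative_mitchell` is hypothetical.

```python
def iterative_mitchell(a, b, iterations):
    """Approximate a * b for non-negative integers using shifts and adds.

    With a = 2**k1 + r1 and b = 2**k2 + r2, the exact product is
        2**(k1 + k2) + r1 * 2**k2 + r2 * 2**k1 + r1 * r2.
    One pass keeps the first three terms; the dropped term r1 * r2 is
    approximated the same way in the next pass.
    """
    total = 0
    for _ in range(iterations):
        if a == 0 or b == 0:
            break  # the remaining error term is exactly zero
        k1 = a.bit_length() - 1  # position of the leading one bit of a
        k2 = b.bit_length() - 1  # position of the leading one bit of b
        r1 = a - (1 << k1)       # residue of a below its leading one
        r2 = b - (1 << k2)       # residue of b below its leading one
        total += (1 << (k1 + k2)) + (r1 << k2) + (r2 << k1)
        a, b = r1, r2            # next pass approximates r1 * r2
    return total


# Worked example: 13 * 11 = 143.
one_pass = iterative_mitchell(13, 11, 1)
two_pass = iterative_mitchell(13, 11, 2)
three_pass = iterative_mitchell(13, 11, 3)
```

For 13 × 11, the passes give 128, 142 and 143, so the approximation approaches the exact product from below as iterations are added, which is the accuracy versus iteration count trade-off the abstract refers to.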