TL;DR: AdaRound is a weight-rounding mechanism for post-training quantization that adapts to the data and the task loss. It outperforms rounding-to-nearest by a significant margin and establishes a new state of the art for post-training quantization on several networks and tasks.
Abstract: When quantizing neural networks, assigning each floating-point weight to its nearest fixed-point value is the predominant approach. We find that, perhaps surprisingly, this is not the best we can do. In this paper, we propose AdaRound, a better weight-rounding mechanism for post-training quantization that adapts to the data and the task loss. AdaRound is fast, does not require fine-tuning of the network, and only uses a small amount of unlabelled data. We start by theoretically analyzing the rounding problem for a pre-trained neural network. By approximating the task loss with a Taylor series expansion, the rounding task is posed as a quadratic unconstrained binary optimization problem. We simplify this to a layer-wise local loss and propose to optimize this loss with a soft relaxation. AdaRound not only outperforms rounding-to-nearest by a significant margin but also establishes a new state-of-the-art for post-training quantization on several networks and tasks. Without fine-tuning, we can quantize the weights of Resnet18 and Resnet50 to 4 bits while staying within an accuracy loss of 1%.
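The claim that rounding-to-nearest is not always the best choice can be illustrated with a minimal sketch. This is not the AdaRound algorithm: it brute-forces every round-down/round-up choice for a tiny weight vector and compares the resulting layer output error with that of rounding-to-nearest, whereas AdaRound replaces the exhaustive search with a soft relaxation. The weights, calibration inputs, grid step, and function names are all made up for illustration.

```python
import itertools
import math

def output_error(w, w_q, inputs):
    """Sum of squared differences between float and quantized dot products."""
    total = 0.0
    for xs in inputs:
        y_float = sum(a * x for a, x in zip(w, xs))
        y_quant = sum(b * x for b, x in zip(w_q, xs))
        total += (y_float - y_quant) ** 2
    return total

def round_nearest(w, step):
    """Round every weight to the nearest multiple of `step`."""
    return [step * math.floor(v / step + 0.5) for v in w]

def best_rounding(w, step, inputs):
    """Try rounding each weight down or up (2**n candidates) and keep the best."""
    floors = [math.floor(v / step) for v in w]
    best_err, best_w = None, None
    for ups in itertools.product((0, 1), repeat=len(w)):
        cand = [step * (f + u) for f, u in zip(floors, ups)]
        err = output_error(w, cand, inputs)
        if best_err is None or err < best_err:
            best_err, best_w = err, cand
    return best_err, best_w

w = [0.3, 0.3, -0.3]                         # hypothetical float weights
inputs = [[1.0, 1.0, 0.0], [1.0, 0.5, 0.5]]  # hypothetical calibration inputs
step = 1.0                                   # hypothetical quantization grid

nearest = round_nearest(w, step)
err_nearest = output_error(w, nearest, inputs)
err_best, w_best = best_rounding(w, step, inputs)
print(err_nearest, err_best, w_best)
```

Because rounding-to-nearest is itself one of the candidates in the search, the best candidate can never be worse; in this toy case it is strictly better, since rounding one weight away from its nearest grid point compensates for the error of the others.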
TL;DR: In this article, a probabilistic model of the objective is used to compute an acquisition function that estimates the expected utility (for solving the optimization problem) of evaluating the objective at each potential new point.
TL;DR: An approximate multiplier is proposed that is high-speed yet energy-efficient, serves as a common design for both signed and unsigned operations, and reduces logic size, power, and delay.
Abstract: In this paper, we propose an approximate multiplier that is high-speed yet energy-efficient. Energy minimization is now one of the main design requirements, especially in portable devices such as smartphones and tablets. In these devices, DSP blocks are key components, and the computational core of these blocks is the arithmetic logic unit, where multiplications account for the largest share of operations. Improving the speed and power efficiency of multipliers therefore plays a key role. Our approach is to round the operands to the nearest exponent of two, so that the computationally intensive part of the multiplication is omitted; these approximations improve speed and efficiency. The final outputs are used in two image processing applications: image sharpening and smoothing. Approximation can be performed at different design abstraction levels (circuit, logic, and architecture) using different techniques; here we use the function approximation method (e.g., modifying the Boolean function of a circuit), for which a number of approximate arithmetic building blocks, such as adders and multipliers, have been suggested. Finally, the design has the added advantage that it can be used as a common multiplier for both signed and unsigned operations, and it reduces logic size, power, and delay. We use Verilog HDL and the Xilinx ISE14.8 software tools for simulation and synthesis.
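A rough software sketch of the idea of rounding operands to the nearest power of two is given below. The combination formula is an assumption, not something the abstract spells out: it uses the identity A·B = Ar·B + A·Br − Ar·Br + (Ar − A)(Br − B), where Ar and Br are the rounded operands, and drops the last term. Every remaining product has a power-of-two factor, so hardware could realize it with shifts. The function names and the tie-breaking rule are hypothetical, and only positive integers are handled.

```python
def round_to_power_of_two(x: int) -> int:
    """Round a positive integer to the nearest power of two (ties round up)."""
    if x <= 0:
        raise ValueError("expects a positive integer")
    lower = 1 << (x.bit_length() - 1)  # largest power of two <= x
    upper = lower << 1                 # smallest power of two > x
    return lower if (x - lower) < (upper - x) else upper

def approx_multiply(a: int, b: int) -> int:
    """Approximate a * b using only products with a power-of-two factor."""
    ar = round_to_power_of_two(a)
    br = round_to_power_of_two(b)
    # Exact product minus the dropped term (ar - a) * (br - b).
    return ar * b + a * br - ar * br

for a, b in [(8, 4), (7, 7), (5, 6)]:
    print(a, b, a * b, approx_multiply(a, b))
```

The result is exact whenever either operand is already a power of two, and the absolute error otherwise equals the product of the two rounding errors.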
TL;DR: In this paper, a new dynamic matching sparsification scheme was proposed for adaptive adversaries, and a framework for dynamically rounding fractional matchings against adaptive adversaries was derived from this scheme.
Abstract: We present a new dynamic matching sparsification scheme. From this scheme we derive a framework for dynamically rounding fractional matchings against adaptive adversaries. Plugging in known dynamic fractional matching algorithms into our framework, we obtain numerous randomized dynamic matching algorithms which work against adaptive adversaries. In contrast, all previous randomized algorithms for this problem assumed a weaker, oblivious, adversary. Our dynamic algorithms against adaptive adversaries include, for any constant ε > 0, a (2+ε)-approximate algorithm with constant update time or polylog worst-case update time, as well as (2−δ)-approximate algorithms in bipartite graphs with arbitrarily-small polynomial update time. All these results achieve polynomially better update time to approximation trade-offs than previously known to be achievable against adaptive adversaries.
TL;DR: An effective method to detect the recompression in the color images by using the conversion error, rounding error, and truncation error on the pixel in the spherical coordinate system is proposed and experimental results show that the performance of the proposed method is better than the existing methods.
Abstract: Detection of double Joint Photographic Experts Group (JPEG) compression is an important part of image forensics. Although past studies have presented methods for detecting double JPEG compression with a different quantization matrix, the detection of double JPEG compression with the same quantization matrix is still a challenging problem. In this paper, an effective method to detect recompression in color images by using the conversion error, rounding error, and truncation error on the pixel in the spherical coordinate system is proposed. The randomness of truncation errors, rounding errors, and quantization errors results in random conversion errors. The pixel number of the conversion error is used to extract six-dimensional features. Truncation error and rounding error on the pixel in its three channels are mapped to the spherical coordinate system based on the relation of a color image to the pixel values in the three channels. The former is converted into amplitude and angles to extract 30-dimensional features and 8-dimensional auxiliary features are extracted from the number of special points and special blocks. As a result, a total of 44-dimensional features have been used in the classification by using the support vector machine (SVM) method. Thereafter, the support vector machine recursive feature elimination (SVMRFE) method is used to improve the classification accuracy. The experimental results show that the performance of the proposed method is better than the existing methods.
TL;DR: The results demonstrate that continuous community partition method can improve influence spread and accuracy of the community partition effectively.
Abstract: Community partition is of great importance in social networks because of the rapidly increasing network scale, data and applications. We consider the community partition problem under the LT model in social networks, which is a combinatorial optimization problem that divides the social network into $m$ disjoint communities. Our goal is to maximize the sum of influence propagation by maximizing it within each community. As the influence propagation function of the community partition problem is supermodular under the LT model, we use the Lovász extension to relax the target influence function and transform our goal into maximizing the relaxed function over a matroid polytope. Next, we propose a continuous greedy algorithm that uses the properties of the relaxed function to solve our problem, which needs to be discretized in a concrete implementation. Then, a random rounding technique is used to convert the fractional solution to an integer solution. We present a theoretical analysis with a $1-1/e$ approximation ratio for the proposed algorithms. Extensive experiments are conducted to evaluate the performance of the proposed continuous greedy algorithms on real-world online social network datasets, and the results demonstrate that the continuous community partition method can effectively improve influence spread and the accuracy of the community partition.
TL;DR: In this article, the Izhikevich neuron model is used to demonstrate that rounding has an important role in producing accurate spike timings from explicit ODE solution algorithms, and fixed-point arithmetic with stochastic rounding consistently results in smaller errors compared to single-precision floating-point and fixed point arithmetic with round-to-nearest across a range of neuron behaviors and ODE solvers.
Abstract: Although double-precision floating-point arithmetic currently dominates high-performance computing, there is increasing interest in smaller and simpler arithmetic types. The main reasons are potential improvements in energy efficiency and memory footprint and bandwidth. However, simply switching to lower-precision types typically results in increased numerical errors. We investigate approaches to improving the accuracy of reduced-precision fixed-point arithmetic types, using examples in an important domain for numerical computation in neuroscience: the solution of ordinary differential equations (ODEs). The Izhikevich neuron model is used to demonstrate that rounding has an important role in producing accurate spike timings from explicit ODE solution algorithms. In particular, fixed-point arithmetic with stochastic rounding consistently results in smaller errors compared to single-precision floating-point and fixed-point arithmetic with round-to-nearest across a range of neuron behaviours and ODE solvers. A computationally much cheaper alternative is also investigated, inspired by the concept of dither that is a widely understood mechanism for providing resolution below the least significant bit in digital signal processing. These results will have implications for the solution of ODEs in other subject areas, and should also be directly relevant to the huge range of practical problems that are represented by partial differential equations. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'.
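The difference between round-to-nearest and stochastic rounding on a fixed-point grid can be sketched as follows. This is an illustrative toy, not the authors' experiment: it repeatedly adds a small increment (as an explicit ODE step might) to an accumulator stored with 8 fractional bits, and the increment, bit width, seed, and function names are all assumptions chosen for the example.

```python
import math
import random

def round_nearest_fixed(x, frac_bits):
    """Round x to the nearest multiple of 2**-frac_bits."""
    scale = 1 << frac_bits
    return math.floor(x * scale + 0.5) / scale

def round_stochastic_fixed(x, frac_bits, rng):
    """Round x down or up to the grid, rounding up with probability
    equal to the fractional remainder, so the result is unbiased."""
    scale = 1 << frac_bits
    scaled = x * scale
    low = math.floor(scaled)
    round_up = rng.random() < (scaled - low)
    return (low + round_up) / scale

rng = random.Random(0)   # fixed seed so the run is repeatable
increment = 0.001        # smaller than half the grid spacing 2**-8
steps = 10000
acc_nearest = 0.0
acc_stochastic = 0.0
for _ in range(steps):
    acc_nearest = round_nearest_fixed(acc_nearest + increment, 8)
    acc_stochastic = round_stochastic_fixed(acc_stochastic + increment, 8, rng)

print(acc_nearest, acc_stochastic, steps * increment)
```

With round-to-nearest, every increment is rounded away and the accumulator never leaves zero, while the stochastically rounded accumulator tracks the exact sum on average. This is the kind of systematic error that matters when many small updates are accumulated.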
TL;DR: The authors study rounding of numerical expectations in the Health and Retirement Study (HRS) between 2002 and 2014 and find that respondents tend to report the values 25 and 75 more frequently than other values ending in 5.
TL;DR: AdaRound as discussed by the authors approximates the task loss with a Taylor series expansion and poses the rounding task as a quadratic unconstrained binary optimization problem, which is then simplified to a layer-wise local loss.
TL;DR: This work extends randomized smoothing to cover parameterized transformations and certify robustness in the parameter space and shows how to efficiently compute the inverse of an image transformation, enabling individual guarantees in the online setting.
Abstract: We extend randomized smoothing to cover parameterized transformations (e.g., rotations, translations) and certify robustness in the parameter space (e.g., rotation angle). This is particularly challenging as interpolation and rounding effects mean that image transformations do not compose, in turn preventing direct certification of the perturbed image (unlike certification with $\ell^p$ norms). We address this challenge by introducing three different kinds of defenses, each with a different guarantee (heuristic, distributional and individual) stemming from the method used to bound the interpolation error. Importantly, we show how individual certificates can be obtained via either statistical error bounds or efficient online inverse computation of the image transformation. We provide an implementation of all methods at this https URL.
TL;DR: This paper defines inaccuracy checks to detect large precision loss and cancellation at strategic program locations to construct specialized branches that, when covered by a given input, are likely to lead to large errors in the result.
Abstract: Floating point is widely used in software to emulate arithmetic over reals. Unfortunately, floating point leads to rounding errors that propagate and accumulate during execution. Generating inputs to maximize the numerical error is critical when evaluating the accuracy of floating-point code. In this paper, we formulate the problem of generating high error-inducing floating-point inputs as a code coverage maximization problem solved using symbolic execution. Specifically, we define inaccuracy checks to detect large precision loss and cancellation. We inject these checks at strategic program locations to construct specialized branches that, when covered by a given input, are likely to lead to large errors in the result. We apply symbolic execution to generate inputs that exercise these specialized branches, and describe optimizations that make our approach practical. We implement a tool named FPGen and present an evaluation on 21 numerical programs including matrix computation and statistics libraries. We show that FPGen exposes errors for 20 of these programs and triggers errors that are, on average, over 2 orders of magnitude larger than the state of the art.
TL;DR: Partial outer convexification has been used to derive relaxations of mixed-integer optimal control problems (MIOCPs) that are constrained by time-dependent differential equations.
Abstract: Partial outer convexification has been used to derive relaxations of mixed-integer optimal control problems (MIOCPs) that are constrained by time-dependent differential equations. The family of sum...
TL;DR: A floating-point division and square root unit is presented, which implements a radix-64 floating-point division and a radix-16 floating-point square root, requiring 11, 6, and 4 cycles for double, single and half-precision division with normalized operands and result, and 15, 8 and 5 cycles for square root.
Abstract: Digit-recurrence algorithms are widely used in actual microprocessors to compute floating-point division and square root. These iterative algorithms present a good trade-off in terms of performance, area and power. We present a floating-point division and square root unit, which implements a radix-64 floating-point division and a radix-16 floating-point square root. To have an affordable implementation, each radix-64 division iteration and radix-16 square root iteration are made of simpler radix-4 iterations: 3 radix-4 iterations in division and 2 in square root. Speculation is used between consecutive radix-4 iterations to get a reduced timing. There are three different parts in digit-recurrence implementations: initialization, digit iterations, and rounding. The digit iteration is the iterative part and it uses the same logic for several cycles. Division and square root share partially the initialization and rounding stages, whereas each one has different logic for the digit iterations. The result is a low-latency floating-point divider and square root, requiring 11, 6, and 4 cycles for double, single and half-precision division with normalized operands and result, and 15, 8 and 5 cycles for square root. One or two additional cycles are needed in case of subnormal operand(s) or result.
TL;DR: Improved deterministic distributed algorithms for a number of well-studied matching problems are presented, which are simpler, faster, more accurate, and/or more general than their known counterparts.
Abstract: We present improved deterministic distributed algorithms for a number of well-studied matching problems, which are simpler, faster, more accurate, and/or more general than their known counterparts. The common denominator of these results is a deterministic distributed rounding method for certain linear programs, which is the first such rounding method, to our knowledge. A sampling of our end results is as follows:
TL;DR: A novel neural network training framework called NITI that exclusively utilizes low bitwidth integer arithmetic, which achieves similar accuracy as state-of-the-art integer training frameworks without relying on full-precision floating-point first and last layers.
Abstract: While integer arithmetic has been widely adopted for improved performance in deep quantized neural network inference, training remains a task primarily executed using floating-point arithmetic. This is because both high dynamic range and numerical accuracy are central to the success of most modern training algorithms. However, due to their potential for computational, storage and energy advantages in hardware accelerators, neural network training methods that can be implemented with low-precision integer-only arithmetic remain an active research challenge. In this paper, we present NITI, an efficient deep neural network training framework that stores all parameters and intermediate values as integers, and computes exclusively with integer arithmetic. A pseudo stochastic rounding scheme that eliminates the need for external random number generation is proposed to facilitate conversion from wider intermediate results to low-precision storage. Furthermore, a cross-entropy loss backpropagation scheme computed with integer-only arithmetic is proposed. A proof-of-concept open-source software implementation of NITI that utilizes native 8-bit integer operations in modern GPUs to achieve end-to-end training is presented. When compared with an equivalent training setup implemented with floating-point storage and arithmetic, NITI achieves negligible accuracy degradation on the MNIST and CIFAR10 datasets using 8-bit integer storage and computation. On ImageNet, 16-bit integers are needed for weight accumulation with an 8-bit datapath. This achieves training results comparable to all-floating-point implementations.
TL;DR: This work considers algorithms for addition, elementwise multiplication, computing norms and inner products, orthogonalization, and rounding (rank truncation) that are the kernel operations for applications such as iterative Krylov solvers that exploit the TT structure.
Abstract: We present efficient and scalable parallel algorithms for performing mathematical operations for low-rank tensors represented in the tensor train (TT) format. We consider algorithms for addition, elementwise multiplication, computing norms and inner products, orthogonalization, and rounding (rank truncation). These are the kernel operations for applications such as iterative Krylov solvers that exploit the TT structure. The parallel algorithms are designed for distributed-memory computation, and we use a data distribution and strategy that parallelizes computations for individual cores within the TT format. We analyze the computation and communication costs of the proposed algorithms to show their scalability, and we present numerical experiments that demonstrate their efficiency on both shared-memory and distributed-memory parallel systems. For example, we observe better single-core performance than the existing MATLAB TT-Toolbox in rounding a 2GB TT tensor, and our implementation achieves a $34\times$ speedup using all 40 cores of a single node. We also show nearly linear parallel scaling on larger TT tensors up to over 10,000 cores for all mathematical operations.
TL;DR: This letter incorporates Multi-connectivity into the optimization of the total power consumption in 5G Heterogeneous Cloud Radio Access Networks and proposes a heuristic algorithm, which takes the outputs of the linear programming relaxation and rounding as inputs and checks if it further minimizes the power consumption.
Abstract: Multi-connectivity (MC) is proposed in fifth-generation mobile communications systems (5G) to mitigate the deterioration of Quality-of-Service owing to line-of-sight blockage and lack of communication resources. The main idea of MC is to associate a single user equipment with multiple network layers and multiple radio access technologies simultaneously. In previous studies, MC has been shown to increase capacity, provide high reliability and decrease outage probability. However, for the first time in the literature, in this letter we incorporate MC into the optimization of the total power consumption in 5G Heterogeneous Cloud Radio Access Networks (HCRANs). After formulating the problem as a binary integer linear programming problem and proving its NP-hardness, we propose a heuristic algorithm consisting of linear programming relaxation and rounding followed by a generalized assignment problem heuristic, which takes the outputs of the linear programming relaxation and rounding as inputs and checks whether it further minimizes the power consumption. We analyze power consumption, time complexity and achievable rate of the proposed algorithm and verify its superiority over existing methods by simulations.
TL;DR: This paper presents PositDebug, a compile-time instrumentation that performs shadow execution with high precision values to detect various errors in computation using posits and provides directed acyclic graphs of instructions, which are likely responsible for the error.
Abstract: Posit is a recently proposed alternative to the floating point representation (FP). It provides tapered accuracy. Given a fixed number of bits, the posit representation can provide better precision for some numbers compared to FP, which has generated significant interest in numerous domains. Being a representation with tapered accuracy, it can introduce high rounding errors for numbers outside the above golden zone. Programmers currently lack tools to detect and debug errors while programming with posits. This paper presents PositDebug, a compile-time instrumentation that performs shadow execution with high precision values to detect various errors in computation using posits. To assist the programmer in debugging the reported error, PositDebug also provides directed acyclic graphs of instructions, which are likely responsible for the error. A contribution of this paper is the design of the metadata per memory location for shadow execution that enables productive debugging of errors with long-running programs. We have used PositDebug to detect and debug errors in various numerical applications written using posits. To demonstrate that these ideas are applicable even for FP programs, we have built a shadow execution framework for FP programs that is an order of magnitude faster than Herbgrind.
TL;DR: In this paper, the authors extend randomized smoothing to cover parameterized transformations (e.g., rotations, translations) and certify robustness in the parameter space, which is particularly challenging as interpolation and rounding effects mean that image transformations do not compose.
TL;DR: This work presents an algebraic framework of LWR, inspired by a recent work of Peikert and Pepin, and shows a search-to-decision reduction for Ring-LWR, generalizing a result in the plain LWR setting.
Abstract: In this work, we conduct a comprehensive study on establishing hardness reductions for (Module) Learning with Rounding over rings (RLWR). Towards this, we present an algebraic framework of LWR, inspired by a recent work of Peikert and Pepin (TCC ’19). Then we show a search-to-decision reduction for Ring-LWR, generalizing a result in the plain LWR setting by Bogdanov et al. (TCC ’15). Finally, we show a reduction from Ring-LWE to Module Ring-LWR (even for leaky secrets), generalizing the plain LWE to LWR reduction by Alwen et al. (Crypto ’13). One of our central techniques is a new ring leftover hash lemma, which might be of independent interest.
TL;DR: In this article, expressions are derived that indicate how many decimals are reliable and so at what point the results should be rounded, based on the measurement precision, that is, the precision of the raw data and uses propagation of error techniques.
TL;DR: A pair arithmetic for the four basic operations and square root is presented, which can be regarded as a simplified, more-efficient double-double arithmetic and the central assumption on the underlying arithmetic is the first standard model for error analysis for operations on a discrete set of real numbers.
Abstract: We present a pair arithmetic for the four basic operations and square root. It can be regarded as a simplified, more-efficient double-double arithmetic. The central assumption on the underlying arithmetic is the first standard model for error analysis for operations on a discrete set of real numbers. We require neither a floating-point grid nor a rounding-to-nearest property. Based on that, we define a relative rounding error unit u and prove rigorous error bounds for the computed result of an arbitrary arithmetic expression depending on u, the size of the expression, and possibly a condition measure. In the second part of this note, we extend the error analysis by examining requirements to ensure faithfully rounded outputs and apply our results to floating-point systems conforming to the IEEE 754 standard. For a class of mathematical expressions, using an IEEE 754 conforming arithmetic with base β, the result is proved to be faithfully rounded for up to $1/\sqrt{\beta u} - 2$ operations. Our findings cover a number of previously published algorithms to compute faithfully rounded results, among them Horner’s scheme, products, sums, dot products, or Euclidean norm. Beyond that, several other problems can be analyzed, such as polynomial interpolation, orientation problems, Householder transformations, or the smallest singular value of Hilbert matrices of large size.
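For readers unfamiliar with double-double arithmetic, the sketch below shows its classical building block, Knuth's TwoSum, which returns the rounded sum together with the rounding error as a second number. This is background rather than the pair arithmetic of the paper: TwoSum's exactness relies on IEEE 754 binary floating point with rounding to nearest (and no overflow), which is a stronger assumption than the paper says it needs.

```python
from fractions import Fraction

def two_sum(a: float, b: float):
    """Knuth's TwoSum: return (s, e) where s is the rounded sum a + b
    and e is the rounding error, so that a + b == s + e exactly
    (assuming IEEE 754 round-to-nearest and no overflow)."""
    s = a + b
    b_virtual = s - a           # the part of b that made it into s
    a_virtual = s - b_virtual   # the part of a that made it into s
    e = (a - a_virtual) + (b - b_virtual)
    return s, e

a, b = 1.0, 1e-17
s, e = two_sum(a, b)
# Exact rational arithmetic confirms that the pair (s, e) loses nothing.
exact = Fraction(a) + Fraction(b)
print(s, e, Fraction(s) + Fraction(e) == exact)
```

A pair such as (s, e) represents the value to roughly twice the working precision. Arithmetic on such pairs is what "double-double" refers to.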
TL;DR: In this article, the authors show that mixed-integer control problems for evolution type partial differential equations can be regarded as operator differential inclusions, which yields a relaxation result including a characterization of the optimal value for mixed-integer optimal control problems with control constraints.
Abstract: We show that mixed-integer control problems for evolution type partial differential equations can be regarded as operator differential inclusions. This yields a relaxation result including a characterization of the optimal value for mixed-integer optimal control problems with control constraints. The theory is related to partial outer convexification and sum-up rounding methods. The results are applied to optimal valve switching control for gas pipeline operations. A numerical example illustrates the approach.
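Sum-up rounding, which the abstract mentions only by name, can be sketched for a single binary control on a uniform time grid as follows. The sketch uses the commonly described rule: switch the control on whenever the accumulated relaxed control exceeds the accumulated rounded control by at least half a step. The function name, the grid, and the sample values are assumptions for illustration and are not taken from the paper.

```python
def sum_up_rounding(relaxed, dt=1.0):
    """Round a relaxed control in [0, 1] per interval to a 0/1 control
    so that the two accumulated integrals stay close to each other."""
    rounded = []
    gap = 0.0  # integral of relaxed control minus integral of rounded control
    for value in relaxed:
        gap += value * dt
        bit = 1 if gap >= 0.5 * dt else 0
        gap -= bit * dt
        rounded.append(bit)
    return rounded

relaxed = [0.25, 0.25, 0.25, 0.25]   # hypothetical relaxed control values
rounded = sum_up_rounding(relaxed)
print(rounded, sum(relaxed), sum(rounded))
```

By construction the running difference between the two integrals never exceeds half a grid step in absolute value, which is the property that lets the rounded control approximate the relaxed one as the grid is refined.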
TL;DR: This work designs a simple (1.7 + ε)-competitive algorithm using a migration factor of O(1/ε), which maintains at every arrival a locally optimal solution with respect to the Jump neighborhood, and presents as its main contribution a more involved (4/3 + ε)-competitive algorithm using a migration factor of Õ(1/ε³).
Abstract: Online models that allow recourse can be highly effective in situations where classical online models are too pessimistic. One such problem is the online machine covering problem on identical machines. In this setting, jobs arrive one by one and must be assigned to machines with the objective of maximizing the minimum machine load. When a job arrives, we are allowed to reassign some jobs as long as their total size is (at most) proportional to the processing time of the arriving job. The proportionality constant is called the migration factor of the algorithm. Using a rounding procedure with useful structural properties for online packing and covering problems, we design first a simple (1.7 + ε)-competitive algorithm using a migration factor of O(1/ε), which maintains at every arrival a locally optimal solution with respect to the Jump neighborhood. After that, we present as our main contribution a more involved (4/3 + ε)-competitive algorithm using a migration factor of Õ(1/ε³). At every arrival, we run an adaptation of the Largest Processing Time first (LPT) algorithm. Since the new job can cause a complete change of the assignment of smaller jobs in both cases, a low migration factor is achieved by carefully exploiting the highly symmetric structure obtained by the rounding procedure.
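The plain offline Largest Processing Time first (LPT) rule that the abstract builds on can be sketched as below: sort jobs by decreasing size and always give the next job to the currently least-loaded machine. This is only the textbook rule, not the paper's adaptation with rounding and bounded migration, and the function name and job sizes are made up for the example.

```python
import heapq

def lpt_assign(jobs, machines):
    """Assign jobs to machines with the LPT rule and return the
    list of job sizes placed on each machine."""
    heap = [(0, m) for m in range(machines)]  # (current load, machine index)
    heapq.heapify(heap)
    assignment = [[] for _ in range(machines)]
    for size in sorted(jobs, reverse=True):
        load, m = heapq.heappop(heap)         # least-loaded machine
        assignment[m].append(size)
        heapq.heappush(heap, (load + size, m))
    return assignment

assignment = lpt_assign([5, 4, 3, 3, 3], 2)   # hypothetical job sizes
min_load = min(sum(jobs) for jobs in assignment)
print(assignment, min_load)
```

In the machine covering objective described above, the quantity of interest is the minimum machine load, which this example reports as `min_load`.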
TL;DR: The theory presented here is the first forward error analysis in the energy norm of iterative refinement and the first rounding error analysis of multigrid in general.
Abstract: This paper establishes the first theoretical framework for analyzing the rounding-error effects on multigrid methods using mixed-precision iterative-refinement solvers. While motivated by the sparse symmetric positive definite (SPD) matrix equations that arise from discretizing linear elliptic PDEs, the framework is purely algebraic such that it applies to matrices that do not necessarily come from the continuum. Based on the so-called energy or $A$ norm, which is the natural norm for many problems involving SPD matrices, we provide a normwise forward error analysis, and introduce the notion of progressive precision for multigrid solvers. Each level of the multigrid hierarchy uses three different precisions that each increase with the fineness of the level, but at different rates, thereby ensuring that the bulk of the computation uses the lowest possible precision. The theoretical results developed here in the energy norm differ notably from previous theory based on the Euclidean norm in important ways. In particular, we show that simply rounding an exact result to finite precision causes an error in the energy norm that is proportional to the square root of $\kappa$, the associated matrix condition number. (By contrast, this error is of order $1$ when measured in the Euclidean norm.) Given this observation, we show that the limiting accuracy for both V-cycles and full multigrid is optimal in the sense that it is also proportional to $\kappa^{1/2}$ in energy. Additionally, we show that the loss of convergence rate due to rounding grows in proportion to $\kappa^{1/2}$, but argue that this loss is insignificant in practice. The theory presented here is the first forward error analysis in the energy norm of iterative refinement and the first rounding error analysis of multigrid in general.
TL;DR: A submodular maximization problem motivated by applications in online retail in which a platform displays a list of products to a user in response to a search query is studied, and an optimal (1-1/e)-approximation algorithm is given for this problem.
Abstract: We study a submodular maximization problem motivated by applications in online retail. A platform displays a list of products to a user in response to a search query. The user inspects the first k items in the list for a k chosen at random from a given distribution, and decides whether to purchase an item from that set based on a choice model. The goal of the platform is to maximize the engagement of the shopper defined as the probability of purchase. This problem gives rise to a less-studied variation of submodular maximization in which we are asked to choose an ordering of a set of elements to maximize a linear combination of different submodular functions.
First, using a reduction to maximizing submodular functions over matroids, we give an optimal (1-1/e)-approximation for this problem. We then consider a variant in which the platform cares not only about user engagement, but also about diversification across various groups of users, that is, guaranteeing a certain probability of purchase in each group. We characterize the polytope of feasible solutions and give a bi-criteria ((1-1/e)^2,(1-1/e)^2)-approximation for this problem by rounding an approximate solution of a linear programming relaxation. For rounding, we rely on our reduction and the particular rounding techniques for matroid polytopes. For the special case in which underlying submodular functions are coverage functions -- which is practically relevant in online retail -- we propose an alternative LP relaxation and a simpler randomized rounding for the problem. This approach yields an optimal bi-criteria (1-1/e,1-1/e)-approximation algorithm for the special case of the problem with coverage functions.
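To make the ordering objective concrete, here is a minimal sketch for the coverage special case. Everything in it is an illustrative assumption: the item and user-type names, the weights, the attention distribution, and the helper names. The `greedy_order` heuristic is a simple baseline, not the matroid reduction or the LP rounding described in the abstract.

```python
from itertools import permutations

def coverage(prefix, covers, type_weight):
    # Weighted coverage: total weight of user types served by the items in prefix
    covered = set()
    for item in prefix:
        covered |= covers[item]
    return sum(type_weight[t] for t in covered)

def expected_engagement(order, covers, type_weight, attention):
    # attention[k] = probability that the user inspects exactly the first k items
    return sum(p * coverage(order[:k], covers, type_weight)
               for k, p in attention.items())

def greedy_order(covers, type_weight):
    # Baseline heuristic (an assumption of this sketch): repeatedly append the
    # item with the largest marginal coverage gain
    order, remaining = [], sorted(covers)
    while remaining:
        best = max(remaining,
                   key=lambda i: coverage(order + [i], covers, type_weight))
        order.append(best)
        remaining.remove(best)
    return order

# Hypothetical toy instance
covers = {"x": {"a"}, "y": {"a", "b"}, "z": {"c"}}
type_weight = {"a": 0.5, "b": 0.3, "c": 0.2}
attention = {1: 0.5, 2: 0.3, 3: 0.2}

if __name__ == "__main__":
    order = greedy_order(covers, type_weight)
    print(order, expected_engagement(order, covers, type_weight, attention))
    best = max(permutations(covers),
               key=lambda p: expected_engagement(p, covers, type_weight, attention))
    print("brute-force best:", best)
```

On an instance this small the greedy ordering can be compared against brute force over all permutations; the approximation guarantees in the abstract matter once enumeration is no longer possible.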
TL;DR: A scalable algorithm termed Single-Task Unload for Budget Resolution (STUBR), which resolves budget violations and orders the tasks to obtain robust solutions, is developed and it is observed that STUBR exhibits robust performance under practical scenarios and outperforms existing alternatives.
Abstract: We study task scheduling and offloading in a cloud computing system with multiple users where tasks have different processing times, release times, communication times, and weights. Each user may schedule a task locally or offload it to a shared cloud with heterogeneous processors by paying a price for the resource usage. We consider four different models in this paper: (i) zero task release and communication times, (ii) non-zero task release times and zero communication times, (iii) non-zero task release times and fixed communication times, and (iv) non-zero task release times and sequence-dependent communication times. Our work aims at identifying a task scheduling decision that minimizes the weighted sum completion time of all tasks, while satisfying the users' budget constraints. We propose an efficient solution framework for this NP-hard problem. As a first step, we use a relaxation and a rounding technique to obtain an integer solution that is a constant factor approximation to the minimum weighted sum completion time. This solution violates the budget constraints, but the average budget violation decreases as the number of users increases. Thus, we develop a scalable algorithm termed Single-Task Unload for Budget Resolution (STUBR), which resolves budget violations and orders the tasks to obtain robust solutions.
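The objective, weighted sum completion time, is easiest to see on one processor. The sketch below assumes model (i) (zero release and communication times), a single local processor, and no budget constraints. It uses the classical weighted-shortest-processing-time ordering (Smith's rule), which is not the relaxation-and-rounding step or STUBR from the abstract. Task names and values are hypothetical.

```python
from itertools import permutations

def weighted_completion(order, proc, weight):
    # Sum of weight[j] * C_j, where C_j is the completion time of task j
    t = 0
    total = 0
    for task in order:
        t += proc[task]
        total += weight[task] * t
    return total

def wspt_order(proc, weight):
    # Smith's rule: sort tasks by weight / processing time, largest first
    return sorted(proc, key=lambda j: weight[j] / proc[j], reverse=True)

# Hypothetical tasks: processing times and weights
proc = {"A": 3, "B": 1, "C": 2}
weight = {"A": 1, "B": 2, "C": 2}

if __name__ == "__main__":
    order = wspt_order(proc, weight)
    print(order, weighted_completion(order, proc, weight))
    print(min(weighted_completion(p, proc, weight) for p in permutations(proc)))
```

With heterogeneous processors, release times, communication times, and budgets, a simple sort like this no longer suffices, which is why the abstract turns to relaxation, rounding, and a separate budget-resolution step.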
TL;DR: This work derives fast approximation schemes for LP relaxations of several well-studied geometric optimization problems that include packing, covering, and mixed packing and covering constraints and obtains the first near-linear constant factor approximation algorithms for several problems.
Abstract: We derive fast approximation schemes for LP relaxations of several well-studied geometric optimization problems that include packing, covering, and mixed packing and covering constraints. Previous work in computational geometry concentrated mainly on the rounding stage to prove approximation bounds, assuming that the underlying LPs can be solved efficiently. This work demonstrates that many of those results can be made to run in nearly linear time. In contrast to prior work on this topic, our algorithms handle weights and capacities, side constraints, and also apply to mixed packing and covering problems, in a unified fashion. Our framework relies crucially on the properties of a randomized MWU algorithm of [41]; we demonstrate that it is well-suited for range spaces that admit efficient approximate dynamic data structures for emptiness oracles. Our framework cleanly separates the MWU algorithm for solving the LP from the key geometric data structure primitives, and this enables us to handle side constraints in a simple way. Combined with rounding algorithms that can also be implemented efficiently, we obtain the first near-linear constant factor approximation algorithms for several problems.
TL;DR: Algorithms and a hardware accelerator for performing stochastic rounding (SR) are presented to augment the ARM M4F based multi-core processor SpiNNaker2 with a more flexible rounding functionality than is available in the ARM processor itself.
Abstract: Algorithms and a hardware accelerator for performing stochastic rounding (SR) are presented. The main goal is to augment the ARM M4F based multi-core processor SpiNNaker2 with a more flexible rounding functionality than is available in the ARM processor itself. The motivation for adding such an accelerator in hardware is based on our previous results showing improvements in the numerical accuracy of ODE solvers in fixed-point arithmetic with SR, compared to the standard round-to-nearest or bit-truncation rounding modes. Furthermore, performing SR purely in software can be expensive, due to the requirement of a pseudorandom number generator (PRNG), multiple masking and shifting instructions, and an addition operation. Also, saturation of the rounded values is included, since rounding is usually followed by saturation, which is especially important in fixed-point arithmetic due to the narrow dynamic range of representable values. The main intended use of the accelerator is to round fixed-point multiplier outputs, which are returned unrounded by the ARM processor in a wider fixed-point format than the arguments.
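The software steps named above (random bits, an addition, a shift, then saturation) can be sketched as follows. This is a minimal Python model of one common way to do stochastic rounding on fixed-point integers, not the accelerator's actual design. The function name, the `out_bits` parameter, and the use of Python's `random` module as the PRNG are assumptions.

```python
import random

def stochastic_round_fixed(x, shift, out_bits=16, rng=random):
    """Drop `shift` fractional bits from the integer x with stochastic
    rounding, then saturate to a signed `out_bits`-bit range."""
    # Uniform random integer in [0, 2**shift): the PRNG step
    r = rng.getrandbits(shift) if shift > 0 else 0
    # Add, then truncate with an arithmetic shift; the result rounds up with
    # probability equal to the discarded fraction
    y = (x + r) >> shift
    # Saturate instead of wrapping around
    lo = -(1 << (out_bits - 1))
    hi = (1 << (out_bits - 1)) - 1
    return max(lo, min(hi, y))

if __name__ == "__main__":
    rng = random.Random(0)
    # 22 / 4 = 5.5, so the outputs should be 5 or 6, each about half the time
    samples = [stochastic_round_fixed(22, 2, rng=rng) for _ in range(10000)]
    print(sum(samples) / len(samples))
```

Because the probability of rounding up matches the discarded fraction, the rounded value is correct in expectation, which is the property that round-to-nearest and bit truncation lack.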
TL;DR: It is demonstrated that pseudo-granularity can be expected in many nonlinear applications from practice, and that its explicit use can be beneficial.
Abstract: We study a new technique to check the existence of feasible points for mixed-integer nonlinear optimization problems that satisfy a structural requirement called granularity. For granular optimization problems, we show how rounding the optimal points of certain purely continuous optimization problems can lead to feasible points of the original mixed-integer nonlinear problem. To this end, we generalize results for the mixed-integer linear case from Neumann et al. (Comput Optim Appl 72:309–337, 2019). We study some additional issues caused by nonlinearity and show how to overcome them by extending the standard granularity concept to an advanced version, which we call pseudo-granularity. In a computational study on instances from a standard test library, we demonstrate that pseudo-granularity can be expected in many nonlinear applications from practice, and that its explicit use can be beneficial.
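The idea of rounding a continuous point into a feasible integer point can be illustrated on a single linear inequality. The sketch below is an assumption-laden toy and not the paper's nonlinear construction: it checks the sufficient condition that the continuous point satisfies the constraint with slack of at least half the 1-norm of the coefficients, in which case any componentwise rounding to the nearest integer stays feasible. Both helper names are hypothetical.

```python
import math

def rounding_is_safe(beta, y_cont, b):
    # Sufficient condition (assumed for this sketch): beta . y <= b - 0.5 * ||beta||_1.
    # Rounding moves each coordinate by at most 0.5, so beta . y changes by
    # at most 0.5 * ||beta||_1.
    slack_needed = 0.5 * sum(abs(c) for c in beta)
    lhs = sum(c * y for c, y in zip(beta, y_cont))
    return lhs <= b - slack_needed

def round_point(y_cont):
    # Componentwise rounding to the nearest integer (ties round up)
    return [math.floor(y + 0.5) for y in y_cont]

if __name__ == "__main__":
    beta, b = [3, -2], 10
    y_cont = [2.4, 0.6]
    if rounding_is_safe(beta, y_cont, b):
        y_int = round_point(y_cont)
        print(y_int, sum(c * y for c, y in zip(beta, y_int)) <= b)
```

When the check fails, the rounded point may still happen to be feasible; the condition is only sufficient, not necessary.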