Top 30 papers published in the topic of Arbitrary-precision arithmetic in 2018

Showing papers on "Arbitrary-precision arithmetic" published in 2018

Journal Article•10.2478/AMNS.2018.2.00038•

Review of numerical methods for NumILPT with computational accuracy assessment for fractional calculus

1 Dec 2018

TL;DR: In this article, the authors present results of accuracy evaluation of numerous numerical algorithms for the numerical approximation of the Inverse Laplace Transform, including Stehfest, Abate and Whitt, Vlach and Singhal.

Abstract: In the paper we present results of accuracy evaluation of numerous numerical algorithms for the numerical approximation of the Inverse Laplace Transform. The selected algorithms represent diverse lines of approach to this problem and include methods by Stehfest, Abate and Whitt, Vlach and Singhal, De Hoog, Talbot, Zakian, and one in which the FFT is applied for Fourier series convergence acceleration. We use the C++ and Python languages with arbitrary-precision mathematical libraries to address some crucial issues of numerical implementation. The test set includes Laplace transforms considered difficult to compute as well as some others commonly applied in fractional calculus. The evaluation results allow us to conclude that the Talbot method, which involves deformed Bromwich contour integration, and the De Hoog and the Abate and Whitt methods, which use Fourier series expansion with accelerated convergence, can be regarded as general-purpose high-accuracy algorithms. They can be applied to a wide variety of inversion problems.

77 citations
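
As a rough illustration of why arbitrary precision matters for the inversion methods named in this entry, the following sketch implements the Gaver-Stehfest formula using only the Python standard library. It is not the paper's code: the function names, the 50-digit working precision, the choice of 16 terms, and the test transform F(s) = 1/(s + 1) are all assumptions made for the example.

```python
from decimal import Decimal, getcontext
from fractions import Fraction
from math import factorial

getcontext().prec = 50  # working precision in decimal digits (assumed value)


def stehfest_coefficients(n):
    """Exact Gaver-Stehfest weights V_1..V_n for an even number of terms n."""
    if n % 2:
        raise ValueError("n must be even")
    half = n // 2
    weights = []
    for k in range(1, n + 1):
        total = Fraction(0)
        for j in range((k + 1) // 2, min(k, half) + 1):
            total += Fraction(
                j**half * factorial(2 * j),
                factorial(half - j) * factorial(j) * factorial(j - 1)
                * factorial(k - j) * factorial(2 * j - k),
            )
        weights.append((-1) ** (k + half) * total)
    return weights


def invert_laplace(F, t, n=16):
    """Approximate f(t) from its Laplace transform F(s) with n Stehfest terms."""
    t = Decimal(t)
    scale = Decimal(2).ln() / t
    acc = Decimal(0)
    for k, v in enumerate(stehfest_coefficients(n), start=1):
        # The weights are large and alternate in sign, so they are kept exact
        # until the last moment to limit cancellation error.
        weight = Decimal(v.numerator) / Decimal(v.denominator)
        acc += weight * F(k * scale)
    return scale * acc


def F_exp(s):
    """Laplace transform of f(t) = exp(-t)."""
    return 1 / (s + 1)


approx = invert_laplace(F_exp, 1)
exact = Decimal(-1).exp()
print(approx, exact)
```

The weights grow quickly with the number of terms, which is the usual reason the Stehfest method loses accuracy in double precision and benefits from a multiple-precision backend.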

Journal Article•10.1177/0278364917753994•

High-dimensional stochastic optimal control using continuous tensor decompositions

Alex Gorodetsky¹, Sertac Karaman¹, Youssef M. Marzouk¹•Institutions (1)

Massachusetts Institute of Technology¹

01 Feb 2018-The International Journal of Robotics Research

TL;DR: This work proposes novel dynamic programming algorithms that alleviate the curse of dimensionality in problems that exhibit certain low-rank structure, and demonstrates the algorithms running in real time on board a quadcopter during a flight experiment under motion capture.

Abstract: Motion planning and control problems are embedded and essential in almost all robotics applications. These problems are often formulated as stochastic optimal control problems and solved using dynamic programming algorithms. Unfortunately, most existing algorithms that guarantee convergence to optimal solutions suffer from the curse of dimensionality: the run time of the algorithm grows exponentially with the dimension of the state space of the system. We propose novel dynamic programming algorithms that alleviate the curse of dimensionality in problems that exhibit certain low-rank structure. The proposed algorithms are based on continuous tensor decompositions recently developed by the authors. Essentially, the algorithms represent high-dimensional functions (e.g., the value function) in a compressed format, and directly perform dynamic programming computations (e.g., value iteration, policy iteration) in this format. Under certain technical assumptions, the new algorithms guarantee convergence towards optimal solutions with arbitrary precision. Furthermore, the run times of the new algorithms scale polynomially with the state dimension and polynomially with the ranks of the value function. This approach realizes substantial computational savings in "compressible" problem instances, where value functions admit low-rank approximations. We demonstrate the new algorithms in a wide range of problems, including a simulated six-dimensional agile quadcopter maneuvering example and a seven-dimensional aircraft perching example. In some of these examples, we estimate computational savings of up to 10 orders of magnitude over standard value iteration algorithms. We further demonstrate the algorithms running in real time on board a quadcopter during a flight experiment under motion capture.

66 citations

Journal Article•10.1007/S13389-017-0149-6•

Arithmetic coding and blinding countermeasures for lattice signatures

Markku-Juhani O. Saarinen

01 Apr 2018-Journal of Cryptographic Engineering

TL;DR: A practical, compact, and more quantum-resistant variant of the BLISS Ideal Lattice Signature Scheme is developed and it is demonstrated that arithmetic decoding from a uniform source to a target distribution is also an optimal non-uniform sampling method in the sense that a minimal amount of true random bits is required.

Abstract: We describe new arithmetic coding techniques and side-channel blinding countermeasures for lattice-based cryptography. Using these techniques, we develop a practical, compact, and more quantum-resistant variant of the BLISS Ideal Lattice Signature Scheme. We first show how the BLISS parameters and hash-based random oracle can be modified to be more secure against quantum pre-image attacks while optimizing signature size. Arithmetic Coding offers an information theoretically optimal compression for stationary and memoryless sources, such as the discrete Gaussian distributions often present in lattice-based cryptography. We show that this technique gives better signature sizes than the previously proposed advanced Huffman-based signature compressors. We further demonstrate that arithmetic decoding from a uniform source to a target distribution is also an optimal non-uniform sampling method in the sense that a minimal amount of true random bits is required. Performance of this new Binary Arithmetic Coding sampler is comparable to other practical samplers. The same code, tables, or circuitry can be utilized for both tasks, eliminating the need for separate sampling and compression components. We then describe simple randomized blinding techniques that can be applied to anti-cyclic polynomial multiplication to mask timing and power consumption side-channels in ring arithmetic. We further show that the Gaussian sampling process can also be blinded by a split-and-permute technique as an effective countermeasure against side-channel attacks.

59 citations

Book Chapter•10.1007/978-3-319-96418-8_30•

Numerical Integration in Arbitrary-Precision Ball Arithmetic

Fredrik Johansson¹•Institutions (1)

French Institute for Research in Computer Science and Automation¹

24 Jul 2018

TL;DR: In this paper, the authors present an implementation of arbitrary-precision numerical integration with rigorous error bounds in the Arb library, which combines adaptive bisection with adaptive Gaussian quadrature where error bounds are determined via complex magnitudes.

Abstract: We present an implementation of arbitrary-precision numerical integration with rigorous error bounds in the Arb library. Rapid convergence is ensured for piecewise complex analytic integrals by use of the Petras algorithm, which combines adaptive bisection with adaptive Gaussian quadrature where error bounds are determined via complex magnitudes without evaluating derivatives. The code is general, easy to use, and efficient, often outperforming existing non-rigorous software.

22 citations
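
To make the idea of ball arithmetic in this entry concrete, here is a minimal midpoint-radius sketch. It is not the Arb API: the `Ball` class and its methods are hypothetical, and exact `Fraction` values stand in for the rounded floating-point midpoints that a real library would use (a real implementation must also add the rounding error of each midpoint operation to the radius).

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Ball:
    """The set of real numbers x with |x - mid| <= rad (hypothetical class)."""
    mid: Fraction
    rad: Fraction  # must be non-negative

    def __add__(self, other):
        # Radii add: the worst-case errors of both operands accumulate.
        return Ball(self.mid + other.mid, self.rad + other.rad)

    def __mul__(self, other):
        # |xy - ab| <= |a|*rb + |b|*ra + ra*rb for x in [a +/- ra], y in [b +/- rb].
        rad = (abs(self.mid) * other.rad
               + abs(other.mid) * self.rad
               + self.rad * other.rad)
        return Ball(self.mid * other.mid, rad)

    def contains(self, x):
        return abs(Fraction(x) - self.mid) <= self.rad


eps = Fraction(1, 10**10)
x = Ball(Fraction(1, 3), eps)   # 1/3 known to within 1e-10
y = Ball(Fraction(2), eps)      # 2 known to within 1e-10
result = x * y + x              # encloses every value of x*y + x
print(result.mid, result.rad)
```

Every operation returns a ball guaranteed to contain the true result, which is how error bounds can be propagated through a whole computation without being analyzed by hand.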

Journal Article•10.1007/S11277-017-5028-Z•

Modified Binary Multiplier Architecture to Achieve Reduced Latency and Hardware Utilization

Geetam Singh Tomar, Marcus L. George¹•Institutions (1)

University of the West Indies¹

01 Feb 2018-Wireless Personal Communications

TL;DR: The results of simulation indicate that the latency of the proposed novel binary multiplier systems (8-bit, 16-bit and 24-bit) is significantly shorter than that of existing implementations.

Abstract: Arithmetic Logic Units (ALUs) are very important components of the processor, which perform various arithmetic and logical operations such as multiplication, division, addition, subtraction, cubing, squaring, etc. Of all these operations, multiplication is the most elementary and most frequently used operation in ALUs. The operation of multiplication also forms the basis of many other complex arithmetic operations such as cubing, squaring, convolution, etc. This paper presents a modified novel multi-precision binary multiplier architecture that achieves reduced latency/delay and area/hardware utilization relative to existing implementations of binary multiplication. This system will function as the second stage of a novel multi-precision binary multiplier system. The system was implemented using Xilinx 14.2 ISE and simulated with ISIM, which was available from Xilinx 14.2 ISE. The results of simulation indicate that the latency of the proposed novel binary multiplier systems (8-bit, 16-bit and 24-bit) is significantly shorter than that of existing implementations.

19 citations

Journal Article•10.1137/18M1170133•

Fast and rigorous arbitrary-precision computation of Gauss-Legendre quadrature nodes and weights

Fredrik Johansson, Marc Mezzarobba

27 Nov 2018-SIAM Journal on Scientific Computing

TL;DR: A strategy for rigorous arbitrary-precision evaluation of Legendre polynomials on the unit interval and its application in the generation of Gauss-Legendre quadrature rules with simultaneous high degree and precision is described.

Abstract: We describe a strategy for rigorous arbitrary-precision evaluation of Legendre polynomials on the unit interval and its application in the generation of Gauss-Legendre quadrature rules. Our focus is on making the evaluation practical for a wide range of realistic parameters, corresponding to the requirements of numerical integration to an accuracy of about 100 to 100 000 bits. Our algorithm combines the summation by rectangular splitting of several types of expansions in terms of hypergeometric series with a fixed-point implementation of Bonnet's three-term recurrence relation. We then compute rigorous enclosures of the Gauss-Legendre nodes and weights using the interval Newton method. We provide rigorous error bounds for all steps of the algorithm. The approach is validated by an implementation in the Arb library, which achieves order-of-magnitude speedups over previous code for computing Gauss-Legendre rules with simultaneous high degree and precision.

18 citations
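
The abstract above mentions Bonnet's three-term recurrence and Newton iteration. The sketch below shows a plain, non-rigorous version of that combination in high-precision decimal arithmetic. It is not the paper's algorithm: there are no interval enclosures, no hypergeometric expansions, and no fixed-point tricks, and the function names, the 60-digit precision, and the fixed number of Newton steps are assumptions for illustration.

```python
import math
from decimal import Decimal, getcontext

getcontext().prec = 60  # working precision in decimal digits (assumed value)


def legendre_pair(n, x):
    """Return (P_n(x), P_{n-1}(x)) using Bonnet's recurrence
    (k + 1) P_{k+1} = (2k + 1) x P_k - k P_{k-1}."""
    p_prev, p = Decimal(1), x
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p, p_prev


def legendre_derivative(n, x, p, p_prev):
    """P_n'(x) = n (x P_n - P_{n-1}) / (x^2 - 1), valid for |x| < 1."""
    return n * (x * p - p_prev) / (x * x - 1)


def gauss_legendre(n, newton_steps=10):
    """Nodes and weights of the n-point Gauss-Legendre rule on [-1, 1]."""
    nodes, weights = [], []
    for i in range(1, n + 1):
        # Double-precision starting guess, then refined in Decimal arithmetic.
        x = Decimal(math.cos(math.pi * (i - 0.25) / (n + 0.5)))
        for _ in range(newton_steps):
            p, p_prev = legendre_pair(n, x)
            x -= p / legendre_derivative(n, x, p, p_prev)
        p, p_prev = legendre_pair(n, x)
        dp = legendre_derivative(n, x, p, p_prev)
        nodes.append(x)
        weights.append(2 / ((1 - x * x) * dp * dp))
    return nodes, weights


nodes, weights = gauss_legendre(10)
integral_x2 = sum(w * x * x for x, w in zip(nodes, weights))
print(integral_x2)  # approximates the integral of x^2 over [-1, 1], i.e. 2/3
```

Unlike this sketch, a rigorous implementation returns enclosures that provably contain each node and weight.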

Posted Content•

Automating Generation of Low Precision Deep Learning Operators.

Meghan Cowan, Thierry Moreau, Tianqi Chen, Luis Ceze

25 Oct 2018-arXiv: Learning

TL;DR: This paper presents an extensive case study on low power ARM Cortex-A53 CPU, and shows how it can generate 1-bit, 2-bit convolutions with speedups up to 16x over an optimized 16-bit integer baseline and 2.3x better than handwritten implementations.

Abstract: State of the art deep learning models have made steady progress in the fields of computer vision and natural language processing, at the expense of growing model sizes and computational complexity. Deploying these models on low power and mobile devices poses a challenge due to their limited compute capabilities and strict energy budgets. One solution that has generated significant research interest is deploying highly quantized models that operate on low precision inputs and weights of less than eight bits, trading off accuracy for performance. These models have a significantly reduced memory footprint (up to 32x reduction) and can replace multiply-accumulates with bitwise operations during compute intensive convolution and fully connected layers. Most deep learning frameworks rely on highly engineered linear algebra libraries such as ATLAS or Intel's MKL to implement efficient deep learning operators. To date, none of the popular deep learning frameworks directly support low precision operators, partly due to a lack of optimized low precision libraries. In this paper we introduce a workflow to quickly generate high performance low precision deep learning operators for arbitrary precision that target multiple CPU architectures and include optimizations such as memory tiling and vectorization. We present an extensive case study on a low power ARM Cortex-A53 CPU, and show how we can generate 1-bit and 2-bit convolutions with speedups up to 16x over an optimized 16-bit integer baseline and 2.3x better than handwritten implementations.

17 citations
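
The remark in this entry that multiply-accumulates can be replaced by bitwise operations can be illustrated with the common XOR-and-popcount formulation of a 1-bit dot product. This is a generic sketch, not the code generated by the paper's workflow; the helper names and the bit encoding (bit 1 for +1, bit 0 for -1) are assumptions.

```python
def pack_signs(values):
    """Pack a sequence of +1/-1 values into an int (bit 1 encodes +1)."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed {-1, +1} vectors of length n.

    Positions where the bits agree contribute +1 and positions where they
    differ contribute -1, so the result is n - 2 * popcount(a XOR b).
    """
    mismatches = bin(a_bits ^ b_bits).count("1")
    return n - 2 * mismatches


a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, -1]
dot = binary_dot(pack_signs(a), pack_signs(b), len(a))
print(dot, sum(x * y for x, y in zip(a, b)))
```

On real hardware the same computation maps to one XOR and one population-count instruction per machine word, which is where the speedup over integer multiply-accumulate comes from.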

Proceedings Article•10.1109/HPCC/SMARTCITY/DSS.2018.00059•

Acceleration of Large Integer Multiplication with Intel AVX-512 Instructions

Takuya Edamatsu¹, Daisuke Takahashi¹•Institutions (1)

University of Tsukuba¹

28 Jun 2018

TL;DR: An implementation of large integer multiplication using Single Instruction Multiple Data (SIMD) instructions using a reduced-radix representation is proposed and the execution time and the number of instructions are compared against the GNU Multiple Precision Arithmetic Library (GMP).

Abstract: In this paper, we propose an implementation of large integer multiplication using Single Instruction Multiple Data (SIMD) instructions. We evaluated the implementation on an Intel Xeon Phi processor. The second generation Intel Xeon Phi processor, Knights Landing, has a set of Advanced Vector Extensions-512 (AVX-512) instructions. Using AVX-512, the processor can handle 512 bits at the same time and has the potential to multiply faster than a processor using Streaming SIMD Extensions (SSE) and AVX. Therefore, we applied AVX-512F (foundation) instructions to the program. In the multiplication of large integers, as the number of digits increases, various processing costs also become larger. One of these costs is carry processing. Therefore, we implemented a multiplication function using a reduced-radix representation and compared the execution time and the number of instructions against the GNU Multiple Precision Arithmetic Library (GMP). Furthermore, we used some optimization techniques for this kernel. We successfully achieved an execution time that was approximately 2.5x faster than GMP on the Knights Landing architecture.

10 citations
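
The reduced-radix idea mentioned in this entry can be sketched in a few lines: limbs hold fewer bits than the machine word, so partial products can be accumulated without carries and the carry pass is deferred to the end. This is a scalar illustration in Python, not the paper's AVX-512 kernel; the 28-bit limb width and all function names are assumptions.

```python
RADIX_BITS = 28                 # hypothetical limb width, below the 64-bit lane width
MASK = (1 << RADIX_BITS) - 1


def to_limbs(x):
    """Split a non-negative integer into little-endian RADIX_BITS-bit limbs."""
    limbs = []
    while x:
        limbs.append(x & MASK)
        x >>= RADIX_BITS
    return limbs or [0]


def from_limbs(limbs):
    return sum(limb << (RADIX_BITS * i) for i, limb in enumerate(limbs))


def multiply(a, b):
    la, lb = to_limbs(a), to_limbs(b)
    acc = [0] * (len(la) + len(lb))
    # Carry-free accumulation: each limb product is below 2**56, so up to
    # 2**8 of them can be summed in a 64-bit lane before it could overflow.
    for i, x in enumerate(la):
        for j, y in enumerate(lb):
            acc[i + j] += x * y
    # Single deferred carry-propagation pass.
    carry = 0
    for k in range(len(acc)):
        total = acc[k] + carry
        acc[k] = total & MASK
        carry = total >> RADIX_BITS
    return from_limbs(acc)


a, b = 3**200, 7**150
print(multiply(a, b) == a * b)
```

Python integers never overflow, so the headroom argument in the comments only matters for a fixed-width implementation; the sketch shows the data layout and the order of operations.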

10.7275/11399986.0•

A Study of High Performance Multiple Precision Arithmetic on Graphics Processing Units

Niall Emmart

1 Jan 2018

TL;DR: A study of high-performance multiple-precision arithmetic on graphics processing units.

Abstract: A Study of High Performance Multiple Precision Arithmetic on Graphics Processing Units

8 citations

Journal Article•

A Computationally Efficient FPTAS for Convex Stochastic Dynamic Programs

Nir Halman, Giacomo Nannicini, James B. Orlin

11 Jun 2018-SIAM Journal on Control and Optimization

TL;DR: This paper designs and implements an FPTAS with excellent computational performance and shows that it is faster than an exact algorithm even for small problem instances and small approximation factors, becoming orders of magnitude faster as the problem size increases.

Abstract: We propose a computationally efficient fully polynomial-time approximation scheme (FPTAS) to compute an approximation with arbitrary precision of the value function of convex stochastic dynamic programs, using the technique of $K$-approximation sets and functions introduced by Halman et al. [Math. Oper. Res., 34, (2009), pp. 674--685]. This paper deals with the convex case only, and it has the following contributions. First, we improve on the worst-case running time given by Halman et al. Second, we design and implement an FPTAS with excellent computational performance and show that it is faster than an exact algorithm even for small problem instances and small approximation factors, becoming orders of magnitude faster as the problem size increases. Third, we show that with careful algorithm design, the errors introduced by floating point computations can be bounded, so that we can provide a guarantee on the approximation factor over an exact infinite-precision solution. We provide an extensive computatio...

8 citations

Posted Content•

Fast and rigorous arbitrary-precision computation of Gauss-Legendre quadrature nodes and weights

Fredrik Johansson, Marc Mezzarobba

12 Feb 2018-arXiv: Numerical Analysis

TL;DR: In this article, the authors describe a strategy for rigorous arbitrary-precision evaluation of Legendre polynomials on the unit interval and its application in the generation of Gauss-Legendre quadrature rules.

Abstract: We describe a strategy for rigorous arbitrary-precision evaluation of Legendre polynomials on the unit interval and its application in the generation of Gauss-Legendre quadrature rules. Our focus is on making the evaluation practical for a wide range of realistic parameters, corresponding to the requirements of numerical integration to an accuracy of about 100 to 100 000 bits. Our algorithm combines the summation by rectangular splitting of several types of expansions in terms of hypergeometric series with a fixed-point implementation of Bonnet's three-term recurrence relation. We then compute rigorous enclosures of the Gauss-Legendre nodes and weights using the interval Newton method. We provide rigorous error bounds for all steps of the algorithm. The approach is validated by an implementation in the Arb library, which achieves order-of-magnitude speedups over previous code for computing Gauss-Legendre rules with simultaneous high degree and precision.

Proceedings Article•10.5220/0006538401750182•

A Simple and Robust Approach to Computation of Meshes Intersection

Věra Skorkovská¹, Ivana Kolingerová¹, Bedrich Benes²•Institutions (2)

University of West Bohemia¹, Purdue University²

27 Jan 2018

TL;DR: This work proposes an accurate geometry-based method for the local repair of intersecting meshes that needs neither manipulation of the input data nor arbitrary precision arithmetic.

Abstract: Triangular meshes are important in many fields in both basic and applied research that rely on their correctness and accuracy. Many operations with meshes can lead to undesirable situations, and the resulting models can be damaged and unusable for further processing. Self-intersection and mesh-to-mesh intersection are types of operations that are often present and can cause such problems. We propose an accurate geometry-based method for local repair of intersecting meshes. The state-of-the-art methods either solve the problem inaccurately, or use methods such as arbitrary precision arithmetic or virtual perturbation to deal with the troublesome boundary cases. Our method represents a robust way to repair intersecting meshes accurately without the need to manipulate the input data or to employ arbitrary precision arithmetic. The correct solution is obtained through a careful classification of the cases that could result from the numerical imprecision of floating point arithmetic.

Posted Content•

Numerical Relativity with Arbitrary Precision Arithmetic: Applications to Gravitational Collapse

Daniel Santos-Oliván, Carlos F. Sopuerta

02 Mar 2018-arXiv: Computational Physics

TL;DR: In this article, Pseudo-Spectral Collocation (PSC) is used in combination with high-order precision arithmetic for Numerical Relativity problems with high accuracy and performance requirements.

Abstract: Numerical Relativity is a mature field with many applications in Astrophysics, Cosmology and even in Fundamental Physics. As such, we are entering a stage in which new sophisticated methods adapted to open problems are being developed. In this paper, we advocate the use of Pseudo-Spectral Collocation (PSC) methods in combination with high-order precision arithmetic for Numerical Relativity problems with high accuracy and performance requirements. The PSC method provides exponential convergence (for smooth problems, as is the case in many problems in Numerical Relativity) and we can use different bit precision without needing to change the structure of the numerical algorithms. Moreover, the PSC method provides high-compression storage of the information. We introduce a series of techniques for combining these tools and show their potential in two problems in relativistic gravitational collapse: (i) The classical Choptuik collapse, estimating with arbitrary precision the location of the apparent horizon. (ii) Collapse in asymptotically anti-de Sitter spacetimes, showing that the total energy is preserved by the numerical evolution to a very high degree of precision.

Posted Content•

Numerical integration in arbitrary-precision ball arithmetic

Fredrik Johansson¹•Institutions (1)

French Institute for Research in Computer Science and Automation¹

22 Feb 2018-arXiv: Mathematical Software

Journal Article•10.1145/3177922•

Average Counting via Approximate Histograms

Jacek Cichoń¹, Karol Gotfryd¹•Institutions (1)

Wrocław University of Technology¹

29 Mar 2018-ACM Transactions on Sensor Networks

TL;DR: A new algorithm is proposed that allows the approximation of the average of a set of measurements made by a sensor network with arbitrary precision, controlled by two parameters, and requires O(D) rounds, where D is the diameter of the network.

Abstract: We propose a new algorithm for the classical averaging problem for distributed wireless sensor networks. This subject has been studied extensively and there are many clever algorithms in the literature. These algorithms are based on the idea of local exchange of information. They behave well in dense networks (e.g., in networks whose connections form a complete graph), but their convergence to the real average is very slow in linear or cyclic graphs. Our solution is different. In order to calculate the average, we first build an approximate histogram of observed data; then, from this histogram, we estimate the average. In our solution, we use the extreme propagation technique and probabilistic counters. It allows us to find the approximation of the average of a set of measurements made by a sensor network with arbitrary precision, controlled by two parameters. Our method requires O(D) rounds, where D is the diameter of the network. We study the message complexity of this algorithm and show that it is of order O(log n) for each node, where n is the size of the network.

Journal Article•10.1002/MMA.5058•

Numerical investigation of the acoustic scattering problem from penetrable prolate spheroidal structures using the Vekua transformation and arbitrary precision arithmetic

Leonidas N. Gergidis¹, Drosos Kourounis², Stylianos Mavratzas, Antonios Charalambopoulos³•Institutions (3)

University of Ioannina¹, University of Lugano², National Technical University of Athens³

15 Sep 2018-Mathematical Methods in The Applied Sciences

Abstract: A complete set of radiating "outwards" eigensolutions of the Helmholtz equation, obtained by appropriately transforming the kernel of the Laplace equation through the Vekua mapping, is applied to the investigation of acoustic scattering by penetrable prolate spheroidal scatterers. The scattered field is expanded in terms of the aforementioned set, thereby bypassing the standard spheroidal wave functions along with their inherent numerical deficiencies. The coefficients of the expansion are provided by the solution of linear systems, the conditioning of which calls for arbitrary precision arithmetic. Incorporating it enables the polyparametric investigation of the convergence of the current approach to the solution of the direct scattering problem. Finally, far-field pattern visualization in 3D space clarifies the preferred scattering directions for several frequencies of the incident wave, ranging from the "low" to the "resonance" region.

Book Chapter•10.1007/978-981-10-4394-9_67•

Single-Precision Floating Point Matrix Multiplier Using Low-Power Arithmetic Circuits

Soumya Gargave¹, Yash Agrawal¹, Rutu Parekh¹•Institutions (1)

Dhirubhai Ambani Institute of Information and Communication Technology¹

1 Jan 2018

TL;DR: Simulation results show that the floating point matrix multiplier design performs better at the 45 nm than at the 180 nm technology node, with 43% lower delay and a 97.86% lower energy-delay product at 1 V.

Abstract: This paper presents a single-precision floating point (IEEE 754 standard) matrix multiplier module. This is constructed using subblocks, which include a floating point adder and a floating point multiplier. These subblocks are designed to achieve the goal of low power consumption. Different architectures of subblocks are compared on the basis of energy-delay product. Design and simulations have been performed for the 180 and 45 nm technology nodes. Simulation results show that the floating point matrix multiplier design performs better at the 45 nm than at the 180 nm technology node, with 43% lower delay and a 97.86% lower energy-delay product at 1 V. Also, 45 nm technology cells occupy only 6.25% of the area of 180 nm cells.

Proceedings Article•10.29007/5C91•

Numerical validation in quadruple precision using stochastic arithmetic

Stef Graillat, Fabienne Jézéquel, Romain Picot, François Févotte, Bruno Lathuilière

11 Oct 2018

TL;DR: The CADNA library has been improved to enable the estimation of rounding errors in programs using quadruple precision floating-point variables, i.e. variables with a 113-bit mantissa.

Abstract: Discrete Stochastic Arithmetic (DSA) enables one to estimate rounding errors and to detect numerical instabilities in simulation programs. DSA is implemented in the CADNA library that can analyze the numerical quality of single and double precision programs. In this article, we show how the CADNA library has been improved to enable the estimation of rounding errors in programs using quadruple precision floating-point variables, i.e. having 113-bit mantissa length. Although an implementation of DSA called SAM exists for arbitrary precision programs, a significant performance improvement has been obtained with CADNA compared to SAM for the numerical validation of programs with 113-bit mantissa length variables. This new version of CADNA has been successfully used for the control of accuracy in quadruple precision applications, such as a chaotic sequence and the computation of multiple roots of polynomials. We also describe a new version of the PROMISE tool, based on CADNA, that aims to reduce the number of double precision variable declarations in numerical programs in favor of single precision ones, taking into account a requested accuracy of the results. The new version of PROMISE can now provide type declarations mixing single, double and quadruple precision.

Proceedings Article•10.23919/MIXDES.2018.8436868•

IP Core of Coprocessor for Multiple-Precision-Arithmetic Computations

Kamil Rudnicki, Tomasz P. Stefanski¹•Institutions (1)

Gdańsk University of Technology¹

21 Jun 2018

TL;DR: An IP core of a coprocessor supporting computations that require integer multiple-precision arithmetic (MPA) is presented; it is designed to be scalable, so that it can be used not only in scientific computing but also in embedded systems employing encryption based on MPA.

Abstract: In this paper, we present an IP core of a coprocessor supporting computations requiring integer multiple-precision arithmetic (MPA). Whilst standard 32/64-bit arithmetic is sufficient to solve many computing problems, there are still applications that require higher numerical precision. Hence, the purpose of the developed coprocessor is to support and offload the central processing unit (CPU) in such computations. The developed digital circuit of the coprocessor works with integer numbers of precision up to 32 kbits. Our IP core is developed using the very high speed integrated circuit hardware description language (VHDL) and simulated assuming implementation in field-programmable gate arrays (FPGAs). It exchanges data using three 64-bit data buses, whereas code for execution on the coprocessor is fetched from a dedicated 8-bit bus (all buses in the AMBA standard, AXI Stream). The instruction set of the coprocessor currently consists of 7 instructions including multiplication, addition and subtraction. The computations can employ at most 16 registers of length 32 kbits. Simulation results assuming implementation on a Zynq system on chip (SoC) show that computations of the factorial $n!$ for $n = 1000$ take $326.4\,\mu\mathrm{s}$. Such a design currently requires 7982 look-up tables (LUTs), 10400 flip-flops (FFs), 33 block RAMs (BRAMs) and 28 DSP modules. The processor is designed to be scalable, allowing one to use the developed IP core not only in scientific computing, but also in embedded systems employing encryption based on MPA.

Proceedings Article•10.1109/ENT-MIPT.2018.00042•

Multiple-Precision Summation on Hybrid CPU-GPU Platforms Using RNS-based Floating-Point Representation

Konstantin Isupov, Alexander Kuvaev

01 Nov 2018-Canadian Entomologist

TL;DR: This work considers the summation of large sets of floating-point numbers on hybrid CPU-GPU platforms using MPRES, a new software library for multiple-precision computations on CPUs and CUDA compatible GPUs, and presents the addition algorithm for RNS-based representations, as well as three multiple-precision summation algorithms.

Abstract: We consider the summation of large sets of floating-point numbers on hybrid CPU-GPU platforms using MPRES, a new software library for multiple-precision computations on CPUs and CUDA compatible GPUs. This library uses an RNS-based floating-point representation, in accordance with which the multiple-precision significands are represented in a residue number system (RNS). This representation allows the computation of digits (residues) of significands in a parallel way and without carry propagation delay. We present the addition algorithm for RNS-based representations, as well as three multiple-precision summation algorithms: recursive summation, pairwise summation, and block-parallel hybrid summation. The hybrid algorithm demonstrates better performance, as it allows the full utilization of the GPU's resources.
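
The carry-free property of a residue number system described above can be seen in a small integer-only sketch. This is not the MPRES library or its floating-point format: the moduli set and function names are hypothetical, and reconstruction uses the textbook Chinese Remainder Theorem.

```python
from math import prod

# Hypothetical pairwise coprime moduli: 251 (prime), 253 = 11*23,
# 255 = 3*5*17, 256 = 2**8. Their product bounds the representable range.
MODULI = (251, 253, 255, 256)


def to_rns(x):
    return tuple(x % m for m in MODULI)


def rns_add(a, b):
    # Each residue channel is independent: no carry crosses between channels,
    # so all channels could be processed in parallel.
    return tuple((x + y) % m for x, y, m in zip(a, b, MODULI))


def from_rns(residues):
    """Reconstruct the integer in [0, prod(MODULI)) via the CRT."""
    big_m = prod(MODULI)
    total = 0
    for r, m in zip(residues, MODULI):
        partial = big_m // m
        total += r * partial * pow(partial, -1, m)  # modular inverse, Python 3.8+
    return total % big_m


a, b = 123456, 7654321
print(from_rns(rns_add(to_rns(a), to_rns(b))), a + b)
```

The result is only meaningful while the true sum stays below the product of the moduli, which is why practical RNS formats choose moduli sets large enough for the target precision.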

Proceedings Article•10.15439/2018F107•

Computation of Gauss-Jacobi Quadrature Nodes and Weights with Arbitrary Precision

Dariusz W. Brzeziński¹•Institutions (1)

Lodz University of Technology¹

26 Sep 2018

TL;DR: The results of numerical experiments presented in the paper prove high accuracy and efficiency of developed methods for computation of quadratures' nodes and weights, decreased amount of required iterations for polynomials zeros finding and elimination of truncation errors during weights computation.

...read moreread less

Abstract: In the paper there are presented efficient and accurate methods of Gauss-Jacobi nodes and weights computation. They include an enhancement for standard iteration method for Jacobi polynomials zeros finding, weight function formula transformation for increased accuracy of fractional derivatives computation and arbitrary precision application for mitigation of double precision arithmetic flaws. The results of numerical experiments presented in the paper prove high accuracy and efficiency of developed methods for computation of quadratures' nodes and weights, decreased amount of required iterations for polynomials zeros finding and elimination of truncation errors during weights computation. Accuracy of computations depends on height of precision applied for it, which is limited only by accessible hardware.

Proceedings Article•10.1109/ARITH.2019.00032•

Performance Evaluation of an Extrapolation Method for Ordinary Differential Equations with Error-free Transformation

Tomonori Kouya

08 Aug 2018-arXiv: Numerical Analysis

TL;DR: The application of EFT to explicit extrapolation methods for solving initial value problems of ordinary differential equations is proposed; the implemented routines can be effective for large-sized linear ODEs and small-sized nonlinear ODEs.

Abstract: The application of error-free transformation (EFT) is recently being developed to solve ill-conditioned problems. It can reduce the number of arithmetic operations required, compared with multiple precision arithmetic, and also be applied by using functions supported by a well-tuned BLAS library. In this paper, we propose the application of EFT to explicit extrapolation methods to solve initial value problems of ordinary differential equations. Consequently, our implemented routines can be effective for large-sized linear ODE and small-sized nonlinear ODE, especially in the case when harmonic sequence is used.
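
As a minimal illustration of what an error-free transformation does, the sketch below implements TwoSum, one of the classic EFTs, and uses it in a compensated sum. This is an assumption about which building block is most useful to show here; it is not the paper's extrapolation routine, and the function names are hypothetical.

```python
def two_sum(a, b):
    """Error-free transformation of a sum (Knuth's TwoSum).

    Returns (s, e) with s = fl(a + b) and a + b = s + e exactly,
    barring overflow.
    """
    s = a + b
    b_virtual = s - a
    a_virtual = s - b_virtual
    e = (a - a_virtual) + (b - b_virtual)
    return s, e


def compensated_sum(values):
    """Sum floats while accumulating the rounding errors recovered by two_sum."""
    total = 0.0
    correction = 0.0
    for v in values:
        total, e = two_sum(total, v)
        correction += e
    return total + correction


data = [1.0, 1e-17, -1.0]
print(sum(data), compensated_sum(data))
```

Plain summation loses the small term entirely, while the compensated version recovers it using only ordinary double-precision operations.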

Book Chapter•10.1007/978-981-10-7191-1_20•

8-Bit Asynchronous Wave-Pipelined Arithmetic Logic Unit

Polani Rahul¹, Korada Prudhvi Raj¹, S. Umadevi¹•Institutions (1)

VIT University¹

1 Jan 2018

TL;DR: An 8-bit asynchronous wave-pipelined arithmetic logic unit has been modified with a set of 8 arithmetic and 12 logical operations in order to reduce power and latency, using an ASIC semi-custom design flow in the Cadence® environment with gpdk-180-nm technology.

Abstract: In this paper, an 8-bit asynchronous wave-pipelined arithmetic logic unit has been modified with a set of 8 arithmetic and 12 logical operations. All the internal modules have been modified in order to reduce power and latency by using an ASIC semi-custom design flow in the Cadence® environment using gpdk-180-nm technology. This modified design has achieved a reduction in power by 45%, in delay by 19%, in area by 43%, and in cell count by 49% as compared to the existing ALU.

Proceedings Article•10.1109/APUSNCURSINRSM.2018.8608958•

Error Control of MLFMA within a Multiple-Precision Arithmetic Framework

Mert Kalfa¹, Vakur B. Erturk¹, Ozgur Ergul²•Institutions (2)

Bilkent University¹, Middle East Technical University²

1 Jul 2018

TL;DR: A new error control scheme is presented that provides the truncation numbers as well as the required digits of machine precision for the multilevel fast multipole algorithm (MLFMA) and can be used to solve low-frequency problems that would otherwise experience overflow issues.

Abstract: We present a new error control scheme that provides the truncation numbers as well as the required digits of machine precision for the multilevel fast multipole algorithm (MLFMA). The proposed method is valid for all frequencies, whereas the previous studies on error control are valid only for high-frequency problems. When combined with a multiple-precision arithmetic framework, the proposed method can be used to solve low-frequency problems that would otherwise experience overflow issues. Numerical results in the form of optimal truncation numbers and machine precisions for a variety of box sizes and desired relative error thresholds are presented and compared with the results available in the literature.

Journal Article•10.1080/0020739X.2017.1349943•

An elementary algorithm to evaluate trigonometric functions to high precision

B. Tomas Johansson¹•Institutions (1)

Aston University¹

02 Jan 2018-International Journal of Mathematical Education in Science and Technology

TL;DR: The cosine function is evaluated via a simple Cordic-like algorithm, together with a package for arbitrary-precision arithmetic in Matlab, yielding approximations with hundreds of correct decimals.

Abstract: Evaluation of the cosine function is done via a simple Cordic-like algorithm, together with a package for handling arbitrary-precision arithmetic in the computer program Matlab. Approximations to the cosine function having hundreds of correct decimals are presented with a discussion around errors and implementation.
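
To give a feel for high-precision cosine evaluation, here is a minimal Python sketch using the standard `decimal` module. It does not reproduce the paper's Cordic-like algorithm or its Matlab package: it swaps in a different elementary scheme (argument halving, a short Taylor series, then the double-angle identity cos 2t = 2cos²t − 1). The function name `cos_decimal`, the number of halvings and the guard-digit count are assumptions.

```python
from decimal import Decimal, localcontext


def cos_decimal(x, digits=50, halvings=8):
    """Approximate cos(x) to roughly `digits` significant digits.

    x may be a string or an int so that no binary rounding is introduced.
    """
    with localcontext() as ctx:
        ctx.prec = digits + 15                 # guard digits (assumed sufficient)
        t = Decimal(x) / (2 ** halvings)       # shrink the argument
        t2 = t * t
        term = Decimal(1)
        total = Decimal(1)
        eps = Decimal(10) ** -(digits + 12)
        k = 0
        while abs(term) > eps:                 # Taylor series of cos(t)
            k += 2
            term = -term * t2 / (k * (k - 1))
            total += term
        for _ in range(halvings):              # undo the halvings
            total = 2 * total * total - 1
        ctx.prec = digits
        return +total                          # unary plus rounds to `digits`


print(cos_decimal("1", digits=60))
```

Each doubling step can amplify the accumulated error by up to a factor of four, which is why the sketch carries extra guard digits; a large argument would also need a proper reduction modulo 2π first.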

Book Chapter•10.4018/978-1-5225-2915-6.CH016•

A Software Library for Multi Precision Arithmetic

Kannan Balasubramanian¹, Ahmed Abbas²•Institutions (2)

Mepco Schlenk Engineering College¹, American University in Cairo²

1 Jan 2018

Proceedings Article•10.23919/PIERS.2018.8598097•

Diffracted Field Calculation Using Multiple Precision Arithmetic and Parallel Computing

Takashi Kuroki¹, Toshihiko Shibazaki¹, Teruhiro Kinoshita²•Institutions (2)

College of Industrial Technology¹, Tokyo Polytechnic University²

1 Aug 2018

TL;DR: Diffracted field calculations using multiple precision arithmetic and parallel computing are described, which yield meaningful numerical data for larger disks where double precision or quadruple precision arithmetic falls short.

Abstract: KOSEN students learn various latest skills and technologies. In computer science, high performance computing such as parallel computing and high precision computing is a latest topic. The electromagnetic diffraction by a circular disk of perfect conductor has been analyzed rigorously. However, it has been difficult to obtain meaningful numerical data for larger disks using double precision or quadruple precision arithmetic. By using multiple precision arithmetic, numerical data for the current distributions and the diffracted fields can be obtained for larger disks. In addition, equations for calculating the diffracted fields are expanded and rearranged to control accuracies of numerical data. The multiple precision arithmetic wastes computing time. For speeding up, parallel computing is used. Through these high performance computing, our KOSEN students learn numerical calculation technique. This article describes diffracted field calculations using multiple precision arithmetic and parallel computing.

Journal Article•10.1007/S12532-019-00154-6•

Solving Quadratic Programs to High Precision using Scaled Iterative Refinement

Tobias Weber¹, Sebastian Sager¹, Ambros M. Gleixner²•Institutions (2)

Otto-von-Guericke University Magdeburg¹, Zuse Institute Berlin²

19 Mar 2018-arXiv: Optimization and Control

TL;DR: In this paper, a refinement algorithm is proposed to solve quadratic optimization problems to arbitrary precision, assuming a floating-point QP solver oracle and proving linear convergence of residuals and primal errors.

Abstract: Quadratic optimization problems (QPs) are ubiquitous, and solution algorithms have matured to a reliable technology. However, the precision of solutions is usually limited due to the underlying floating-point operations. This may cause inconveniences when solutions are used for rigorous reasoning. We contribute on three levels to overcome this issue. First, we present a novel refinement algorithm to solve QPs to arbitrary precision. It iteratively solves refined QPs, assuming a floating-point QP solver oracle. We prove linear convergence of residuals and primal errors. Second, we provide an efficient implementation, based on SoPlex and qpOASES that is publicly available in source code. Third, we give precise reference solutions for the Maros and Mészáros benchmark library.
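
The general idea of refining a floating-point oracle's answer with exact residuals can be sketched in its simplest setting, a 2×2 linear system rather than a QP. This is not the paper's algorithm or the SoPlex/qpOASES implementation; `solve_float` is a hypothetical stand-in for the floating-point solver oracle, and the scaling rule is an assumption.

```python
from fractions import Fraction


def solve_float(A, b):
    """Hypothetical low-precision oracle: Cramer's rule in double precision."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    return [(b[0] * a22 - a12 * b[1]) / det,
            (a11 * b[1] - b[0] * a21) / det]


def refine(A, b, rounds=5):
    """Refine a solution of A x = b using exact rational residuals."""
    A_float = [[float(v) for v in row] for row in A]
    x = [Fraction(0), Fraction(0)]
    for _ in range(rounds):
        # Exact residual of the current iterate.
        r = [b[i] - sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]
        r_max = max(abs(v) for v in r)
        if r_max == 0:
            break
        scale = 1 / r_max                      # blow the residual up to size 1
        d = solve_float(A_float, [float(v * scale) for v in r])
        # Undo the scaling exactly and apply the correction.
        x = [x[i] + Fraction(d[i]) / scale for i in range(2)]
    return x


A = [[Fraction(1, 3), Fraction(1, 7)], [Fraction(2, 5), Fraction(3, 11)]]
b = [Fraction(1), Fraction(2)]
x = refine(A, b)
print(float(x[0]), float(x[1]))
```

Because the residual is rescaled before each oracle call, every round can contribute roughly as many new correct digits as the oracle delivers, provided the system is reasonably well conditioned.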

Journal Article•10.7939/r3j960r8x•

Design, Evaluation and Application of Approximate Arithmetic Circuits

Jiang Hong-lan

11 Nov 2018

Abstract: As very important modules in a processor, arithmetic circuits often play a pivotal role in determining the performance and power dissipation of a demanding computation. The demand for higher speed and power efficiency, as well as the desirability for error resilience in many applications (e.g., multimedia, recognition and data analytics) has driven the development of approximate arithmetic circuit design. In this dissertation, approximate arithmetic circuits are evaluated, several fundamental approximate circuits are devised, and a high-performance and energy-efficient approximate adaptive filter is proposed using approximate distributed arithmetic (DA) circuits. Existing approximate arithmetic circuits in the literature are first reviewed, evaluated and compared to guide the selection of a suitable approximate design for a specific application with designated purposes. A low-power approximate radix-8 Booth multiplier using an approximate recoding adder is then proposed for signed multiplication. Compared with an accurate multiplier, the proposed approximate design saves as much as 44% in power and 43% in area with a mean relative error distance (MRED) of 0.43%. Compared with the other approximate Booth multipliers, the proposed design has the lowest power-delay product while providing a moderate accuracy. Moreover, an adaptive approximation approach is proposed for the design of a divider and a square root (SQR) circuit. In this design, the division/SQR is computed using a reduced-width divider/SQR circuit and a shifter by adaptively pruning the input bits. The synthesis results show that the proposed approximate divider with an MRED of 6.6% achieves more than 60% improvements in speed and power dissipation compared with an accurate design. The proposed divider is more accurate than other approximate dividers when a similar power-delay product is considered. 
By changing the width of the reduced-width SQR circuit, the approximate SQR circuit is 22.69% to 74.54% faster, and saves 30.75% to 79.34% in power with an MRED from 0.7% to 8.0% compared with an accurate design. Compared to other approximate designs, the proposed approximate divider and SQR circuit designs perform better in image processing applications. The superior control capability of the cerebellum has motivated extensive interest in the development of computational cerebellar models. Many models have been applied to motor control and image stabilization in robots. Often computationally complex, cerebellar models have rarely been implemented in dedicated hardware. In this dissertation, a fixed-point finite impulse response adaptive filter is proposed using approximate DA circuits. This design can be used in general digital signal processing applications as well as in control systems as an adaptive filter-based cerebellar model. In this design, the radix-8 Booth algorithm is used to reduce the number of partial products in the DA architecture, and the partial products are approximately generated by truncating the input data with error compensation, accumulated by using an approximate Wallace tree. At a similar accuracy, the proposed design attains on average a 55% reduction in energy per operation and a 2.2× increase in throughput per area compared with an accurate design. A saccadic system using the proposed approximate adaptive filter-based cerebellar model achieves a similar retinal slip as using an accurate filter. These results are promising for the large-scale integration of approximate circuits into high-performance and energy-efficient systems for error-resilient applications.

Proceedings Article•10.1145/3190339.3190341•

Provably correct posit arithmetic with fixed-point big integer

Shin Yee Chung

28 Mar 2018

TL;DR: This paper seeks to develop provably correct posit arithmetic based on fixed-point big integers that can serve as a reference for other hardware-optimized implementations, as a test bed for applications to experiment with different posit bit configurations, and to analyze the relative errors of using smaller bit sizes in the posit numbers compared to using the native 32-bit or 64-bit floating-point numbers.

Abstract: Floating-point number format is used extensively in many applications, especially scientific software. The applications rely on efficient hardware floating-point support to perform arithmetic operations. With the advent of multicore CPUs and massively parallel GPUs, the memory bandwidth of a computer system is increasingly limited for each of the compute cores. The limited memory bandwidth is a serious bottleneck to the system performance. The posit number format [12] is a promising approach to improve the accuracy of the arithmetic operations with more efficient use of bit storage, hence, reducing memory contention. However, robust and reliable software implementations of posit arithmetic libraries in C/C++ or Python are not readily available. In this paper, we seek to develop provably correct posit arithmetic based on fixed-point big integers. A robust and reliable implementation can then serve as a reference for other hardware-optimized implementations, as a test bed for applications to experiment with different posit bit configurations, and to analyze the relative errors of using smaller bit sizes in the posit numbers compared to using the native 32-bit or 64-bit floating-point numbers.
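
For readers unfamiliar with the posit format, the following minimal Python sketch decodes a posit bit pattern into an exact rational using big integers. It is not the paper's implementation: the function name `decode_posit` and its parameters are assumptions, and only decoding is shown, not arithmetic or rounding.

```python
from fractions import Fraction


def decode_posit(bits, n=8, es=0):
    """Decode an n-bit posit with es exponent bits into an exact Fraction.

    Returns None for NaR (the single non-real pattern 100...0).
    """
    mask = (1 << n) - 1
    bits &= mask
    if bits == 0:
        return Fraction(0)
    if bits == 1 << (n - 1):
        return None
    negative = (bits >> (n - 1)) == 1
    if negative:
        bits = (-bits) & mask                  # two's complement magnitude
    # Bits after the sign, most significant first.
    body = [(bits >> i) & 1 for i in range(n - 2, -1, -1)]
    first = body[0]
    run = 1
    while run < len(body) and body[run] == first:
        run += 1
    k = run - 1 if first == 1 else -run        # regime value
    rest = body[run + 1:]                      # skip the regime's terminating bit
    exp_bits = rest[:es]
    exp_bits += [0] * (es - len(exp_bits))     # truncated exponent bits count as 0
    e = int("".join(map(str, exp_bits)) or "0", 2)
    frac_bits = rest[es:]
    f = Fraction(int("".join(map(str, frac_bits)) or "0", 2), 1 << len(frac_bits))
    value = Fraction(2) ** (k * (1 << es) + e) * (1 + f)
    return -value if negative else value


print(decode_posit(0b0000110111011101, n=16, es=3))
```

Working with exact rationals keeps the decoder free of rounding, which makes it convenient as a reference when comparing different posit configurations.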
