Half-precision floating-point format

Papers published on a yearly basis

Papers

IEEE Standard for Binary Floating Point Arithmetic

ANSI/IEEE

1 Jan 1985

1,405 citations

Patent

Floating point for SIMD array machine

Paul Amba Wilkinson¹, James Warren Dieffenderfer¹, Peter M. Kogge¹

IBM¹

1 Jun 1995

TL;DR: Describes a 40-bit floating-point format (a sign bit, 7 exponent bits, and four 8-bit fraction blocks) for machines with byte-wide data streams; aligning fractions by whole-block shifts lets floating-point commands execute in a fixed, small number of cycles on a SIMD machine.

Abstract: A floating point system and method according to a format that includes a sign bit, an exponent part having a plurality of bits, and a fraction part having a plurality of multi-bit blocks, wherein floating point operation is based on block shifts of the fraction part, with each shift of one block associated with an increment or decrement of the exponent part by one count. This format illustrated is implemented as a format suitable for the accuracy greater than the IEEE 32-bit floating-point format, and is intended to be implemented in machines having byte-wide (8 bit) data streams. The preferred format consists of a sign bit, 7 exponent bits and 4 fraction bytes of eight bits for a total of 40 bits. This format and implementation allows floating-point commands to be executed in a fixed small number of cycles, thus advancing the capabilities of doing floating-point arithmetic on a SIMD machine. The floating-point implementation is adaptable to multiprocessor parallel array processor computing systems and for parallel array processing with a simplified architecture adaptable to chip implementation. The array provided is an N dimensional array of byte-wide processing units each coupled with an adequate segment of byte-wide memory and control logic. A partitionable section of the array containing several processing units is contained on a silicon chip arranged with eight elements of the processing array each preferably consisting of combined processing element with a local memory for processing bit parallel bytes of information in a clock cycle. A processor system (or subsystem) comprises an array of pickets, a communication network, an I/O system, and a SIMD controller consisting of a microprocessor, a canned-routine processor, and a microcontroller that runs the array.
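
The format in this abstract can be illustrated with a minimal Python sketch. The field order (sign in the top bit, then the exponent, then the fraction), the excess-64 bias, and reading the fraction as a base-256 value below one are assumptions made for illustration; the abstract states only the field widths and that a one-block shift changes the exponent by one count. The function names are hypothetical.

```python
from fractions import Fraction

FRAC_BYTES = 4   # four 8-bit fraction blocks (from the abstract)
EXP_MASK = 0x7F  # 7 exponent bits (from the abstract)
BIAS = 64        # ASSUMPTION: excess-64 exponent; the abstract gives no bias


def decode40(word: int) -> Fraction:
    """Decode a 40-bit word as sign * 0.f1f2f3f4 (base 256) * 256**(exp - BIAS)."""
    sign = (word >> 39) & 0x1
    exponent = (word >> 32) & EXP_MASK
    fraction = word & 0xFFFFFFFF
    magnitude = Fraction(fraction, 256 ** FRAC_BYTES) * Fraction(256) ** (exponent - BIAS)
    return -magnitude if sign else magnitude


def shift_right_one_block(word: int) -> int:
    """Shift the fraction right by one byte and add one to the exponent."""
    sign_bit = word & (1 << 39)
    exponent = (word >> 32) & EXP_MASK
    fraction = word & 0xFFFFFFFF
    if exponent == EXP_MASK:
        raise OverflowError("exponent field is already at its maximum")
    # The lowest fraction byte is dropped; the value is unchanged if it was zero.
    return sign_bit | ((exponent + 1) << 32) | (fraction >> 8)


# Example: exponent field 64, fraction bytes 01 00 00 00.
word = (64 << 32) | 0x01000000
same_value = decode40(shift_right_one_block(word)) == decode40(word)
```

Under these assumptions, a one-byte shift paired with an exponent increment leaves the value unchanged whenever the dropped byte is zero, so operand alignment takes a handful of whole-byte moves rather than bit-by-bit shifts.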

147 citations

Proceedings Article • DOI: 10.1109/FPGA.1996.564761

Implementation of IEEE single precision floating point addition and multiplication on FPGAs

Louca¹, Cook¹, Johnson¹

Rutgers University¹

17 Apr 1996

TL;DR: This work explores FPGA implementations of addition and multiplication for IEEE single-precision floating-point numbers. Prototypes implemented on Altera FLEX8000s reached peak rates of 7 MFlops for 32-bit addition and 2.3 MFlops for 32-bit multiplication.

Abstract: Floating point operations are hard to implement on FPGAs because of the complexity of their algorithms. On the other hand, many scientific problems require floating point arithmetic with high levels of accuracy in their calculations. Therefore, we have explored FPGA implementations of addition and multiplication for IEEE single precision floating-point numbers. Customizations were performed where this was possible in order to save chip area, or get the most out of our prototype board. The implementations trade off area and speed for accuracy. The adder is a bit-parallel adder, and the multiplier is a digit-serial multiplier. Prototypes have been implemented on Altera FLEX8000s, and peak rates of 7 MFlops for 32-bit addition and 2.3 MFlops for 32-bit multiplication have been obtained.
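
As a rough software illustration of the digit-serial idea named in the abstract, the sketch below multiplies two 24-bit significands (the width of an IEEE single-precision significand including its implicit leading bit) while consuming a few bits of the multiplier per step. The 4-bit digit size and the function name are assumptions; the abstract does not give the digit width or the hardware structure.

```python
def digit_serial_multiply(a: int, b: int, width: int = 24, digit_bits: int = 4) -> int:
    """Multiply unsigned `width`-bit integers, taking `digit_bits` bits of b per step.

    ASSUMPTION: digit_bits=4 is chosen only for illustration.
    """
    if width % digit_bits != 0:
        raise ValueError("width must be a multiple of digit_bits")
    if not (0 <= a < (1 << width) and 0 <= b < (1 << width)):
        raise ValueError("operands must fit in `width` bits")

    digit_mask = (1 << digit_bits) - 1
    product = 0
    for step in range(width // digit_bits):
        # Take the next digit of b, least significant first.
        digit = (b >> (step * digit_bits)) & digit_mask
        # Accumulate the partial product at the weight of that digit.
        product += (a * digit) << (step * digit_bits)
    return product


# Two significands with the implicit leading bit set (1.5 and 1.25 scaled by 2**23).
sig_a = 0xC00000
sig_b = 0xA00000
sig_product = digit_serial_multiply(sig_a, sig_b)
```

Each loop iteration corresponds to one step of a serial datapath, so a smaller digit means more steps with a narrower partial-product unit. That is one way to read the area-versus-speed trade-off the abstract mentions.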

134 citations

Book Chapter • DOI: 10.1007/3-540-46117-5_68

A Library of Parameterized Floating-Point Modules and Their Use

Pavle Belanovic¹, Miriam Leeser¹

Northeastern University¹

2 Sep 2002

TL;DR: Presents a library of fully parameterized floating-point hardware modules for format control, arithmetic operations, and conversion to and from any fixed-point format; the converters enable hybrid implementations that combine fixed-point and floating-point calculations.

Abstract: We present a parameterized floating-point library for use with reconfigurable hardware. Our format is both general and flexible. All IEEE formats are a subset of our format, as are all previously published floating-point formats for reconfigurable hardware. We have developed a library of fully parameterized hardware modules for format control, arithmetic operations and conversion to and from any fixed-point format. The format converters allow for hybrid implementations that combine both fixed and floating-point calculations. This permits the designer to choose between the increased range of floating-point and the increased precision of fixed-point within the same application. We illustrate the use of this library with a hybrid implementation of the K-means clustering algorithm applied to multispectral satellite images.
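
A minimal Python sketch of what a format parameterized by exponent and mantissa widths can look like in software, with IEEE half, single, and double precision as particular parameter choices. The `FloatFormat` class and `decode` function are hypothetical names, and the IEEE-style bias, subnormal, and infinity/NaN conventions used here are assumptions; the abstract does not say how the library treats these cases.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FloatFormat:
    """A sign bit followed by exp_bits of exponent and man_bits of mantissa."""
    exp_bits: int
    man_bits: int

    @property
    def bias(self) -> int:
        # ASSUMPTION: IEEE-style bias of 2**(exp_bits - 1) - 1.
        return (1 << (self.exp_bits - 1)) - 1


def decode(fmt: FloatFormat, word: int) -> float:
    """Interpret `word` as a value in `fmt`, following IEEE 754 conventions."""
    sign = -1.0 if (word >> (fmt.exp_bits + fmt.man_bits)) & 1 else 1.0
    exp_max = (1 << fmt.exp_bits) - 1
    exponent = (word >> fmt.man_bits) & exp_max
    mantissa = word & ((1 << fmt.man_bits) - 1)

    if exponent == exp_max:  # all-ones exponent: infinity or NaN
        return sign * math.inf if mantissa == 0 else math.nan
    if exponent == 0:        # subnormal: no implicit leading one
        return sign * math.ldexp(mantissa, 1 - fmt.bias - fmt.man_bits)
    significand = (1 << fmt.man_bits) | mantissa  # restore the implicit one
    return sign * math.ldexp(significand, exponent - fmt.bias - fmt.man_bits)


HALF = FloatFormat(exp_bits=5, man_bits=10)     # IEEE binary16
SINGLE = FloatFormat(exp_bits=8, man_bits=23)   # IEEE binary32
DOUBLE = FloatFormat(exp_bits=11, man_bits=52)  # IEEE binary64

one_half_precision = decode(HALF, 0x3C00)
largest_half = decode(HALF, 0x7BFF)
```

Because the widths are ordinary parameters, the same routine handles non-IEEE layouts such as `FloatFormat(6, 9)`, which mirrors the abstract's claim that the IEEE formats are a subset of a more general format.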

129 citations

Patent

Common format for encoding both single and double precision floating point numbers

Russell W. Mason¹, Craig A. Heikes¹

Hewlett-Packard¹

14 Sep 1992

TL;DR: Describes a technique for encoding multiple floating-point formats as a double-precision floating-point number by padding single-word floating-point numbers with zeros to form a 64-bit double word.

Abstract: A technique for encoding multiple floating point formats into a double precision floating point number by padding single word floating point numbers with zeros to form a 64-bit double word in a way that allows a single precision arithmetic logic unit to be built on top of a double precision arithmetic logic unit. The formatting circuitry of the invention requires only small differences in the hardware for single and double precision operations so as to simplify the arithmetic logic unit and the multiplier of the floating point processing units. The encoding technique of the invention includes right justifying the exponent and mantissa of the floating point number in a "common format" such that rounding of the mantissa need only occur in one place, thereby greatly simplifying the rounding procedure. The technique of the invention also removes multiplexers from critical speed paths in the floating point processing units when it is desired to accommodate multiple data formats.

94 citations

Year	Papers
2020	1
2018	1
2017	1
2016	2
2015	9
2014	7
