TL;DR: The results show that the proposed 16-bit approximate radix-4 Booth multipliers with approximation factors of 12 and 14 are more accurate than existing approximate Booth multipliers with moderate power consumption, and that the proposed R4ABM2 multiplier with an approximation factor of 14 is the most efficient design.
Abstract: Approximate computing is an attractive design methodology to achieve low power, high performance (low delay) and reduced circuit complexity by relaxing the requirement of accuracy. In this paper, approximate Booth multipliers are designed based on approximate radix-4 modified Booth encoding (MBE) algorithms and a regular partial product array that employs an approximate Wallace tree. Two approximate Booth encoders are proposed and analyzed for error-tolerant computing. The error characteristics are analyzed with respect to the so-called approximation factor, which is related to the inexact bit width of the Booth multipliers. Simulation results at 45 nm feature size in CMOS for delay, area and power consumption are also provided. The results show that the proposed 16-bit approximate radix-4 Booth multipliers with approximation factors of 12 and 14 are more accurate than existing approximate Booth multipliers with moderate power consumption. The proposed R4ABM2 multiplier with an approximation factor of 14 is the most efficient design when considering both power-delay product and the error metric NMED. Case studies for image processing show the validity of the proposed approximate radix-4 Booth multipliers.
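For orientation, the exact radix-4 modified Booth recoding that these designs approximate can be sketched as follows. This is a minimal illustration of the standard exact algorithm only; it does not reproduce the paper's approximate encoders or its approximate Wallace tree, whose details are not given in the abstract. The function names `booth_radix4_digits` and `booth_multiply` are hypothetical.

```python
def booth_radix4_digits(y: int, bits: int = 16) -> list[int]:
    """Recode a signed `bits`-wide multiplier into radix-4 Booth digits in {-2..2}.

    Digit i is -2*y[2i+1] + y[2i] + y[2i-1], with y[-1] taken as 0.
    """
    assert bits % 2 == 0, "this sketch assumes an even bit width"
    assert -(1 << (bits - 1)) <= y < (1 << (bits - 1)), "y out of range"
    u = y & ((1 << bits) - 1)  # two's-complement bit pattern of y
    digits = []
    prev = 0  # implicit bit y[-1]
    for i in range(0, bits, 2):
        b0 = (u >> i) & 1
        b1 = (u >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits


def booth_multiply(x: int, y: int, bits: int = 16) -> int:
    """Exact product formed from bits/2 partial products d_k * x * 4**k."""
    return sum(d * x * 4**k for k, d in enumerate(booth_radix4_digits(y, bits)))
```

A 16-bit multiplier yields 8 digits, so only 8 partial products enter the reduction tree instead of 16.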
TL;DR: Two signed 16×16-bit approximate radix-8 Booth multipliers are designed using the approximate recoding adder with and without the truncation of a number of less significant bits in the partial products.
Abstract: The Booth multiplier has been widely used for high performance signed multiplication by encoding and thereby reducing the number of partial products. A multiplier using the radix-$4$ (or modified Booth) algorithm is very efficient due to the ease of partial product generation, whereas the radix-$8$ Booth multiplier is slow due to the complexity of generating the odd multiples of the multiplicand. In this paper, this issue is alleviated by the application of approximate designs. An approximate $2$-bit adder is deliberately designed for calculating the sum of $1\times$ and $2\times$ of a binary number. This adder requires a small area, low power and a short critical path delay. Subsequently, the $2$-bit adder is employed to implement the less significant section of a recoding adder for generating the triple multiplicand with no carry propagation. In the pursuit of a trade-off between accuracy and power consumption, two signed $16\times 16$ bit approximate radix-8 Booth multipliers are designed using the approximate recoding adder with and without the truncation of a number of less significant bits in the partial products. The proposed approximate multipliers are faster and more power efficient than the accurate Booth multiplier. The multiplier with 15-bit truncation achieves the best overall performance in terms of hardware and accuracy when compared to other approximate Booth multiplier designs. Finally, the approximate multipliers are applied to the design of a low-pass FIR filter and they show better performance than other approximate Booth multipliers.
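The sketch below illustrates why radix-8 recoding needs the triple multiplicand and what partial-product truncation does to the result. It uses an exact $3\times$ computation, not the paper's approximate recoding adder, and the truncation model (clearing the low columns of each shifted partial product with no compensation) is an assumption made here for illustration. All function names are hypothetical.

```python
def booth_radix8_digits(y: int, bits: int = 16) -> list[int]:
    """Recode signed y into radix-8 Booth digits in {-4..4}.

    Digit i is -4*y[3i+2] + 2*y[3i+1] + y[3i] + y[3i-1], with y[-1] = 0.
    Python's arithmetic right shift sign-extends y past `bits`.
    """
    assert -(1 << (bits - 1)) <= y < (1 << (bits - 1)), "y out of range"
    n = -(-bits // 3)  # ceil(bits / 3) digits
    digits, prev = [], 0
    for i in range(n):
        b0 = (y >> (3 * i)) & 1
        b1 = (y >> (3 * i + 1)) & 1
        b2 = (y >> (3 * i + 2)) & 1
        digits.append(-4 * b2 + 2 * b1 + b0 + prev)
        prev = b2
    return digits


def select_multiple(x: int, d: int) -> int:
    """Return d * x for d in -4..4; only 3x needs a real addition."""
    triple = x + (x << 1)  # the odd multiple that slows exact radix-8 designs
    m = {0: 0, 1: x, 2: x << 1, 3: triple, 4: x << 2}[abs(d)]
    return -m if d < 0 else m


def radix8_multiply(x: int, y: int, bits: int = 16, truncate: int = 0) -> int:
    """Sum the partial products, clearing the `truncate` lowest columns of each."""
    keep = ~((1 << truncate) - 1)
    total = 0
    for k, d in enumerate(booth_radix8_digits(y, bits)):
        total += (select_multiple(x, d) << (3 * k)) & keep
    return total
```

With `truncate=0` the result is exact; with a nonzero value each of the six partial products loses at most `2**truncate - 1`, which bounds the error under this assumed model.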
TL;DR: The architecture of a design method for an M-bit by N-bit Booth encoded parallel multiplier generator and an algorithm for reducing the delay inside the branches of the Wallace tree section are discussed.
Abstract: The architecture of a design method for an M-bit by N-bit Booth encoded parallel multiplier generator is discussed. An algorithm for reducing the delay inside the branches of the Wallace tree section is explained. The final step of adding two (N+M-1)-bit numbers is done by an optimal carry select adder stage. The algorithm for optimal partitioning of the (N+M-1)-bit adder is also presented.
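As a rough behavioural illustration of the final stage, a carry-select adder computes each block's sum for both possible carry-ins and then selects one. The sketch below models that selection for an arbitrary partition; the block sizes passed in are hypothetical, and the paper's optimal partitioning algorithm is not reproduced here.

```python
def carry_select_add(a: int, b: int, width: int, blocks: list[int]) -> tuple[int, int]:
    """Add two unsigned `width`-bit numbers block by block.

    `blocks` lists block sizes from least to most significant (an assumed
    partition). Returns (sum modulo 2**width, carry out).
    """
    assert sum(blocks) == width, "block sizes must cover the full width"
    result, carry, pos = 0, 0, 0
    for size in blocks:
        mask = (1 << size) - 1
        xa = (a >> pos) & mask
        xb = (b >> pos) & mask
        s0 = xa + xb      # speculative sum for carry-in 0
        s1 = xa + xb + 1  # speculative sum for carry-in 1
        chosen = s1 if carry else s0  # the multiplexer driven by the real carry
        result |= (chosen & mask) << pos
        carry = chosen >> size
        pos += size
    return result, carry
```

In hardware both speculative sums are produced in parallel, so block sizes are typically chosen so that each block finishes about when the incoming carry arrives; that timing trade-off is what a partitioning algorithm optimizes.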
TL;DR: This work improved the algorithm and the method of implementation, and designed an advanced multiplier and divider for MOS LSI based on a new algorithm that has several excellent features such as high speed addition operations.
Abstract: A high speed multiplier and divider for MOS LSI based on a new algorithm is presented. When we implement the multiplier and the divider in LSI, features such as high speed operation, a small number of transistors and easy layout are the most important factors. A computational algorithm using a redundant binary representation has several excellent features such as high speed addition operations. We improved the algorithm and the method of implementation, and designed an advanced multiplier and divider with the above mentioned features. We expect that our multiplier and divider are excellent compared with multipliers using the Booth algorithm and the Wallace tree, and with dividers using the SRT method, respectively.
TL;DR: By using recurring wire shifters, the authors can expand the level of repeated blocks to cover the entire adder tree, which simplifies the complicated Wallace tree wiring scheme.
Abstract: A 54-b × 54-b parallel multiplier was implemented in 0.88-μm CMOS using the new, regularly structured tree (RST) design approach. The circuit is basically a Wallace tree, but the tree and the set of partial-product-bit generators are combined into a recurring block which generates seven partial-product bits and compresses them to a pair of bits for the sum and carry signals. This block is used repeatedly to construct an RST block in which even wiring among blocks included in wire shifters is designed as recurring units. By using recurring wire shifters, the authors can expand the level of repeated blocks to cover the entire adder tree, which simplifies the complicated Wallace tree wiring scheme. In addition to design time savings, layout density is increased by 70% to 6400 transistors/mm², and the multiplication time is decreased by 30% to 13 ns.