A Memory-Efficient CNN Accelerator Using Segmented Logarithmic Quantization and Multi-Cluster Architecture

doi:10.1109/TCSII.2020.3038897

Journal Article•10.1109/TCSII.2020.3038897

Jiawei Xu, +6 more

- 01 Jun 2021

- IEEE Transactions on Circuits and System...

- Vol. 68, Iss: 6, pp 2142-2146

13

TL;DR: A segmented logarithmic (SegLog) quantization method is exploited to mitigate the on-chip memory and bandwidth requirements, thus accommodating more processing elements (PEs) in a given chip area to organize a reconfigurable multi-cluster architecture.

Citations

Journal Article•10.1109/tcsii.2022.3196055

An FPGA-Based Transformer Accelerator Using Output Block Stationary Dataflow for Object Recognition Applications

01 Jan 2023

- IEEE Transactions on Circuits and System...

TL;DR: A transformer accelerator with an output block stationary (OBS) dataflow is proposed to minimize repeated memory access through block-level and vector-level broadcasting while preserving a high digital signal processor (DSP) utilization rate, leading to higher energy efficiency.

10

Proceedings Article•10.1109/aicas54282.2022.9869994

Hardware-Friendly Logarithmic Quantization with Mixed-Precision for MobileNetV2

13 Jun 2022

TL;DR: The authors propose a novel logarithmic weight quantization that considers the characteristics of MobileNetV2, and a mixed-precision quantization that minimizes accuracy loss by training the distribution range using a trainable parameter.

9

Journal Article•10.1109/access.2022.3157893

FxP-QNet: A Post-Training Quantizer for the Design of Mixed Low-Precision DNNs With Dynamic Fixed-Point Representation

01 Jan 2022

- IEEE Access

TL;DR: FxP-QNet employs post-training self-distillation and network prediction error statistics to optimize the quantization of floating-point values into fixed-point numbers, and gradually adapts the quantization level for each data structure of each layer based on the trade-off between network accuracy and low-precision requirements.

6

Journal Article•10.1109/access.2022.3162066

Energy-Efficient High-Speed ASIC Implementation of Convolutional Neural Network Using Novel Reduced Critical-Path Design

01 Jan 2022

- IEEE Access

TL;DR: A bit-level-multiply-accumulator (BLMAC) with a modified Booth encoder and a Wallace reduction tree is proposed to reduce the critical path of the overall architecture.

6

Journal Article•10.1109/ACCESS.2022.3162066

Energy-Efficient High-Speed ASIC Implementation of Convolutional Neural Network Using Novel Reduced Critical-Path Design

Sun-Sik Lee, +3 more

- IEEE Access

TL;DR: This paper proposes a hardware-efficient, high-speed convolution block for ASIC implementation of the CNN algorithm using a novel bit-level-multiply-accumulator (BLMAC) with a modified Booth encoder and a Wallace reduction tree.

5

References

Journal Article•10.1109/JETCAS.2019.2910232

Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices

Yu-Hsin Chen, +3 more

- 11 Apr 2019

- IEEE Journal on Emerging and Selected To...

TL;DR: Eyeriss v2 is a DNN accelerator architecture designed for running compact and sparse DNNs. It can process sparse data directly in the compressed domain for both weights and activations, and is therefore able to improve both processing speed and energy efficiency with sparse models.

876

Posted Content

Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices

Yu-Hsin Chen, +3 more

- 10 Jul 2018

- arXiv: Distributed, Parallel, and Cluste...

TL;DR: Eyeriss v2, a DNN accelerator architecture designed for running compact and sparse DNNs, is presented, which introduces a highly flexible on-chip network that can adapt to the different amounts of data reuse and bandwidth requirements of different data types, which improves the utilization of the computation resources.

628

Proceedings Article

Post training 4-bit quantization of convolutional networks for rapid-deployment

Ron Banner, +2 more

- 01 Jan 2019

TL;DR: This paper introduces the first practical 4-bit post-training quantization approach: it does not involve training the quantized model (fine-tuning), nor does it require the availability of the full dataset, and it achieves accuracy that is just a few percent less than the state-of-the-art baseline across a wide range of convolutional models.

528

Proceedings Article•10.1109/ICASSP.2017.7953288

LogNet: Energy-efficient neural networks using logarithmic computation

Edward H. Lee, +4 more

- 05 Mar 2017

TL;DR: This work explores how logarithmic encoding of non-uniformly distributed weights and activations is preferred over linear encoding at resolutions of 4 bits and less and enables networks to achieve higher classification accuracies than fixed-point at low resolutions and eliminate bulky digital multipliers.

196

Posted Content

Post-training 4-bit quantization of convolution networks for rapid-deployment

Ron Banner, +3 more

- 02 Oct 2018

- arXiv: Computer Vision and Pattern Recog...

TL;DR: This article proposes a 4-bit post-training quantization approach that does not require training the quantized model (fine-tuning), nor does it require the availability of the full dataset.

185