TinyADC: Peripheral Circuit-aware Weight Pruning Framework for Mixed-signal DNN Accelerators

doi:10.23919/DATE51398.2021.9474235

Proceedings Article•10.23919/DATE51398.2021.9474235

Geng Yuan, +12 more

- 01 Feb 2021

- pp 926-931

33

TL;DR: The authors propose a weight pruning framework for ReRAM-based mixed-signal DNN accelerators that reduces the number of bits required for ADC resolution, and hence the overall area and power consumption of the accelerator, without introducing any computational inaccuracy.

Abstract: As the number of weight parameters in deep neural networks (DNNs) continues to grow, the demand for ultra-efficient DNN accelerators has motivated research on non-traditional architectures with emerging technologies. Resistive random-access memory (ReRAM) crossbars have been used to perform in-situ matrix-vector multiplication for DNNs. DNN weight pruning techniques have also been applied to ReRAM-based mixed-signal DNN accelerators, focusing on reducing weight storage and accelerating computation. However, existing works capture very few features of peripheral circuits, such as analog-to-digital converters (ADCs), during neural network design. ADCs have become the dominant contributor to the power consumption and area cost of current mixed-signal accelerators, and the large overhead of these peripheral circuits has not been addressed efficiently. To address this problem, we propose a novel weight pruning framework for ReRAM-based mixed-signal DNN accelerators, named TINYADC, which effectively reduces the required bits for ADC resolution and hence the overall area and power consumption of the accelerator without introducing any computational inaccuracy. Compared to state-of-the-art pruning work on the ImageNet dataset, TINYADC achieves 3.5× and 2.9× power and area reduction, respectively. The TINYADC framework improves the throughput of a state-of-the-art architecture design by 29% and 40% in terms of throughput per unit area and per watt (GOPs/s×mm² and GOPs/W), respectively.
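
To make the link between pruning and ADC resolution concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a simple worst-case model that the abstract does not spell out: each crossbar column sums the products of unsigned inputs and unsigned cell values, and the ADC must represent the largest possible column sum without clipping. The function names `adc_bits` and `required_adc_bits`, and the default of 2-bit cells with 1-bit inputs, are hypothetical choices made for illustration.

```python
def adc_bits(nonzero_rows: int, cell_bits: int = 2, input_bits: int = 1) -> int:
    """Bits needed to represent the worst-case column sum exactly.

    Assumed model: every nonzero cell holds its maximum value and every
    input is at its maximum, so the sum is nonzero_rows * max_cell * max_input.
    """
    max_cell = 2 ** cell_bits - 1
    max_input = 2 ** input_bits - 1
    max_sum = nonzero_rows * max_cell * max_input
    # int.bit_length() gives the number of bits needed for values 0..max_sum.
    return max_sum.bit_length()


def required_adc_bits(crossbar, cell_bits: int = 2, input_bits: int = 1) -> int:
    """ADC resolution for a crossbar, set by its densest column.

    `crossbar` is a list of rows; a zero entry is a pruned weight.
    """
    worst_nonzeros = max(
        sum(1 for w in column if w != 0) for column in zip(*crossbar)
    )
    return adc_bits(worst_nonzeros, cell_bits, input_bits)


# A dense 128x4 crossbar versus one pruned to 16 nonzero rows per column.
dense = [[1] * 4 for _ in range(128)]
pruned = [[1] * 4 if row < 16 else [0] * 4 for row in range(128)]

print(required_adc_bits(dense))   # 9 bits under the assumed model
print(required_adc_bits(pruned))  # 6 bits under the assumed model
```

Under this assumed model, bounding the number of nonzero weights per column bounds the accumulated value, so the result stays exact while the ADC needs fewer bits. This is the sense in which resolution can drop without computational inaccuracy.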

Citations

Book Chapter•10.1007/978-3-031-20083-0_37

SPViT: Enabling Faster Vision Transformers via Latency-Aware Soft Token Pruning

Zhenglun Kong, +10 more

- 01 Jan 2022

TL;DR: SPViT is a soft token pruning framework that can be set up on vanilla Transformers of both flattened and hierarchical structures, such as DeiTs and Swin-Transformers (Swin).

81

Proceedings Article•10.1109/ISCA52012.2021.00029

FORMS: fine-grained polarized ReRAM-based in-situ computation for mixed-signal DNN accelerator

Geng Yuan, +10 more

- 14 Jun 2021

TL;DR: FORMS, a fine-grained ReRAM-based DNN accelerator with algorithm/hardware co-design, achieves 1.12× and 2.4× speedup in terms of frames per second over optimized ISAAC with almost the same power/area cost.

79

Proceedings Article•10.1145/3579371.3589062

RAELLA: Reforming the Arithmetic for Efficient, Low-Resolution, and Low-Loss Analog PIM: No Retraining Required!

Joel Emer, +1 more

- 17 Apr 2023

TL;DR: RAELLA adapts the architecture to each DNN; it lowers the resolution of computed analog values by encoding weights to produce near-zero analog values, adaptively slicing weights for each DL layer, and dynamically slicing inputs through speculation and recovery.

19

Proceedings Article•10.18653/v1/2021.emnlp-main.606

A Secure and Efficient Federated Learning Framework for NLP

Chenghong Wang, +9 more

- 28 Jan 2022

TL;DR: SEFL is a secure and efficient federated learning framework for NLP that eliminates the need for trusted entities, achieves similar or even better model accuracy compared with existing FL designs, and is resilient to client dropouts.

16

Proceedings Article•10.23919/date54114.2022.9774756

Enabling Fast Deep Learning on Tiny Energy-Harvesting IoT Devices

- 14 Mar 2022

TL;DR: The authors propose a methodology that enables fast deep learning with low-energy accelerators on tiny energy-harvesting devices. It employs block-circulant matrices and structured pruning to achieve high compression, leveraging the advantages of various vector operation accelerators.

10

References

Posted Content

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, +3 more

- 11 Oct 2018

- arXiv: Computation and Language

TL;DR: BERT is a new language representation model designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. The pre-trained model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

81.7K

Proceedings Article•10.18653/V1/N19-1423

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, +3 more

- 11 Oct 2018

TL;DR: BERT pre-trains deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. The pre-trained model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

24.6K

Book

Distributed Optimization and Statistical Learning Via the Alternating Direction Method of Multipliers

Stephen Boyd, +4 more

- 23 May 2011

TL;DR: It is argued that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas.

20.5K

Journal Article•10.1145/3007787.3001139

ISAAC: a convolutional neural network accelerator with in-situ analog arithmetic in crossbars

Ali Shafiee, +7 more

- 18 Jun 2016

TL;DR: This work explores an in-situ processing approach, where memristor crossbar arrays not only store input weights, but are also used to perform dot-product operations in an analog manner.

1.9K

Proceedings Article•10.1109/MICRO.2014.58

DaDianNao: A Machine-Learning Supercomputer

Yunji Chen, +10 more

- 13 Dec 2014

TL;DR: This article introduces a custom multi-chip machine-learning architecture, showing that, on a subset of the largest known neural network layers, it is possible to achieve a speedup of 450.65x over a GPU, and reduce the energy by 150.31x on average for a 64-chip system.

1.7K