Proceedings Article · DOI: 10.23919/DATE51398.2021.9474235
TinyADC: Peripheral Circuit-aware Weight Pruning Framework for Mixed-signal DNN Accelerators
Geng Yuan, Payman Behnam, Yuxuan Cai, Ali Shafiee, Jingyan Fu, Zhiheng Liao, Zhengang Li, Xiaolong Ma, Jieren Deng, Jinhui Wang, Mahdi Nazm Bojnordi, Yanzhi Wang, Caiwen Ding
- 01 Feb 2021
- pp 926-931
TL;DR: In this article, the authors proposed a weight pruning framework for ReRAM-based mixed-signal DNN accelerators, which effectively reduces the required bits for ADC resolution and hence the overall area and power consumption of the accelerator without introducing any computational inaccuracy.
Abstract: As the number of weight parameters in deep neural networks (DNNs) continues to grow, the demand for ultra-efficient DNN accelerators has motivated research on non-traditional architectures built on emerging technologies. Resistive Random-Access Memory (ReRAM) crossbars have been used to perform in-situ matrix-vector multiplication for DNNs. DNN weight pruning techniques have also been applied to ReRAM-based mixed-signal DNN accelerators, with a focus on reducing weight storage and accelerating computation. However, existing works account for very few peripheral-circuit features, such as analog-to-digital converters (ADCs), during neural network design. Unfortunately, ADCs now account for a major share of the power consumption and area of current mixed-signal accelerators, and this large peripheral-circuit overhead has not been addressed efficiently. To address this problem, we propose TinyADC, a novel weight pruning framework for ReRAM-based mixed-signal DNN accelerators that effectively reduces the number of bits required for ADC resolution, and hence the overall area and power consumption of the accelerator, without introducing any computational inaccuracy. Compared to state-of-the-art pruning work on the ImageNet dataset, TinyADC achieves 3.5× and 2.9× reductions in power and area, respectively. The TinyADC framework improves the throughput of a state-of-the-art architecture design by 29% and 40% in terms of throughput per unit area and per watt (GOPs/s×mm² and GOPs/W), respectively.
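The abstract's central claim, that pruning can shrink ADC resolution without computational error, rests on simple arithmetic: an ADC only needs enough bits to represent the largest possible sum on a bitline, and that sum grows with the number of nonzero cells feeding the column. The Python sketch below illustrates this under stated assumptions and is not the TinyADC algorithm itself: it assumes unsigned `cell_bits`-bit cell values, 1-bit inputs per cycle, ideal analog summation, and a hypothetical constraint of at most `k` nonzero weights per crossbar column. `prune_columns_topk` is the magnitude-based projection onto that constraint, the kind of step ADMM-based pruning flows typically use; the function names and the 128×64 tile size are hypothetical.

```python
import numpy as np


def adc_bits_required(active_rows: int, cell_bits: int, input_bits: int = 1) -> int:
    """Smallest ADC resolution that digitizes a column sum without clipping.

    Assumes every active row holds an unsigned `cell_bits`-bit cell value and is
    driven by an unsigned `input_bits`-bit input, with ideal analog summation.
    """
    max_sum = active_rows * (2**cell_bits - 1) * (2**input_bits - 1)
    # An n-bit ADC resolves values 0 .. 2**n - 1, so n = bit length of max_sum.
    return max_sum.bit_length()


def prune_columns_topk(weights: np.ndarray, k: int) -> np.ndarray:
    """Keep the k largest-magnitude entries in each column and zero the rest.

    This is the Euclidean projection onto matrices with at most k nonzeros
    per column (the hypothetical per-column constraint assumed here).
    """
    if not 1 <= k <= weights.shape[0]:
        raise ValueError("k must be between 1 and the number of rows")
    pruned = np.zeros_like(weights)
    # Row indices of the k largest |w| in every column (order within top-k arbitrary).
    top_rows = np.argpartition(-np.abs(weights), k - 1, axis=0)[:k, :]
    cols = np.arange(weights.shape[1])
    pruned[top_rows, cols] = weights[top_rows, cols]
    return pruned


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tile = rng.standard_normal((128, 64))  # hypothetical 128x64 crossbar tile
    pruned = prune_columns_topk(tile, k=16)
    worst_column_nnz = int(np.count_nonzero(pruned, axis=0).max())

    print(adc_bits_required(tile.shape[0], cell_bits=2))    # dense columns: 9
    print(adc_bits_required(worst_column_nnz, cell_bits=2))  # pruned columns: 6
```

Under these assumptions, a dense 128-row column with 2-bit cells needs a 9-bit ADC, while capping each column at 16 nonzeros brings the requirement down to 6 bits. Since an n-bit ADC resolves 2^n levels, each bit removed halves the number of levels the converter must distinguish. The sketch does not model weight quantization or signed-weight mapping, which a real accelerator would also need.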
Citations
SPViT: Enabling Faster Vision Transformers via Latency-Aware Soft Token Pruning
Zhenglun Kong, Peiyan Dong, Xiaolong Ma, Xin Meng, Wei Niu, Xuan Shen, Geng Yuan, Bin Ren, Hao Tang, Mingfu Qin, Yanzhi Wang
- 01 Jan 2022
TL;DR: SPViT is a soft token pruning framework that can be set up on vanilla Transformers with both flat and hierarchical structures, such as DeiTs and Swin Transformers (Swin).
FORMS: fine-grained polarized ReRAM-based in-situ computation for mixed-signal DNN accelerator
Geng Yuan, Payman Behnam, Zhengang Li, Ali Shafiee, Sheng Lin, Xiaolong Ma, Hang Liu, Xuehai Qian, Mahdi Nazm Bojnordi, Yanzhi Wang, Caiwen Ding
- 14 Jun 2021
TL;DR: The fine-grained ReRAM-based DNN accelerator with algorithm/hardware co-design proposed in this paper achieves 1.12× and 2.4× speedups in frames per second over an optimized ISAAC with almost the same power/area cost.
RAELLA: Reforming the Arithmetic for Efficient, Low-Resolution, and Low-Loss Analog PIM: No Retraining Required!
Joel Emer, Vivienne Sze
- 17 Apr 2023
TL;DR: RAELLA adapts the architecture to each DNN: it lowers the resolution of computed analog values by encoding weights to produce near-zero analog values, adaptively slicing weights for each DNN layer, and dynamically slicing inputs through speculation and recovery.
A Secure and Efficient Federated Learning Framework for NLP
Chenghong Wang, Jieren Deng, Xian Yong Meng, Yijue Wang, Ji Li, Sheng Lin, Shuo Han, Fei Miao, Sanguthevar Rajasekaran, Caiwen Ding
- 28 Jan 2022
TL;DR: This paper proposes SEFL, a secure and efficient federated learning framework for NLP that eliminates the need for trusted entities, achieves similar or even better model accuracy compared with existing FL designs, and is resilient to client dropouts.
Enabling Fast Deep Learning on Tiny Energy-Harvesting IoT Devices
14 Mar 2022
TL;DR: The authors propose a methodology that enables fast deep learning with low-energy accelerators on tiny energy-harvesting devices; it employs block-circulant matrices and structured pruning to achieve high compression and to leverage various vector-operation accelerators.
References
•Posted Content
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TL;DR: A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
- 11 Oct 2018
TL;DR: BERT pre-trains deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; it can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
•Book
Distributed Optimization and Statistical Learning Via the Alternating Direction Method of Multipliers
Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, Jonathan Eckstein
- 23 May 2011
TL;DR: It is argued that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas.
ISAAC: a convolutional neural network accelerator with in-situ analog arithmetic in crossbars
Ali Shafiee, Anirban Nag, Naveen Muralimanohar, Rajeev Balasubramonian, John Paul Strachan, Miao Hu, R. Stanley Williams, Vivek Srikumar
- 18 Jun 2016
TL;DR: This work explores an in-situ processing approach, where memristor crossbar arrays not only store input weights, but are also used to perform dot-product operations in an analog manner.
DaDianNao: A Machine-Learning Supercomputer
Yunji Chen, Luo Tao, Liu Shaoli, Zhang Shijin, Liqiang He, Jia Wang, Ling Li, Tianshi Chen, Zhiwei Xu, Ninghui Sun, Olivier Temam
- 13 Dec 2014
TL;DR: This article introduces a custom multi-chip machine-learning architecture, showing that, on a subset of the largest known neural network layers, it is possible to achieve a speedup of 450.65x over a GPU, and reduce the energy by 150.31x on average for a 64-chip system.
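The ISAAC entry in the references describes memristor crossbars that both store weights and compute dot products in analog. The following idealized Python sketch shows where the ADC sits in that dataflow. It assumes unsigned integer inputs and weights, weights bit-sliced across `cell_bits`-bit cells, one input bit applied per cycle, and noise-free bitline summation; the function name `crossbar_dot` and the default bit widths are illustrative choices, not ISAAC's exact configuration. Each (weight slice, input bit) step yields per-column sums that an ADC would digitize, and a digital shift-and-add reconstructs the exact integer product. The returned `max_column_sum` is the largest value the ADC actually had to resolve, the range that TinyADC-style pruning aims to shrink.

```python
import numpy as np


def crossbar_dot(x, w, input_bits=8, weight_bits=8, cell_bits=2):
    """Idealized bit-serial, bit-sliced crossbar computation of x @ w.

    x: unsigned integers, shape (rows,); w: unsigned integers, shape (rows, cols).
    Returns the exact product and the largest bitline sum seen by the ADC.
    """
    if weight_bits % cell_bits != 0:
        raise ValueError("weight_bits must be a multiple of cell_bits")
    x = np.asarray(x, dtype=np.int64)
    w = np.asarray(w, dtype=np.int64)
    cell_mask = (1 << cell_bits) - 1
    result = np.zeros(w.shape[1], dtype=np.int64)
    max_column_sum = 0

    for s in range(weight_bits // cell_bits):
        # Slice s of every weight, as it would be stored in one set of cells.
        w_slice = (w >> (s * cell_bits)) & cell_mask
        for b in range(input_bits):
            # One input bit per row per cycle (a 1-bit "DAC").
            x_bit = (x >> b) & 1
            # Ideal analog bitline sums: one dot product per column.
            column_sums = x_bit @ w_slice
            max_column_sum = max(max_column_sum, int(column_sums.max()))
            # Digital shift-and-add after the ADC.
            result += column_sums << (s * cell_bits + b)
    return result, max_column_sum


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.integers(0, 2**8, size=128)
    w = rng.integers(0, 2**8, size=(128, 4))
    product, peak = crossbar_dot(x, w)
    print(np.array_equal(product, x @ w))  # True
    print(peak <= 128 * 3)                 # True: fits a 9-bit ADC
```

Because every partial sum is an exact integer, an ADC wide enough for the worst-case column sum introduces no error; limiting how many nonzero cells can share a bitline lowers that worst case and therefore the required resolution.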