Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices

doi:10.1109/JETCAS.2019.2910232

Open Access•Journal Article

Yu-Hsin Chen, +3 more

- 11 Apr 2019

- IEEE Journal on Emerging and Selected To...

- Vol. 9, Iss: 2, pp 292-308

TL;DR: Eyeriss v2 is a DNN accelerator architecture designed for running compact and sparse DNNs. It can process sparse data directly in the compressed domain for both weights and activations, and is therefore able to improve both processing speed and energy efficiency with sparse models.
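
To make "processing in the compressed domain" concrete, here is a minimal sketch of a zero-skipping dot product in which both operands are stored only as their nonzero values plus their indices. The value/index encoding and the helper names `compress` and `sparse_dot` are hypothetical illustrations chosen for this sketch. They are not the accelerator's actual data format or PE datapath, which this summary does not describe.

```python
from typing import List, Tuple

# Hypothetical compressed encoding (an assumption for illustration):
# parallel lists of nonzero values and their original positions.
Compressed = Tuple[List[float], List[int]]

def compress(dense: List[float]) -> Compressed:
    """Keep only the nonzero entries together with their indices."""
    values = [v for v in dense if v != 0]
    indices = [i for i, v in enumerate(dense) if v != 0]
    return values, indices

def sparse_dot(w: Compressed, a: Compressed) -> Tuple[float, int]:
    """Dot product computed directly on two compressed vectors.

    Walks both sorted index lists (a merge-style intersection), so a MAC is
    issued only when the weight and the activation are both nonzero.
    Returns the result and the number of MACs actually performed.
    """
    w_vals, w_idx = w
    a_vals, a_idx = a
    i = j = 0
    acc, macs = 0.0, 0
    while i < len(w_idx) and j < len(a_idx):
        if w_idx[i] == a_idx[j]:
            acc += w_vals[i] * a_vals[j]
            macs += 1
            i += 1
            j += 1
        elif w_idx[i] < a_idx[j]:
            i += 1  # weight nonzero, activation zero: skip without a MAC
        else:
            j += 1  # activation nonzero, weight zero: skip without a MAC
    return acc, macs

weights = [0.0, 2.0, 0.0, 0.0, -1.0, 0.0, 3.0, 0.0]
acts = [1.0, 4.0, 0.0, 5.0, 2.0, 0.0, 0.0, 0.0]
result, macs = sparse_dot(compress(weights), compress(acts))
print(result, macs)
```

With these sample vectors, only 2 of the 8 positions issue a MAC. The other 6 are skipped because at least one operand is zero, and the zeros never need to be stored. Savings in both work and storage grow with sparsity, which is the general intuition behind speed and energy gains on sparse models.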

Abstract: A recent trend in deep neural network (DNN) development is to extend the reach of deep learning applications to platforms that are more resource- and energy-constrained, e.g., mobile devices. These endeavors aim to reduce the DNN model size and improve the hardware processing efficiency, and have resulted in DNNs that are much more compact in their structures and/or have high data sparsity. These compact or sparse models differ from the traditional large ones in that their layer shapes and sizes vary much more, and they often require specialized hardware to exploit sparsity for performance improvement. Therefore, many DNN accelerators designed for large DNNs do not perform well on these models. In this paper, we present Eyeriss v2, a DNN accelerator architecture designed for running compact and sparse DNNs. To deal with the widely varying layer shapes and sizes, it introduces a highly flexible on-chip network, called hierarchical mesh, that can adapt to the different amounts of data reuse and bandwidth requirements of different data types, which improves the utilization of the computation resources. Furthermore, Eyeriss v2 can process sparse data directly in the compressed domain for both weights and activations, and is therefore able to improve both processing speed and energy efficiency with sparse models. Overall, with sparse MobileNet, Eyeriss v2 in a 65-nm CMOS process achieves a throughput of 1470.6 inferences/s and 2560.3 inferences/J at a batch size of 1, which is 12.6× faster and 2.5× more energy-efficient than the original Eyeriss running MobileNet.
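
The headline figures also imply a few derived quantities. The short calculation below assumes the throughput and energy-efficiency numbers were measured under the same conditions (sparse MobileNet, batch size 1) and treats the rounded 12.6× and 2.5× ratios as exact. The resulting baseline values are therefore approximate back-calculations, not figures reported in the abstract.

```python
# Figures quoted in the abstract (Eyeriss v2, sparse MobileNet, batch size 1, 65-nm CMOS).
throughput_v2 = 1470.6  # inferences/s
efficiency_v2 = 2560.3  # inferences/J
speedup = 12.6          # vs. original Eyeriss running MobileNet (rounded)
energy_gain = 2.5       # vs. original Eyeriss running MobileNet (rounded)

# (inferences/s) / (inferences/J) = J/s, i.e. average power in watts.
power_w = throughput_v2 / efficiency_v2
# Energy per inference is the reciprocal of energy efficiency (here in mJ).
energy_per_inf_mj = 1e3 / efficiency_v2

# Baseline implied by the rounded ratios (approximate, assumption-laden).
throughput_v1 = throughput_v2 / speedup
efficiency_v1 = efficiency_v2 / energy_gain
# Power ratio v2/v1 = (throughput ratio) / (efficiency ratio).
power_ratio = speedup / energy_gain

print(f"Eyeriss v2 average power:      {power_w * 1e3:.1f} mW")
print(f"Eyeriss v2 energy/inference:   {energy_per_inf_mj:.3f} mJ")
print(f"Implied baseline throughput:   {throughput_v1:.1f} inferences/s")
print(f"Implied baseline efficiency:   {efficiency_v1:.1f} inferences/J")
print(f"Implied power ratio (v2 / v1): {power_ratio:.2f}")
```

Under these assumptions, Eyeriss v2 averages roughly 574 mW and about 0.39 mJ per inference. The implied baseline is about 117 inferences/s and 1024 inferences/J. The ratio of the two speedups suggests v2 draws about 5× the average power of the baseline while completing each inference far faster, so the net energy per inference still drops by 2.5×.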

Citations

Journal Article•10.1007/S00034-020-01632-2

A New Low Power Schema for Stream Processors Front-End with Power-Aware DA-Based FIR Filters by Investigation of Image Transitions Sparsity

Seyedeh Fatemeh Ghamkhari, +1 more

- 28 Jan 2021

- Circuits Systems and Signal Processing

TL;DR: A new gated flip-flop is designed and used in shift-register arrays to decrease power consumption. Investigating the statistical properties of the input in image-processing applications, using implicit clock gating, and applying multi-VDD techniques are the three main approaches used to increase energy efficiency.

Proceedings Article•10.1109/ICCE-TAIWAN49838.2020.9258103

An Efficient Accelerator for Deep Convolutional Neural Networks

Yi-Xian Kuo, +1 more

- 28 Sep 2020

TL;DR: In this paper, max pooling is performed after convolution to reduce bandwidth; the main feature of the design is that its PE can perform 1.78 MAC operations per clock cycle.

Posted Content•10.36227/techrxiv.21837027

Brain Inspired Computing: A Systematic Survey and Future Trends

- 19 Jan 2023

TL;DR: In this article, the authors present a comprehensive survey of Brain Inspired Computing (BIC) and summarize four components of BIC infrastructure development: 1) modeling/algorithms, 2) hardware platforms, 3) software tools, and 4) benchmark data.

Proceedings Article•10.1109/ICCS51219.2020.9336599

Design of High Performance RNN Accelerator Based on Network Compression

Wentao Zhu, +5 more

- 10 Dec 2020

TL;DR: In this paper, the authors propose a recurrent neural network accelerator that reduces computation redundancy, memory overhead, and energy consumption; a novel network compression method based on pruning and hybrid quantization is also proposed to reduce computation and memory overhead.

Proceedings Article•10.1109/asicon58565.2023.10396527

A Domain-Specific DMA Structure for Per-channel Processing-based CNN Accelerator

Yi Chen, +5 more

- 24 Oct 2023

TL;DR: A per-channel processing-based CNN accelerator and a domain-specific DMA structure are proposed for the edge-computing scenario; the DMA structure allows the parameters required for the next round to be preloaded while the CNN-related operations of the current round are being performed.

...
