Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices

doi:10.1109/JETCAS.2019.2910232

Open Access•Journal Article•10.1109/JETCAS.2019.2910232

Yu-Hsin Chen, +3 more

- 11 Apr 2019

- IEEE Journal on Emerging and Selected To...

- Vol. 9, Iss: 2, pp 292-308

855

TL;DR: Eyeriss v2 is a DNN accelerator architecture designed for running compact and sparse DNNs. It processes sparse data directly in the compressed domain for both weights and activations, which improves both processing speed and energy efficiency on sparse models.

Abstract: A recent trend in deep neural network (DNN) development is to extend the reach of deep learning applications to platforms that are more resource- and energy-constrained, e.g., mobile devices. These endeavors aim to reduce the DNN model size and improve the hardware processing efficiency, and have resulted in DNNs that are much more compact in their structures and/or have high data sparsity. These compact or sparse models differ from the traditional large ones in that there is much more variation in their layer shapes and sizes, and they often require specialized hardware to exploit sparsity for performance improvement. Therefore, many DNN accelerators designed for large DNNs do not perform well on these models. In this paper, we present Eyeriss v2, a DNN accelerator architecture designed for running compact and sparse DNNs. To deal with the widely varying layer shapes and sizes, it introduces a highly flexible on-chip network, called hierarchical mesh, that can adapt to the different amounts of data reuse and bandwidth requirements of different data types, which improves the utilization of the computation resources. Furthermore, Eyeriss v2 can process sparse data directly in the compressed domain for both weights and activations and is therefore able to improve both processing speed and energy efficiency with sparse models. Overall, with sparse MobileNet, Eyeriss v2 in a 65-nm CMOS process achieves a throughput of 1470.6 inferences/s and 2560.3 inferences/J at a batch size of 1, which is $12.6\times$ faster and $2.5\times$ more energy-efficient than the original Eyeriss running MobileNet.
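The abstract's claim that sparse data can be processed "directly in the compressed domain" can be illustrated with a small software sketch. The example below is not the Eyeriss v2 hardware design: the (index, value) encoding and the names `compress` and `sparse_dot` are assumptions chosen for illustration. It only shows the general idea that, when both weights and activations are stored without their zeros, a dot product can skip every multiply-accumulate (MAC) in which either operand is zero, without first expanding the data back to dense form.

```python
from typing import List, Tuple

# Hypothetical compressed format: only nonzero entries are stored, as
# (index, value) pairs sorted by index. This is an illustrative choice,
# not necessarily the encoding used by Eyeriss v2.
Compressed = List[Tuple[int, float]]


def compress(dense: List[float]) -> Compressed:
    """Keep only the nonzero entries, remembering their positions."""
    return [(i, v) for i, v in enumerate(dense) if v != 0]


def sparse_dot(weights: Compressed, activations: Compressed) -> Tuple[float, int]:
    """Dot product computed without decompressing either operand.

    Returns the result and the number of MACs actually performed.
    """
    total, macs = 0.0, 0
    wi, ai = 0, 0
    # Both lists are sorted by index, so a two-pointer walk finds the
    # positions where weight and activation are both nonzero.
    while wi < len(weights) and ai < len(activations):
        w_idx, w_val = weights[wi]
        a_idx, a_val = activations[ai]
        if w_idx == a_idx:
            total += w_val * a_val
            macs += 1
            wi += 1
            ai += 1
        elif w_idx < a_idx:
            wi += 1  # activation at w_idx is zero: skip this weight
        else:
            ai += 1  # weight at a_idx is zero: skip this activation
    return total, macs


w = compress([0.0, 2.0, 0.0, 0.0, -1.0, 0.0, 3.0, 0.0])
a = compress([1.0, 0.0, 0.0, 4.0, 5.0, 0.0, 2.0, 0.0])
result, macs = sparse_dot(w, a)
print(result, macs)
```

In this toy input only two of the eight positions hold a nonzero weight and a nonzero activation at the same time, so two MACs are performed where a dense loop would perform eight. Fewer operations and less data movement are the general reason sparsity can help both speed and energy.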

Citations

Proceedings Article•10.23919/DATE51398.2021.9474145

HeSA: Heterogeneous Systolic Array Architecture for Compact CNNs Hardware Accelerators

Rui Xu, +3 more

- 01 Feb 2021

TL;DR: The authors propose a heterogeneous systolic array (HeSA) architecture whose heterogeneous processing elements support multiple dataflow modes, allowing it to further exploit data-reuse opportunities in depthwise convolutional layers.

10

•Posted Content•10.20944/PREPRINTS202107.0375.V1

Low-power ultra-small Edge AI Accelerators for Image Recognition with Convolution Neural Networks: Analysis and Future Directions

Weison Lin, +2 more

- 16 Jul 2021

TL;DR: This work surveys three key specifications (computation ability, power consumption, and area) of prior-art edge AI accelerators and CGRA accelerators from the past few years in order to define and evaluate low-power, ultra-small edge AI accelerators.

10

Proceedings Article•10.1109/ICC42927.2021.9500304

Optimal Transport for UAV D2D Distributed Learning: Example using Federated Learning

Sherif B. Azmy, +4 more

- 14 Jun 2021

TL;DR: In this article, the authors propose the use of Device-to-Device (D2D) communication to fairly distribute the data so far collected by UAVs with different capabilities by posing it as an optimal transport problem.

10

•Proceedings Article•10.1109/ASAP52443.2021.00022

How to Reach Real-Time AI on Consumer Devices? Solutions for Programmable and Custom Architectures

Stylianos I. Venieris, +3 more

- 07 Jul 2021

TL;DR: In this article, the authors provide preliminary answers to this potentially game-changing question by presenting an array of design techniques for efficient AI systems, which span model-, system-, and hardware-level techniques and their combination.

10

Review•10.1145/3661820

A Review on the emerging technology of TinyML

Vasileios Tsoukas, +3 more

- 30 Apr 2024

- ACM Computing Surveys

TL;DR: TinyML is an emerging technology for developing autonomous and secure devices with local AI capabilities. It aims to democratize AI and contribute to the digital revolution of intelligent devices. The work reviews optimization techniques, development boards, software, applications, and future directions.

10

References

•Proceedings Article•10.1109/CVPR.2016.90

Deep Residual Learning for Image Recognition

Kaiming He, +3 more

- 27 Jun 2016

TL;DR: In this article, the authors propose a residual learning framework to ease the training of networks that are substantially deeper than those used previously; the resulting networks won 1st place on the ILSVRC 2015 classification task.

198.7K

•Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

Karen Simonyan, +1 more

- 04 Sep 2014

TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.

102.6K

•Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

Alex Krizhevsky, +2 more

- 03 Dec 2012

TL;DR: The authors present a deep convolutional neural network that achieved state-of-the-art performance; it consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.

88.4K

•Proceedings Article•10.1109/CVPR.2015.7298594

Going deeper with convolutions

Christian Szegedy, +8 more

- 07 Jun 2015

TL;DR: Inception is a deep convolutional neural network architecture that achieved a new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).

56.6K

•Journal Article•10.3156/JSOFT.29.5_177_2

Generative Adversarial Nets

Ian Goodfellow, +7 more

- 08 Dec 2014

TL;DR: A new framework for estimating generative models via an adversarial process, in which two models are simultaneously trained: a generative model G that captures the data distribution and a discriminative model D that estimates the probability that a sample came from the training data rather than G.

48.6K