Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices

doi:10.1109/JETCAS.2019.2910232

Open Access•Journal Article•10.1109/JETCAS.2019.2910232

Yu-Hsin Chen, +3 more

- 11 Apr 2019

- IEEE Journal on Emerging and Selected To...

- Vol. 9, Iss: 2, pp 292-308

855

TL;DR: Eyeriss v2 is a DNN accelerator architecture designed for running compact and sparse DNNs. It processes sparse data directly in the compressed domain for both weights and activations, which improves both processing speed and energy efficiency on sparse models.

Abstract: A recent trend in deep neural network (DNN) development is to extend the reach of deep learning applications to platforms that are more resource- and energy-constrained, e.g., mobile devices. These endeavors aim to reduce the DNN model size and improve the hardware processing efficiency, and have resulted in DNNs that are much more compact in their structures and/or have high data sparsity. These compact or sparse models differ from the traditional large ones in that there is much more variation in their layer shapes and sizes, and they often require specialized hardware to exploit sparsity for performance improvement. Therefore, many DNN accelerators designed for large DNNs do not perform well on these models. In this paper, we present Eyeriss v2, a DNN accelerator architecture designed for running compact and sparse DNNs. To deal with the widely varying layer shapes and sizes, it introduces a highly flexible on-chip network, called hierarchical mesh, that can adapt to the different amounts of data reuse and bandwidth requirements of different data types, which improves the utilization of the computation resources. Furthermore, Eyeriss v2 can process sparse data directly in the compressed domain for both weights and activations and is therefore able to improve both processing speed and energy efficiency with sparse models. Overall, with sparse MobileNet, Eyeriss v2 in a 65-nm CMOS process achieves a throughput of 1470.6 inferences/s and 2560.3 inferences/J at a batch size of 1, which is $12.6\times$ faster and $2.5\times$ more energy-efficient than the original Eyeriss running MobileNet.
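The idea of computing "directly in the compressed domain" can be illustrated with a minimal software sketch. This is not the Eyeriss v2 hardware design: the `(position, value)` pair encoding and the names `compress` and `sparse_dot` are assumptions made purely for illustration, since the abstract does not specify the compression format. The sketch only shows why keeping both weights and activations compressed lets a multiply-accumulate (MAC) loop skip every position where either operand is zero.

```python
from typing import List, Tuple

# Hypothetical compressed form: (position, nonzero value) pairs sorted by position.
Compressed = List[Tuple[int, float]]


def compress(dense: List[float]) -> Compressed:
    """Keep only the nonzero entries together with their positions."""
    return [(i, v) for i, v in enumerate(dense) if v != 0]


def sparse_dot(a: Compressed, w: Compressed) -> Tuple[float, int]:
    """Dot product computed on compressed operands.

    Returns the result and the number of MACs actually performed.
    A MAC happens only where both operands are nonzero.
    """
    total, macs = 0.0, 0
    i = j = 0
    while i < len(a) and j < len(w):
        pos_a, val_a = a[i]
        pos_w, val_w = w[j]
        if pos_a == pos_w:      # both nonzero at this position: do the MAC
            total += val_a * val_w
            macs += 1
            i += 1
            j += 1
        elif pos_a < pos_w:     # weight is zero here: skip the activation
            i += 1
        else:                   # activation is zero here: skip the weight
            j += 1
    return total, macs


activations = [0.0, 2.0, 0.0, 0.0, 1.5, 0.0, 3.0, 0.0]
weights = [0.5, 0.0, 0.0, 1.0, 2.0, 0.0, 1.0, 0.0]

result, macs = sparse_dot(compress(activations), compress(weights))
print(result, macs)  # the dense version would perform len(activations) MACs
```

In this toy case only the positions where both vectors are nonzero contribute work, so the number of MACs falls with the sparsity of both operands, which is the intuition behind the speed and energy gains the abstract reports for sparse models.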

Citations

•Proceedings Article•10.1109/hpca53966.2022.00069

CANDLES: Channel-Aware Novel Dataflow-Microarchitecture Co-Design for Low Energy Sparse Neural Network Acceleration

01 Apr 2022

TL;DR: CANDLES adopts Pixel-first compression and a channel-first dataflow to reduce the energy consumed by index metadata and to improve the energy efficiency of DNN accelerators.

14

•Journal Article•10.3389/FNINS.2021.615279

End-to-End Implementation of Various Hybrid Neural Networks on a Cross-Paradigm Neuromorphic Chip.

Guanrui Wang, +5 more

- 02 Feb 2021

- Frontiers in Neuroscience

TL;DR: Wang et al. propose an end-to-end mapping framework for implementing various hybrid neural networks on many-core neuromorphic architectures, based on the cross-paradigm Tianjic chip.

13

•Journal Article•10.1109/TCSII.2020.3038897

A Memory-Efficient CNN Accelerator Using Segmented Logarithmic Quantization and Multi-Cluster Architecture

Jiawei Xu, +6 more

- 01 Jun 2021

- IEEE Transactions on Circuits and System...

TL;DR: A segmented logarithmic (SegLog) quantization method is exploited to mitigate the on-chip memory and bandwidth requirements, thus accommodating more processing elements (PEs) in a given chip area to organize a reconfigurable multi-cluster architecture.

13

•Book Chapter•10.1007/978-3-030-63393-6_23

Truly Heterogeneous HPC: Co-design to Achieve What Science Needs from HPC

Suma Cardwell, +8 more

- 26 Aug 2020

TL;DR: The authors use the example of mapping the brain's connectome to illustrate the advantages of a heterogeneous system that incorporates neuromorphic hardware, an emerging technology of interest to the HPC community because of its potential for implementing large-scale calculations with an extremely low power footprint.

13

•Journal Article•10.1109/tmc.2022.3157957

KeepEdge: A Knowledge Distillation Empowered Edge Intelligence Framework for Visual Assisted Positioning in UAV Delivery

01 Aug 2023

- IEEE Transactions on Mobile Computing

TL;DR: This paper presents KeepEdge, a knowledge-distillation-empowered edge intelligence architecture that provides visual-information-assisted positioning for UAV delivery services. It integrates deep neural networks (DNNs) into an edge computing framework so that UAVs can autonomously identify the expected delivery position.

13

References

•Proceedings Article•10.1109/CVPR.2016.90

Deep Residual Learning for Image Recognition

Kaiming He, +3 more

- 27 Jun 2016

TL;DR: The authors propose a residual learning framework to ease the training of networks that are substantially deeper than those used previously; the approach won 1st place on the ILSVRC 2015 classification task.

198.7K

•Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

Karen Simonyan, +1 more

- 04 Sep 2014

TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.

102.6K

•Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

Alex Krizhevsky, +2 more

- 03 Dec 2012

TL;DR: A deep convolutional neural network achieves state-of-the-art ImageNet classification performance. The network consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.

88.4K

•Proceedings Article•10.1109/CVPR.2015.7298594

Going deeper with convolutions

Christian Szegedy, +8 more

- 07 Jun 2015

TL;DR: Inception is a deep convolutional neural network architecture that achieved a new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).

56.6K

•Journal Article•10.3156/JSOFT.29.5_177_2

Generative Adversarial Nets

Ian Goodfellow, +7 more

- 08 Dec 2014

TL;DR: A new framework for estimating generative models via an adversarial process, in which two models are simultaneously trained: a generative model G that captures the data distribution and a discriminative model D that estimates the probability that a sample came from the training data rather than G.

48.6K