Architecting Decentralization and Customizability in DNN Accelerators for Hardware Defect Adaptation

doi:10.1109/tcad.2022.3197540

Journal Article

01 Nov 2022

- IEEE Transactions on Computer-Aided Desi...

- Vol. 41, Iss: 11, pp 3934-3945

9

TL;DR: In this article, a one-time training of DNNs with Hardware-Aware Dropout/Dropconnect techniques boosts model decentralization and facilitates accurate neural network inference in degraded computational fabrics.

Abstract: The efficiency of machine intelligence techniques has improved noticeably in embedded application domains thanks to dedicated hardware accelerators for deep neural networks (DNNs). Despite the economic criticality of yield and reliability problems in advanced semiconductor nodes, these concerns have attracted limited attention in the context of embedded machine intelligence devices. The micro-architectural features of deep learning accelerators, when paired with the algorithmic characteristics of DNNs, unlock novel opportunities to tackle semiconductor reliability problems in embedded deep learning devices. While fine-grained bypassing of faulty processing elements reins in the computational impact of hardware defects, a one-time training of DNNs with Hardware-Aware Dropout/Dropconnect techniques boosts model decentralization and facilitates accurate neural network inference in degraded computational fabrics. Furthermore, on-device calibration methods can improve resilience even further without necessitating expensive defect compensation methods such as device-specific training. Our work confirms the potential for improving the yield, reliability, and operational lifetime of embedded machine intelligence devices through a highly practical co-design of DNNs and configurable hardware architectures.
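
The abstract does not give implementation details, so the following is only a minimal sketch of how the two ideas could fit together, not the authors' method. It assumes a hypothetical mapping `pe_of` that assigns weight (i, j) to a processing element (PE) in a small array, treats "bypassing" a faulty PE as zeroing the contribution of every weight mapped to it, and uses a randomly sampled fault map as a stand-in for the structured mask a hardware-aware Dropconnect scheme might apply during training. All function names and the mapping are assumptions made for illustration.

```python
import random

def pe_of(i, j, rows, cols):
    """Hypothetical mapping of weight (i, j) onto a rows x cols PE array."""
    return (i % rows, j % cols)

def sample_fault_map(rows, cols, defect_rate, rng):
    """Mark each PE faulty with probability defect_rate.

    Assumption: this stands in for the unknown defect map of a chip,
    e.g. when drawing a fresh structured mask per training step.
    """
    return {(r, c) for r in range(rows) for c in range(cols)
            if rng.random() < defect_rate}

def bypass_mask(n_out, n_in, rows, cols, faulty):
    """Return 1 where a weight's PE is healthy and 0 where it is bypassed."""
    return [[0 if pe_of(i, j, rows, cols) in faulty else 1
             for j in range(n_in)] for i in range(n_out)]

def masked_matvec(weights, mask, x):
    """Matrix-vector product in which bypassed weights contribute nothing."""
    return [sum(w * m * xj for w, m, xj in zip(w_row, m_row, x))
            for w_row, m_row in zip(weights, mask)]

# Small example: a 2 x 4 layer mapped onto a 2 x 2 PE array with one faulty PE.
weights = [[1.0, 2.0, 3.0, 4.0],
           [0.5, 0.5, 0.5, 0.5]]
x = [1.0, 1.0, 1.0, 1.0]
faulty = {(0, 1)}                      # PE in row 0, column 1 is defective
mask = bypass_mask(2, 4, 2, 2, faulty)
y = masked_matvec(weights, mask, x)

# During training, a new mask could be drawn at every step instead:
rng = random.Random(0)
train_mask = bypass_mask(2, 4, 2, 2, sample_fault_map(2, 2, 0.25, rng))
```

In this toy setup, weights (0, 1) and (0, 3) share the faulty PE, so the first output loses both of those terms while the second output is unaffected. Training with masks shaped like this, rather than with unstructured dropout, is one plausible reading of "hardware-aware".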

Citations

Proceedings Article•10.1109/vlsi-soc57769.2023.10321881

Analyzing the Impact of Different Real Number Formats on the Structural Reliability of TCUs in GPUs

Robert Limas Sierra, +3 more

- 16 Oct 2023

TL;DR: This work for the first time quantitatively evaluates the effects of hardware faults arising in TCU structures when using two different formats for real number representation, and demonstrates that the Posit formats are less affected by faults than Floating-Point formats by up to one order of magnitude.

7

Journal Article•10.3390/electronics13030578

Exploring Hardware Fault Impacts on Different Real Number Representations of the Structural Resilience of TCUs in GPUs

Robert Limas Sierra, +3 more

- 31 Jan 2024

- Electronics

TL;DR: The posit format of TCUs is less affected by faults than the floating-point format.

6

Journal Article•10.1109/tcad.2023.3335144

Error Resilience in Deep Neural Networks Using Neuron Gradient Statistics

C Amarnath, +3 more

- IEEE Transactions on Computer-Aided Desi...

TL;DR: A novel error resilience approach for DNNs that diagnoses and suppresses erroneous neuron outputs without DNN retraining. Error diagnosis is based on the statistics of gradients of neuron output values relative to adjacent neurons.

3

Journal Article•10.1109/vts60656.2024.10538938

Evaluating the Reliability of Supervised Compression for Split Computing

Juan-David Guerrero-Balaguera, +3 more

- 22 Apr 2024

1

Journal Article•10.1109/iolts59296.2023.10224892

A Novel Approach to Error Resilience in Online Reinforcement Learning

C Amarnath, +1 more

- 03 Jul 2023

TL;DR: A novel error resilience approach for online RL that makes use of running statistics collected across the (real-time) RL training process to configure error detection thresholds without the need to access a reference training dataset is presented.

Related Papers (5)

Breast cancer detection from mammograms using artificial intelligence

Deep Super Learner: A Deep Ensemble for Classification Problems

Towards Bayesian Deep Learning: A Framework and Some Existing Methods

Usage of deep learning in recent applications