Journal Article, DOI: 10.1109/TVLSI.2019.2912941
DASM: Data-Streaming-Based Computing in Nonvolatile Memory Architecture for Embedded System
16
TL;DR: A data-streaming design for NVM-based CIM, called DASM, that achieves speedups over the NVIDIA Jetson TK1 embedded GPU board, the Intel Xeon E5-2640 CPU, and a state-of-the-art field-programmable gate array (FPGA) design, with much lower power consumption.
Abstract: Emerging nonvolatile memories (NVMs), including resistive RAM (RRAM), phase-change memory (PCM), and magnetic RAM (MRAM), have opened up new pathways for computing-in-memory (CIM). These NVM technologies can achieve energy-efficient computational operations with only minor modification of the peripheral circuits. Despite the many advantages of computational NVMs, parallelism has not been sufficiently explored in such CIM designs. To break through this limitation on performance gain, we propose a data-streaming design for NVM-based CIM, called DASM, that leverages the underlying parallelism in the hardware. DASM benefits from the massive parallelism of data-streaming computing, the reduced data movement of CIM, and the nonvolatility of the memory arrays. Specifically, data-streaming operations can be implemented with CIM bitwise operations in both the read-out and write-in procedures. In addition, we apply multilevel power gating to the memory array and connections to further boost performance. Finally, we present a case study of the inference process of a quantized deep neural network on the DASM design. The DASM architecture achieves $47.8\times$, $5.1\times$, and $2.1\times$ speedups compared to the NVIDIA Jetson TK1 embedded GPU board, the Intel Xeon E5-2640 CPU, and a state-of-the-art field-programmable gate array (FPGA) design, respectively, with much lower power consumption.
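The abstract says that quantized DNN inference can be built from CIM bitwise operations but gives no detail on how. As a rough illustration only, the sketch below shows the XNOR-and-popcount formulation of a binarized dot product that is common in the binary-network literature. Whether DASM uses exactly this formulation is an assumption, and the helper names `pack_bits` and `xnor_popcount_dot` are hypothetical, not taken from the paper.

```python
def pack_bits(values):
    """Pack a sequence of +1/-1 values into an int (bit i is 1 where values[i] == +1)."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def xnor_popcount_dot(a_bits, b_bits, n):
    """Dot product of two length-n vectors over {+1, -1}, given their packed bits.

    XNOR marks the positions where the two vectors agree; each agreement
    contributes +1 and each disagreement contributes -1 to the dot product.
    """
    mask = (1 << n) - 1                  # keep only the n valid bit positions
    agree = ~(a_bits ^ b_bits) & mask    # bitwise XNOR restricted to n bits
    matches = bin(agree).count("1")      # popcount
    return 2 * matches - n


# Hypothetical example vectors, checked against an ordinary dot product.
a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, -1]
reference = sum(x * y for x, y in zip(a, b))
result = xnor_popcount_dot(pack_bits(a), pack_bits(b), len(a))
assert result == reference
```

In this software sketch the XNOR runs on a packed integer. In a CIM design, the analogous bitwise step would be carried out within the memory array and its peripheral circuits rather than in a processor register.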
Citations
PXNOR-BNN: In/With Spin-Orbit Torque MRAM Preset-XNOR Operation-Based Binary Neural Networks
TL;DR: An NVM-based CIM architecture employing a Preset-XNOR operation in/with the spin–orbit torque magnetic random access memory (SOT-MRAM) to accelerate the computation of BNNs (PXNOR-BNN) is proposed.
59
A survey of in-spin transfer torque MRAM computing
Hao Cai, Bo Liu, Juntong Chen, Lirida Alves de Barros Naviner, Yongliang Zhou, Zhen Wang, Jun Yang
TL;DR: This study reviews state-of-the-art techniques for in-memory computing (IMC), with an emphasis on spin-transfer torque MRAM computing via design schemes at the bit-cell, circuit, and system levels, and discusses the existing limitations of in-MRAM computing and potential methods for overcoming them.
27
Energy-efficient computing-in-memory architecture for AI processor: device, circuit, architecture perspective
Liang Chang, Chenglong Li, Zhang Zhaomin, Jianbiao Xiao, Qingsong Liu, Zhen Zhu, Weihang Li, Zixuan Zhu, Siqi Yang, Jun Zhou
TL;DR: In this article, the authors analyze the requirement of AI algorithms on the data movement and low power requirement of the AI processors and present several novel solutions beyond traditional analog-digital mixed static random access memory (SRAM)-based CIM architecture.
16
Fault-Tolerant Neuromorphic Computing Systems
Arjun Chaudhuri, Mengyun Liu, Krishnendu Chakrabarty
- 01 Nov 2019
TL;DR: A survey of research on fault modeling, test generation methodologies, and fault-tolerant design of neuromorphic computing systems based on RRAM and MRAM is presented.
12
SpinLiM: Spin Orbit Torque Memory for Ternary Neural Networks Based on the Logic-in-Memory Architecture
Lichuan Luo, He Zhang, Jinyu Bai, Youguang Zhang, Wang Kang, Weisheng Zhao
- 01 Feb 2021
TL;DR: In this article, two magnetic tunnel junctions (MTJ) are driven by the interplay of field-free spin orbit torque (SOT) and spin transfer torque (STT) effects to achieve a novel state-of-the-art paradigm for ternary multiplication operations.
10
References
ImageNet classification with deep convolutional neural networks
TL;DR: A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.
•Proceedings Article
ImageNet Classification with Deep Convolutional Neural Networks
Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton
- 03 Dec 2012
TL;DR: A deep convolutional neural network consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully connected layers with a final 1000-way softmax achieved state-of-the-art ImageNet classification performance.
The missing memristor found
TL;DR: It is shown, using a simple analytical example, that memristance arises naturally in nanoscale systems in which solid-state electronic and ionic transport are coupled under an external bias voltage.
XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks
Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, Ali Farhadi
- 08 Oct 2016
TL;DR: The Binary-Weight-Network version of AlexNet is compared with recent network binarization methods, BinaryConnect and BinaryNets, and outperforms these methods by large margins on ImageNet, by more than 16% in top-1 accuracy.
•Posted Content
Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1
TL;DR: A binary matrix multiplication GPU kernel is written with which it is possible to run the MNIST BNN 7 times faster than with an unoptimized GPU kernel, without suffering any loss in classification accuracy.
2.8K