Proceedings Article, DOI: 10.23919/DATE.2019.8715265
CORN: In-Buffer Computing for Binary Neural Network
Liang Chang, Xin Ma, Zhaohao Wang, Youguang Zhang, Weisheng Zhao, Yuan Xie
- 25 Mar 2019
- pp 384-389
TL;DR: CORN is a BNN computing accelerator that combines a Spin-Orbit-Torque Magnetic RAM (SOT-MRAM) based data buffer, which performs the majority operation in place of the pop-count process, with SOT-MRAM-based IMC to accelerate the computing of BNNs.
Abstract: Binary Neural Networks (BNNs) have attracted great attention because they reduce memory usage and power consumption while achieving satisfactory recognition accuracy on image classification. In BNNs, the multiply-accumulate operations of the convolution layers are replaced with bit-wise operations (XNOR and pop-count). Such bit-wise operations are well suited to hardware accelerators such as in-memory computing (IMC). However, an additional digital processing unit (DPU) is required for the pop-count operation, which induces considerable data movement between the processing engines (PEs) and the data buffers and reduces the efficiency of the IMC. In this paper, we present a BNN computing accelerator, namely CORN, which combines a Spin-Orbit-Torque Magnetic RAM (SOT-MRAM) based data buffer that performs the majority operation (replacing the pop-count process) with SOT-MRAM-based IMC to accelerate the computing of BNNs. CORN naturally implements the XNOR operation in the NVM array and feeds the results to the computing data buffer for the majority write operation. This design removes the pop-counter implemented by the DPU and reduces data movement between the data buffer and the memory array. In the evaluation, CORN achieves 61% and 14% power savings with 1.74× and 2.12× speedups compared with the FPGA and the DPU-based IMC architecture, respectively.
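The arithmetic behind the abstract's two substitutions (XNOR and pop-count for multiply-accumulate, then majority for pop-count) can be checked with a short Python sketch. It is an illustrative model, not the CORN circuit: the bit encoding (+1 as bit 1, -1 as bit 0), the zero sign threshold with no bias or batch-norm offset, the tie rule (ties map to +1), and the helper names `encode`, `xnor`, `dot_via_popcount`, and `sign_via_majority` are all assumptions made for illustration. Software still has to count bits to evaluate a majority; the point is that the binary activation needs only a threshold decision, not the full count, which is what lets a majority operation in the buffer stand in for a digital pop-counter.

```python
from itertools import product

# Illustrative model only (hypothetical helpers, not the CORN design).
# Assumptions: +1 is encoded as bit 1 and -1 as bit 0; the binary activation
# is sign(dot product) with threshold 0 and no bias/batch-norm offset;
# ties (possible only for even n) map to +1.


def encode(vec):
    """Map a sequence of +1/-1 values to bits (+1 -> 1, -1 -> 0)."""
    return [1 if v == 1 else 0 for v in vec]


def xnor(a, b):
    """Bitwise XNOR: 1 where the bits agree (product +1), 0 where they differ."""
    return [1 - (x ^ y) for x, y in zip(a, b)]


def dot_via_popcount(a_bits, w_bits):
    """Recover the +/-1 dot product from the XNOR pop-count.

    Each agreement contributes +1 and each disagreement -1, so
    dot = popcount - (n - popcount) = 2 * popcount - n.
    """
    n = len(a_bits)
    pc = sum(xnor(a_bits, w_bits))
    return 2 * pc - n


def sign_via_majority(a_bits, w_bits):
    """Binary activation as a majority vote over the XNOR bits.

    Returns bit 1 (+1) if at least half of the XNOR bits are 1, which is
    equivalent to dot >= 0 because dot = 2 * popcount - n.
    """
    bits = xnor(a_bits, w_bits)
    return 1 if 2 * sum(bits) >= len(bits) else 0


if __name__ == "__main__":
    # Exhaustively compare against plain +/-1 arithmetic for a small n.
    n = 5
    for a in product((1, -1), repeat=n):
        for w in product((1, -1), repeat=n):
            exact = sum(x * y for x, y in zip(a, w))
            ab, wb = encode(a), encode(w)
            assert dot_via_popcount(ab, wb) == exact
            assert sign_via_majority(ab, wb) == (1 if exact >= 0 else 0)
    print("XNOR/pop-count and majority agree with +/-1 arithmetic for n =", n)
```

For example, with activations `[1, -1, 1, 1, -1]` and weights `[1, 1, -1, 1, -1]`, the XNOR bits are `[1, 0, 0, 1, 1]`, the pop-count is 3, the dot product is 2 * 3 - 5 = 1, and the majority output is 1 (+1). If a real network folds a bias or batch-norm offset into the activation, the comparison point moves away from n/2, so this zero-threshold model would need adjusting.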
Citations
Posted Content
Gemmini: An Agile Systolic Array Generator Enabling Systematic Evaluations of Deep-Learning Architectures
Hasan Genc, Ameer Haj-Ali, Vighnesh Iyer, Alon Amid, Howard Mao, John Wright, Colin Schmidt, Jerry Zhao, Albert Ou, Max Banister, Yakun Sophia Shao, Borivoje Nikolic, Ion Stoica, Krste Asanovic
- 22 Nov 2019
TL;DR: Gemmini, an open-source, agile systolic array generator, enables systematic evaluations of deep-learning architectures and achieves two to three orders of magnitude speedup in deep neural network inference compared with baseline execution on a host processor.
PXNOR-BNN: In/With Spin-Orbit Torque MRAM Preset-XNOR Operation-Based Binary Neural Networks
TL;DR: Proposes PXNOR-BNN, an NVM-based CIM architecture that employs a Preset-XNOR operation in/with spin–orbit torque magnetic random access memory (SOT-MRAM) to accelerate the computation of BNNs.
A survey of in-spin transfer torque MRAM computing
Hao Cai, Bo Liu, Juntong Chen, Lirida Alves de Barros Naviner, Yongliang Zhou, Zhen Wang, Jun Yang
TL;DR: This study reviews state-of-the-art techniques for managing IMC, with an emphasis on spin-transfer torque MRAM computing via design schemes at the bit-cell, circuit, and system levels, and identifies the existing limitations of in-MRAM computing and potential methods for overcoming them.
DASM: Data-Streaming-Based Computing in Nonvolatile Memory Architecture for Embedded System
TL;DR: DASM, a data-streaming design for NVM-based CIM, achieves a speedup over the NVIDIA Jetson TK1 embedded GPU board, an Intel Xeon E5-2640 CPU, and a state-of-the-art field-programmable gate array (FPGA) design, with much lower power consumption.
Energy-efficient computing-in-memory architecture for AI processor: device, circuit, architecture perspective
Liang Chang, Chenglong Li, Zhang Zhaomin, Jianbiao Xiao, Qingsong Liu, Zhen Zhu, Weihang Li, Zixuan Zhu, Siqi Yang, Jun Zhou
TL;DR: The authors analyze the data-movement demands of AI algorithms and the low-power requirements of AI processors, and present several novel solutions beyond the traditional analog-digital mixed static random access memory (SRAM)-based CIM architecture.
References
ImageNet classification with deep convolutional neural networks
TL;DR: A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.
Proceedings Article
ImageNet Classification with Deep Convolutional Neural Networks
Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton
- 03 Dec 2012
TL;DR: The authors' deep convolutional neural network (DCNN), which achieved state-of-the-art performance, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks
Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, Ali Farhadi
- 08 Oct 2016
TL;DR: The Binary-Weight-Network version of AlexNet is compared with recent network binarization methods, BinaryConnect and BinaryNets, and outperforms them by large margins on ImageNet, by more than 16% in top-1 accuracy.
Posted Content
Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1
TL;DR: The authors write a binary matrix multiplication GPU kernel that runs the MNIST BNN 7 times faster than an unoptimized GPU kernel, without any loss in classification accuracy.
NVSim: A Circuit-Level Performance, Energy, and Area Model for Emerging Nonvolatile Memory
TL;DR: NVSim is a circuit-level model for NVM performance, energy, and area estimation that supports various NVM technologies, including STT-RAM, PCRAM, ReRAM, and legacy NAND Flash, and is expected to help boost architecture-level NVM-related studies.