General-purpose code acceleration with limited-precision analog computation
Renee St. Amant, Amir Yazdanbakhsh, Jongse Park, Bradley Thwaites, Hadi Esmaeilzadeh, Arjang Hassibi, Luis Ceze, Doug Burger
- 14 Jun 2014
- Vol. 42, Iss: 3, pp 505-516
TL;DR: An algorithmic transformation automatically converts approximable regions of code from a von Neumann model to an “analog” neural model, enabling general-purpose use of limited-precision analog hardware to accelerate “approximable” code, that is, code that can tolerate imprecise execution.
Abstract: As improvements in per-transistor speed and energy efficiency diminish, radical departures from conventional approaches are becoming critical to improving the performance and energy efficiency of general-purpose processors. We propose a solution, from circuit to compiler, that enables general-purpose use of limited-precision analog hardware to accelerate "approximable" code, that is, code that can tolerate imprecise execution. We utilize an algorithmic transformation that automatically converts approximable regions of code from a von Neumann model to an "analog" neural model. We outline the challenges of taking an analog approach, including restricted-range value encoding, limited precision in computation, circuit inaccuracies, noise, and constraints on supported topologies. We address these limitations with a combination of circuit techniques, a hardware/software interface, neural network training techniques, and compiler support. Analog neural acceleration provides whole-application speedup of 3.7x and energy savings of 6.3x with quality loss less than 10% for all except one benchmark. These results show that using limited-precision analog circuits for code acceleration, through a neural approach, is both feasible and beneficial over a range of approximation-tolerant, emerging applications including financial analysis, signal processing, robotics, 3D gaming, compression, and image processing.
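To make the "limited precision in computation" challenge concrete, the following is a minimal sketch of a single neuron whose inputs, weights, and output are restricted to a small number of bits, compared against a full-precision neuron. It is an illustration only, not the paper's circuit or training method: the uniform quantizer, the [-1, 1] value range, the sigmoid activation, and the names `quantize`, `limited_precision_neuron`, and `exact_neuron` are all assumptions introduced here. The eight-input, 8-bit configuration mirrors the ANU description in the Figure 4 caption.

```python
import math


def quantize(x, bits, lo=-1.0, hi=1.0):
    """Clamp x to [lo, hi] and snap it to one of 2**bits evenly spaced levels.

    Assumption: a uniform quantizer stands in for restricted-range,
    limited-precision value encoding.
    """
    levels = (1 << bits) - 1
    x = min(max(x, lo), hi)
    k = round((x - lo) / (hi - lo) * levels)  # index of the nearest level
    return lo + (hi - lo) * k / levels


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def exact_neuron(inputs, weights, bias=0.0):
    """Full-precision reference: sigmoid of the weighted sum."""
    return sigmoid(sum(i * w for i, w in zip(inputs, weights)) + bias)


def limited_precision_neuron(inputs, weights, bias=0.0, bits=8):
    """Same neuron, with inputs, weights, and output quantized to `bits` bits."""
    q_in = [quantize(v, bits) for v in inputs]
    q_w = [quantize(w, bits) for w in weights]
    acc = sum(i * w for i, w in zip(q_in, q_w)) + bias
    # The sigmoid output lies in (0, 1), so quantize it over that range.
    return quantize(sigmoid(acc), bits, lo=0.0, hi=1.0)


if __name__ == "__main__":
    # Hypothetical example values; eight inputs as in one ANU.
    xs = [0.12, -0.53, 0.91, 0.07, -0.33, 0.48, -0.76, 0.25]
    ws = [0.41, 0.18, -0.62, 0.95, -0.27, 0.03, 0.54, -0.88]
    reference = exact_neuron(xs, ws)
    for bits in (2, 4, 8):
        approx = limited_precision_neuron(xs, ws, bits=bits)
        print(f"{bits}-bit error: {abs(approx - reference):.5f}")
```

Running the script prints the absolute error of the quantized neuron at a few bit widths, which gives a rough feel for how per-neuron precision limits could feed into application-level error of the kind the Figure 8 caption refers to.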
Figures

Figure 1: Framework for using limited-precision analog computation to accelerate code written in conventional languages.
Figure 2: One neuron and its conceptual analog circuit.
Figure 3: A single analog neuron (ANU).
Figure 4: Mixed-signal neural accelerator, A-NPU. Only four of the ANUs are shown. Each ANU processes eight 8-bit inputs.
Figure 5: A-NPU with 8 ANUs vs. D-NPU with 8 PEs.
Figure 8: Application error with limited bit-width analog neural computation.
Citations
ISAAC: a convolutional neural network accelerator with in-situ analog arithmetic in crossbars
Ali Shafiee, Anirban Nag, Naveen Muralimanohar, Rajeev Balasubramonian, John Paul Strachan, Miao Hu, R. Stanley Williams, Vivek Srikumar
- 18 Jun 2016
TL;DR: This work explores an in-situ processing approach, where memristor crossbar arrays not only store input weights, but are also used to perform dot-product operations in an analog manner.
1.9K
PRIME: a novel processing-in-memory architecture for neural network computation in ReRAM-based main memory
Ping Chi, Shuangchen Li, Cong Xu, Tao Zhang, Jishen Zhao, Yongpan Liu, Yu Wang, Yuan Xie
- 18 Jun 2016
TL;DR: This work proposes a novel PIM architecture, called PRIME, to accelerate NN applications in ReRAM-based main memory, and distinguishes itself from prior work on NN acceleration with significant performance improvement and energy saving.
1.5K
A Survey of Techniques for Approximate Computing
TL;DR: A survey of techniques for approximate computing (AC), which discusses strategies for finding approximable program portions and monitoring output quality, techniques for using AC in different processing units, processor components, memory technologies, and so forth, as well as programming frameworks for AC.
Approximate Computing: A Survey
TL;DR: This paper presents a survey of state-of-the-art work in all aspects of approximate computing and highlights future research challenges in this field.
556
RedEye: analog ConvNet image sensor architecture for continuous mobile vision
Robert LiKamWa, Yunhui Hou, Julian Gao, Mia Polansky, Lin Zhong
- 18 Jun 2016
TL;DR: RedEye is designed to mitigate analog design complexity, using a modular column-parallel design to promote physical design reuse and algorithmic cyclic reuse, and programmable mechanisms to admit noise for tunable energy reduction.
238