General-purpose code acceleration with limited-precision analog computation

doi:10.1145/2678373.2665746

Open Access•Journal Article•10.1145/2678373.2665746

Renee St. Amant, +7 more

- 14 Jun 2014

- Vol. 42, Iss: 3, pp 505-516

174

TL;DR: An algorithmic transformation automatically converts approximable regions of code from a von Neumann model to an “analog” neural model, enabling general-purpose use of limited-precision analog hardware to accelerate “approximable” code, that is, code that can tolerate imprecise execution.

Abstract: As improvements in per-transistor speed and energy efficiency diminish, radical departures from conventional approaches are becoming critical to improving the performance and energy efficiency of general-purpose processors. We propose a solution, from circuit to compiler, that enables general-purpose use of limited-precision analog hardware to accelerate "approximable" code, that is, code that can tolerate imprecise execution. We utilize an algorithmic transformation that automatically converts approximable regions of code from a von Neumann model to an "analog" neural model. We outline the challenges of taking an analog approach, including restricted-range value encoding, limited precision in computation, circuit inaccuracies, noise, and constraints on supported topologies. We address these limitations with a combination of circuit techniques, a hardware/software interface, neural-network training techniques, and compiler support. Analog neural acceleration provides whole-application speedup of 3.7x and energy savings of 6.3x with quality loss less than 10% for all except one benchmark. These results show that using limited-precision analog circuits for code acceleration, through a neural approach, is both feasible and beneficial over a range of approximation-tolerant, emerging applications including financial analysis, signal processing, robotics, 3D gaming, compression, and image processing.
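To make two of the challenges named in the abstract concrete (restricted-range value encoding and limited precision in computation), here is a minimal Python sketch of a single sigmoid neuron whose inputs, weights, and output are rounded to a fixed number of bits. It is an illustration under stated assumptions, not the paper's circuit or training method: the helper names `quantize` and `neuron`, the [-1, 1] encoding range, and all input and weight values are hypothetical, and the sketch ignores noise and circuit inaccuracies entirely.

```python
import math

def quantize(x, bits, lo=-1.0, hi=1.0):
    """Clamp x to [lo, hi] and round it to one of 2**bits evenly spaced levels.

    The [lo, hi] range stands in for a restricted-range value encoding
    (an assumption of this sketch, not a value taken from the paper).
    """
    levels = (1 << bits) - 1
    x = min(max(x, lo), hi)
    idx = round((x - lo) / (hi - lo) * levels)
    return lo + (hi - lo) * idx / levels

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias, bits=None):
    """Sigmoid neuron. If bits is given, inputs and weights are quantized
    before the weighted sum, and the output is quantized to [0, 1]."""
    if bits is not None:
        inputs = [quantize(x, bits) for x in inputs]
        weights = [quantize(w, bits) for w in weights]
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    y = sigmoid(z)
    return quantize(y, bits, 0.0, 1.0) if bits is not None else y

# Eight made-up inputs and weights (hypothetical values for illustration).
inputs = [0.12, -0.53, 0.98, 0.31, -0.77, 0.05, 0.44, -0.20]
weights = [0.70, -0.30, 0.25, 0.90, -0.60, 0.10, -0.45, 0.33]

exact = neuron(inputs, weights, bias=0.1)
for bits in (4, 6, 8):
    approx = neuron(inputs, weights, bias=0.1, bits=bits)
    print(f"{bits}-bit absolute error: {abs(approx - exact):.6f}")
```

Running the loop prints the absolute error of the limited-precision neuron against the full-precision one at each bit width, which is the kind of precision-versus-quality trade-off the abstract refers to.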

Figures

Figure 1: Framework for using limited-precision analog computation to accelerate code written in conventional languages.

Figure 2: One neuron and its conceptual analog circuit.

Figure 4: Mixed-signal neural accelerator, A-NPU. Only four of the ANUs are shown. Each ANU processes eight 8-bit inputs.

Figure 5: A-NPU with 8 ANUs vs. D-NPU with 8 PEs.

Figure 8: Application error with limited bit-width analog neural computation.

Citations

Journal Article•10.1145/3007787.3001139

ISAAC: a convolutional neural network accelerator with in-situ analog arithmetic in crossbars

Ali Shafiee, +7 more

- 18 Jun 2016

TL;DR: This work explores an in-situ processing approach, where memristor crossbar arrays not only store input weights, but are also used to perform dot-product operations in an analog manner.

1.9K

Journal Article•10.1145/3007787.3001140

PRIME: a novel processing-in-memory architecture for neural network computation in ReRAM-based main memory

Ping Chi, +7 more

- 18 Jun 2016

TL;DR: This work proposes a novel PIM architecture, called PRIME, to accelerate NN applications in ReRAM-based main memory, and distinguishes itself from prior work on NN acceleration with significant performance improvement and energy savings.

1.5K

Journal Article•10.1145/2893356

A Survey of Techniques for Approximate Computing

Sparsh Mittal

- 18 Mar 2016

- ACM Computing Surveys

TL;DR: A survey of techniques for approximate computing (AC), which discusses strategies for finding approximable program portions and monitoring output quality, techniques for using AC in different processing units, processor components, memory technologies, and so forth, as well as programming frameworks for AC.

1K

Journal Article•10.1109/MDAT.2015.2505723

Approximate Computing: A Survey

Qiang Xu, +2 more

- 01 Feb 2016

- IEEE Design & Test of Computers

TL;DR: This paper presents a survey of state-of-the-art work in all aspects of approximate computing and highlights future research challenges in this field.

556

Journal Article•10.1145/3007787.3001164

RedEye: analog ConvNet image sensor architecture for continuous mobile vision

Robert LiKamWa, +4 more

- 18 Jun 2016

TL;DR: RedEye is designed to mitigate analog design complexity, using a modular column-parallel design to promote physical design reuse and algorithmic cyclic reuse, and programmable mechanisms to admit noise for tunable energy reduction.

238

Related Papers (5)

Managing performance vs. accuracy trade-offs with loop perforation

Green: a framework for supporting energy-conscious programming using controlled approximation

Verifying quantitative reliability for programs that execute on unreliable hardware

SNNAP: Approximate computing on programmable SoCs via neural acceleration

Load Value Approximation