eDRAM

Papers

Journal Article • DOI: 10.1145/3007787.3001139

ISAAC: a convolutional neural network accelerator with in-situ analog arithmetic in crossbars

Ali Shafiee¹, Anirban Nag¹, Naveen Muralimanohar², Rajeev Balasubramonian¹, John Paul Strachan², Miao Hu², R. Stanley Williams², Vivek Srikumar¹

University of Utah¹, Hewlett-Packard²

18 Jun 2016

TL;DR: This work explores an in-situ processing approach, where memristor crossbar arrays not only store input weights, but are also used to perform dot-product operations in an analog manner.

Abstract: A number of recent efforts have attempted to design accelerators for popular machine learning algorithms, such as those involving convolutional and deep neural networks (CNNs and DNNs). These algorithms typically involve a large number of multiply-accumulate (dot-product) operations. A recent project, DaDianNao, adopts a near data processing approach, where a specialized neural functional unit performs all the digital arithmetic operations and receives input weights from adjacent eDRAM banks. This work explores an in-situ processing approach, where memristor crossbar arrays not only store input weights, but are also used to perform dot-product operations in an analog manner. While the use of crossbar memory as an analog dot-product engine is well known, no prior work has designed or characterized a full-fledged accelerator based on crossbars. In particular, our work makes the following contributions: (i) We design a pipelined architecture, with some crossbars dedicated for each neural network layer, and eDRAM buffers that aggregate data between pipeline stages. (ii) We define new data encoding techniques that are amenable to analog computations and that can reduce the high overheads of analog-to-digital conversion (ADC). (iii) We define the many supporting digital components required in an analog CNN accelerator and carry out a design space exploration to identify the best balance of memristor storage/compute, ADCs, and eDRAM storage on a chip. On a suite of CNN and DNN workloads, the proposed ISAAC architecture yields improvements of 14.8×, 5.5×, and 7.5× in throughput, energy, and computational density (respectively), relative to the state-of-the-art DaDianNao architecture.
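
The in-situ dot product described in this abstract can be illustrated with a small idealized model. The sketch below is not the ISAAC design itself: the bit-serial input streaming, the 2-bit cells, the function names, and the perfect (noise-free) ADC are all assumptions chosen for illustration. Each weight is split into small slices stored in separate crossbar columns, each input bit drives the rows for one cycle, the column "current" is the sum of the selected cell values, and a digital shift-and-add recombines the partial results.

```python
def split_weight(w, weight_bits, cell_bits):
    """Split an unsigned weight into cell-sized slices, least significant first."""
    mask = (1 << cell_bits) - 1
    return [(w >> shift) & mask for shift in range(0, weight_bits, cell_bits)]


def crossbar_dot(inputs, weights, input_bits=4, weight_bits=4, cell_bits=2):
    """Idealized bit-sliced crossbar dot product (hypothetical model).

    inputs:  list of unsigned ints, one per crossbar row
    weights: rows x cols matrix of unsigned ints
    Returns one accumulated dot product per logical column.
    """
    n_cols = len(weights[0])
    # sliced[r][c] holds the cell values that encode weights[r][c].
    sliced = [[split_weight(w, weight_bits, cell_bits) for w in row] for row in weights]
    n_slices = len(sliced[0][0])
    out = [0] * n_cols
    for cycle in range(input_bits):                 # stream one input bit per cycle
        bits = [(x >> cycle) & 1 for x in inputs]
        for col in range(n_cols):
            for s in range(n_slices):
                # "Analog" column current: sum of cell values on active rows.
                current = sum(b * sliced[r][col][s] for r, b in enumerate(bits))
                # Digital shift-and-add restores the positional weight.
                out[col] += current << (cycle + s * cell_bits)
    return out


def reference_dot(inputs, weights):
    """Plain digital dot product, used only to check the model."""
    return [sum(x * row[c] for x, row in zip(inputs, weights))
            for c in range(len(weights[0]))]


if __name__ == "__main__":
    x = [3, 5, 7]
    w = [[1, 2], [4, 15], [9, 0]]
    print(crossbar_dot(x, w), reference_dot(x, w))
```

With an ideal ADC the sliced result matches the digital reference exactly; a real design would also have to account for ADC resolution and device noise, which this sketch ignores.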

1,976 citations

Proceedings Article • DOI: 10.1145/1815961.1815973

Reducing cache power with low-cost, multi-bit error-correcting codes

Christopher B. Wilkerson¹, Alaa R. Alameldeen¹, Zeshan A. Chishti¹, Wei Wu¹, Dinesh Somasekhar¹, Shih-Lien Lu¹

Intel¹

19 Jun 2010

TL;DR: The significant impact of variations on refresh time and cache power consumption for large eDRAM caches is shown and Hi-ECC, a technique that incorporates multi-bit error-correcting codes to significantly reduce refresh rate, is proposed.

Abstract: Technology advancements have enabled the integration of large on-die embedded DRAM (eDRAM) caches. eDRAM is significantly denser than traditional SRAMs, but must be periodically refreshed to retain data. Like SRAM, eDRAM is susceptible to device variations, which play a role in determining refresh time for eDRAM cells. Refresh power potentially represents a large fraction of overall system power, particularly during low-power states when the CPU is idle. Future designs need to reduce cache power without incurring the high cost of flushing cache data when entering low-power states. In this paper, we show the significant impact of variations on refresh time and cache power consumption for large eDRAM caches. We propose Hi-ECC, a technique that incorporates multi-bit error-correcting codes to significantly reduce refresh rate. Multi-bit error-correcting codes usually have a complex decoder design and high storage cost. Hi-ECC avoids the decoder complexity by using strong ECC codes to identify and disable sections of the cache with multi-bit failures, while providing efficient single-bit error correction for the common case. Hi-ECC includes additional optimizations that allow us to amortize the storage cost of the code over large data words, providing the benefit of multi-bit correction at the same storage cost as a single-bit error-correcting (SECDED) code (2% overhead). Our proposal achieves a 93% reduction in refresh power vs. a baseline eDRAM cache without error-correcting capability, and a 66% reduction in refresh power vs. a system using SECDED codes.
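
The point about amortizing code storage over large data words can be made concrete with the standard Hamming bound for SECDED codes. The sketch below is only an illustration: it does not model the multi-bit code used by Hi-ECC, and the word sizes (64, 512, and 8192 bits) and function names are assumptions, not values taken from the paper. A single-error-correcting Hamming code over k data bits needs the smallest r with 2^r >= k + r + 1, and SECDED adds one overall parity bit.

```python
def secded_check_bits(data_bits):
    """Check bits for a SECDED code: Hamming bound plus one overall parity bit."""
    r = 1
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r + 1


def secded_overhead(data_bits):
    """Storage overhead of SECDED as a fraction of the protected data size."""
    return secded_check_bits(data_bits) / data_bits


if __name__ == "__main__":
    # Word sizes are illustrative assumptions.
    for k in (64, 512, 8192):
        bits = secded_check_bits(k)
        print(f"{k:5d} data bits: {bits:2d} check bits, {secded_overhead(k):.2%} overhead")
```

Under these assumptions the overhead falls from 12.5% for a 64-bit word to about 2.1% for a 512-bit word, which is in line with the roughly 2% SECDED figure quoted in the abstract if one assumes 64-byte protection granularity (the abstract does not state the word size). Larger words leave room for stronger codes within a similar budget.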

251 citations

Journal Article • DOI: 10.1109/MM.2017.38

Inside 6th-Generation Intel Core: New Microarchitecture Code-Named Skylake

Jack Doweck¹, Wen-fu Kao¹, Allen Kuan-yu Lu¹, Julius Mandelblat¹, Anirudha Rahatekar¹, Lihu Rappoport¹, Efraim Rotem¹, Ahmad Yasin¹, Adi Yoaz¹

Intel¹

01 Mar 2017, IEEE Micro

TL;DR: The Intel Architecture core delivers higher power efficiency, higher frequency, and a wider dynamic power range, supporting smaller form factors, and offers a rich performance monitoring unit that enhances software developers' ability to optimize their applications.

Abstract: Skylake's core, processor graphics, and system on chip were designed to meet a demanding set of requirements for a wide range of power-performance points. Its coherent fabric was designed to provide high-memory bandwidth from multiple memory sources. Skylake's power management, which includes Intel Speed Shift technology, was designed to provide the largest dynamic power range among prior Intel processors. The Intel Architecture core delivers higher power efficiency, higher frequency, and a wider dynamic power range, supporting smaller form factors. Skylake's Gen9 graphics provides new features designed to maximize energy efficiency and bring the best visual experience for gaming and media. Skylake offers a rich performance monitoring unit that enhances software developers' ability to optimize their applications.

173 citations

Patent

eDRAM hierarchical differential sense amp

Richard E. Matick¹, Stanley E. Schuster¹

IBM¹

5 Jan 2007

TL;DR: This patent describes a hierarchical differential sensing approach in which an array of 1T DRAM cells is organized in rows and columns, where the rows represent words and the columns represent bits of the word, and each bit column has more than one pair of balanced, true and complement local bit lines.

Abstract: In an embodiment of the present invention, a hierarchical differential sensing approach is effectuated wherein an array of 1T DRAM cells are organized in rows and columns in which the rows represent words and the columns represent bits of the word, each bit column having more than one pair of balanced, true and complement local bit lines, the local bit lines being connected to a pair of balanced, true and complement global bit lines by way of CMOS transistor switches.

138 citations

Proceedings Article • DOI: 10.1109/ISSCC42613.2021.9365932

16.2 eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded-Dynamic-Memory Array Realizing Adaptive Data Converters and Charge-Domain Computing

Shanshan Xie¹, Can Ni¹, Aseem Sayal¹, Pulkit Jain², Fatih Hamzaoglu², Jaydeep P. Kulkarni¹

University of Texas at Austin¹, Intel²

13 Feb 2021

TL;DR: The intrinsic charge-sharing operation during a dynamic memory access can be used to perform analog CIM computations by reconfiguring existing eDRAM columns as charge-domain circuits, which greatly reduces peripheral circuit area and power overhead.

Abstract: The unprecedented growth in deep neural network (DNN) size has led to massive amounts of data movement from off-chip memory to on-chip processing cores in modern machine learning (ML) accelerators. Compute-in-memory (CIM) designs performing analog DNN computations within a memory array, along with peripheral mixed-signal circuits, are being explored to mitigate this memory-wall bottleneck, which consists of memory latency and energy overhead. Embedded-dynamic random-access memory (eDRAM) [1], [2], which integrates the 1T1C (T=Transistor, C=Capacitor) DRAM bitcell monolithically along with high-performance logic transistors and interconnects, can enable custom CIM designs. It offers the densest embedded bitcell, a low pJ/bit access energy, a low soft error rate, high endurance, high performance, and high bandwidth, all desired attributes for ML accelerators. In addition, the intrinsic charge sharing operation during a dynamic memory access can be used effectively to perform analog CIM computations by reconfiguring existing eDRAM columns as charge domain circuits, thus greatly minimizing peripheral circuit area and power overhead. Configuring a part of eDRAM as a CIM engine (for data conversion, DNN computations, and weight storage) and retaining the remaining part as a regular memory (for inputs, gradients during training, and non-CIM workload data) can help to meet the layer/kernel dependent variable storage needs during a DNN inference/training step. Thus, the high cost/bit of eDRAM can be amortized by repurposing part of existing large capacity, level-4 eDRAM caches [7] in high-end microprocessors, into large-scale CIM engines.
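
The charge-sharing idea mentioned in this abstract rests on charge conservation: when capacitors are connected together, the shared voltage is the total charge divided by the total capacitance. The sketch below is a minimal idealized model of that principle and is not the circuit from the paper; the function names, the equal unit capacitors, the 1-bit input and weight values, and the 1.0 V supply are all assumptions made for illustration.

```python
def charge_share(voltages, capacitances=None):
    """Voltage after ideally shorting capacitors together: V = sum(C*V) / sum(C)."""
    if capacitances is None:
        capacitances = [1.0] * len(voltages)   # assume equal unit capacitors
    total_charge = sum(c * v for c, v in zip(capacitances, voltages))
    return total_charge / sum(capacitances)


def analog_mac(input_bits, weight_bits, vdd=1.0):
    """Hypothetical 1-bit multiply-accumulate in the charge domain.

    Each cell capacitor is charged to vdd when input AND weight are both 1,
    otherwise left at 0 V. Sharing the charge yields a voltage proportional
    to the number of matching ones, i.e. the binary dot product.
    """
    cell_voltages = [vdd if (x & w) else 0.0 for x, w in zip(input_bits, weight_bits)]
    return charge_share(cell_voltages)


if __name__ == "__main__":
    x = [1, 1, 0, 1]
    w = [1, 0, 1, 1]
    v = analog_mac(x, w)
    # Recover the digital dot product from the shared voltage.
    print(v, round(v * len(x)))
```

In this idealized model the output voltage is the dot product scaled by vdd divided by the number of cells, so a data converter would still be needed to read it out; leakage, parasitics, and mismatch are ignored.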

127 citations

Performance Metrics

No. of papers in the topic in previous years
Year	Papers
2021	16
2020	16
2019	18
2018	21
2017	24
2016	20