Extended memory

Papers published on a yearly basis

Papers

Journal Article•10.1145/3007787.3001140•

PRIME: a novel processing-in-memory architecture for neural network computation in ReRAM-based main memory

Ping Chi¹, Shuangchen Li¹, Cong Xu², Tao Zhang³, Jishen Zhao¹, Yongpan Liu⁴, Yu Wang⁴, Yuan Xie¹

University of California¹, Hewlett-Packard², Nvidia³, Tsinghua University⁴

18 Jun 2016

TL;DR: This work proposes a novel PIM architecture, called PRIME, to accelerate NN applications in ReRAM-based main memory; it distinguishes itself from prior work on NN acceleration with significant performance improvement and energy saving.

Abstract: Processing-in-memory (PIM) is a promising solution to address the "memory wall" challenges for future computer systems. Previously proposed PIM architectures put additional computation logic in or near memory. The emerging metal-oxide resistive random access memory (ReRAM) has shown its potential to be used for main memory. Moreover, with its crossbar array structure, ReRAM can perform matrix-vector multiplication efficiently, and has been widely studied to accelerate neural network (NN) applications. In this work, we propose a novel PIM architecture, called PRIME, to accelerate NN applications in ReRAM-based main memory. In PRIME, a portion of ReRAM crossbar arrays can be configured as accelerators for NN applications or as normal memory for a larger memory space. We provide microarchitecture and circuit designs to enable the morphable functions with an insignificant area overhead. We also design a software/hardware interface for software developers to implement various NNs on PRIME. Benefiting from both the PIM architecture and the efficiency of using ReRAM for NN computation, PRIME distinguishes itself from prior work on NN acceleration, with significant performance improvement and energy saving. Our experimental results show that, compared with a state-of-the-art neural processing unit design, PRIME improves performance by ~2360× and reduces energy consumption by ~895× across the evaluated machine learning benchmarks.

1,500 citations
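
The PRIME abstract relies on the fact that a ReRAM crossbar can compute a matrix-vector product in place: weights are stored as cell conductances, inputs are applied as wordline voltages, and each bitline sums the resulting currents. The following is a minimal numerical sketch of that idea only, not PRIME's microarchitecture; the function name `crossbar_mvm` and the idealized, noise-free cells are assumptions made for illustration.

```python
def crossbar_mvm(conductances, voltages):
    """Idealized crossbar: bitline current j = sum_i voltages[i] * conductances[i][j].

    conductances: list of rows (one row per wordline), each a list of per-bitline values.
    voltages: one input value per wordline.
    Hypothetical helper; real devices add quantization, noise, and sensing circuits.
    """
    rows = len(conductances)
    if rows == 0 or len(voltages) != rows:
        raise ValueError("need one input voltage per crossbar row")
    cols = len(conductances[0])
    # Each bitline accumulates the current contributed by every wordline.
    return [
        sum(voltages[i] * conductances[i][j] for i in range(rows))
        for j in range(cols)
    ]


weights = [[1, 2],
           [3, 4]]
currents = crossbar_mvm(weights, [1, 1])
```

In this sketch, all multiply-accumulate operations for one input vector correspond to a single analog read of the array, which is the property the abstract describes as efficient matrix-vector multiplication.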

Proceedings Article•10.1145/339647.339668•

Memory access scheduling

Scott Rixner¹, William J. Dally², Ujval J. Kapasi², Peter Mattson², John D. Owens²

Massachusetts Institute of Technology¹, Stanford University²

1 May 2000

TL;DR: This paper introduces memory access scheduling, a technique that improves the performance of a memory system by reordering memory references to exploit locality within the 3-D memory structure.

Abstract: The bandwidth and latency of a memory system are strongly dependent on the manner in which accesses interact with the “3-D” structure of banks, rows, and columns characteristic of contemporary DRAM chips. There is nearly an order of magnitude difference in bandwidth between successive references to different columns within a row and different rows within a bank. This paper introduces memory access scheduling, a technique that improves the performance of a memory system by reordering memory references to exploit locality within the 3-D memory structure. Conservative reordering, in which the first ready reference in a sequence is performed, improves bandwidth by 40% for traces from five media benchmarks. Aggressive reordering, in which operations are scheduled to optimize memory bandwidth, improves bandwidth by 93% for the same set of applications. Memory access scheduling is particularly important for media processors where it enables the processor to make the most efficient use of scarce memory bandwidth.

1,106 citations
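
To illustrate why reordering references helps, the sketch below models a single DRAM bank with one open row and compares in-order service against a simplified policy that serves requests to the open row first. It is not one of the paper's scheduling policies: the cycle costs `ROW_HIT_COST` and `ROW_MISS_COST`, the single-bank model, and the unbounded reorder window are all assumptions chosen to keep the example small.

```python
ROW_HIT_COST = 1   # assumed cost of a column access to the already-open row
ROW_MISS_COST = 6  # assumed cost of precharge + activate + column access


def in_order_cost(requests):
    """Serve row requests strictly in arrival order; return total cycles."""
    open_row = None
    total = 0
    for row in requests:
        total += ROW_HIT_COST if row == open_row else ROW_MISS_COST
        open_row = row
    return total


def row_hit_first_cost(requests):
    """Serve the oldest pending request that hits the open row, else the oldest request."""
    pending = list(requests)
    open_row = None
    total = 0
    while pending:
        # Index of the oldest row hit, or 0 (the oldest request) if there is none.
        idx = next((i for i, row in enumerate(pending) if row == open_row), 0)
        row = pending.pop(idx)
        total += ROW_HIT_COST if row == open_row else ROW_MISS_COST
        open_row = row
    return total


trace = [0, 1, 0, 1, 0, 1]  # alternating rows: worst case for in-order service
baseline = in_order_cost(trace)
reordered = row_hit_first_cost(trace)
```

With the alternating trace, in-order service pays a row miss on every access, while the reordered schedule groups accesses to the same row. A practical scheduler would also bound how long a request can be bypassed, which this sketch ignores.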

Journal Article•10.1109/2.485843•

TreadMarks: shared memory computing on networks of workstations

Cristiana Amza¹, Alan L. Cox¹, Sandhya Dwarkadas¹, P. Keleher¹, Honghui Lu¹, Ramakrishnan Rajamony¹, Weimin Yu¹, Willy Zwaenepoel¹

Rice University¹

01 Feb 1996-IEEE Computer

TL;DR: This work discusses the experience with parallel computing on networks of workstations using the TreadMarks distributed shared memory system, which allows processes to assume a globally shared virtual memory even though they execute on nodes that do not physically share memory.

Abstract: Shared memory facilitates the transition from sequential to parallel processing. Since most data structures can be retained, simply adding synchronization achieves correct, efficient programs for many applications. We discuss our experience with parallel computing on networks of workstations using the TreadMarks distributed shared memory system. DSM allows processes to assume a globally shared virtual memory even though they execute on nodes that do not physically share memory. We illustrate a DSM system consisting of N networked workstations, each with its own memory. The DSM software provides the abstraction of a globally shared memory, in which each processor can access any data item without the programmer having to worry about where the data is or how to obtain its value.

951 citations

Patent•

Selective operation of a multi-state non-volatile memory system in a binary mode

Jian Chen¹

SanDisk¹

13 Sep 2002

TL;DR: A flash non-volatile memory system that normally operates its memory cells in multiple storage states is provided with the ability to operate some selected or all of its memory cell blocks in two states instead.

Abstract: A flash non-volatile memory system that normally operates its memory cells in multiple storage states is provided with the ability to operate some selected or all of its memory cell blocks in two states instead. The two states are selected to be the furthest separated of the multiple states, thereby providing an increased margin during two state operation. This allows faster programming and a longer operational life of the memory cells being operated in two states when it is more desirable to have these advantages than the increased density of data storage that multi-state operation provides.

703 citations
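
The margin argument in the patent abstract can be made concrete with a small calculation: if a cell distinguishes several threshold levels, using only the two furthest-separated levels widens the gap between adjacent valid states. The level values below (in millivolts) and both helper names are hypothetical, chosen only to show the arithmetic.

```python
def binary_mode_states(levels):
    """Pick the two furthest-separated levels of a multi-state cell."""
    return (min(levels), max(levels))


def min_margin(states):
    """Smallest gap between adjacent states, i.e. the worst-case read margin."""
    ordered = sorted(states)
    return min(b - a for a, b in zip(ordered, ordered[1:]))


# Assumed threshold levels of a four-state (2 bits per cell) cell, in millivolts.
multi_state_levels = [0, 1000, 2000, 3000]

binary_states = binary_mode_states(multi_state_levels)
multi_margin = min_margin(multi_state_levels)
binary_margin = min_margin(binary_states)
```

Under these assumed levels the binary-mode margin is three times the multi-state margin, at the cost of storing one bit per cell instead of two.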

Journal Article•10.1109/TPDS.2004.8•

Hazard pointers: safe memory reclamation for lock-free objects

Maged M. Michael¹

IBM¹

01 Jun 2004-IEEE Transactions on Parallel and Distributed Systems

TL;DR: This paper presents hazard pointers, a memory management methodology that allows memory reclamation for arbitrary reuse and offers a lock-free solution for the ABA problem using only practical single-word instructions, while guaranteeing continuous progress and availability even in the presence of thread failures and arbitrary delays.

Abstract: Lock-free objects offer significant performance and reliability advantages over conventional lock-based objects. However, the lack of an efficient portable lock-free method for the reclamation of the memory occupied by dynamic nodes removed from such objects is a major obstacle to their wide use in practice. We present hazard pointers, a memory management methodology that allows memory reclamation for arbitrary reuse. It is very efficient, as demonstrated by our experimental results. It is suitable for user-level applications - as well as system programs - without dependence on special kernel or scheduler support. It is wait-free. It requires only single-word reads and writes for memory access in its core operations. It allows reclaimed memory to be returned to the operating system. In addition, it offers a lock-free solution for the ABA problem using only practical single-word instructions. Our experimental results on a multiprocessor system show that the new methodology offers equal and, more often, significantly better performance than other memory management methods, in addition to its qualitative advantages regarding memory reclamation and independence of special hardware support. We also show that lock-free implementations of important object types, using hazard pointers, offer comparable performance to that of efficient lock-based implementations under no contention and no multiprogramming, and outperform them by significant margins under moderate multiprogramming and/or contention, in addition to guaranteeing continuous progress and availability, even in the presence of thread failures and arbitrary delays.

633 citations
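
The core bookkeeping behind hazard pointers can be sketched as follows: a reader publishes the node it is about to access in a hazard slot, a removed node is placed on a retired list, and a scan frees only those retired nodes that no slot currently names. This is a single-threaded illustration of that rule, not the paper's algorithm; the class `HazardDomain`, its method names, the use of string labels as nodes, and the single shared retired list are all assumptions (a real implementation uses per-thread lists and atomic single-word reads and writes).

```python
class HazardDomain:
    """Single-threaded model of hazard-pointer reclamation bookkeeping (hypothetical API)."""

    def __init__(self, num_slots, scan_threshold):
        self.slots = [None] * num_slots   # published hazard pointers
        self.retired = []                 # removed nodes awaiting reclamation
        self.freed = []                   # nodes that were safely reclaimed
        self.scan_threshold = scan_threshold

    def protect(self, slot, node):
        """Announce that `node` is in use and must not be reclaimed."""
        self.slots[slot] = node

    def clear(self, slot):
        """Withdraw the announcement held in `slot`."""
        self.slots[slot] = None

    def retire(self, node):
        """Record a removed node; scan once enough nodes have accumulated."""
        self.retired.append(node)
        if len(self.retired) >= self.scan_threshold:
            self.scan()

    def scan(self):
        """Free every retired node that no hazard slot currently names."""
        hazards = {node for node in self.slots if node is not None}
        still_retired = []
        for node in self.retired:
            if node in hazards:
                still_retired.append(node)  # still protected: keep for a later scan
            else:
                self.freed.append(node)     # unprotected: safe to reclaim
        self.retired = still_retired


domain = HazardDomain(num_slots=2, scan_threshold=2)
domain.protect(0, "a")   # a reader is using node "a"
domain.retire("a")       # "a" is removed from the structure but still protected
domain.retire("b")       # threshold reached: scan frees "b" and keeps "a"
```

After the reader calls `domain.clear(0)`, a later `domain.scan()` reclaims the remaining node. Deferring reclamation until no slot names a node is what prevents a freed node from being accessed or reused while a reader still holds it.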

Year	Papers
2024	1
2023	7
2022	14
2021	5
2020	8
2019	8
