Uniform memory access

About: Uniform memory access is a research topic. Over its lifetime, 9,589 publications have been published within this topic, receiving 197,804 citations.

Papers

Journal Article•10.1145/3007787.3001140•

PRIME: a novel processing-in-memory architecture for neural network computation in ReRAM-based main memory

Ping Chi¹, Shuangchen Li¹, Cong Xu², Tao Zhang³, Jishen Zhao¹, Yongpan Liu⁴, Yu Wang⁴, Yuan Xie¹

Institutions: University of California¹, Hewlett-Packard², Nvidia³, Tsinghua University⁴

18 Jun 2016

TL;DR: This work proposes PRIME, a novel PIM architecture that accelerates NN applications in ReRAM-based main memory, and reports significant performance improvement and energy savings over prior work on NN acceleration.

Abstract: Processing-in-memory (PIM) is a promising solution to address the "memory wall" challenges for future computer systems. Prior proposed PIM architectures put additional computation logic in or near memory. The emerging metal-oxide resistive random access memory (ReRAM) has shown its potential to be used for main memory. Moreover, with its crossbar array structure, ReRAM can perform matrix-vector multiplication efficiently, and has been widely studied to accelerate neural network (NN) applications. In this work, we propose a novel PIM architecture, called PRIME, to accelerate NN applications in ReRAM-based main memory. In PRIME, a portion of ReRAM crossbar arrays can be configured as accelerators for NN applications or as normal memory for a larger memory space. We provide microarchitecture and circuit designs to enable the morphable functions with an insignificant area overhead. We also design a software/hardware interface for software developers to implement various NNs on PRIME. Benefiting from both the PIM architecture and the efficiency of using ReRAM for NN computation, PRIME distinguishes itself from prior work on NN acceleration, with significant performance improvement and energy saving. Our experimental results show that, compared with a state-of-the-art neural processing unit design, PRIME improves the performance by ~2360× and the energy consumption by ~895×, across the evaluated machine learning benchmarks.

1,500 citations
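
The PRIME abstract above notes that a ReRAM crossbar array can perform matrix-vector multiplication efficiently. The following is a minimal sketch of that idea under an idealized model, not PRIME's actual circuit design: each cell stores a weight as a conductance, input values are applied as voltages on the rows, and each column current is the sum of voltage times conductance along that column. The function name `crossbar_mvm` and the unitless values are assumptions made for illustration, and device non-idealities and precision limits are ignored.

```python
def crossbar_mvm(voltages, conductances):
    """Idealized crossbar: column current j = sum_i voltages[i] * conductances[i][j].

    voltages:     list of row input values (one per wordline)
    conductances: list of rows, each a list of cell conductances (one per bitline)
    """
    rows = len(conductances)
    if rows == 0 or len(voltages) != rows:
        raise ValueError("need one input voltage per crossbar row")
    cols = len(conductances[0])
    if any(len(row) != cols for row in conductances):
        raise ValueError("all crossbar rows must have the same number of columns")
    # Each column accumulates the contributions of every row in one step,
    # which is what makes the analog array a natural matrix-vector multiplier.
    return [
        sum(voltages[i] * conductances[i][j] for i in range(rows))
        for j in range(cols)
    ]


# Hypothetical 2x3 weight array and a 2-element input vector.
weights = [[1.0, 0.0, 2.0],
           [3.0, 1.0, 0.0]]
inputs = [1.0, 2.0]
print(crossbar_mvm(inputs, weights))  # [7.0, 2.0, 2.0]
```

In this model the whole product is read out in a single array access, which is the property the abstract relies on when it describes crossbars as efficient for NN computation.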

Proceedings Article•10.1145/285930.285997•

Memory consistency and event ordering in scalable shared-memory multiprocessors

Kourosh Gharachorloo¹, Daniel E. Lenoski¹, James Laudon¹, Phillip B. Gibbons¹, Anoop Gupta¹, John L. Hennessy¹

Institutions: Stanford University¹

1 May 1990

TL;DR: This paper introduces release consistency, a new memory consistency model that allows more buffering and pipelining than previously proposed models, and shows it to be equivalent to sequential consistency for parallel programs with sufficient synchronization.

Abstract: Scalable shared-memory multiprocessors distribute memory among the processors and use scalable interconnection networks to provide high bandwidth and low latency communication. In addition, memory accesses are cached, buffered, and pipelined to bridge the gap between the slow shared memory and the fast processors. Unless carefully controlled, such architectural optimizations can cause memory accesses to be executed in an order different from what the programmer expects. The set of allowable memory access orderings forms the memory consistency model or event ordering model for an architecture. This paper introduces a new model of memory consistency, called release consistency, that allows for more buffering and pipelining than previously proposed models. A framework for classifying shared accesses and reasoning about event ordering is developed. The release consistency model is shown to be equivalent to the sequential consistency model for parallel programs with sufficient synchronization. Possible performance gains from the less strict constraints of the release consistency model are explored. Finally, practical implementation issues are discussed, concentrating on issues relevant to scalable architectures.

1,275 citations

Proceedings Article•10.1145/339647.339668•

Memory access scheduling

Scott Rixner¹, William J. Dally², Ujval J. Kapasi², Peter Mattson², John D. Owens²

Institutions: Massachusetts Institute of Technology¹, Stanford University²

1 May 2000

TL;DR: This paper introduces memory access scheduling, a technique that improves the performance of a memory system by reordering memory references to exploit locality within the 3-D memory structure.

Abstract: The bandwidth and latency of a memory system are strongly dependent on the manner in which accesses interact with the “3-D” structure of banks, rows, and columns characteristic of contemporary DRAM chips. There is nearly an order of magnitude difference in bandwidth between successive references to different columns within a row and different rows within a bank. This paper introduces memory access scheduling, a technique that improves the performance of a memory system by reordering memory references to exploit locality within the 3-D memory structure. Conservative reordering, in which the first ready reference in a sequence is performed, improves bandwidth by 40% for traces from five media benchmarks. Aggressive reordering, in which operations are scheduled to optimize memory bandwidth, improves bandwidth by 93% for the same set of applications. Memory access scheduling is particularly important for media processors where it enables the processor to make the most efficient use of scarce memory bandwidth.

1,106 citations
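
The reordering idea in the memory access scheduling abstract above can be illustrated with a small simulation. This is a simplified sketch, not the scheduler evaluated in the paper: it assumes a fully serialized memory with no bank-level parallelism, made-up cycle costs (`ROW_HIT_CYCLES`, `ROW_MISS_CYCLES`), and a greedy policy that services the oldest pending reference hitting an already-open row, falling back to the oldest reference otherwise. It also ignores read/write dependences between references to the same address, which a real scheduler must preserve. All names are hypothetical.

```python
from collections import namedtuple

# A memory reference addressed by the "3-D" coordinates of bank, row, and column.
Ref = namedtuple("Ref", ["bank", "row", "col"])

ROW_HIT_CYCLES = 1   # assumed cost: column access to an already-open row
ROW_MISS_CYCLES = 6  # assumed cost: precharge + activate + column access


def service_cycles(order):
    """Total cycles to service references one at a time in the given order."""
    open_row = {}  # bank -> row currently held open in that bank
    cycles = 0
    for ref in order:
        if open_row.get(ref.bank) == ref.row:
            cycles += ROW_HIT_CYCLES
        else:
            cycles += ROW_MISS_CYCLES
            open_row[ref.bank] = ref.row
    return cycles


def row_hit_first(refs):
    """Greedy reordering: oldest pending row hit if any, else oldest pending ref."""
    pending = list(refs)
    open_row = {}
    order = []
    while pending:
        pick = next(
            (r for r in pending if open_row.get(r.bank) == r.row),
            pending[0],
        )
        pending.remove(pick)
        open_row[pick.bank] = pick.row
        order.append(pick)
    return order


# A trace that alternates between two rows of the same bank.
trace = [Ref(0, 1, 0), Ref(0, 2, 0), Ref(0, 1, 1), Ref(0, 2, 1)]
print(service_cycles(trace))                 # 24: every access is a row miss
print(service_cycles(row_hit_first(trace)))  # 14: two misses and two hits
```

Grouping references to the same row turns row misses into row hits, which is the locality within the bank/row/column structure that the abstract describes exploiting.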

Journal Article•10.1167/9.10.7•

The precision of visual working memory is set by allocation of a shared resource.

Paul M. Bays¹, Raquel F. G. Catalao¹, Masud Husain¹

Institutions: UCL Institute of Neurology¹

09 Sep 2009-Journal of Vision

TL;DR: This article shows that performance on a color report task of visual working memory also depends on memory for location, and that when errors in memory are considered for both color and location, performance on the task is well explained by the resource model.

Abstract: The mechanisms underlying visual working memory have recently become controversial. One account proposes a small number of memory "slots," each capable of storing a single visual object with fixed precision. A contrary view holds that working memory is a shared resource, with no upper limit on the number of items stored; instead, the more items that are held in memory, the less precisely each can be recalled. Recent findings from a color report task have been taken as crucial new evidence in favor of the slot model. However, while this task has previously been thought of as a simple test of memory for color, here we show that performance also critically depends on memory for location. When errors in memory are considered for both color and location, performance on this task is in fact well explained by the resource model. These results demonstrate that visual working memory consists of a common resource distributed dynamically across the visual scene, with no need to invoke an upper limit on the number of objects represented.

992 citations

Patent•

Distributed memory switching hub

Stephen R. Haddock, Michael J. Harwood, Darrell R. Scherbarth, Herb Schneider

9 Dec 1997

TL;DR: This patent describes a distributed memory switching hub that interconnects heterogeneous local area networks operating at different transmission speeds to receive, store, and forward frames of data. Its distributed memory architecture removes the need for a central programmable processor or shared common memory to store and forward the frames it receives.

Abstract: A distributed memory switching hub interconnecting heterogeneous local area networks operating at different transmission speeds for receiving, storing and forwarding frames of data. The distributed memory switching hub employs a distributed memory architecture in which memory storage for frames of data received and to be transmitted is located at each low speed LAN port of the distributed memory switching hub. A distributed memory architecture renders unnecessary the need for a central programmable processor or shared common memory to store and forward frames received by the distributed memory switching hub.

982 citations

Performance Metrics

Papers: 9,632

Citations: 131,201

Number of papers in the topic in previous years
Year	Papers
2025	3
2024	2
2023	13
2022	21
2021	3
2020	1
