Average memory access time

Papers published on a yearly basis

Papers

Journal Article•10.1145/356887.356892•

Cache Memories

Alan Jay Smith¹•Institutions (1)

University of California, Berkeley¹

01 Sep 1982-ACM Computing Surveys

TL;DR: Specific aspects of cache memories investigated include: the cache fetch algorithm (demand versus prefetch), the placement and replacement algorithms, line size, store-through versus copy-back updating of main memory, cold-start versus warm-start miss ratios, multicache consistency, the effect of input/output through the cache, the behavior of split data/instruction caches, and cache size.

Abstract: design issues. Specific aspects of cache memories that are investigated include: the cache fetch algorithm (demand versus prefetch), the placement and replacement algorithms, line size, store-through versus copy-back updating of main memory, cold-start versus warm-start miss ratios, multicache consistency, the effect of input/output through the cache, the behavior of split data/instruction caches, and cache size. Our discussion includes other aspects of memory system architecture, including translation lookaside buffers. Throughout the paper, we use as examples the implementation of the cache in the Amdahl 470V/6 and 470V/7, the IBM 3081, 3033, and 370/168, and the DEC VAX 11/780. An extensive bibliography is provided.

1,675 citations

Proceedings Article•10.1145/3445814.3446713•

Rethinking software runtimes for disaggregated memory

Irina Calciu¹, M. Talha Imran², Ivan Puddu³, Sanidhya Kashyap⁴, Hasan Al Maruf⁵, Onur Mutlu³, Aasheesh Kolli²•Institutions (5)

VMware¹, Pennsylvania State University², ETH Zurich³, École Polytechnique Fédérale de Lausanne⁴, University of Michigan⁵

19 Apr 2021

TL;DR: In this paper, cache coherence is used instead of virtual memory for tracking applications' memory accesses transparently, at cache-line granularity, eliminating page faults from the application critical path when accessing remote data, and decoupling the application memory access tracking from the virtual memory page size.

Abstract: Disaggregated memory can address resource provisioning inefficiencies in current datacenters. Multiple software runtimes for disaggregated memory have been proposed in an attempt to make disaggregated memory practical. These systems rely on the virtual memory subsystem to transparently offer disaggregated memory to applications using a local memory abstraction. Unfortunately, using virtual memory for disaggregation has multiple limitations, including high overhead that comes from the use of page faults to identify what data to fetch and cache locally, and high dirty data amplification that comes from the use of page-granularity for tracking changes to the cached data (4KB or higher). In this paper, we propose a fundamentally new approach to designing software runtimes for disaggregated memory that addresses these limitations. Our main observation is that we can use cache coherence instead of virtual memory for tracking applications' memory accesses transparently, at cache-line granularity. This simple idea (1) eliminates page faults from the application critical path when accessing remote data, and (2) decouples the application memory access tracking from the virtual memory page size, enabling cache-line granularity dirty data tracking and eviction. Using this observation, we implemented a new software runtime for disaggregated memory that improves average memory access time by 1.7-5X and reduces dirty data amplification by 2-10X, compared to state-of-the-art systems.
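
The dirty data amplification described in this abstract can be illustrated with a small calculation. The sketch below is not the paper's runtime; it only counts how many bytes would be written back when dirtiness is tracked per 4 KB page versus per 64-byte cache line. The function names, the 64-byte line size, and the workload are assumptions made for illustration, and the resulting ratios are not the paper's measurements.

```python
def dirty_bytes(written_addresses, granule):
    """Bytes written back if dirtiness is tracked in units of `granule` bytes."""
    dirty_units = {addr // granule for addr in written_addresses}
    return len(dirty_units) * granule


def amplification(written_addresses, granule):
    """Ratio of bytes written back to bytes the application actually modified."""
    modified = len(set(written_addresses))
    return dirty_bytes(written_addresses, granule) / modified


# Hypothetical workload: one 8-byte store at the start of 16 different pages.
writes = [page * 4096 + offset for page in range(16) for offset in range(8)]

page_amp = amplification(writes, 4096)  # tracking at page granularity
line_amp = amplification(writes, 64)    # tracking at cache-line granularity
print(page_amp, line_amp)
```

For this sparse write pattern, page-granularity tracking writes back far more data per modified byte than cache-line tracking, which is the effect the abstract attributes to page-based runtimes.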

116 citations

Journal Article•10.1109/TC.1987.5009496•

Vector Access Performance in Parallel Memories Using a Skewed Storage Scheme

David T. Harper¹, J. Robert Jump²•Institutions (2)

University of Texas at Dallas¹, Rice University²

01 Dec 1987-IEEE Transactions on Computers

TL;DR: The skewing scheme evaluated here does not eliminate all memory conflicts but it does improve the average performance of vector access over interleaved systems for a wide range of strides.

Abstract: The degree to which high-speed vector processors approach their peak performance levels is closely tied to the amount of interference they encounter while accessing vectors in memory. In this paper we present an evaluation of a storage scheme that reduces the average memory access time in a vector-oriented architecture. A skewing scheme is used to map vector components into parallel memory modules such that, for most vector access patterns, the number of memory conflicts is reduced over that observed in interleaved parallel memory systems. Address and data buffers are used locally in each module so that transient nonuniformities which occur in some access patterns do not degrade performance. Previous investigations into skewing techniques have attempted to provide conflict-free access for a limited subset of access patterns. The goal of this investigation is different. The skewing scheme evaluated here does not eliminate all memory conflicts but it does improve the average performance of vector access over interleaved systems for a wide range of strides. It is shown that little extra hardware is required to implement the skewing scheme. Also, far fewer restrictions are placed on the number of memory modules in the system than are present in other proposed schemes.
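
To make the contrast between interleaved and skewed storage concrete, here is a minimal sketch that counts how many elements of a strided vector access land on the busiest memory module. The skew used here is a generic row-rotation mapping chosen for illustration; it is an assumption and not necessarily the scheme evaluated in the paper. The module count of 8 and all function names are likewise hypothetical.

```python
from collections import Counter

MODULES = 8  # assumed number of parallel memory modules


def interleaved(addr):
    """Plain low-order interleaving: consecutive words go to consecutive modules."""
    return addr % MODULES


def skewed(addr):
    """Row-rotation skew: each row of MODULES words is shifted one module further."""
    return (addr + addr // MODULES) % MODULES


def busiest_module_load(mapping, stride, length=MODULES):
    """Number of vector elements that fall on the most heavily used module."""
    addresses = [i * stride for i in range(length)]
    return max(Counter(mapping(a) for a in addresses).values())


for stride in (1, 7, 8):
    print(stride,
          busiest_module_load(interleaved, stride),
          busiest_module_load(skewed, stride))
```

With a stride of 8, interleaving sends all eight elements to one module, while the skewed mapping spreads them over all eight. This toy skew is not conflict-free either: with a stride of 7 it maps seven of the eight elements to one module, which fits the abstract's point that skewing aims at better average behavior across strides rather than eliminating every conflict.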

89 citations

Journal Article•10.1145/2504905•

In-network monitoring and control policy for DVFS of CMP networks-on-chip and last level caches

Xi Chen¹, Zheng Xu¹, Hyungjun Kim¹, Paul V. Gratz¹, Jiang Hu¹, Michael Kishinevsky², Umit Y. Ogras²•Institutions (2)

Texas A&M University¹, Intel²

25 Oct 2013-ACM Transactions on Design Automation of Electronic Systems

TL;DR: This work considers a practical system architecture where the distributed LLC and the NoC share a voltage/frequency domain that is separate from the core domain, and proposes an average memory access time (AMAT)-based monitoring technique integrated with DVFS based on PID control theory.

Abstract: In chip design today and for the foreseeable future, the last-level cache and on-chip interconnect are not only performance critical but also substantial power consumers. This work focuses on employing dynamic voltage and frequency scaling (DVFS) policies for networks-on-chip (NoC) and shared, distributed last-level caches (LLC). In particular, we consider a practical system architecture where the distributed LLC and the NoC share a voltage/frequency domain that is separate from the core domain. This architecture enables the control of the relative speed between the cores and memory hierarchy without introducing synchronization delays within the NoC. DVFS for this architecture is more complex than individual link/core-based DVFS since it involves spatially distributed monitoring and control. We propose an average memory access time (AMAT)-based monitoring technique and integrate it with DVFS based on PID control theory. Simulations on PARSEC benchmarks yield 27% energy savings with a negligible impact on system performance.
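
The idea of steering a frequency setting from a measured AMAT with a PID controller can be sketched as follows. This is a textbook discrete PID loop under stated assumptions, not the paper's controller: the gains, the frequency limits in GHz, the normalized error, and the names `PID` and `next_frequency` are all hypothetical.

```python
class PID:
    """Textbook discrete PID controller; the gains are illustrative assumptions."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error):
        self.integral += error
        derivative = error - self.prev_error
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def next_frequency(freq, measured_amat, target_amat, pid, f_min=1.0, f_max=2.0):
    """Raise the NoC/LLC frequency when AMAT exceeds the target, lower it otherwise.

    The frequency limits (in GHz) are hypothetical values for this sketch.
    """
    error = (measured_amat - target_amat) / target_amat  # positive: memory too slow
    return min(f_max, max(f_min, freq + pid.step(error)))


controller = PID(kp=0.5, ki=0.0, kd=0.0)
new_freq = next_frequency(1.5, measured_amat=44.0, target_amat=40.0, pid=controller)
print(new_freq)
```

A single monitored value such as AMAT keeps the control input simple; the clamp to `f_min` and `f_max` stands in for the discrete voltage/frequency levels a real domain would offer.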

62 citations

Proceedings Article•10.1109/ECRTS.2006.33•

Worst case timing analysis of input dependent data cache behavior

Jan Staschulat¹, Rolf Ernst¹•Institutions (1)

Braunschweig University of Technology¹

5 Jul 2006

TL;DR: A worst-case timing analysis for direct-mapped data caches is proposed that classifies memory accesses as predictable or unpredictable, together with a novel analysis framework that tightly bounds the impact of unpredictable memory accesses on the existing cache contents as well as the cache behavior of the unpredictable accesses themselves.

Abstract: Data caches significantly reduce the average memory access time and are necessary for an efficient design. Because of their direct dependency on input data, it is difficult to predict their worst-case timing behavior, which is crucial for a reliable system. While simulation is too time-consuming, current worst-case execution time approaches focus on instruction caches only. Current approaches to data cache analysis restrict cache behavior to predictable data accesses or classify input-dependent memory accesses as non-cacheable. In this paper we propose a worst-case timing analysis for direct-mapped data caches that classifies memory accesses as predictable or unpredictable. For unpredictable memory accesses, a novel analysis framework is proposed that tightly bounds the impact on the existing cache contents as well as the cache behavior of the unpredictable memory accesses themselves. For predictable memory accesses, we use a local cache simulation and dataflow techniques. Furthermore, we describe an implementation of the analysis framework. Several experiments demonstrate its applicability. The approach targets real-time software verification but is also useful for design space exploration.
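
As a small illustration of how a direct-mapped data cache affects average memory access time, the sketch below simulates a known address trace and applies the standard relation AMAT = hit time + miss rate × miss penalty. It is a plain trace simulation, not the worst-case static analysis the paper proposes. The cache geometry, the trace, and the latencies (1 and 100 cycles) are assumptions chosen for illustration.

```python
def simulate_direct_mapped(addresses, num_lines=4, line_size=16):
    """Return (hits, misses) for a byte-address trace on a direct-mapped cache."""
    tags = [None] * num_lines  # one stored tag per cache line; None means empty
    hits = 0
    for addr in addresses:
        block = addr // line_size   # memory block containing this byte
        index = block % num_lines   # the single line this block can occupy
        tag = block // num_lines    # identifies which block is in that line
        if tags[index] == tag:
            hits += 1
        else:
            tags[index] = tag       # miss: load the block, evicting the old one
    return hits, len(addresses) - hits


def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time in the same unit as the inputs (e.g. cycles)."""
    return hit_time + miss_rate * miss_penalty


trace = [0, 4, 8, 64, 0, 4]  # address 64 maps to the same line as address 0
hits, misses = simulate_direct_mapped(trace)
average = amat(hit_time=1, miss_rate=misses / len(trace), miss_penalty=100)
print(hits, misses, average)
```

The access to address 64 evicts the line holding address 0, so the following access to 0 misses again. Conflicts of this kind, when the address depends on input data, are what make worst-case data cache behavior hard to bound.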

44 citations

Year	Papers
2021	6
2020	3
2019	4
2018	3
2017	5
2016	6

Related Topics (5)

Performance Metrics