Performance Modeling for Optimal Data Placement on GPU with Heterogeneous Memory Systems

doi:10.1109/CLUSTER.2017.42

Proceedings Article

Yingchao Huang, +1 more

- 01 Sep 2017

- pp 166-177

Cited by 14

TL;DR: This paper introduces performance modeling techniques that predict the performance of different data placements on a GPU, including a series of techniques for modeling the critical performance factors that cause performance to vary across data placements.

Citations

Proceedings Article • doi:10.1109/PMBS.2018.8641666

Is Data Placement Optimization Still Relevant on Newer GPUs

Abdullah Shahneous Bari, +5 more

- 01 Nov 2018

TL;DR: The authors design a set of experiments to explore the relevance of data placement optimizations on several generations of NVIDIA GPUs (Kepler, Maxwell, Pascal, and Volta) and show that newer GPU generations are less sensitive to data placement optimization than older ones.

Cited by 12

Proceedings Article • doi:10.1145/3572848.3577497

Merchandiser: Data Placement on Heterogeneous Memory for Task-Parallel HPC Applications with Load-Balance Awareness

Zheng Xie, +3 more

- 25 Feb 2023

TL;DR: This article proposes Merchandiser, a load-balance-aware page management system that addresses load imbalance among tasks in task-parallel HPC applications.

Cited by 6

Journal Article • doi:10.1007/S11227-020-03452-2

Automatic translation of data parallel programs for heterogeneous parallelism through OpenMP offloading

Farui Wang, +5 more

- 01 May 2021

- The Journal of Supercomputing

TL;DR: This paper presents OAO, a compiler-based approach that automatically translates shared-memory OpenMP data-parallel programs to run on heterogeneous multicores through OpenMP offloading directives, letting programmers keep using a familiar single-source programming language while benefiting from heterogeneous performance.

Cited by 5

Journal Article • doi:10.1109/access.2022.3196008

XUnified: A Framework for Guiding Optimal Use of GPU Unified Memory

- 01 Jan 2022

- IEEE Access

TL;DR: XUnified is an advice controller that combines offline training with online adaptation to guide the optimal use of unified memory and discrete memory for various applications at run time.

Cited by 4

Proceedings Article • doi:10.1145/3357526.3357559

3D photonics as enabling technology for deep 3D DRAM stacking

Sebastian Werner, +6 more

- 30 Sep 2019

TL;DR: This paper proposes a hierarchical approach to stacking 3D DRAM to tens of layers using sub-stacks that are optically interconnected to a memory interface on the processor die, and shows that photonics could be a key enabler for deep 3D DRAM, offering at least 2× interconnect area savings over TSVs at the same bandwidth, with comparable performance and lower power.

Cited by 3

References

Proceedings Article • doi:10.1109/ISPASS.2009.4919648

Analyzing CUDA workloads using a detailed GPU simulator

Ali Bakhoda, +4 more

- 26 Apr 2009

TL;DR: This paper evaluates the performance of non-graphics applications written in NVIDIA's CUDA programming model on a microarchitecture performance simulator that runs NVIDIA's Parallel Thread Execution (PTX) virtual instruction set.

Cited by 1.8K

Proceedings Article • doi:10.1145/1555754.1555760

Scalable high performance main memory system using phase-change memory technology

Moinuddin K. Qureshi, +2 more

- 20 Jun 2009

TL;DR: This paper analyzes a PCM-based hybrid main memory system using an architecture-level model of PCM and proposes simple organizational and management solutions for the hybrid memory that reduce write traffic to PCM, boosting its lifetime from 3 years to 9.7 years.

Cited by 1.5K

Journal Article

Modern Information Retrieval: A Brief Overview

Amit Singhal

- 01 Jan 2001

- IEEE Data Engineering Bulletin

TL;DR: This article gives a brief overview of the key advances in the field of information retrieval and describes where the state of the art stands in the field.

Cited by 1.5K

Proceedings Article • doi:10.1145/339647.339668

Memory access scheduling

Scott Rixner, +4 more

- 01 May 2000

TL;DR: This paper introduces memory access scheduling, a technique that improves the performance of a memory system by reordering memory references to exploit locality within the 3-D memory structure.

Cited by 1.1K

Proceedings Article • doi:10.1145/1345206.1345220

Optimization principles and application performance evaluation of a multithreaded GPU using CUDA

Shane Ryoo, +5 more

- 20 Feb 2008

TL;DR: This work discusses the GeForce 8800 GTX processor's organization, features, and generalized optimization strategies, and achieves higher performance by reordering accesses to off-chip memory to combine requests to the same or contiguous memory locations and by applying classical optimizations to reduce the number of executed operations.

Cited by 1K