A memory optimization technique for software-managed scratchpad memory in GPUs

Proceedings Article • doi:10.1109/SASP.2009.5226334

Maryam Moazeni, +2 more

- 27 Jul 2009

- pp 43-49

32

TL;DR: A memory optimization scheme based on graph coloring is proposed. It minimizes memory usage by identifying opportunities for memory reuse, with the goal of maximizing application performance.
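
To make the idea of graph-coloring-based memory reuse concrete, here is a minimal sketch. It is not the authors' algorithm: it assumes each scratchpad buffer has a known live range (first and last use, as closed step intervals), builds an interference graph, and applies a generic greedy coloring so that buffers with disjoint live ranges share one slot. All names (`build_interference`, `greedy_color`, `slot_sizes`, the buffer names and sizes) are hypothetical.

```python
from typing import Dict, Set, Tuple


def build_interference(live: Dict[str, Tuple[int, int]]) -> Dict[str, Set[str]]:
    """Connect two buffers when their closed live ranges overlap."""
    names = list(live)
    graph: Dict[str, Set[str]] = {n: set() for n in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            (start_a, end_a), (start_b, end_b) = live[a], live[b]
            if start_a <= end_b and start_b <= end_a:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def greedy_color(graph: Dict[str, Set[str]]) -> Dict[str, int]:
    """Greedy coloring, visiting high-degree nodes first (a heuristic, not optimal)."""
    colors: Dict[str, int] = {}
    for node in sorted(graph, key=lambda n: (-len(graph[n]), n)):
        used = {colors[m] for m in graph[node] if m in colors}
        color = 0
        while color in used:
            color += 1
        colors[node] = color
    return colors


def slot_sizes(colors: Dict[str, int], sizes: Dict[str, int]) -> Dict[int, int]:
    """Each color is one shared slot, sized for the largest buffer assigned to it."""
    slots: Dict[int, int] = {}
    for name, color in colors.items():
        slots[color] = max(slots.get(color, 0), sizes[name])
    return slots


# Hypothetical buffers: (first use, last use) in program steps, and sizes in bytes.
live_ranges = {"tile_a": (0, 3), "tile_b": (2, 5), "partial": (4, 7), "out": (6, 9)}
buffer_sizes = {"tile_a": 1024, "tile_b": 1024, "partial": 512, "out": 256}

assignment = greedy_color(build_interference(live_ranges))
slots = slot_sizes(assignment, buffer_sizes)
print("slot assignment:", assignment)
print("bytes without reuse:", sum(buffer_sizes.values()))
print("bytes with reuse:", sum(slots.values()))
```

Sizing each slot by its largest occupant is the simplest possible policy; a real allocator would also have to account for alignment and bank conflicts, which this sketch ignores.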

Citations

•Journal Article•10.1145/3570638

Optimization Techniques for GPU Programming

Pieter Hijma, +4 more

- 14 Nov 2022

- ACM Computing Surveys

TL;DR: This survey discusses optimization techniques found in 450 articles published in the last 14 years and analyzes them from different perspectives, showing that the optimizations are highly interrelated, which explains the need for techniques such as auto-tuning.

54

Proceedings Article•10.1109/NVMSA.2015.7304358

Exploring data placement in racetrack memory based scratchpad memory

Haiyu Mao, +3 more

- 29 Oct 2015

TL;DR: This paper explores data allocation in scratchpad memory (SPM) based on racetrack memory (RM), an emerging NVM with ultra-high storage density and fast access speed, and shows how a genetic algorithm can be used to achieve near-optimal data allocation.

34

Proceedings Article•10.1145/1854273.1854348

An integer programming framework for optimizing shared memory use on GPUs

Wenjing Ma, +1 more

- 11 Sep 2010

TL;DR: A global (intraprocedural) framework is presented that can model structured control flow and is not restricted to a single loop nest. It outperforms a recently published heuristic method, and loop transformations further improve performance for many applications.

33

Journal Article•10.1145/2764905

Architecting the Last-Level Cache for GPUs using STT-RAM Technology

Mohammad Hossein Samavatian, +3 more

- 28 Sep 2015

- ACM Transactions on Design Automation of...

TL;DR: The proposed two-part STT-RAM-based L2 cache exploits a dynamic threshold regulator (DTR) to efficiently regulate the write threshold for migration of the data blocks from HR to LR, based on the behavior of the applications.

15

Proceedings Article•10.1145/1995896.1995900

An execution strategy and optimized runtime support for parallelizing irregular reductions on modern GPUs

Xin Huo, +3 more

- 31 May 2011

TL;DR: This paper describes an execution methodology that addresses the challenges of implementing irregular applications arising from unstructured grids on modern NVIDIA GPUs, together with optimized runtime modules that support the methodology.

15

References

•Book

Approximation Algorithms

Vijay V. Vazirani

- 02 Jul 2001

TL;DR: Covering the basic techniques used in the latest research work, the author consolidates progress made so far, including some very recent and promising results, and conveys the beauty and excitement of work in the field.

4.5K

•Book

MPI: The Complete Reference

Marc Snir, +4 more

- 01 Jan 1996

TL;DR: MPI: The Complete Reference is an annotated manual for the latest 1.1 version of the standard that illuminates the more advanced and subtle features of MPI and covers such advanced issues in parallel computing and programming as true portability, deadlock, high-performance message passing, and libraries for distributed and parallel computing.

2.8K

Proceedings Article•10.1145/1345206.1345220

Optimization principles and application performance evaluation of a multithreaded GPU using CUDA

Shane Ryoo, +5 more

- 20 Feb 2008

TL;DR: This work discusses the GeForce 8800 GTX processor's organization, features, and generalized optimization strategies. It achieves increased performance by reordering accesses to off-chip memory to combine requests to the same or contiguous memory locations, and by applying classical optimizations to reduce the number of executed operations.

1K

Journal Article•10.1145/1360612.1360617

Larrabee: a many-core x86 architecture for visual computing

Larry D. Seiler, +13 more

- 01 Aug 2008

TL;DR: This article consists of a collection of slides from the author's conference presentation, some of the topics discussed include: architecture convergence; Larrabee architecture; and graphics pipeline.

823

Journal Article•10.1109/MM.2009.9

Larrabee: A Many-Core x86 Architecture for Visual Computing

Larry D. Seiler, +13 more

- 01 Jan 2009

- IEEE Micro

TL;DR: The Larrabee many-core visual computing architecture uses multiple in-order x86 cores augmented by wide vector processor units, together with some fixed-function logic, which increases the architecture's programmability as compared to standard GPUs.

722

Related Papers (5)

A GPGPU compiler for memory optimization and parallelism management

Optimization principles and application performance evaluation of a multithreaded GPU using CUDA

ScatterAlloc: Massively parallel dynamic memory allocation for the GPU

CudaDMA: optimizing GPU memory bandwidth via warp specialization

RSVM: a region-based software virtual memory for GPU