Proceedings Article | DOI: 10.1109/SASP.2009.5226334
A memory optimization technique for software-managed scratchpad memory in GPUs
Maryam Moazeni, Alex A. T. Bui, Majid Sarrafzadeh +2 more
- 27 Jul 2009
- pp 43-49
32
TL;DR: A graph-coloring-based memory optimization scheme is proposed that minimizes memory usage by discovering opportunities for memory reuse, with the goal of maximizing application performance.
Abstract: With the appearance of massively parallel and inexpensive platforms such as the G80 generation of NVIDIA GPUs, more real-life applications will be designed for or ported to these platforms. This requires structured transformation methods that remove existing application bottlenecks on these platforms. Balancing the usage of on-chip resources, which are used to improve application performance, is often non-intuitive on these platforms, and some applications will run into resource limits. In this paper, we present a memory optimization technique for the software-managed scratchpad memory in the G80 architecture to alleviate the constraints of using the scratchpad memory. We propose a memory optimization scheme that minimizes memory usage by discovering opportunities for memory reuse, with the goal of maximizing application performance. Our solution is based on graph coloring. We evaluated our memory optimization scheme with a set of experiments on an image processing benchmark suite in the medical imaging domain using an NVIDIA Quadro FX 5600 and CUDA. Implementations based on our proposed memory optimization scheme showed up to a 37% decrease in execution time compared to their naive GPU implementations.
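The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of graph-coloring-based buffer reuse, not the authors' method. It assumes hypothetical buffer names, live ranges (first and last program step at which a buffer is used), and sizes in bytes. Buffers whose live ranges overlap interfere and must occupy different scratchpad slots; buffers with the same color can share one slot.

```python
from typing import Dict, Set, Tuple

# Hypothetical inputs (not from the paper): live range per buffer as
# (first_use, last_use) in program steps, and buffer size in bytes.
ranges: Dict[str, Tuple[int, int]] = {
    "tile_in": (0, 2),
    "tmp": (1, 3),
    "tile_out": (3, 5),
    "hist": (4, 6),
}
sizes: Dict[str, int] = {"tile_in": 4096, "tmp": 2048, "tile_out": 4096, "hist": 1024}


def build_interference(live: Dict[str, Tuple[int, int]]) -> Dict[str, Set[str]]:
    """Connect two buffers when their live ranges overlap."""
    graph: Dict[str, Set[str]] = {name: set() for name in live}
    names = list(live)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            (s1, e1), (s2, e2) = live[a], live[b]
            if s1 <= e2 and s2 <= e1:  # closed intervals intersect
                graph[a].add(b)
                graph[b].add(a)
    return graph


def greedy_color(graph: Dict[str, Set[str]]) -> Dict[str, int]:
    """Greedy coloring, highest-degree nodes first (a heuristic, not optimal in general)."""
    colors: Dict[str, int] = {}
    for node in sorted(graph, key=lambda n: (-len(graph[n]), n)):
        used = {colors[m] for m in graph[node] if m in colors}
        color = 0
        while color in used:
            color += 1
        colors[node] = color
    return colors


def slot_sizes(colors: Dict[str, int], size: Dict[str, int]) -> Dict[int, int]:
    """Each color becomes one shared slot, sized for its largest buffer."""
    slots: Dict[int, int] = {}
    for name, color in colors.items():
        slots[color] = max(slots.get(color, 0), size[name])
    return slots


graph = build_interference(ranges)
colors = greedy_color(graph)
slots = slot_sizes(colors, sizes)
print("naive bytes:", sum(sizes.values()), "with reuse:", sum(slots.values()))
```

In this toy input the four buffers fit into two slots, so the footprint drops from 11264 bytes to 6144 bytes. On a real GPU the saved scratchpad space could allow more thread blocks to be resident at once, which is the kind of effect the abstract attributes to reduced memory usage.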
Citations
Optimization Techniques for GPU Programming
TL;DR: This survey discusses optimization techniques found in 450 articles published in the last 14 years and analyzes them from different perspectives, showing that the various optimizations are highly interrelated, which explains the need for techniques such as auto-tuning.
54
Exploring data placement in racetrack memory based scratchpad memory
Haiyu Mao, Chao Zhang, Guangyu Sun, Jiwu Shu +3 more
- 29 Oct 2015
TL;DR: This paper explored data allocation in SPM based on racetrack memory (RM), an emerging NVM with ultra-high storage density and fast access speed, and addressed how to leverage a genetic algorithm to achieve near-optimal data allocation.
An integer programming framework for optimizing shared memory use on GPUs
Wenjing Ma, Gagan Agrawal +1 more
- 11 Sep 2010
TL;DR: A global (intraprocedural) framework is presented that can model structured control flow and is not restricted to a single loop nest; it outperforms a recently published heuristic method, and loop transformations also improve performance for many applications.
33
Architecting the Last-Level Cache for GPUs using STT-RAM Technology
TL;DR: The proposed two-part STT-RAM-based L2 cache exploits a dynamic threshold regulator (DTR) to efficiently regulate the write threshold for migration of the data blocks from HR to LR, based on the behavior of the applications.
15
An execution strategy and optimized runtime support for parallelizing irregular reductions on modern GPUs
Xin Huo, Vignesh T. Ravi, Wenjing Ma, Gagan Agrawal +3 more
- 31 May 2011
TL;DR: This paper describes an execution methodology that addresses the challenges of implementing irregular applications arising from unstructured grids on modern NVIDIA GPUs, and develops optimized runtime modules to support that methodology.
15
References
•Book
Approximation Algorithms
Vijay V. Vazirani
- 02 Jul 2001
TL;DR: Covering the basic techniques used in the latest research work, the author consolidates progress made so far, including some very recent and promising results, and conveys the beauty and excitement of work in the field.
4.5K
•Book
MPI: The Complete Reference
Marc Snir, Steve W. Otto, David W. Walker, Jack Dongarra, Steven Huss-Lederman +4 more
- 01 Jan 1996
TL;DR: MPI: The Complete Reference is an annotated manual for the latest 1.1 version of the standard that illuminates the more advanced and subtle features of MPI and covers such advanced issues in parallel computing and programming as true portability, deadlock, high-performance message passing, and libraries for distributed and parallel computing.
2.8K
Optimization principles and application performance evaluation of a multithreaded GPU using CUDA
Shane Ryoo, Christopher I. Rodrigues, Sara S. Baghsorkhi, Sam S. Stone, David B. Kirk, Wen-mei W. Hwu +5 more
- 20 Feb 2008
TL;DR: This work discusses the GeForce 8800 GTX processor's organization, features, and generalized optimization strategies, and achieves increased performance by reordering accesses to off-chip memory to combine requests to the same or contiguous memory locations and by applying classical optimizations to reduce the number of executed operations.
Larrabee: a many-core x86 architecture for visual computing
Larry D. Seiler, Doug Carmean, Eric Sprangle, Tom Forsyth, Michael Abrash, Pradeep Dubey, Stephen Junkins, Adam T. Lake, Jeremy Sugerman, Robert Dale Cavin, Roger Espasa, Ed Grochowski, Toni Juan, Pat Hanrahan +13 more
- 01 Aug 2008
TL;DR: This article consists of a collection of slides from the author's conference presentation; topics discussed include architecture convergence, the Larrabee architecture, and the graphics pipeline.
Larrabee: A Many-Core x86 Architecture for Visual Computing
Larry D. Seiler, Douglas M. Carmean, Eric Sprangle, Tom Forsyth, Pradeep Dubey, Stephen Junkins, Adam T. Lake, Robert Dale Cavin, Roger Espasa, Edward T. Grochowski, Toni Juan, Michael Abrash, Jeremy Sugerman, Pat Hanrahan +13 more
TL;DR: The Larrabee many-core visual computing architecture uses multiple in-order x86 cores augmented by wide vector processor units, together with some fixed-function logic, which increases the architecture's programmability as compared to standard GPUs.
722