Optimization Techniques for GPU Programming

doi:10.1145/3570638

Open Access•Journal Article•10.1145/3570638

Pieter Hijma, +4 more

- 14 Nov 2022

- ACM Computing Surveys

- Vol. 55, Iss: 11, pp 1-81

54

TL;DR: This survey discusses optimization techniques found in 450 articles published over the last 14 years and analyzes them from different perspectives, showing that the optimizations are highly interrelated, which explains the need for techniques such as auto-tuning.

Citations

Journal Article•10.1007/s00607-023-01255-w

Many-BSP: an analytical performance model for CUDA kernels

Ali Riahi, +2 more

- 26 Feb 2024

- Computing

Journal Article•10.1145/3624309.3624314

GPotion: An embedded DSL for GPU programming in Elixir

André Rauber Du Bois, +1 more

- 25 Sep 2023

TL;DR: GPotion is an embedded DSL for GPU programming in Elixir that allows writing low-level kernels and high-level facilities with little overhead.

Journal Article•10.3390/app13148207

Area Division Using Affinity Propagation for Multi-Robot Coverage Path Planning

Nikolaos Baras, +1 more

- 14 Jul 2023

- Applied Sciences

TL;DR: This article introduces a methodology that employs affinity propagation (AP) for area allocation in multi-robot coverage path planning (CPP): the area is partitioned into n clusters through AP, and each cluster is then assigned to a robot.

Preprint•10.2139/ssrn.4790635

GSParLib: A Multi-Level Programming Interface Unifying OpenCL and CUDA for Expressing Stream and Data Parallelism

Dinei A. Rockenbach, +3 more

- 01 Jan 2024

TL;DR: GSParLib is a programming interface that unifies OpenCL and CUDA, enabling the expression of stream and data parallelism across multiple platforms.

Journal Article•10.1145/3656411

Descend: A Safe GPU Systems Programming Language

Bastian Köpcke, +2 more

- 20 Jun 2024

- Proceedings of the ACM on programming la...

TL;DR: Descend is a safe GPU systems programming language that enforces safe CPU and GPU memory management in the type system, preventing data races and deadlocks.

References

Journal Article•10.1038/NATURE14539

Deep learning

Yann LeCun, +4 more

- 28 May 2015

- Nature

TL;DR: Deep learning is making major advances in solving problems that have resisted the best attempts of the artificial intelligence community for many years, and will have many more successes in the near future because it requires very little engineering by hand and can easily take advantage of increases in the amount of available computation and data.

67K

Journal Article•10.1287/OPRE.9.3.383

A Proof for the Queuing Formula: L = λW

John D. C. Little

- 01 Jun 1961

- Operations Research

TL;DR: In this paper, it was shown that if the three means are finite and the corresponding stochastic processes strictly stationary, and if the arrival process is metrically transitive with nonzero mean, then L = λW.

2.7K

Journal Article•10.1145/1498765.1498785

Roofline: an insightful visual performance model for multicore architectures

Samuel Williams, +2 more

- 01 Apr 2009

- Communications of the ACM

TL;DR: The Roofline model is a visual performance model that offers insight into how to improve the performance of software and hardware on multicore architectures.

2.6K

Journal Article•10.1109/MM.2008.31

NVIDIA Tesla: A Unified Graphics and Computing Architecture

Erik Lindholm, +3 more

- 01 Mar 2008

- IEEE Micro

TL;DR: To enable flexible, programmable graphics and high-performance computing, NVIDIA has developed the Tesla scalable unified graphics and parallel computing architecture, which is massively multithreaded and programmable in C or via graphics APIs.

1.6K

Proceedings Article•10.1145/2491956.2462176

Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines

Jonathan Ragan-Kelley, +5 more

- 16 Jun 2013

TL;DR: This paper presents a systematic model of the tradeoff space fundamental to stencil pipelines, a schedule representation that describes concrete points in this space for each stage of an image processing pipeline, and an optimizing compiler for the Halide image processing language that synthesizes high-performance implementations from a Halide algorithm and a schedule.

1.2K

Related Papers (5)

GPU Acceleration Using CUDA Framework

SkelCL: a high-level extension of OpenCL for multi-GPU systems

High Performance Matrix Multiplication on General Purpose Graphics Processing Units

Graphics Processing Units and Open Computing Language for parallel computing

GPU accelerated fast FEM deformation simulation