Optimization Techniques for GPU Programming

doi:10.1145/3570638

Open Access•Journal Article•10.1145/3570638

Pieter Hijma, +4 more

- 14 Nov 2022

- ACM Computing Surveys

- Vol. 55, Iss: 11, pp 1-81

54

TL;DR: This survey discusses various optimization techniques found in 450 articles published in the last 14 years and analyzes the optimizations from different perspectives, showing that the various optimizations are highly interrelated, which explains the need for techniques such as auto-tuning.

Citations

Journal Article•10.1515/nanoph-2023-0759

Unleashing the potential: AI empowered advanced metasurface research

Yunlai Fu, +6 more

- 27 Feb 2024

- Nanophotonics

TL;DR: AI-powered advanced metasurface research explores the intersection of AI and metasurfaces, leveraging AI's computational power to design, analyze, and optimize metasurfaces for various applications.

11

Journal Article•10.1016/j.parco.2023.103019

GPU acceleration of Levenshtein distance computation between long strings

David Castells-Rufas

- 01 Jul 2023

- Parallel Computing

TL;DR: This paper presents a GPU implementation of the WFA algorithm together with a new optimization that can halve the number of elements to be computed, providing additional performance gains; the resulting performance is reported as the best to date.

7

Journal Article•10.48550/arxiv.2402.04286

Progress and Opportunities of Foundation Models in Bioinformatics

Qing Li, +7 more

- 06 Feb 2024

- arXiv.org

TL;DR: A systematic investigation and summary of FMs in bioinformatics, tracing their evolution, current research status, and the methodologies employed, aiming to guide the research community in choosing appropriate FMs for their research needs.

6

Journal Article•10.3390/su16041519

Sustainable Optimizing Performance and Energy Efficiency in Proof of Work Blockchain: A Multilinear Regression Approach

Meennapa Rukhiran, +2 more

- 10 Feb 2024

- Sustainability

TL;DR: The results reveal that strategically adjusting GPU hardware, software, and configuration can save substantial energy while preserving computational efficiency. The paper offers practical recommendations for optimizing GPU feature configurations to reduce energy consumption, mitigate the environmental impacts of blockchain operations, and contribute to current research on performance in PoW blockchain applications.

6

Journal Article•10.2139/ssrn.4244720

GPU acceleration of Levenshtein distance computation between long strings

David Castells-Rufas

- 01 Apr 2023

- Parallel Computing

5

References

Proceedings Article•10.1145/2688500.2688538

Gunrock: a high-performance graph processing library on the GPU

Yangzihao Wang, +5 more

- 24 Jan 2015

TL;DR: This work evaluates Gunrock on five graph primitives and shows that Gunrock has at least an order of magnitude speedup over Boost and PowerGraph, comparable performance to the fastest GPU hardwired primitives, and better performance than any other GPU high-level graph library.

306

Proceedings Article•10.1109/IPDPSW.2010.5470941

Dense linear algebra solvers for multicore with GPU accelerators

Stanimire Tomov, +3 more

- 19 Apr 2010

TL;DR: This work describes how to develop dense linear algebra (DLA) solvers that effectively use the high computing power of new and emerging hybrid architectures, namely multicore systems with GPU accelerators, and presents newly developed DLA solvers.

297

Journal Article•10.1186/1756-0500-2-73

CUDASW++: optimizing Smith-Waterman sequence database searches for CUDA-enabled graphics processing units

Yongchao Liu, +2 more

- 06 May 2009

- BMC Research Notes

TL;DR: The CUDASW++ implementation provides a significant performance improvement for Smith-Waterman-based protein sequence database searches by fully exploiting the compute capability of commonly used CUDA-enabled low-cost GPUs.

296

Book Chapter•10.1007/978-3-642-11515-8_10

Automatically tuning sparse matrix-vector multiplication for GPU architectures

Alexander Monakov, +2 more

- 25 Jan 2010

TL;DR: This paper presents a new storage format for sparse matrices that exploits locality, has a low memory footprint, and enables automatic specialization for various matrices and future devices via parameter tuning.

277

Journal Article•10.1016/J.SYSARC.2019.01.011

A Survey on optimized implementation of deep learning models on the NVIDIA Jetson platform

Sparsh Mittal

- 01 Aug 2019

- Journal of Systems Architecture

TL;DR: A survey of works that evaluate and optimize neural network applications on the Jetson platform; it seeks to provide a glimpse of recent progress in this area and shows the real-life applications where these algorithms have been applied.

270

Related Papers (5)

GPU Acceleration Using CUDA Framework

SkelCL: a high-level extension of OpenCL for multi-GPU systems

High Performance Matrix Multiplication on General Purpose Graphics Processing Units

Graphics Processing Units and Open Computing Language for parallel computing

GPU accelerated fast FEM deformation simulation