Optimization Techniques for GPU Programming

doi:10.1145/3570638

Open Access•Journal Article•10.1145/3570638

Pieter Hijma, +4 more

- 14 Nov 2022

- ACM Computing Surveys

- Vol. 55, Iss: 11, pp 1-81

54

TL;DR: This survey discusses optimization techniques found in 450 articles published in the last 14 years and analyzes them from different perspectives, showing that the optimizations are highly interrelated, which explains the need for techniques such as auto-tuning.

Citations

Journal Article•10.1515/nanoph-2023-0759

Unleashing the potential: AI empowered advanced metasurface research

Yunlai Fu, +6 more

- 27 Feb 2024

- Nanophotonics

TL;DR: AI-powered advanced metasurface research explores the intersection of AI and metasurfaces, leveraging AI's computational power to design, analyze, and optimize metasurfaces for various applications.

11

Journal Article•10.1016/j.parco.2023.103019

GPU acceleration of Levenshtein distance computation between long strings

David Castells-Rufas

- 01 Jul 2023

- Parallel Computing

TL;DR: This paper presents a GPU implementation of the WFA algorithm and a new optimization that can halve the number of elements to be computed, providing additional performance gains. The result is described as the best ever reported.

7

Journal Article•10.48550/arxiv.2402.04286

Progress and Opportunities of Foundation Models in Bioinformatics

Qing Li, +7 more

- 06 Feb 2024

- arXiv.org

TL;DR: A systematic investigation and summary of FMs in bioinformatics, tracing their evolution, current research status, and the methodologies employed, aiming to guide the research community in choosing appropriate FMs for their research needs.

6

Journal Article•10.3390/su16041519

Sustainable Optimizing Performance and Energy Efficiency in Proof of Work Blockchain: A Multilinear Regression Approach

Meennapa Rukhiran, +2 more

- 10 Feb 2024

- Sustainability

TL;DR: The results reveal that strategically adjusting GPU hardware, software, and configuration can save substantial energy while preserving computational efficiency. The paper offers practical recommendations for optimizing the feature configurations of GPUs to reduce energy consumption, mitigate the environmental impacts of blockchain operations, and contribute to current research on performance in PoW blockchain applications.

6

Journal Article•10.2139/ssrn.4244720

GPU acceleration of Levenshtein distance computation between long strings

David Castells-Rufas

- 01 Apr 2023

- Parallel Computing

5

References

Proceedings Article•10.1109/IPDPS47924.2020.00080

A High-Throughput Solver for Marginalized Graph Kernels on GPU

Yu-Hang Tang, +3 more

- 01 May 2020

TL;DR: Presents the design and optimization of a linear solver for the efficient, high-throughput evaluation of the marginalized graph kernel between pairs of labeled graphs, along with a new partition-based reordering algorithm that aggregates the nonzero elements of the graphs into fewer but denser tiles to improve the efficiency of the sparse format.

Journal Article•10.1088/1757-899X/10/1/012009

Introduction to assembly of finite element methods on graphics processors

Cristopher Cecka, +2 more

- 01 Jun 2010

TL;DR: With appropriate preprocessing and arrangement of support data, the GPU coprocessor achieves speedups of 30x or more compared with a well-optimized serial implementation on the CPU.

Journal Article•10.1109/MC.2017.3001256

Computer Architectures for Autonomous Driving

Shaoshan Liu, +3 more

- 01 Aug 2017

- IEEE Computer

TL;DR: To enable autonomous driving, a computing stack must simultaneously ensure high performance, consume minimal power, and have low thermal dissipation—all at an acceptable cost.

Proceedings Article•10.1145/1356058.1356084

Program optimization space pruning for a multithreaded GPU

Shane Ryoo, +6 more

- 06 Apr 2008

TL;DR: Shows the complexity involved in optimizing applications for one highly parallel system, and one relatively simple methodology for reducing the workload of the optimization process.

Journal Article•10.1016/J.PROCS.2013.05.196

An architecture-aware technique for optimizing sparse matrix-vector multiplication on GPUs

Marco Maggioni, +1 more

- 01 Jan 2013

TL;DR: Proposes an architecture-aware technique for improving the performance of SpMV on Graphics Processing Units (GPUs), based on a novel heuristic capable of reducing cache memory accesses within hardware-level thread blocks (warps).

Related Papers (5)

GPU Acceleration Using CUDA Framework

SkelCL: a high-level extension of OpenCL for multi-GPU systems

High Performance Matrix Multiplication on General Purpose Graphics Processing Units

Graphics Processing Units and Open Computing Language for parallel computing

GPU accelerated fast FEM deformation simulation