Optimization Techniques for GPU Programming
TL;DR: This survey discusses optimization techniques found in 450 articles published in the last 14 years and analyzes them from different perspectives, showing that the optimizations are highly interrelated, which explains the need for techniques such as auto-tuning.
Abstract: In the past decade, Graphics Processing Units (GPUs) have played an important role in high-performance computing, and they continue to advance new fields such as IoT, autonomous vehicles, and exascale computing. It is therefore important to understand how to extract performance from these processors, which is not trivial. This survey discusses optimization techniques found in 450 articles published in the last 14 years. Analyzing the optimizations from different perspectives shows that they are highly interrelated, which explains the need for techniques such as auto-tuning.
Citations
Unleashing the potential: AI empowered advanced metasurface research
Yunlai Fu, Xuxi Zhou, Yiwan Yu, Jiawang Chen, Shuming Wang, Shining Zhu, Zhenlin Wang
TL;DR: AI-powered advanced metasurface research explores the intersection of AI and metasurfaces, leveraging AI's computational power to design, analyze, and optimize metasurfaces for various applications.
GPU acceleration of Levenshtein distance computation between long strings
TL;DR: This paper presents a GPU implementation of the WFA algorithm and a new optimization that can halve the number of elements to be computed, providing additional performance gains; the resulting performance is described as the best ever reported.
Progress and Opportunities of Foundation Models in Bioinformatics
Qing Li, Zhihang Hu, Yixuan Wang, Lei Li, Yimin Fan, Irwin King, Le Song, Yu Li
TL;DR: A systematic investigation and summary of FMs in bioinformatics, tracing their evolution, current research status, and the methodologies employed, aiming to guide the research community in choosing appropriate FMs for their research needs.
Sustainable Optimizing Performance and Energy Efficiency in Proof of Work Blockchain: A Multilinear Regression Approach
Meennapa Rukhiran, Songwut Boonsong, Paniti Netinant
TL;DR: The results reveal that strategically adjusting GPU hardware, software, and configuration can save substantial energy while preserving computational efficiency. The paper offers practical recommendations for optimizing GPU feature configurations to reduce energy consumption, mitigate the environmental impact of blockchain operations, and contribute to current research on performance in PoW blockchain applications.
References
Optimization of N-queens solvers on graphics processors
Tao Zhang, Wei Shu, Min-You Wu
- 26 Sep 2011
TL;DR: The experimental results show that the proposed approaches can substantially improve the performance of irregular computation on GPUs and could be easily applied to many other irregular problems to improve their performance.
Optimized strategies for mapping three-dimensional FFTs onto CUDA GPUs
Jing Wu, Joseph JaJa
- 13 May 2012
TL;DR: This paper addresses the problem of mapping three-dimensional Fast Fourier Transforms (FFTs) onto the recent, highly multithreaded CUDA Graphics Processing Units (GPUs) and presents some of the fastest known algorithms for a wide range of 3-D FFTs on the NVIDIA Tesla and Fermi architectures.
Performance optimization of Sparse Matrix-Vector Multiplication for multi-component PDE-based applications using GPUs
TL;DR: The proposed SpMV kernel outperforms existing state‐of‐the‐art implementations using matrices with real structures from different applications, and a technique is proposed to balance the workload among thread blocks when there are large variations in the lengths of nonzero rows.
Detailed Analysis and Optimization of CUDA K-means Algorithm
Martin Kruliš, Miroslav Kratochvíl
- 17 Aug 2020
TL;DR: This work presents a detailed analysis of individual computation steps of the k-means algorithm and proposes several optimizations that improve the overall performance on contemporary GPU architectures.
Clustering Throughput Optimization on the GPU
Michael Gowanlock, Cody Rude, David M. Blair, Justin D. Li, Victor Pankratius
- 01 May 2017
TL;DR: A novel hybrid approach uses GPUs in conjunction with multicore CPUs for algorithmic throughput optimizations, yielding a speedup of up to 50x over the sequential implementation in one of the experimental scenarios, which is respectable for I/O-intensive clustering.