Optimization Techniques for GPU Programming
TL;DR: This survey discusses optimization techniques found in 450 articles published in the last 14 years and analyzes them from different perspectives, showing that the optimizations are highly interrelated, which explains the need for techniques such as auto-tuning.
Abstract: In the past decade, Graphics Processing Units have played an important role in the field of high-performance computing, and they continue to advance new fields such as IoT, autonomous vehicles, and exascale computing. It is therefore important to understand how to extract performance from these processors, something that is not trivial. This survey discusses various optimization techniques found in 450 articles published in the last 14 years. We analyze the optimizations from different perspectives, which shows that the various optimizations are highly interrelated, explaining the need for techniques such as auto-tuning.
Citations
Unleashing the potential: AI empowered advanced metasurface research
Yunlai Fu, Xuxi Zhou, Yiwan Yu, Jiawang Chen, Shuming Wang, Shining Zhu, Zhenlin Wang
TL;DR: AI-powered advanced metasurface research explores the intersection of AI and metasurfaces, leveraging AI's computational power to design, analyze, and optimize metasurfaces for various applications.
GPU acceleration of Levenshtein distance computation between long strings
TL;DR: This paper presents a GPU implementation of the WFA algorithm and a new optimization that can halve the number of elements to be computed, providing additional performance gains. The authors describe the resulting performance as the best reported so far.
Progress and Opportunities of Foundation Models in Bioinformatics
Qing Li, Zhihang Hu, Yixuan Wang, Lei Li, Yimin Fan, Irwin King, Le Song, Yu Li
TL;DR: A systematic investigation and summary of FMs in bioinformatics, tracing their evolution, current research status, and the methodologies employed, aiming to guide the research community in choosing appropriate FMs for their research needs.
Sustainable Optimizing Performance and Energy Efficiency in Proof of Work Blockchain: A Multilinear Regression Approach
Meennapa Rukhiran, Songwut Boonsong, Paniti Netinant
TL;DR: The results reveal that strategically adjusting GPU hardware, software, and configuration can save substantial energy while preserving computational efficiency. The paper offers practical recommendations for optimizing the feature configurations of GPUs to reduce energy consumption, mitigate the environmental impacts of blockchain operations, and contribute to current research on performance in PoW blockchain applications.
References
Reduction drawing: Language constructs and polyhedral compilation for reductions on GPUs
Chandan Reddy, Michael Kruse, Albert Cohen
11 Sep 2016
TL;DR: This work presents language constructs that let a programmer express arbitrary reductions on user-defined data types matching the performance of tuned library implementations, and extends a polyhedral compilation flow to process these user-defined reductions.
Radio-Astronomical Imaging: FPGAs vs GPUs
Bram Veenboer, J. W. Romein
26 Aug 2019
TL;DR: Recent FPGA technology developments (support for the high-level OpenCL programming language, hard floating-point units, and tight integration with CPU cores) are, in combination, game changers: they dramatically reduce development times and allow using FPGAs for applications that were previously deemed too complex.
AN5D: automated stencil framework for high-degree temporal blocking on GPUs
Kazuaki Matsumura, Hamid Reza Zohouri, Mohamed Wahib, Toshio Endo, Satoshi Matsuoka
22 Feb 2020
TL;DR: AN5D automatically transforms and optimizes stencil patterns in a given C source code and generates corresponding CUDA code.
NQueens on CUDA: Optimization Issues
Frank Feinbube, Bernhard Rabe, Martin von Löwis, Andreas Polze
07 Jul 2010
TL;DR: This paper reports experience in implementing a solution to the NQueens puzzle on GPUs using Nvidia's CUDA (Compute Unified Device Architecture) technology, and demonstrates that optimizations of CUDA programs may have contrary results on different CUDA architectures.
Optimized Acoustic Likelihoods Computation for NVIDIA and ATI/AMD Graphics Processors
TL;DR: An optimized version of a Gaussian-mixture-based acoustic model likelihood evaluation algorithm for graphical processing units (GPUs) that enables us to apply fusion techniques together with evaluating many (10 or even more) speaker-specific acoustic models to a real-time parliamentary speech recognition system.