Optimization Techniques for GPU Programming
54
TL;DR: This survey discusses optimization techniques found in 450 articles published in the last 14 years and analyzes them from different perspectives, showing that the optimizations are highly interrelated, which explains the need for techniques such as auto-tuning.
Abstract: In the past decade, Graphics Processing Units have played an important role in the field of high-performance computing, and they continue to advance new fields such as IoT, autonomous vehicles, and exascale computing. It is therefore important to understand how to extract performance from these processors, something that is not trivial. This survey discusses various optimization techniques found in 450 articles published in the last 14 years. We analyze the optimizations from different perspectives, which shows that they are highly interrelated and explains the need for techniques such as auto-tuning.
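To illustrate why interrelated optimizations motivate auto-tuning, here is a minimal sketch of an exhaustive-search tuner. Everything in it is an assumption made for illustration: the parameter names `block_size` and `tile_factor`, their candidate values, and the `toy_cost` function are hypothetical and are not taken from the survey. In a real tuner the cost function would compile and time an actual GPU kernel for each configuration.

```python
from itertools import product

# Hypothetical search space; names and values are illustrative only.
SEARCH_SPACE = {
    "block_size": [64, 128, 256, 512],
    "tile_factor": [1, 2, 4, 8],
}


def toy_cost(block_size, tile_factor):
    """Made-up cost model (not a measurement).

    The first term depends on the product of both parameters, so the best
    value of one parameter depends on the value chosen for the other.
    """
    work_per_block = block_size * tile_factor
    return abs(work_per_block - 1024) / 1024 + 0.01 * tile_factor


def autotune(cost_fn, space):
    """Evaluate every configuration in the space and return the cheapest."""
    names = list(space)
    best_cfg, best_cost = None, float("inf")
    for values in product(*(space[name] for name in names)):
        cfg = dict(zip(names, values))
        cost = cost_fn(**cfg)
        if cost < best_cost:
            best_cfg, best_cost = cfg, cost
    return best_cfg, best_cost


if __name__ == "__main__":
    cfg, cost = autotune(toy_cost, SEARCH_SPACE)
    print(cfg, cost)
```

Because the two parameters interact in the cost model, tuning each one in isolation can miss the best combination, which is why the sketch searches the joint space. Exhaustive search is only practical for small spaces; it stands in here for whatever search strategy a real auto-tuner would use.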
Citations
Unleashing the potential: AI empowered advanced metasurface research
Yunlai Fu,Xuxi Zhou,Yiwan Yu,Jiawang Chen,Shuming Wang,Shining Zhu,Zhenlin Wang +6 more
TL;DR: AI-powered advanced metasurface research explores the intersection of AI and metasurfaces, leveraging AI's computational power to design, analyze, and optimize metasurfaces for various applications.
GPU acceleration of Levenshtein distance computation between long strings
TL;DR: This paper presents a GPU implementation of the WFA algorithm together with a new optimization that can halve the number of elements to be computed, providing additional performance gains; the resulting performance is claimed to be the best reported.
7
Progress and Opportunities of Foundation Models in Bioinformatics
Qing Li,Zhihang Hu,Yixuan Wang,Lei Li,Yimin Fan,Irwin King,Le Song,Yu Li +7 more
TL;DR: A systematic investigation and summary of FMs in bioinformatics, tracing their evolution, current research status, and the methodologies employed, aiming to guide the research community in choosing appropriate FMs for their research needs.
6
Sustainable Optimizing Performance and Energy Efficiency in Proof of Work Blockchain: A Multilinear Regression Approach
Meennapa Rukhiran,Songwut Boonsong,Paniti Netinant +2 more
TL;DR: The results reveal that strategically adjusting GPU hardware, software, and configuration can save substantial energy while preserving computational efficiency. The work offers practical recommendations for optimizing the feature configurations of GPUs to reduce energy consumption and mitigate the environmental impacts of blockchain operations, and it contributes to current research on performance in PoW blockchain applications.
6
References
Using Dynamic Parallelism for Fine-Grained, Irregular Workloads: A Case Study of the N-Queens Problem
Max Plauth,Frank Feinbube,Frank Schlegel,Andreas Polze +3 more
- 08 Dec 2015
TL;DR: Methods for employing dynamic parallelism with the goal of improved workload distribution for tree search algorithms on modern GPU hardware are investigated, and novel memory management concepts for passing parameters to child grids are presented.
12
High Performance Parallel Graph Coloring on GPGPUs
Pingfan Li,Xuhao Chen,Zhe Quan,Jianbin Fang,Huayou Su,Tao Tang,Canqun Yang +6 more
- 23 May 2016
TL;DR: This work presents a high-performance parallel graph coloring implementation on GPGPUs with good coloring quality, adapts the algorithm to improve work efficiency and reduce overhead, and incorporates several optimization techniques that reduce memory access latency and atomic operation overhead.
12
Program Optimization of Array-Intensive SPEC2k Benchmarks on Multithreaded GPU Using CUDA and Brook+
Guibin Wang,Tao Tang,Xudong Fang,Xiaoguang Ren +3 more
- 08 Dec 2009
TL;DR: This paper optimizes and implements two SPEC2k benchmarks, mgrid and swim, on a multithreaded GPU using CUDA and Brook+, and introduces a divergence elimination technique that converts conditional expressions into computing operations.
12
•Posted Content
GPU based Parallel Optimization for Real Time Panoramic Video Stitching
TL;DR: A real-time panoramic video stitching framework consisting mainly of three algorithms: the L-ORB image feature extraction algorithm, a feature point matching algorithm based on LSH, and a GPU parallel video stitching algorithm based on CUDA. The framework can improve performance in the feature extraction and matching stages of image stitching.
Efficient dense matrix‐vector multiplication on GPU
Guixia He,Jiaquan Gao,Jun Wang +2 more
TL;DR: Experimental results show that the proposed GEMV‐Adaptive and GEMV‐T‐Adaptive mitigate the performance fluctuations of the implementations in the CUBLAS library, always have high performance, and outperform the most recently proposed GEMV and GEMV‐T kernels by Gao et al., respectively, for all test matrices.
12