Journal Article, DOI: 10.1016/J.JPDC.2008.05.011
Program optimization carving for GPU computing
Shane Ryoo, Christopher I. Rodrigues, Sam S. Stone, John A. Stratton, Sain-Zee Ueng, Sara S. Baghsorkhi, Wen-mei W. Hwu
TL;DR: This work proposes program optimization carving, a technique that begins with a complete optimization space and prunes it down to a set of configurations that are likely to contain the global maximum, and shows that this approach is significantly superior to random sampling of the search space.
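The summary above describes starting from the full optimization space and pruning it before any expensive measurement. A minimal sketch of that idea follows, assuming a Pareto-style pruning rule over two cheap static scores. The tuning knobs, the `efficiency` and `utilization` formulas, and `simulated_runtime_score` are all hypothetical stand-ins chosen for illustration and are not taken from the article.

```python
from itertools import product

# Hypothetical optimization space: every combination of three tuning knobs.
TILE_SIZES = (8, 16, 32)
UNROLL_FACTORS = (1, 2, 4)
PREFETCH = (False, True)


def static_metrics(config):
    """Cheap, made-up scores computed without running anything (higher is better)."""
    tile, unroll, prefetch = config
    efficiency = unroll / (1 + 0.1 * unroll) + (0.2 if prefetch else 0.0)
    utilization = 1.0 / (1 + abs(tile - 16) / 16) - 0.05 * unroll
    return efficiency, utilization


def dominates(a, b):
    """True if score tuple a is no worse than b everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def carve(space, metrics):
    """Keep only configurations whose scores no other configuration dominates."""
    scored = [(cfg, metrics(cfg)) for cfg in space]
    return [cfg for cfg, m in scored
            if not any(dominates(other, m) for _, other in scored)]


def simulated_runtime_score(config):
    """Stand-in for an expensive timed run (assumption: higher is better)."""
    efficiency, utilization = static_metrics(config)
    return efficiency * utilization


full_space = list(product(TILE_SIZES, UNROLL_FACTORS, PREFETCH))
survivors = carve(full_space, static_metrics)

# Only the survivors are "measured"; the rest of the space is never run.
best = max(survivors, key=simulated_runtime_score)
print(f"{len(survivors)} of {len(full_space)} configurations kept; best = {best}")
```

In this toy setup the pruned set still contains the best configuration because the simulated score increases with both static scores. With real measurements that property is not guaranteed, which is why the summary speaks of configurations that are likely to contain the global maximum.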
About: This article was published in the Journal of Parallel and Distributed Computing on 01 Oct 2008. It focuses on the topics: Random search and Program optimization.
Citations
A quantitative performance analysis model for GPU architectures
Yao Zhang, John D. Owens
- 12 Feb 2011
TL;DR: A microbenchmark-based performance model is developed for NVIDIA GeForce 200-series GPUs that identifies GPU program bottlenecks and quantitatively analyzes performance, and thus allows programmers and architects to predict the benefits of potential program optimizations and architectural improvements.
Reducing branch divergence in GPU programs
Tianyi David Han, Tarek S. Abdelrahman
- 05 Mar 2011
TL;DR: This work proposes two novel software-based optimizations, iteration delaying and branch distribution, that aim to reduce branch divergence, and shows that they improve the performance of the synthetic benchmarks and of the real-world application by 12% and 16%, respectively.
On the Efficacy of a Fused CPU+GPU Processor (or APU) for Parallel Computing
Mayank Daga, Ashwin M. Aji, Wu-chun Feng
- 19 Jul 2011
TL;DR: This paper empirically characterizes and analyzes the efficacy of AMD Fusion, an architecture that combines general-purpose x86 cores and programmable accelerator cores on the same silicon die, using a set of micro-benchmarks.
Fast network centrality analysis using GPUs
Zhiao Shi, Bing Zhang
TL;DR: This work proposed an efficient data parallel formulation of the All-Pairs Shortest Path problem, which is the key component for shortest path-based centrality computation, and designed three algorithms based on this core component to compute closeness centrality, eccentricity centrality and stress centrality.
GPU-based island model for evolutionary algorithms
Nouredine Melab, El-Ghazali Talbi
- 07 Jul 2010
TL;DR: This paper focuses on the parallel island model on GPU and addresses its re-design, implementation, and associated issues related to the GPU execution context.
References
Optimization principles and application performance evaluation of a multithreaded GPU using CUDA
Shane Ryoo, Christopher I. Rodrigues, Sara S. Baghsorkhi, Sam S. Stone, David B. Kirk, Wen-mei W. Hwu
- 20 Feb 2008
TL;DR: This work discusses the GeForce 8800 GTX processor's organization, features, and generalized optimization strategies, and achieves increased performance by reordering accesses to off-chip memory to combine requests to the same or contiguous memory locations and by applying classical optimizations to reduce the number of executed operations.
Book
Supercompilers for parallel and vector computers
Hans P. Zima, Barbara Chapman
- 01 Jan 1990
TL;DR: This paper presents a meta-modelling architecture for supercompilers that automates the very labor-intensive and therefore time-heavy and expensive process of learning and optimization of supercomputing systems.
NVIDIA cuda software and gpu parallel computing architecture
David B. Kirk
- 21 Oct 2007
TL;DR: This talk will describe NVIDIA's massively multithreaded computing architecture and CUDA software for GPU computing, a scalable, highly parallel architecture that delivers high throughput for data-intensive processing.
Using Machine Learning to Focus Iterative Optimization
Felix Agakov, Edwin V. Bonilla, John Cavazos, Björn Franke, Grigori Fursin, Michael O'Boyle, John Thomson, Marc Toussaint, Christopher Williams
- 26 Mar 2006
TL;DR: A new methodology is developed that uses predictive modelling from the domain of machine learning to automatically focus search on those areas likely to give greatest performance, independent of search algorithm, search space or compiler infrastructure and scales gracefully with the compiler optimization space size.
Understanding the efficiency of GPU algorithms for matrix-matrix multiplication
Kayvon Fatahalian, Jeremy Sugerman, Pat Hanrahan
- 29 Aug 2004
TL;DR: An in-depth analysis of dense matrix-matrix multiplication, which reuses each element of input matrices O(n) times, finds even near-optimal GPU implementations are pronouncedly less efficient than current cache-aware CPU approaches.