Program optimization carving for GPU computing

doi:10.1016/J.JPDC.2008.05.011

Journal Article

Shane Ryoo, +6 more

- 01 Oct 2008

- Journal of Parallel and Distributed Comp...

- Vol. 68, Iss: 10, pp 1389-1401

145

TL;DR: This work proposes program optimization carving, a technique that begins with a complete optimization space and prunes it down to a set of configurations that are likely to contain the global maximum, and shows that this approach is significantly superior to random sampling of the search space.
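The TL;DR describes carving only at a high level: start from the complete optimization space, prune it to configurations likely to contain the global maximum, and compare against random sampling. The sketch below illustrates that general shape under loud assumptions. The knobs in `SPACE`, the two metrics in `static_metrics`, and the Pareto-dominance pruning rule are hypothetical stand-ins chosen for illustration, not the paper's actual metrics or procedure. The code enumerates every configuration, discards any whose estimated metrics are dominated by another configuration, and provides a same-budget random sample as the baseline.

```python
import itertools
import random

# Hypothetical tuning knobs for one CUDA kernel; names and values are illustrative only.
SPACE = {
    "block_size": [64, 128, 256, 512],
    "unroll": [1, 2, 4, 8],
    "tile": [8, 16, 32],
}


def configurations(space):
    """Enumerate the complete optimization space (Cartesian product of all knobs)."""
    keys = sorted(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


def static_metrics(cfg):
    """Hypothetical static estimates; higher is better for both.

    efficiency: assumed to grow as unrolling and tiling cut executed instructions.
    utilization: assumed threads available to hide latency per unit of per-thread work.
    """
    efficiency = cfg["unroll"] * cfg["tile"]
    utilization = cfg["block_size"] / (cfg["unroll"] * cfg["tile"])
    return efficiency, utilization


def dominates(a, b):
    """True if score tuple a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def carve(space, metrics=static_metrics):
    """Keep only configurations whose metric tuple no other configuration dominates."""
    cfgs = list(configurations(space))
    scores = [metrics(c) for c in cfgs]
    return [c for c, s in zip(cfgs, scores)
            if not any(dominates(t, s) for t in scores)]


def random_baseline(space, budget, seed=0):
    """Random sampling of the full space with the same evaluation budget, for comparison."""
    rng = random.Random(seed)
    return rng.sample(list(configurations(space)), budget)


def best(candidates, measure):
    """Return the candidate with the lowest measured cost (e.g. runtime)."""
    return min(candidates, key=measure)
```

With these made-up metrics, `carve(SPACE)` shrinks the 48-point space to 12 candidates, so a tuner would compile and time only those, for example `best(carve(SPACE), measure)` where `measure` is a hypothetical function that runs one configuration and returns its runtime. `random_baseline(SPACE, 12)` draws an equally sized sample for the comparison the TL;DR mentions.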

Citations

• Proceedings Article • 10.1109/HPCA.2011.5749745

A quantitative performance analysis model for GPU architectures

Yao Zhang, +1 more

- 12 Feb 2011

TL;DR: A microbenchmark-based performance model is developed for NVIDIA GeForce 200-series GPUs that identifies GPU program bottlenecks and quantitatively analyzes performance, and thus allows programmers and architects to predict the benefits of potential program optimizations and architectural improvements.

305

• Proceedings Article • 10.1145/1964179.1964184

Reducing branch divergence in GPU programs

Tianyi David Han, +1 more

- 05 Mar 2011

TL;DR: This work proposes two novel software-based optimizations, called iteration delaying and branch distribution, that aim to reduce branch divergence, and shows that they improve the performance of the synthetic benchmarks and that of the real-world application by 12% and 16%, respectively.

235

• Proceedings Article • 10.1109/SAAHPC.2011.29

On the Efficacy of a Fused CPU+GPU Processor (or APU) for Parallel Computing

Mayank Daga, +2 more

- 19 Jul 2011

TL;DR: This paper empirically characterizes and analyzes the efficacy of AMD Fusion, an architecture that combines general-purpose x86 cores and programmable accelerator cores on the same silicon die, and characterizes its performance via a set of micro-benchmarks.

147

• Journal Article • 10.1186/1471-2105-12-149

Fast network centrality analysis using GPUs

Zhiao Shi, +1 more

- 12 May 2011

- BMC Bioinformatics

TL;DR: This work proposes an efficient data-parallel formulation of the All-Pairs Shortest Path problem, which is the key component of shortest-path-based centrality computation, and designs three algorithms based on this core component to compute closeness centrality, eccentricity centrality, and stress centrality.

134
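The centrality TL;DR above derives closeness and eccentricity centrality from an All-Pairs Shortest Path result. As a minimal sequential sketch of that dependency (not the paper's data-parallel GPU formulation), the code below runs Floyd-Warshall on a small unweighted, undirected graph and computes both measures from the distance matrix. The function names, the (n - 1) / sum normalization for closeness, and the connected-graph assumption are illustrative choices; stress centrality is omitted because it additionally needs shortest-path counts.

```python
import math

INF = math.inf


def all_pairs_shortest_paths(n, edges):
    """Floyd-Warshall APSP on an unweighted, undirected graph with nodes 0..n-1."""
    dist = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for u, v in edges:
        dist[u][v] = dist[v][u] = 1
    for k in range(n):
        dk = dist[k]  # row k does not change during iteration k
        for i in range(n):
            dik = dist[i][k]
            if dik == INF:
                continue  # no path i -> k, so k cannot shorten any path from i
            row = dist[i]
            for j in range(n):
                if dik + dk[j] < row[j]:
                    row[j] = dik + dk[j]
    return dist


def closeness(dist):
    """Closeness centrality as (n - 1) / sum of distances; assumes a connected graph."""
    n = len(dist)
    return [(n - 1) / sum(row) for row in dist]


def eccentricity(dist):
    """Eccentricity: the largest shortest-path distance from each node."""
    return [max(row) for row in dist]
```

For the path graph 0-1-2-3, `closeness(all_pairs_shortest_paths(4, [(0, 1), (1, 2), (2, 3)]))` gives the end nodes the lowest closeness, and `eccentricity` gives them the highest value. For a fixed `k`, every `(i, j)` update is independent, which is what makes the two inner loops natural candidates for data-parallel execution.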

• Proceedings Article • 10.1145/1830483.1830685

GPU-based island model for evolutionary algorithms

Nouredine Melab, +1 more

- 07 Jul 2010

TL;DR: This paper focuses on the parallel island model on GPU and addresses its re-design, implementation, and associated issues related to the GPU execution context.

94

References

• Proceedings Article • 10.1145/1345206.1345220

Optimization principles and application performance evaluation of a multithreaded GPU using CUDA

Shane Ryoo, +5 more

- 20 Feb 2008

TL;DR: This work discusses the GeForce 8800 GTX processor's organization, features, and generalized optimization strategies, and achieves increased performance by reordering accesses to off-chip memory to combine requests to the same or contiguous memory locations and by applying classical optimizations to reduce the number of executed operations.

1K

• Book

Supercompilers for parallel and vector computers

Hans P. Zima, +1 more

- 01 Jan 1990

TL;DR: This book presents a meta-modelling architecture for supercompilers that automates the very labor-intensive, and therefore time-consuming and expensive, process of learning and optimization of supercomputing systems.

778

• Proceedings Article • 10.1145/1296907.1296909

NVIDIA cuda software and gpu parallel computing architecture

David B. Kirk

- 21 Oct 2007

TL;DR: This talk will describe NVIDIA's massively multithreaded computing architecture and CUDA software for GPU computing, a scalable, highly parallel architecture that delivers high throughput for data-intensive processing.

476

• Proceedings Article • 10.1109/CGO.2006.37

Using Machine Learning to Focus Iterative Optimization

Felix Agakov, +8 more

- 26 Mar 2006

TL;DR: A new methodology is developed that uses predictive modelling from the domain of machine learning to automatically focus search on those areas likely to give the greatest performance. It is independent of search algorithm, search space, and compiler infrastructure, and scales gracefully with the size of the compiler optimization space.

468

• Proceedings Article • 10.1145/1058129.1058148

Understanding the efficiency of GPU algorithms for matrix-matrix multiplication

Kayvon Fatahalian, +2 more

- 29 Aug 2004

TL;DR: An in-depth analysis of dense matrix-matrix multiplication, which reuses each element of the input matrices O(n) times, finds that even near-optimal GPU implementations are markedly less efficient than current cache-aware CPU approaches.

379

Related Papers (5)

Optimization principles and application performance evaluation of a multithreaded GPU using CUDA

GPU Computing

Programming Massively Parallel Processors: A Hands-on Approach

Scalable parallel programming with CUDA

Brook for GPUs: stream computing on graphics hardware