Automating GPU computing in MATLAB

doi:10.1145/1995896.1995936

Proceedings Article•10.1145/1995896.1995936

Chun-Yu Shei, +2 more

- 31 May 2011

- pp 245-254

21

TL;DR: This work presents a fully automatic source-level compilation technique to exploit a given GPU library for MATLAB, enabling coarse-grained heterogeneous parallelism across CPU and GPU.

Abstract: MATLAB is a popular software platform for scientific and engineering software writers. It offers a high level of abstraction for fundamental mathematical operations and extensive highly optimized domain-specific libraries for several scientific and engineering disciplines. With the recent availability of GPU libraries for MATLAB, it has become possible to easily exploit GPGPUs as coprocessors. However, this requires changing the code by carefully declaring variables that would live on the GPU, breaking the simplicity of the MATLAB programming model. We present a fully automatic source-level compilation technique to exploit a given GPU library for MATLAB, enabling coarse-grained heterogeneous parallelism across CPU and GPU. Our approach is based on empirically characterizing the library's functions in order to build a comparative model of their performance on the CPU and GPU, which is then used along with a data communication cost model to maximize parallelism by selectively offloading some computation to the GPU. We achieve this by phrasing the problem as a binary integer linear programming problem aimed at minimizing CPU-GPU data movement, and using a hierarchical approach to keep the computational complexity in check. We have implemented our approach in a source-level MATLAB compiler, and present experimental results on a set of MATLAB kernels and applications using the GPUmat library. We show speedups of up to 7 times when the GPU is harnessed, compared to a standalone 8-core CPU.
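The offloading decision described in the abstract can be illustrated with a small sketch. This is not the authors' formulation: it assumes each statement has a measured CPU cost and GPU cost, that a transfer cost is paid whenever a producer and its consumer run on different devices, and it replaces a real binary integer linear programming solver with exhaustive search over the 0/1 placement variables. It also ignores CPU-GPU overlap, which the paper exploits. The function name `best_placement` and all cost numbers are hypothetical.

```python
from itertools import product


def best_placement(cpu_cost, gpu_cost, edges, transfer_cost):
    """Pick a device for each statement by brute force.

    placement[i] is 0 (CPU) or 1 (GPU). The objective is the sum of
    per-statement compute costs plus a transfer cost for every data
    dependence (u, v) whose endpoints sit on different devices.
    Returns (total_cost, placement) for the cheapest assignment.
    """
    n = len(cpu_cost)
    best = None
    for placement in product((0, 1), repeat=n):
        compute = sum(
            gpu_cost[i] if placement[i] else cpu_cost[i] for i in range(n)
        )
        transfer = sum(
            transfer_cost[(u, v)]
            for (u, v) in edges
            if placement[u] != placement[v]
        )
        total = compute + transfer
        if best is None or total < best[0]:
            best = (total, placement)
    return best


# Hypothetical costs for a chain of four statements s0 -> s1 -> s2 -> s3.
CPU_COST = [1, 8, 6, 1]
GPU_COST = [4, 2, 1, 4]
EDGES = [(0, 1), (1, 2), (2, 3)]
TRANSFER_COST = {(0, 1): 2, (1, 2): 5, (2, 3): 2}

if __name__ == "__main__":
    cost, placement = best_placement(CPU_COST, GPU_COST, EDGES, TRANSFER_COST)
    print(cost, placement)  # 9 (0, 1, 1, 0)
```

With these made-up numbers, the two middle statements go to the GPU together: splitting them would incur the expensive transfer on the edge between them. Exhaustive search grows as 2^n, which is one reason a hierarchical decomposition, as the abstract mentions, is needed for realistic programs.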

Citations

Journal Article•10.1145/1993316.1993517

Automatic compilation of MATLAB programs for synergistic execution on heterogeneous processors

Ashwin Prasad, +2 more

- 04 Jun 2011

TL;DR: The design and implementation of MEGHA is presented, a compiler that automatically compiles MATLAB programs to enable synergistic execution on heterogeneous processors and a set of compiler optimizations tailored for MATLAB is proposed.

48

Book

SemCache: Semantics-Aware Caching for Efficient GPU Offloading

Nabeel Al-Saber, +1 more

- 01 Jun 2016

TL;DR: SemCache is introduced, a semantics-aware GPU cache that automatically manages CPU-GPU communication and dynamically optimizes communication by eliminating redundant transfers using caching.

14

Journal Article•10.1109/TCI.2017.2654127

Fast GPU-Based Seismogram Simulation From Microseismic Events in Marine Environments Using Heterogeneous Velocity Models

Saptarshi Das, +2 more

- 16 Jan 2017

- IEEE Transactions on Computational Imagi...

TL;DR: In this paper, a novel approach is presented for fast generation of synthetic seismograms from microseismic events in heterogeneous marine velocity models, based on the Fourier-domain pseudo-spectral method, which is parallelizable on graphics processing unit (GPU) cards.

10

Proceedings Article•10.1145/2935323.2935329

Automatic generation of parallel C code for stencil applications written in MATLAB

Johannes Spazier, +2 more

- 02 Jun 2016

TL;DR: This paper presents the first compiler that generates native MPI code from MATLAB source, showing significant performance improvements, and reports performance results of an automatic translation from a MATLAB subset into efficient parallelized C code for different architectures: multicores, compute clusters, and GPGPUs.

7

Journal Article•10.1016/J.JCP.2013.05.040

Time-stepping methods for the simulation of the self-assembly of nano-crystals in Matlab on a GPU

Maciek D. Korzec, +1 more

- 01 Oct 2013

- Journal of Computational Physics

TL;DR: A time-adaptive SBDF1/SBDF1-2-step method is presented that yields convincing results reflecting the change in timescales during topological changes of the nanostructures.

7

References

Journal Article•10.1145/344588.344618

Static scheduling algorithms for allocating directed task graphs to multiprocessors

Yu-Kwong Kwok, +1 more

- 01 Dec 1999

- ACM Computing Surveys

TL;DR: A taxonomy that classifies 27 scheduling algorithms and their functionalities into different categories is proposed, with each algorithm explained through an easy-to-understand description followed by an illustrative example to demonstrate its operation.

1.4K

Proceedings Article•10.1145/1345206.1345220

Optimization principles and application performance evaluation of a multithreaded GPU using CUDA

Shane Ryoo, +5 more

- 20 Feb 2008

TL;DR: This work discusses the GeForce 8800 GTX processor's organization, features, and generalized optimization strategies, and achieves increased performance by reordering accesses to off-chip memory to combine requests to the same or contiguous memory locations and by applying classical optimizations to reduce the number of executed operations.

1K

Journal Article

An Updated Set of Basic Linear Algebra Subprograms (BLAS)

Susan Blackford, +12 more

- 01 Jun 2002

- ACM Transactions on Mathematical Softwar...

TL;DR: This paper summarizes the BLAS Technical Forum Standard, a specification of a set of kernel routines for linear algebra, historically called the Basic Linear Algebra Subprograms.

929

Proceedings Article•10.1109/SC.2010.36

OpenMPC: Extended OpenMP Programming and Tuning for GPUs

Seyong Lee, +1 more

- 13 Nov 2010

TL;DR: This paper has developed a fully automatic compilation and user-assisted tuning system supporting OpenMPC, which builds on OpenMP to provide an abstraction of the complex CUDA programming model and offers high-level controls of the involved parameters and optimizations.

271

Book Chapter•10.1007/978-3-642-11970-5_14

Automatic C-to-CUDA code generation for affine programs

Muthu Baskaran, +2 more

- 20 Mar 2010

TL;DR: An automatic code transformation system is presented that generates parallel CUDA code from sequential C input for regular (affine) programs; the generated code performs quite close to hand-optimized CUDA code and considerably better than the benchmarks' performance on a multicore CPU.

238