Hierarchical partitioning algorithm for scientific computing on highly heterogeneous CPU + GPU clusters

doi:10.1007/978-3-642-32820-6_49

Book Chapter

David Clarke, +3 more

- 27 Aug 2012

- pp 489-501

17

TL;DR: Large-scale experiments on a heterogeneous multi-cluster site incorporating multicore CPUs and GPU nodes show that the presented algorithm outperforms current state-of-the-art approaches and successfully load-balances very large problems.

Citations

•Journal Article•10.1145/2788396

A Survey of CPU-GPU Heterogeneous Computing Techniques

Sparsh Mittal, +1 more

- 21 Jul 2015

- ACM Computing Surveys

TL;DR: This article surveys Heterogeneous Computing Techniques (HCTs) such as workload partitioning that enable utilizing both CPUs and GPUs to improve performance and/or energy efficiency and reviews both discrete and fused CPU-GPU systems.

542

•Journal Article•10.1109/TPDS.2018.2853151

Recent Advances in Matrix Partitioning for Parallel Computing on Heterogeneous Platforms

Olivier Beaumont, +5 more

- 01 Jan 2019

- IEEE Transactions on Parallel and Distri...

TL;DR: This paper presents recent approaches that relax the restriction that all partitions be rectangles and uses the first exact approach to analyse how close to the known optimal solutions the NRRP algorithm is for small numbers of partitions.

26

•Journal Article•10.25781/KAUST-9M51I

Design and Optimization of OpenFOAM-based CFD Applications for Hybrid and Heterogeneous HPC Platforms

Amani AlOnazi, +3 more

- 28 May 2015

- arXiv: Distributed, Parallel, and Cluste...

TL;DR: This work proposes a number of optimizations of the dominant kernel of the Krylov solver, aimed at acceleration of the overall execution of the applications on modern GPU-accelerated heterogeneous platforms.

24

•Proceedings Article•10.5555/3026877.3026910

To waffinity and beyond: a scalable architecture for incremental parallelization of file system code

Matthew Curtis-Maury, +3 more

- 02 Nov 2016

TL;DR: The evolution of the multiprocessor software architecture employed by the NetApp® Data ONTAP® WAFL® file system is described as a case study in incrementally scaling a production storage system; the results demonstrate the success of the proposed MP models in delivering scalable performance while balancing time-to-market requirements.

19

•Journal Article•10.1109/ACCESS.2019.2959905

A Hierarchical Data-Partitioning Algorithm for Performance Optimization of Data-Parallel Applications on Heterogeneous Multi-Accelerator NUMA Nodes

Hamidreza Khaleghzadeh, +2 more

- 01 Jan 2020

- IEEE Access

TL;DR: This paper proposes a hierarchical two-level data partitioning algorithm minimizing the parallel execution time of data-parallel applications on clusters of identical nodes where each node has $h$ identical nodes and proposes an extension of the algorithm for clusters of non-identical nodes.

13

References

Lecture Notes in Computer Science 2382

Petrus Bollen

- 01 Jan 2002

36.7K

•Journal Article•10.1145/324133.324234

Scheduling multithreaded computations by work stealing

Robert D. Blumofe, +1 more

- 01 Sep 1999

- Journal of the ACM

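TL;DR: This paper gives the first provably good work-stealing scheduler for multithreaded computations with dependencies, and shows that the expected time to execute a fully strict computation on $P$ processors using this scheduler is $T_1/P + O(T_\infty)$, where $T_1$ is the minimum serial execution time and $T_\infty$ is the minimum execution time with an infinite number of processors.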

1.6K

•Proceedings Article•10.1109/CLUSTR.2009.5289128

GPU clusters for high-performance computing

Volodymyr Kindratenko, +7 more

- 16 Oct 2009

TL;DR: This paper presents efforts to address some of the challenges with building and running GPU clusters in HPC environments and touches upon such issues as balanced cluster architecture, resource sharing in a cluster environment, programming models, and applications for GPU clusters.

270

•Book Chapter•10.1007/BFB0046669

Performance of the decoupled ACRI-1 architecture: the perfect club

Nigel Topham, +1 more

- 03 May 1995

TL;DR: The applicability of access and control decoupling to real-world codes is investigated, and bounds for the performance of these codes are derived. Whilst some codes exhibit performance roughly equivalent to that on vector computers, others exhibit considerably higher performance potential in a decoupled system.

245

•Book Chapter•10.1007/978-3-642-03869-3_80

StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures

Cédric Augonnet, +3 more

- 01 Jan 2009

- Lecture Notes in Computer Science

TL;DR: StarPU is a runtime system that provides a high-level unified execution model, giving numerical kernel designers a convenient way to generate parallel tasks over heterogeneous hardware and to easily develop and tune powerful scheduling algorithms.

183