Optimizing a parallel runtime system for multicore clusters: a case study

doi:10.1145/1838574.1838586

Proceedings Article•10.1145/1838574.1838586

Chao Mei, +3 more

- 02 Aug 2010

- pp 12

29

TL;DR: This paper studies several multicore performance issues on clusters using Intel, AMD and IBM processors in the context of the Charm++ runtime system, and presents the optimization techniques that overcome these performance issues.

Citations

Proceedings Article•10.1109/ICPP.2012.9

A Hierarchical Approach for Load Balancing on Parallel Multi-core Systems

Laércio Lima Pilla, +8 more

- 10 Sep 2012

TL;DR: NucoLB, a topology-aware load balancer, is introduced. It focuses on redistributing work while reducing communication costs among and within compute nodes, and its balancing decisions take into account the asymmetric memory access costs of NUMA multi-core compute nodes, the interconnection network overheads, and the application communication patterns.

48

Proceedings Article•10.1109/ICIEAM.2018.8728629

Development of Load Balancer and Parallel Database Management Module

R. F. Gibadullin, +2 more

- 15 May 2018

TL;DR: The article discusses the main development aspects of a cluster load balancer that speeds up query processing for a database running on the PostgreSQL DBMS under the Windows operating system.

30

Proceedings Article•10.1109/IPDPSW.2016.105

Controlling the Memory Subscription of Distributed Applications with a Task-Based Runtime System

Marc Sergent, +3 more

- 23 May 2016

TL;DR: It is shown that the task paradigm makes it possible to control the memory footprint of the application by throttling the task submission rate, striking a compromise between the performance benefits of anticipative task submission and the resulting memory consumption.

30

Proceedings Article•10.1109/ICIEAM.2017.8076380

Realization of replication mechanism in PostgreSQL DBMS

R. F. Gibadullin, +2 more

- 16 May 2017

TL;DR: The article considers the essence of data replication, replication strategies, and the basic methods and techniques of replication, and analyzes how well the replication method suits a high-performance parallel DBMS on a cluster platform.

28

Improving Parallel System Performance with a NUMA-aware Load Balancer

Laércio Lima Pilla, +6 more

- 15 Jul 2011

TL;DR: This work proposes a NUMA-aware load balancer that combines information about the NUMA topology with the statistics captured by the Charm++ runtime system, and shows improvements over existing load balancing strategies both in benchmark performance and in the time spent on load balancing.

19

References

Journal Article•10.1109/99.660313

OpenMP: an industry standard API for shared-memory programming

L. Dagum, +1 more

- 01 Jan 1998

TL;DR: At its most elemental level, OpenMP is a set of compiler directives and callable runtime library routines that extend Fortran (and separately, C and C++) to express shared-memory parallelism, and leaves the base language unspecified.

3.8K

Journal Article•10.1006/JPDC.1996.0107

Cilk: An Efficient Multithreaded Runtime System

Robert D. Blumofe, +5 more

- 25 Aug 1996

- Journal of Parallel and Distributed Comp...

TL;DR: It is shown that on real and synthetic applications, the “work” and “critical-path length” of a Cilk computation can be used to model performance accurately, and it is proved that for the class of “fully strict” (well-structured) programs, the Cilk scheduler achieves space, time, and communication bounds all within a constant factor of optimal.

1.7K

Proceedings Article•10.1145/165854.165874

CHARM++: a portable concurrent object oriented system based on C++

Laxmikant V. Kale, +2 more

- 01 Oct 1993

TL;DR: Charm++ is an explicitly parallel language consisting of C++ with a few extensions that provides a clear separation between sequential and parallel objects and helps one write programs that are latency-tolerant.

1K

Proceedings Article•10.1145/248052.248106

Simple, fast, and practical non-blocking and blocking concurrent queue algorithms

Maged M. Michael, +1 more

- 01 May 1996

TL;DR: Experiments on a 12-node SGI Challenge multiprocessor indicate that the new non-blocking queue consistently outperforms the best known alternatives; it is the clear algorithm of choice for machines that provide a universal atomic primitive (e.g., compare_and_swap or load_linked/store_conditional).

1K

Proceedings Article

Proceedings of the 2007 ACM/IEEE conference on Supercomputing

Becky Verastegui

- 16 Nov 2007

TL;DR: An extraordinary technical program is in store for you, and a record number of Birds-of-a-Feather submissions will provide you with highlights on a wide variety of technology and software topics, as SC07 continues in the SC conference tradition.

347