Journal Article, DOI: 10.1007/S11227-008-0204-2
Performance-based parallel application toolkit for high-performance clusters
Kuan-Ching Li,Tien-Hsiung Weng +1 more
12
TL;DR: This paper introduces the design rationale and implementation of an effective toolkit for performance measurement and analysis of parallel applications in cluster environments; the toolkit not only generates a timing graph representation of a parallel application but also provides performance data charts of the application's execution.
Abstract: Advances in computer technology, together with the rapid emergence of multicore processors, have made many-core personal computers available and more affordable. Networks of workstations and clusters of many-core SMPs have become an attractive solution for high-performance computing, providing computational power equal or superior to that of supercomputers or mainframes at an affordable cost using commodity components. Finding alternative ways to extract unused and idle computing power from these resources to improve overall performance, and fully utilizing the underlying new hardware platforms, are major topics in this field of research. This paper introduces the design rationale and implementation of an effective toolkit for performance measurement and analysis of parallel applications in cluster environments; the toolkit not only generates a timing graph representation of a parallel application but also provides performance data charts of the application's execution. The goal in developing this toolkit is to give application developers a better understanding of the application's behavior across the computing nodes selected for a particular execution. Additionally, the results of multiple executions of a given application under development can be combined and overlaid, permitting application developers to perform "what-if" analysis, i.e., to understand more deeply how the allocated computational resources are utilized. Experiments using this toolkit have shown its effectiveness in the development and performance tuning of parallel applications, and extend its use to the teaching of message-passing and shared-memory parallel programming courses.
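The "what-if" comparison of multiple executions described in the abstract can be illustrated with a minimal sketch. This is not the toolkit's implementation: the trace record layout `(node, kind, start_s, end_s)`, the function names `utilization` and `overlay`, and the sample data are all assumptions made for illustration. It computes, for each node, the fraction of wall time spent computing, then reports how that fraction changes between two runs.

```python
from collections import defaultdict

# Hypothetical trace record: (node, kind, start_s, end_s),
# where kind is "compute" or "comm". Not the toolkit's actual format.

def utilization(trace):
    """Return {node: fraction of the run's wall time spent in compute}."""
    wall = max(end for _, _, _, end in trace) - min(start for _, _, start, _ in trace)
    busy = defaultdict(float)
    for node, kind, start, end in trace:
        # Adding 0.0 for non-compute events keeps communication-only nodes in the result.
        busy[node] += (end - start) if kind == "compute" else 0.0
    return {node: t / wall for node, t in busy.items()}

def overlay(run_a, run_b):
    """Per-node change in compute utilization going from run_a to run_b."""
    ua, ub = utilization(run_a), utilization(run_b)
    return {n: ub.get(n, 0.0) - ua.get(n, 0.0) for n in sorted(set(ua) | set(ub))}

# Made-up sample data: run_b rebalances work from node1 to node0.
run_a = [("node0", "compute", 0, 4), ("node0", "comm", 4, 10),
         ("node1", "compute", 0, 8), ("node1", "comm", 8, 10)]
run_b = [("node0", "compute", 0, 6), ("node0", "comm", 6, 8),
         ("node1", "compute", 0, 6), ("node1", "comm", 6, 8)]

if __name__ == "__main__":
    for node, delta in overlay(run_a, run_b).items():
        print(f"{node}: utilization change {delta:+.2f}")
```

A positive change for a node means it spent a larger share of the run computing in the second execution. The sketch assumes each trace spans a nonzero wall time.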
Citations
An energy-efficient task migration scheme based on genetic algorithms for mobile applications in CloneCloud
TL;DR: This paper constructs a scheduling problem of task migration as a constrained stochastic shortest path problem in a directed acyclic graph and designs a scheduling algorithm based on genetic algorithm to obtain the optimal task migrations.
11
XtratuM/PPC: a hypervisor for partitioned system on PowerPC processors
TL;DR: XtratuM, a real-time hypervisor designed and implemented based on the concept of a partitioned system, is introduced by enabling partitions to execute simultaneously in spatial and temporal isolation without interfering with each other, but sharing the same hardware.
8
Designing Parallel Sparse Matrix Transposition Algorithm Using CSR for GPUs
Tien-Hsiung Weng,Hoa Pham,Hai Jiang,Kuan-Ching Li +3 more
- 01 Jan 2013
TL;DR: This paper presents a parallel algorithm for sparse matrix transposition using the CSR format on many-core GPUs, utilizing the tremendous computational power and memory bandwidth of the GPU through parallel programming in CUDA.
7
Parallel Matrix Transposition and Vector Multiplication Using OpenMP
Tien-Hsiung Weng,Delgerdalai Batjargal,Hoa Pham,Meng-Yen Hsieh,Kuan-Ching Li +4 more
- 01 Jan 2013
TL;DR: Experimental results show that the actual matrix transposition algorithm is comparable to the CSB-based algorithm; on the other hand, direct sparse matrix-transpose-vector multiplication using CSR significantly outperforms the CSB-based algorithm.
5
Designing Parallel Sparse Matrix Transposition Algorithm Using ELLPACK-R for GPUs
Song Guo,Yong Dou,Yuanwu Lei,Qiang Wang,Fei Xia,Jianning Chen +5 more
- 18 Oct 2015
TL;DR: Experimental results show that the proposed parallel algorithm for sparse matrix transposition using the ELLPACK-R format runs up to 8x faster on an Nvidia Tesla C2070 than the implementation on an Intel Xeon E5-2650 CPU.
4
References
•Book
Using MPI: Portable Parallel Programming with the Message-Passing Interface
William Gropp,Ewing Lusk,Anthony Skjellum +2 more
- 01 Jan 1994
TL;DR: Using MPI provides a thoroughly updated guide to the MPI (Message-Passing Interface) standard library for writing programs for parallel computers, including a comparison of MPI with sockets.
2.9K
A high-performance, portable implementation of the MPI message passing interface standard
William Gropp,Ewing Lusk,Nathan E. Doss,Anthony Skjellum +3 more
- 01 Sep 1996
TL;DR: The Message Passing Interface (MPI) is a standard library for message passing that was defined by the MPI Forum, a broadly based group of parallel computer vendors, library writers, and applications specialists.
2.4K
PVM: Parallel virtual machine: a users' guide and tutorial for networked parallel computing
TL;DR: Covers the PVM system for heterogeneous network computing: trends in distributed computing, a PVM overview, other packages, and troubleshooting (getting PVM installed, getting PVM running, compiling applications, running applications, debugging and tracing, debugging the system).
2.1K
Portable implementation of the MPI message passing interface standard
William Gropp,Ewing Lusk,Nathan E. Doss,A. Skjellum +3 more
- 01 Jan 1996
TL;DR: The Message Passing Interface (MPI) is a standard library for message passing that was defined by the MPI Forum, a broadly based group of parallel computer vendors, library writers, and applications specialists.
2K