International Parallel Processing Symposium

Conference Tools

Papers published on a yearly basis

Papers

Proceedings Article•

A resource management architecture for metacomputing systems.

Krzysztof Czajkowski, Ian Foster, Nicholas T. Karonis, Carl Kesselman, Stuart Martin, Warren Smith, Steven Tuecke

24 Aug 1999

TL;DR: This work describes a resource management architecture that distributes the resource management problem among distinct local manager, resource broker, and resource co-allocator components and defines an extensible resource specification language to exchange information about requirements.

Abstract: Metacomputing systems are intended to support remote and/or concurrent use of geographically distributed computational resources. Resource management in such systems is complicated by five concerns that do not typically arise in other situations: site autonomy and heterogeneous substrates at the resources, and application requirements for policy extensibility, co-allocation, and online control. We describe a resource management architecture that addresses these concerns. This architecture distributes the resource management problem among distinct local manager, resource broker, and resource co-allocator components and defines an extensible resource specification language to exchange information about requirements. We describe how these techniques have been implemented in the context of the Globus metacomputing toolkit and used to implement a variety of different resource management strategies. We report on our experiences applying our techniques in a large testbed, GUSTO, incorporating 15 sites, 330 computers, and 3600 processors.

849 citations

Proceedings Article•

Scheduling with advanced reservations

Warren Smith, Ian Foster¹, Valerie Taylor¹•Institutions (1)

Argonne National Laboratory¹

1 Jan 2000

TL;DR: This work proposes and evaluates several algorithms for supporting advanced reservation of resources in supercomputing scheduling systems and finds that the wait times of applications submitted to the queue increase when reservations are supported, and that the increase depends on how reservations are supported.

Abstract: Some computational grid applications have very large resource requirements and need simultaneous access to resources from more than one parallel computer. Current scheduling systems do not provide mechanisms to gain such simultaneous access without the help of human administrators of the computer systems. In this work, we propose and evaluate several algorithms for supporting advanced reservation of resources in supercomputing scheduling systems. These advanced reservations allow users to request resources from scheduling systems at specific times. We find that the wait times of applications submitted to the queue increase when reservations are supported and the increase depends on how reservations are supported. Further, we find that the best performance is achieved when we assume that applications can be terminated and restarted, backfilling is performed, and relatively accurate run-time predictions are used.
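To make the idea of requesting resources at specific times concrete, here is a minimal sketch of an admission check for one reservation request. It is not one of the algorithms evaluated in the paper; the function name `can_reserve`, the tuple layout `(start, end, nodes)`, and the half-open time intervals are all assumptions made for illustration.

```python
def can_reserve(total_nodes, reservations, start, end, nodes):
    """Return True if `nodes` are free during the whole window [start, end).

    `reservations` is a list of (start, end, nodes) tuples with half-open
    intervals. This layout is hypothetical, chosen only for the sketch.
    """
    # Committed usage is piecewise constant and only rises when an existing
    # reservation begins, so it is enough to check the window start plus
    # every existing start time that falls inside the window.
    check_points = [start] + [s for (s, _e, _n) in reservations if start < s < end]
    for t in check_points:
        used = sum(n for (s, e, n) in reservations if s <= t < e)
        if used + nodes > total_nodes:
            return False
    return True


# Example: a 16-node machine with two reservations already accepted.
existing = [(0, 10, 8), (5, 15, 4)]
print(can_reserve(16, existing, 5, 10, 4))   # exactly fills the machine
print(can_reserve(16, existing, 5, 10, 5))   # one node too many
```

A real scheduler would also have to decide what happens to queued jobs that overlap an accepted reservation, which is where the abstract's assumptions about restartable applications, backfilling, and run-time predictions come in.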

311 citations

Book Chapter•10.1007/BFB0097937•

ARMCI: A Portable Remote Memory Copy Library for Distributed Array Libraries and Compiler Run-Time Systems

Jarek Nieplocha¹, Bryan Carpenter²•Institutions (2)

Pacific Northwest National Laboratory¹, Syracuse University²

12 Apr 1999

TL;DR: ARMCI provides one-sided communication capabilities for distributed array libraries and compiler run-time systems and supports remote memory copy, accumulate, and synchronization operations optimized for non-contiguous data transfers including strided and generalized UNIX I/O vector interfaces.

Abstract: This paper introduces a new portable communication library called ARMCI. ARMCI provides one-sided communication capabilities for distributed array libraries and compiler run-time systems. It supports remote memory copy, accumulate, and synchronization operations optimized for non-contiguous data transfers including strided and generalized UNIX I/O vector interfaces. The library has been employed in the Global Arrays shared memory programming toolkit and Adlib, a Parallel Compiler Run-time Consortium run-time system.
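The strided, non-contiguous transfers mentioned in the abstract can be pictured with a small local sketch. This is not ARMCI's API: the functions `strided_get` and `strided_accumulate` and their parameters are hypothetical, and ordinary Python lists stand in for remote memory. The point is only that a 2-D patch of a row-major array is a sequence of equal-length contiguous blocks separated by a fixed stride.

```python
def strided_get(src, offset, stride, count, nblocks):
    """Gather `nblocks` runs of `count` elements, `stride` apart, into one list."""
    out = []
    for b in range(nblocks):
        begin = offset + b * stride
        out.extend(src[begin:begin + count])
    return out


def strided_accumulate(dst, offset, stride, count, nblocks, data, scale=1):
    """Add scale * data into the strided region of `dst`, in place."""
    for b in range(nblocks):
        begin = offset + b * stride
        for i in range(count):
            dst[begin + i] += scale * data[b * count + i]


# A 4 x 5 row-major matrix stored flat; take rows 1..2, columns 1..3.
matrix = list(range(20))
patch = strided_get(matrix, offset=1 * 5 + 1, stride=5, count=3, nblocks=2)
print(patch)
```

Describing the patch once as (offset, stride, count, number of blocks) lets a library move it in a single operation instead of one call per row, which is the kind of optimization the abstract refers to.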

284 citations

Proceedings Article•10.1109/IPPS.1994.288305•

Queue locks on cache coherent multiprocessors

Peter S. Magnusson¹, Anders Landin¹, Erik Hagersten•Institutions (1)

Swedish Institute of Computer Science¹

1 Apr 1994

TL;DR: This work presents a method to characterize the performance of proposed queue lock algorithms, applies it to previously published algorithms, and concludes that the M lock is the best overall queue lock for the class of architectures studied.

Abstract: Large-scale shared-memory multiprocessors typically have long latencies for remote data accesses. A key issue for execution performance of many common applications is the synchronization cost. The communication scalability of synchronization has been improved by the introduction of queue-based spin-locks instead of Test&(Test&Set). For architectures with long access latencies for global data, attention should also be paid to the number of global accesses that are involved in synchronization. We present a method to characterize the performance of proposed queue lock algorithms, and apply it to previously published algorithms. We also present two new queue locks, the LH lock and the M lock. We compare the locks in terms of performance, memory requirements, code size and required hardware support. The LH lock is the simplest of all the locks, yet requires only an atomic swap operation. The M lock is superior in terms of global accesses needed to perform synchronization and still competitive in all other criteria. We conclude that the M lock is the best overall queue lock for the class of architectures studied.
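As a rough illustration of how a queue lock can be built from nothing more than an atomic swap, the sketch below has each waiter swap its own node into a shared tail pointer and then spin on its predecessor's flag. It follows the general shape of swap-based queue locks and should not be read as the paper's exact LH or M lock. Python has no hardware swap instruction, so `_swap_tail` emulates one with a small internal mutex; that helper, the class name `SwapQueueLock`, and `run_demo` are assumptions for this example.

```python
import threading
import time


class _Node:
    """Queue entry; `locked` stays True while its owner holds or awaits the lock."""
    def __init__(self, locked=False):
        self.locked = locked


class SwapQueueLock:
    def __init__(self):
        self._tail = _Node(locked=False)       # dummy node: lock starts free
        self._swap_guard = threading.Lock()    # stands in for a hardware swap
        self._local = threading.local()        # per-thread node bookkeeping

    def _swap_tail(self, new_node):
        # Emulated atomic swap: install new_node, return the previous tail.
        with self._swap_guard:
            old, self._tail = self._tail, new_node
        return old

    def acquire(self):
        node = getattr(self._local, "node", None) or _Node()
        node.locked = True
        pred = self._swap_tail(node)           # join the queue in one step
        while pred.locked:                     # spin on the predecessor only
            time.sleep(0)                      # yield so other threads can run
        self._local.node, self._local.pred = node, pred

    def release(self):
        node, pred = self._local.node, self._local.pred
        self._local.node = pred                # recycle the predecessor's node
        node.locked = False                    # hand the lock to the successor


def run_demo(threads=4, rounds=200):
    """Increment a shared counter under the lock and return the final value."""
    lock, total = SwapQueueLock(), [0]

    def worker():
        for _ in range(rounds):
            lock.acquire()
            total[0] += 1
            lock.release()

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return total[0]
```

Because each waiter spins on a different node, a release touches only the one flag its successor is watching, which is what lets queue locks avoid the shared-variable traffic of Test&(Test&Set).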

227 citations

Proceedings Article•10.1109/IPPS.1997.580853•

k-ary n-trees: high performance networks for massively parallel architectures

Fabrizio Petrini, M. Vanneschi

1 Apr 1997

TL;DR: The experimental results show that the uniform, bit reversal and transpose traffic patterns are very sensitive to the flow control strategy, and complement traffic reaches an optimal performance, with a saturation point at 97% of the capacity for all flow control strategies.

Abstract: The past few years have seen a rise in popularity of massively parallel architectures that use fat-trees as their interconnection networks. In this paper we study the communication performance of a parametric family of fat-trees, the k-ary n-trees, built with constant arity switches interconnected in a regular topology. Through simulation on a 4-ary 4-tree with 256 nodes, we analyze some variants of an adaptive algorithm that utilize wormhole routing with one, two and four virtual channels. The experimental results show that the uniform, bit reversal and transpose traffic patterns are very sensitive to the flow control strategy. In all these cases, the saturation points are between 35-40% of the network capacity with one virtual channel, 55-60% with two virtual channels and around 75% with four virtual channels. The complement traffic, a representative of the class of the congestion-free communication patterns, reaches an optimal performance, with a saturation point at 97% of the capacity for all flow control strategies.
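The figures in the abstract can be checked with a little arithmetic. The sketch below assumes the usual counting for a k-ary n-tree, k^n processing nodes and n·k^(n-1) switches (the abstract itself states only the 256-node figure), and the common textbook definitions of the complement, bit reversal, and transpose permutations on a node's binary address. All function names are hypothetical.

```python
def kary_ntree_size(k, n):
    """Node and switch counts under the assumed k-ary n-tree counting."""
    return {"processing_nodes": k ** n, "switches": n * k ** (n - 1)}


def complement(src, bits):
    """Destination is the bitwise complement of the source address."""
    return ~src & ((1 << bits) - 1)


def bit_reversal(src, bits):
    """Destination is the source address with its bits in reverse order."""
    return int(format(src, f"0{bits}b")[::-1], 2)


def transpose(src, bits):
    """Destination swaps the upper and lower halves (assumes `bits` is even)."""
    half = bits // 2
    low = src & ((1 << half) - 1)
    high = src >> half
    return (low << half) | high


size = kary_ntree_size(4, 4)
print(size["processing_nodes"])      # 4**4 processing nodes
bits = 8                             # 256 nodes need 8 address bits
print(complement(0b00001111, bits), bit_reversal(0b00000001, bits),
      transpose(0b00010010, bits))
```

Each of the three patterns is a permutation, so every node sends to exactly one destination and receives from exactly one source; they differ in how much their paths contend for the same links.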

220 citations

Year	Papers
2000	6
1999	244
1998	118
1997	116
1996	9
1995	120
