TL;DR: A model to evaluate the performance of different combinations of synchronization mechanisms and scheduling policies leads to the conclusion that gang scheduling is required for efficient fine-grain synchronization on multiprogrammed multiprocessors.
TL;DR: This paper uses detailed simulation studies to evaluate the performance of several different scheduling strategies, and shows that in situations where the number of processes exceeds the number of processors, regular priority-based scheduling in conjunction with busy-waiting synchronization primitives results in extremely poor processor utilization.
Abstract: Shared-memory multiprocessors are frequently used as compute servers with multiple parallel applications executing at the same time. In such environments, the efficiency of a parallel application can be significantly affected by the operating system scheduling policy. In this paper, we use detailed simulation studies to evaluate the performance of several different scheduling strategies. These include regular priority scheduling, coscheduling or gang scheduling, process control with processor partitioning, handoff scheduling, and affinity-based scheduling. We also explore tradeoffs between the use of busy-waiting and blocking synchronization primitives and their interactions with the scheduling strategies. Since effective use of caches is essential to achieving high performance, a key focus is on the impact of the scheduling strategies on the caching behavior of the applications. Our results show that in situations where the number of processes exceeds the number of processors, regular priority-based scheduling in conjunction with busy-waiting synchronization primitives results in extremely poor processor utilization. In such situations, use of blocking synchronization primitives can significantly improve performance. Process control and gang scheduling strategies are shown to offer the highest performance, and their performance is relatively independent of the synchronization method used. However, for applications that have sizable working sets that fit into the cache, process control performs better than gang scheduling. For the applications considered, the performance gains due to handoff scheduling and processor affinity are shown to be small.
TL;DR: A description is given of a novel design, using a hierarchy of controllers, that effectively controls a multiuser, multiprogrammed parallel system that allows dynamic repartitioning according to changing job requirements.
Abstract: A description is given of a novel design, using a hierarchy of controllers, that effectively controls a multiuser, multiprogrammed parallel system. Such a structure allows dynamic repartitioning according to changing job requirements. The design goals are examined, and the principles of distributed hierarchical control are presented. Control over processors is discussed. Mapping and load balancing with distributed hierarchical control are considered. Support for gang scheduling as well as availability and fault tolerance is addressed. The use of distributed hierarchical control in memory management and I/O is discussed.
TL;DR: This paper presents an integrated strategy that combines backfilling with gang scheduling. Extensive simulations based on detailed models of realistic workloads clearly demonstrate the benefits of the combination over a spectrum of performance criteria.
Abstract: Two different approaches have been commonly used to address problems associated with space-sharing scheduling strategies: (a) augmenting space sharing with backfilling, which performs out-of-order job scheduling; and (b) augmenting space sharing with time sharing, using a technique called coscheduling or gang scheduling. With three important experimental results (impact of priority queue order on backfilling, impact of overestimation of job execution times, and comparison of scheduling techniques), this paper presents an integrated strategy that combines backfilling with gang scheduling. Using extensive simulations based on detailed models of realistic workloads, the benefits of combining backfilling and gang scheduling are clearly demonstrated over a spectrum of performance criteria.