TL;DR: If the queue lengths at both servers are observed, then the optimal decision is to route jobs to the shorter queue, whereas if the queue lengths are not observed, it is best to alternate between queues, provided the initial distribution of the two queue sizes is the same.
Abstract: As jobs arrive they have to be routed to one of two similar exponential servers. It is shown that if the queue lengths at both servers are observed, then the optimal decision is to route jobs to the shorter queue, whereas if the queue lengths are not observed, it is best to alternate between queues, provided the initial distribution of the two queue sizes is the same. The optimality of these routing strategies is independent of the statistics of the job arrivals.
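The two routing rules described in this abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the paper's model or proof: the function names, the Poisson arrival process, the rate values, and the use of mean time in system as the performance measure are all hypothetical choices made for illustration.

```python
import random


def route_shortest(queue_lengths):
    """Observed case: send the job to the shorter queue (ties go to queue 0)."""
    return 0 if queue_lengths[0] <= queue_lengths[1] else 1


def route_alternate(job_index):
    """Unobserved case: alternate between the two queues."""
    return job_index % 2


def simulate(policy, n_jobs=20000, arrival_rate=1.5, service_rate=1.0, seed=0):
    """Return the mean time in system over n_jobs for "shortest" or "alternate".

    Assumptions (not from the abstract): Poisson arrivals, both queues
    initially empty, FIFO service at each of two identical exponential servers.
    """
    rng = random.Random(seed)
    now = 0.0
    # For each server, the departure times of the jobs still present (increasing).
    departures = [[], []]
    total_time = 0.0
    for job_index in range(n_jobs):
        now += rng.expovariate(arrival_rate)
        # Remove jobs that have already left by the time of this arrival.
        for queue in departures:
            while queue and queue[0] <= now:
                queue.pop(0)
        if policy == "shortest":
            k = route_shortest([len(departures[0]), len(departures[1])])
        else:
            k = route_alternate(job_index)
        # Service starts when the last job ahead in this queue departs.
        start = departures[k][-1] if departures[k] else now
        finish = start + rng.expovariate(service_rate)
        departures[k].append(finish)
        total_time += finish - now
    return total_time / n_jobs


if __name__ == "__main__":
    for policy in ("shortest", "alternate"):
        print(policy, simulate(policy))
```

Both policies consume the random stream in the same order, so they see the same arrival and service times, which makes the comparison between the two runs less noisy.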
TL;DR: The typical batch queuing system schedules jobs for execution by a set of queue controls, which limits the set of scheduling policies available to a site.
Abstract: The typical batch queuing system schedules jobs for execution by a set of queue controls. The controls determine the queue from which jobs will be selected. Within each queue, jobs are typically selected in first-in, first-out (FIFO) order. This limits the set of scheduling policies available to a site.
TL;DR: It is argued that by explicitly identifying the assumptions that job schedulers for parallel supercomputers make, it is possible to reach a level of convergence in the space of such schedulers, for example by associating a suitable cost function with the execution of each job.
Abstract: The space of job schedulers for parallel supercomputers is rather fragmented, because different researchers tend to make different assumptions about the goals of the scheduler, the information that is available about the workload, and the operations that the scheduler may perform. We argue that by identifying these assumptions explicitly, it is possible to reach a level of convergence. For example, it is possible to unite most of the different assumptions into a common framework by associating a suitable cost function with the execution of each job. The cost function reflects knowledge about the job and the degree to which it fits the goals of the system. Given such cost functions, scheduling is done to maximize the system's profit.
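One way to picture scheduling against per-job cost functions is a toy single-machine example. Everything specific here is an assumption made for illustration and does not come from the abstract: the linear-decay profit (a fixed value minus a per-time-unit penalty on completion time), the names `total_profit` and `best_order`, and the brute-force search over orderings, which is only practical for a handful of jobs.

```python
from itertools import permutations


def total_profit(order, jobs):
    """Sum the profit of running the jobs back to back in the given order.

    jobs maps a job name to (runtime, value, decay). The hypothetical profit
    of a job is value - decay * completion_time.
    """
    clock = 0.0
    profit = 0.0
    for name in order:
        runtime, value, decay = jobs[name]
        clock += runtime
        profit += value - decay * clock
    return profit


def best_order(jobs):
    """Return the ordering (a tuple of job names) with the highest total profit."""
    return max(permutations(jobs), key=lambda order: total_profit(order, jobs))


if __name__ == "__main__":
    jobs = {"a": (2, 10, 1), "b": (1, 10, 3)}
    order = best_order(jobs)
    print(order, total_profit(order, jobs))
```

In this sketch the job whose value decays faster is worth running first, which shows how a cost function can encode both knowledge about the job and the goals of the system.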
TL;DR: An evaluation of five releasing mechanisms and four dispatching rules under various levels of aggregate due-date tightness, shop cost structure, and machine utilization using simulation to demonstrate the interactive nature of releasing and dispatching on shop performance.
Abstract: Controlling the flow of material on the shop floor involves releasing and dispatching jobs to meet customer due-date requirements while attempting to keep operating costs low. This report presents an evaluation of five releasing mechanisms and four dispatching rules under various levels of aggregate due-date tightness, shop cost structure, and machine utilization using simulation. The performance criteria of total shop cost, jobs on shop floor, deviation from due dates, and job queue time are collected to demonstrate the interactive nature of releasing and dispatching on shop performance.
TL;DR: A system for allocating resources in shared data and compute clusters that improves MapReduce job scheduling in three ways; it relies on a proportional share mechanism that continuously allocates virtual machine resources, and it automatically detects and eliminates bottlenecks within a job.
Abstract: We present a system for allocating resources in shared data and compute clusters that improves MapReduce job scheduling in three ways. First, the system uses regulated and user-assigned priorities to offer different service levels to jobs and users over time. Second, the system dynamically adjusts resource allocations to fit the requirements of different job stages. Finally, the system automatically detects and eliminates bottlenecks within a job. We show experimentally using real applications that users can optimize not only job execution time but also the cost-benefit ratio or prioritization efficiency of a job using these three strategies. Our approach relies on a proportional share mechanism that continuously allocates virtual machine resources. Our experimental results show an 11-31% improvement in completion time and a 4-187% improvement in prioritization efficiency for different classes of MapReduce jobs. We further show that delay-intolerant users gain even more from our system.
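The basic idea of a proportional share mechanism can be sketched in a few lines. This is a generic illustration, not the system described in the abstract: the function name `proportional_share`, the user names, and the idea of expressing priorities as plain numeric weights are assumptions made for the example.

```python
def proportional_share(capacity, priorities):
    """Split capacity among users in proportion to their priorities.

    priorities maps a (hypothetical) user name to a positive weight.
    Returns a dict mapping each user to that user's share of capacity.
    """
    total = sum(priorities.values())
    if total <= 0:
        raise ValueError("priorities must sum to a positive number")
    return {user: capacity * weight / total for user, weight in priorities.items()}


if __name__ == "__main__":
    # Re-running the allocation whenever priorities change approximates
    # a continuously updated share.
    print(proportional_share(100, {"alice": 1, "bob": 3}))
    print(proportional_share(100, {"alice": 2, "bob": 2}))
```

Raising a weight increases that user's share only at the expense of the others, which is the sense in which priorities can offer different service levels over time.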