TL;DR: A computer system allocates processor time to multiple users by spending little processor time on users that are waiting for an I/O operation to complete, while expediting allocation of a processor to those users once their respective I/O operations complete.
Abstract: A computer system allocates processor time to multiple users. A systems operator or other administrator specifies to the computer a share of processor time for each user. The share can be absolute or relative. The system executes users which are I/O bound with processor time less than their respective, specified share(s) by not using significant processor time on users which are waiting for an I/O operation to complete but expediting allocation of a processor to users after their respective I/O operations complete. The system executes users which are CPU bound with processor time greater than their respective, specified share(s) based on their respective shares in relation to a sum of shares of other CPU bound users but excluding shares of I/O bound users. For each user there is also a specified “soft limit”, “hard limit” or “no limit”. When any hard limit user reaches its hard limit, no further allocation is made. When any soft limit user reaches its soft limit, no further allocation is made to this soft limit user if any other soft limit user has yet to reach its soft limit or any other hard limit user has yet to reach its hard limit or if there are any unlimited users which can use more processor time.
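The share rule for CPU bound users described in the abstract above can be illustrated with a minimal sketch. It is not the patented method: the function name `cpu_bound_allocation`, the `available` parameter (the fraction of processor time left over after I/O bound users have run), and the example shares are all assumptions made for illustration, and the soft and hard limits are deliberately left out.

```python
def cpu_bound_allocation(shares, io_bound, available=1.0):
    """Split `available` processor time among CPU bound users.

    shares:    dict mapping user name -> specified share (any positive unit)
    io_bound:  set of user names currently classified as I/O bound
    available: fraction of processor time left for CPU bound users
               (hypothetical parameter, not from the source)
    """
    # Shares of I/O bound users are excluded from the normalization.
    cpu_shares = {u: s for u, s in shares.items() if u not in io_bound}
    total = sum(cpu_shares.values())
    if total == 0:
        return {}
    # Each CPU bound user gets time in proportion to its share relative
    # to the sum of shares of all CPU bound users.
    return {u: available * s / total for u, s in cpu_shares.items()}


if __name__ == "__main__":
    shares = {"a": 50, "b": 25, "c": 25}
    print(cpu_bound_allocation(shares, io_bound={"c"}))
```

With user `c` I/O bound, users `a` and `b` divide the processor in a 2:1 ratio, so each receives more than its nominal 50% or 25% share, which matches the behavior the abstract describes for CPU bound users.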
TL;DR: The parallelization of a mesoscale-cloud-scale numerical weather prediction model and experiments conducted to assess its performance are described, showing a significant decrease in elapsed time and increase in problem size relative to a single-workstation run.
Abstract: This paper describes the parallelization of a mesoscale-cloud-scale numerical weather prediction model and experiments conducted to assess its performance. The model used is the Advanced Regional Prediction System (ARPS), a limited-area nonhydrostatic model suitable for cloud-scale and mesoscale studies. Because models such as ARPS are usually memory and CPU bound, the motivation here is to decrease the computer time required for running the model and/or increase the size of the problem that can be run. A domain decomposition strategy using a network of workstations produced a significant decrease in elapsed time and increase in problem size relative to a single-workstation run. The performance of the resulting program is described by derived formulas (collectively known as a performance model), which predict the execution time and speedup for different numbers of processors and problem sizes. The interprocessor communication speeds are shown to be the major obstacle to achieving full processor ...
TL;DR: A scheduling strategy named ELRAS, integrated with an autonomous replication scheme (ARS), is proposed to enhance data locality; it performs consistently in a heterogeneous environment, and its simplicity makes it feasible to adopt for a wide range of applications.
Abstract: MapReduce is a parallel programming model for processing data-intensive applications in a cloud environment. The scheduler greatly influences the performance of the MapReduce model when it is used in a heterogeneous cluster environment. The dynamic nature of the cluster environment and of computing workloads affects the execution time and computational resource usage in the scheduling process. Further, data locality is essential for reducing total job execution time and cross-rack communication and for improving throughput. In the present work, a scheduling strategy named efficient locality and replica aware scheduling (ELRAS), integrated with an autonomous replication scheme (ARS), is proposed to enhance data locality and to perform consistently in a heterogeneous environment. ARS autonomously decides which data object to replicate by considering its popularity and removes the replica when it is idle. The proposed approach is validated in a heterogeneous cluster environment with various realistic applications that are I/O bound, CPU bound and mixed workloads. ELRAS improves the throughput by a factor of about 2 compared with the existing FIFO scheduler; it also yields near-optimal data locality, reduces the execution time, and utilizes resources effectively. The simplicity of the ELRAS algorithm makes it feasible to adopt for a wide range of applications.
TL;DR: It is found that interprocessor communication and fault tolerance affect the response time of communication-extensive (I/O bound) programs more than that of computation-extensive (CPU bound) programs.
TL;DR: A self-training load-balancing algorithm is proposed that uses load information (CPU load, memory usage and network traffic) to decide the load of each node and combines it with properties of each job (CPU bound, memory bound and I/O bound features) extracted from previous runs of these jobs.
Abstract: This paper proposes a self-training algorithm for load balancing in cluster computing. It uses load information including CPU load, memory usage and network traffic to decide the load of each node and combines this information with properties of each job, including CPU bound, memory bound and I/O bound features, that are extracted from the previous runs of these jobs. The proposed algorithm is compared to another algorithm that only uses the load information of each node for the purpose of load balancing. The performance evaluation results show that the proposed algorithm performs well.
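The idea of combining per-node load with per-job properties can be sketched as follows. This is only an illustration under stated assumptions, not the algorithm from the paper: the function `pick_node`, the weighted-sum cost, the normalization of loads to the range 0 to 1, and the mapping of I/O bound behavior to the network-traffic metric are all hypothetical, and the self-training step that learns job profiles from previous runs is not shown.

```python
def pick_node(node_loads, job_profile):
    """Return the node with the lowest profile-weighted load.

    node_loads:  dict mapping node name -> {"cpu": x, "mem": y, "net": z},
                 each value a load in [0, 1] (assumed normalization)
    job_profile: dict with the same keys, giving how strongly the job
                 depends on each resource (assumed to come from earlier runs)
    """
    def cost(load):
        # A resource the job depends on heavily contributes more to the cost.
        return sum(job_profile[k] * load[k] for k in job_profile)

    return min(node_loads, key=lambda name: cost(node_loads[name]))


if __name__ == "__main__":
    nodes = {
        "n1": {"cpu": 0.9, "mem": 0.2, "net": 0.1},
        "n2": {"cpu": 0.2, "mem": 0.3, "net": 0.8},
    }
    cpu_job = {"cpu": 0.8, "mem": 0.1, "net": 0.1}
    io_job = {"cpu": 0.1, "mem": 0.1, "net": 0.8}
    print(pick_node(nodes, cpu_job), pick_node(nodes, io_job))
```

In this example the CPU bound job is sent to the node with idle CPU and the I/O bound job to the node with idle network capacity. A balancer that looked only at a single aggregate load figure per node could not make that distinction.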