TL;DR: The hierarchical architecture and weighted resource selection of the WRD algorithm improve resource inspection, flexibility, and high-bandwidth RD, and combining WRD with resource information tables boosts the efficiency and efficacy of grid computing RD.
Abstract: In general, parallel applications require lots of computer power and grid computing. Efficient Resource Discovery (RD) algorithms determine grid resource allocation and execution time. To improve the resource distribution and reduce grid node communication costs, this study introduces hierarchical, weighted RD (WRD). A behavioral modeling technique checks the algorithm’s accuracy and efficacy. For complete analysis and simulation, StarUML implements a WRD algorithm behavioral model. The NuSMV model checker evaluates reachability and deadlock-free RD. The WRD algorithm is assessed using key performance metrics. To evaluate resource-finding efficiency, each request’s inspected nodes are counted. The number of re-discovery operations shows the algorithm’s resource flexibility. Algorithms that find free resources with high bandwidth links are also evaluated to optimize grid resource allocation. Resource information tables could improve resource location. Resource information is stored in a table to help the algorithm allocate resources. This research seeks to develop grid computing by solving RD problems. The hierarchical architecture and weighted resource selection of the WRD algorithm improve resource inspection, flexibility, and high-bandwidth RD. Behavioral modeling and verification show the algorithm’s accuracy and grid suitability. WRD and resource information tables boost grid computing RD efficiency and efficacy. This research optimizes grid performance and resource allocation.
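As an illustration only, the idea of weighted selection in a hierarchy, with inspected nodes counted per request, might be sketched as below. The abstract does not give the WRD weighting rule or data model, so the `Node` class, the use of link bandwidth as the sole weight, and the `discover` function are all assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """A grid node in a hypothetical hierarchy (not the paper's data model)."""
    name: str
    free: bool = False        # True if the node can serve the request
    bandwidth: float = 0.0    # link bandwidth to the parent, arbitrary units
    children: list["Node"] = field(default_factory=list)


def discover(root: Node) -> tuple[Optional[Node], int]:
    """Depth-first search that visits higher-bandwidth children first.

    Returns the first free node found and the number of nodes inspected,
    which mirrors the "inspected nodes per request" metric in the abstract.
    """
    inspected = 0
    stack = [root]
    while stack:
        node = stack.pop()
        inspected += 1
        if node.free:
            return node, inspected
        # Push children in ascending bandwidth so the highest is popped first.
        stack.extend(sorted(node.children, key=lambda c: c.bandwidth))
    return None, inspected


# Usage: two subtrees, the search follows the higher-bandwidth link first.
a = Node("a", bandwidth=10, children=[Node("a1", free=True, bandwidth=5)])
b = Node("b", bandwidth=100, children=[Node("b1", free=True, bandwidth=50)])
found, inspected = discover(Node("root", children=[a, b]))
```

In this toy hierarchy the search reaches `b1` after inspecting three nodes, because the branch with the faster link is explored before the slower one.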
TL;DR: The paper proposes an application of intelligent cloud computing technology in optical communication network security of smart grid to improve the security of smart grid optical communication network by increasing spectrum utilization, improving safe channel capacity, enhancing transmission stability, and accurately identifying intrusion factors.
Abstract: In order to improve the security of smart grid optical communication network, this paper combines intelligent cloud computing technology to build a security system of optical communication network of power grid. In the aspect of improving communication security performance, multi-user access is realized by non-orthogonal power domain segmentation, and different users use different powers to add and superimpose the same spectrum resources, so as to increase spectrum utilization. At the sending end, this paper improves the safe channel capacity of users by means of pre-coding and artificial noise, and realizes the safe transmission of information. In terms of transmission stability, the cloud computing platform is used as a data processing platform, and multiple nodes are processed synchronously through optical communication state identification, which can more effectively improve the speed of optical communication state identification data. In order to test the performance of the power grid information dispatching model designed in this paper in optimizing power grid configuration and improving power grid load, simulation experiments are carried out. Through the experimental analysis, we can see that the communication method proposed in this paper can accurately identify the intrusion factors, and can effectively improve the security of smart grid optical communication network.
Georges Aad, Erlend Aakvaag, B. Abbott, Sara Abdelhameed +2908 more
22 May 2024
TL;DR: The ATLAS Google Project successfully integrated cloud resources into the distributed computing infrastructure, providing additional flexible computing capacity. A total cost of ownership analysis identified network usage as the dominant cost driver and resource bursting as an effective cost control mechanism. Further investigations are underway to improve integration and reduce network-related costs.
Abstract: The ATLAS Google Project was established as part of an ongoing evaluation of the use of commercial clouds by the ATLAS Collaboration, in anticipation of the potential future adoption of such resources by WLCG grid sites to fulfil or complement their computing pledges. Seamless integration of Google cloud resources into the worldwide ATLAS distributed computing infrastructure was achieved at large scale and for an extended period of time, and hence cloud resources are shown to be an effective mechanism to provide additional, flexible computing capacity to ATLAS. For the first time a total cost of ownership analysis has been performed, to identify the dominant cost drivers and explore effective mechanisms for cost control. Network usage significantly impacts the costs of certain ATLAS workflows, underscoring the importance of implementing such mechanisms. Resource bursting has been successfully demonstrated, whilst exposing the true cost of this type of activity. A follow-up to the project is underway to investigate methods for improving the integration of cloud resources in data-intensive distributed computing environments and reducing costs related to network connectivity, which represents the primary expense when extensively utilising cloud resources.
Abstract: In scheduling, the main factor that affects searching speed and mapping performance is the number of resources, that is, the size of the search space. In grid computing, scheduler performance plays an essential role in overall performance, so there is an obvious need for a scalable scheduler that can manage a growing number of resources. Assuming that each resource has its own specifications and each job has its own requirements, searching the whole search space (all the resources) can waste a great deal of scheduling time. In this paper, we propose a two-phase scheduler that uses the min-min algorithm to speed up the mapping time with almost the same efficiency. The scheduler is also based on the assumption that the resources in grid computing can be classified into clusters. The scheduler first tries to schedule the jobs to a suitable cluster (the first phase), and then each cluster schedules the incoming jobs to suitable resources (the second phase). The scheduler is based on multidimensional QoS to enhance the mapping as much as possible. The simulation results show that the two-phase strategy can support a scalable scheduler.
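A minimal sketch of the two-phase idea, under stated assumptions: `min_min` is the standard min-min heuristic, while treating each cluster as a single resource with aggregate speed in the first phase is a simplification chosen here, not the paper's cluster-selection rule. The multidimensional QoS matching is omitted, and all names are hypothetical.

```python
def min_min(jobs, speeds):
    """Min-min heuristic.

    jobs:   {job_id: length}
    speeds: {resource_id: speed}
    Returns ({job_id: resource_id}, makespan).
    """
    ready = {r: 0.0 for r in speeds}   # time at which each resource is free
    pending = dict(jobs)
    mapping = {}
    while pending:
        # For every pending job, compute its completion time on each resource,
        # then pick the (job, resource) pair with the smallest completion time.
        finish, job, res = min(
            (ready[r] + length / speeds[r], j, r)
            for j, length in pending.items()
            for r in speeds
        )
        mapping[job] = res
        ready[res] = finish
        del pending[job]
    return mapping, max(ready.values(), default=0.0)


def two_phase(jobs, clusters):
    """clusters: {cluster_id: {resource_id: speed}}."""
    # Phase 1 (assumed rule): treat each cluster as one aggregate resource.
    aggregate = {c: sum(res.values()) for c, res in clusters.items()}
    to_cluster, _ = min_min(jobs, aggregate)
    # Phase 2: run min-min inside each cluster over only its own jobs.
    mapping = {}
    for c, resources in clusters.items():
        local = {j: jobs[j] for j, cl in to_cluster.items() if cl == c}
        local_map, _ = min_min(local, resources)
        mapping.update(local_map)
    return mapping
```

The point of the split is that the second phase searches only the resources of one cluster per job, rather than every resource in the grid.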
TL;DR: This study examines the stability impacts of interactions between grid-forming and grid-following converters in a hybrid system connected to a weak grid, analyzing self-influence and inter-influence components using Bode diagrams and modal analysis.
Abstract: This paper investigates the impacts of interactions between grid-forming and grid-following VSCs on the stability of hybrid system connected to weak grid. To begin with, the stability analysis model and the interaction analysis model of the hybrid VSCs system connected to the weak power grid are established. The self-influence and the inter-influence components are defined to quantify the interaction degree between grid-forming and grid-following VSCs. Then, combined with the Bode diagram and modal analysis method, the influence of the interaction between grid-forming and grid-following VSCs on the stability of system is analyzed by investigating self-influence and the inter-influence components with variations of operating points and control parameters. Finally, the above analysis results are verified by hardware-in-the-loop test platform.
TL;DR: This study presents a novel hybrid computational technique combining evolutionary programming and genetic algorithm to optimize hydrothermal scheduling with renewable energy integration, improving system performance, reducing costs, and minimizing environmental impact on the IEEE 30 bus system.
Abstract: Hydrothermal scheduling with the integration of renewable energy is a critical factor in the shift to sustainable power generation. Maximizing the efficiency and dependability of these conventional systems is essential to attaining energy sustainability objectives. This article presents a novel approach for scheduling of hydrothermal and renewable energy systems on the IEEE 30 bus system utilizing an enhanced hybrid computational technique by combining the evolutionary programming (EP) technique with genetic algorithm (GA). The method combines the benefits of genetic algorithms with powerful optimization mechanisms to successfully control the equality and inequality constraints in the systems. The suggested technique optimizes scheduling by considering various parameters such as energy consumption, resource availability, and environmental restrictions to improve system performance, decrease costs, and reduce the ecological effects. Simulations and case studies on the IEEE 30 bus system illustrate the EP-GA technique’s efficacy and resilience, highlighting its potential for practical deployment in real-world energy management settings.
Muhammad Ismail, Rakibuzzaman Shah, Nima Amjady, Syed Islam, Md. Sazal Miah
21 Jul 2024
TL;DR: This study examines the composite stability of a medium voltage distribution system in regional Australia, incorporating grid-forming and grid-following resources, and reveals significant variations in stability margin under different load conditions and control parameter heterogeneity.
Abstract: Modern power systems are shifting with significant penetration of inverter-based resources. Grid-forming (GFM) generators with grid-following (GFL) resources are anticipated to dominate next-generation power systems. Inverter-based resources (IBRs) are changing the dynamics of the distribution systems. Significant work has been done in the device-level solution of the GFM. However, the system-level studies in distribution systems are still limited. This paper reveals the composite stability behaviour (i.e., combined stability performance of the system with two or more stability characteristics) of a medium voltage (MV) distribution system with GFM and GFL resources. The MV network from regional Australia has been used for the study. Significant variations of the composite stability margin could be observed under the WECC composite load model in the MV distribution system. The control parameters' heterogeneity significantly impacts the composite stability of the distribution system considered for this study.
TL;DR: This paper proposes a hybrid grid control algorithm combining optimized genetic algorithm and spiking neural network controller to achieve optimal voltage control in a distributed grid, leveraging a neuromorphic system on chip with low power consumption and fast inference time.
Abstract: Due to the distributed nature of the electrical grid, intelligent, timely control of critical components such as voltage regulators, capacitor banks and switches in a highly dynamic environment is extremely challenging. Modern day localized controllers in a distribution system such as remote terminal units (RTUs) and programmable logic units (PLCs) are often housed in a substation with multiple connections to several critical controls and are thus good candidates for hosting advanced localized control via deep reinforcement learning (DRL) or evolutionary learning. Additionally, it is feasible to greatly reduce the computational throughput and provide fine-grained control measurement and control using a spiking neural network (SNN) neural processing unit (NPU) architecture. In this paper we explore the use of a hybrid grid control algorithm composed of an optimized genetic algorithm (GA) and SNN controller which acquires optimal distribution system voltage control policies while a system is experiencing a cyber adversary's injected control noise. We discuss the computational advantages of the GA-SNN hybrid algorithm in the fine-grained autonomous control of a grid's bus voltages using a commercially available neuromorphic system on chip system (NSoC) with a mean rated power consumption of 1 mW and observed mean prediction time of 4 msecs per inference.
KaiLun Eng, Abdullah Muhammed, Sazlinah Hasan, Mohamad Afendee Mohamed
1 Jul 2024
Abstract: The scheduling process plays a critical role in realising the ultimate goal of global collaborative resource sharing with Grid computing. However, scheduling in a Grid computing environment is a well-known NP-complete problem. In this study, we propose a new extension of the Great Deluge algorithm with an effective diversification strategy for the Grid scheduling problem. The proposed approach, namely BiGD, exploits two different decay rates (a linear and a non-linear decay rate of the water level) to provide a better diversification strategy for exploring the solution space. The performance of the proposed algorithm has been evaluated and compared with the standard Great Deluge and Extended Great Deluge algorithms through the GridSim simulation toolkit. Four different scheduling scenarios, which comprise different combinations of task heterogeneity and resource heterogeneity, are considered for the performance evaluation. Moreover, we have adapted all the algorithms to use the same total number of evaluations for solution searching in order to ensure a fair comparison in the performance evaluation. The experimental simulation results show that the proposed algorithm is superior and able to produce good quality solutions compared to the other algorithms in all the problem instances.
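To make the water-level mechanism concrete, here is a small Great Deluge sketch for a toy makespan problem. The abstract does not say how BiGD combines its linear and non-linear decay rates, so alternating them on even and odd iterations is purely an assumption, as are the parameter values and function names.

```python
import random


def makespan(assignment, lengths, n_machines):
    """Largest total load on any machine for a task-to-machine assignment."""
    loads = [0.0] * n_machines
    for task, machine in enumerate(assignment):
        loads[machine] += lengths[task]
    return max(loads)


def great_deluge(lengths, n_machines, iters=2000, seed=0):
    """Minimise makespan with a Great Deluge search using two decay rates."""
    rng = random.Random(seed)
    current = [rng.randrange(n_machines) for _ in lengths]
    cost = makespan(current, lengths, n_machines)
    best, best_cost = current[:], cost
    level = cost                   # initial water level
    linear_step = cost / iters     # linear decay: level -= step
    rate = 0.995                   # non-linear decay: level *= rate
    for i in range(iters):
        # Neighbour move: reassign one randomly chosen task.
        candidate = current[:]
        candidate[rng.randrange(len(lengths))] = rng.randrange(n_machines)
        cand_cost = makespan(candidate, lengths, n_machines)
        # Great Deluge acceptance: not worse, or still under the water level.
        if cand_cost <= cost or cand_cost <= level:
            current, cost = candidate, cand_cost
            if cost < best_cost:
                best, best_cost = current[:], cost
        # Assumed combination rule: alternate the two decay schedules.
        if i % 2 == 0:
            level -= linear_step
        else:
            level *= rate
    return best, best_cost


schedule, span = great_deluge([3, 3, 2, 2, 2], n_machines=2)
```

Early on, the high water level lets the search accept worse schedules and diversify; as the level falls, acceptance tightens toward pure improvement.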
TL;DR: The paper surveys security issues in cloud computing and the countermeasures proposed to reduce them, noting that weak security for data stored in virtual locations erodes users' trust in service providers.
Abstract: Nowadays, we know that cloud computing is one of the most needed ways of computing in the sector of information technology. It is the services and resources which are provided to the user on the internet and network. Grid computing and distributed computing are some of the computing techniques used in current trends and are also used in industrial, academic, and research fields. Day by day new techniques are coming into the market which is subsequently spreading the use of cloud computing. As there is an increase in the use of cloud computing mechanisms, there is a high increase in security issues and challenges faced in cloud computing. The data is saved in the cloud which is a virtual location and lack of security will lead to a loss of user’s trust in the service providers. Discussing this in our paper, we have surveyed the security issues in the cloud and the countermeasures which need to be taken to reduce them. Some common aspects are taken into consideration such as multi-tenancy, elasticity, availability, etc. The paper will give insight to academicians, researchers, and professionals to learn about the security issues in the cloud and the models proposed to solve them.
Chenyu Li, Kun Xie, Xiaolong Ma, Cuo Cai, Yanchun Sun
31 May 2024
Abstract: Grid computing for resources sharing and distributed computing has been researched widely in the past. As for distributed spatial datasets, the current centralized administrative scheme may become the system performance bottleneck. This paper presents a distributed cooperative grid computing technology to facilitate complex spatial applications by collaboration among distributed spatial resources. A hierarchical spatial index and communication protocol has been designed for the collaboration, which enables a dynamical choice for the best quality nodes for specified subtasks, synchronized execution, and compensation for a failure to execute a subtask. Also, we present an approach for dynamic resource allocation and distributed transaction mechanics to ensure consistency.
Régis Hontinfinde, Ariel Kamoyedji, Mahugnon Géraud Azehoun Pazou, Marcos Thyrbus Vitouley, Roland Sèmako Honfo, Christian Akowanou
12 Apr 2024
TL;DR: The dynamic job assignment problem in grid computing is addressed in this paper. A dynamic job scheduling genetic algorithm is proposed to minimize the Makespan. Experimental results show an improvement of about 13.67% in Makespan compared to the DRMSLS-based method.
Abstract: The advent of the Software Defined Networking (SDN) and Network Function Virtualization (NFV) technologies in the field of telecommunications has led to a tremendous reduction of costs as well as an increase in terms of infrastructure flexibility. Thus, equipments that play a pivotal role in telecommunications have merely been replaced by computers and their functions have accordingly been softwarized. To further reduce costs, some telecommunications carriers are now considering using decentralized grid computing-based networks to route a part of their traffic. The user-PC computer system (UPC) system is a technology for pooling and making available the unused computing resources (CPU) of the personal computers (PC) of members belonging to the same organization. For instance, in the drug discovery, economic forecasting, seismic analysis fields that deal with computationally intensive problem solving, Grid Computing has become an essential tool for building an affordable computing environment. We have previously designed an efficient static job scheduling genetic algorithm considering CPU core utilization for the UPC system. It finds an optimal assignment that minimizes the processing time of the available set of jobs in a static scenario, i.e., jobs are all available in the beginning and all worker PCs are idle. However, in reality, jobs join and leave the system dynamically and therefore, when new jobs join the system, some workers may be busy. In this paper, we propose an extension of our previously proposed static job scheduling genetic algorithm, that considers dynamic job scheduling in the UPC system. We conducted experiments using six worker PCs and up to 51 jobs. A comparative study of the results obtained using our dynamic scheduling genetic algorithm showed an improvement of about 13.67% in Makespan compared to that of the Dynamic Randomized Multi-Start Local Search (DRMSLS)-based method.
TL;DR: This study analyzes the stability of hybrid grid-forming and grid-following VSCs connected to weak grids, establishing a stability model and investigating the impact of operating points and control loops on system stability through modal analysis and experimental verification.
Abstract: Generally, grid-following converters have current source characteristics, while grid-forming converters have voltage source characteristics. Connecting the two in parallel to the AC power grid produces more complex control characteristics, which can cause new stability issues. This article focuses on the stability issues of hybrid grid-forming and grid-following converters connected to weak grids. The following work is carried out: Firstly, a stability analysis model for the hybrid converter system connected to a weak grid is established. Then, the influence of system operating points and control loops on the stability of the hybrid converter system is studied through the modal analysis method. Finally, the above analysis results are verified experimentally by connecting a hybrid grid-forming and grid-following converter system to a weak grid.
TL;DR: This study proposes a self-adaptive multi-instance broker scheduling algorithm (SAMiB) for grid computing, achieving a 14.93% decrease in makespan time for 2000 jobs, outperforming the iHLBA algorithm in hierarchical cluster grid environments with varying background loads and CPU speeds.
Abstract: A grid resource broker seeks to assign the appropriate jobs to the appropriate resources as part of resource management in the multi-grid environment. A multi-instance broker system provides multiple broker instances that simultaneously process jobs across multiple resources in a hierarchical cluster grid environment. In this study, the multi-instance broker is developed using grid resource broker taxonomy properties. The number of broker instances to be used for each processing session is determined from the resources, computing power and workload. The Self-Adaptive Multi-Instance Broker Scheduling algorithm (SAMiB) was tested against the iHLBA algorithm in four types of scenarios containing various mixes of background load and CPU speed. The SAMiB algorithm achieved a decrease of 14.93% in makespan time for 2000 jobs, demonstrating the suitability of the multi-instance broker concept for the hierarchical cluster grid environment.
TL;DR: Using big data, this paper introduces a power grid fault detection and early warning communication framework grounded on graph neural network models that can predict faults in the power grid, automatically calculate the most likely affected areas in the future, and issue timely communications to prevent power grid faults from spreading on a large scale.
Abstract: With the rapid development of smart grids, the complexity and scale of power grid systems continue to expand, and the requirements for power grid fault prediction and communication systems are also increasing. Conventional approaches to fault prediction commonly depend on human expertise and scheduled checks, which inherently suffer from drawbacks like low precision in predictions and sluggish response times. However, the advent of computer big data technologies has ushered in fresh remedies for addressing fault prediction within intelligent power grids. Thus, leveraging big data, this paper introduces a power grid fault detection and early warning communication framework grounded on graph neural network models. This model can predict faults in the power grid, automatically calculate the most likely affected areas in the future, and issue timely communications to prevent power grid faults from spreading on a large scale. We validated the performance of the model on the validation dataset, and the results showed that the model not only has good fault prediction accuracy and recall, but can also take corresponding measures in a timely manner to avoid fault propagation.
TL;DR: The ROOT-based distributed analysis workflow for HL-LHC CMS uses RDataFrame and JupyterLab to improve the programming experience and significantly increase the speedup of analysis.
Abstract: The challenges expected for the next era of the Large Hadron Collider (LHC), both in terms of storage and computing resources, provide LHC experiments with a strong motivation for evaluating ways of rethinking their computing models at many levels. Great efforts have been put into optimizing the computing resource utilization for the data analysis, which leads both to lower hardware requirements and faster turnaround for physics analyses. In this scenario, the Compact Muon Solenoid (CMS) collaboration is involved in several activities aimed at benchmarking different solutions for running High Energy Physics (HEP) analysis workflows. A promising solution is evolving software towards more user-friendly approaches featuring a declarative programming model and interactive workflows. The computing infrastructure should keep up with this trend by offering on the one side modern interfaces, and on the other side hiding the complexity of the underlying environment, while efficiently leveraging the already deployed grid infrastructure and scaling toward opportunistic resources like public cloud or HPC centers. This article presents the first example of using the ROOT RDataFrame technology to exploit such next-generation approaches for a production-grade CMS physics analysis. A new analysis facility is created to offer users a modern interactive web interface based on JupyterLab that can leverage HTCondor-based grid resources on different geographical sites. The physics analysis is converted from a legacy iterative approach to the modern declarative approach offered by RDataFrame and distributed over multiple computing nodes. The new scenario offers not only an overall improved programming experience, but also an order of magnitude speedup increase with respect to the previous approach.
Abstract: This paper presents the different types of computing, namely cluster computing, grid computing, utility computing, cloud computing, mobile cloud computing and fog computing, together with their characteristics, advantages, and disadvantages. We also discuss the technologies related to mobile cloud computing, such as grid computing, utility computing and fog computing. The paper gives an overview of these computing types, highlights the differences between them, and discusses the future scope of these technologies.
TL;DR: This study explores the integration of cloud computing in power grid dispatch systems, presenting a cloud-based design that enhances data processing, real-time control, and intelligence, and demonstrates exceptional performance in experimental results.
Abstract: The power grid dispatch system, a pivotal component in smart grid advancement, directly influences the stability and reliability of the entire power grid through its operational efficiency and intelligence. Given the escalating electricity demand and the growing complexity of power grid structures, conventional dispatch systems are inadequate for modern management. Consequently, a multi-regional and collaborative computing model has emerged as a novel trend in power grid dispatch development, with cloud computing technology offering robust support. By leveraging cloud computing platforms, the dispatch system can centralize the processing and analysis of data from multiple regions, enhancing the understanding of grid operations and informing scientific decision-making. This article explores the integration of cloud computing in power grid dispatch systems and presents a design for a cloud-based dispatch system. This system facilitates real-time data collection, transmission, storage, and processing via a cloud platform, while enabling joint scheduling and collaborative computing across regions. Experimental results highlight its exceptional performance in data processing, real-time control, and intelligence, providing substantial technical backing for the power grid dispatch system.
Mohammed Elshambakey, Aya I. Maiyza, Mona S. Kashkoush, Ghada M. Fathy, Hanan A. Hassan
5 Feb 2024
TL;DR: The EN-HPCG grid utilizes Open-Source Slurm and PBS Pro schedulers to consolidate high-performance computing resources. Its architecture and capabilities are evaluated using specific high-throughput computing applications, focusing on grid-level performance metrics and site speed-up. The results suggest that the optimal scheduler choice depends on the specific hardware configuration and desired functionalities.
Abstract: Recently, Egypt has recognized the pivotal role of High-Performance Computing in advancing science and innovation. The Egyptian National High Performance Computing Grid (EN-HPCG) advocates for consolidating high-performance computing resources, creating a unified pool accessible to all universities and scientific research centers using the PBS Pro scheduler. Therefore, this paper delves into the architecture and capabilities of the EN-HPCG grid using two different workload management systems: (i) Slurm (Open-Source) and (ii) PBS Pro (Licensed). Additionally, it assesses the grid's performance in specific high-throughput computing (HTC) applications using the NAS Grid parallel benchmark (NGB). The evaluation includes grid-level performance metrics such as throughput, and the number of tasks completed as a function of time. Also, the presented methodology aims to assist potential partners in their decision-making process to join the EN-HPCG grid, with a focus on the site speed-up metric. Our results showed that it is not advisable to integrate a cluster with high-speed hardware with a cluster possessing outdated hardware when using the Slurm scheduler. In contrast, the PBS Pro scheduler takes into account online decision-making in a dynamic environment using a unified grid.
TL;DR: Researchers propose a novel approach to emulate a computing grid in a local environment, enabling controlled evaluation of system features without disrupting production operations, contributing to the field of computing grids and distributed systems.
Abstract: The necessity for complex calculations in high-energy physics and large-scale data analysis has led to the development of computing grids, such as the ALICE computing grid at CERN. These grids outperform traditional supercomputers but present challenges in directly evaluating new features, as changes can disrupt production operations and require comprehensive assessments, entailing significant time investments across all components. This paper proposes a solution to this challenge by introducing a novel approach for emulating a computing grid within a local environment. This emulation, resembling a mini clone of the original computing grid, encompasses its essential components and functionalities. Local environments provide controlled settings for emulating grid components, enabling researchers to evaluate system features without impacting production environments. This investigation contributes to the evolving field of computing grids and distributed systems, offering insights into the emulation of a computing grid in a local environment for feature evaluation.
Abstract: An efficient resource discovery mechanism is one of the fundamental requirements for grid computing systems, as it aids in resource management and scheduling of applications. Resource discovery involves searching for the appropriate resource types that match the user's application requirements. Classical approaches to Grid resource discovery are either centralized or hierarchical, and they become inefficient when the scale of Grid systems increases rapidly. On the other hand, the Peer-to-Peer (P2P) paradigm has emerged as a successful model, as it achieves scalability in distributed systems. A Grid system using P2P technology can improve on the central control of the traditional grid and avoid a single point of failure. In this paper, we propose a new approach based on P2P techniques for resource discovery in grids, using a Hypercubic P2P Grid (HPGRID) topology to connect the grid nodes. A scalable, fault-tolerant, self-configuring search algorithm, the Parameterized HPGRID algorithm, is proposed using an isomorphic partitioning scheme. By design, the algorithm improves the probability of reaching all the working nodes in the system, even in the presence of non-alive nodes (inaccessible, crashed or loaded by heavy traffic). The scheme can adapt to the complex, heterogeneous and dynamic resources of the grid environment, and has better scalability.
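The hypercube property that makes this kind of search robust can be shown with a short sketch. This is a plain breadth-first flood over a hypercube, not the Parameterized HPGRID algorithm or its isomorphic partitioning scheme, which the abstract does not specify; the function names and the representation of dead nodes as a set of integer IDs are assumptions.

```python
from collections import deque


def neighbors(node, dim):
    """Neighbours of a node in a dim-dimensional hypercube: flip one bit each."""
    return [node ^ (1 << bit) for bit in range(dim)]


def reachable(start, dim, dead=frozenset()):
    """Breadth-first flood from start that never forwards through dead nodes."""
    if start in dead:
        return set()
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors(node, dim):
            if nxt not in seen and nxt not in dead:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Usage: a 3-dimensional hypercube (8 nodes) with nodes 1 and 2 down.
alive = reachable(0, dim=3, dead={1, 2})
```

With nodes 1 and 2 down, node 0 still reaches every working node through node 4, because each node has `dim` distinct links; only when all of a node's neighbours fail does it become isolated.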
TL;DR: Modeling distributed computing infrastructures for HEP applications requires large-scale simulations to evaluate various design options. New simulation capabilities are needed to handle complex networks, data access and caching patterns. Studies of accuracy and scalability are presented using HEP as a case-study.
Abstract: Predicting the performance of various infrastructure design options in complex federated infrastructures with computing sites distributed over a wide area network that support a plethora of users and workflows, such as the Worldwide LHC Computing Grid (WLCG), is not trivial. Due to the complexity and size of these infrastructures, it is not feasible to deploy experimental test-beds at large scales merely for the purpose of comparing and evaluating alternate designs. An alternative is to study the behaviours of these systems using simulation. This approach has been used successfully in the past to identify efficient and practical infrastructure designs for High Energy Physics (HEP). A prominent example is the Monarc simulation framework, which was used to study the initial structure of the WLCG. New simulation capabilities are needed to simulate large-scale heterogeneous computing systems with complex networks, data access and caching patterns. A modern tool to simulate HEP workloads that execute on distributed computing infrastructures based on the SimGrid and WRENCH simulation frameworks is outlined. Studies of its accuracy and scalability are presented using HEP as a case-study. Hypothetical adjustments to prevailing computing architectures in HEP are studied providing insights into the dynamics of a part of the WLCG and candidates for improvements.
Abstract: High Performance Computing (HPC) - also known as supercomputing - comes into play when currently available systems, from personal computers to workstations and servers, are not sufficient to provide the required result in an acceptable amount of time. Therefore, the use of HPC systems is increasingly seen as a service and supports many different work processes in science, research, development, and business.
As both the acquisition and operation of these systems are associated with significant costs, it becomes crucial to use them efficiently across institutional boundaries. This is summarized in the term Grid computing.
However, cross-institutional use and embedding in different work processes place demands on the environment - the Grid operating system - that are currently not yet met. For example, it is essential that the Grid operating system ensures that the parallel programs that perform the calculations on the HPC systems are successfully completed at a given time in order to make the results available to the users.
This thesis deals with the question of how such a quality of service (QoS) can be ensured. To this end, the specific requirements within the VRM project are developed as an example of a Grid operating system, current approaches to the management of HPC systems are examined and a new proactive, plan-based approach is developed, presented and its suitability examined and proven with the use of extensive simulations. The examinations of the different approaches, as well as the new proactive, plan-based approach, are carried out simulatively, taking into account the runtime behavior of various parallel programs. In total, more than 1.2 million simulations were performed for the examinations.
As part of the development of the proactive, plan-based approach, all fundamental questions associated with the provision of the necessary information on resource consumption were addressed and a procedure for modeling the runtime behavior was presented. It was thus demonstrated that the presented proactive, plan-based approach to the management of HPC systems meets and exceeds the requirements of a Grid operating system.