TL;DR: State Grid Economic and Technological Research Institute develops a distribution network simulation platform and flexible interconnection technology system to address load balancing and capacity issues in distribution networks, achieving over 99% load balancing and 40% power supply capacity increase in a demonstration zone.
Abstract: With a high proportion of new sources and loads such as distributed generation, electric vehicles, and energy storage, and with the rapid development of new business models and formats such as microgrids, integrated energy systems, and virtual power plants, the distribution network is gradually changing. It is moving from a network that simply receives electricity and distributes it to users toward one that integrates source-grid-load-storage interaction and couples flexibly with the higher-level power grid. In recent years, the State Grid Economic and Technological Research Institute has focused on original technology research and development for distribution networks, in full support of building a source of original technologies. It has developed a distribution network simulation, calculation, and analysis platform with fully independent intellectual property rights, which supports the construction of new distribution systems in multiple regions. The growth of new sources and loads can cause local concentration and overall imbalance, producing large differences in load rates between regions and lines of the distribution network. This often leaves equipment capacity insufficient in some places and unusable in others, which hinders improvements in the overall quality and efficiency of the distribution network. The State Grid Economic and Technological Research Institute, in collaboration with State Grid Tianjin Electric Power Company, has established a flexible interconnection technology system for distribution networks, developed the world's first set of corresponding flexible interconnection power electronic equipment, and deployed it in the Beichen National Industry-City Integration Demonstration Zone in Tianjin.
Since the project was put into operation, the load balancing degree of the demonstration zone's power grid has exceeded 99%, the grid's power supply capacity has increased by nearly 40%, and important industrial enterprises have achieved "zero power outages".
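The abstract does not define the load balancing degree. As a hedged illustration only, the Python sketch below assumes one plausible index, 1 minus the coefficient of variation of feeder load rates (load divided by rated capacity), and models a flexible interconnection as a lossless power transfer between two feeders. The function names, the index definition, and the numbers are hypothetical, not the institute's method.

```python
from statistics import mean, pstdev


def load_rates(loads_mw, capacities_mw):
    """Load rate of each feeder: carried load divided by rated capacity."""
    return [load / cap for load, cap in zip(loads_mw, capacities_mw)]


def balancing_degree(rates):
    """Hypothetical index: 1 minus the coefficient of variation of load rates.

    Equals 1.0 when every feeder runs at the same load rate.
    """
    avg = mean(rates)
    if avg == 0:
        return 1.0
    return 1.0 - pstdev(rates) / avg


def transfer(loads_mw, src, dst, amount_mw):
    """Shift amount_mw from feeder src to feeder dst (assumes a lossless link)."""
    new_loads = list(loads_mw)
    new_loads[src] -= amount_mw
    new_loads[dst] += amount_mw
    return new_loads


capacities = [10.0, 10.0]  # illustrative feeder ratings in MW
before = [9.0, 3.0]        # one heavily and one lightly loaded feeder
after = transfer(before, src=0, dst=1, amount_mw=3.0)
print(round(balancing_degree(load_rates(before, capacities)), 3))
print(round(balancing_degree(load_rates(after, capacities)), 3))
```

With these illustrative numbers, shifting 3 MW away from the heavily loaded feeder raises the index from 0.5 to 1.0, which is the kind of effect a flexible interconnection is meant to produce.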
Abstract: The conclusions of the Joint ECFA-NuPECC-APPEC (JENA) Computing Workshop, held in Bologna on June 12–14, 2023, emphasized the growing need for strategic discussions on implementing European federated computing at future large-scale research facilities. The workshop identified five priority areas for assessment. This report focuses on one of them, summarizing discussions on the needs and recommendations for integrating High-Performance Computing (HPC) centers with the High-Throughput Computing (HTC) systems, such as the Worldwide LHC Computing Grid (WLCG), that are used to manage and analyze data from large-scale experiments. The report was written by the HPC Working Group (WG), a group of experts on HPC integration from the three JENA research areas, and was presented at the 3rd Joint ECFA-NuPECC-APPEC Symposium, held at the Harwell Campus, Didcot, Oxfordshire (UK) on April 8–11, 2025.
Abstract: This paper describes practical experience from a project that couples Grid and SOA technologies in smaller environments. Web services were applied in two structurally different case studies to solve tasks with a Grid integrated into a SOA, and vice versa. The case studies yielded important insights into how and when to couple SOA and Grid technologies, including monitoring aspects. From them, some general rules are derived on what must be observed when combining SOA and Grid in smaller environments. Performance and software-engineering analyses were used to validate the results. They also clearly showed the benefits of employing SOA and Grid concepts from both a performance and an architectural perspective.
TL;DR: A hybrid power grid combining biomass and solar energy is optimized using a monitoring system, mathematical modeling, and numerical methods, enhancing efficiency and decision-making capabilities for control and modification of the energy system.
Abstract: The work focuses on a hybrid energy system containing a biomass-based gas generator and solar panels. Controlling and optimizing such a multi-component system requires analysis, monitoring, and modeling tools. A monitoring system is proposed for the hybrid microgrid that collects data on various characteristics. This system supports data analysis, prediction of system behavior, and decision-making for controlling or modifying the energy system. A mathematical model was proposed for the biomass utilization component of the microgrid, and an original numerical method was applied to obtain the model's input data. This approach increases the efficiency of power generation in this unit, which directly improves the efficiency of the microgrid as a whole. In sum, the paper proposes methods and tools for analyzing, modeling, and managing a multicomponent hybrid energy system.
TL;DR: This thesis optimizes resource utilization and job execution in the ALICE Grid, a global computing infrastructure for the Large Hadron Collider, by proposing new approaches for resource management, job scheduling, and middleware improvements, resulting in significant efficiency and performance gains.
Abstract: The Large Hadron Collider (LHC) ALICE (A Large Ion Collider Experiment) experiment uses grid computing for its extensive data processing and analysis. The ALICE Grid is composed of 48 sites distributed globally, which provide access to over 300,000 CPU cores. This diverse environment presents unique challenges, as the computing nodes are very heterogeneous in terms of hardware, resource availability, and management policies. This thesis focuses on optimising resource utilisation and job execution within the ALICE Grid in the context of the evolving multicore computing paradigm. The transition from single-core to multicore slots, combined with the increasing prevalence of multiprocess and multithreaded workflows, requires new resource management approaches. The thesis presents a black-box analysis of the multicore experiment software framework, tracing resource usage and system calls. Multiple sources of overhead were identified, particularly the large number of short-lived processes spawned by some workflows. To address this, the JAliEn monitoring system was extended and improved to accurately account for the resource utilisation of these short-lived processes. The observations led to modifications to the internal job workflow, resulting in a 47% reduction in the number of deployed processes and a 35% decrease in overall job execution time. To tailor job requests to the specific characteristics of the executing systems, a model is proposed to estimate job execution times. This model derives proportionality factors from execution times on different Grid CPU models and uses them to dynamically scale job requests. To ensure coherent and controlled utilisation of CPU resources, two approaches are proposed. The first uses CPU pinning and adapts the core selection to the processor architecture, optimising resource allocation for specific workloads.
The second uses cgroups v2 sub-partitioning features to set boundaries on job CPU utilisation. The thesis made significant contributions to popular grid batch systems by enabling support for cgroups v2. This integration allowed JAliEn to become the first grid middleware to make use of this powerful resource management technology. When a slot is sub-partitioned to run multiple jobs in parallel, careful resource orchestration is crucial. This thesis presents a module within JAliEn that ensures equitable memory distribution among co-executing jobs. The module implements targeted preemption of resource-intensive jobs to prevent slot overconsumption and ensure that jobs remain within their allocated memory limits. The thesis also explores whole-node slot allocations, in which JAliEn manages all the resources of a node. This novel scheduling model offers great flexibility and adaptability. To maximise resource usage in whole-node slots, CPU oversubscription was introduced to allow additional jobs to run when the current workload does not fully use the available CPU resources. To exploit whole-node allocations further, the thesis proposes extending job brokering to consider not only CPU availability but also memory and disk space. Furthermore, the job definition syntax was extended with new parameters that give users greater control over resource requests. In summary, this thesis presents a set of contributions that have substantially improved the efficiency and performance of grid computing within the ALICE experiment. It addresses the challenges emerging from the evolving multicore environment by optimising resource utilisation and improving middleware reliability and observability. Together, these contributions significantly advance the capabilities of the ALICE Grid, enabling more efficient data analysis for the LHC experiment.
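The execution-time model and the extended job brokering described above can be illustrated together. The sketch below is a hypothetical Python toy, not JAliEn code or its job definition syntax. It scales a job's time request by an assumed per-CPU-model proportionality factor (execution time on that model divided by execution time on a reference model) and accepts a slot only if cores, memory, disk, and the scaled time all fit. The factor values, model names, and field names are invented for illustration.

```python
from dataclasses import dataclass

# Hypothetical factors: execution time on a CPU model divided by the
# execution time on a reference model (illustrative values only).
CPU_TIME_FACTORS = {"reference": 1.0, "older-model": 1.4, "newer-model": 0.8}


@dataclass
class Slot:
    cpu_model: str
    free_cores: int
    free_memory_mb: int
    free_disk_mb: int
    remaining_time_s: int


@dataclass
class Job:
    cores: int
    memory_mb: int
    disk_mb: int
    reference_time_s: int  # wall time requested, measured on the reference CPU model


def scaled_time(job: Job, slot: Slot) -> int:
    """Scale the time request by the proportionality factor of the slot's CPU model."""
    factor = CPU_TIME_FACTORS.get(slot.cpu_model, 1.0)  # unknown models: no scaling
    return round(job.reference_time_s * factor)


def matches(job: Job, slot: Slot) -> bool:
    """Broker check that considers memory, disk and scaled time, not just free cores."""
    return (
        job.cores <= slot.free_cores
        and job.memory_mb <= slot.free_memory_mb
        and job.disk_mb <= slot.free_disk_mb
        and scaled_time(job, slot) <= slot.remaining_time_s
    )


job = Job(cores=8, memory_mb=12000, disk_mb=10000, reference_time_s=30000)
slow = Slot("older-model", free_cores=8, free_memory_mb=16000, free_disk_mb=20000, remaining_time_s=40000)
fast = Slot("newer-model", free_cores=8, free_memory_mb=16000, free_disk_mb=20000, remaining_time_s=40000)
print(scaled_time(job, slow), matches(job, slow))
print(scaled_time(job, fast), matches(job, fast))
```

Here the 30,000 s reference request becomes 42,000 s on the slower model and exceeds the slot's remaining 40,000 s, so the match fails even though cores, memory, and disk fit; on the faster model it shrinks to 24,000 s and the job is accepted.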
TL;DR: This research introduces an advanced methodology leveraging deep learning models for asset identification and automation in digital grid systems, achieving 98.43% accuracy and improved recall, precision, and F1-score, enhancing grid resilience and efficiency.
Abstract: In modern digital grid systems, the stability of power operations and the accuracy of energy metering depend critically on effective asset identification, reconfiguration, and automated maintenance strategies. Conventional grid monitoring mechanisms, such as manual inspections and fixed-sensor networks, face many difficulties, including high operating costs, poor scalability, and an inability to provide comprehensive real-time information. To address these challenges, this research introduces a methodology that leverages deep learning models for asset identification and automation in digital grid systems. Specifically, it uses Roach Infestation Optimized Attention-based Bidirectional Long Short-Term Memory (RI-Att-BiLSTM) networks for intelligent asset identification and topology recognition in low-voltage distribution network substations. Time-series data are collected from smart meters and sensors distributed across the low-voltage distribution network; these sensors continuously record the electricity consumption of substations and connected consumers. The collected data are preprocessed to remove noise, outliers, and gaps, and then normalized to scale the data for effective modeling. The RI-Att-BiLSTM-based technique reduces the dimensionality of the electricity consumption data matrix, transforming the complex topology identification task into a solvable convex optimization problem. Experimental results demonstrate that, compared with traditional methods, the proposed model achieves higher accuracy (98.43%), recall, precision, and F1-score. The resulting gains in grid resilience support precise asset identification, real-time monitoring, and fault-tolerant operation, contributing to more efficient and reliable digital grid systems.
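The preprocessing step (removing gaps and outliers, then normalizing) can be sketched with the standard library alone. This is an assumed pipeline, not the authors' code: gaps (None values) are filled by linear interpolation, outliers are clamped to within k median absolute deviations of the median (k = 3 is an arbitrary choice), and values are min-max scaled to [0, 1]. Noise smoothing and the RI-Att-BiLSTM model itself are out of scope.

```python
from statistics import median


def fill_gaps(series):
    """Linearly interpolate None gaps; edge gaps copy the nearest observed value."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    out = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = series[right]
        elif right is None:
            out[i] = series[left]
        else:
            w = (i - left) / (right - left)
            out[i] = series[left] + w * (series[right] - series[left])
    return out


def clip_outliers(series, k=3.0):
    """Clamp points lying more than k median absolute deviations from the median."""
    med = median(series)
    mad = median(abs(v - med) for v in series)
    if mad == 0:
        return list(series)
    lo, hi = med - k * mad, med + k * mad
    return [min(max(v, lo), hi) for v in series]


def min_max_normalize(series):
    """Scale values to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series]
    return [(v - lo) / (hi - lo) for v in series]


def preprocess(series):
    return min_max_normalize(clip_outliers(fill_gaps(series)))


raw = [1.0, None, 3.0, 2.0, 50.0, 2.5]  # illustrative meter readings with a gap and a spike
print(preprocess(raw))
```

In this toy series the gap is filled with 2.0 and the 50.0 spike is clamped to 3.75 before scaling, so it no longer dominates the normalized range.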
TL;DR: This paper proposes a hybrid cloud-grid architecture for high-performance computing, integrating grid and cloud services to address scalability, fault tolerance, and cost minimization in heterogeneous environments, achieving up to 35.7% task solving time reduction and 81.3% resource utilization improvement.
Abstract: The rapidly growing number of computational problems in scientific and industrial applications requires not only powerful systems but also adaptive, scalable, and fault-tolerant high-performance computing architectures. This paper presents a new hybrid cloud-grid (HCG) architecture that combines the strengths of conventional grid computing and newer cloud services to address critical issues in distributed environments, including resource provisioning, scheduling and allocation, fault tolerance, and cost minimization. It features an intelligent middle tier for resource management, cloud resource shedding mechanisms for scaling, and AI-based scheduling for workload distribution. Simulation-based performance analyses show that the hybrid model reduces overall task solving time by up to 35.7%, improves resource utilization by up to 81.3% on average, and reduces recovery time and operational costs by approximately 37.7% compared with existing standalone systems. It also provides better energy efficiency and resource reliability at a given workload level. These results indicate that the hybrid cloud-grid framework is a viable model for future HPC solutions that are scalable, reliable, and economically feasible for data-intensive or near-real-time applications. Future work includes deploying the model in a real environment, extending it to multi-cloud and edge settings, and incorporating more sophisticated automated management techniques to improve its performance and flexibility.
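The abstract does not describe the middle tier's scheduler in detail. As a simplified stand-in for its AI-based scheduling, the hypothetical Python sketch below applies a fixed rule: place each task on the cheaper grid pool when it has enough free cores, otherwise burst to the cloud pool, otherwise leave the task queued. Pool sizes, per-core-hour prices, and class names are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Pool:
    name: str
    free_cores: int
    cost_per_core_hour: float


@dataclass
class Dispatcher:
    grid: Pool
    cloud: Pool
    placements: dict = field(default_factory=dict)  # task_id -> (pool name, cores, cost)

    def submit(self, task_id: str, cores: int, hours: float) -> Optional[str]:
        """Try the grid first, then burst to the cloud; return None if the task must queue."""
        for pool in (self.grid, self.cloud):
            if pool.free_cores >= cores:
                pool.free_cores -= cores
                cost = cores * hours * pool.cost_per_core_hour
                self.placements[task_id] = (pool.name, cores, cost)
                return pool.name
        return None

    def total_cost(self) -> float:
        """Sum of the estimated cost of every placed task."""
        return sum(cost for _, _, cost in self.placements.values())


d = Dispatcher(Pool("grid", 8, 0.02), Pool("cloud", 16, 0.10))
print(d.submit("t1", 6, 2.0))   # fits on the grid
print(d.submit("t2", 4, 1.0))   # only 2 grid cores left, so it bursts to the cloud
print(d.submit("t3", 2, 1.0))   # fills the remaining grid cores
print(d.submit("t4", 32, 1.0))  # fits nowhere and stays queued
print(round(d.total_cost(), 2))
```

A learned scheduler would replace the fixed grid-then-cloud order with a policy that weighs cost, deadline, and failure risk; the surrounding bookkeeping would stay much the same.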
TL;DR: This study compares energy efficiency in Green Cloud Computing and Grid Computing, evaluating operational models, energy consumption, and carbon emissions. Green cloud computing offers superior energy efficiency and lower emissions, with integrated energy-aware technologies.
Abstract: The rising demand for computational resources in the digital age has brought significant attention to the environmental impact of data centers and distributed computing infrastructures. This study presents a comparative analysis of energy efficiency in Green Cloud Computing and Grid Computing, focusing on two critical aspects: carbon footprint reduction and load balancing. While cloud computing has evolved with virtualization and dynamic resource allocation to optimize energy usage, grid computing offers decentralized resource sharing with its own efficiency strategies. The concept of “green” in cloud computing introduces energy-aware architectures and renewable energy integration, aiming to minimize ecological impacts. This paper evaluates the operational models, energy consumption patterns, and carbon emission profiles of both paradigms through theoretical modeling and empirical data. Load balancing mechanisms, which directly influence energy use and system performance, are also analyzed to determine their roles in optimizing resource utilization. The findings suggest that green cloud computing, with its integrated energy-aware technologies, generally offers superior energy efficiency and lower carbon emissions compared to traditional grid computing, though each paradigm presents unique strengths in specific use cases. This research contributes to the ongoing efforts in sustainable computing by highlighting key strategies for reducing environmental impact while maintaining high performance and reliability.
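The energy and carbon comparison rests on simple arithmetic: facility energy is IT power times hours times power usage effectiveness (PUE), and operational emissions are that energy times the carbon intensity of the non-renewable share. The Python sketch below assumes these standard definitions; the PUE, renewable share, and carbon-intensity values are illustrative, not figures from the study.

```python
def facility_energy_kwh(it_power_kw: float, hours: float, pue: float) -> float:
    """Total facility energy: IT energy scaled by power usage effectiveness."""
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return it_power_kw * hours * pue


def emissions_kg(energy_kwh: float, carbon_intensity_kg_per_kwh: float,
                 renewable_share: float = 0.0) -> float:
    """Operational CO2 emissions, counting the renewable share as zero-emission."""
    if not 0.0 <= renewable_share <= 1.0:
        raise ValueError("renewable_share must lie in [0, 1]")
    return energy_kwh * (1.0 - renewable_share) * carbon_intensity_kg_per_kwh


# Illustrative: 100 kW of IT load for one day, at 0.4 kg CO2 per kWh.
efficient = emissions_kg(facility_energy_kwh(100, 24, 1.2), 0.4, renewable_share=0.5)
baseline = emissions_kg(facility_energy_kwh(100, 24, 1.8), 0.4)
print(f"efficient: {efficient:.0f} kg CO2, baseline: {baseline:.0f} kg CO2")
```

With these inputs, the lower-PUE, half-renewable configuration emits 576 kg versus 1,728 kg for the baseline, a factor of three, which shows how facility overhead and energy sourcing compound.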
TL;DR: This paper proposes a cloud-based smart grid platform using a distributed deep reinforcement learning algorithm, improving power grid management efficiency by 20% and reducing power loss by 15%, with enhanced fault prediction accuracy and response speed.
Abstract: This paper proposes a cloud-computing-based smart grid platform, combined with a distributed deep reinforcement learning (DRL) optimization algorithm, to improve the efficiency of power grid management and optimize dispatching. The system architecture uses a hierarchical design with a data layer, service layer, interface layer, presentation layer, and access layer to ensure efficient data processing and scalability. Through its distributed computing architecture, the algorithm makes full use of cloud computing power to optimize power grid load dispatching and fault prediction. In the experiments, the platform's effectiveness in power dispatching and fault prediction is verified through simulation tests on a typical medium-sized power grid data set. For load dispatching, the DRL-based algorithm improves dispatching efficiency by 20% and reduces power loss by 15% compared with traditional methods. For fault prediction, the model reaches a recall of 92% and an accuracy of 89%, which is 7% higher than the traditional SVM algorithm. In addition, the system's response speed is significantly improved: the response time after a state change of power grid equipment is shortened by 25%.
Haishun Zhang, Dan Zhao, Xin Zhang, Rimin Li, Zhiqiang Zhang, Xuetao Yang
20 Jun 2025
TL;DR: This study proposes a multi-source heterogeneous power grid topology mapping method using the Power Grid Simulation Data Management Platform, addressing direct integration challenges through hierarchical decomposition, dynamic bus node mapping, and multi-source data compatibility.
Abstract: This study proposes a multi-source heterogeneous power grid topology mapping method based on the Power Grid Simulation Data Management Platform (PSDB). It addresses the problem that simulation models of the same power grid built in different calculation software cannot be directly integrated because of modeling differences; that is, equipment and electrical relationships cannot be mapped one-to-one. The key approaches are as follows. (1) Hierarchical decomposition of the grid topology: the topology is first divided into a two-level network (a substation-line layer and an in-substation equipment and bus node layer) to address the direct mapping challenges caused by differences in device modeling granularity and to conduct preliminary grid analysis. (2) Dynamic bus node mapping: taking each substation as a unit, bipartite graph theory and set operations are used, together with bus segmentation techniques and combinations of lines within the substation, to resolve the mapping of dynamic bus nodes. (3) Multi-source data compatibility: the equipment library manages device parameters, the mode library handles grid operating states, and the scheme library accommodates model conflicts, ensuring compatibility with discrepant data. This study is the first to solve the mapping of dynamically changing bus nodes using bipartite graphs and set operations, and it achieves lossless conversion of full-grid models by compatibly managing multi-source heterogeneous data differences through a relational database. The proposed method provides robust technical support for grid analysis based on simulation data, the ubiquitous power Internet of Things, and digital twin systems.
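The dynamic bus node mapping step can be sketched as follows. The abstract does not give the algorithm, so this Python version is an assumption rather than the authors' method: each bus is described by the set of equipment IDs attached to it (taken as already matched across models via the equipment library), an A-bus and a B-bus are joined in a bipartite graph when their sets intersect, and each connected component becomes one mapping group. A group with several buses on one side captures a bus-segmentation difference, and comparing the union of equipment sets on each side flags inconsistent groups. The function name and data layout are hypothetical.

```python
from collections import defaultdict


def map_buses(buses_a, buses_b):
    """Group bus nodes of two models of the same substation.

    buses_a, buses_b: dict of bus_id -> set of connected equipment IDs.
    Returns (a_bus_ids, b_bus_ids, consistent) for each connected component
    of the bipartite overlap graph.
    """
    # Bipartite adjacency: an A-bus and a B-bus are linked when they share equipment.
    adj = defaultdict(set)
    for a, eq_a in buses_a.items():
        for b, eq_b in buses_b.items():
            if eq_a & eq_b:
                adj[("A", a)].add(("B", b))
                adj[("B", b)].add(("A", a))

    seen, groups = set(), []
    for start in [("A", a) for a in buses_a] + [("B", b) for b in buses_b]:
        if start in seen:
            continue
        # Depth-first traversal collects one connected component.
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        a_ids = {n for side, n in comp if side == "A"}
        b_ids = {n for side, n in comp if side == "B"}
        # Set check: consistent when both sides attach exactly the same equipment.
        eq_a = set().union(*(buses_a[a] for a in a_ids))
        eq_b = set().union(*(buses_b[b] for b in b_ids))
        groups.append((a_ids, b_ids, eq_a == eq_b))
    return groups


# Model A splits one bus into sections 1M and 2M; model B keeps a single node.
model_a = {"1M": {"L1", "T1"}, "2M": {"L2", "T2"}, "3M": {"L3"}}
model_b = {"BUS": {"L1", "L2", "T1", "T2"}, "B3": {"L3"}}
for a_ids, b_ids, ok in map_buses(model_a, model_b):
    print(sorted(a_ids), sorted(b_ids), ok)
```

In this example the first group maps the two A-bus sections to the single B-bus, and both groups pass the equipment-set check; dropping an equipment ID from one side would mark its group as inconsistent.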