Top 58 papers presented at Parallel Computing Technologies in 2017

Showing papers presented at "Parallel Computing Technologies in 2017"

Proceedings Article•10.1109/PARCOMPTECH.2017.8068329•

Identifying pitfalls in automatic parallelization of NAS parallel benchmarks

S. Prema, R. Jehadeesan, B. K. Panigrahi

1 Feb 2017

TL;DR: This paper underlines the need for a user-interactive environment that highlights the problems arising during parallelization, and for minimal manual code changes to resolve problematic code sections and make them amenable to parallelization.

Abstract: This paper examines OpenMP-based auto-parallelizers and the limitations they encounter when parallelizing the NAS Parallel Benchmarks. It also describes the issues the parallelizers face during parallelization and the resolutions that overcome them. Compute-intensive loops are pinpointed using Gprof, and the problematic loops within the hotspot areas are identified. Our work concentrates on identifying the pitfalls within the located hotspots and providing solutions in such cases. The measured speedup and the reasons behind it are analyzed. This paper underlines the need for a user-interactive environment that highlights the problems arising during parallelization. It also underscores the need for minimal manual code changes to resolve problematic code sections and make them amenable to parallelization.

21 citations

Journal Article•10.1007/S00224-016-9699-8•

Atomic Read/Write Memory in Signature-Free Byzantine Asynchronous Message-Passing Systems

Achour Mostefaoui¹, Matoula Petrolia¹, Michel Raynal², Claude Jard¹•Institutions (2)

University of Nantes¹, Institut Universitaire de France²

1 May 2017

TL;DR: This article presents a signature-free distributed algorithm which builds an atomic read/write shared memory on top of a fully connected peer-to-peer n-process asynchronous message-passing system in which up to t

Abstract: This article presents a signature-free distributed algorithm which builds an atomic read/write shared memory on top of a fully connected peer-to-peer n-process asynchronous message-passing system in which up to t

16 citations

Book Chapter•10.1007/978-3-319-62932-2_27•

Automation Development Framework of Scalable Scientific Web Applications Based on Subject Domain Knowledge

Igor Bychkov¹, G. A. Oparin¹, Vera G. Bogdanova¹, A. A. Pashinin¹, S. A. Gorsky¹•Institutions (1)

Russian Academy of Sciences¹

4 Sep 2017

TL;DR: Describes the architecture and functional capabilities of an automated toolkit for creating service-oriented applications based on an applied program package, and the multi-agent control of such an application's parallel execution in a heterogeneous distributed computing environment (HDCE).

Abstract: High-performance computing technologies that apply computational capabilities to scientific problems are currently being actively improved. The purpose of our research is to develop a toolkit for constructing and executing scientific service-oriented applications in a heterogeneous distributed computing environment (HDCE). These tools give subject domain experts access to high-capacity computing resources, let them use these resources without extensive knowledge of computing architecture and low-level software, and support parallel execution of the user application on the basis of service-oriented technology and multi-agent control. We describe the architecture and functional capabilities of an automated toolkit for creating service-oriented applications based on an applied program package, and the multi-agent control of such an application's parallel execution in the HDCE. We demonstrate the use of these tools to create a web application for parametric feedback synthesis of a linear dynamic object. The proposed technology simplifies service creation and provides qualitatively new opportunities for controlling parallel high-performance computations.

10 citations

Book Chapter•10.1007/978-3-319-62932-2_24•

Combining Parallelization with Overlaps and Optimization of Cache Memory Usage

S. G. Ammaev¹, L. R. Gervich¹, Boris Ya. Steinberg¹•Institutions (1)

Southern Federal University¹

4 Sep 2017

TL;DR: The Gauss-Seidel algorithm optimized by a modified hyperplane method is 2.5 times faster than the non-optimized version; parallelized using data placement with overlaps, it achieves a speedup of 28 times on 16 processors compared with the non-optimized sequential algorithm.

Abstract: This paper presents a modification of L. Lamport's hyperplane method that improves temporal data locality. The Gauss-Seidel algorithm optimized by the modified hyperplane method is 2.5 times faster than the non-optimized version. This algorithm was parallelized using the technique of data placement with overlaps, yielding a speedup of 28 times on 16 processors compared with the non-optimized sequential algorithm.
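
Note on the figures above: a 28-fold speedup on 16 processors looks superlinear because the baseline is the non-optimized sequential code. The short sketch below separates the two effects, assuming (this is an assumption, not a claim from the abstract) that the cache optimization and the parallelization compose multiplicatively. The function name `parallel_efficiency` is illustrative only.

```python
def parallel_efficiency(speedup: float, processors: int) -> float:
    """Classical parallel efficiency: speedup divided by processor count."""
    return speedup / processors


# Figures quoted in the abstract.
total_speedup = 28.0   # parallel optimized vs. non-optimized sequential
cache_speedup = 2.5    # optimized sequential vs. non-optimized sequential
processors = 16

# Assumption: the two gains multiply, so dividing out the cache gain
# leaves the speedup attributable to parallelization alone.
parallel_only = total_speedup / cache_speedup

print(f"speedup over optimized sequential code: {parallel_only:.1f}")
print(f"efficiency vs. optimized baseline:      {parallel_efficiency(parallel_only, processors):.2f}")
print(f"efficiency vs. non-optimized baseline:  {parallel_efficiency(total_speedup, processors):.2f}")
```

Under that assumption the parallelization itself contributes roughly an 11-fold speedup, which is an efficiency of about 0.7 on 16 processors.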

9 citations

Book Chapter•10.1007/978-3-319-62932-2_31•

Probabilistic Causal Message Ordering

Achour Mostefaoui¹, Stéphane Weiss¹•Institutions (1)

University of Nantes¹

4 Sep 2017

TL;DR: A probabilistic but efficient causal broadcast mechanism for large systems with changing membership that uses few integer timestamps is proposed.

Abstract: Causal broadcast is a classical communication primitive that has been studied for more than three decades, and several implementations have been proposed. Implementing such a primitive has a non-negligible cost, either in the extra information that messages have to carry or in the time delays needed for message delivery. It has been proved that messages need to carry control information whose size is linear in the size of the system. This problem has gained more interest due to new application domains, such as collaborative applications, which are widely used and becoming massive, and the social semantic web and linked data, whose implementation needs causal ordering of messages. This paper proposes a probabilistic but efficient causal broadcast mechanism for large systems with changing membership that uses few integer timestamps.
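
For context, the linear-size control information mentioned in the abstract is what the classical vector-clock implementation of causal broadcast carries: one counter per process. The sketch below shows that classical baseline, not the probabilistic mechanism proposed in the paper. The class and method names are illustrative, and the network is simulated by calling `receive` directly.

```python
class Process:
    """One participant in classical vector-clock causal broadcast."""

    def __init__(self, pid: int, n: int):
        self.pid = pid
        self.n = n
        self.clock = [0] * n      # clock[k] = messages from k delivered here
        self.pending = []         # received but not yet deliverable
        self.delivered = []       # payloads in delivery order

    def broadcast(self, payload):
        # The sender delivers its own message immediately.
        self.clock[self.pid] += 1
        self.delivered.append(payload)
        return (self.pid, list(self.clock), payload)

    def _deliverable(self, sender, stamp):
        # Next message from the sender, and nothing it depends on is missing.
        return stamp[sender] == self.clock[sender] + 1 and all(
            stamp[k] <= self.clock[k] for k in range(self.n) if k != sender
        )

    def receive(self, message):
        self.pending.append(message)
        progress = True
        while progress:           # a delivery may unblock other messages
            progress = False
            for msg in list(self.pending):
                sender, stamp, payload = msg
                if self._deliverable(sender, stamp):
                    self.clock[sender] = stamp[sender]
                    self.delivered.append(payload)
                    self.pending.remove(msg)
                    progress = True


# Scenario: p1 sends "b" after seeing "a"; p2 receives them out of order.
p0, p1, p2 = (Process(i, 3) for i in range(3))
msg_a = p0.broadcast("a")
p1.receive(msg_a)
msg_b = p1.broadcast("b")
p2.receive(msg_b)                 # held back: "a" has not been delivered yet
held_back = list(p2.delivered)
p2.receive(msg_a)                 # delivers "a", then the pending "b"
print(held_back, p2.delivered)
```

Each message carries a timestamp of `n` integers, which is the cost a mechanism with few integer timestamps aims to reduce.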

8 citations

Book Chapter•10.1007/978-3-319-62932-2_42•

Scalable Computations of GeRa Code on the Base of Software Platform INMOST

Igor N. Konshin¹, Ivan Kapyrin¹•Institutions (1)

Russian Academy of Sciences¹

4 Sep 2017

TL;DR: The analysis of scalability of GeRa code on different computer platforms from multicore laptop to Lomonosov supercomputer is presented and the comparison of parallel efficiency for different linear solvers in the INMOST framework is performed.

Abstract: The hydrogeological modeling code GeRa is based on the INMOST software platform, which operates on distributed mesh data and supports assembling and solving systems of linear equations. A set of groundwater flow models with filtration, transport, and chemical processes is considered. The parallel efficiency of different linear solvers in the INMOST framework is compared. The scalability of the GeRa code is analyzed on different computer platforms, from a multicore laptop to the Lomonosov supercomputer.

8 citations

Book Chapter•10.1007/978-3-319-62932-2_2•

Generating Maximal Domino Patterns by Cellular Automata Agents

Rolf Hoffmann¹, Dominique Désérable²•Institutions (2)

Technische Universität Darmstadt¹, Institut national des sciences appliquées²

4 Sep 2017

TL;DR: A 2D cellular automaton with moving agents is considered; the objective is to find agents controlled by a Finite State Program (FSP) that can form domino patterns.

Abstract: A 2D cellular automaton with moving agents is considered. The objective is to find agents controlled by a Finite State Program (FSP) that can form domino patterns. The quality of a formed pattern is measured by the degree of order, computed by counting matching \(3 \times 3\) patterns (templates). The class of domino patterns is defined by four templates. An agent reacts to its own color, the color in front, and whether it is blocked or not. It can change the color, move or not, and turn in any direction. Four FSPs were evolved for multi-agent systems with 1, 2, or 4 agents initially placed in the corners of the field. For a \(12 \times 12\) training field, the target pattern could be formed with a 100% degree of order. The performance was also high with other field sizes. Livelocks are avoided by using three different variants of the evolved FSP. The degree of order usually fluctuates after reaching a certain threshold, but it can also be stable, and the agents may signal termination by running in a cycle or by stopping their activity.

8 citations

Book Chapter•10.1007/978-3-319-62932-2_32•

An Experimental Study of Workflow Scheduling Algorithms for Heterogeneous Systems

Alexey Nazarenko¹, Oleg V. Sukhoroslov¹•Institutions (1)

Russian Academy of Sciences¹

4 Sep 2017

TL;DR: The accuracy of the network model used helped reveal drawbacks of simpler models commonly used for studying scheduling algorithms, and the developed open-source simulation framework based on the SimGrid toolkit made it possible to perform a large number of experiments in a reasonable amount of time and to ensure reproducible results.

Abstract: The paper studies the efficiency of nine state-of-the-art algorithms for scheduling workflow applications in heterogeneous computing systems (HCS). The algorithms are compared on the basis of discrete-event simulation for a wide range of workflow and system configurations. The developed open-source simulation framework based on the SimGrid toolkit allowed us to perform a large number of experiments in a reasonable amount of time and to ensure reproducible results. The accuracy of the network model used helped to reveal drawbacks of simpler models commonly used for studying scheduling algorithms.

7 citations

Book Chapter•10.1007/978-3-319-62932-2_45•

Parallel Calculation of Diameter Constrained Network Reliability

Sergei N. Nesterov, Denis A. Migov

4 Sep 2017

TL;DR: Parallel methods for calculating diameter-constrained network reliability are introduced, based on the well-known factoring method and on its modification proposed by H. Cancela and L. Petingi; the analysis of numerical experiments allowed the authors to set some important parameters of the parallel algorithm for speeding up calculations.

Abstract: The problem of calculating network reliability under a diameter constraint is studied. Computing this characteristic is known to be NP-hard. We introduce parallel methods based on the well-known factoring method and on the modification of the factoring method proposed by H. Cancela and L. Petingi. The analysis of the numerical experiments has allowed us to set some important parameters of the parallel algorithm for speeding up calculations.
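
As background, the basic factoring method conditions on the state of one edge at a time: the reliability is the edge's working probability times the reliability with that edge up, plus its failure probability times the reliability with that edge down. The sketch below applies this to two-terminal diameter-constrained reliability. It is a plain sequential illustration under stated assumptions (perfectly reliable nodes, independent edge failures), not the authors' parallel algorithm or the Cancela-Petingi modification. The names `hops` and `dc_reliability` are illustrative.

```python
from collections import deque


def hops(n, edges, s, t):
    """Fewest edges on a path from s to t (infinity if unreachable)."""
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return dist[u]
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return float("inf")


def dc_reliability(n, p, s, t, d, up=(), undecided=None):
    """Probability that s and t are joined by a working path of at most d edges.

    p maps each edge (u, v) to its probability of being operational.
    """
    if undecided is None:
        undecided = tuple(p)
    if hops(n, up, s, t) <= d:              # already connected within d hops
        return 1.0
    if hops(n, up + undecided, s, t) > d:   # impossible even if all the rest work
        return 0.0
    edge, rest = undecided[0], undecided[1:]
    # Factoring step: condition on the state of one undecided edge.
    return (p[edge] * dc_reliability(n, p, s, t, d, up + (edge,), rest)
            + (1 - p[edge]) * dc_reliability(n, p, s, t, d, up, rest))


# Triangle network, every edge operational with probability 0.9.
triangle = {(0, 1): 0.9, (1, 2): 0.9, (0, 2): 0.9}
r_d1 = dc_reliability(3, triangle, s=0, t=2, d=1)
r_d2 = dc_reliability(3, triangle, s=0, t=2, d=2)
print(round(r_d1, 3), round(r_d2, 3))
```

The two branches of each factoring step are independent subproblems, which is the property that makes the method a natural candidate for parallel execution.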

6 citations

Book Chapter•10.1007/978-3-319-62932-2_37•

Comparison of Auction Methods for Job Scheduling with Absolute Priorities

A. V. Baranov¹, Pavel Telegin¹, Artem Tikhomirov¹•Institutions (1)

Russian Academy of Sciences¹

4 Sep 2017

TL;DR: A model of a geographically distributed computing system with absolute job priorities is described, and a decentralized scheduling algorithm is designed using two auction methods: the first-price sealed-bid auction and the English auction.

Abstract: A model of a geographically distributed computing system with absolute job priorities is described in the paper. The authors designed a decentralized scheduling algorithm using auction methods. Two auction methods were studied and compared: the first-price sealed-bid auction and the English auction. The paper includes the results of an experimental comparison of the two auction methods.

6 citations

Book Chapter•10.1007/978-3-319-62932-2_47•

Globalizer – A Parallel Software System for Solving Global Optimization Problems

Alexander Sysoyev¹, Konstantin Barkalov¹, Vladislav Sovrasov¹, Ilya Lebedev¹, Victor Gergel¹•Institutions (1)

N. I. Lobachevsky State University of Nizhny Novgorod¹

4 Sep 2017

TL;DR: The Globalizer software system is described, which implements an approach to solving global optimization problems using the block multistage scheme of dimension reduction, which combines the use of Peano curve type evolvents and the multistage reduction scheme.

Abstract: In this paper, we describe the Globalizer software system for solving global optimization problems. The system implements an approach to solving global optimization problems using the block multistage scheme of dimension reduction, which combines the use of Peano curve type evolvents and the multistage reduction scheme. The scheme allows efficient parallelization of the computations and a many-fold increase in the number of processors employed in the parallel solving of global optimization problems.

Book Chapter•10.1007/978-3-319-62932-2_18•

Multiple-Precision Residue-Based Arithmetic Library for Parallel CPU-GPU Architectures: Data Types and Features

Konstantin Isupov, Alexander Kuvaev, Mikhail Popov, Anton Zaviyalov

4 Sep 2017

TL;DR: A new software library for multiple-precision (integer and floating-point) and extended-range computations is considered; it is targeted at heterogeneous CPU-GPU architectures, and its multiple-precision modules are based on the residue number system (RNS).

Abstract: In this paper, a new software library for multiple-precision (integer and floating-point) and extended-range computations is considered. The library is targeted at heterogeneous CPU-GPU architectures. The residue number system (RNS), which enables effective parallelization of arithmetic operations, underlies the library's multiple-precision modules. The paper deals with the supported number formats and the library features. An algorithm for selecting an RNS moduli set for a given precision of computations is also presented.
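
To illustrate why a residue number system lends itself to parallel arithmetic, the sketch below shows textbook RNS addition and multiplication, where each residue digit is computed independently of the others, and reconstruction via the Chinese Remainder Theorem. It does not use the library's API or number formats. The moduli set `(7, 11, 13)` and all function names are hypothetical choices for illustration, and the code requires Python 3.8 or later for `math.prod` and modular inverses via `pow`.

```python
from math import prod

# Hypothetical pairwise-coprime moduli; representable range is 0 .. 7*11*13 - 1 = 1000.
MODULI = (7, 11, 13)


def to_rns(x: int) -> tuple:
    """Represent x by its residues modulo each modulus."""
    return tuple(x % m for m in MODULI)


def rns_add(a: tuple, b: tuple) -> tuple:
    # Each digit is independent, so every lane could run in parallel.
    return tuple((x + y) % m for x, y, m in zip(a, b, MODULI))


def rns_mul(a: tuple, b: tuple) -> tuple:
    # No carries propagate between digits.
    return tuple((x * y) % m for x, y, m in zip(a, b, MODULI))


def from_rns(residues: tuple) -> int:
    """Reconstruct the integer with the Chinese Remainder Theorem."""
    M = prod(MODULI)
    total = 0
    for r, m in zip(residues, MODULI):
        M_i = M // m
        total += r * M_i * pow(M_i, -1, m)   # pow(..., -1, m) is the modular inverse
    return total % M


a, b = to_rns(25), to_rns(17)
print(a, b)
print(from_rns(rns_add(a, b)), from_rns(rns_mul(a, b)))
```

Results are only correct while they stay below the product of the moduli, which is why selecting a moduli set for a required precision is a design problem in its own right.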

Book Chapter•10.1007/978-3-319-62932-2_13•

Auto-Vectorization of Loops on Intel 64 and Intel Xeon Phi: Analysis and Evaluation

Olga V. Moldovanova¹, Mikhail G. Kurnosov¹•Institutions (1)

Russian Academy of Sciences¹

4 Sep 2017

TL;DR: This work estimates speedup by running the loops in scalar and vector modes for different data types and determines loop classes which the compilers used in the study fail to vectorize.

Abstract: This paper evaluates auto-vectorizing capabilities of modern optimizing compilers Intel C/C++, GCC C/C++, LLVM/Clang and PGI C/C++ on Intel 64 and Intel Xeon Phi architectures. We use the Extended Test Suite for Vectorizing Compilers consisting of 151 loops. In this work, we estimate speedup by running the loops in scalar and vector modes for different data types and determine loop classes which the compilers used in the study fail to vectorize. We use the dual CPU system (NUMA, 2 x Intel Xeon E5-2620v4, Intel Broadwell microarchitecture) with the Intel Xeon Phi 3120A co-processor for our experiments.

Book Chapter•10.1007/978-3-319-62932-2_11•

The DiamondTetris Algorithm for Maximum Performance Vectorized Stencil Computation

Vadim D. Levchenko¹, Anastasia Y. Perepelkina¹•Institutions (1)

Keldysh Institute of Applied Mathematics¹

4 Sep 2017

TL;DR: An algorithm from the LRnLA family, DiamondTetris, is constructed for stencil computation; it is aimed at Many-Integrated-Core processors of the Xeon Phi family, and its strong points are locality, efficient use of the memory hierarchy, and, most importantly, seamless vectorization.

Abstract: An algorithm from the LRnLA family, DiamondTetris, for stencil computation is constructed. It is aimed at Many-Integrated-Core processors of the Xeon Phi family. The algorithm and its implementation are described for a simulation based on the wave equation. Its strong points are locality, efficient use of the memory hierarchy, and, most importantly, seamless vectorization. Specifically, only 1 vector rearrange operation is necessary per cell value update. The performance is estimated with the roofline model. The algorithm is implemented in code and tested on Xeon and Xeon Phi machines.
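
For readers unfamiliar with the roofline model mentioned above: it bounds attainable performance by the smaller of the peak compute rate and the memory bandwidth multiplied by the arithmetic intensity (operations per byte moved). The sketch below is a generic illustration. The machine numbers are made up for the example and are not measurements from the paper or specifications of any Xeon Phi part.

```python
def roofline(peak_gflops: float, bandwidth_gbs: float, intensity: float) -> float:
    """Attainable GFLOP/s for a kernel with the given FLOP/byte intensity."""
    return min(peak_gflops, bandwidth_gbs * intensity)


# Hypothetical machine: 1000 GFLOP/s peak, 100 GB/s memory bandwidth.
PEAK, BANDWIDTH = 1000.0, 100.0
ridge = PEAK / BANDWIDTH          # intensity at which the two limits meet

low = roofline(PEAK, BANDWIDTH, 0.5)    # memory-bound kernel
high = roofline(PEAK, BANDWIDTH, 20.0)  # compute-bound kernel
print(ridge, low, high)
```

Kernels to the left of the ridge point are limited by memory traffic, which is why stencil algorithms that improve locality can raise the achievable bound.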

Book Chapter•10.1007/978-3-319-62932-2_38•

Parallel Algorithm for Solving Constrained Global Optimization Problems

Konstantin Barkalov¹, Ilya Lebedev¹•Institutions (1)

N. I. Lobachevsky State University of Nizhny Novgorod¹

4 Sep 2017

TL;DR: An experimental assessment of parallel algorithm efficiency was conducted by finding the numeric solution to several hundred randomly generated multidimensional multiextremal problems with non-convex constraints.

Abstract: This work considers a parallel algorithm for solving multiextremal problems with non-convex constraints. The distinctive feature of this algorithm, which does not use penalty functions, is the separate consideration of each problem constraint. The search process can be conducted by reducing the original multidimensional problem to a number of related one-dimensional problems and solving this set of problems in parallel. An experimental assessment of parallel algorithm efficiency was conducted by finding the numeric solution to several hundred randomly generated multidimensional multiextremal problems with non-convex constraints.

Book Chapter•10.1007/978-3-319-62932-2_43•

Parallel Computing for Time-Consuming Multicriterial Optimization Problems

Victor Gergel¹, Evgeny Kozinov¹•Institutions (1)

N. I. Lobachevsky State University of Nizhny Novgorod¹

4 Sep 2017

TL;DR: An efficient method is proposed for the parallel solution of time-consuming multicriterial optimization problems, where the optimality criteria can be multiextremal and computing the criteria values can require a large amount of computation.

Abstract: In the present paper, an efficient method is proposed for the parallel solution of time-consuming multicriterial optimization problems, where the optimality criteria can be multiextremal and computing the criteria values can require a large amount of computation. The proposed scheme of parallel computations makes it possible to obtain several efficient decisions of a multicriterial problem. During the computations, maximum use is made of the search information. The results of the numerical experiments demonstrate that this approach substantially reduces the computational costs of solving multicriterial optimization problems, by tens to hundreds of times.

Proceedings Article•

A Probabilistic Causal Message Ordering Mechanism

Achour Mostefaoui, Stéphane Weiss

23 May 2017

TL;DR: This paper proposes a probabilistic but efficient causal broadcast mechanism for large systems with changing membership that uses few integer timestamps.

Abstract: Causal broadcast is a classical communication primitive that has been studied for more than three decades, and several implementations have been proposed. Implementing such a primitive has a non-negligible cost, either in the extra information that messages have to carry or in the time delays needed for message delivery. It has been proved that messages need to carry control information whose size is linear in the size of the system. This problem has gained more interest due to new application domains, such as collaborative applications, which are widely used and becoming massive, and the social semantic web and linked data, whose implementation needs causal ordering of messages. This paper proposes a probabilistic but efficient causal broadcast mechanism for large systems with changing membership that uses few integer timestamps.

Book Chapter•10.1007/978-3-319-62932-2_34•

Islands-of-Cores Approach for Harnessing SMP/NUMA Architectures in Heterogeneous Stencil Computations

Lukasz Szustak¹, Roman Wyrzykowski¹, Ondřej Jakl²•Institutions (2)

Częstochowa University of Technology¹, Academy of Sciences of the Czech Republic²

4 Sep 2017

TL;DR: This paper faces the challenge of harnessing the heterogeneous nature of SMP/NUMA communications for a complex scientific application which implements the Multidimensional Positive Definite Advection Transport Algorithm (MPDATA), consisting of a set of heterogeneous stencil computations.

Abstract: SMP/NUMA systems are powerful HPC platforms which can be applied to a wide range of real-life applications. These systems provide a large capacity of shared memory and allow using the shared-variable programming model to take advantage of shared memory for inter-process communication and synchronization. However, as data can be physically dispersed over many nodes, access to different data items may require significantly different times. In this paper, we face the challenge of harnessing the heterogeneous nature of SMP/NUMA communications for a complex scientific application which implements the Multidimensional Positive Definite Advection Transport Algorithm (MPDATA), consisting of a set of heterogeneous stencil computations.

Book Chapter•10.1007/978-3-319-62932-2_16•

Predictive Modeling of Suffocation in Shallow Waters on a Multiprocessor Computer System

A.I. Sukhinov¹, Alla V. Nikitina¹, A. E. Chistyakov¹, Vladimir Sumbaev¹, Maksim Abramov¹, Alena Semenyakina²•Institutions (2)

Institute of Service and Entrepreneurship of DGTU¹, Southern Federal University²

4 Sep 2017

TL;DR: The model of the algal bloom that causes suffocation in shallow waters takes into account the transport of the water environment; microturbulent diffusion; gravitational sedimentation of pollutants and plankton; nonlinear interaction of plankton populations; biogenic, temperature and oxygen regimes; and the influence of salinity.

Abstract: The model of the algal bloom that causes suffocation in shallow waters takes into account the following: the transport of the water environment; microturbulent diffusion; gravitational sedimentation of pollutants and plankton; nonlinear interaction of plankton populations; biogenic, temperature and oxygen regimes; and the influence of salinity. Using high-order accurate schemes to discretize the model significantly increases computational accuracy and decreases computational time. The practical significance lies in the software implementation of the proposed model; the limits and prospects of its practical use are defined. Experimental software was developed for a multiprocessor computer system and is intended for mathematical modeling of possible development scenarios of shallow-water ecosystems, using the example of the Azov Sea in the case of suffocation. In the parallel implementation, we used grid domain decomposition methods for computationally laborious convection-diffusion problems, taking into account the architecture and parameters of the multiprocessor computer system. A further advantage of the developed software is the use of a hydrodynamical model that includes the motion equations in the three coordinate directions.

Book Chapter•10.1007/978-3-319-62932-2_39•

Parallelizing Metaheuristics for Optimal Design of Multiproduct Batch Plants on GPU

Andrey Borisenko¹, Sergei Gorlatch²•Institutions (2)

Tambov State Technical University¹, University of Münster²

4 Sep 2017

TL;DR: The results of the hybrid metaheuristics approach (ACO+SA) are very near to the global optimal solutions, but they are produced much faster than using the deterministic Branch-and-Bound approach.

Abstract: We propose a metaheuristics-based approach to the optimal design of multi-product batch plants, with a particular application example of chemical-engineering systems. Our hybrid approach combines two metaheuristics: Ant Colony Optimization (ACO) and Simulated Annealing (SA). We develop a sequential implementation of the proposed method and we parallelize it on Graphics Processing Units (GPU) using the CUDA programming environment. We experimentally demonstrate that the results of our hybrid metaheuristic approach (ACO+SA) are very near to the global optimal solutions, but they are produced much faster than using the deterministic Branch-and-Bound approach.

Book Chapter•10.1007/978-3-319-62932-2_40•

The Optimization of Traffic Management for Cloud Application and Services in the Virtual Data Center

Irina Bolodurina¹, Denis Parfenov¹•Institutions (1)

Orenburg State University¹

4 Sep 2017

TL;DR: A simulation model is developed for the traffic in software-defined network segments of virtual data centers involved in processing user requests to cloud applications and services within a network environment; it makes it possible to implement the traffic management algorithm for cloud applications and to optimize access to storage systems through the effective use of data transmission channels.

Abstract: One current optimization problem is the control of traffic for cloud applications and services in the network environment of a virtual data center. Given the multitier architecture of modern data centers, this task deserves special attention. The advantage of modern infrastructure virtualization is the possibility of using software-defined networks and software-defined data storage. However, existing algorithmic optimization solutions do not take into account the specific features of routing heterogeneous network traffic with multiple application types. The task of optimizing traffic distribution for cloud applications and services can be solved by using the software-defined infrastructure of virtual data centers. We have developed a simulation model for the traffic in software-defined network segments of virtual data centers involved in processing user requests to cloud applications and services within a network environment. Our model makes it possible to implement the traffic management algorithm for cloud applications and to optimize access to storage systems through the effective use of data transmission channels. In experimental studies, we found that our algorithm decreases the response time of cloud applications and services and, therefore, increases the productivity of user request processing and reduces the number of refusals.

Book Chapter•10.1007/978-3-319-62932-2_33•

PGAS Approach to Implement Mapreduce Framework Based on UPC Language

Shomanov Aday¹, Akhmed-Zaki Darkhan¹, Mansurova Madina¹•Institutions (1)

Al-Farabi University¹

4 Sep 2017

TL;DR: Since its introduction, MapReduce has proved to be a very effective parallel programming technique for processing large volumes of data.

Abstract: Since its introduction, MapReduce has proved to be a very effective parallel programming technique for processing large volumes of data. Among the most prevalent implementations of MapReduce are the Hadoop framework and Google's proprietary MapReduce system.

Proceedings Article•10.1109/PARCOMPTECH.2017.8068335•

Accelerated spam filtering with enhanced KMP algorithm on GPU

Venkata Krishna Pavan Kalubandi¹, M. Varalakshmi¹•Institutions (1)

VIT University¹

1 Feb 2017

TL;DR: An accelerated spam filtering mechanism that uses GPUs is presented; it utilizes an enhanced version of the Knuth-Morris-Pratt pattern matching algorithm that outperforms the serial versions by up to 12x and also performs more efficiently than other parallel versions.

Abstract: Spam filtering is one of the most important applications in email services, and it has become increasingly sophisticated due to the enormous usage of the Internet. Traditionally, spam filters have been implemented on the CPU with a pattern matching algorithm. In this paper, an accelerated spam filtering mechanism that uses GPUs is presented. The filtering process utilizes an enhanced version of the Knuth-Morris-Pratt pattern matching algorithm that outperforms the serial versions by up to 12x and also performs more efficiently than other parallel versions. The parallel algorithm is used to develop an advanced keyword-based Naive Bayesian classifier that speeds up spam filtering by up to 2 times compared to the CPU.
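
As a reference point, the classical serial Knuth-Morris-Pratt algorithm that the paper builds on can be sketched as follows. This is the textbook CPU version, not the enhanced GPU variant described in the abstract. The function names and the sample text are illustrative.

```python
def failure_table(pattern: str) -> list:
    """table[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it."""
    table = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


def kmp_search(text: str, pattern: str) -> list:
    """Return the start index of every occurrence of pattern in text."""
    if not pattern:
        return []
    table = failure_table(pattern)
    matches = []
    k = 0                                  # characters of pattern matched so far
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = table[k - 1]               # fall back without re-reading text
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = table[k - 1]               # keep going to find overlapping matches
    return matches


print(failure_table("abab"))
print(kmp_search("free money, free prize", "free"))
```

The text is scanned once and never re-read, so the search takes time linear in the combined length of text and pattern.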

Book Chapter•10.1007/978-3-319-62932-2_3•

Automated Parallelization of a Simulation Method of Elastic Wave Propagation in Media with Complex 3D Geometry Surface on High-Performance Heterogeneous Clusters

Nikita Andreevich Kataev¹, Alexander Sergeevich Kolganov², Alexander Sergeevich Kolganov¹, Pavel Titov•Institutions (2)

Keldysh Institute of Applied Mathematics¹, Moscow State University²

4 Sep 2017

TL;DR: The application of DVM and SAPFOR is considered in order to automate the mapping of a 3D elastic wave simulation method onto high-performance heterogeneous clusters; the efficiency and acceleration of the parallel program are estimated, and the performance of the DVMH-based program is compared with that of a program obtained by manual parallelization using MPI.

Abstract: The paper considers the application of DVM and SAPFOR in order to automate the mapping of a 3D elastic wave simulation method onto high-performance heterogeneous clusters. A distinctive feature of the proposed method is the use of a curved three-dimensional grid that is consistent with the geometry of the free surface. The use of curved grids considerably complicates both manual and automated parallelization. A technique for mapping a curved grid onto a structured grid is presented to solve this problem. The sequential program, based on the finite difference method on a structured grid, has been parallelized using the Fortran-DVMH language. The application of SAPFOR analysis tools simplified this parallelization process. Features of automated parallelization are described. The authors estimate the efficiency and acceleration of the parallel program and compare the performance of the DVMH-based program with that of a program obtained by manual parallelization using MPI.

Book Chapter•10.1007/978-3-319-62932-2_15•

Software Implementation of Mathematical Model of Thermodynamic Processes in a Steam Turbine on High-Performance System

A.I. Sukhinov¹, A. E. Chistyakov¹, Alla V. Nikitina¹, Irina Yakovenko², Vladimir Parshukov, Nikolay Efimov, Vadim Kopitsa, Dmitriy Stepovoy•Institutions (2)

Institute of Service and Entrepreneurship of DGTU¹, Southern Federal University²

4 Sep 2017

TL;DR: The developed model takes into account the complex geometry of the steam turbine, does not require significant changes in the processing of the design features, and can be used to calculate thermal processes in other constructions such as turbines.

Abstract: The aim of this paper is to develop a mathematical model of thermal processes in a steam turbine, based on modern information technologies and computational methods, with the help of which the accuracy of calculations of thermal modes is increased. The practical significance of the paper is as follows: a model of thermal processes in a steam turbine is proposed and implemented, information about the temperature modes of the steam turbine is derived, and the limits and prospects of the proposed mathematical model are defined. The thermal processes in the turbine are characterized by strong non-uniformity of the heat flow, which significantly influences the reliability and efficiency of the facility. As a rule, the influence of these parameters on the geometry is not considered in the design of the system, which results in premature wear of the machine. The developed model takes into account the complex geometry of the steam turbine, does not require significant changes in the processing of the design features, and can be used to calculate thermal processes in other constructions such as turbines. A software solution was developed for two-dimensional simulation of thermal processes in a steam turbine that takes into account the occupancy of control volumes.

Book Chapter•10.1007/978-3-319-62932-2_19•

Parallel Implementation of Cellular Automaton Model of the Carbon Corrosion Under the Influence of the Electrochemical Oxidation.

Anastasiya E. Kireeva, Karl K. Sabelfeld¹, N. V. Maltseva, E. N. Gribov¹•Institutions (1)

Novosibirsk State University¹

4 Sep 2017

TL;DR: A cellular automaton model of the electrochemical oxidation of carbon is presented, simulating a two-dimensional sample of the electro-conductive carbon black “Ketjenblack ES DJ 600”, and the efficiency of the parallel code is analyzed.

Abstract: In this paper we present a cellular automaton model of the electrochemical oxidation of carbon. A two-dimensional sample of the electro-conductive carbon black “Ketjenblack ES DJ 600” is simulated. In the model the sample consists of ring-shaped granules of carbon. Under the influence of electrochemical oxidation, the carbon granules are destroyed through a few successive stages. The rates of these oxidation stages are chosen to fit the simulation results to the experiment. As a result of the computer simulation of carbon electrochemical oxidation, the portions of surface atoms and of atoms with different degrees of oxidation were calculated and compared with the experimental data. In addition, a parallel implementation of the cellular automaton simulating the carbon corrosion is developed, and the efficiency of the parallel code is analyzed.
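
To make the idea of staged oxidation concrete, here is a minimal sketch of a probabilistic cellular automaton in which surface cells advance through successive oxidation stages and are removed after the last one. It is not the authors' model: the grid layout, the 4-neighbour surface rule, the names `oxidation_step` and `is_surface`, and the rate values are all assumptions made for illustration.

```python
import random

EMPTY = -1  # assumed marker for a cell whose carbon atom has been removed


def is_surface(grid, i, j):
    """A cell is exposed if a 4-neighbour is EMPTY or lies outside the grid."""
    rows, cols = len(grid), len(grid[0])
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        ni, nj = i + di, j + dj
        if not (0 <= ni < rows and 0 <= nj < cols):
            return True
        if grid[ni][nj] == EMPTY:
            return True
    return False


def oxidation_step(grid, rates, rng):
    """One synchronous step of the sketch.

    A surface cell in stage s advances with probability rates[s];
    advancing past the last stage removes the atom. The old grid is
    read and a new grid is written, so every cell can be updated
    independently of the others within one step.
    """
    n_stages = len(rates)
    new_grid = [row[:] for row in grid]
    for i, row in enumerate(grid):
        for j, s in enumerate(row):
            if s == EMPTY or not is_surface(grid, i, j):
                continue
            if rng.random() < rates[s]:
                new_grid[i][j] = s + 1 if s + 1 < n_stages else EMPTY
    return new_grid


if __name__ == "__main__":
    rng = random.Random(0)
    grid = [[0] * 8 for _ in range(8)]      # hypothetical fresh carbon sample
    rates = [0.5, 0.3, 0.1]                 # hypothetical stage rates
    for _ in range(20):
        grid = oxidation_step(grid, rates, rng)
    remaining = sum(s != EMPTY for row in grid for s in row)
    print("atoms remaining:", remaining)
```

Because each step only reads the previous grid, the cell loop can be split across workers by rows or blocks, which is the usual starting point for a parallel cellular automaton.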

Book Chapter•10.1007/978-3-319-62932-2_20•

A Fine-Grained Parallel Particle Swarm Optimization on Many-core and Multi-core Architectures

Nadia Nedjah¹, Rogério de Moraes Calazan², Luiza de Macedo Mourelle¹•Institutions (2)

Rio de Janeiro State University¹, Brazilian Navy²

4 Sep 2017

TL;DR: A fine-grained parallelization strategy that focuses on the work done w.r.t. each of the problem dimensions and does it in parallel, which is useful in computationally demanding optimization problems wherein the objective function has a very large number of dimensions.

Abstract: Particle Swarm Optimization (PSO) is a stochastic yet very robust metaheuristic. Real-world optimizations require a high computational effort to converge to a viable solution. In general, parallel PSO implementations provide good performance, but this depends on the parallelization strategy as well as the number and/or characteristics of the exploited processors. In this paper, we propose a fine-grained parallelization strategy that focuses on the work done w.r.t. each of the problem dimensions and does it in parallel. Moreover, all particles act in parallel. This strategy is useful in computationally demanding optimization problems wherein the objective function has a very large number of dimensions. We map the computation onto three different parallel high-performance multiprocessor architectures, which are based on many-core and multi-core architectures. The performance of the proposed strategy is evaluated on four well-known high-dimensional benchmarks of different complexity. The obtained speedups are very promising.
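
As a rough illustration of why per-dimension work can be done in parallel, the textbook PSO velocity and position update touches each (particle, dimension) entry independently. The sketch below is sequential and is not the paper's implementation: the coefficients `w`, `c1`, `c2`, the sphere objective, and the function names are assumptions chosen for the example.

```python
import random


def sphere(x):
    """Illustrative objective (assumed): sum of squares, minimum at the origin."""
    return sum(v * v for v in x)


def update_dimension(x, v, pbest, gbest, rng, w=0.7, c1=1.5, c2=1.5):
    """Textbook PSO update for one (particle, dimension) entry.

    It reads only values belonging to the same dimension, so all
    entries could be updated concurrently on separate threads.
    """
    v_new = (w * v
             + c1 * rng.random() * (pbest - x)
             + c2 * rng.random() * (gbest - x))
    return x + v_new, v_new


def run_pso(n_particles=20, n_dims=30, n_iters=100, seed=0):
    rng = random.Random(seed)
    pos = [[rng.uniform(-5.0, 5.0) for _ in range(n_dims)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_dims for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [sphere(p) for p in pos]
    initial_best = min(pbest_val)

    for _ in range(n_iters):
        # Snapshot of the global best, shared by all particles this iteration.
        gbest = pbest[pbest_val.index(min(pbest_val))][:]
        for i in range(n_particles):          # particles are independent
            for d in range(n_dims):           # dimensions are independent
                pos[i][d], vel[i][d] = update_dimension(
                    pos[i][d], vel[i][d], pbest[i][d], gbest[d], rng)
            # Evaluating the objective is a reduction over dimensions.
            val = sphere(pos[i])
            if val < pbest_val[i]:
                pbest_val[i] = val
                pbest[i] = pos[i][:]
    return initial_best, min(pbest_val)


if __name__ == "__main__":
    start, end = run_pso()
    print("best value before:", start, "after:", end)
```

In this form the only cross-dimension step is the objective evaluation, which on parallel hardware would typically become a reduction.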

Book Chapter•10.1007/978-3-319-62932-2_7•

Fragmentation of IADE method using LuNA system

Norma Alias¹, Sergey Kireev•Institutions (1)

Universiti Teknologi Malaysia¹

4 Sep 2017

TL;DR: A fragmented numerical algorithm of the IADE method is designed in terms of a data-flow graph, and a performance comparison of different implementations of the algorithm, including LuNA and Message Passing Interface, is given.

Abstract: The fragmented programming system LuNA is based on the Fragmented Programming Technology. LuNA is a platform for building automatically tunable portable libraries of parallel numerical subroutines. This paper focuses on the parallel implementation of the IADE method for solving a 1D partial differential equation (PDE) of parabolic type using the LuNA programming system. A fragmented numerical algorithm of the IADE method is designed in terms of a data-flow graph. A performance comparison of different implementations of the algorithm, including LuNA and Message Passing Interface, is given.

Book Chapter•10.1007/978-3-319-62932-2_1•

Experimenting with a Context-Aware Language

Chiara Bodei¹, Pierpaolo Degano¹, Gian Luigi Ferrari¹, Letterio Galletta¹•Institutions (1)

University of Pisa¹

4 Sep 2017

TL;DR: It will be shown how applications and context interactions can be better specified, analysed and controlled, with the help of some experiments done with a preliminary implementation of \(\text {ML}_\text {CoDa}\).

Abstract: Contextual information plays an increasingly crucial role in concurrent applications in the times of mobility and pervasiveness of computing. Context-Oriented Programming languages explicitly treat this kind of information. They provide primitive constructs to adapt the behaviour of a program, depending on the evolution of its operational environment, which is affected by other programs hosted therein independently and unpredictably. We discuss these issues and the challenges they pose, reporting on our recent work on \(\text {ML}_\text {CoDa}\), a language specifically designed for adaptation and equipped with a clear formal semantics and analysis tools. We will show how applications and context interactions can be better specified, analysed and controlled, with the help of some experiments done with a preliminary implementation of \(\text {ML}_\text {CoDa}\).

Book Chapter•10.1007/978-3-319-62932-2_48•

A novel string representation and kernel function for the comparison of I/O access patterns

Raúl Botero Torres¹, Julian M. Kunkel¹, Manuel F. Dolz¹, Thomas Ludwig¹•Institutions (1)

University of Hamburg¹

4 Sep 2017

TL;DR: A conversion to a weighted string representation is proposed in this paper, together with a novel string kernel function called Kast Spectrum Kernel, which can be promisingly applied to other similarity problems involving tree-like structured data.

Abstract: Parallel I/O access patterns act as fingerprints of a parallel program. In order to extract meaningful information from these patterns, they have to be represented appropriately. Because string objects can be easily compared using Kernel Methods, a conversion to a weighted string representation is proposed in this paper, together with a novel string kernel function called Kast Spectrum Kernel. The similarity matrices, obtained after applying this kernel to a set of examples from a real application, were analyzed using Kernel Principal Component Analysis (Kernel PCA) and Hierarchical Clustering. The evaluation showed that 2 out of 4 I/O access pattern groups were completely identified, while the other 2 formed a single cluster due to the intrinsic similarity of their members. The proposed strategy can be promisingly applied to other similarity problems involving tree-like structured data.
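
For readers unfamiliar with string kernels, the sketch below shows the classic k-spectrum kernel, which counts shared length-k substrings. It is not the Kast Spectrum Kernel proposed in the paper, whose definition the abstract does not give; the function names and the one-letter encoding of I/O operations in the usage example are assumptions.

```python
import math
from collections import Counter


def spectrum(s, k):
    """Count every contiguous substring of length k in s."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def spectrum_kernel(a, b, k=2):
    """Classic k-spectrum kernel: inner product of substring count vectors."""
    ca, cb = spectrum(a, k), spectrum(b, k)
    return sum(count * cb[gram] for gram, count in ca.items() if gram in cb)


def normalized_kernel(a, b, k=2):
    """Cosine-style normalization so that identical strings score 1.0."""
    denom = math.sqrt(spectrum_kernel(a, a, k) * spectrum_kernel(b, b, k))
    return spectrum_kernel(a, b, k) / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical encoding: o = open, r = read, w = write, c = close.
    traces = ["orrrrc", "orrrc", "owwwwc"]
    for a in traces:
        print([round(normalized_kernel(a, b), 3) for b in traces])
```

The printed rows form a similarity matrix of the kind that can then be passed to Kernel PCA or hierarchical clustering.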
