TL;DR: In this article, the authors present a comprehensive analysis of the SIMD-processing features and computational characteristics of three high-performance architectures: two NVIDIA GPU architectures (of the Pascal and Volta generations) and the NEC SX-Aurora TSUBASA vector processor.
Abstract: This paper presents a comprehensive analysis of the main SIMD-processing features and computational characteristics of three high-performance architectures: two NVIDIA GPU architectures (of the Pascal and Volta generations) and the NEC SX-Aurora TSUBASA vector processor. Since both types of architecture rely strongly on SIMD processing, certain similarities in their data-processing principles can be found. However, although vectorised data processing is included in both the NVIDIA GPU and NEC SX-Aurora TSUBASA architectures, their vectorisation features are implemented in completely different ways. These differences impose several fundamental restrictions on the classes of algorithms that can be efficiently implemented on each platform. This paper investigates the possibility of porting various classes of programs and algorithms between the discussed architectures, with a focus on utilising all available vectorisation features. However, this problem cannot be approached without a detailed analysis of the similarities and differences between the SIMD-processing features of these architectures. The performed analysis allowed us to identify several important examples of typical applications and algorithms, including reduction operations, programs relying on frequent indirect memory accesses, and data transfers through the co-processor interconnect; some of them demonstrated comparable efficiency on NVIDIA GPUs and NEC SX-Aurora TSUBASA vector processors, while others did not. Moreover, the conducted analysis makes it easy to extend this set of examples toward automated porting of programs between the reviewed architectures, which we consider an important direction of our future research.
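Reduction operations are among the examples the analysis singles out. Unlike element-wise vector operations, a reduction combines n values into one, so on SIMD hardware it is typically performed as a log-step tree. The following plain-Python simulation is illustrative only (it is not code from the paper, and the function name is hypothetical); it shows the tree pattern commonly used for GPU reductions, in which lane i combines with lane i + stride at each step, halving the number of active lanes, so n values need ceil(log2 n) combining steps.

```python
def tree_reduce(values, op, identity):
    """Simulate a log-step SIMD tree reduction.

    Returns (result, number_of_steps). The input is padded with `identity`
    up to a power-of-two width, as a fixed-width vector unit would require.
    """
    lanes = list(values)
    width = 1
    while width < len(lanes):
        width *= 2
    lanes += [identity] * (width - len(lanes))  # pad unused lanes
    stride = width // 2
    steps = 0
    while stride >= 1:
        # All lanes i < stride are updated "at once" (one vector step);
        # the comprehension reads the old values before the slice is replaced.
        lanes[:stride] = [op(lanes[i], lanes[i + stride]) for i in range(stride)]
        stride //= 2
        steps += 1
    return lanes[0], steps
```

For example, summing eight values takes three steps instead of seven sequential additions; the total work is unchanged, but it is spread across lanes.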
TL;DR: This work proposes HaraliCU, an efficient strategy for computing the GLCM and extracting an exhaustive set of Haralick features, the most common and clinically relevant texture descriptors, and highlights the promising capabilities of GPUs in clinical research.
Abstract: Image texture extraction and analysis are fundamental steps in Computer Vision. In the biomedical field in particular, quantitative imaging methods are increasingly gaining importance because they convey scientifically and clinically relevant information for prediction, prognosis, and treatment response assessment. In this context, radiomic approaches are fostering large-scale studies that can have a significant impact on clinical practice. In this work, we focus on Haralick features, the most common and clinically relevant texture descriptors. These features are based on the Gray-Level Co-occurrence Matrix (GLCM), whose computation becomes considerably demanding on images with a high bit depth (e.g., 16 bits), such as medical images that convey detailed visual information. We propose HaraliCU, an efficient strategy for computing the GLCM and extracting an exhaustive set of Haralick features. HaraliCU was conceived to exploit the parallel computing capabilities of modern Graphics Processing Units (GPUs), achieving up to \(\sim\!20\times\) speed-up with respect to the corresponding sequential C++ version. Our GPU-powered solution highlights the promising capabilities of GPUs in clinical research.
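The GLCM counts how often gray level i occurs next to gray level j at a fixed pixel offset, and Haralick features are statistics of that normalised matrix. The following is a plain-Python illustration of this standard construction for a single offset, with four classic features (contrast, angular second moment, inverse difference moment, entropy). It is not HaraliCU's GPU code; the function names, the symmetric counting, and the sparse dictionary representation are assumptions. A dictionary is used because a dense GLCM has L^2 entries for L gray levels (2^32 for 16-bit images), which is one reason high bit depths are costly.

```python
import math
from collections import Counter


def glcm(image, dx=1, dy=0, symmetric=True):
    """Normalised GLCM for one offset (dx, dy), as a sparse {(i, j): p} dict."""
    rows, cols = len(image), len(image[0])
    counts = Counter()
    for y in range(rows):
        for x in range(cols):
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols:
                i, j = image[y][x], image[ny][nx]
                counts[(i, j)] += 1
                if symmetric:  # also count the reverse direction
                    counts[(j, i)] += 1
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()}


def haralick_features(p):
    """A small subset of Haralick features computed from a normalised GLCM."""
    contrast = sum((i - j) ** 2 * v for (i, j), v in p.items())
    asm = sum(v * v for v in p.values())  # angular second moment (energy)
    idm = sum(v / (1 + (i - j) ** 2) for (i, j), v in p.items())  # inverse difference moment
    entropy = -sum(v * math.log2(v) for v in p.values() if v > 0)
    return {"contrast": contrast, "asm": asm, "idm": idm, "entropy": entropy}
```

For the 2 x 3 image `[[0, 0, 1], [0, 1, 1]]` with a horizontal offset, every one of the four gray-level pairs has probability 0.25, giving contrast 0.5, ASM 0.25, IDM 0.75 and entropy 2 bits. Only non-zero pairs are stored, so memory scales with the number of co-occurring pairs actually present rather than with L^2.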
TL;DR: A replica selection strategy focusing on popular files and affinity files is proposed to improve data availability in distributed systems, and the results show that the proposed affinity-based replica selection adds a new dimension to replica selection by incorporating both the affinity and the popularity of file replicas.
Abstract: Replication is one of the key techniques used in distributed systems to improve data availability, data access performance and data reliability. To obtain the maximum benefit from file replication, a system that maintains replicas needs a strategy for selecting and accessing suitable replicas. A replica selection strategy determines the available replicas and chooses the most frequently accessed files. Most solutions based on access frequency or file popularity assume that files are independent of each other. In contrast, in distributed systems such as peer-to-peer file-sharing systems and mobile databases, files may be dependent on or correlated with one another. Thus, this paper focuses on the combination of file popularity and file affinity as the most important parameters for selecting replicas in distributed environments. Herein, a replica selection strategy focusing on popular files and affinity files is proposed, with the aim of improving data availability in distributed systems. A P2P simulator, PeerSim, is used to evaluate the performance of the dynamic replica selection strategy. The simulation results show that the proposed affinity-based replica selection adds a new dimension to replica selection strategies by incorporating both the affinity and the popularity of file replicas in distributed systems.
TL;DR: This work proposes a new approach based on conditional entropy to reduce dimensionality in incomplete information systems and shows that it achieves better data reduction, with higher accuracy, in terms of both object and dimensionality reduction.
Abstract: Dimensionality reduction is one of the main data reduction approaches for reducing storage and processing time while maintaining the integrity of the original data. A wide range of dimensionality reduction approaches are based on classical methods such as PCA and Bayer’s, and on machine learning methods such as clustering and feature selection. However, many of these approaches do not consider incomplete information systems, in which some attribute values are missing. Owing to the complexity of the problem, only a few studies have addressed it for incomplete information systems, specifically for attribute selection. The most popular approaches rely on probability theory either to replace missing values with the most common values or to remove objects with missing values from the information system. However, these approaches require the probability distribution of the data to be known in advance. To overcome these issues, we propose a new approach based on conditional entropy to reduce dimensionality. The results show that the proposed approach achieves better data reduction, with higher accuracy, in terms of both object and dimensionality reduction in incomplete information systems.
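Conditional entropy is the quantity the proposed approach is built on. As a hedged illustration of the classical definition for a complete decision table only, the following sketch partitions objects into equivalence classes X by their values on a subset B of condition attributes and computes H(D | B) = -Σ_X p(X) Σ_Y p(Y | X) log2 p(Y | X) for a decision attribute D. A greedy forward selection then adds attributes until H(D | B) matches H(D | C) for the full attribute set C. This greedy heuristic is a widely used one for entropy-based attribute reduction, not necessarily the paper's algorithm, and the sketch assumes no missing values: how the authors extend conditional entropy to incomplete information systems is not stated in the abstract and is not reproduced here. Function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def conditional_entropy(rows, attrs, decision):
    """H(decision | attrs) in bits; rows are dicts mapping attribute -> value.

    Assumes a complete table (no missing values).
    """
    n = len(rows)
    classes = defaultdict(Counter)  # equivalence class -> decision counts
    for r in rows:
        classes[tuple(r[a] for a in attrs)][r[decision]] += 1
    h = 0.0
    for counter in classes.values():
        size = sum(counter.values())
        for c in counter.values():
            p_cond = c / size
            h -= (size / n) * p_cond * math.log2(p_cond)
    return h


def greedy_reduct(rows, conditions, decision, eps=1e-12):
    """Add the attribute that lowers H(D | B) most until it equals H(D | C)."""
    target = conditional_entropy(rows, conditions, decision)
    selected, remaining = [], list(conditions)
    while conditional_entropy(rows, selected, decision) > target + eps:
        best = min(remaining,
                   key=lambda a: conditional_entropy(rows, selected + [a], decision))
        selected.append(best)
        remaining.remove(best)
    return selected
```

In a toy table where the decision depends only on attribute `b`, H(D | {b}) is already 0, so the greedy step stops after selecting `b` alone.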
TL;DR: A failure detector is a device (object) that provides processes with information on failures; the eventual leader failure detector allows consensus to be solved in n-process asynchronous systems where up to t=n-1 processes may crash in the read/write communication model, and up to \(t
Abstract: A failure detector is a device (object) that provides the processes with information on failures. Failure detectors were introduced to enrich asynchronous systems so that it becomes possible to solve problems (or implement concurrent objects) that are otherwise impossible to solve in pure asynchronous systems where processes are prone to crash failures. The most famous failure detector (which is called “eventual leader” and denoted \(\varOmega \)) is the weakest failure detector which allows consensus to be solved in n-process asynchronous systems where up to \(t=n-1\) processes may crash in the read/write communication model, and up to \(t
TL;DR: Results of numerical simulations of Type Ia supernova explosions on massively parallel supercomputers using the HydroBox3D code are presented.
Abstract: This paper describes HydroBox3D, a new parallel and distributed hydrodynamical code for the numerical simulation of Type Ia supernova explosions. HydroBox3D combines an adaptive nested mesh for the hydrodynamical simulation of the supernova explosion with a regular mesh, which forms the second level of the nested mesh, for the hydrodynamical simulation of nuclear reactions. The adaptive nested mesh code was developed for shared-memory architectures using Intel Optane technology, and the second-level nested mesh code was developed for an Intel Xeon Phi KNL supercomputer. An analysis of the HydroBox3D code is given, and results of numerical simulations of Type Ia supernova explosions on massively parallel supercomputers using HydroBox3D are presented.
TL;DR: Complexity in local memory has already been studied for several objects, including sets, databases and collaborative editors, but the literature has focused on a subclass of algorithms, operating in the so-called operational model, in which processes can only broadcast one message per update operation and the read operation incurs no communication.
Abstract: In large-scale distributed systems, replication is essential in order to provide availability and partition tolerance. Such systems are abstracted by the wait-free model, composed of asynchronous processes that communicate by sending and receiving messages, and in which any process may crash. Complexity in local memory has already been studied for several objects, including sets, databases and collaborative editors. However, the literature has focused on a subclass of algorithms, operating in the so-called operational model, in which processes can only broadcast one message per update operation and the read operation incurs no communication.
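To make the operational model concrete, here is a minimal sketch of a replicated grow-only set in which each update (add) broadcasts exactly one message and a read is answered from local state without any communication. The class names and the in-memory Network that stands in for reliable broadcast are hypothetical; the sketch models neither asynchrony nor crashes, and it is not an algorithm from the paper. Each replica's local memory grows with the number of distinct elements added, illustrating the per-process storage that complexity in local memory is concerned with.

```python
class Network:
    """In-memory stand-in for reliable broadcast (hypothetical)."""

    def __init__(self):
        self.processes = []

    def broadcast(self, sender, msg):
        # Queue the message at every other process; delivery happens later.
        for p in self.processes:
            if p is not sender:
                p.inbox.append(msg)


class GSetReplica:
    """Grow-only set replica following the operational model."""

    def __init__(self, net):
        self.net = net
        self.elements = set()  # local memory: one entry per distinct element
        self.inbox = []
        net.processes.append(self)

    def add(self, x):
        # Update: apply locally, then broadcast exactly one message.
        self.elements.add(x)
        self.net.broadcast(self, ("add", x))

    def deliver_all(self):
        # Set union is commutative and idempotent, so delivery order is irrelevant.
        while self.inbox:
            _, x = self.inbox.pop(0)
            self.elements.add(x)

    def read(self):
        # Read: answered from local state only, no messages sent.
        return frozenset(self.elements)
```

Before delivery, each replica reads only its own additions; once every pending message has been delivered, all replicas read the same set.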