Proceedings Article, DOI: 10.1109/CLUSTER.2017.65
Performance and Power Analysis of SX-ACE Using HP-X Benchmark Programs
Ryusuke Egawa,Kazuhiko Komatsu,Yoko Isobe,Toshihiro Kato,Souya Fujimoto,Hiroyuki Takizawa,Akihiro Musa,Hiroaki Kobayashi +7 more
- 01 Sep 2017
- pp 693-700
13
TL;DR: The evaluation results show that SX-ACE achieves the highest efficiencies in the HPGMG and HPCG ranking lists, which clearly indicates that a powerful vector processing mechanism with a high B/F ratio is mandatory to achieve high sustained performance in future HPC systems.
Abstract: As the SIMD width of modern microprocessors has widened to keep up with the computational demands of HPC systems, vector architectures have recently returned to the spotlight. Meanwhile, a modern vector architecture that has maintained a large SIMD width and a high B/F ratio has survived and evolved in the HPC community. In this paper, to clarify the potential of the modern vector architecture, we present a performance and power analysis of the modern vector supercomputer SX-ACE using the HP-X benchmark programs (HPL, HPCG, and HPGMG). Furthermore, the implementation and optimization of these benchmarks on SX-ACE are discussed. The evaluation results show that SX-ACE achieves the highest efficiencies in the HPGMG and HPCG ranking lists. These results clearly indicate that a powerful vector processing mechanism with a high B/F ratio is mandatory to achieve high sustained performance in future HPC systems.
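The B/F (bytes per flop) ratio mentioned in the abstract relates memory bandwidth to peak floating-point rate. The sketch below is only an illustration of why that ratio matters for memory-bound benchmarks such as HPCG; it uses a simple roofline-style bound, which is not the analysis method of the paper, and every function name and number in it is hypothetical rather than taken from the paper.

```python
# Illustrative sketch only: names and numbers are assumptions, not data from the paper.

def bytes_per_flop(mem_bandwidth_gbs: float, peak_gflops: float) -> float:
    """B/F ratio: memory bandwidth (GB/s) divided by peak rate (GFLOP/s)."""
    if peak_gflops <= 0:
        raise ValueError("peak_gflops must be positive")
    return mem_bandwidth_gbs / peak_gflops


def attainable_gflops(peak_gflops: float, mem_bandwidth_gbs: float,
                      flops_per_byte: float) -> float:
    """Roofline-style upper bound: limited by compute or by memory traffic."""
    return min(peak_gflops, mem_bandwidth_gbs * flops_per_byte)


def efficiency(sustained_gflops: float, peak_gflops: float) -> float:
    """Fraction of peak performance that is actually sustained."""
    return sustained_gflops / peak_gflops


# Two hypothetical machines: (memory bandwidth in GB/s, peak in GFLOP/s).
machines = {
    "high_bf": (256.0, 256.0),    # B/F = 1.0
    "low_bf": (100.0, 1000.0),    # B/F = 0.1
}

# Hypothetical memory-bound kernel performing 0.25 flop per byte moved.
KERNEL_FLOPS_PER_BYTE = 0.25

for name, (bandwidth, peak) in machines.items():
    bound = attainable_gflops(peak, bandwidth, KERNEL_FLOPS_PER_BYTE)
    print(name,
          "B/F =", bytes_per_flop(bandwidth, peak),
          "bound =", bound, "GFLOP/s",
          "efficiency bound =", efficiency(bound, peak))
```

Under these assumed numbers, the machine with the higher B/F ratio can sustain a much larger fraction of its peak on the memory-bound kernel, even though its peak rate is lower.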
Citations
Exploiting the Potentials of the Second Generation SX-Aurora TSUBASA
Ryusuke Egawa,Souya Fujimoto,Tsuyoshi Yamashita,Daisuke Sasaki,Yoko Isobe,Yoichi Shimomura,Hiroyuki Takizawa +6 more
- 01 Nov 2020
TL;DR: This paper discusses workload characterization by performance bottleneck analysis to determine an optimization strategy for the 2nd generation SX-Aurora TSUBASA, Type 20B, which provides an extremely high memory bandwidth of 1.53 TB/s per vector processor.
22
The HPCG benchmark: analysis, shared memory preliminary improvements and evaluation on an Arm-based platform
Daniel Ruiz,Filippo Mantovani,Marc Casas,Jesús José Labarta Mancho,Filippo Spiga +4 more
- 01 Jan 2018
TL;DR: This report evaluates the HPCG code at scale on a state-of-the-art HPC system based on the Cavium ThunderX2 SoC and introduces two OpenMP parallelization methods.
14
Performance and Power Analysis of a Vector Computing System
Kazuhiko Komatsu,Akito Onodera,Erich Focht,Soya Fujimoto,Yoko Isobe,Shintaro Momose,Masayuki Sato,Hiroaki Kobayashi +7 more
- 09 Aug 2021
TL;DR: This article discusses vector processing with a long vector length and examines the various levels of optimization required for a large-scale vector computing system, such as vectorization, loop unrolling, use of cache, domain decomposition, process mapping, and problem size tuning.
12
High-Performance GraphBLAS Backend Prototype for NEC SX-Aurora TSUBASA
01 May 2022
TL;DR: In this article, a GraphBLAS backend for SX-Aurora TSUBASA vector engines is implemented and compared to the existing Vector Graph Library (VGL) based implementations.
3
Performance evaluation of parallel direct numerical simulation code on supercomputer SX-Aurora TSUBASA
Mitsuo Yokokawa,Yuji Takenaka,Takashi Ishihara,Kazuhiko Komatsu,Hiroaki Kobayashi +4 more
- 01 Apr 2023
TL;DR: In this paper, a DNS code is parallelized using pencil decomposition and optimized for the vector-type supercomputer SX-Aurora TSUBASA using a loop blocking technique.
1
References
Lockup-free instruction fetch/prefetch cache organization
David Kroft
- 12 May 1981
TL;DR: A cache organization is presented that essentially eliminates a penalty on subsequent cache references following a cache miss and has been incorporated in a cache/memory interface subsystem design, and the design has been implemented and prototyped.
HPC benchmarking: problem size matters
Vladimir Marjanovic,José Gracia,Colin W. Glass +2 more
- 13 Nov 2016
TL;DR: It is argued that an aggregate value derived from a whole range of problem sizes can significantly improve the sensitivity of a given benchmark to relevant hardware properties and thus be more representative.
Performance Modeling of the HPCG Benchmark
Vladimir Marjanovic,José Gracia,Colin W. Glass +2 more
- 16 Nov 2014
TL;DR: Discussions on introducing a new benchmark, better aligned with real-world applications and therefore with the needs of real users, have increased, culminating in a highly regarded candidate: High Performance Conjugate Gradients (HPCG).