About: High Productivity Computing Systems is a research topic. Over the lifetime, 40 publications have been published within this topic receiving 1049 citations.
TL;DR: This tutorial will introduce attendees to HPCC, provide tools to examine differences in HPC architectures, and give hands-on training that will hopefully lead to better understanding of parallel environments.
Abstract: In 2003, the DARPA's High Productivity Computing Systems released the HPCC suite. It examines the performance of HPC architectures using kernels with various memory access patterns of well known computational kernels. Consequently, HPCC results bound the performance of real applications as a function of memory access characteristics and define performance boundaries of HPC architectures. The suite was intended to augment the TOP500 list and by now the results are publicly available for 6 out of 10 of the world's fastest computers. Implementations exist in most of the major high-end programming languages and environments, accompanied by countless optimization efforts. The increased publicity enjoyed by HPCC doesn't necessarily translate into deeper understanding of the performance issues that HPCC benchmarks. And so this tutorial will introduce attendees to HPCC, provide tools to examine differences in HPC architectures, and give hands-on training that will hopefully lead to better understanding of parallel environments.
TL;DR: The origins of HPF are traced through the programming languages on which it was based, leading up to the standardization effort, and the motivation underlying technical decisions that led to the set of features incorporated into the original language and its two follow-ons are reviewed.
Abstract: High Performance Fortran (HPF) is a high-level data-parallel programming system based on Fortran. The effort to standardize HPF began in 1991, at the Supercomputing Conference in Albuquerque, where a group of industry leaders asked Ken Kennedy to lead an effort to produce a common programming language for the emerging class of distributed-memory parallel computers. The proposed language would focus on data-parallel operations in a single thread of control, a strategy which was pioneered by some earlier commercial and research systems, including Thinking Machines' CM Fortran, Fortran D, and Vienna Fortran. The standardization group, called the High Performance Fortran Forum (HPFF), took a little over a year to produce a language definition that was published in January 1993 as a Rice technical report [50] and, later that same year, as an article in Scientific Programming [49]. The HPF project had created a great deal of excitement while it was underway and the release was initially well received in the community. However, over a period of several years, enthusiasm for the language waned in the United States, although it has continued to be used in Japan. This paper traces the origins of HPF through the programming languages on which it was based, leading up to the standardization effort. It reviews the motivation underlying technical decisions that led to the set of features incorporated into the original language and its two follow-ons: HPF 2 (extensions defined by a new series of HPFF meetings) and HPF/JA (the dialect that was used by Japanese manufacturers and runs on the Earth Simulator). A unique feature of this paper is its discussion and analysis of the technical and sociological mistakes made by both the language designers and the user community:, mistakes that led to the premature abandonment of the very promising approach employed in HPF. It concludes with some lessons for the future and an exploration of the influence of ideas from HPF on new languages emerging from the High Productivity Computing Systems program sponsored by DARPA.
TL;DR: Evaluating the High-Productivity Reconfigurable Computer (HPRC) approach to FPGA programming, where a commodity CPU instruction set architecture is augmented with instructions which execute on a specialised FPGa co-processor, shows that high-productivity reconfigurable computing systems outperform GPUs in applications with poor locality characteristics and low memory bandwidth requirements.
Abstract: This paper provides the first comparison of performance and energy efficiency of high productivity computing systems based on FPGA (Field-Programmable Gate Array) and GPU (Graphics Processing Unit) technologies. The search for higher performance compute solutions has recently led to great interest in heterogeneous systems containing FPGA and GPU accelerators. While these accelerators can provide significant performance improvements, they can also require much more design effort than a pure software solution, reducing programmer productivity. The CUDA system has provided a high productivity approach for programming GPUs. This paper evaluates the High-Productivity Reconfigurable Computer (HPRC) approach to FPGA programming, where a commodity CPU instruction set architecture is augmented with instructions which execute on a specialised FPGA co-processor, allowing the CPU and FPGA to co-operate closely while providing a programming model similar to that of traditional software. To compare the GPU and FPGA approaches, we select a set of established benchmarks with different memory access characteristics, and compare their performance and energy efficiency on an FPGA-based Hybrid-Core system with a GPU-based system. Our results show that while GPUs excel at streaming applications, high-productivity reconfigurable computing systems outperform GPUs in applications with poor locality characteristics and low memory bandwidth requirements.
TL;DR: It is shown that the GPU is more productive than the FPGA architecture for most of the benchmarks and it is concluded thatFPGA-based HPCS is being marginalised by GPUs.
Abstract: Heterogeneous or co-processor architectures are becoming an important component of high productivity computing systems (HPCS). In this work the performance of a GPU based HPCS is compared with the performance of a commercially available FPGA based HPC. Contrary to previous approaches that focussed on specific examples, a broader analysis is performed by considering processes at an architectural level. A set of benchmarks is employed that use different process architectures in order to exploit the benefits of each technology. These include the asynchronous pipelines common to "map" tasks, a partially synchronous tree common to "reduce" tasks and a fully synchronous, fully connected mesh. We show that the GPU is more productive than the FPGA architecture for most of the benchmarks and conclude that FPGA-based HPCS is being marginalised by GPUs.
TL;DR: The challenges facing any new language for scalable parallel computing, including the strong competition presented by MPI and the existing Partitioned Global Address Space (PGAS) Languages are described.
Abstract: We present a summary of the current state of DARPA's HPCS language project. We describe the challenges facing any new language for scalable parallel computing, including the strong competition presented by MPI and the existing Partitioned Global Address Space (PGAS) Languages. We identify some of the major features of the proposed languages, using MPI and the PGAS languages for comparison, and describe the opportunities for higher productivity along with the implementation challenges. Finally, we present the conclusions of a recent workshop in which a concrete plan for the next few years was proposed.