TL;DR: This book is the first parallel programming guide written specifically to serve working software developers, not just computer scientists, and introduces a complete, highly accessible pattern language that will help any experienced developer "think parallel" and start writing effective parallel code almost immediately.
Abstract: The Parallel Programming Guide for Every Software Developer. From grids and clusters to next-generation game consoles, parallel computing is going mainstream. Innovations such as Hyper-Threading Technology, HyperTransport Technology, and multicore microprocessors from IBM, Intel, and Sun are accelerating the movement's growth. Only one thing is missing: programmers with the skills to meet the soaring demand for parallel software. That's where Patterns for Parallel Programming comes in. It's the first parallel programming guide written specifically to serve working software developers, not just computer scientists. The authors introduce a complete, highly accessible pattern language that will help any experienced developer "think parallel" and start writing effective parallel code almost immediately. Instead of formal theory, they deliver proven solutions to the challenges faced by parallel programmers, and pragmatic guidance for using today's parallel APIs in the real world. Coverage includes: understanding the parallel computing landscape and the challenges faced by parallel developers; finding the concurrency in a software design problem and decomposing it into concurrent tasks; managing the use of data across tasks; creating an algorithm structure that effectively exploits the concurrency you've identified; connecting your algorithmic structures to the APIs needed to implement them; specific software constructs for implementing parallel programs; and working with today's leading parallel programming environments: OpenMP, MPI, and Java. Patterns have helped thousands of programmers master object-oriented development and other complex programming technologies. With this book, you will learn that they're the best way to master parallel programming too.
TL;DR: This paper introduces a new work-stealing scheduler with compiler support for async-finish task parallelism that can accommodate both work-first and help-first scheduling policies, and provides insights on scenarios in which the help-first policy yields better results than the work-first policy and vice versa.
Abstract: Multiple programming models are emerging to address an increased need for dynamic task parallelism in applications for multicore processors and shared-address-space parallel computing. Examples include OpenMP 3.0, Java Concurrency Utilities, Microsoft Task Parallel Library, Intel Thread Building Blocks, Cilk, X10, Chapel, and Fortress. Scheduling algorithms based on work stealing, as embodied in Cilk's implementation of dynamic spawn-sync parallelism, are gaining in popularity but also have inherent limitations. In this paper, we address the problem of efficient and scalable implementation of X10's async-finish task parallelism, which is more general than Cilk's spawn-sync parallelism. We introduce a new work-stealing scheduler with compiler support for async-finish task parallelism that can accommodate both work-first and help-first scheduling policies. Performance results on two different multicore SMP platforms show significant improvements due to our new work-stealing algorithm compared to the existing work-sharing scheduler for X10, and also provide insights on scenarios in which the help-first policy yields better results than the work-first policy and vice versa.
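Example (illustrative, not from the paper): the difference between async-finish and spawn-sync can be sketched in a few lines. This is a minimal sketch under stated assumptions: it uses a plain shared thread pool rather than work-stealing deques, it models neither the work-first nor the help-first policy, and the names `Finish`, `spawn`, `add_subtree`, and `tree_sum` are hypothetical. It shows only the semantics: a task may spawn children and return without waiting for them, and the enclosing finish scope waits for everything spawned inside it, directly or transitively.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Finish:
    """Hypothetical helper: a scope that waits for all tasks spawned in it,
    including tasks spawned by those tasks."""

    def __init__(self, pool):
        self._pool = pool
        self._pending = 0
        self._cond = threading.Condition()
        self._errors = []

    def spawn(self, fn, *args):
        # Count the task before submitting it, so the counter cannot reach
        # zero while a parent is still creating children.
        with self._cond:
            self._pending += 1
        self._pool.submit(self._run, fn, args)

    def _run(self, fn, args):
        try:
            fn(*args)
        except Exception as exc:  # keep the first failure for the waiter
            with self._cond:
                self._errors.append(exc)
        finally:
            with self._cond:
                self._pending -= 1
                if self._pending == 0:
                    self._cond.notify_all()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # The "finish": block until every transitively spawned task is done.
        with self._cond:
            while self._pending:
                self._cond.wait()
        if self._errors:
            raise self._errors[0]
        return False


def add_subtree(node, finish, total, lock):
    """Add this node's value, then spawn one task per child and return
    immediately; the parent never waits for its children."""
    value, children = node
    with lock:
        total[0] += value
    for child in children:
        finish.spawn(add_subtree, child, finish, total, lock)


def tree_sum(tree, workers=4):
    total = [0]
    lock = threading.Lock()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        with Finish(pool) as finish:
            finish.spawn(add_subtree, tree, finish, total, lock)
    return total[0]
```

Because no task ever blocks on another task, the fixed-size pool cannot starve itself; only the thread that opened the finish scope waits.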
TL;DR: This work takes a unified approach to exploiting both task and data parallelism in a single framework with an existing language, and implements a parallelizing Fortran compiler for the iWarp system based on this approach.
Abstract: For many applications, achieving good performance on a private memory parallel computer requires exploiting data parallelism as well as task parallelism. Depending on the size of the input data set and the number of nodes (i.e., processors), different tradeoffs between task and data parallelism are appropriate for a parallel system. Most existing compilers focus on only one of data parallelism and task parallelism. Therefore, to achieve the desired results, the programmer must separately program the data and task parallelism. We have taken a unified approach to exploiting both kinds of parallelism in a single framework with an existing language. This approach eases the task of programming and exposes the tradeoffs between data and task parallelism to the compiler. We have implemented a parallelizing Fortran compiler for the iWarp system based on this approach. We discuss the design of our compiler, and present performance results to validate our approach.
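Example (illustrative, not from the paper): the two kinds of parallelism the abstract contrasts can be shown on a toy two-stage computation. This is a sketch under stated assumptions: it uses Python threads instead of a private-memory machine or a Fortran compiler, and the stage functions `stage_scale` and `stage_offset` and the drivers `data_parallel` and `task_parallel` are hypothetical names. Threads in CPython will not speed up this arithmetic; the point is only the two ways of mapping the same work onto workers.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def stage_scale(x):
    return 2 * x


def stage_offset(x):
    return x + 1


def data_parallel(items, workers=4):
    """Data parallelism: every worker runs both stages on its own items."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order.
        return list(pool.map(lambda x: stage_offset(stage_scale(x)), items))


def task_parallel(items):
    """Task parallelism: one worker per stage, connected by a queue."""
    handoff = queue.Queue()
    done = object()  # sentinel marking the end of the stream
    results = []

    def first_stage():
        for x in items:
            handoff.put(stage_scale(x))
        handoff.put(done)

    def second_stage():
        while True:
            x = handoff.get()
            if x is done:
                break
            results.append(stage_offset(x))

    threads = [threading.Thread(target=first_stage),
               threading.Thread(target=second_stage)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Both drivers compute the same values. Which mapping is preferable depends, as the abstract notes, on the size of the input and the number of nodes: the data-parallel form scales with the number of items, while the pipelined form is limited to one worker per stage.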
TL;DR: This work presents results obtained using STAPL for a molecular dynamics code and a particle transport code, and presents functionality to allow the user to further optimize the code and achieve additional performance gains.
Abstract: The Standard Template Adaptive Parallel Library (STAPL) is a parallel library designed as a superset of the ANSI C++ Standard Template Library (STL). It is sequentially consistent for functions with the same name, and executes on uni- or multi-processor systems that utilize shared or distributed memory. STAPL is implemented using simple parallel extensions of C++ that currently provide a SPMD model of parallelism, and supports nested parallelism. The library is intended to be general purpose, but emphasizes irregular programs to allow the exploitation of parallelism for applications which use dynamically linked data structures such as particle transport calculations, molecular dynamics, geometric modeling, and graph algorithms. STAPL provides several different algorithms for some library routines, and selects among them adaptively at runtime. STAPL can replace STL automatically by invoking a preprocessing translation phase. In the applications studied, the performance of translated code was within 5% of the results obtained using STAPL directly. STAPL also provides functionality to allow the user to further optimize the code and achieve additional performance gains. We present results obtained using STAPL for a molecular dynamics code and a particle transport code.
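Example (illustrative, not STAPL code): the idea of keeping one routine name while choosing among several implementations at runtime can be sketched as follows. This is a sketch under stated assumptions: STAPL is a C++ library and none of these names belong to it; `adaptive_sum`, `PARALLEL_THRESHOLD`, and the size-based selection rule are hypothetical stand-ins for whatever criteria the real library applies.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumed cutoff below which splitting the work is not worthwhile.
PARALLEL_THRESHOLD = 10_000


def _sequential_sum(values):
    return sum(values)


def _chunked_sum(values, workers):
    """Split the input into roughly equal chunks and reduce each in a worker."""
    size = -(-len(values) // workers)  # ceiling division
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum, chunks))


def adaptive_sum(values, workers=4):
    """Return (result, name of the variant chosen at runtime).

    Whichever variant runs, the result matches the sequential sum."""
    if len(values) < PARALLEL_THRESHOLD or workers < 2:
        return _sequential_sum(values), "sequential"
    return _chunked_sum(values, workers), "chunked"
```

A caller sees a single routine whose result does not depend on the variant selected; returning the variant name is only there to make the choice visible in this sketch.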
TL;DR: A survey deployed inside Microsoft in January 2007 indicates that the use of concurrency is widespread at Microsoft, and that most engineers expect concurrency issues to become more of a problem going forward.
Abstract: Concurrent programming is gaining significant prominence in the software industry, especially due to the advent of multi-core architectures. In this report, we present the results of a survey deployed inside Microsoft in January 2007 to assess the state of the practice of concurrency at Microsoft. Our survey polled 10% of the Microsoft technical staff and collected data for each of the three major business units, namely Microsoft platforms and services division, mobile and embedded devices division and Microsoft business division. Our major findings indicate that the use of concurrency is widespread at Microsoft. Of our 684 respondents, over 60% had to deal with concurrency issues frequently (on a monthly basis). The most popular platforms for concurrent programming inside Microsoft are Win32 and CLR (Common Language Runtime), which are equally popular. Also, multi-threading and message-passing forms of concurrency appear to be equally pervasive. Concurrency bugs take on average several days to detect, reproduce, debug and fix. Most of these bugs are of high severity. Most engineers feel concurrency issues will become more of a problem going forward, and would welcome additional help in terms of language support, libraries, tools, processes and training.