TL;DR: This guide introduces you to the most important and frequently used patterns of parallel programming and provides executable code samples for them, using PPL.
Abstract: Your CPU meter shows a problem. One core is running at 100 percent, but all the other cores are idle. Your application is CPU-bound, but you are using only a fraction of the computing power of your multicore system. Is there a way to get better performance? The answer, in a nutshell, is parallel programming. The kind of sequential code that is familiar to all programmers no longer meets your performance goals. To use your system's CPU resources efficiently, you need to split your application into pieces that can run at the same time. Of course, this is easier said than done. Parallel programming has a reputation for being the domain of experts and a minefield of subtle, hard-to-reproduce software defects. Everyone seems to have a favorite story about a parallel program that did not behave as expected because of a mysterious bug. These stories should inspire a healthy respect for the difficulty of the problems you will face in writing your own parallel programs. Fortunately, help has arrived. The Parallel Patterns Library (PPL) and the Asynchronous Agents Library introduce a new programming model for parallelism that significantly simplifies the job. Behind the scenes are sophisticated algorithms that dynamically distribute computations on multicore architectures. In addition, the Microsoft Visual Studio 2010 development system includes debugging and analysis tools to support the new parallel programming model. Proven design patterns are another source of help. This guide introduces you to the most important and frequently used patterns of parallel programming and provides executable code samples for them, using PPL. If you are unsure where to begin, start by reviewing the patterns in this book. See if your problem has any attributes that match the six patterns presented in the following chapters. If it does, delve more deeply into the relevant pattern or patterns and study the sample code.
TL;DR: This paper proposes tools and a two-step methodology that present performance at the same level of abstraction as high-level task-based algorithms. It shows the flexibility of the approach by analyzing three applications, including a client-server benchmark that uses a parallel_for nested within a parallel pipeline.
Abstract: Modeling and visualizing task-based parallel algorithms is challenging. Libraries such as Intel Threading Building Blocks (TBB) and Microsoft's Parallel Patterns Library provide high-level algorithms that are implemented using low-level tasks. Current tools present performance data at this lower level. Developers prefer to tune and debug at the same level as the coding abstraction, so in this paper we propose tools and a two-step methodology that target this level of abstraction. In the first step, the system-level metrics of utilization and overhead are collected to determine whether performance is acceptable. If a problem is suspected, the second step of our methodology projects these metrics onto the algorithms contained in the application. Using these projections, many common performance issues can be quickly diagnosed. We demonstrate our methodology using a prototype implementation that is integrated with the Intel Threading Building Blocks library. We show the flexibility of the approach by analyzing three applications, including a client-server benchmark that uses a parallel_for nested within a parallel pipeline.
TL;DR: Over the last several years, Microsoft and Intel have collaborated to produce a set of common libraries known as the Parallel Patterns Library (PPL) by Microsoft and the Threading Building Blocks (TBB) by Intel.
Abstract: Over the last several years, Microsoft and Intel have collaborated to produce a set of common libraries known as the Parallel Patterns Library (PPL) by Microsoft and the Threading Building Blocks (TBB) by Intel. The two libraries have been a part of the commercial products shipped by Microsoft and Intel. Additionally, the paper is informed by Intel’s experience with Cilk Plus, an extension to C++ included in the Intel C++ compiler in the Intel Composer XE product.
TL;DR: Groovy Parallel Patterns, the library presented in this paper, provides a collection of processes that can be plugged together to form a variety of parallel architectures and is intrinsically its own DSL. It enables effective refinement of solutions between process networks, which can also be checked using formal methods.
Abstract: A novel parallel patterns library, Groovy Parallel Patterns, is presented which, from the outset, has been designed to exploit more general process parallelism than the usual data and task parallel architectures. The library executes on a standard Java Virtual Machine. The library provides a collection of processes that can be plugged together to form a variety of parallel architectures and is intrinsically its own DSL. A network of processes is guaranteed to be deadlock- and livelock-free and to terminate correctly, as proved using formal methods. Error capture and a basic logging mechanism have been incorporated. The library enables effective refinement of solutions between process networks, which can also be checked using formal methods. A library user only needs to write the required methods as pieces of sequential code, typically taken from extant sequential solutions, which the processes then invoke as required. The utility of the library is demonstrated by several examples, including Monte Carlo methods, Concordance, Jacobi solutions, N-body problems and Mandelbrot, which is implemented on both a multicore processor and a workstation cluster. The examples are analysed for speedup and efficiency and show good, consistent performance improvement up to the number of available processor cores and workstations.