TL;DR: An important class of programs for shared-memory architectures is discussed, together with how these programs can be mapped to the LogP machine so that a constant-factor delay with respect to the optimal LogP execution time can be guaranteed.
Abstract: Currently, many parallel algorithms are defined for shared-memory architectures. The preferred machine model for designing these algorithms is the PRAM. However, this model does not take into account the properties of existing architectures. Recently, Culler et al. defined the LogP machine model, which better reflects the behaviour of massively parallel computers. We discuss an important class of programs for shared-memory architectures and show how they can be mapped to the LogP machine. We define this class and show how to compute the mapping at compile time. For this mapping, a constant-factor delay with respect to the optimal LogP execution time can be guaranteed.
TL;DR: This work shows transformations of a subclass of PRAM programs into efficient LogP programs, gives upper bounds for executing them on the LogP machine, and defines the classes of coarse- and fine-grained LogP programs.
Abstract: In sequential computing, the step from programming in machine code to programming in machine-independent high-level languages was made decades ago. Although high-level programming languages are available for parallel machines, today's parallel programs depend heavily on the architectures they are intended to run on. Designing efficient parallel programs is a difficult task that only specialists can perform. Porting such programs to other parallel architectures is nearly impossible without a considerable loss of performance. Abstract machine models for parallel computing, such as the PRAM model, are accepted by theoreticians but have little practical relevance, since these models do not take into account the properties of existing architectures. However, the PRAM is easy to program. Recently, Culler et al. defined the LogP machine model, which better reflects the behaviour of massively parallel computers. In this work, we show transformations of a subclass of PRAM programs into efficient LogP programs and give upper bounds for executing them on the LogP machine. To this end, we first briefly summarize the transformations from PRAM to LogP programs. Second, we extend the LogP machine model by a set of machine instructions. Third, we define the classes of coarse- and fine-grained LogP programs. Programs in the former class can be executed within a factor of two of the optimum. Programs in the latter class have a slightly worse upper bound on execution time. Finally, we show how to decide statically which strategy is promising for a given program optimization problem.
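The LogP model charges a network latency L, a processor overhead o for sending and for receiving each message, and a gap g that bounds how often a processor can inject consecutive messages (P is the number of processors). As a minimal sketch of how such costs add up, not the transformation or bound from the work above, the following computes the standard LogP time for one processor sending k messages back-to-back until the last one has been received. The function name `logp_send_time` is hypothetical, and we assume successive sends start max(g, o) apart.

```python
def logp_send_time(k: int, L: int, o: int, g: int) -> int:
    """Time until the last of k back-to-back messages from one processor
    has been received, under the standard LogP cost model.

    Hypothetical helper for illustration. Successive sends are assumed to
    start max(g, o) apart; the last send starts at (k - 1) * max(g, o),
    spends o in send overhead, L in the network, and o at the receiver.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return (k - 1) * max(g, o) + o + L + o


# A single point-to-point message costs L + 2o.
print(logp_send_time(1, L=6, o=2, g=4))  # 10
# Three messages: 2 * 4 + 2 + 6 + 2.
print(logp_send_time(3, L=6, o=2, g=4))  # 18
```

Taking max(g, o) covers the case where the send overhead, rather than the network gap, limits the injection rate.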
TL;DR: It is shown that for inverse tree-like task graphs (which include inverse trees), optimal linear schedules can be found in polynomial time when g - o is constant and the minimal computation time of a task is at least g - o (regardless of whether the trees are coarse grained), and that finding optimal (restricted) schedules for inverse trees is NP-complete even when g = 0.
TL;DR: This work generalizes scheduling techniques known for oblivious algorithms to iterative algorithms and proves bounds for the execution time of such algorithms in terms of the optimum.
Abstract: Usually, scheduling algorithms are designed for task graphs. Task graphs model oblivious algorithms, but not iterative algorithms in which the number of iterations is unknown (e.g., while loops). We generalize scheduling techniques known for oblivious algorithms to iterative algorithms. We prove bounds for the execution time of such algorithms in terms of the optimum.