Using processor affinity in loop scheduling on shared-memory multiprocessors
Evangelos P. Markatos, Thomas J. LeBlanc
- 01 Dec 1992
- pp 104-113
199
Citations
Patent
Apparatus and method for improved CPU affinity in a multiprocessor system
Robert A. Alfieri
- 26 Jan 1994
TL;DR: The thread group structure maintains collective timeslice and CPU accounting for all threads in the group, while each individual thread has a local scheduling priority used for scheduling among the threads in its group.
289
MCUDA: An Efficient Implementation of CUDA Kernels for Multi-core CPUs
John A. Stratton, Sam S. Stone, Wen-mei W. Hwu
- 28 Nov 2008
TL;DR: A framework called MCUDA is described, which allows CUDA programs to be executed efficiently on shared memory, multi-core CPUs and argues that CUDA can be an effective data-parallel programming model for more than just GPU architectures.
240
Locality and Loop Scheduling on NUMA Multiprocessors
Hui Li, Sudarsan Tandri, Michael Stumm, Kenneth C. Sevcik
- 16 Aug 1993
TL;DR: An important issue in the parallel execution of loops is how to partition and schedule the loops onto the available processors; scheduling algorithms that fail to take locality into account perform poorly on parallel systems with non-uniform memory access times.
117
Efficient compilation of fine-grained SPMD-threaded programs for multicore CPUs
John A. Stratton, Vinod Grover, Jaydeep Marathe, Bastiaan Aarts, Michael Murphy, Ziang Hu, Wen-mei W. Hwu
- 24 Apr 2010
TL;DR: Techniques for compiling fine-grained SPMD-threaded programs, expressed in programming models such as OpenCL or CUDA, to multicore execution platforms are described, and reasonable restrictions on the synchronization model enable significant optimizations and performance improvements over a baseline approach.
91
The effectiveness of multiple hardware contexts
Radhika Thekkath, Susan J. Eggers
- 01 Nov 1994
TL;DR: The usefulness of multiple hardware contexts depends on: program data locality, cache organization and degree of multiprocessing, and the ability of an additional processor to exploit program parallelism.
82
References
Allocating Independent Subtasks on Parallel Processors
Clyde P. Kruskal, A. Weiss
TL;DR: It is shown that allocating an equal number of subtasks to each processor all at once has good efficiency, as a consequence of a rather general theorem which shows how some consequences of the central limit theorem hold even when one cannot prove that the central limit theorem applies.
403
The performance implications of thread management alternatives for shared-memory multiprocessors
TL;DR: An Ethernet-style backoff algorithm is presented that largely eliminates the effect of normal methods of critical resource waiting, and can be used to improve throughput, and in some circumstances to avoid locking, improving latency as well.
201
NUMA policies and their relation to memory architecture
William J. Bolosky, Michael L. Scott, Robert P. Fitzgerald, Robert J. Fowler, Alan L. Cox
- 01 Apr 1991
TL;DR: This work uses an off-line, optimal cost policy as a baseline against which to compare on-line policies, and uses it as a policy-insensitive tool for evaluating architectural design alternatives.
Experimental comparison of memory management policies for NUMA multiprocessors
TL;DR: The results show that there are memory management policies implemented in the system that can improve the performance of programs written using the simpler uniform memory access (UMA) programming model, and there appears to be no single policy that can be considered the best over a set of test applications.
123
Analysis of task migration in shared-memory multiprocessor scheduling
Mark S. Squillante, Randolph Nelson
- 02 Apr 1991
TL;DR: The potential for significant improvements in system performance and the potential for unstable behavior under migratory scheduling policies are illustrated, and optimal policy thresholds are provided that yield the best performance and avoid this form of processor thrashing.
75