Using processor affinity in loop scheduling on shared-memory multiprocessors

doi:10.5555/147877.147919

Open Access•Proceedings Article•10.5555/147877.147919

Evangelos P. Markatos, +1 more

- 01 Dec 1992

- pp 104-113

199

TL;DR: The authors propose a loop scheduling algorithm that attempts to simultaneously balance the workload, minimize synchronization, and co-locate loop iterations with the data they need. They conclude that loop scheduling algorithms for shared-memory multiprocessors cannot afford to ignore the location of data.
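To make the three goals in the summary concrete, here is a minimal single-threaded simulation of an affinity-style loop scheduler. It is a sketch under stated assumptions, not the authors' exact algorithm. This page does not give the details, so the following are all assumptions made for illustration: the function name `affinity_schedule`, the per-iteration `cost` function, the rule of taking `1/P` of a queue's remaining iterations, and the choice of the most loaded queue as the steal victim. Each processor starts with a contiguous block of iterations (co-location with data), works on its own queue without shared state (less synchronization), and takes work from another queue only when idle (load balance).

```python
import math

def affinity_schedule(n_iters, n_procs, cost=lambda i: 1.0):
    """Simulate an affinity-style scheduler (illustrative assumptions only).

    Returns (executed, clock): executed[p] is the list of [lo, hi) chunks
    that processor p ran, and clock[p] is its simulated finish time.
    """
    # Assumption: processor p initially owns one contiguous block, so
    # repeated executions of the loop touch the same data on p.
    block = math.ceil(n_iters / n_procs)
    queues = []
    for p in range(n_procs):
        lo = min(p * block, n_iters)
        queues.append([lo, min(lo + block, n_iters)])  # remaining [lo, hi)

    clock = [0.0] * n_procs
    executed = [[] for _ in range(n_procs)]
    active = set(range(n_procs))

    while active:
        # The processor with the smallest clock is the next to become idle.
        p = min(active, key=lambda q: (clock[q], q))
        lo, hi = queues[p]
        if hi > lo:
            # Local work: take 1/P of what remains in the own queue.
            take = math.ceil((hi - lo) / n_procs)
            chunk = (lo, lo + take)
            queues[p][0] = lo + take
        else:
            # Own queue empty: pick the most loaded queue as the victim.
            victim = max(range(n_procs),
                         key=lambda q: queues[q][1] - queues[q][0])
            vlo, vhi = queues[victim]
            if vhi == vlo:
                active.discard(p)  # no work left anywhere; p retires
                continue
            # Take 1/P of the victim's remaining work, from the tail.
            take = math.ceil((vhi - vlo) / n_procs)
            chunk = (vhi - take, vhi)
            queues[victim][1] = vhi - take
        executed[p].append(chunk)
        clock[p] += sum(cost(i) for i in range(*chunk))

    return executed, clock

# Usage: a skewed loop where the first four iterations are expensive.
executed, clock = affinity_schedule(16, 4, cost=lambda i: 10 if i < 4 else 1)
```

With uniform costs, each simulated processor runs only its own block. With skewed costs, idle processors take iterations from the overloaded queue, trading some locality for balance.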

Citations

Patent

Apparatus and method for improved CPU affinity in a multiprocessor system

Robert A. Alfieri

- 26 Jan 1994

TL;DR: The thread group structure maintains collective timeslice and CPU accounting for all threads in the group, while each individual thread has a local scheduling priority used for scheduling among the threads in its group.

289

Book Chapter•10.1007/978-3-540-89740-8_2

MCUDA: An Efficient Implementation of CUDA Kernels for Multi-core CPUs

John A. Stratton, +2 more

- 28 Nov 2008

TL;DR: A framework called MCUDA is described, which allows CUDA programs to be executed efficiently on shared memory, multi-core CPUs and argues that CUDA can be an effective data-parallel programming model for more than just GPU architectures.

240

Proceedings Article•10.1109/ICPP.1993.112

Locality and Loop Scheduling on NUMA Multiprocessors

Hui Li, +3 more

- 16 Aug 1993

TL;DR: An important issue in the parallel execution of loops is how to partition and schedule the loops onto the available processors. Scheduling algorithms that fail to take locality into account perform poorly on parallel systems with non-uniform memory access times.

117

Proceedings Article•10.1145/1772954.1772971

Efficient compilation of fine-grained SPMD-threaded programs for multicore CPUs

John A. Stratton, +6 more

- 24 Apr 2010

TL;DR: Techniques for compiling fine-grained SPMD-threaded programs, expressed in programming models such as OpenCL or CUDA, to multicore execution platforms are described, and reasonable restrictions on the synchronization model enable significant optimizations and performance improvements over a baseline approach.

91

Proceedings Article•10.1145/195473.195583

The effectiveness of multiple hardware contexts

Radhika Thekkath, +1 more

- 01 Nov 1994

TL;DR: The usefulness of multiple hardware contexts depends on: program data locality, cache organization and degree of multiprocessing, and the ability of an additional processor to exploit program parallelism.

82

References

Journal Article•10.1109/TSE.1985.231547

Allocating Independent Subtasks on Parallel Processors

Clyde P. Kruskal, +1 more

- 01 Oct 1985

- IEEE Transactions on Software Engineerin...

TL;DR: It is shown that allocating an equal number of subtasks to each processor all at once has good efficiency, as a consequence of a rather general theorem which shows how some consequences of the central limit theorem hold even when one cannot prove that the central limit theorem applies.

403

Journal Article•10.1109/12.40843

The performance implications of thread management alternatives for shared-memory multiprocessors

Thomas Anderson, +2 more

- 01 Dec 1989

- IEEE Transactions on Computers

TL;DR: An Ethernet-style backoff algorithm is presented that largely eliminates the effect of normal methods of critical resource waiting, and can be used to improve throughput, and in some circumstances to avoid locking, improving latency as well.

201

Proceedings Article•10.1145/106972.106994

NUMA policies and their relation to memory architecture

William J. Bolosky, +4 more

- 01 Apr 1991

TL;DR: This work uses an off-line, optimal cost policy as a baseline against which to compare on-line policies, and uses it as a policy-insensitive tool for evaluating architectural design alternatives.

123

Journal Article•10.1145/118544.118546

Experimental comparison of memory management policies for NUMA multiprocessors

Richard P. LaRowe, +1 more

- 01 Nov 1991

- ACM Transactions on Computer Systems

TL;DR: The results show that there are memory management policies implemented in the system that can improve the performance of programs written using the simpler uniform memory access (UMA) programming model, and there appears to be no single policy that can be considered the best over a set of test applications.

123

Proceedings Article•10.1145/107971.107987

Analysis of task migration in shared-memory multiprocessor scheduling

Mark S. Squillante, +1 more

- 02 Apr 1991

TL;DR: The potential for significant improvements in system performance and the potential for unstable behavior under migratory scheduling policies are illustrated, and optimal policy thresholds are provided that yield the best performance and avoid this form of processor thrashing.

75

Related Papers (5)

Using processor affinity in loop scheduling on shared-memory multiprocessors

Allocating Independent Subtasks on Parallel Processors

An optimal scheduling scheme for tiling in distributed systems

Memory partitioning and scheduling co-optimization in behavioral synthesis

Avoiding memory contention on tightly coupled multiprocessors