Proceedings Article
DOI: 10.1145/977091.977152
Predictable performance in SMT processors
Francisco J. Cazorla, Peter M. W. Knijnenburg, Rizos Sakellariou, Enrique Fernández, Alex Ramirez, Mateo Valero
- 14 Apr 2004
- pp 433-443
TL;DR: This paper proposes a novel kind of collaboration between the OS and the SMT hardware that enables the OS to enforce that a high priority thread runs at a specific fraction of its full speed. An extensive evaluation shows that this mechanism gives the required performance in more than 97% of all cases considered.
Abstract: Current instruction fetch policies in SMT processors are oriented towards optimization of overall throughput and/or fairness. However, they provide no control over how individual threads are executed, leading to performance unpredictability, since the IPC of a thread depends on the workload it is executed in and on the fetch policy used. From the point of view of the Operating System (OS), it is the job scheduler that determines how jobs are executed. However, when the OS runs on an SMT processor, the job scheduler cannot guarantee execution time constraints of any job due to this performance unpredictability. In this paper we propose a novel kind of collaboration between the OS and the SMT hardware that enables the OS to enforce that a high priority thread runs at a specific fraction of its full speed. We present an extensive evaluation using many different workloads, which shows that this mechanism gives the required performance in more than 97% of all cases considered, and in more than 99% of the less extreme cases. At the same time, our mechanism does not need to trade off predictability against overall throughput, as it maximizes the IPC of the remaining low priority threads, giving 94% on average (and 97.5% on average for the less extreme cases) of the throughput obtained using instruction fetch policies oriented toward throughput maximization, such as icount.
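To make the phrase "a specific fraction of its full speed" concrete, the following is a minimal arithmetic sketch. It assumes that "full speed" means the IPC the thread reaches when it runs alone; the function names and all numbers are hypothetical, and the code only expresses the target, not the paper's hardware mechanism.

```python
def target_ipc(full_speed_ipc: float, fraction: float) -> float:
    """IPC the high priority thread must reach to run at `fraction` of full speed.

    Assumption: `full_speed_ipc` is the thread's IPC when it runs alone.
    """
    if full_speed_ipc <= 0.0:
        raise ValueError("full_speed_ipc must be positive")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return full_speed_ipc * fraction


def meets_requirement(measured_ipc: float, full_speed_ipc: float, fraction: float) -> bool:
    """True if the measured IPC reaches the required fraction of full speed."""
    return measured_ipc >= target_ipc(full_speed_ipc, fraction)


def relative_throughput(total_ipc: float, baseline_total_ipc: float) -> float:
    """Overall throughput as a share of a throughput-oriented baseline (e.g. icount)."""
    if baseline_total_ipc <= 0.0:
        raise ValueError("baseline_total_ipc must be positive")
    return total_ipc / baseline_total_ipc


if __name__ == "__main__":
    # Hypothetical example: a thread with IPC 2.0 alone, required to run at 80%.
    print(target_ipc(2.0, 0.8))
    print(meets_requirement(1.65, 2.0, 0.8))
    print(relative_throughput(3.76, 4.0))
```

In this reading, the 97% figure in the abstract counts the cases in which a check like `meets_requirement` would succeed, and the 94% figure corresponds to the average of a ratio like `relative_throughput`.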
Citations
Improving Performance Isolation on Chip Multiprocessors via an Operating System Scheduler
Alexandra Fedorova, Margo Seltzer
- 15 Sep 2007
TL;DR: A new operating system scheduling algorithm that improves performance isolation on chip multiprocessors (CMP) by ensuring that the application runs as quickly as it would under fair cache allocation, regardless of how the cache is actually allocated.
QoS for high-performance SMT processors in embedded systems
Francisco J. Cazorla, Alex Ramirez, Mateo Valero, Peter M. W. Knijnenburg, Rizos Sakellariou, Enrique Fernández
TL;DR: This work presents a resource management scheme that eliminates a major cause of performance unpredictability in SMTs, making them suitable for many types of embedded systems.
Cooperative partitioning: Energy-efficient cache partitioning for high-performance CMPs
Karthik T. Sundararajan, Vasileios Porpodas, Timothy M. Jones, Nigel Topham, Björn Franke
- 25 Feb 2012
TL;DR: Cooperative Partitioning is presented, a runtime partitioning scheme that reduces both dynamic and static energy while maintaining high performance, and that transfers ways five times faster than an existing state-of-the-art technique.
Scheduling algorithms for effective thread pairing on hybrid multiprocessors
R.L. McGregor, Christos D. Antonopoulos, Dimitrios S. Nikolopoulos
- 04 Apr 2005
TL;DR: New scheduling policies that use run-time performance information to identify the best mix of threads to run across processors and within each processor are introduced.
References
Simultaneous multithreading: maximizing on-chip parallelism
Dean M. Tullsen, Susan J. Eggers, Henry M. Levy
- 01 May 1995
TL;DR: Simultaneous multithreading has the potential to achieve 4 times the throughput of a superscalar, and double that of fine-grain multi-threading, and is an attractive alternative to single-chip multiprocessors.
Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor
Dean M. Tullsen, Susan J. Eggers, Joel Emer, Henry M. Levy, Jack L. Lo, Rebecca L. Stamm
- 01 May 1996
TL;DR: This paper presents an architecture for simultaneous multithreading that minimizes the architectural impact on the conventional superscalar design, has minimal performance impact on a single thread executing alone, and achieves significant throughput gains when running multiple threads.
Increasing superscalar performance through multistreaming
Wayne Yamamoto, Mario Nemirovsky
- 27 Jun 1995
Performance study of a multithreaded superscalar microprocessor
M. Gulati, Nader Bagherzadeh
- 03 Feb 1996
TL;DR: This paper describes a technique for improving the performance of a superscalar processor through multithreading and shows that it is possible to provide support for multiple streams with minimal extra hardware, yet achieving significant performance gain across a range of benchmarks.
Contention on 2nd Level Cache May Limit the Effectiveness of Simultaneous Multithreading
Sébastien Hily, André Seznec
- 01 Jan 1997
TL;DR: This work investigates issues involving the behavior of the memory hierarchy with SMT and shows that ignoring L2 cache contention leads to a strong overestimate of the performance one can expect and may lead to incorrect conclusions.