Speculative multithreading

Papers published on a yearly basis

Papers

Proceedings Article•10.1145/223982.224449•

Simultaneous multithreading: maximizing on-chip parallelism

Dean M. Tullsen¹, Susan J. Eggers¹, Henry M. Levy¹•Institutions (1)

University of Washington¹

1 May 1995

TL;DR: Simultaneous multithreading has the potential to achieve 4 times the throughput of a superscalar and double that of fine-grain multithreading, and is an attractive alternative to single-chip multiprocessors.

Abstract: This paper examines simultaneous multithreading, a technique permitting several independent threads to issue instructions to a superscalar's multiple functional units in a single cycle. We present several models of simultaneous multithreading and compare them with alternative organizations: a wide superscalar, a fine-grain multithreaded processor, and single-chip, multiple-issue multiprocessing architectures. Our results show that both (single-threaded) superscalar and fine-grain multithreaded architectures are limited in their ability to utilize the resources of a wide-issue processor. Simultaneous multithreading has the potential to achieve 4 times the throughput of a superscalar, and double that of fine-grain multithreading. We evaluate several cache configurations made possible by this type of organization and evaluate tradeoffs between them. We also show that simultaneous multithreading is an attractive alternative to single-chip multiprocessors; simultaneous multithreaded processors with a variety of organizations outperform corresponding conventional multiprocessors with similar execution resources. While simultaneous multithreading has excellent potential to increase processor utilization, it can add substantial complexity to the design. We examine many of these complexities and evaluate alternative organizations in the design space.

1,773 citations
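
The comparison in the abstract above can be illustrated with a toy issue-slot model. This is a sketch under stated assumptions, not the paper's simulator: the per-cycle counts of ready instructions in `trace`, the issue width of 8, and the function names are all hypothetical, and the totals it produces say nothing about the 4x and 2x figures reported in the paper.

```python
def issued_in_cycle(ready, width, policy, cycle):
    """Return how many instructions issue in one cycle.

    ready  -- ready-instruction count per thread for this cycle (assumed input)
    width  -- number of issue slots per cycle
    policy -- "superscalar", "fine_grain", or "smt"
    """
    if policy == "superscalar":
        # Only thread 0 exists; unused slots are wasted.
        return min(width, ready[0])
    if policy == "fine_grain":
        # One thread owns all slots each cycle, chosen round-robin.
        return min(width, ready[cycle % len(ready)])
    if policy == "smt":
        # Slots in the same cycle may be filled from any thread.
        return min(width, sum(ready))
    raise ValueError(f"unknown policy: {policy}")


def total_issued(trace, width, policy):
    """Sum issued instructions over every cycle of the trace."""
    return sum(
        issued_in_cycle(ready, width, policy, cycle)
        for cycle, ready in enumerate(trace)
    )


# Hypothetical trace: 4 cycles, 4 threads, made-up ready counts.
trace = [
    [2, 1, 3, 0],
    [0, 4, 1, 2],
    [1, 0, 2, 5],
    [3, 2, 0, 1],
]

for policy in ("superscalar", "fine_grain", "smt"):
    print(policy, total_issued(trace, width=8, policy=policy))
```

With this made-up trace, the single-threaded and round-robin policies leave most slots empty in every cycle, while the SMT policy fills slots across threads within the same cycle.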

Proceedings Article•10.1145/232973.232993•

Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor

Dean M. Tullsen¹, Susan J. Eggers¹, Joel Emer, Henry M. Levy¹, Jack L. Lo¹, Rebecca L. Stamm•Institutions (1)

University of Washington¹

1 May 1996

TL;DR: This paper presents an architecture for simultaneous multithreading that minimizes the architectural impact on the conventional superscalar design, has minimal performance impact on a single thread executing alone, and achieves significant throughput gains when running multiple threads.

Abstract: Simultaneous multithreading is a technique that permits multiple independent threads to issue multiple instructions each cycle. In previous work we demonstrated the performance potential of simultaneous multithreading, based on a somewhat idealized model. In this paper we show that the throughput gains from simultaneous multithreading can be achieved without extensive changes to a conventional wide-issue superscalar, either in hardware structures or sizes. We present an architecture for simultaneous multithreading that achieves three goals: (1) it minimizes the architectural impact on the conventional superscalar design, (2) it has minimal performance impact on a single thread executing alone, and (3) it achieves significant throughput gains when running multiple threads. Our simultaneous multithreading architecture achieves a throughput of 5.4 instructions per cycle, a 2.5-fold improvement over an unmodified superscalar with similar hardware resources. This speedup is enhanced by an advantage of multithreading previously unexploited in other architectures: the ability to favor for fetch and issue those threads most efficiently using the processor each cycle, thereby providing the "best" instructions to the processor.

854 citations
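
The last sentence of the abstract above describes favoring, for fetch, the threads that are using the processor most efficiently. One way to sketch that idea is a count-based heuristic: fetch from the threads with the fewest instructions waiting ahead of the issue stage. The abstract does not spell out the policy, so the heuristic, the function name `pick_fetch_threads`, and the sample counts are assumptions made for illustration.

```python
def pick_fetch_threads(in_flight, num_to_fetch):
    """Choose which threads to fetch from this cycle.

    in_flight    -- per-thread count of instructions waiting before issue
                    (hypothetical bookkeeping, assumed to be available)
    num_to_fetch -- how many threads may fetch in one cycle
    """
    # Fewest waiting instructions first; ties go to the lower thread id.
    order = sorted(range(len(in_flight)), key=lambda t: (in_flight[t], t))
    return order[:num_to_fetch]


# Hypothetical snapshot: threads 1 and 3 have the least backlog.
print(pick_fetch_threads([12, 3, 7, 3], num_to_fetch=2))
```

A thread with a small backlog is moving instructions through the pipeline quickly, so giving it fetch priority avoids filling the queues with instructions from a stalled thread.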

Proceedings Article•10.1145/291069.291020•

Data speculation support for a chip multiprocessor

Lance Hammond¹, Mark Willey¹, Kunle Olukotun¹•Institutions (1)

Stanford University¹

1 Oct 1998

TL;DR: Overall, thread-level speculation still appears to be a promising approach for expanding the class of applications that can be automatically parallelized, but more hardware-intensive implementations for managing speculation control are required to achieve performance improvements on a wide class of integer applications.

Abstract: Thread-level speculation is a technique that enables parallel execution of sequential applications on a multiprocessor. This paper describes the complete implementation of the support for thread-level speculation on the Hydra chip multiprocessor (CMP). The support consists of a number of software speculation control handlers and modifications to the shared secondary cache memory system of the CMP. This support is evaluated using five representative integer applications. Our results show that the speculative support is only able to improve performance when there is a substantial amount of medium-grained loop-level parallelism in the application. When the granularity of parallelism is too small or there is little inherent parallelism in the application, the overhead of the software handlers overwhelms any potential performance benefits from speculative-thread parallelism. Overall, thread-level speculation still appears to be a promising approach for expanding the class of applications that can be automatically parallelized, but more hardware-intensive implementations for managing speculation control are required to achieve performance improvements on a wide class of integer applications.

430 citations
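
The core check behind thread-level speculation can be sketched in a few lines: if a later (more speculative) thread has already read an address that an earlier thread then writes, the later thread consumed a stale value and must be restarted. This is a minimal software illustration, not Hydra's mechanism. The class `SpeculationTracker` and its methods are hypothetical names, and the sketch ignores the case where a thread reads back a value it wrote itself.

```python
class SpeculationTracker:
    """Tracks speculative reads per thread; lower index = earlier in program order."""

    def __init__(self, num_threads):
        # One set of addresses read so far for each speculative thread.
        self.read_sets = [set() for _ in range(num_threads)]

    def load(self, thread, addr):
        """Record that `thread` read `addr` speculatively."""
        self.read_sets[thread].add(addr)

    def store(self, thread, addr):
        """Return the later threads that already read `addr` and must restart."""
        return [
            later
            for later in range(thread + 1, len(self.read_sets))
            if addr in self.read_sets[later]
        ]

    def restart(self, thread):
        """Discard the speculative state of a squashed thread."""
        self.read_sets[thread].clear()


tracker = SpeculationTracker(num_threads=4)
tracker.load(2, 0x40)                 # thread 2 reads early
violated = tracker.store(0, 0x40)     # thread 0 writes the same address later
for t in violated:
    tracker.restart(t)
print(violated)
```

Every restart throws away work, which is consistent with the abstract's observation that overheads can outweigh the benefit when parallelism is fine-grained or scarce.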

Journal Article•10.1145/641865.641867•

A survey of processors with explicit multithreading

Theo Ungerer¹, Borut Robič², Jurij Šilc³•Institutions (3)

University of Augsburg¹, University of Ljubljana², Jožef Stefan Institute³

1 Mar 2003•ACM Computing Surveys

TL;DR: This survey paper explains and classifies the explicit multithreading techniques in research and in commercial microprocessors.

Abstract: Hardware multithreading is becoming a generally applied technique in the next generation of microprocessors. Several multithreaded processors are announced by industry or already in production in the areas of high-performance microprocessors, media, and network processors. A multithreaded processor is able to pursue two or more threads of control in parallel within the processor pipeline. The contexts of two or more threads of control are often stored in separate on-chip register sets. Unused instruction slots, which arise from latencies during the pipelined execution of single-threaded programs by a contemporary microprocessor, are filled by instructions of other threads within a multithreaded processor. The execution units are multiplexed between the thread contexts that are loaded in the register sets. Underutilization of a superscalar processor due to missing instruction-level parallelism can be overcome by simultaneous multithreading, where a processor can issue multiple instructions from multiple threads each cycle. Simultaneous multithreaded processors combine the multithreading technique with a wide-issue superscalar processor to utilize a larger part of the issue bandwidth by issuing instructions from different threads simultaneously. Explicit multithreaded processors are multithreaded processors that apply processes or operating system threads in their hardware thread slots. These processors optimize the throughput of multiprogramming workloads rather than single-thread performance. We distinguish these processors from implicit multithreaded processors that utilize thread-level speculation by speculatively executing compiler- or machine-generated threads of control that are part of a single sequential program. This survey paper explains and classifies the explicit multithreading techniques in research and in commercial microprocessors.

329 citations

Proceedings Article•10.1109/HPCA.1998.650559•

Speculative versioning cache

S. Gopal¹, T. N. Vijaykumar¹, James E. Smith¹, Gurindar S. Sohi¹•Institutions (1)

University of Wisconsin-Madison¹

31 Jan 1998

TL;DR: This proposal uses distributed caches to eliminate the latency and bandwidth problems of the ARB and conceptually unifies cache coherence and speculative versioning by using an organization similar to snooping bus-based coherent caches.

Abstract: Dependences among loads and stores whose addresses are unknown hinder the extraction of instruction-level parallelism during the execution of a sequential program. Such ambiguous memory dependences can be overcome by memory dependence speculation, which enables a load or store to be speculatively executed before the addresses of all preceding loads and stores are known. Furthermore, multiple speculative stores to a memory location create multiple speculative versions of the location. Program order among the speculative versions must be tracked to maintain sequential semantics. A previously proposed approach, the address resolution buffer (ARB), uses a centralized buffer to support speculative versions. Our proposal, called the speculative versioning cache (SVC), uses distributed caches to eliminate the latency and bandwidth problems of the ARB. The SVC conceptually unifies cache coherence and speculative versioning by using an organization similar to snooping bus-based coherent caches. A preliminary evaluation for the multiscalar architecture shows that hit latency is an important factor affecting performance, and private cache solutions trade off hit rate for hit latency.

321 citations
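
The abstract above notes that multiple speculative stores create multiple versions of a location and that program order among them must be tracked. The sketch below shows only that ordering rule in software: a load by task `t` sees the version written by the closest task at or before `t`, and falls back to committed memory otherwise. It is a centralized toy and does not model the SVC's distributed caches or the ARB. `VersionedMemory` and its methods are hypothetical names, and tasks are assumed to commit in program order.

```python
class VersionedMemory:
    """Toy model of speculative versions; lower task id = earlier in program order."""

    def __init__(self, memory):
        self.memory = dict(memory)   # committed (architectural) state
        self.versions = {}           # addr -> {task_id: speculative value}

    def store(self, task, addr, value):
        """Create or overwrite this task's speculative version of addr."""
        self.versions.setdefault(addr, {})[task] = value

    def load(self, task, addr):
        """Return the value from the closest task at or before `task`."""
        visible = {
            t: v for t, v in self.versions.get(addr, {}).items() if t <= task
        }
        if visible:
            return visible[max(visible)]
        return self.memory.get(addr)

    def commit(self, task):
        """Write this task's versions to memory (assumes in-order commit)."""
        for addr, per_task in self.versions.items():
            if task in per_task:
                self.memory[addr] = per_task.pop(task)


mem = VersionedMemory({0x10: 1})
mem.store(1, 0x10, 5)      # task 1 writes speculatively
mem.store(3, 0x10, 9)      # task 3 writes a later version
print(mem.load(0, 0x10), mem.load(2, 0x10), mem.load(4, 0x10))
```

Task 0 precedes both stores and reads committed memory, task 2 sees task 1's version, and task 4 sees task 3's version.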

Year	Papers
2025	7
2024	12
2023	11
2022	18
2021	3
2020	6
