Hyper-threading

Papers

Proceedings Article•10.1145/223982.224449•

Simultaneous multithreading: maximizing on-chip parallelism

Dean M. Tullsen¹, Susan J. Eggers¹, Henry M. Levy¹

University of Washington¹

1 May 1995

TL;DR: Simultaneous multithreading has the potential to achieve 4 times the throughput of a superscalar and double that of fine-grain multithreading, and is an attractive alternative to single-chip multiprocessors.

Abstract: This paper examines simultaneous multithreading, a technique permitting several independent threads to issue instructions to a superscalar's multiple functional units in a single cycle. We present several models of simultaneous multithreading and compare them with alternative organizations: a wide superscalar, a fine-grain multithreaded processor, and single-chip, multiple-issue multiprocessing architectures. Our results show that both (single-threaded) superscalar and fine-grain multithreaded architectures are limited in their ability to utilize the resources of a wide-issue processor. Simultaneous multithreading has the potential to achieve 4 times the throughput of a superscalar, and double that of fine-grain multithreading. We evaluate several cache configurations made possible by this type of organization and evaluate tradeoffs between them. We also show that simultaneous multithreading is an attractive alternative to single-chip multiprocessors; simultaneous multithreaded processors with a variety of organizations outperform corresponding conventional multiprocessors with similar execution resources. While simultaneous multithreading has excellent potential to increase processor utilization, it can add substantial complexity to the design. We examine many of these complexities and evaluate alternative organizations in the design space.

1,773 citations

Proceedings Article•10.1145/232973.232993•

Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor

Dean M. Tullsen¹, Susan J. Eggers¹, Joel Emer, Henry M. Levy¹, Jack L. Lo¹, Rebecca L. Stamm

University of Washington¹

1 May 1996

TL;DR: This paper presents an architecture for simultaneous multithreading that minimizes the architectural impact on the conventional superscalar design, has minimal performance impact on a single thread executing alone, and achieves significant throughput gains when running multiple threads.

Abstract: Simultaneous multithreading is a technique that permits multiple independent threads to issue multiple instructions each cycle. In previous work we demonstrated the performance potential of simultaneous multithreading, based on a somewhat idealized model. In this paper we show that the throughput gains from simultaneous multithreading can be achieved without extensive changes to a conventional wide-issue superscalar, either in hardware structures or sizes. We present an architecture for simultaneous multithreading that achieves three goals: (1) it minimizes the architectural impact on the conventional superscalar design, (2) it has minimal performance impact on a single thread executing alone, and (3) it achieves significant throughput gains when running multiple threads. Our simultaneous multithreading architecture achieves a throughput of 5.4 instructions per cycle, a 2.5-fold improvement over an unmodified superscalar with similar hardware resources. This speedup is enhanced by an advantage of multithreading previously unexploited in other architectures: the ability to favor for fetch and issue those threads most efficiently using the processor each cycle, thereby providing the "best" instructions to the processor.

854 citations

Journal Article•10.1145/641865.641867•

A survey of processors with explicit multithreading

Theo Ungerer¹, Borut Robič², Jurij Šilc³

University of Augsburg¹, University of Ljubljana², Jožef Stefan Institute³

1 Mar 2003, ACM Computing Surveys

TL;DR: This survey paper explains and classifies the explicit multithreading techniques in research and in commercial microprocessors.

Abstract: Hardware multithreading is becoming a generally applied technique in the next generation of microprocessors. Several multithreaded processors are announced by industry or already into production in the areas of high-performance microprocessors, media, and network processors. A multithreaded processor is able to pursue two or more threads of control in parallel within the processor pipeline. The contexts of two or more threads of control are often stored in separate on-chip register sets. Unused instruction slots, which arise from latencies during the pipelined execution of single-threaded programs by a contemporary microprocessor, are filled by instructions of other threads within a multithreaded processor. The execution units are multiplexed between the thread contexts that are loaded in the register sets. Underutilization of a superscalar processor due to missing instruction-level parallelism can be overcome by simultaneous multithreading, where a processor can issue multiple instructions from multiple threads each cycle. Simultaneous multithreaded processors combine the multithreading technique with a wide-issue superscalar processor to utilize a larger part of the issue bandwidth by issuing instructions from different threads simultaneously. Explicit multithreaded processors are multithreaded processors that apply processes or operating system threads in their hardware thread slots. These processors optimize the throughput of multiprogramming workloads rather than single-thread performance. We distinguish these processors from implicit multithreaded processors that utilize thread-level speculation by speculatively executing compiler- or machine-generated threads of control that are part of a single sequential program. This survey paper explains and classifies the explicit multithreading techniques in research and in commercial microprocessors.

329 citations

Book•

Intel Xeon Phi Processor High Performance Programming: Knights Landing Edition 2nd Edition

Jim Jeffers, James Reinders, Avinash Sodani

31 May 2016

TL;DR: This book is an all-in-one source of information for programming the Second-Generation Intel Xeon Phi product family, also called Knights Landing, and provides timely Knights Landing-specific details, programming advice, and real-world examples.

Abstract: This book is an all-in-one source of information for programming the Second-Generation Intel Xeon Phi product family, also called Knights Landing. The authors provide timely Knights Landing-specific details, programming advice, and real-world examples. The authors distill their years of Xeon Phi programming experience, coupled with insights from many expert customers, Intel Field Engineers, Application Engineers, and Technical Consulting Engineers, to create this authoritative book on the essentials of programming for Intel Xeon Phi products. Intel Xeon Phi Processor High-Performance Programming is useful even before you ever program a system with an Intel Xeon Phi processor. To help ensure that your applications run at maximum efficiency, the authors emphasize key techniques for programming any modern parallel computing system, whether based on Intel Xeon processors, Intel Xeon Phi processors, or other high-performance microprocessors. Applying these techniques will generally increase your program performance on any system and prepare you better for Intel Xeon Phi processors. The book is a practical guide to the essentials for programming Intel Xeon Phi processors; gives definitive coverage of the Knights Landing architecture; presents best practices for portable, high-performance computing and a familiar and proven threads and vectors programming model; includes real-world code examples that highlight usages of the unique aspects of this new highly parallel and high-performance computational product; covers use of MCDRAM, AVX-512, Intel Omni-Path fabric, many cores (up to 72), and many threads (4 per core); covers software developer tools, libraries, and programming models; and covers using Knights Landing as a processor and a coprocessor.

256 citations

Proceedings Article•10.5555/956417.956550•

IA-32 execution layer: a two-phase dynamic translator designed to support IA-32 applications on Itanium®-based systems

Leonid Baraz, Tevi Devor, Orna Etzion, Shalom Goldenberg, Alex Skaletsky, Yun Wang, Yigel Zemach

3 Dec 2003

TL;DR: IA-32 EL is software that will ship with Itanium-based operating systems and convert IA-32 instructions into Itanium instructions via two-phase dynamic translation.

Abstract: IA-32 execution layer (IA-32 EL) is a new technology that executes IA-32 applications on Intel Itanium processor family systems. Currently, support for IA-32 applications on Itanium-based platforms is achieved using hardware circuitry on the Itanium processors. This capability will be enhanced with IA-32 EL - software that will ship with Itanium-based operating systems and will convert IA-32 instructions into Itanium instructions via dynamic translation. In this paper, we describe aspects of the IA-32 execution layer technology, including the general two-phase translation architecture and the usage of a single translator for multiple operating systems. The paper provides details of some of the technical challenges such as precise exception, emulation of FP, MMX, and Intel streaming SIMD extension instructions, and misalignment handling. Finally, the paper presents some performance results.

198 citations

Performance Metrics

Number of papers published on the topic in previous years
Year	Papers
2020	1
2019	2
2017	17
2016	24
2015	31
2014	29