Topic

Runahead

About: Runahead is a research topic. Over its lifetime, 100 publications have been published on this topic, receiving 9935 citations.

Papers

Proceedings Article•10.1145/232973.232993•

Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor

Dean M. Tullsen¹, Susan J. Eggers¹, Joel Emer, Henry M. Levy¹, Jack L. Lo¹, Rebecca L. Stamm•Institutions (1)

University of Washington¹

1 May 1996

TL;DR: This paper presents an architecture for simultaneous multithreading that minimizes the architectural impact on the conventional superscalar design, has minimal performance impact on a single thread executing alone, and achieves significant throughput gains when running multiple threads.

Abstract: Simultaneous multithreading is a technique that permits multiple independent threads to issue multiple instructions each cycle. In previous work we demonstrated the performance potential of simultaneous multithreading, based on a somewhat idealized model. In this paper we show that the throughput gains from simultaneous multithreading can be achieved without extensive changes to a conventional wide-issue superscalar, either in hardware structures or sizes. We present an architecture for simultaneous multithreading that achieves three goals: (1) it minimizes the architectural impact on the conventional superscalar design, (2) it has minimal performance impact on a single thread executing alone, and (3) it achieves significant throughput gains when running multiple threads. Our simultaneous multithreading architecture achieves a throughput of 5.4 instructions per cycle, a 2.5-fold improvement over an unmodified superscalar with similar hardware resources. This speedup is enhanced by an advantage of multithreading previously unexploited in other architectures: the ability to favor for fetch and issue those threads most efficiently using the processor each cycle, thereby providing the "best" instructions to the processor.

854 citations
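
The abstract above says the processor can favor, each cycle, the threads that are using it most efficiently, but it does not spell out the selection rule. The following is a minimal sketch of one plausible rule, assumed here for illustration: prefer the threads with the fewest instructions still waiting in the pipeline. The function name `pick_fetch_threads`, the `in_flight` bookkeeping, and the default of two threads per cycle are hypothetical and are not taken from the abstract.

```python
def pick_fetch_threads(in_flight, num_threads=2):
    """Return the thread ids to fetch from this cycle.

    in_flight maps a thread id to the number of its instructions still
    waiting in the front end and issue queues (hypothetical bookkeeping).
    Threads with the fewest waiting instructions are favored; ties are
    broken by thread id so the result is deterministic.
    """
    ranked = sorted(in_flight, key=lambda tid: (in_flight[tid], tid))
    return ranked[:num_threads]


if __name__ == "__main__":
    # Threads 1 and 3 have the fewest waiting instructions, so they are chosen.
    print(pick_fetch_threads({0: 12, 1: 3, 2: 7, 3: 3}))
```

The intuition behind this assumed rule is that a thread with few waiting instructions is moving through the pipeline quickly, so fetching from it is less likely to clog the shared queues.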

Proceedings Article•10.1145/264107.264207•

Prefetching using Markov predictors

Doug Joseph¹, Dirk Grunwald²•Institutions (2)

IBM¹, University of Colorado Boulder²

1 May 1997

TL;DR: The Markov prefetcher acts as an interface between the on-chip and off-chip cache and can be added to existing computer designs. In cycle-level simulations it reduces the overall execution stalls due to instruction and data memory operations by an average of 54% for various commercial benchmarks while using only two thirds the memory of a demand-fetch cache organization.

Abstract: Prefetching is one approach to reducing the latency of memory operations in modern computer systems. In this paper, we describe the Markov prefetcher. This prefetcher acts as an interface between the on-chip and off-chip cache, and can be added to existing computer designs. The Markov prefetcher is distinguished by prefetching multiple reference predictions from the memory subsystem, and then prioritizing the delivery of those references to the processor. This design results in a prefetching system that provides good coverage, is accurate and produces timely results that can be effectively used by the processor. In our cycle-level simulations, the Markov Prefetcher reduces the overall execution stalls due to instruction and data memory operations by an average of 54% for various commercial benchmarks while only using two thirds the memory of a demand-fetch cache organization.

679 citations
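
The idea of issuing several prioritized predictions per miss, described in the abstract above, can be sketched as a first-order Markov model over the miss-address stream. This is a simplified illustration under stated assumptions, not the paper's hardware design: the class name `MarkovPrefetcher`, the unbounded Python dictionary (a real table would be fixed-size), and the `width` parameter are all hypothetical.

```python
from collections import Counter, defaultdict


class MarkovPrefetcher:
    """Toy first-order Markov predictor over cache-miss addresses."""

    def __init__(self, width=2):
        self.width = width                       # predictions returned per miss
        self.transitions = defaultdict(Counter)  # miss addr -> counts of next miss addr
        self.last_miss = None

    def on_miss(self, addr):
        """Record the transition into addr and return prefetch candidates.

        Candidates are the addresses that most often followed addr in the
        past, most frequent first, which models prioritized delivery.
        """
        if self.last_miss is not None:
            self.transitions[self.last_miss][addr] += 1
        self.last_miss = addr
        return [nxt for nxt, _ in self.transitions[addr].most_common(self.width)]


if __name__ == "__main__":
    prefetcher = MarkovPrefetcher(width=2)
    for miss in [0x100, 0x200, 0x100, 0x300, 0x100, 0x200, 0x100]:
        candidates = prefetcher.on_miss(miss)
        print(hex(miss), [hex(c) for c in candidates])
```

After the sample miss stream, a miss on `0x100` yields `0x200` ahead of `0x300`, because `0x200` followed `0x100` more often.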

Proceedings Article•10.1109/HPCA.2003.1183532•

Runahead execution: an alternative to very large instruction windows for out-of-order processors

Onur Mutlu¹, Jared Stark², Christopher B. Wilkerson², Yale N. Patt¹•Institutions (2)

University of Texas at Austin¹, Intel²

8 Feb 2003

TL;DR: This paper proposes runahead execution as an effective way to increase memory latency tolerance in an out-of-order processor without requiring an unreasonably large instruction window.

Abstract: Today's high performance processors tolerate long latency operations by means of out-of-order execution. However, as latencies increase, the size of the instruction window must increase even faster if we are to continue to tolerate these latencies. We have already reached the point where the size of an instruction window that can handle these latencies is prohibitively large in terms of both design complexity and power consumption. And, the problem is getting worse. This paper proposes runahead execution as an effective way to increase memory latency tolerance in an out-of-order processor without requiring an unreasonably large instruction window. Runahead execution unblocks the instruction window blocked by long latency operations allowing the processor to execute far ahead in the program path. This results in data being prefetched into caches long before it is needed. On a machine model based on the Intel® Pentium® processor, having a 128-entry instruction window, adding runahead execution improves the IPC (instructions per cycle) by 22% across a wide range of memory intensive applications. Also, for the same machine model, runahead execution combined with a 128-entry window performs within 1% of a machine with no runahead execution and a 384-entry instruction window.

552 citations
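
The abstract above argues that a blocked instruction window limits how many cache misses can overlap, and that running ahead past the blocking miss exposes more of them. The following toy model is a rough sketch of that intuition only, not a simulation of the paper's mechanism. It assumes every miss is independent of the others, and that a miss overlaps with the stalling miss whenever it lies within `reach` instructions of it. The function name, the `reach` parameter, and the sample miss positions are hypothetical.

```python
def serialized_miss_episodes(miss_positions, reach):
    """Count how many full memory latencies the program pays.

    miss_positions: instruction indices of loads that miss in the cache.
    reach: how many instructions, starting at the stalling miss, the
           processor can examine before that miss returns. Without runahead
           this is the instruction window size; with runahead it is assumed
           to be however far the processor gets during the miss latency.
    """
    episodes = 0
    covered_until = -1  # last instruction index whose miss is already overlapped
    for pos in sorted(miss_positions):
        if pos > covered_until:
            episodes += 1                     # this miss stalls the processor
            covered_until = pos + reach - 1   # later misses in reach overlap with it
    return episodes


if __name__ == "__main__":
    misses = [0, 200, 400, 1000, 1100]
    print(serialized_miss_episodes(misses, reach=128))  # window-limited reach
    print(serialized_miss_episodes(misses, reach=500))  # assumed runahead reach
```

With the sample positions, a reach of 128 instructions pays four serialized latencies, while an assumed runahead reach of 500 instructions pays two, because the misses at 200 and 400 are discovered while the first miss is still outstanding.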

Proceedings Article•10.1145/125826.125932•

An effective on-chip preloading scheme to reduce data access penalty

Jean-Loup Baer¹, Tien-Fu Chen¹•Institutions (1)

University of Washington¹

1 Aug 1991

TL;DR: This paper proposes a new hardware prefetching scheme based on predicting the execution of the instruction stream and its associated operand references. The scheme consists of a reference prediction table and a look-ahead program counter with its associated logic.

Abstract: Conventional cache prefetching approaches can be either hardware-based, generally by using a one-block lookahead technique, or compiler-directed, with insertions of non-blocking prefetch instructions. We introduce a new hardware scheme based on the prediction of the execution of the instruction stream and associated operand references. It consists of a reference prediction table and a look-ahead program counter and its associated logic. With this scheme, data with regular access patterns is preloaded, independently of the stride size, and preloading of data with irregular access patterns is prevented. We evaluate our design through trace driven simulation by comparing it with a pure data cache approach under three different memory access models. Our experiments show that this scheme is very effective for reducing the data access penalty for scientific programs and that it has moderate success for other applications.

499 citations
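
The behavior described in the abstract above, preloading regular accesses of any stride while suppressing irregular ones, can be sketched with a table indexed by the address of the load instruction. This is a simplified illustration: it confirms a stride after seeing it twice in a row, which is an assumption made here and not the paper's exact state machine, and it omits the look-ahead program counter entirely. The class and method names are hypothetical.

```python
class ReferencePredictionTable:
    """Toy per-instruction stride predictor."""

    def __init__(self):
        # pc -> (last data address, last observed stride)
        self.entries = {}

    def access(self, pc, addr):
        """Record a load at instruction pc touching addr.

        Returns the address to preload, or None when the pattern is not
        (yet) regular enough to trust.
        """
        entry = self.entries.get(pc)
        if entry is None:
            self.entries[pc] = (addr, 0)
            return None
        last_addr, last_stride = entry
        stride = addr - last_addr
        self.entries[pc] = (addr, stride)
        # Preload only when the same non-zero stride repeats.
        if stride != 0 and stride == last_stride:
            return addr + stride
        return None


if __name__ == "__main__":
    rpt = ReferencePredictionTable()
    # Regular pattern with stride 8: predictions start on the third access.
    print([rpt.access(0x40, a) for a in (1000, 1008, 1016, 1024)])
    # Irregular pattern: no preload is issued.
    print([rpt.access(0x80, a) for a in (5, 90, 12, 300)])
```

Because the predicted address is computed from the observed difference, the same logic handles any stride size, including negative ones.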

Journal Article•10.1145/356989.357013•

Slipstream processors: improving both performance and fault tolerance

Karthik Sundaramoorthy¹, Zach Purser¹, Eric Rotenberg¹•Institutions (1)

North Carolina State University¹

12 Nov 2000

TL;DR: This work proposes creating a shorter but otherwise equivalent version of the original program by removing ineffectual computation and computation related to highly-predictable control flow, and running it concurrently with the full program on a chip multiprocessor or simultaneous multithreaded processor.

Abstract: Processors execute the full dynamic instruction stream to arrive at the final output of a program, yet there exist shorter instruction streams that produce the same overall effect. We propose creating a shorter but otherwise equivalent version of the original program by removing ineffectual computation and computation related to highly-predictable control flow. The shortened program is run concurrently with the full program on a chip multiprocessor or simultaneous multithreaded processor, with two key advantages: 1) Improved single-program performance. The shorter program speculatively runs ahead of the full program and supplies the full program with control and data flow outcomes. The full program executes efficiently due to the communicated outcomes, at the same time validating the speculative, shorter program. The two programs combined run faster than the original program alone. Detailed simulations of an example implementation show an average improvement of 7% for the SPEC95 integer benchmarks. 2) Fault tolerance. The shorter program is a subset of the full program and this partial-redundancy is transparently leveraged for detecting and recovering from transient hardware faults.

323 citations

Performance Metrics

Papers: 100

Citations: 826

No. of papers in the topic in previous years
Year	Papers
2021	3
2020	5
2019	1
2018	1
2017	2
2016	6
