Topic

Microarchitecture

About: Microarchitecture is a research topic. Over its lifetime, 4,218 publications have been published within this topic, receiving 102,706 citations. The topic is also known as: computer organization.

Papers

Proceedings Article•10.1145/1669112.1669172•

McPAT: an integrated power, area, and timing modeling framework for multicore and manycore architectures

Sheng Li¹, Jung Ho Ahn², Richard Strong³, Jay B. Brockman¹, Dean M. Tullsen³, Norman P. Jouppi⁴

University of Notre Dame¹, Seoul National University², University of California, San Diego³, Hewlett-Packard⁴

12 Dec 2009

TL;DR: Combining power, area, and timing results of McPAT with performance simulation of PARSEC benchmarks at the 22nm technology node, for both common in-order and out-of-order manycore designs, shows that when die cost is not taken into account, clustering 8 cores together gives the best energy-delay product, whereas when cost is taken into account, configuring clusters with 4 cores gives the best EDA2P and EDAP.

Abstract: This paper introduces McPAT, an integrated power, area, and timing modeling framework that supports comprehensive design space exploration for multicore and manycore processor configurations ranging from 90nm to 22nm and beyond. At the microarchitectural level, McPAT includes models for the fundamental components of a chip multiprocessor, including in-order and out-of-order processor cores, networks-on-chip, shared caches, integrated memory controllers, and multiple-domain clocking. At the circuit and technology levels, McPAT supports critical-path timing modeling, area modeling, and dynamic, short-circuit, and leakage power modeling for each of the device types forecast in the ITRS roadmap including bulk CMOS, SOI, and double-gate transistors. McPAT has a flexible XML interface to facilitate its use with many performance simulators. Combined with a performance simulator, McPAT enables architects to consistently quantify the cost of new ideas and assess tradeoffs of different architectures using new metrics like energy-delay-area² product (EDA2P) and energy-delay-area product (EDAP). This paper explores the interconnect options of future manycore processors by varying the degree of clustering over generations of process technologies. Clustering will bring interesting tradeoffs between area and performance because the interconnects needed to group cores into clusters incur area overhead, but many applications can make good use of them due to synergies of cache sharing. Combining power, area, and timing results of McPAT with performance simulation of PARSEC benchmarks at the 22nm technology node for both common in-order and out-of-order manycore designs shows that when die cost is not taken into account clustering 8 cores together gives the best energy-delay product, whereas when cost is taken into account configuring clusters with 4 cores gives the best EDA2P and EDAP.
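
The metrics named in the abstract can be sketched in a few lines. This is a minimal illustration, assuming that EDAP is the plain product of energy, delay, and area and that EDA2P squares the area term, as the metric names suggest. The `DesignPoint` type, the function names, and all numbers are hypothetical and are not taken from McPAT or from the paper's results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignPoint:
    """One candidate configuration (hypothetical values, not paper data)."""
    name: str
    energy_j: float   # energy to run the workload, in joules
    delay_s: float    # execution time, in seconds
    area_mm2: float   # die area, in square millimetres


def edp(p: DesignPoint) -> float:
    """Energy-delay product: ignores area, and therefore die cost."""
    return p.energy_j * p.delay_s


def edap(p: DesignPoint) -> float:
    """Energy-delay-area product (assumed to be a plain product)."""
    return edp(p) * p.area_mm2


def eda2p(p: DesignPoint) -> float:
    """Energy-delay-area^2 product (assumed: area term squared)."""
    return edp(p) * p.area_mm2 ** 2


# Two made-up design points: B is more energy-efficient but larger.
points = [
    DesignPoint("A", energy_j=10.0, delay_s=2.0, area_mm2=100.0),
    DesignPoint("B", energy_j=9.0, delay_s=2.0, area_mm2=120.0),
]

best_edp = min(points, key=edp)     # ranking without area
best_edap = min(points, key=edap)   # ranking once area is included
print(best_edp.name, best_edap.name)
```

With these invented numbers the ranking flips once area enters the metric, which is the kind of tradeoff the abstract describes when it contrasts results with and without die cost.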

2,657 citations

Journal Article•10.1145/1347375.1347389•

The worst-case execution-time problem—overview of methods and survey of tools

Reinhard Wilhelm¹, Jakob Engblom, Andreas Ermedahl², Niklas Holsti, Stephan Thesing¹, David Whalley³, Guillem Bernat, Christian Ferdinand, Reinhold Heckmann, Tulika Mitra⁴, Frank Mueller⁵, Isabelle Puaut, Peter Puschner, Jan Staschulat⁶, Per Stenström⁷

Saarland University¹, Mälardalen University College², Florida State University³, National University of Singapore⁴, North Carolina State University⁵, Braunschweig University of Technology⁶, Chalmers University of Technology⁷

08 May 2008 - ACM Transactions on Embedded Computing Systems

TL;DR: Different approaches to the determination of upper bounds on execution times are described, and several commercially available tools and research prototypes are surveyed.

Abstract: The determination of upper bounds on execution times, commonly called worst-case execution times (WCETs), is a necessary step in the development and validation process for hard real-time systems. This problem is hard if the underlying processor architecture has components such as caches, pipelines, branch prediction, and other speculative components. This article describes different approaches to this problem and surveys several commercially available tools and research prototypes.

2,105 citations

Journal Article•10.1109/2.982917•

SimpleScalar: an infrastructure for computer system modeling

Todd Austin¹, Eric D. Larson¹, Daniel J. Ernst¹

University of Michigan¹

01 Feb 2002 - IEEE Computer

TL;DR: The SimpleScalar tool set provides an infrastructure for simulation and architectural modeling that can model a variety of platforms ranging from simple unpipelined processors to detailed dynamically scheduled microarchitectures with multiple-level memory hierarchies.

Abstract: Designers can execute programs on software models to validate a proposed hardware design's performance and correctness, while programmers can use these models to develop and test software before the real hardware becomes available. Three critical requirements drive the implementation of a software model: performance, flexibility, and detail. Performance determines the amount of workload the model can exercise given the machine resources available for simulation. Flexibility indicates how well the model is structured to simplify modification, permitting design variants or even completely different designs to be modeled with ease. Detail defines the level of abstraction used to implement the model's components. The SimpleScalar tool set provides an infrastructure for simulation and architectural modeling. It can model a variety of platforms ranging from simple unpipelined processors to detailed dynamically scheduled microarchitectures with multiple-level memory hierarchies. SimpleScalar simulators reproduce computing device operations by executing all program instructions using an interpreter. The tool set's instruction interpreters also support several popular instruction sets, including Alpha, PPC, x86, and ARM.
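
The interpreter-based style of simulation mentioned at the end of the abstract can be illustrated with a toy functional simulator. This is only a sketch under stated assumptions: the four-register machine, the five opcodes, and the `run` function are invented for illustration and have no relation to SimpleScalar's actual code or to any of the instruction sets it supports.

```python
def run(program, max_steps=10_000):
    """Interpret a toy instruction list; return (registers, instructions executed).

    Hypothetical ISA, invented for this sketch:
      ("li",   rd, imm)      rd <- imm
      ("add",  rd, rs, rt)   rd <- rs + rt
      ("addi", rd, rs, imm)  rd <- rs + imm
      ("bnez", rs, target)   jump to target if rs != 0
      ("halt",)              stop
    """
    regs = [0] * 4
    pc = 0
    executed = 0
    while pc < len(program) and executed < max_steps:
        op, *args = program[pc]
        executed += 1
        if op == "li":
            rd, imm = args
            regs[rd] = imm
        elif op == "add":
            rd, rs, rt = args
            regs[rd] = regs[rs] + regs[rt]
        elif op == "addi":
            rd, rs, imm = args
            regs[rd] = regs[rs] + imm
        elif op == "bnez":
            rs, target = args
            if regs[rs] != 0:
                pc = target
                continue  # taken branch: skip the fall-through increment
        elif op == "halt":
            break
        else:
            raise ValueError(f"unknown opcode: {op}")
        pc += 1
    return regs, executed


# Sum the integers 5 + 4 + 3 + 2 + 1 into register 0.
program = [
    ("li", 0, 0),        # r0 = accumulator
    ("li", 1, 5),        # r1 = loop counter
    ("add", 0, 0, 1),    # r0 += r1
    ("addi", 1, 1, -1),  # r1 -= 1
    ("bnez", 1, 2),      # loop while r1 != 0
    ("halt",),
]
regs, executed = run(program)
print(regs[0], executed)
```

A functional interpreter like this tracks only architectural state. The more detailed models the abstract mentions would add timing for pipelines, scheduling, and the memory hierarchy on top of the same instruction stream.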

1,804 citations

Proceedings Article•10.1109/ISPASS.2009.4919648•

Analyzing CUDA workloads using a detailed GPU simulator

Ali Bakhoda¹, George L. Yuan¹, Wilson W. L. Fung¹, Henry Wong¹, Tor M. Aamodt¹

University of British Columbia¹

26 Apr 2009

TL;DR: In this paper, the performance of non-graphics applications written in NVIDIA's CUDA programming model is evaluated on a microarchitecture performance simulator that runs NVIDIA's parallel thread execution (PTX) virtual instruction set.

Abstract: Modern Graphic Processing Units (GPUs) provide sufficiently flexible programming models that understanding their performance can provide insight in designing tomorrow's manycore processors, whether those are GPUs or otherwise. The combination of multiple, multithreaded, SIMD cores makes studying these GPUs useful in understanding tradeoffs among memory, data, and thread level parallelism. While modern GPUs offer orders of magnitude more raw computing power than contemporary CPUs, many important applications, even those with abundant data level parallelism, do not achieve peak performance. This paper characterizes several non-graphics applications written in NVIDIA's CUDA programming model by running them on a novel detailed microarchitecture performance simulator that runs NVIDIA's parallel thread execution (PTX) virtual instruction set. For this study, we selected twelve non-trivial CUDA applications demonstrating varying levels of performance improvement on GPU hardware (versus a CPU-only sequential version of the application). We study the performance of these applications on our GPU performance simulator with configurations comparable to contemporary high-end graphics cards. We characterize the performance impact of several microarchitecture design choices including choice of interconnect topology, use of caches, design of memory controller, parallel workload distribution mechanisms, and memory request coalescing hardware. Two observations we make are (1) that for the applications we study, performance is more sensitive to interconnect bisection bandwidth rather than latency, and (2) that, for some applications, running fewer threads concurrently than on-chip resources might otherwise allow can improve performance by reducing contention in the memory system.
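
Memory request coalescing, one of the design choices listed in the abstract, can be approximated with a very small model. This is a sketch under simplifying assumptions: it counts how many aligned memory segments a group of threads touches, with a 128-byte segment size and a 32-thread warp chosen for illustration. The `transactions` function is hypothetical and does not reproduce the coalescing rules of the paper's simulator or of any particular GPU.

```python
def transactions(addresses, segment_bytes=128):
    """Count distinct aligned segments touched by one warp's byte addresses.

    Simplified model: every distinct segment costs one memory transaction.
    """
    return len({addr // segment_bytes for addr in addresses})


WARP_SIZE = 32  # threads issuing the load together (assumed)

# Each thread reads a consecutive 4-byte word: all accesses share one segment.
contiguous = [4 * t for t in range(WARP_SIZE)]

# Each thread reads a word 128 bytes from its neighbour: one segment per thread.
strided = [128 * t for t in range(WARP_SIZE)]

print(transactions(contiguous), transactions(strided))
```

Under this model the strided pattern issues 32 times as many requests as the contiguous one, which gives a rough sense of why access patterns and contention in the memory system matter for the workloads studied.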

1,803 citations

Proceedings Article•10.1109/SP.2019.00002•

Spectre Attacks: Exploiting Speculative Execution

Paul C. Kocher, Jann Horn¹, Anders Fogh, Daniel Genkin², Daniel Gruss³, Werner Haas, Mike Hamburg⁴, Moritz Lipp³, Stefan Mangard³, Thomas Prescher, Michael Schwarz³, Yuval Yarom⁵

Google¹, University of Pennsylvania², Graz University of Technology³, Cryptography Research⁴, University of Adelaide⁵

19 May 2019

TL;DR: Spectre attacks induce a victim to speculatively perform operations that would not occur during correct program execution and that leak the victim's confidential information to an adversary through a side channel. The paper describes practical attacks that can read arbitrary memory from the victim's process.

Abstract: Modern processors use branch prediction and speculative execution to maximize performance. For example, if the destination of a branch depends on a memory value that is in the process of being read, CPUs will try to guess the destination and attempt to execute ahead. When the memory value finally arrives, the CPU either discards or commits the speculative computation. Speculative logic is unfaithful in how it executes, can access the victim's memory and registers, and can perform operations with measurable side effects. Spectre attacks involve inducing a victim to speculatively perform operations that would not occur during correct program execution and which leak the victim's confidential information via a side channel to the adversary. This paper describes practical attacks that combine methodology from side channel attacks, fault attacks, and return-oriented programming that can read arbitrary memory from the victim's process. More broadly, the paper shows that speculative execution implementations violate the security assumptions underpinning numerous software security mechanisms, including operating system process separation, containerization, just-in-time (JIT) compilation, and countermeasures to cache timing and side-channel attacks. These attacks represent a serious threat to actual systems since vulnerable speculative execution capabilities are found in microprocessors from Intel, AMD, and ARM that are used in billions of devices. While makeshift processor-specific countermeasures are possible in some cases, sound solutions will require fixes to processor designs as well as updates to instruction set architectures (ISAs) to give hardware architects and software developers a common understanding as to what computation state CPU implementations are (and are not) permitted to leak.

1,720 citations

Performance Metrics: 4,589 papers, 38,128 citations

No. of papers in the topic in previous years
Year	Papers
2026	1
2025	32
2024	71
2023	91
2022	149
2021	112

Related Topics (5)

Performance Metrics