Dataflow: A Complement to Superscalar
Mihai Budiu,Pedro V. Artigas,Seth Copen Goldstein +2 more
- 20 Mar 2005
- Vol. 11, Iss: 5, pp 177-186
TL;DR: This paper analyzes the performance of a class of static dataflow machines on integer media and control-intensive programs and explains why a dataflow machine, even with unlimited resources, does not always outperform a superscalar processor on general-purpose codes.
Abstract: There has been a resurgence of interest in dataflow architectures, because of their potential for exploiting parallelism with low overhead. In this paper we analyze the performance of a class of static dataflow machines on integer media and control-intensive programs and we explain why a dataflow machine, even with unlimited resources, does not always outperform a superscalar processor on general-purpose codes, under the assumption that both machines take the same time to execute basic operations. We compare a program-specific dataflow machine with unlimited parallelism to a superscalar processor running the same program. While the dataflow machines provide very good performance on most data-parallel programs, we show that the dataflow machine cannot always take advantage of the available parallelism. Using the dynamic critical path we investigate the mechanisms used by superscalar processors to provide a performance advantage and their impact on a dataflow model.
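The role of the critical path in this comparison can be illustrated with a minimal sketch. It is not the paper's tooling or methodology: the operation names, latencies, and the helper critical_path_length are hypothetical, chosen only to show that a machine with unlimited functional units is still bounded below by the longest chain of dependent operations.

```python
def critical_path_length(latency, deps):
    """Longest latency-weighted path through an acyclic dependence graph.

    latency: dict mapping each operation name to its latency in cycles.
    deps: dict mapping an operation to the operations it must wait for.
    Both structures are illustrative assumptions, not taken from the paper.
    """
    finish = {}  # memoized earliest finish time of each operation

    def finish_time(op):
        if op not in finish:
            # An operation starts once all of its producers have finished.
            start = max((finish_time(p) for p in deps.get(op, [])), default=0)
            finish[op] = start + latency[op]
        return finish[op]

    return max(finish_time(op) for op in latency)


# Hypothetical trace: two independent loads feed an add, then a store.
latency = {"load_a": 2, "load_b": 2, "add": 1, "store": 1}
deps = {"add": ["load_a", "load_b"], "store": ["add"]}

total_work = sum(latency.values())          # cycles if executed one at a time
span = critical_path_length(latency, deps)  # cycles with unlimited parallelism
print(total_work, span)
```

In this toy graph only the two loads overlap, so unlimited resources shorten execution from the total work to the span and no further; any mechanism that lengthens or shortens the dependence chain itself changes performance regardless of how many units are available.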
Citations
Dynamically Scheduled High-level Synthesis
Lana Josipovic,Radhika Ghosal,Paolo Ienne +2 more
- 15 Feb 2018
TL;DR: This work shows that high-level synthesis of dynamically scheduled circuits is perfectly feasible by describing the implementation of a prototype synthesizer which generates a particular form of latency-insensitive synchronous circuits.
118
Exploring the potential of heterogeneous von neumann/dataflow execution models
Tony Nowatzki,Vinay Gangadhar,Karthikeyan Sankaralingam +2 more
- 13 Jun 2015
TL;DR: The authors observe that if both out-of-order and explicit-dataflow execution were available in one processor, many types of GPP cores could benefit from dynamically switching between them during certain phases of an application's lifetime.
A Hybrid Systolic-Dataflow Architecture for Inductive Matrix Algorithms
Jian Weng,Sihao Liu,Zhengrong Wang,Vidushi Dadu,Tony Nowatzki +4 more
- 01 Feb 2020
TL;DR: This work develops a novel execution model, inductive dataflow, where inductive dependence patterns and memory access patterns (streams) are first-order primitives, and develops a hybrid spatial architecture combining systolic and tagged dataflow execution to attain high utilization at low energy and area cost.
70
Buffer Placement and Sizing for High-Performance Dataflow Circuits
Lana Josipovic,Shabnam Sheikhha,Andrea Guerrieri,Paolo Ienne,Jordi Cortadella +4 more
- 23 Feb 2020
TL;DR: This work shows how to strategically place buffers into a dataflow circuit to optimize its performance and extracts a set of choice-free critical loops from arbitrary dataflow circuits and relies on the theory of marked graphs to optimize the buffer placement and sizing.
42
Performance and power of cache-based reconfigurable computing
Andrew Putnam,Susan J. Eggers,Dave Bennett,Eric F. Dellinger,Jeffrey M. Mason,Henry E. Styles,Prasanna Sundararajan,Ralph D. Wittig +7 more
- 20 Jun 2009
TL;DR: The analyses and optimizations of the CHiMPS compiler that construct many-cache caches are presented, showing a performance advantage of 7.8x over CPU-only execution of the same source code, FPGA power usage that is on average 4.1x less, and consequently performance per watt that is also greater.
38
References
The future of wires
R. Ho,Ken Mai,Mark Horowitz +2 more
- 01 Apr 2001
TL;DR: Wires that shorten in length as technologies scale have delays that either track gate delays or grow slowly relative to gate delays, which is good news since these "local" wires dominate chip wiring.
An open graph visualization system and its applications to software engineering
TL;DR: A package of practical tools and libraries for manipulating graphs and their drawings that includes stream and event interfaces for graph operations, high-quality static and dynamic layout algorithms, and the ability to handle sizable graphs is described.
1.3K
Multiscalar processors
Gurindar S. Sohi,Scott E. Breach,T. N. Vijaykumar +2 more
- 01 May 1995
TL;DR: The philosophy of the multiscalar paradigm, the structure of multiscalar programs, and the hardware architecture of a multiscalar processor are presented.
929
Complexity-effective superscalar processors
Subbarao Palacharla,Norman P. Jouppi,James E. Smith +2 more
- 01 May 1997
TL;DR: A microarchitecture that simplifies wakeup and selection logic is proposed and discussed, which will help minimize performance degradation due to slow bypasses in future wide-issue machines.
Limits of instruction-level parallelism
David W. Wall
- 01 Apr 1991
TL;DR: The results of simulations of 18 different test programs under 375 different models of available parallelism analysis are presented, showing how simulations based on instruction traces can model techniques at the limits of feasibility and even beyond.