Book Chapter, DOI: 10.1007/978-3-642-11515-8_9
Buffer sizing for self-timed stream programs on heterogeneous distributed memory multiprocessors
Paul M. Carpenter, Alex Ramirez, Eduard Ayguadé
- 25 Jan 2010
- pp 96-110
9
TL;DR: This paper proposes a feedback-directed algorithm that determines the size of each communication buffer, based on i) the stream program that has been mapped onto processors, ii) feedback from an earlier execution, and iii) the memory constraints. Compared with the previous general algorithm, by Basten and Hoogerbrugge, it has significantly better performance and latency.
Abstract: Stream programming is a promising way to expose concurrency to the compiler. A stream program is built from kernels that communicate only via point-to-point streams. The stream compiler statically allocates these kernels to processors, applying blocking, fission and fusion transformations. The compiler determines the sizes of the communication buffers, which affects performance since local memories can be small.
In this paper, we propose a feedback-directed algorithm that determines the size of each communication buffer, based on i) the stream program that has been mapped onto processors, ii) feedback from an earlier execution, and iii) the memory constraints. The algorithm exposes a trade-off between throughput and latency. It is general, in that it applies to stream programs with unstructured stream graphs, and it supports variable execution times and communication rates.
We show results for the StreamIt benchmarks and random graphs. For the StreamIt benchmarks, throughput is optimal after the first iteration. For random graphs with stochastic computation times, throughput is within 3% of optimal after four iterations. Compared with the previous general algorithm, by Basten and Hoogerbrugge, our algorithm has significantly better performance and latency.
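The abstract names three inputs to the buffer-sizing step: the mapped stream program, feedback from an earlier execution, and the memory constraints. The following is a minimal sketch of how such a feedback-directed step could look. It is not the algorithm from the paper: the greedy rule, the `Stream` fields, and the `size_buffers` function are all hypothetical, and it assumes that each stream has a single buffer held in one processor's local memory and that the feedback is a per-stream stall time.

```python
from dataclasses import dataclass


@dataclass
class Stream:
    """One point-to-point stream; every field is a hypothetical stand-in."""
    name: str
    proc: str          # processor whose local memory holds the buffer
    slot_bytes: int    # size of one buffer slot (must be positive)
    stall_time: float  # time kernels were blocked on this stream in the previous run


def size_buffers(streams, free_bytes, min_slots=1):
    """Greedy sketch: give spare memory to the streams that stalled the most.

    `free_bytes` maps processor name -> bytes of local memory available
    for buffers. Returns a dict mapping stream name -> number of slots.
    """
    free = dict(free_bytes)
    slots = {}

    # Every stream first receives a minimum allocation.
    for s in streams:
        free[s.proc] -= min_slots * s.slot_bytes
        if free[s.proc] < 0:
            raise ValueError(f"minimum buffers do not fit on {s.proc}")
        slots[s.name] = min_slots

    # Hand out the remaining memory one slot at a time. Dividing the stall
    # time by the slots already granted gives diminishing priority, so one
    # stream does not automatically absorb all spare memory.
    while True:
        candidates = [
            s for s in streams
            if s.stall_time > 0 and free[s.proc] >= s.slot_bytes
        ]
        if not candidates:
            break
        best = max(candidates, key=lambda s: s.stall_time / slots[s.name])
        slots[best.name] += 1
        free[best.proc] -= best.slot_bytes

    return slots


if __name__ == "__main__":
    streams = [
        Stream("a", "p0", 100, 8.0),
        Stream("b", "p0", 100, 2.0),
        Stream("c", "p1", 50, 0.0),
    ]
    print(size_buffers(streams, {"p0": 500, "p1": 200}))
```

In an iterative flow, the program would be run again with the returned sizes, the stall times measured again, and `size_buffers` called on the new measurements. How many such iterations are needed, and how latency enters the trade-off, is specific to the paper's algorithm and is not modelled here.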
Citations
Exploring Trade-Offs in Buffer Requirements and Throughput Constraints for Synchronous Dataflow Graphs
Sander Stuijk
- 01 Jan 2006
TL;DR: This work presents exact techniques to chart the Pareto space of throughput and storage tradeoffs, which can be used to determine the minimal storage space needed to execute a graph under a given throughput constraint.
166
ACOTES Project: Advanced Compiler Technologies for Embedded Streaming
Harm Munk, Eduard Ayguadé, Cédric Bastoul, Paul M. Carpenter, Zbigniew Chamski, Albert Cohen, Marco Cornero, Philippe Dumont, Marc Duranton, Mohammed Fellahi, Roger Ferrer, Razya Ladelsky, Menno Lindwer, Xavier Martorell, Cupertino Miranda, Dorit Nuzman, Andrea C. Ornstein, Antoniu Pop, Sebastian Pop, Louis-Noël Pouchet, Alex Ramirez, David Rodenas, Erven Rohou, Ira Rosen, Uzi Shvadron, Konrad Trifunovic, Ayal Zaks
TL;DR: The outcomes of the ACOTES project, a 3-year collaboration between industrial and academic partners, are presented, and the use of the advanced compiler technologies developed to support embedded streaming is advocated.
Optimizing explicit data transfers for data parallel applications on the cell architecture
Selma Saidi, Pranav Tendulkar, Thierry Lepley, Oded Maler
- 26 Jan 2012
TL;DR: This paper considers data-parallelizable programs that use the well-known double buffering technique to bring the data from the off-chip slow memory to the local memory of the cores via a DMA (direct memory access) mechanism, and derives optimal and near optimal values for the number of blocks that should be clustered in a single DMA command.
Distributed Memory Allocation Technique for Synchronous Dataflow Graphs
Karol Desnos, Maxime Pelcat, Jean-Francois Nezan, Slaheddine Aridhi
- 25 Oct 2016
TL;DR: A new distributed memory allocation technique for applications modeled with Synchronous Dataflow (SDF) graphs that enables a single MEG to be split into separate MEGs, each of which is associated with a memory bank accessible only by one core of the architecture.
The Abstract Streaming Machine: Compile-Time Performance Modelling of Stream Programs on Heterogeneous Multiprocessors
Paul M. Carpenter, Alex Ramirez, Eduard Ayguadé
- 21 Jul 2009
TL;DR: This work presents a machine description and performance model for an iterative stream compilation flow, which represents the stream program running on a heterogeneous multiprocessor system with distributed or shared memory.
5
References
•Book
Computer Architecture: A Quantitative Approach
John L. Hennessy, David A. Patterson
- 01 Dec 1989
TL;DR: This best-selling title, considered for over a decade to be essential reading for every serious student and practitioner of computer design, has been updated throughout to address the most important trends facing computer designers today.
12.6K
Fibonacci heaps and their uses in improved network optimization algorithms
TL;DR: F-heaps, a new data structure for implementing heaps that extends the binomial queues proposed by Vuillemin and studied further by Brown, give improved running times for several network optimization algorithms; the improved bound for minimum spanning trees is the most striking.
Synchronous data flow
Edward A. Lee, David G. Messerschmitt
- 01 Sep 1987
TL;DR: A preliminary SDF software system for automatically generating assembly language code for DSP microcomputers is described, and two new efficiency techniques are introduced, static buffering and an extension to SDF to efficiently implement conditionals.
2K
Fibonacci Heaps And Their Uses In Improved Network Optimization Algorithms
Michael L. Fredman, Robert E. Tarjan
- 24 Oct 1984
TL;DR: The structure, Fibonacci heaps (abbreviated F-heaps), extends the binomial queues proposed by Vuillemin and studied further by Brown to obtain improved running times for several network optimization algorithms.
1.7K
•Book
Computer Architecture, Fifth Edition: A Quantitative Approach
John L. Hennessy, David A. Patterson
- 29 Sep 2011
TL;DR: The Fifth Edition of Computer Architecture focuses on the dramatic shift in the ways in which software and technology in the "cloud" are accessed by cell phones, tablets, laptops, and other mobile computing devices.
1K