Software data-triggered threads
Hung-Wei Tseng, Dean M. Tullsen
- 19 Oct 2012
- Vol. 47, Iss: 10, pp 703-716
TL;DR: This work proposes a pure software solution that supports the DTT model without any hardware support, using a prototype compiler and runtime libraries running on top of existing machines to improve the performance of serial C SPEC benchmarks.
Abstract: The data-triggered threads (DTT) programming and execution model can increase parallelism and eliminate redundant computation. However, the initial proposal requires significant architecture support, which impedes existing applications and architectures from taking advantage of this model. This work proposes a pure software solution that supports the DTT model without any hardware support. This research uses a prototype compiler and runtime libraries running on top of existing machines. Several enhancements to the initial software implementation are presented, which further improve the performance. The software runtime system improves the performance of serial C SPEC benchmarks by 15% on a Nehalem processor, but by over 7X over the full suite of single-thread applications. It is shown that the DTT model can work in conjunction with traditional parallelism. The DTT model provides up to 64X speedup over parallel applications exploiting traditional parallelism.
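The core idea in the abstract, that computation is attached to data and runs only when that data actually changes, can be illustrated with a minimal sketch. This is not the paper's compiler or runtime interface: the class `TriggeredCell` and its methods `store` and `wait` are hypothetical names chosen for illustration, and the sketch assumes a single writer thread and a support function that depends only on the stored value.

```python
import threading


class TriggeredCell:
    """Hypothetical illustration: a value whose changes trigger a support thread."""

    def __init__(self, value, support_fn):
        self._value = value
        self._support_fn = support_fn
        self._pending = None
        # Compute the dependent result once for the initial value.
        self.result = support_fn(value)
        self.runs = 1

    def store(self, new_value):
        """Write a value; spawn the support thread only if the value changed."""
        if new_value == self._value:
            # The store writes what is already there, so the dependent
            # computation would be redundant and is skipped.
            return False
        self._value = new_value
        # Let any earlier support thread finish before starting a new one.
        self.wait()
        self._pending = threading.Thread(
            target=self._recompute, args=(new_value,)
        )
        self._pending.start()
        return True

    def _recompute(self, value):
        self.result = self._support_fn(value)
        self.runs += 1

    def wait(self):
        """Block until the most recent support thread, if any, has finished."""
        if self._pending is not None:
            self._pending.join()
            self._pending = None


# Usage: the main thread keeps working after store() and calls wait()
# only at the point where it needs the dependent result.
cell = TriggeredCell([1, 2, 3], sum)
unchanged = cell.store([1, 2, 3])   # same value: nothing is recomputed
changed = cell.store([1, 2, 4])     # new value: support thread runs sum()
cell.wait()
```

In this sketch the unchanged write costs one comparison, while the changed write overlaps the recomputation with whatever the main thread does before `wait()`. Those are the two effects the abstract names: eliminating redundant computation and increasing parallelism.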
Citations
iThreads: A Threading Library for Parallel Incremental Computation
Pramod Bhatotia, Pedro Fonseca, Umut A. Acar, Bjorn B. Brandenburg, Rodrigo Rodrigues
- 14 Mar 2015
TL;DR: iThreads is described, a threading library for parallel incremental computation that supports unmodified shared-memory multithreaded programs and can be used as a replacement for pthreads by a simple exchange of dynamically linked libraries, without even recompiling the application code.
Incremental Parallel and Distributed Systems
Pramod Bhatotia
- 01 Jan 2015
TL;DR: This thesis presents incremental parallel and distributed systems that enable existing real-world applications to automatically benefit from efficient incremental updates, and shows that significant performance can be achieved for existing applications without requiring any additional effort from programmers.
DaSH: a benchmark suite for hybrid dataflow and shared memory programming models: with comparative evaluation of three hybrid dataflow models
Vladimir Gajinov, Srđan Stipić, Igor Erić, Osman Unsal, Eduard Ayguadé, Adrian Cristal
- 20 May 2014
TL;DR: This paper presents DaSH - the first comprehensive benchmark suite for hybrid dataflow and shared memory programming models, and uses DaSH to evaluate three different hybrid data flow models, identify their advantages and shortcomings, and motivate further research on their characteristics.
Compile-Time Silent-Store Elimination for Energy Efficiency: an Analytic Evaluation for Non-Volatile Cache Memory
Rabab Bouziane, Erven Rohou, Abdoulaye Gamatié
- 22 Jan 2018
TL;DR: This paper implements a code optimization in LLVM for reducing so-called silent stores, i.e., store instruction instances that write to memory values that were already present there, which makes this optimization portable over any architecture supporting LLVM.
Patent
System and Method for Implementing Constrained Data-Driven Parallelism
Virendra J. Marathe, Yosef Lev, Victor Luchangco
- 07 Oct 2013
TL;DR: In this paper, a task group is defined and a task may be added to the same task group as the given task, and a deferred keyword may control whether a task is to be executed in the current execution phase or its execution is deferred to a subsequent execution phase for the task group.
References
Simultaneous multithreading: maximizing on-chip parallelism
Dean M. Tullsen, Susan J. Eggers, Henry M. Levy
- 01 May 1995
TL;DR: Simultaneous multithreading has the potential to achieve 4 times the throughput of a superscalar, and double that of fine-grain multi-threading, and is an attractive alternative to single-chip multiprocessors.
The implementation of the Cilk-5 multithreaded language
Matteo Frigo, Charles E. Leiserson, Keith H. Randall
- 01 May 1998
TL;DR: Cilk-5's novel "two-clone" compilation strategy and its Dijkstra-like mutual-exclusion protocol for implementing the ready deque in the work-stealing scheduler are presented.
“Memo” Functions and Machine Learning
TL;DR: A simple but effective rote-learning facility can be provided within the framework of a suitable programming language to improve the efficiency of computer programs during execution.
Executing a program on the MIT tagged-token dataflow architecture
Arvind, Rishiyur S. Nikhil
TL;DR: An overview of current thinking on dataflow architecture is provided by describing example Id programs, their compilation to dataflow graphs, and their execution on the TTDA, a multiprocessor architecture.
Pinpointing Representative Portions of Large Intel® Itanium® Programs with Dynamic Instrumentation
Harish Patil, Robert Cohn, Mark J. Charney, Rajiv Kapoor, Andrew Y. Sun, Anand Karunanidhi
- 04 Dec 2004
TL;DR: This work uses the well-known SimPoint methodology to find representative portions of an application to simulate, and develops a toolkit that automatically detects PinPoints, validates whether they are representative using hardware performance counters, and generates traces for large Itanium® programs.