Proceedings Article, DOI: 10.1145/2541940.2541989
Challenging the "embarrassingly sequential": parallelizing finite state machine-based computations through principled speculation
Zhijia Zhao,Bo Wu,Xipeng Shen +2 more
- 24 Feb 2014
- Vol. 42, Iss: 1, pp 543-558
TL;DR: This paper offers the first disciplined way to exploit application-specific information to inform speculations for parallelization, and presents a probabilistic model that captures the relations between speculative executions and the properties of the target FSM and its inputs.
Abstract: Finite-State Machine (FSM) applications are important for many domains. But FSM computation is inherently sequential, making such applications notoriously difficult to parallelize. Most prior methods address the problem through speculation based on simple heuristics, offering limited applicability and inconsistent speedups. This paper provides some principled understanding of FSM parallelization, and offers the first disciplined way to exploit application-specific information to inform speculations for parallelization. Through a series of rigorous analyses, it presents a probabilistic model that captures the relations between speculative executions and the properties of the target FSM and its inputs. With the formulation, it proposes two model-based speculation schemes that automatically customize themselves with the suitable configurations to maximize the parallelization benefits. This rigorous treatment yields near-linear speedup on applications that state-of-the-art techniques can barely accelerate.
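To make the idea of speculative FSM parallelization concrete, here is a minimal sketch of the simple heuristic kind of speculation that the abstract attributes to prior methods. It is not the paper's model-based scheme. The input is split into chunks, each chunk's start state is guessed by running the FSM over a short suffix of the preceding chunk (a "lookback"), the chunks are processed concurrently, and any chunk whose guess turns out wrong is re-executed. All names (`run_fsm`, `speculative_run`, `SUFFIX_AB`, `lookback`) are hypothetical and chosen for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical example FSM over the alphabet {a, b}:
# state 2 means "the input read so far ends in 'ab'".
SUFFIX_AB = {
    0: {"a": 1, "b": 0},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "b": 0},
}


def run_fsm(table, state, chunk):
    """Sequentially run the FSM over `chunk`, starting from `state`."""
    for symbol in chunk:
        state = table[state][symbol]
    return state


def speculative_run(table, start, data, n_chunks=4, lookback=8):
    """Return (final_state, misses) using lookback-based speculation."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division
    chunks = [data[i:i + size] for i in range(0, len(data), size)]

    # Guess each chunk's start state from the last `lookback` symbols
    # before it. The first chunk's start state is known exactly.
    guesses = [start]
    for k in range(1, len(chunks)):
        begin = k * size
        suffix = data[max(0, begin - lookback):begin]
        guesses.append(run_fsm(table, start, suffix))

    # Process all chunks concurrently from their guessed start states.
    with ThreadPoolExecutor() as pool:
        ends = list(pool.map(lambda gc: run_fsm(table, gc[0], gc[1]),
                             zip(guesses, chunks)))

    # Validate guesses in order; re-execute a chunk when its guess was wrong.
    state, misses = start, 0
    for guess, chunk, end in zip(guesses, chunks, ends):
        if guess == state:
            state = end  # speculation succeeded, reuse the result
        else:
            misses += 1
            state = run_fsm(table, state, chunk)
    return state, misses
```

Calling `speculative_run(SUFFIX_AB, 0, "abbaabab" * 50, lookback=2)` returns the same final state as `run_fsm(SUFFIX_AB, 0, "abbaabab" * 50)`. The result is always correct because every guess is validated; only the amount of wasted work depends on how often the guesses hit. Python threads are used here purely to show the structure, since the interpreter lock prevents real speedup for this kind of loop. How much lookback to use, and which start state to speculate on, is the sort of configuration choice that the abstract says the proposed model-based schemes select automatically.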
Citations
Tigr: Transforming Irregular Graphs for GPU-Friendly Graph Processing
Amir Hossein Nodehi Sabet,Junqiao Qiu,Zhijia Zhao +2 more
- 19 Mar 2018
TL;DR: Tigr is a graph transformation framework that can effectively reduce the irregularity of real-world graphs with correctness guarantees for a wide range of graph analytics.
94
A Survey on Thread-Level Speculation Techniques
TL;DR: This work introduces the technique, presents a taxonomy of TLS solutions, and summarizes and puts into perspective the most relevant advances in this field.
Grammar-aware Parallelization for Scalable XPath Querying
Lin Jiang,Zhijia Zhao +1 more
- 26 Jan 2017
TL;DR: GAP leverages static analysis to infer feasible execution paths for specific contexts based on the grammar of the semi-structured data and reduces the execution paths from all paths to a minimum, therefore maximizing the parallelization efficiency and scalability.
12
In-/Near-Memory Computing
TL;DR: This book provides a structured introduction of the key concepts and techniques that enable in-/near-memory computing.
11
Space-efficient multi-versioning for input-adaptive feedback-driven program optimizations
Mingzhou Zhou,Xipeng Shen,Yaoqing Gao,Graham Yiu +3 more
- 15 Oct 2014
TL;DR: This study proves selecting the best set of versions under a space constraint is NP-complete and proposes a heuristic algorithm named CHoGS which yields near optimal results in quadratic time.