Exploring processor parallelism: Estimation methods and optimization strategies
Roel Jordans, Rosilde Corvino, Lech Jozwiak, Henk Corporaal +3 more
- 08 Apr 2013
- Vol. 4, Iss: 2, pp 18-23
TL;DR: In this article, the issue-width of an application-specific VLIW architecture is selected automatically using a force-based parallelism measure, which estimates the required issue-width within 3% on average.
Abstract: Former research on automatic exploration of ASIP architectures mostly focused on either the internal memory hierarchy or the addition of complex custom operations to RISC-based architectures. This paper focuses on VLIW architectures and, more specifically, on automating the selection of an application-specific VLIW issue-width. Accurate and efficient issue-width estimation is important because the issue-width strongly influences all the important processor properties (e.g. processing speed, silicon area, and power consumption). We first compare different methods for estimating the required issue-width, and subsequently introduce a new force-based parallelism measure which is capable of estimating the required issue-width within 3% on average. Moreover, we show that we can quickly estimate the latency-parallelism Pareto-front of an example ECG application with less than 10% error using our issue-width estimations.
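To make the quantities named in the abstract concrete, the following is a minimal sketch of a simple baseline parallelism estimate and a Pareto-front filter. It is not the force-based measure introduced in the paper, whose details are not given here. The function names, the unit-latency assumption, the assumption that operation ids are already topologically ordered, and the sample data are all hypothetical and chosen only for illustration.

```python
def asap_levels(num_ops, deps):
    """Earliest start cycle of each operation (ASAP schedule).

    Assumptions (not from the paper): every operation takes one cycle,
    and deps is a list of (producer, consumer) pairs with
    producer < consumer, i.e. ids are already in topological order.
    """
    level = [0] * num_ops
    # Sorting by producer id guarantees a producer's level is final
    # before it is propagated to its consumers.
    for src, dst in sorted(deps):
        level[dst] = max(level[dst], level[src] + 1)
    return level


def parallelism_estimates(num_ops, deps):
    """Return (average, peak) parallelism of the ASAP schedule."""
    level = asap_levels(num_ops, deps)
    critical_path = max(level) + 1
    ops_per_cycle = [0] * critical_path
    for cycle in level:
        ops_per_cycle[cycle] += 1
    return num_ops / critical_path, max(ops_per_cycle)


def pareto_front(points):
    """Keep the (issue_width, latency) points not dominated by another.

    Both coordinates are minimized: a point survives only if no point
    with a smaller or equal issue-width has a lower or equal latency.
    """
    front = []
    best_latency = float("inf")
    for width, latency in sorted(points):
        if latency < best_latency:
            front.append((width, latency))
            best_latency = latency
    return front


# Hypothetical 6-operation dependence graph and design points.
example_deps = [(0, 2), (1, 2), (2, 3), (4, 5)]
average_ilp, peak_ilp = parallelism_estimates(6, example_deps)
example_points = [(1, 6), (2, 3), (3, 3), (4, 2)]
example_front = pareto_front(example_points)
```

The average (operations divided by critical-path length) and the peak of the ASAP schedule are two simple bounds on the useful issue-width. The Pareto filter then discards any design point that a narrower or equally wide design already matches or beats in latency.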
Citations
ASAM: Automatic architecture synthesis and application mapping
Lech Jozwiak, Menno Lindwer, Rosilde Corvino, Paolo Meloni, Laura Micconi, Jan Madsen, Erkan Diken, Deepak Gangadharan, Roel Jordans, Sebastiano Pomata, Paul Pop, Giuseppe Tuveri, Luigi Raffo, Giuseppe Notarangelo +13 more
TL;DR: An overview of the research currently being performed in the scope of the European project ASAM of the ARTEMIS program is presented, together with the system, design, and electronic design automation (EDA) concepts that seem adequate to address the challenges and solve the problems.
30
Embedded Computing Technology for Highly-demanding Cyber-physical Systems
TL;DR: The embedded computing technology needed for the modern complex and highly-demanding mobile and autonomous CPS is discussed.
28
Advanced mobile and wearable systems
TL;DR: The large, heterogeneous area of these systems and the serious issues and challenges in their design are considered, and the embedded computing and design technologies needed to address these issues, overcome the challenges, and satisfy the stringent requirements of modern mobile systems are discussed.
24
ASAM: Automatic Architecture Synthesis and Application Mapping
Lech Jozwiak, Menno Lindwer, Rosilde Corvino, Paolo Meloni, Laura Micconi, Jan Madsen, Erkan Diken, Deepak Gangadharan, Roel Jordans, Sebastiano Pomata, Paul Pop, Giuseppe Tuveri, Luigi Raffo +12 more
- 05 Sep 2012
TL;DR: An overview of the research currently being performed in the scope of the European project ASAM of the ARTEMIS program is presented, together with the system, design, and electronic design automation concepts that seem adequate to resolve the problems and address the challenges.
Construction and exploitation of VLIW ASIPs with heterogeneous vector-widths
TL;DR: The use of heterogeneous vector widths and a method to explore the heterogeneous vector widths for VLIW ASIPs are proposed, and the associated design automation tools are explained.
13
References
Dynamic dependency analysis of ordinary programs
Todd Austin, Gurindar S. Sohi +1 more
- 01 Apr 1992
TL;DR: This paper presents a methodology for constructing the dynamic execution graph that characterizes the execution of an ordinary program (an application program written in an imperative language such as C or FORTRAN) from a serial execution trace of the program, and uses the methodology to study parallelism in the SPEC benchmarks.
On the limits of program parallelism and its smoothability
Kevin B. Theobald, Guang R. Gao, Laurie Hendren +2 more
- 10 Dec 1992
TL;DR: A new study of instruction-level parallelism is reported, which examines aspects not covered in previous studies, including the effects of various memory reuse policies and long-latency operations, and the results achieved when large benchmarks are allowed to run to completion.
79
Improving software pipelining with unroll-and-jam
Steve Carr, Chen Ding, Philip Sweany +2 more
- 03 Jan 1996
TL;DR: It is demonstrated how unroll-and-jam can significantly improve the initiation interval in a software-pipelined loop.
Efficient DAG construction and heuristic calculation for instruction scheduling
Mark Smotherman, Sanjay Krishnamurthy, P. S. Aravind, David Hunnicutt +3 more
- 01 Sep 1991
TL;DR: This paper explores the efficiency of three DAG construction algorithms and surveys 26 proposed heuristics and their methods of calculation. It shows the table-building algorithms to be extremely efficient for programs with large basic blocks while appropriately handling the problem of retaining important transitive arcs.
50
Automatic selection of application-specific reconfigurable processor extensions
Christophe Wolinski, Krzysztof Kuchcinski +1 more
- 10 Mar 2008
TL;DR: Experimental results show that the presented method provides high coverage of application graphs with a small number of patterns and ensures high application execution speed-up for both sequential and parallel application execution with processor extensions implementing the selected patterns.