Exploiting tightly-coupled cores
Daniel Bates,Alex Bradbury,Andreas Koltes,Robert Mullins +3 more
- 15 Jul 2013
- Vol. 80, Iss: 1, pp 103-120
TL;DR: This paper focuses on the design of a single 8-core tile, conceived as the building block for a larger many-core system. It explores the tile’s ability to support a range of parallelisation opportunities and details the control and communication mechanisms needed to exploit each core’s resources in a flexible manner.
Abstract: The individual processors of a chip-multiprocessor traditionally have rigid boundaries. Inter-core communication is only possible via memory and control over a core's resources is localised. Specialisation necessary to meet today's challenging energy targets is typically provided through the provision of a range of processor types and accelerators. An alternative approach is to permit specialisation by tailoring the way a large number of homogeneous cores are used. The approach here is to relax processor boundaries, create a richer mix of inter-core communication mechanisms and provide finer-grain control over, and access to, the resources of each core. We evaluate one such design, called Loki, that aims to support specialisation in software on a homogeneous many-core architecture. We focus on the design of a single 8-core tile, conceived as the building block for a larger many-core system. We explore the tile's ability to support a range of parallelisation opportunities and detail the control and communication mechanisms needed to exploit each core's resources in a flexible manner. Performance and a detailed breakdown of energy usage are provided for a range of benchmarks and configurations.
Citations
Always-on Vision Processing Unit for Mobile Applications
Brendan Barry,Cormac Brick,Fergal Connor,David Donohoe,David Moloney,Richard Richmond,Martin O'Riordan,Vasile Toma +7 more
TL;DR: The vision processing unit incorporates parallelism, instruction set architecture, and microarchitectural features to provide highly sustainable performance efficiency across a range of computational imaging and computer vision applications, including those with low latency requirements on the order of milliseconds.
138
Accelerating Database Systems Using FPGAs: A Survey
Philippos Papaphilippou,Wayne Luk +1 more
- 01 Aug 2018
TL;DR: This survey presents a systematic review of research relating to accelerating analytical database systems using FPGAs, including studies of database acceleration frameworks and accelerator implementations for various database operators.
Statistical Access Interval Prediction for Tightly Coupled Memory Systems
Robert Wittig,Mattis Hasler,Emil Matus,Gerhard Fettweis +3 more
- 01 Apr 2019
TL;DR: It is argued that most memory transactions of embedded processors can be reliably predicted in the time domain; therefore, preallocation of shared resources can be used to avoid collisions in the memory system.
9
Access Interval Prediction for Tightly Coupled Memory Systems.
Robert Wittig,Friedrich Pauls,Emil Matus,Gerhard Fettweis +3 more
- 07 Jul 2019
TL;DR: A method for memory Access Interval Prediction is introduced that minimizes conflicts by predicting the interval between two consecutive memory accesses, thereby significantly reducing the number of access conflicts.
7
Accelerating control-flow intensive code in spatial hardware
Ali Mustafa Zaidi
- 01 Jan 2015
TL;DR: This work demonstrates that it is possible to use custom and/or reconfigurable hardware in heterogeneous systems to improve the efficiency of frequently executed sequential code, without compromising performance relative to an energy inefficient out-of-order superscalar processor.
References
MapReduce: simplified data processing on large clusters
Jeffrey Dean,Sanjay Ghemawat +1 more
- 06 Dec 2004
TL;DR: This paper presents the implementation of MapReduce, a programming model and an associated implementation for processing and generating large data sets that runs on a large cluster of commodity machines and is highly scalable.
An Analysis of Variance Test for Normality (Complete Samples)
S. S. Shapiro,M. B. Wilk +1 more
TL;DR: In this article, a new statistical procedure for testing a complete sample for normality is introduced, which is obtained by dividing the square of an appropriate linear combination of the sample order statistics by the usual symmetric estimate of variance.
MapReduce: simplified data processing on large clusters
Jeffrey Dean,Sanjay Ghemawat +1 more
TL;DR: This presentation explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.
The gem5 simulator
Nathan Binkert,Bradford M. Beckmann,Gabriel Black,Steven K. Reinhardt,Ali G. Saidi,Arkaprava Basu,Joel Hestness,Derek R. Hower,Tushar Krishna,Somayeh Sardashti,Rathijit Sen,Korey Sewell,Muhammad Shoaib,Nilay Vaish,Mark D. Hill,Darien Wood +15 more
TL;DR: The high level of collaboration on the gem5 project, combined with the previous success of the component parts and a liberal BSD-like license, make gem5 a valuable full-system simulation tool.
MiBench: A free, commercially representative embedded benchmark suite
Matthew R. Guthaus,Jeff Ringenberg,Daniel J. Ernst,Todd Austin,Trevor Mudge,Richard B. Brown +5 more
- 02 Dec 2001
TL;DR: A new version of SimpleScalar that has been adapted to the ARM instruction set is used to characterize the performance of the benchmarks using configurations similar to current and next generation embedded processors.
3.7K