Time squeezing for tiny devices
Yuanbo Fan, Simone Campanoni, Russ Joseph +2 more
- 22 Jun 2019
- pp 657-670
TL;DR: This paper describes compiler and architecture co-design that opens new opportunities for timing slack that are otherwise impossible, and introduces novel mechanisms in the hardware and in the compiler that work together to improve the benefit of circuit-level timing speculation by effectively squeezing time during execution.
Abstract: Dynamic timing slack has emerged as a compelling opportunity for eliminating inefficiency in ultra-low power embedded systems. This slack arises when all the signals have propagated through logic paths well in advance of the clock signal. When it is properly identified, the system can exploit this unused cycle time for energy savings. In this paper, we describe compiler and architecture co-design that opens new opportunities for timing slack that are otherwise impossible. Through cross-layer optimization, we introduce novel mechanisms in the hardware and in the compiler that work together to improve the benefit of circuit-level timing speculation by effectively squeezing time during execution. This approach is particularly well-suited to tiny embedded devices. Our evaluation on a gate-level model of a complete processor shows that our co-design saves (on average) 40.5% of the original energy consumption (additional 16.5% compared to the existing clock scheduling technique) across 13 workloads while retaining transparency to developers.
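To make the notion of dynamic timing slack concrete, the following is a minimal sketch, not the paper's mechanism. It assumes a fixed clock period and a hypothetical list of per-cycle signal settling times (the function names, the delays, and the 10 ns period are all made up for illustration). It only computes how much of each cycle goes unused; how that slack is turned into energy savings is not modeled here.

```python
from typing import List


def dynamic_slack(path_delays_ns: List[float], clock_period_ns: float) -> List[float]:
    """Return the slack of each cycle: the time left between the moment
    all signals have settled and the next clock edge.

    The delays are hypothetical inputs; a real flow would obtain them
    from gate-level timing analysis.
    """
    slacks = []
    for delay in path_delays_ns:
        if delay > clock_period_ns:
            # A delay longer than the period would be a timing violation,
            # which this sketch does not attempt to handle.
            raise ValueError("delay exceeds the clock period")
        slacks.append(clock_period_ns - delay)
    return slacks


def unused_cycle_fraction(path_delays_ns: List[float], clock_period_ns: float) -> float:
    """Fraction of total cycle time that is slack (0.0 for an empty trace)."""
    if not path_delays_ns:
        return 0.0
    total_slack = sum(dynamic_slack(path_delays_ns, clock_period_ns))
    return total_slack / (clock_period_ns * len(path_delays_ns))


if __name__ == "__main__":
    # Hypothetical settling times (ns) for four consecutive cycles.
    delays = [6.0, 8.0, 10.0, 4.0]
    period = 10.0  # assumed worst-case clock period (ns)
    print(dynamic_slack(delays, period))
    print(unused_cycle_fraction(delays, period))
```

In this toy trace only one cycle actually needs the full period, which is the kind of unused cycle time the abstract describes as an opportunity.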