Journal Article, DOI: 10.1145/3508391
TensorRT-Based Framework and Optimization Methodology for Deep Learning Inference on Jetson Boards
Eunji Jeong,Jangryul Kim,Soonhoi Ha +2 more
77
TL;DR: This paper presents a TensorRT-based framework that supports several optimization parameters (multi-threading, pipelining, buffer assignment, and network duplication) to accelerate deep learning applications on NVIDIA Jetson embedded platforms with heterogeneous processors.
Abstract: As deep learning inference applications proliferate on embedded devices, such devices increasingly integrate neural processing units (NPUs) alongside a multi-core CPU and a GPU; the NVIDIA Jetson AGX Xavier is one example. For fast and efficient development of deep learning applications, NVIDIA provides TensorRT, an SDK for high-performance inference that includes an optimizer and a runtime delivering low latency and high throughput. Like most deep learning frameworks, TensorRT assumes that inference runs on a single processing element, either the GPU or an NPU, but not both. In this article, we present a TensorRT-based framework that supports various optimization parameters, including multi-threading, pipelining, buffer assignment, and network duplication, to accelerate a deep learning application on an NVIDIA Jetson embedded platform with heterogeneous processors. Since the design space of allocating layers to diverse processing elements and optimizing the other parameters is huge, we devise a parameter optimization methodology that consists of a heuristic for balancing pipeline stages among heterogeneous processors and a fine-tuning process for optimizing the parameters. With nine real-life benchmarks, we achieved a 101% to 680% performance improvement and up to 55% energy reduction over baseline GPU-only inference.
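To make the idea of balancing pipeline stages concrete, here is a minimal Python sketch. It is not the authors' heuristic: it assumes a two-stage pipeline in which the first k layers run on the GPU and the remaining layers run on an NPU, ignores data-transfer and synchronization costs, and uses a hypothetical function name (`best_split`) and made-up per-layer latencies. The split that minimizes the slower stage maximizes steady-state pipeline throughput under these assumptions.

```python
from itertools import accumulate


def best_split(gpu_ms, npu_ms):
    """Return (k, bottleneck_ms) for a two-stage GPU -> NPU pipeline.

    Layers [0, k) run on the GPU and layers [k, n) run on the NPU.
    gpu_ms[i] and npu_ms[i] are hypothetical measured latencies of layer i
    on each processor. Transfer costs are ignored (an assumption).
    """
    if len(gpu_ms) != len(npu_ms):
        raise ValueError("latency lists must have the same length")

    n = len(gpu_ms)
    # Prefix sums let us evaluate each candidate split in O(1).
    gpu_prefix = [0.0] + list(accumulate(gpu_ms))
    npu_prefix = [0.0] + list(accumulate(npu_ms))
    npu_total = npu_prefix[-1]

    best_k, best_cost = 0, float("inf")
    for k in range(n + 1):
        stage_gpu = gpu_prefix[k]              # time of the GPU stage
        stage_npu = npu_total - npu_prefix[k]  # time of the NPU stage
        cost = max(stage_gpu, stage_npu)       # slower stage limits throughput
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k, best_cost


if __name__ == "__main__":
    # Made-up latencies in milliseconds, for illustration only.
    gpu = [1.0, 2.0, 3.0, 2.0]
    npu = [2.0, 4.0, 6.0, 4.0]
    k, bottleneck = best_split(gpu, npu)
    print(f"split after layer {k}, bottleneck stage = {bottleneck} ms")
```

With these illustrative numbers, running everything on the GPU takes 8 ms per input, whereas offloading the last layer to the NPU lowers the bottleneck stage to 6 ms. A real system would also need to account for buffer transfers between processors and for layers that an NPU cannot execute.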
Citations
YOLO-GD: A Deep Learning-Based Object Detection Algorithm for Empty-Dish Recycling Robots
TL;DR: Proposes YOLO-GD, a lightweight deep learning-based object detection model that enables empty-dish recycling robots to automatically recycle dishes in restaurants, canteens, and similar venues.
Optimization of Edge Resources for Deep Learning Application with Batch and Model Management
TL;DR: The result shows that the proposed method can optimize the usage of edge resources for real-time video analysis deep learning applications by modifying the batch size of the input of an inference application.
YOLOv5-R: lightweight real-time detection based on improved YOLOv5
11 Jun 2022
TL;DR: YOLOv5-R is a lightweight real-time detection algorithm that improves YOLOv5 with an efficient channel attention (ECA) module and a depthwise separable convolution (DSC) module.
15
Investigating hardware and software aspects in the energy consumption of machine learning: A green AI‐centric analysis
TL;DR: In this article, the authors present an up-to-date review of the literature, assess it through experiments, and evaluate the use of ARM-based single-board computers for training machine learning algorithms.
13
YOLOv5-MS: Real-Time Multi-Surveillance Pedestrian Target Detection Model for Smart Cities
Fangzheng Song,Peng Li +1 more
TL;DR: On an internally developed smart city dataset, YOLOv5-MS achieves a 96.5% mAP, a 2.0% improvement over YOLOv5s, and its average inference speed increases by 21.3%.
11
References
•Posted Content
PyTorch: An Imperative Style, High-Performance Deep Learning Library
Adam Paszke,Sam Gross,Francisco Massa,Adam Lerer,James Bradbury,Gregory Chanan,Trevor Killeen,Zeming Lin,Natalia Gimelshein,Luca Antiga,Alban Desmaison,Andreas Kopf,Edward Z. Yang,Zachary DeVito,Martin Raison,Alykhan Tejani,Sasank Chilamkurthy,Benoit Steiner,Lu Fang,Junjie Bai,Soumith Chintala +20 more
TL;DR: PyTorch is a machine learning library that provides an imperative and Pythonic programming style, makes debugging easy, and is consistent with other popular scientific computing libraries, while remaining efficient and supporting hardware accelerators such as GPUs.
25.9K
•Posted Content
YOLOv3: An Incremental Improvement.
Joseph Redmon,Ali Farhadi +1 more
TL;DR: The authors present some updates to YOLO!
17.8K
YOLO9000: Better, Faster, Stronger
Joseph Redmon,Ali Farhadi +1 more
21 Jul 2017
TL;DR: YOLO9000 is a state-of-the-art real-time object detection system that can detect over 9000 object categories, using a novel multi-scale training method that offers an easy tradeoff between speed and accuracy.
•Posted Content
TensorFlow: A system for large-scale machine learning
Martín Abadi,Paul Barham,Jianmin Chen,Zhifeng Chen,Andy Davis,Jeffrey Dean,Matthieu Devin,Sanjay Ghemawat,Geoffrey Irving,Michael Isard,Manjunath Kudlur,Josh Levenberg,Rajat Monga,Sherry Moore,Derek G. Murray,Benoit Steiner,Paul A. Tucker,Vijay K. Vasudevan,Pete Warden,Martin Wicke,Yuan Yu,Xiaoqiang Zheng +21 more
TL;DR: The TensorFlow dataflow model is described, and the compelling performance that TensorFlow achieves for several real-world applications is demonstrated.
Caffe: Convolutional Architecture for Fast Feature Embedding
Yangqing Jia,Evan Shelhamer,Jeff Donahue,Sergey Karayev,Jonathan Long,Ross Girshick,Sergio Guadarrama,Trevor Darrell +7 more
03 Nov 2014
TL;DR: Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.