TensorRT-Based Framework and Optimization Methodology for Deep Learning Inference on Jetson Boards

Journal Article • doi:10.1145/3508391

Eunji Jeong, +2 more

- 26 Jan 2022

- ACM Transactions in Embedded Computing S...

- Vol. 21, Iss: 5, pp 1-26

TL;DR: This paper presents a TensorRT-based framework that supports various optimization parameters, including multi-threading, pipelining, buffer assignment, and network duplication, to accelerate deep learning applications on NVIDIA Jetson embedded platforms with heterogeneous processors.

Abstract: As deep learning inference applications proliferate on embedded devices, these devices increasingly include neural processing units (NPUs) in addition to a multi-core CPU and a GPU; the NVIDIA Jetson AGX Xavier is one example. For fast and efficient development of deep learning applications, TensorRT is provided as an SDK for high-performance inference, including an optimizer and a runtime that deliver low latency and high throughput. Like most deep learning frameworks, TensorRT assumes that inference runs on a single processing element, either the GPU or the NPU, but not both. In this article, we present a TensorRT-based framework that supports various optimization parameters, including multi-threading, pipelining, buffer assignment, and network duplication, to accelerate a deep learning application on an NVIDIA Jetson embedded platform with heterogeneous processors. Because the design space of allocating layers to diverse processing elements and optimizing the other parameters is huge, we devise a parameter optimization methodology that consists of a heuristic for balancing pipeline stages among heterogeneous processors and a fine-tuning process for optimizing parameters. With nine real-life benchmarks, we achieve 101% to 680% performance improvement and up to 55% energy reduction over baseline inference using only a GPU.
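To make the idea of balancing pipeline stages across heterogeneous processors concrete, here is a minimal sketch in Python. It is not the heuristic from the paper, whose details are not given on this page. It assumes a simplified model with exactly two stages (one on the GPU, one on the NPU), a single contiguous cut point, no data-transfer cost, and made-up per-layer latencies. The function name `best_two_stage_split` and all numbers are hypothetical. The sketch relies only on the general fact that the steady-state throughput of a pipeline is bounded by its slowest stage.

```python
from typing import List, Tuple


def best_two_stage_split(gpu_ms: List[float], npu_ms: List[float]) -> Tuple[int, str, float]:
    """Exhaustively pick a cut point for a two-stage GPU/NPU pipeline.

    Layers [0, cut) run on the first device and layers [cut, n) on the second.
    Returns (cut, order, bottleneck_ms), where order is "gpu->npu" or "npu->gpu"
    and bottleneck_ms is the latency of the slower stage.
    Assumption: transfer cost between devices is ignored.
    """
    if len(gpu_ms) != len(npu_ms):
        raise ValueError("need one latency per layer for each device")
    n = len(gpu_ms)
    # Baseline: every layer on the GPU, i.e. a single stage.
    best = (n, "gpu->npu", sum(gpu_ms))
    for cut in range(1, n):
        candidates = (
            ("gpu->npu", gpu_ms, npu_ms),
            ("npu->gpu", npu_ms, gpu_ms),
        )
        for order, first, second in candidates:
            # The slower of the two stages limits pipelined throughput.
            bottleneck = max(sum(first[:cut]), sum(second[cut:]))
            if bottleneck < best[2]:
                best = (cut, order, bottleneck)
    return best


# Hypothetical per-layer latencies in milliseconds (not measured data).
gpu_ms = [2.0, 3.0, 4.0, 1.0]
npu_ms = [4.0, 5.0, 6.0, 2.0]

cut, order, bottleneck = best_two_stage_split(gpu_ms, npu_ms)
print(f"cut={cut}, order={order}, bottleneck={bottleneck} ms, "
      f"gpu_only={sum(gpu_ms)} ms")
```

Under these assumptions, offloading part of the network to the slower NPU can still shorten the bottleneck stage relative to running everything on the GPU. A realistic search would also have to account for layers the NPU cannot execute, transfer overheads, and the other parameters the abstract lists.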

Citations

•Journal Article•10.3390/machines10050294

YOLO-GD: A Deep Learning-Based Object Detection Algorithm for Empty-Dish Recycling Robots

Xuebin Yue, +4 more

- 22 Apr 2022

- Machines

TL;DR: The paper proposes YOLO-GD, a lightweight deep learning-based object detection model that lets empty-dish recycling robots automatically recycle dishes in restaurants, canteens, and similar settings.

•Journal Article•10.3390/s22176717

Optimization of Edge Resources for Deep Learning Application with Batch and Model Management

Seung Woo Kum, +3 more

- 01 Sep 2022

- Sensors

TL;DR: The results show that the proposed method can optimize the use of edge resources for real-time video analysis deep learning applications by adjusting the input batch size of the inference application.

•Journal Article•10.1117/1.jei.31.3.033033

YOLOv5-R: lightweight real-time detection based on improved YOLOv5

- 11 Jun 2022

- Journal of Electronic Imaging

TL;DR: YOLOv5-R is a lightweight real-time detection algorithm built on an efficient channel attention (ECA) module and a depthwise separable convolution (DSC) module.

•Journal Article•10.1002/cpe.7825

Investigating hardware and software aspects in the energy consumption of machine learning: A green AI‐centric analysis

André Yokoyama, +3 more

- 01 Jun 2023

- Concurrency and Computation: Practice an...

TL;DR: In this article, the authors present an up-to-date review of the literature, assess it through experiments, and evaluate the use of ARM-based single-board computers for training machine learning algorithms.

•Journal Article•10.3390/biomimetics8060480

YOLOv5-MS: Real-Time Multi-Surveillance Pedestrian Target Detection Model for Smart Cities

Fangzheng Song, +1 more

- 09 Oct 2023

- Biomimetics

TL;DR: On an internally developed smart city dataset, YOLOv5-MS achieves 96.5% mAP, a 2.0% improvement over YOLOv5s, and its average inference speed increases by 21.3%.
