Deep Learning Inference Parallelization on Heterogeneous Processors with TensorRT

doi:10.1109/LES.2021.3087707

Journal Article•10.1109/LES.2021.3087707

EunJin Jeong, +4 more

- 09 Jun 2021

- IEEE Embedded Systems Letters

- pp 1-1

53

TL;DR: This letter proposes a parallelization methodology to maximize the throughput of a single DL application using both GPU and NPU by exploiting various types of parallelism on TensorRT.

Citations

Journal Article•10.1145/3508391

TensorRT-Based Framework and Optimization Methodology for Deep Learning Inference on Jetson Boards

Eunji Jeong, +2 more

- 26 Jan 2022

- ACM Transactions in Embedded Computing S...

TL;DR: This paper presents a TensorRT-based framework that supports various optimization parameters, including multi-threading, pipelining, buffer assignment, and network duplication, to accelerate a deep learning application on an NVIDIA Jetson embedded platform with heterogeneous processors.

77

Journal Article•10.1109/JSTARS.2021.3120009

Light-YOLOv4: An Edge-Device Oriented Target Detection Method for Remote Sensing Images

Xiaojie Ma, +5 more

- 14 Oct 2021

- IEEE Journal of Selected Topics in Appli...

TL;DR: Light-YOLOv4 performs sparsity training by applying L1 regularization to the channel scaling factors so that the less important channels and layers can be identified, then prunes them to reduce the network's width and depth.

66

Journal Article•10.3390/machines10050294

YOLO-GD: A Deep Learning-Based Object Detection Algorithm for Empty-Dish Recycling Robots

Xuebin Yue, +4 more

- 22 Apr 2022

- Machines

TL;DR: This paper proposes a deep learning-based object detection algorithm, using the lightweight object detection model YOLO-GD, that lets empty-dish recycling robots automatically recycle dishes in restaurants, canteens, and similar settings.

57

Journal Article•10.1016/j.autcon.2022.104595

Automatic defogging, deblurring, and real-time segmentation system for sewer pipeline defects

Duo Ma, +5 more

- 01 Dec 2022

- Automation in Construction

TL;DR: The authors use an attention-based algorithm for defogging and a generative adversarial network (GAN) for deblurring to improve the sharpness of pipeline images, and report the highest mean average precision (mAP) of 92.65% and the fastest speed of 41.23 frames per second (fps).

43

Journal Article•10.1007/s10462-022-10221-5

An architecture-level analysis on deep learning models for low-impact computations

Hengyi Li, +5 more

- 26 Jun 2022

- Artificial Intelligence Review

TL;DR: In this paper, the authors conduct a series of experiments to study in depth the inference workload of prominent state-of-the-art DNN architectures on a single-instruction-multiple-data (SIMD) CPU platform, with a scope intended to be widely applicable to multiple hardware platforms.

39

References

10.1109/ipsn.2016.7460664

DeepX: A Software Accelerator for Low-Power Deep Learning Inference on Mobile Devices.

Nicholas D. Lane, +6 more

TL;DR: DeepX, a software accelerator, optimizes deep learning inference on mobile devices by decomposing models into unit-blocks and scaling resources, reducing memory, computation, and energy usage, enabling efficient execution of large-scale models on modern mobile processors.
