Efficient Multi-GPU Memory Management for Deep Learning Acceleration

doi:10.1109/FAS-W.2018.00023

Proceedings Article•10.1109/FAS-W.2018.00023

Youngrang Kim, +4 more

- 01 Sep 2018

- pp 37-43

14

TL;DR: This paper proposes an optimized memory management scheme that can improve overall GPU memory utilization in multi-GPU systems for accelerating deep learning applications, together with an intelligent prefetching algorithm that can achieve the highest processing throughput while sustaining a large mini-batch size.

Citations

•Journal Article•10.1109/ACCESS.2020.3039858

Hardware and Software Optimizations for Accelerating Deep Neural Networks: Survey of Current Trends, Challenges, and the Road Ahead

Maurizio Capra, +5 more

- 24 Nov 2020

- IEEE Access

TL;DR: This work summarizes and compares prior work on four leading platforms for executing algorithms (CPU, GPU, FPGA, and ASIC), describing the main state-of-the-art solutions and giving particular prominence to the last two, since they offer greater design flexibility and the potential for high energy efficiency, especially for inference.

180

•Journal Article•10.1007/S11227-020-03325-8

A systematic literature review on hardware implementation of artificial intelligence algorithms

Manar Abu Talib, +3 more

- 01 Feb 2021

- The Journal of Supercomputing

TL;DR: This work presents a systematic literature review of the hardware accelerators available for AI and ML tools, covering the use of FPGAs, GPUs, and ASICs to accelerate computationally intensive tasks.

132

•Journal Article•10.1109/access.2022.3229767

Efficient Hardware Architectures for Accelerating Deep Neural Networks: Survey

- 01 Jan 2022

- IEEE Access

TL;DR: This paper gives a detailed description of the specialized hardware-based accelerators used for training and/or inference of DNNs, and compares them on factors such as power, area, and throughput.

50

•Journal Article•10.3390/ELECTRONICS10080952

GPU-Based Embedded Intelligence Architectures and Applications

Li Minn Ang, +1 more

- 16 Apr 2021

- Electronics

TL;DR: This paper comprehensively reviews emerging and current paradigms for GPU-based embedded intelligence (EI) and presents representative studies, with a focus on architecture, technologies, and applications.

17

•Journal Article•10.3390/APP112110377

Efficient Use of GPU Memory for Large-Scale Deep Learning Model Training

Hyeonseong Choi, +1 more

- 04 Nov 2021

- Applied Sciences

TL;DR: This paper proposes an optimized scheme based on CUDA Unified Memory that uses GPU memory efficiently by applying a different memory advise setting to each data type according to its access pattern during deep learning training.

13

References

•Proceedings Article•10.1109/CVPR.2016.90

Deep Residual Learning for Image Recognition

Kaiming He, +3 more

- 27 Jun 2016

TL;DR: The authors propose a residual learning framework that eases the training of networks substantially deeper than those used previously; the approach won 1st place in the ILSVRC 2015 classification task.

198.7K

•Posted Content

Deep Residual Learning for Image Recognition

Kaiming He, +3 more

- 10 Dec 2015

- arXiv: Computer Vision and Pattern Recog...

TL;DR: This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.

117.9K

•Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

Karen Simonyan, +1 more

- 04 Sep 2014

TL;DR: This work investigates the effect of convolutional network depth on accuracy in the large-scale image recognition setting, using an architecture with very small convolution filters, and shows that pushing the depth to 16-19 weight layers yields a significant improvement over prior-art configurations.

102.6K

•Journal Article•10.1145/3065386

ImageNet classification with deep convolutional neural networks

Alex Krizhevsky, +2 more

- 24 May 2017

- Communications of The ACM

TL;DR: A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into 1000 different classes; it employed a recently developed regularization method called "dropout" that proved to be very effective.

98.2K

•Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

Alex Krizhevsky, +2 more

- 03 Dec 2012

TL;DR: The authors describe a deep convolutional neural network that achieved state-of-the-art performance; it consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.

88.4K