Proceedings Article, DOI: 10.1109/FAS-W.2018.00023
Efficient Multi-GPU Memory Management for Deep Learning Acceleration
Youngrang Kim, Jaehwan Lee, Jik-Soo Kim, Hyunseung Jei, Hongchan Roh
- 01 Sep 2018
- pp 37-43
TL;DR: The paper proposes an optimized memory management scheme that improves overall GPU memory utilization in multi-GPU systems for deep learning acceleration, together with an intelligent prefetching algorithm that achieves the highest processing throughput while sustaining a large mini-batch size.
Abstract: In this paper, we propose a new optimized memory management scheme that improves overall GPU memory utilization in multi-GPU systems for deep learning acceleration. We extend NVIDIA's vDNN concept (a hybrid use of GPU and CPU memory) to a multi-GPU environment by effectively addressing PCIe bus contention. In addition, we design and implement an intelligent prefetching algorithm (from CPU memory to GPU memory) that achieves the highest processing throughput while sustaining a large mini-batch size. For evaluation, we implemented our memory usage optimization scheme on TensorFlow, the well-known machine learning library from Google, and performed extensive experiments on a multi-GPU testbed. Our evaluation results show that the proposed scheme can increase the mini-batch size by up to 60% and improve training throughput by up to 46.6% in a multi-GPU system.
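To make the memory/prefetch trade-off in the abstract concrete, here is a minimal toy model in Python. It is not the authors' algorithm: the function name `peak_gpu_memory`, the `prefetch_distance` parameter, and the layer sizes are all hypothetical, and the model ignores weights, workspace buffers, and PCIe contention. It only illustrates the general vDNN-style idea that offloading activations to CPU memory lowers the GPU peak, while prefetching further ahead during the backward pass raises it again.

```python
from typing import List


def peak_gpu_memory(activation_mb: List[int],
                    offload: bool,
                    prefetch_distance: int = 1) -> int:
    """Peak MB of activations resident on the GPU over one
    forward + backward pass (toy model, hypothetical parameters)."""
    n = len(activation_mb)
    if not offload:
        # Baseline: every activation stays on the GPU until backward uses it.
        return sum(activation_mb)

    peak = 0
    # Forward: layer i's output is resident while layer i-1's output
    # is still being copied out to CPU memory.
    for i in range(n):
        resident = activation_mb[i] + (activation_mb[i - 1] if i > 0 else 0)
        peak = max(peak, resident)

    # Backward: layer i is in use, and the activations of the next
    # `prefetch_distance` layers to be processed have already been fetched.
    for i in range(n - 1, -1, -1):
        first = max(0, i - prefetch_distance)
        resident = sum(activation_mb[first:i + 1])
        peak = max(peak, resident)
    return peak


layers = [400, 300, 200, 100]  # assumed per-layer activation sizes in MB
print(peak_gpu_memory(layers, offload=False))                      # 1000
print(peak_gpu_memory(layers, offload=True, prefetch_distance=1))  # 700
print(peak_gpu_memory(layers, offload=True, prefetch_distance=2))  # 900
```

In this sketch, the memory freed by offloading is what would allow a larger mini-batch, and the prefetch distance is the knob that trades hidden transfer latency against that freed memory.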