Effective and Efficient Dropout for Deep Convolutional Neural Networks.

Open Access • Posted Content

- 06 Apr 2019

TL;DR: The paper proposes reordering the dropout operation to address the conflict between conventional dropout and the batch normalization operation that follows it, and examines and introduces structurally better-suited dropout variants for more efficient and effective regularization of CNNs.

Abstract: Applications based on convolutional neural networks (CNNs) have become ubiquitous, and proper regularization is greatly needed. To prevent large neural network models from overfitting, dropout has been widely used as an efficient regularization technique in practice. However, many recent works show that standard dropout is ineffective or even detrimental to the training of CNNs. In this paper, we revisit this issue and examine various dropout variants in an attempt to improve existing dropout-based regularization techniques for CNNs. We attribute the failure of standard dropout to the conflict between the stochasticity of dropout and the Batch Normalization (BN) that follows it, and propose to reduce the conflict by placing dropout operations right before the convolutional operation instead of before BN, or to eliminate it entirely by replacing BN with Group Normalization (GN). We further introduce a structurally more suited dropout variant, Drop-Conv2d, which provides more efficient and effective regularization for deep CNNs. These dropout variants can be readily integrated into the building blocks of CNNs and implemented in existing deep learning platforms. Extensive experiments on benchmark datasets including CIFAR, SVHN and ImageNet are conducted to compare the existing building blocks and the proposed ones with dropout training. Results show that our building blocks improve significantly over state-of-the-art CNNs, mainly due to better regularization and an implicit model ensemble effect.
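The conflict between dropout and a following BN layer can be made concrete with a small numerical sketch. The example below is an illustration only, not the paper's experiment: it assumes inverted dropout (survivors rescaled by 1 / keep_prob), unit-Gaussian inputs, and hypothetical helper names (`inverted_dropout`, `variance`, `variance_gap`). It shows that the variance a normalization layer would observe right after dropout during training differs from the variance it receives at inference, when dropout is the identity.

```python
import random


def inverted_dropout(values, keep_prob, rng):
    """Training-mode inverted dropout (assumed variant): zero each value with
    probability 1 - keep_prob and rescale the survivors by 1 / keep_prob."""
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


def variance(values):
    """Population variance, written out to keep the sketch dependency-free."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def variance_gap(n=100_000, keep_prob=0.5, seed=0):
    """Return (training variance, inference variance) of a dropout output.

    The inputs are synthetic unit-Gaussian samples, an assumption made purely
    for illustration."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    train_out = inverted_dropout(x, keep_prob, rng)  # stochastic during training
    eval_out = x                                     # identity at inference
    return variance(train_out), variance(eval_out)


train_var, eval_var = variance_gap()
# For zero-mean inputs the ratio is close to 1 / keep_prob.
print(f"train variance / inference variance = {train_var / eval_var:.2f}")
```

Statistics accumulated during training by a normalization layer placed directly after dropout would therefore not match the activations seen at inference. Under the same assumptions, moving dropout to just before the convolution, as the abstract proposes, means the normalization layer no longer sits directly on the dropout output.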
