Open Access • Posted Content
Effective and Efficient Dropout for Deep Convolutional Neural Networks.
TL;DR: The authors propose adjusting the order of dropout operations to address the conflict between conventional dropout and the batch normalization operation that follows it, and they examine and introduce structurally better-suited dropout variants for more efficient and effective regularization of CNNs.
Abstract: Applications based on convolutional neural networks (CNNs) have become ubiquitous, and proper regularization is greatly needed. To prevent large neural network models from overfitting, dropout has been widely used as an efficient regularization technique in practice. However, many recent works show that standard dropout is ineffective or even detrimental to the training of CNNs. In this paper, we revisit this issue and examine various dropout variants in an attempt to improve existing dropout-based regularization techniques for CNNs. We attribute the failure of standard dropout to the conflict between the stochasticity of dropout and the Batch Normalization (BN) that follows it, and propose to reduce the conflict by placing dropout operations right before the convolutional operation instead of before BN, or to address the issue fully by replacing BN with Group Normalization (GN). We further introduce a structurally better-suited dropout variant, Drop-Conv2d, which provides more efficient and effective regularization for deep CNNs. These dropout variants can be readily integrated into the building blocks of CNNs and implemented in existing deep learning platforms. Extensive experiments on benchmark datasets including CIFAR, SVHN and ImageNet compare the existing building blocks and the proposed ones under dropout training. Results show that our building blocks improve significantly over state-of-the-art CNNs, mainly due to better regularization and an implicit model ensemble effect.
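One way to make the dropout/BN conflict concrete is to look at activation statistics in training versus inference mode. The sketch below is an illustration under stated assumptions, not the authors' analysis or code: it assumes the common "inverted" dropout formulation (survivors rescaled by 1/(1-p)) and zero-mean, unit-variance inputs, and the function names `inverted_dropout` and `train_eval_variance` are hypothetical. Because BN accumulates running statistics during training and reuses them at inference, a gap between the two variances means BN normalizes with statistics that no longer match its inputs once dropout is switched off.

```python
import random
import statistics


def inverted_dropout(xs, p, rng):
    """Zero each value with probability p; rescale survivors by 1/(1-p)."""
    keep = 1.0 - p
    return [x / keep if rng.random() < keep else 0.0 for x in xs]


def train_eval_variance(p, n=200_000, seed=0):
    """Return (training-mode variance, inference-mode variance) of dropout output.

    Inputs are drawn from N(0, 1) purely for illustration (an assumption).
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    train_out = inverted_dropout(xs, p, rng)  # stochastic mask applied
    eval_out = xs                             # dropout is the identity at inference
    return statistics.pvariance(train_out), statistics.pvariance(eval_out)


if __name__ == "__main__":
    for p in (0.0, 0.1, 0.3, 0.5):
        train_var, eval_var = train_eval_variance(p)
        print(f"p={p:.1f}  train variance={train_var:.3f}  eval variance={eval_var:.3f}")
```

For zero-mean inputs the training-mode variance grows roughly as 1/(1-p), while the inference-mode variance stays fixed, so the mismatch widens with the drop rate. Group Normalization computes its statistics from each sample at both training and inference time and keeps no running averages, which is consistent with the abstract's suggestion that replacing BN with GN removes this source of conflict.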