Open AccessPosted Content
Ensemble Distribution Distillation
TL;DR: In this article, a prior network is proposed to distill the distribution of the predictions from an ensemble, rather than just the average prediction, into a single model, which is useful for uncertainty estimation.
read more
Abstract: Ensembles of models often yield improvements in system performance. These ensemble approaches have also been empirically shown to yield robust measures of uncertainty, and are capable of distinguishing between different \emph{forms} of uncertainty. However, ensembles come at a computational and memory cost which may be prohibitive for many applications. There has been significant work done on the distillation of an ensemble into a single model. Such approaches decrease computational cost and allow a single model to achieve an accuracy comparable to that of an ensemble. However, information about the \emph{diversity} of the ensemble, which can yield estimates of different forms of uncertainty, is lost. This work considers the novel task of \emph{Ensemble Distribution Distillation} (EnD$^2$) --- distilling the distribution of the predictions from an ensemble, rather than just the average prediction, into a single model. EnD$^2$ enables a single model to retain both the improved classification performance of ensemble distillation as well as information about the diversity of the ensemble, which is useful for uncertainty estimation. A solution for EnD$^2$ based on Prior Networks, a class of models which allow a single neural network to explicitly model a distribution over output distributions, is proposed in this work. The properties of EnD$^2$ are investigated on both an artificial dataset, and on the CIFAR-10, CIFAR-100 and TinyImageNet datasets, where it is shown that EnD$^2$ can approach the classification performance of an ensemble, and outperforms both standard DNNs and Ensemble Distillation on the tasks of misclassification and out-of-distribution input detection.
read more
Chat with Paper
AI Agents for this Paper
Find similar papers on Google Scholar, PubMed and Arxiv
Write a critical review of this paper
Analyze citations of this paper to find unaddressed research gaps
Citations
Machine learning
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
A Review of Uncertainty Quantification in Deep Learning: Techniques, Applications and Challenges
Moloud Abdar,Farhad Pourpanah,Sadiq Hussain,Dana Rezazadegan,Li Liu,Mohammad Ghavamzadeh,Paul Fieguth,Xiaochun Cao,Abbas Khosravi,U. Rajendra Acharya,U. Rajendra Acharya,U. Rajendra Acharya,Vladimir Makarenkov,Saeid Nahavandi +13 more
TL;DR: This study reviews recent advances in UQ methods used in deep learning and investigates the application of these methods in reinforcement learning (RL), and outlines a few important applications of UZ methods.
1.6K
Knowledge Distillation and Student-Teacher Learning for Visual Intelligence: A Review and New Outlooks.
Lin Wang,Kuk-Jin Yoon +1 more
TL;DR: This paper provides a comprehensive survey on the recent progress of KD methods together with S-T frameworks typically used for vision tasks and systematically analyzes the research status of KD in vision applications.
522
Generalized ODIN: Detecting Out-of-Distribution Image Without Learning From Out-of-Distribution Data
Yen-Chang Hsu,Yilin Shen,Hongxia Jin,Zsolt Kira +3 more
- 14 Jun 2020
TL;DR: The authors decompose confidence scoring as well as a modified input pre-processing method, and show that both of these significantly help in detection performance, and provide an analysis of when ODIN-like strategies do or do not work.
•Posted Content
Generalized ODIN: Detecting Out-of-distribution Image without Learning from Out-of-distribution Data
TL;DR: This work bases its work on a popular method ODIN, proposing two strategies for freeing it from the needs of tuning with OoD data, while improving its OoD detection performance, and proposing to decompose confidence scoring as well as a modified input pre-processing method.
363
References
Deep Residual Learning for Image Recognition
Kaiming He,Xiangyu Zhang,Shaoqing Ren,Jian Sun +3 more
- 27 Jun 2016
TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.
•Proceedings Article
Adam: A Method for Stochastic Optimization
Diederik P. Kingma,Jimmy Ba +1 more
- 01 Jan 2015
TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
138.5K
•Proceedings Article
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan,Andrew Zisserman +1 more
- 01 Jan 2015
TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.
51.9K
•Proceedings Article
Efficient Estimation of Word Representations in Vector Space
Tomas Mikolov,Kai Chen,Greg S. Corrado,Jeffrey Dean +3 more
- 16 Jan 2013
TL;DR: Two novel model architectures for computing continuous vector representations of words from very large data sets are proposed and it is shown that these vectors provide state-of-the-art performance on the authors' test set for measuring syntactic and semantic word similarities.
27.5K
Fast R-CNN
Ross Girshick
- 07 Dec 2015
TL;DR: Fast R-CNN as discussed by the authors proposes a Fast Region-based Convolutional Network method for object detection, which employs several innovations to improve training and testing speed while also increasing detection accuracy and achieves a higher mAP on PASCAL VOC 2012.