Ensemble Distribution Distillation

Open Access • Posted Content

- 30 Apr 2019

208

TL;DR: This work proposes distilling the distribution of an ensemble's predictions, rather than just the average prediction, into a single Prior Network, which retains information about ensemble diversity that is useful for uncertainty estimation.

Abstract: Ensembles of models often yield improvements in system performance. Ensemble approaches have also been shown empirically to yield robust measures of uncertainty and can distinguish between different forms of uncertainty. However, ensembles come at a computational and memory cost that may be prohibitive for many applications. Significant work has been done on distilling an ensemble into a single model. Such approaches decrease computational cost and allow a single model to achieve accuracy comparable to that of an ensemble. However, information about the diversity of the ensemble, which can yield estimates of different forms of uncertainty, is lost. This work considers the novel task of Ensemble Distribution Distillation (EnD$^2$): distilling the distribution of the predictions from an ensemble, rather than just the average prediction, into a single model. EnD$^2$ enables a single model to retain both the improved classification performance of ensemble distillation and information about the diversity of the ensemble, which is useful for uncertainty estimation. This work proposes a solution for EnD$^2$ based on Prior Networks, a class of models that allow a single neural network to explicitly model a distribution over output distributions. The properties of EnD$^2$ are investigated on an artificial dataset and on the CIFAR-10, CIFAR-100 and TinyImageNet datasets, where it is shown that EnD$^2$ can approach the classification performance of an ensemble and outperforms both standard DNNs and Ensemble Distillation on the tasks of misclassification and out-of-distribution input detection.
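The abstract's two central ideas can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that each ensemble member outputs a categorical distribution over classes, that the "forms" of uncertainty are separated by the common entropy-based decomposition (total = expected data uncertainty + knowledge uncertainty), and that the Prior Network parameterizes a Dirichlet distribution whose likelihood of the members' predictions serves as the distillation objective. The function names `entropy`, `ensemble_uncertainties` and `dirichlet_nll` are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a categorical distribution."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def ensemble_uncertainties(member_probs):
    """Decompose the uncertainty of an ensemble for a single input.

    member_probs: list of M categorical distributions, one per member.
    Returns (total, expected_data, knowledge), where
      total         = entropy of the averaged prediction,
      expected_data = average entropy of the individual predictions,
      knowledge     = total - expected_data (a mutual information,
                      large when the members disagree).
    """
    m = len(member_probs)
    k = len(member_probs[0])
    mean = [sum(p[c] for p in member_probs) / m for c in range(k)]
    total = entropy(mean)
    expected_data = sum(entropy(p) for p in member_probs) / m
    return total, expected_data, total - expected_data


def dirichlet_nll(alphas, member_probs, eps=1e-8):
    """Average negative log-likelihood of the members' predictions
    under Dirichlet(alphas).

    Assumption: a Prior Network-style student would emit `alphas`
    for an input and be trained to make this value small.
    """
    log_norm = math.lgamma(sum(alphas)) - sum(math.lgamma(a) for a in alphas)
    nll = 0.0
    for p in member_probs:
        log_pdf = log_norm + sum(
            (a - 1.0) * math.log(max(q, eps)) for a, q in zip(alphas, p)
        )
        nll -= log_pdf
    return nll / len(member_probs)


# Illustrative (made-up) predictions from a three-member ensemble.
agree = [[0.9, 0.05, 0.05]] * 3
disagree = [[0.98, 0.01, 0.01], [0.01, 0.98, 0.01], [0.01, 0.01, 0.98]]

print("agreeing members:   ", ensemble_uncertainties(agree))
print("disagreeing members:", ensemble_uncertainties(disagree))
```

Averaging the predictions, as plain ensemble distillation does, keeps only the first returned quantity. Fitting a distribution over the members' predictions is what lets the third quantity, the disagreement between members, survive in a single model.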

Citations

Journal Article•10.1145/242224.242229

Machine learning

Thomas G. Dietterich

- 01 Dec 1996

- ACM Computing Surveys

TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.

14K

•Journal Article•10.1016/J.INFFUS.2021.05.008

A Review of Uncertainty Quantification in Deep Learning: Techniques, Applications and Challenges

Moloud Abdar, +13 more

- 12 Nov 2020

- arXiv: Learning

TL;DR: This study reviews recent advances in uncertainty quantification (UQ) methods used in deep learning, investigates their application in reinforcement learning (RL), and outlines a few important applications of UQ methods.

1.6K

•Journal Article•10.1109/TPAMI.2021.3055564

Knowledge Distillation and Student-Teacher Learning for Visual Intelligence: A Review and New Outlooks.

Lin Wang, +1 more

- 29 Jan 2021

- IEEE Transactions on Pattern Analysis an...

TL;DR: This paper provides a comprehensive survey on the recent progress of KD methods together with S-T frameworks typically used for vision tasks and systematically analyzes the research status of KD in vision applications.

522

•Proceedings Article•10.1109/CVPR42600.2020.01096

Generalized ODIN: Detecting Out-of-Distribution Image Without Learning From Out-of-Distribution Data

Yen-Chang Hsu, +3 more

- 14 Jun 2020

TL;DR: The authors propose decomposed confidence scoring and a modified input pre-processing method, show that both significantly improve detection performance, and analyze when ODIN-like strategies do or do not work.

402

•Posted Content

Generalized ODIN: Detecting Out-of-distribution Image without Learning from Out-of-distribution Data

Yen-Chang Hsu, +3 more

- 26 Feb 2020

- arXiv: Computer Vision and Pattern Recog...

TL;DR: This work builds on the popular ODIN method, proposing two strategies that free it from the need to tune with OoD data while improving its OoD detection performance: decomposed confidence scoring and a modified input pre-processing method.

363
