TL;DR: In this paper, an end-to-end trainable model for image compression based on variational autoencoders is proposed, which incorporates a hyperprior to effectively capture spatial dependencies in the latent representation.
Abstract: We describe an end-to-end trainable model for image compression based on variational autoencoders. The model incorporates a hyperprior to effectively capture spatial dependencies in the latent representation. This hyperprior relates to side information, a concept universal to virtually all modern image codecs, but largely unexplored in image compression using artificial neural networks (ANNs). Unlike existing autoencoder compression methods, our model trains a complex prior jointly with the underlying autoencoder. We demonstrate that this model leads to state-of-the-art image compression when measuring visual quality using the popular MS-SSIM index, and yields rate-distortion performance surpassing published ANN-based methods when evaluated using a more traditional metric based on squared error (PSNR). Furthermore, we provide a qualitative comparison of models trained for different distortion metrics.
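To make the role of side information concrete, the following is a minimal sketch, not the authors' implementation. It assumes each quantized latent is modelled as a zero-mean Gaussian whose scale is supplied as side information, and it compares the estimated bit cost under one global scale with the cost under spatially varying scales. The latent values, the scales, and all function names are hypothetical, and the bits needed to transmit the side information itself are deliberately left out.

```python
import math

def gaussian_cdf(x, sigma):
    """CDF of a zero-mean Gaussian with standard deviation sigma."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

def bits_for_symbol(y, sigma):
    """Estimated bits for an integer symbol y: -log2 of the Gaussian
    probability mass on the unit-width bin centred at y."""
    p = gaussian_cdf(y + 0.5, sigma) - gaussian_cdf(y - 0.5, sigma)
    return -math.log2(max(p, 1e-12))  # clamp to avoid log(0)

def total_bits(latents, sigmas):
    """Sum of per-symbol bit estimates for paired latents and scales."""
    return sum(bits_for_symbol(y, s) for y, s in zip(latents, sigmas))

# Hypothetical quantized latents: a flat region followed by a busy region.
latents = [0, 0, 1, 0, 12, -9, 15, -11]

# One scale for every position (no side information).
global_sigmas = [8.5] * len(latents)

# Position-dependent scales, as side information might provide (assumed values).
local_sigmas = [0.5] * 4 + [12.0] * 4

bits_global = total_bits(latents, global_sigmas)
bits_local = total_bits(latents, local_sigmas)
print(f"global scale: {bits_global:.1f} bits, local scales: {bits_local:.1f} bits")
```

In this toy setting the position-dependent scales give a lower estimate because small scales make near-zero symbols cheap while large scales keep big symbols affordable. A real codec also has to pay for the side information.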
TL;DR: In this paper, a hierarchical extension of the model class is proposed, using a multivariate normal distribution of person-level parameters with the mean and covariance matrix to be estimated from the data.
Abstract: Multinomial processing tree models are widely used in many areas of psychology. A hierarchical extension of the model class is proposed, using a multivariate normal distribution of person-level parameters with the mean and covariance matrix to be estimated from the data. The hierarchical model allows one to take variability between persons into account and to assess parameter correlations. The model is estimated using Bayesian methods with weakly informative hyperprior distribution and a Gibbs sampler based on two steps of data augmentation. Estimation, model checks, and hypotheses tests are discussed. The new method is illustrated using a real data set, and its performance is evaluated in a simulation study.
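The person-level structure can be sketched as follows. This is an illustrative assumption, not the paper's estimation procedure: it draws two correlated person-level parameters from a bivariate normal distribution, maps them to probabilities with an inverse probit link (an assumed choice), and plugs them into a simple two-parameter processing tree in which a hit occurs through detection (r) or, failing that, through guessing (g). The tree, the parameter values, and the function names are hypothetical, and the Gibbs sampler is not shown.

```python
import math
import random

def probit_inverse(z):
    """Standard normal CDF, mapping a real value to a probability."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def draw_person(mu, sd, rho, rng):
    """Draw one person's (r, g) from a bivariate normal on the latent scale.

    mu and sd are length-2 sequences; rho is the latent correlation.
    The correlated pair is built from two independent standard normals.
    """
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    x1 = mu[0] + sd[0] * z1
    x2 = mu[1] + sd[1] * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    return probit_inverse(x1), probit_inverse(x2)

def hit_probability(r, g):
    """Tree with two branches leading to a hit: detect, or fail and guess."""
    return r + (1.0 - r) * g

rng = random.Random(1)
# Hypothetical group-level mean, standard deviations, and correlation.
persons = [draw_person((0.5, -0.2), (0.5, 0.4), 0.3, rng) for _ in range(5)]
hits = [hit_probability(r, g) for r, g in persons]
for (r, g), h in zip(persons, hits):
    print(f"r={r:.2f} g={g:.2f} P(hit)={h:.2f}")
```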
TL;DR: Taking a Bayesian point of view, this article shows how to construct priors on decision tree ensembles that adapt to sparsity in the predictors by placing a sparsity-inducing Dirichlet hyperprior on the splitting proportions of the regression tree prior.
Abstract: Decision tree ensembles are an extremely popular tool for obtaining high-quality predictions in nonparametric regression problems. Unmodified, however, many commonly used decision tree ensemble methods do not adapt to sparsity in the regime in which the number of predictors is larger than the number of observations. A recent stream of research concerns the construction of decision tree ensembles that are motivated by a generative probabilistic model, the most influential method being the Bayesian additive regression trees (BART) framework. In this article, we take a Bayesian point of view on this problem and show how to construct priors on decision tree ensembles that are capable of adapting to sparsity in the predictors by placing a sparsity-inducing Dirichlet hyperprior on the splitting proportions of the regression tree prior. We characterize the asymptotic distribution of the number of predictors included in the model and show how this prior can be easily incorporated into existing Markov chai...
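As a rough illustration of why a Dirichlet prior on splitting proportions can induce sparsity, the sketch below draws proportions over P predictors from two symmetric Dirichlet distributions and counts how many predictors receive a non-negligible share. The concentration values (1 per predictor versus 1/P per predictor), the 1% threshold, and the function names are assumptions chosen for illustration; they are not taken from the article.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one vector from Dirichlet(alpha) by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def effective_predictors(proportions, threshold=0.01):
    """Count predictors whose splitting proportion exceeds the threshold."""
    return sum(1 for p in proportions if p > threshold)

rng = random.Random(0)
P = 100  # hypothetical number of predictors

# Concentration 1 per predictor: mass is spread over many predictors.
dense = sample_dirichlet([1.0] * P, rng)

# Concentration 1/P per predictor: mass piles up on a few predictors.
sparse = sample_dirichlet([1.0 / P] * P, rng)

print("predictors above 1% (dense): ", effective_predictors(dense))
print("predictors above 1% (sparse):", effective_predictors(sparse))
```

With small concentration parameters most proportions are close to zero, so splits would be drawn from only a handful of predictors.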
TL;DR: It is suggested that hyperpriors for the precision parameters be selected according to the type of IGMRF used; hyperpriors for different types of IGMRFs can be chosen to give the same degree of smoothness, a priori, by mapping the random precision to the marginal standard deviation of the IGMRF and recalculating the hyperpriors used for different models.
Abstract: In Bayesian hierarchical regression models, intrinsic Gaussian Markov random fields (IGMRFs) are commonly applied to model underlying spatial or temporal dependency structures. IGMRFs have a scaled precision matrix that reflects the neighbourhood structure of the model, while the scaling is represented as a random precision parameter. The hyperprior chosen for the precision parameter influences the degree of smoothness of the resulting field and this can have a strong effect on posterior results. We suggest that hyperpriors for the precision parameters should be selected according to the type of IGMRF used. Also, hyperpriors for different types of IGMRFs can be selected to give the same degree of smoothness, a priori. This is achieved by mapping the random precision to the marginal standard deviation of the IGMRF and recalculating the hyperpriors used for different models. Also, the parameters of the hyperprior can be interpreted in terms of the marginal standard deviation. The given ideas are demonstrated by analysing two different types of spatial data in R-INLA, including a district-level analysis of survival data and the analysis of a spatial point pattern discretized to a grid.
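The mapping from precision to marginal standard deviation can be sketched for one simple case. The example below assumes a first-order random walk on n nodes under a sum-to-zero constraint and summarizes the marginal standard deviations by their geometric mean; both choices, and all function names, are assumptions made for illustration rather than details stated in the abstract. It shows that the same precision implies different marginal standard deviations for different model sizes, and how a precision could be recalculated to hit a target value.

```python
import math

def rw1_marginal_variances(n, tau=1.0):
    """Marginal variances of a first-order random walk on n nodes with
    precision tau, under a sum-to-zero constraint.

    Node i is written as a centred cumulative sum of n-1 independent
    increments, each with variance 1/tau; the variance is the sum of
    squared coefficients divided by tau.
    """
    variances = []
    for i in range(1, n + 1):
        total = 0.0
        for j in range(1, n):
            coeff = (1.0 if j < i else 0.0) - (n - j) / n
            total += coeff * coeff
        variances.append(total / tau)
    return variances

def reference_sd(n, tau=1.0):
    """Geometric mean of the marginal standard deviations."""
    variances = rw1_marginal_variances(n, tau)
    return math.exp(sum(0.5 * math.log(v) for v in variances) / n)

def precision_for_target_sd(n, target_sd):
    """Precision that gives the requested reference standard deviation."""
    return (reference_sd(n) / target_sd) ** 2

for n in (10, 100):
    print(f"n={n}: reference sd at tau=1 is {reference_sd(n):.3f}, "
          f"tau for sd=1 is {precision_for_target_sd(n, 1.0):.3f}")
```

Because the reference standard deviation at a fixed precision grows with n in this example, a hyperprior tuned on the precision scale for one model does not express the same prior smoothness in another.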
TL;DR: In this paper, the authors propose to meta-learn the ensemble of epoch-wise empirical Bayes models (E3BM) to achieve robust predictions, where each training epoch has a Bayes model whose parameters are specifically learned and deployed.
Abstract: Few-shot learning aims to train efficient predictive models with a few examples. The lack of training data leads to poor models that make high-variance or low-confidence predictions. In this paper, we propose to meta-learn the ensemble of epoch-wise empirical Bayes models (E\(^3\)BM) to achieve robust predictions. “Epoch-wise” means that each training epoch has a Bayes model whose parameters are specifically learned and deployed. “Empirical” means that the hyperparameters, e.g., used for learning and ensembling the epoch-wise models, are generated by hyperprior learners conditional on task-specific data. We introduce four kinds of hyperprior learners by considering inductive vs. transductive, and epoch-dependent vs. epoch-independent, in the paradigm of meta-learning. We conduct extensive experiments for five-class few-shot tasks on three challenging benchmarks: miniImageNet, tieredImageNet, and FC100, and achieve top performance using the epoch-dependent transductive hyperprior learner, which captures the richest information. Our ablation study shows that both “epoch-wise ensemble” and “empirical” encourage high efficiency and robustness in the model performance. Our code is open-sourced at https://gitlab.mpi-klsb.mpg.de/yaoyaoliu/e3bm.
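The ensembling step alone can be illustrated with a small sketch. It assumes the epoch-wise models each output class probabilities for a five-class task and that the ensemble is a weighted average of those outputs. In the paper the ensembling hyperparameters are generated by hyperprior learners; here the weights are fixed, hypothetical numbers, and the function names are illustrative only.

```python
def normalize(weights):
    """Rescale non-negative weights so they sum to one."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]

def ensemble_predictions(epoch_probs, epoch_weights):
    """Weighted average of per-epoch class-probability vectors."""
    weights = normalize(epoch_weights)
    combined = [0.0] * len(epoch_probs[0])
    for probs, w in zip(epoch_probs, weights):
        for c, p in enumerate(probs):
            combined[c] += w * p
    return combined

# Hypothetical outputs of three epoch-wise models on one query example.
epoch_probs = [
    [0.40, 0.30, 0.10, 0.10, 0.10],
    [0.55, 0.20, 0.10, 0.10, 0.05],
    [0.70, 0.10, 0.10, 0.05, 0.05],
]
# Hypothetical ensembling weights (fixed here, not learned).
epoch_weights = [0.2, 0.3, 0.5]

combined = ensemble_predictions(epoch_probs, epoch_weights)
predicted_class = combined.index(max(combined))
print(combined, predicted_class)
```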