Projection predictive model selection for Gaussian processes

doi:10.1109/MLSP.2016.7738829

Open Access Proceedings Article

Juho Piironen and one co-author

- 01 Sep 2016

- pp 1-6

Cited by 40

TL;DR: This article proposes a method for simplifying Gaussian process models by projecting the information contained in the full encompassing model onto submodels and selecting a reduced set of variables based on their predictive relevance. Such simplification can improve the explainability of the models, reduce future measurement costs, and reduce the computation time needed to make new predictions.
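To make the projection idea concrete, here is a minimal Python sketch under simplifying assumptions that are not taken from the paper: a Gaussian observation model with a fixed reference noise variance `sigma2_ref`, a reference predictive mean `mu_ref` at the training inputs (in the paper's setting this would come from the full GP), and linear submodels standing in for GP submodels. Under these assumptions, projecting the reference predictive distribution onto a submodel by minimizing the mean Kullback-Leibler (KL) divergence reduces to least squares on `mu_ref` rather than on the observed targets, and a greedy forward search adds one feature at a time. The names `project` and `forward_search` are hypothetical, and the paper's own search and projection details may differ.

```python
import numpy as np


def project(X, mu_ref, sigma2_ref, subset):
    """Project the Gaussian reference predictive N(mu_ref_i, sigma2_ref) onto a
    linear submodel using only the columns in `subset` (plus an intercept).

    Returns the projected weights, the projected noise variance and the mean
    KL divergence from the reference to the submodel over the training points.
    """
    n = X.shape[0]
    Xs = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
    # Minimizing the mean KL over the weights is least squares on mu_ref, not on y.
    w, *_ = np.linalg.lstsq(Xs, mu_ref, rcond=None)
    mse = np.mean((mu_ref - Xs @ w) ** 2)
    # The optimal submodel noise variance absorbs the unexplained reference signal;
    # plugging it in, the mean KL simplifies to 0.5 * log(sigma2_sub / sigma2_ref).
    sigma2_sub = sigma2_ref + mse
    kl = 0.5 * np.log(sigma2_sub / sigma2_ref)
    return w, sigma2_sub, kl


def forward_search(X, mu_ref, sigma2_ref, max_size=None):
    """Greedily add the feature whose projection is closest (in KL) to the reference."""
    p = X.shape[1]
    max_size = p if max_size is None else max_size
    selected, remaining, path = [], list(range(p)), []
    while remaining and len(selected) < max_size:
        kls = {j: project(X, mu_ref, sigma2_ref, selected + [j])[2] for j in remaining}
        best = min(kls, key=kls.get)
        selected.append(best)
        remaining.remove(best)
        path.append((list(selected), kls[best]))
    return path


rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
mu_ref = 2.0 * X[:, 0] - X[:, 2]  # stand-in for a reference model's predictive mean
for subset, kl in forward_search(X, mu_ref, sigma2_ref=0.5, max_size=3):
    print(subset, round(float(kl), 4))
```

The KL values shrink towards zero as the submodel recovers the reference predictions; one would then typically pick the smallest submodel whose predictive performance is close to that of the reference model.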

Figures

Figure 2.2. Illustration of the priors imposed on the model complexity for different choices of the sparsity hyperparameters when p = 50. The left graph shows histograms of prior draws of m_eff (Eq. (2.13)) for the horseshoe prior with p
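The quantity m_eff can be simulated directly. The sketch below assumes m_eff = Σ_j (1 − κ_j) with shrinkage factors κ_j = 1 / (1 + n σ⁻² τ² λ_j²), that is, unit-variance features, fixed σ and τ, and local scales λ_j drawn from a half-Cauchy(0, 1) distribution as in the horseshoe; this should be checked against Eq. (2.13) before relying on it, and the helper names are hypothetical. Under these assumptions the prior mean has the closed form p·a / (1 + a) with a = √n σ⁻¹τ, which serves as a sanity check for the simulation.

```python
import math
import random


def shrinkage_factor(n, sigma, tau, lam):
    """kappa = 1 / (1 + n sigma^-2 tau^2 lambda^2), assuming a unit-variance feature."""
    return 1.0 / (1.0 + n * (tau * lam / sigma) ** 2)


def horseshoe_meff_draws(p, n, sigma, tau, num_draws=4000, seed=0):
    """Prior draws of m_eff = sum_j (1 - kappa_j) with lambda_j ~ half-Cauchy(0, 1)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(num_draws):
        m_eff = 0.0
        for _ in range(p):
            # Absolute value of a standard Cauchy draw gives a half-Cauchy(0, 1) draw.
            lam = abs(math.tan(math.pi * (rng.random() - 0.5)))
            m_eff += 1.0 - shrinkage_factor(n, sigma, tau, lam)
        draws.append(m_eff)
    return draws


def horseshoe_meff_prior_mean(p, n, sigma, tau):
    """Closed-form E[m_eff | tau, sigma] = p * a / (1 + a) with a = sqrt(n) * tau / sigma."""
    a = math.sqrt(n) * tau / sigma
    return p * a / (1.0 + a)


draws = horseshoe_meff_draws(p=50, n=100, sigma=1.0, tau=0.1)
print("Monte Carlo mean:", sum(draws) / len(draws))
print("closed form:     ", horseshoe_meff_prior_mean(50, 100, 1.0, 0.1))
```

A histogram of `draws` for several values of τ gives the kind of picture the caption describes.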

Figure 3.2. Illustration of projective selection. The training data have n = 100 observations and 1000 features, of which 100 are relevant but correlated with each other and therefore carry similar information (the rest are completely irrelevant). The left plot shows the mean log predictive density (MLPD) and the right plot the predictive mean squared error (MSE) as a function of the number of selected features, both evaluated on an independent test set of 1000 observations (vertical bars denote one standard error). The reference model (dashed horizontal line) is obtained from Bayesian linear regression using the first 5 principal components. The projection (black) is the single-point projection with L1-search (Eq. (3.6)), but the predictions are computed without any penalization. Results for the Lasso (gray) are shown for comparison.
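The L1-search in this caption can be read as fitting a Lasso path to the reference model's fitted values instead of to the observed targets, and ranking features by the order in which they enter the path. The sketch below implements that reading with plain cyclic coordinate descent in NumPy; the standardization, the λ grid and the names `l1_search_order` and `mu_ref` are assumptions for illustration, not the exact procedure of Eq. (3.6).

```python
import numpy as np


def soft_threshold(z, gamma):
    return np.sign(z) * max(abs(z) - gamma, 0.0)


def l1_search_order(X, mu_ref, num_lambdas=50, n_sweeps=50, min_ratio=1e-3):
    """Order features by their entry into the Lasso path fitted to mu_ref.

    Objective at each lambda: (1 / (2n)) * ||t - Z w||^2 + lambda * ||w||_1,
    where Z are standardized features and t is the centered reference mean.
    """
    n, p = X.shape
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    t = mu_ref - mu_ref.mean()
    col_sq = (Z ** 2).sum(axis=0) / n  # equals 1 after standardization
    lam_max = np.max(np.abs(Z.T @ t)) / n  # smallest lambda with all weights zero
    w = np.zeros(p)
    r = t.copy()  # running residual t - Z @ w
    order = []
    for lam in lam_max * np.logspace(0, np.log10(min_ratio), num_lambdas):
        for _ in range(n_sweeps):  # cyclic coordinate descent, warm-started
            for j in range(p):
                old = w[j]
                rho = Z[:, j] @ r / n + col_sq[j] * old
                w[j] = soft_threshold(rho, lam) / col_sq[j]
                if w[j] != old:
                    r -= Z[:, j] * (w[j] - old)
        # Record features that became active at this lambda, largest first.
        new = [j for j in np.argsort(-np.abs(w)) if w[j] != 0 and j not in order]
        order.extend(int(j) for j in new)
    return order


rng = np.random.default_rng(1)
X = rng.standard_normal((200, 6))
mu_ref = 3.0 * X[:, 0] + X[:, 3]  # stand-in for the reference model's fitted values
print(l1_search_order(X, mu_ref))
```

Consistent with the caption, the penalty only decides the ordering; the submodels along that ordering would then be fitted by an unpenalized projection (for example, least squares on `mu_ref`).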

Table 2.1. Example prior distributions for the regression coefficients β_j that can be expressed as scale mixtures of Gaussians. The middle column gives the conditional prior for β_j given the hyperparameters, and the last column gives the hyperprior. All hyperparameters for which no prior is specified (τ, ν, π and c) are assumed to be given, although in practice they can be given hyperpriors as well. The symbol c is deliberately used in both the regularized horseshoe and the spike-and-slab, as it serves the same purpose in both cases. For the inverse-gamma distribution, parameters a and b denote the shape and scale, respectively; for the exponential distribution, b likewise denotes the scale.

Figure 2.1. Prior densities imposed on the shrinkage factor (2.11) for different prior choices p(β_j) (see Table 2.1). For the Gaussian and spike-and-slab priors, the prior places mass only at a few discrete values, depicted by the thick vertical bars. For all priors except spike-and-slab, black denotes the density when √n σ⁻¹τ = 1 and grey when √n σ⁻¹τ = 0.3.
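For the Gaussian and spike-and-slab priors the shrinkage factor takes only discrete values, which the following Python sketch computes. It assumes the form κ_j = 1 / (1 + n σ⁻² v_j) for a unit-variance feature, where v_j is the conditional prior variance of β_j: τ² for the Gaussian prior, and c² under the slab or 0 under the spike for a spike-and-slab with inclusion probability π. These conventions and the function names are assumptions to be checked against Eq. (2.11) and Table 2.1.

```python
def kappa(n, sigma, prior_var):
    """Shrinkage factor 1 / (1 + n sigma^-2 v) for a unit-variance feature with prior variance v."""
    return 1.0 / (1.0 + n * prior_var / sigma ** 2)


def gaussian_kappa_atoms(n, sigma, tau):
    """A Gaussian prior N(0, tau^2) puts all prior mass on a single kappa value."""
    return {kappa(n, sigma, tau ** 2): 1.0}


def spike_and_slab_kappa_atoms(n, sigma, c, pi):
    """The spike (variance 0) gives kappa = 1; the slab N(0, c^2) gives one other value."""
    return {kappa(n, sigma, 0.0): 1.0 - pi, kappa(n, sigma, c ** 2): pi}


# sqrt(n) / sigma * tau = 1 here, so the Gaussian prior's single atom sits at kappa = 0.5.
print(gaussian_kappa_atoms(n=100, sigma=1.0, tau=0.1))
print(spike_and_slab_kappa_atoms(n=100, sigma=1.0, c=1.0, pi=0.2))
```

Each dictionary maps a κ value to its prior probability, which is what the thick vertical bars in the figure represent.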

Figure 3.1. Illustration of a typical difficulty encountered with correlated features. The model is the simple linear regression (2.1) without intercept, with the noise variance σ² assumed known. Shown are the likelihood, prior (horseshoe with τ = 1) and posterior densities for the regression coefficients β1 and β2 for a random data realization with n = 50 observations when the features x1 and x2 have a correlation of ρ = 0.8 (see the text for more details). The likelihood of both coefficients being zero is small, but the data provide little evidence as to whether both or only one of them is nonzero. A sparsifying prior such as the horseshoe results in a multimodal posterior but does not help in solving the feature selection problem.
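Why the data cannot separate β1 and β2 can be seen from the likelihood alone: with σ² known, the likelihood of (β1, β2) is proportional to a Gaussian with covariance σ²(XᵀX)⁻¹, and for two features with correlation ρ its correlation is close to −ρ. The NumPy sketch below checks this on simulated data mirroring the caption's setting (n = 50, ρ = 0.8); the data-generating scheme and the function name are assumptions for illustration, not the paper's experiment.

```python
import numpy as np


def coefficient_likelihood_shape(n=50, rho=0.8, sigma=1.0, seed=0):
    """Return the marginal standard deviations and the correlation of (beta1, beta2)
    under the Gaussian likelihood of a no-intercept linear regression with known sigma."""
    rng = np.random.default_rng(seed)
    x1 = rng.standard_normal(n)
    # Population correlation between x1 and x2 is rho.
    x2 = rho * x1 + np.sqrt(1.0 - rho ** 2) * rng.standard_normal(n)
    X = np.column_stack([x1, x2])
    cov = sigma ** 2 * np.linalg.inv(X.T @ X)  # likelihood covariance of beta
    sd = np.sqrt(np.diag(cov))
    return sd, cov[0, 1] / (sd[0] * sd[1])


sd, corr = coefficient_likelihood_shape()
print("sd(beta1), sd(beta2):", sd)
print("likelihood correlation of beta1 and beta2:", corr)
```

A strongly negative correlation means the likelihood is nearly flat along directions where an increase in β1 is offset by a decrease in β2, so the data alone say little about whether one or both coefficients are nonzero.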

Citations

Journal Article, doi:10.1214/17-BA1091

Using stacking to average Bayesian predictive distributions

Yuling Yao et al.

- 06 Apr 2017

- arXiv: Methodology

TL;DR: This work takes the idea of stacking from the point estimation literature and generalizes to the combination of predictive distributions, extending the utility function to any proper scoring rule, using Pareto smoothed importance sampling to efficiently compute the required leave-one-out posterior distributions and regularization to get more stability.

Cited by 159

Journal Article, doi:10.1214/20-EJS1711

Projective Inference in High-dimensional Problems: Prediction and Feature Selection

Juho Piironen et al.

- 04 Oct 2018

- arXiv: Machine Learning

TL;DR: This paper proposes a two-stage approach: first construct a possibly non-sparse model that predicts well, and then find a minimal subset of features that characterizes its predictions.

Cited by 67

Proceedings Article

Variable selection for Gaussian processes via sensitivity analysis of the posterior predictive distribution

Topi Paananen et al.

- 16 Apr 2019

TL;DR: This article proposes two variable selection methods for Gaussian process models that use the predictions of a full model in the vicinity of the training points to rank the variables by their predictive relevance.

Cited by 46

Journal Article, doi:10.1214/18-AOAS1222

Variable prioritization in nonlinear black box methods: a genetic association case study

Lorin Crawford et al.

- 01 Jun 2019

- The Annals of Applied Statistics

TL;DR: Methodologically, the "RelATive cEntrality" (RATE) measure is developed to prioritize candidate genetic variants that are not just marginally important, but whose associations also stem from significant covarying relationships with other variants in the data.

Cited by 37

Posted Content

Projection Predictive Inference for Generalized Linear and Additive Multilevel Models

Alejandro Catalina et al.

- 14 Oct 2020

- arXiv: Methodology

TL;DR: Simulated and real-world experiments demonstrate that the projection predictive inference method can drastically reduce the model complexity required to reach the reference predictive performance and achieve good frequency properties.

Cited by 29

References

Journal Article, doi:10.1111/J.2517-6161.1996.TB02080.X

Regression Shrinkage and Selection via the Lasso

Robert Tibshirani

- 01 Jan 1996

- Journal of the Royal Statistical Society...

TL;DR: A new method for estimation in linear models called the lasso, which minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant, is proposed.

Cited by 45.4K

Gaussian Processes For Machine Learning

Tanja Hueber

- 01 Jan 2016

Cited by 10K

Journal Article, doi:10.1214/AOS/1176347963

Multivariate Adaptive Regression Splines

Jerome H. Friedman

- 01 Mar 1991

- Annals of Statistics

TL;DR: In this article, a new method is presented for flexible regression modeling of high dimensional data, which takes the form of an expansion in product spline basis functions, where the number of basis functions as well as the parameters associated with each one (product degree and knot locations) are automatically determined by the data.

Cited by 7.9K

Book

Bayesian learning for neural networks

Geoffrey E. Hinton and one co-author

- 01 Jan 1995

TL;DR: Bayesian Learning for Neural Networks shows that Bayesian methods allow complex neural network models to be used without fear of the "overfitting" that can occur with traditional neural network learning methods.

Cited by 4.8K

Journal Article, doi:10.1080/01621459.1993.10476353

Variable selection via Gibbs sampling

Edward I. George and one co-author

- 01 Sep 1993

- Journal of the American Statistical Asso...

TL;DR: In this paper, the Gibbs sampler is used to indirectly sample from the multinomial posterior distribution on the set of possible subset choices to identify the promising subsets by their more frequent appearance in the Gibbs sample.

Cited by 3.1K

Related Papers (5)

Comparison of Bayesian predictive methods for model selection

Model selection for Gaussian processes utilizing sensitivity of posterior predictive distribution

Optimal predictive model selection

The horseshoe estimator for sparse signals

Regression Shrinkage and Selection via the Lasso