Journal Article, DOI: 10.48550/arXiv.2209.04722
Batch Bayesian Optimization via Particle Gradient Flows
TL;DR: This work reformulates batch BO as an optimisation problem over the space of probability measures and constructs a new acquisition function, based on multipoint expected improvement, that is convex over the space of probability measures.
Abstract: Bayesian Optimisation (BO) methods seek to find global optima of objective functions which are only available as a black-box or are expensive to evaluate. Such methods construct a surrogate model for the objective function, quantifying the uncertainty in that surrogate through Bayesian inference. Objective evaluations are sequentially determined by maximising an acquisition function at each step. However, this ancillary optimisation problem can be highly non-trivial to solve, due to the non-convexity of the acquisition function, particularly in the case of batch Bayesian optimisation, where multiple points are selected in every step. In this work we reformulate batch BO as an optimisation problem over the space of probability measures. We construct a new acquisition function based on multipoint expected improvement which is convex over the space of probability measures. Practical schemes for solving this ‘inner’ optimisation problem arise naturally as gradient flows of this objective function. We demonstrate the efficacy of this new method on different benchmark functions and compare with state-of-the-art batch BO methods.
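The multipoint expected improvement that the abstract builds on can be illustrated with a small Monte Carlo sketch. This is not the paper's convex acquisition function or its gradient-flow solver; it only estimates the standard batch quantity E[max(0, best - min_i Y_i)] under a minimisation convention, which is an assumption here. The function name `multipoint_ei` and its parameters are hypothetical, and the surrogate posterior over the batch is assumed to be Gaussian with a given mean vector and covariance matrix.

```python
import numpy as np

def multipoint_ei(mean, cov, best, n_samples=20000, seed=0):
    """Monte Carlo estimate of E[max(0, best - min_i Y_i)] for Y ~ N(mean, cov).

    Assumes minimisation: `best` is the lowest objective value observed so far.
    """
    rng = np.random.default_rng(seed)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    q = len(mean)
    # A small jitter keeps the Cholesky factorisation numerically stable.
    chol = np.linalg.cholesky(cov + 1e-10 * np.eye(q))
    z = rng.standard_normal((n_samples, q))
    # Correlated posterior samples at the q batch points.
    samples = mean + z @ chol.T
    # Improvement of the best point in the batch over the incumbent.
    improvement = np.maximum(0.0, best - samples.min(axis=1))
    return improvement.mean()

# One candidate versus a batch of two independent candidates.
ei_single = multipoint_ei([0.0], [[1.0]], best=0.0)
ei_batch = multipoint_ei([0.0, 0.0], np.eye(2), best=0.0)
```

For a single standard-normal candidate with `best = 0`, the estimate should be close to the closed-form expected improvement 1/sqrt(2π) ≈ 0.399, and adding a second independent candidate increases the value, since the batch minimum can only be lower.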
Citations
Gaussian Process regression over discrete probability measures: on the non-stationarity relation between Euclidean and Wasserstein Squared Exponential Kernels
Wasserstein enabled Bayesian optimization of composite functions
TL;DR: In this paper, the authors propose to map the original problem into a space of discrete probability distributions endowed with a Wasserstein metric, where the inputs of the Gaussian process are discrete probability distributions.
Bayesian optimization over the probability simplex
TL;DR: These results show that embedding the Bayesian optimization process in the probability simplex yields an effective algorithm whose advantage over standard Bayesian optimization grows as the problem dimensionality increases.
References
A comparison of three methods for selecting values of input variables in the analysis of output from a computer code
TL;DR: In this paper, two sampling plans are examined as alternatives to simple random sampling in Monte Carlo studies and they are shown to be improvements over simple sampling with respect to variance for a class of estimators which includes the sample mean and the empirical distribution function.
Gaussian processes in machine learning
TL;DR: In this paper, the authors give a basic introduction to Gaussian Process regression models and present the simple equations for incorporating training data and examine how to learn the hyperparameters using the marginal likelihood.
Finite-time Analysis of the Multiarmed Bandit Problem
TL;DR: This work shows that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support.
Efficient Global Optimization of Expensive Black-Box Functions
TL;DR: This paper introduces the reader to a response surface methodology that is especially good at modeling the nonlinear, multimodal functions that often occur in engineering and shows how these approximating functions can be used to construct an efficient global optimization algorithm with a credible stopping rule.
Proceedings Article
Random Features for Large-Scale Kernel Machines
Ali Rahimi, Benjamin Recht - 03 Dec 2007
TL;DR: Two sets of random features are explored, convergence bounds are provided on their ability to approximate various radial basis kernels, and it is shown that in large-scale classification and regression tasks linear machine learning algorithms applied to these features outperform state-of-the-art large-scale kernel machines.
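As a rough illustration of the random-features idea summarised above, the sketch below approximates a squared-exponential (RBF) kernel with random Fourier features. It is a minimal example under stated assumptions, not code from the cited paper: the names `rff_features` and `rbf_kernel`, the lengthscale parameterisation, and the feature count are all hypothetical choices.

```python
import numpy as np

def rff_features(X, n_features=5000, lengthscale=1.0, seed=0):
    """Map X (n, d) to random Fourier features whose inner products
    approximate the RBF kernel exp(-||x - y||^2 / (2 * lengthscale^2))."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    d = X.shape[1]
    # Frequencies drawn from the kernel's spectral density (a Gaussian).
    W = rng.standard_normal((d, n_features)) / lengthscale
    # Random phases, uniform on [0, 2*pi).
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

def rbf_kernel(X, lengthscale=1.0):
    """Exact RBF kernel matrix, used only for comparison."""
    X = np.asarray(X, dtype=float)
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * lengthscale ** 2))

X = np.random.default_rng(1).standard_normal((5, 3))
Z = rff_features(X)
max_error = np.abs(Z @ Z.T - rbf_kernel(X)).max()
```

The product `Z @ Z.T` is a linear-kernel Gram matrix on the feature map, so a linear model trained on `Z` behaves approximately like a kernel machine with the RBF kernel; the approximation error shrinks as `n_features` grows.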