Distributed Adaptive Sampling for Kernel Matrix Approximation

Open Access • Posted Content

- 27 Mar 2018

21

TL;DR: SQUEAK is the first RLS sampling algorithm for kernel approximation that never constructs the whole kernel matrix; it makes a single pass over the dataset and runs in time linear in the number of samples $n$.

Abstract: Most kernel-based methods, such as kernel or Gaussian process regression, kernel PCA, ICA, or $k$-means clustering, do not scale to large datasets, because constructing and storing the kernel matrix $\mathbf{K}_n$ requires at least $\mathcal{O}(n^2)$ time and space for $n$ samples. Recent works show that sampling points with replacement according to their ridge leverage scores (RLS) generates small dictionaries of relevant points with strong spectral approximation guarantees for $\mathbf{K}_n$. The drawback of RLS-based methods is that computing exact RLS requires constructing and storing the whole kernel matrix. In this paper, we introduce SQUEAK, a new algorithm for kernel approximation based on RLS sampling that sequentially processes the dataset, storing a dictionary which creates accurate kernel matrix approximations with a number of points that only depends on the effective dimension $d_{eff}(\gamma)$ of the dataset. Moreover, since all the RLS estimations are efficiently performed using only the small dictionary, SQUEAK is the first RLS sampling algorithm that never constructs the whole matrix $\mathbf{K}_n$, runs in linear time $\widetilde{\mathcal{O}}(nd_{eff}(\gamma)^3)$ w.r.t. $n$, and requires only a single pass over the dataset. We also propose a parallel and distributed version of SQUEAK that linearly scales across multiple machines, achieving similar accuracy in as little as $\widetilde{\mathcal{O}}(\log(n)d_{eff}(\gamma)^3)$ time.
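
The quantities the abstract relies on can be illustrated with a minimal sketch. It shows the exact, quadratic-cost baseline that SQUEAK is designed to avoid, not SQUEAK itself. It assumes the standard definitions: the RLS of point $i$ is the $i$-th diagonal entry of $\mathbf{K}_n(\mathbf{K}_n + \gamma\mathbf{I})^{-1}$, and $d_{eff}(\gamma)$ is the sum of all RLS. The function names, the Gaussian kernel, and the parameter values are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def rbf_kernel(X, Y, sigma=1.0):
    """Gaussian kernel matrix between the rows of X and the rows of Y."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))

def exact_rls(K, gamma):
    """Exact ridge leverage scores: the diagonal of K (K + gamma I)^{-1}.

    This needs the full n x n kernel matrix, which is the O(n^2) cost
    the abstract describes as the drawback of exact RLS.
    """
    n = K.shape[0]
    # Solve (K + gamma I) Z = K rather than forming the inverse explicitly.
    Z = np.linalg.solve(K + gamma * np.eye(n), K)
    return np.diag(Z).copy()

def sample_dictionary(tau, m, rng):
    """Draw m indices with replacement, with probability proportional to RLS."""
    p = tau / tau.sum()
    return rng.choice(len(tau), size=m, replace=True, p=p)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))       # toy dataset (assumption)
K = rbf_kernel(X, X)
tau = exact_rls(K, gamma=1.0)
d_eff = tau.sum()                  # effective dimension d_eff(gamma)
idx = sample_dictionary(tau, m=20, rng=rng)
```

Each score lies strictly between 0 and 1, so $d_{eff}(\gamma)$ is always smaller than $n$. The abstract's claim is that the dictionary size can scale with this quantity rather than with $n$.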

Citations

•Posted Content

On Fast Leverage Score Sampling and Optimal Learning

Alessandro Rudi, +3 more

- 31 Oct 2018

- arXiv: Machine Learning

TL;DR: This article studies leverage score sampling for positive definite matrices defined by a kernel and proposes a leverage score sampling algorithm for kernel ridge regression. Performing leverage score sampling is, however, a challenge in its own right that requires further approximations.

74

•Proceedings Article

An Iterative, Sketching-based Framework for Ridge Regression

Agniva Chowdhury, +2 more

- 03 Jul 2018

TL;DR: It is proved that accurate approximations can be achieved by a sample whose size depends on the degrees of freedom of the ridge-regression problem rather than the dimensions of the design matrix, which is a fundamental and well-understood primitive of randomized linear algebra.

48

•Proceedings Article

Gaussian Process Optimization with Adaptive Sketching: Scalable and No Regret

Daniele Calandriello, +4 more

- 25 Jun 2019

TL;DR: This paper introduces BKB (budgeted kernelized bandit), a new approximate GP algorithm for optimization under bandit feedback that achieves near-optimal regret (and hence near-optimal convergence rate) with near-constant per-iteration complexity and, remarkably, no assumption on the input space or covariance of the GP.

48

•Posted Content

Convergence of Sparse Variational Inference in Gaussian Processes Regression

David R. Burt, +2 more

- 01 Aug 2020

- arXiv: Machine Learning

TL;DR: It is shown that the KL-divergence between the approximate model and the exact posterior can be made arbitrarily small for a Gaussian-noise regression model, and the paper characterizes how M needs to grow with N to ensure high-quality approximations.

43

•Proceedings Article

Improved large-scale graph learning through ridge spectral sparsification

Daniele Calandriello, +4 more

- 01 Jan 2018

TL;DR: By constructing a spectrally similar graph, this paper bounds the error induced by the sparsification for a variety of downstream tasks (e.g., SSL), empirically validates the theoretical guarantees on the Amazon co-purchase graph, and compares to state-of-the-art heuristics.

36

References

•Book

Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning)

Carl Edward Rasmussen, +1 more

- 01 Dec 2005

TL;DR: The treatment is comprehensive and self-contained, targeted at researchers and students in machine learning and applied statistics, and includes detailed algorithms for supervised-learning problems in both regression and classification.

3.1K

•Book Chapter•10.1007/BFB0020217

Kernel Principal Component Analysis

Bernhard Schölkopf, +2 more

- 08 Oct 1997

TL;DR: A new method for performing a nonlinear form of Principal Component Analysis by the use of integral operator kernel functions is proposed and experimental results on polynomial feature extraction for pattern recognition are presented.

2.6K

•Journal Article•10.5555/1046920.1194909

A Unifying View of Sparse Approximate Gaussian Process Regression

Joaquin Quiñonero-Candela, +1 more

- 01 Dec 2005

- Journal of Machine Learning Research

TL;DR: A new unifying view, including all existing proper probabilistic sparse approximations for Gaussian process regression, relies on expressing the effective prior which the methods are using, and highlights the relationship between existing methods.

2.4K

•Book

Stochastic Dominance: Investment Decision Making under Uncertainty

Haim Levy

- 25 Nov 2010

TL;DR: In this article, the authors present algorithms for stochastic dominance with specific distributions and apply them to different types of risk measures, such as expected utility theory, risk measures and diversification.

809

•Proceedings Article•10.5555/299094.299113

Kernel principal component analysis

Bernhard Schölkopf, +2 more

- 08 Feb 1999

TL;DR: In this paper, a nonlinear form of principal component analysis (PCA) is proposed to perform polynomial feature extraction in high-dimensional feature spaces, related to input space by some nonlinear map; for instance, the space of all possible d-pixel products in images.

438

Related Papers (5)

Fast kernel matrix-vector multiplication with application to Gaussian process learning

Near Input Sparsity Time Kernel Embeddings via Adaptive Sampling

Memory efficient kernel approximation

Approximate multiple kernel learning with least-angle regression

Adaptive Explicit Kernel Minkowski Weighted K-means