Random Features for Kernel Approximation: A Survey on Algorithms, Theory, and Beyond

Open Access • Posted Content

- 23 Apr 2020

113

TL;DR: This survey systematically reviews the work on random features from the past ten years and discusses the relationship between random features and modern over-parameterized deep neural networks (DNNs), including the use of random features in the analysis of DNNs as well as the gaps between current theoretical and empirical results.

Abstract: Random features is one of the most popular techniques for speeding up kernel methods in large-scale problems. Related work was recognized with the NeurIPS Test-of-Time Award in 2017 and was an ICML Best Paper finalist in 2019. The body of work on random features has grown rapidly, so it is desirable to have a comprehensive overview of this topic that explains the connections among the various algorithms and theoretical results. In this survey, we systematically review the work on random features from the past ten years. First, the motivations, characteristics and contributions of representative random-features-based algorithms are summarized according to their sampling schemes, learning procedures, variance reduction properties and how they exploit training data. Second, we review theoretical results that center on the following key question: how many random features are needed to ensure high approximation quality, or no loss in the empirical/expected risk of the learned estimator? Third, we provide a comprehensive evaluation of popular random-features-based algorithms on several large-scale benchmark datasets and discuss their approximation quality and prediction performance on classification tasks. Last, we discuss the relationship between random features and modern over-parameterized deep neural networks (DNNs), including the use of high-dimensional random features in the analysis of DNNs and the gaps between current theoretical and empirical results. This survey may serve as a gentle introduction to this topic and as a user's guide for practitioners interested in applying the representative algorithms and understanding theoretical results under various technical assumptions. We hope that this survey will facilitate discussion of the open problems in this topic and, more importantly, shed light on future research directions.
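As a minimal sketch of the core idea, assuming the Gaussian (RBF) kernel and the classic random Fourier features construction of Rahimi and Recht, the code below builds an explicit feature map whose inner products approximate the kernel. By Bochner's theorem, a continuous shift-invariant positive-definite kernel is the Fourier transform of a nonnegative measure; for k(x, y) = exp(-||x - y||^2 / (2 sigma^2)) this gives k(x, y) = E[2 cos(w^T x + b) cos(w^T y + b)] with w ~ N(0, sigma^-2 I) and b ~ Uniform[0, 2 pi]. Averaging D such samples yields the feature map z(x) = sqrt(2/D) cos(W^T x + b). The helper names (`gaussian_kernel`, `random_fourier_features`, `approximation_error`), the bandwidth `sigma`, and the synthetic data are illustrative assumptions, not the survey's code.

```python
import numpy as np


def gaussian_kernel(X, Y, sigma=1.0):
    """Exact Gaussian kernel matrix k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))


def random_fourier_features(X, num_features, sigma=1.0, seed=0):
    """Map X (n x d) to random Fourier features Z (n x num_features).

    Hypothetical helper for illustration. Frequencies W come from the Gaussian
    kernel's spectral density N(0, sigma^-2 I) and phases b from Uniform[0, 2*pi],
    so that Z @ Z.T is an unbiased estimate of the kernel matrix.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(loc=0.0, scale=1.0 / sigma, size=(d, num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)


def approximation_error(X, num_features, sigma=1.0, seed=0):
    """Relative Frobenius-norm error of the random-features Gram matrix."""
    K = gaussian_kernel(X, X, sigma)
    Z = random_fourier_features(X, num_features, sigma, seed)
    return np.linalg.norm(Z @ Z.T - K) / np.linalg.norm(K)


if __name__ == "__main__":
    # Synthetic data, for illustration only.
    rng = np.random.default_rng(42)
    X = rng.normal(size=(200, 5))
    for D in (10, 100, 1000, 10000):
        print(D, approximation_error(X, D, sigma=2.0))
```

The relative error should shrink roughly like 1/sqrt(D) as the number of random features D grows, which is the trade-off behind the question of how many random features are needed; the exact values depend on the data, the bandwidth, and the seed.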

Citations

Journal Article•10.2307/3610639

Harmonic Analysis and the Theory of Probability. By S. Bochner pp. 176. 35s. 1955. (California University Press and Cambridge University Press)

J. L. B. Cooper

- 01 May 1957

- The Mathematical Gazette

531

Report•10.3386/w30217

The Virtue of Complexity in Return Prediction

Bryan T. Kelly, +2 more

- 01 Jul 2022

- Social Science Research Network

TL;DR: This work empirically documents the virtue of complexity in U.S. equity market return prediction and establishes the rationale for modeling expected returns through machine learning.

47

Journal Article•10.1016/j.neunet.2022.07.029

Attention-based Random Forest and Contamination Model

Lev V. Utkin, +1 more

- 08 Jan 2022

- Neural Networks

TL;DR: In this article, an attention-based random forest (ABRF) model is proposed that assigns trainable attention weights to decision trees in a specific way: the weights depend on the distance between an instance that falls into a given leaf of a tree and the training instances that fall into the same leaf.

30

Journal Article•10.1109/tit.2022.3217698

Universality Laws for High-Dimensional Learning With Random Features

- 01 Mar 2023

- IEEE Transactions on Information Theory

TL;DR: In this article, it is shown that a random feature model with a nonlinear activation function is asymptotically equivalent to a surrogate linear Gaussian model with a matching covariance matrix.

28

Posted Content

Global Convergence and Induced Kernels of Gradient-Based Meta-Learning with Neural Nets.

Haoxiang Wang, +2 more

- 25 Jun 2020

TL;DR: It is proved that gradient-based meta-learning (GBML) is equivalent to a functional gradient descent operation that explicitly propagates experience from past tasks to new ones. In addition, a new kernel-based meta-learning approach is developed that outperforms GBML with standard DNNs on the Omniglot dataset when the number of past tasks available for meta-training is small.

17

...

References

Proceedings Article•10.1109/CVPR.2009.5206848

ImageNet: A large-scale hierarchical image database

Jia Deng, +5 more

- 20 Jun 2009

TL;DR: A new database called “ImageNet” is introduced: a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than existing image datasets.

75.9K

Journal Article•10.1109/5.726791

Gradient-based learning applied to document recognition

Yann LeCun, +6 more

- 01 Jan 1998

TL;DR: In this article, a graph transformer network (GTN) is proposed for handwritten character recognition, which can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters.

53.5K

Proceedings Article

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Sergey Ioffe, +1 more

- 06 Jul 2015

TL;DR: Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.

43.7K

Gradient-based learning applied to document recognition

Yann LeCun, +7 more

- 01 Jan 2001

TL;DR: This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task; convolutional neural networks are shown to outperform all other techniques.

32.7K

Dissertation

Learning Multiple Layers of Features from Tiny Images

Alex Krizhevsky

- 01 Jan 2009

TL;DR: In this dissertation, the author describes how to train a multi-layer generative model of natural images using a dataset of millions of tiny colour images.

23.7K