Open Access • Posted Content
Random Features for Kernel Approximation: A Survey on Algorithms, Theory, and Beyond
TL;DR: This survey systematically reviews the work on random features from the past ten years and discusses the relationship between random features and modern over-parameterized deep neural networks (DNNs), including the use of random features in the analysis of DNNs as well as the gaps between current theoretical and empirical results.
Abstract: Random features is one of the most popular techniques to speed up kernel methods in large-scale problems. Related works have been recognized by the NeurIPS Test-of-Time award in 2017 and the ICML Best Paper Finalist in 2019. The body of work on random features has grown rapidly, and hence it is desirable to have a comprehensive overview on this topic explaining the connections among various algorithms and theoretical results. In this survey, we systematically review the work on random features from the past ten years. First, the motivations, characteristics and contributions of representative random features based algorithms are summarized according to their sampling schemes, learning procedures, variance reduction properties and how they exploit training data. Second, we review theoretical results that center around the following key question: how many random features are needed to ensure a high approximation quality or no loss in the empirical/expected risks of the learned estimator. Third, we provide a comprehensive evaluation of popular random features based algorithms on several large-scale benchmark datasets and discuss their approximation quality and prediction performance for classification. Last, we discuss the relationship between random features and modern over-parameterized deep neural networks (DNNs), including the use of high dimensional random features in the analysis of DNNs as well as the gaps between current theoretical and empirical results. This survey may serve as a gentle introduction to this topic, and as a users' guide for practitioners interested in applying the representative algorithms and understanding theoretical results under various technical assumptions. We hope that this survey will facilitate discussion on the open problems in this topic, and more importantly, shed light on future research directions.
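The key question in the abstract, how many random features are needed for a good kernel approximation, can be illustrated with a minimal sketch. It assumes one well-known construction, random Fourier features for the Gaussian kernel, and is not code from the survey. The function names (`gaussian_kernel`, `random_fourier_features`, `approximation_error`), the bandwidth `sigma`, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_kernel(X, Y, sigma=1.0):
    """Exact Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))


def random_fourier_features(X, n_features, sigma=1.0, rng=None):
    """Map X (n, d) to Z (n, n_features) so that Z @ Z.T approximates the kernel.

    Assumption: frequencies are drawn from N(0, sigma^-2 I), the spectral
    density of the Gaussian kernel, and phases uniformly from [0, 2*pi).
    """
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    W = rng.normal(scale=1.0 / sigma, size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)


def approximation_error(X, n_features, sigma=1.0, seed=0):
    """Largest entrywise gap between the exact and approximate kernel matrices."""
    K = gaussian_kernel(X, X, sigma)
    Z = random_fourier_features(X, n_features, sigma, rng=seed)
    return np.max(np.abs(K - Z @ Z.T))


if __name__ == "__main__":
    X = np.random.default_rng(42).normal(size=(50, 5))  # synthetic data
    for n_features in (10, 100, 1000, 5000):
        print(n_features, approximation_error(X, n_features))
```

In this sketch the entrywise error typically shrinks as the number of features grows, roughly at the Monte Carlo rate of one over the square root of the feature count. The exact values depend on the seed, the bandwidth, and the data.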
Citations
The Virtue of Complexity in Return Prediction
TL;DR: This work empirically documents the virtue of complexity in U.S. equity market return prediction and establishes the rationale for modeling expected returns through machine learning.
47
Attention-based Random Forest and Contamination Model
TL;DR: In this article, an attention-based random forest (ABRF) model is proposed that assigns attention weights with trainable parameters to decision trees, where the attention weights depend on the distance between an instance that falls into a given leaf of a tree and the training instances that fall into the same leaf.
30
Universality Laws for High-Dimensional Learning With Random Features
TL;DR: In this article, it was shown that a random feature model with a nonlinear activation function is asymptotically equivalent to a surrogate linear Gaussian model with a matching covariance matrix.
28
•Posted Content
Global Convergence and Induced Kernels of Gradient-Based Meta-Learning with Neural Nets.
Haoxiang Wang, Ruoyu Sun, Bo Li +2 more
- 25 Jun 2020
TL;DR: It is proved that GBML is equivalent to a functional gradient descent operation that explicitly propagates experience from the past tasks to new ones and a new kernel-based meta-learning approach is developed that outperforms GBML with standard DNNs on the Omniglot dataset when the number of past tasks for meta-training is small.
17