Open Access • Proceedings Article
Performance analysis for L2 kernel classification
JooSeuk Kim, Clayton Scott +1 more
- 08 Dec 2008
- Vol. 21, pp 833-840
TL;DR: A distribution-free concentration inequality is proved for a cross-validation-based estimate of the ISE, and this result is applied to deduce an oracle inequality and consistency of the classifier in the sense of both ISE and probability of error.
Abstract: We provide statistical performance guarantees for a recently introduced kernel classifier that optimizes the L2 or integrated squared error (ISE) of a difference of densities. The classifier is similar to a support vector machine (SVM) in that it is the solution of a quadratic program and yields a sparse classifier. Unlike SVMs, however, the L2 kernel classifier does not involve a regularization parameter. We prove a distribution-free concentration inequality for a cross-validation-based estimate of the ISE, and apply this result to deduce an oracle inequality and consistency of the classifier in the sense of both ISE and probability of error. Our results also specialize to give performance guarantees for an existing method of L2 kernel density estimation.
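As a rough illustration of the quantities the abstract mentions, the following Python sketch evaluates a leave-one-out estimate of the ISE (up to a constant that does not depend on the weights) for a weighted Gaussian-kernel estimate of a difference of densities in one dimension. It is a sketch under stated assumptions, not the authors' implementation: the function names, the `gamma` parameter, the one-dimensional Gaussian kernel, and the uniform weights are all choices made here for illustration. The quadratic program that would select sparse weights is not solved.

```python
import math

def gauss(u, v, sigma):
    """1-D Gaussian kernel k_sigma(u, v), normalized to integrate to 1."""
    return math.exp(-((u - v) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def d_hat(t, alpha, x, y, sigma):
    """Weighted kernel estimate of the difference of densities at point t."""
    return sum(a * yi * gauss(t, xi, sigma) for a, xi, yi in zip(alpha, x, y))

def ise_estimate(alpha, x, y, sigma, gamma=1.0):
    """Leave-one-out estimate of the ISE, omitting the alpha-independent term.

    Assumes labels in {-1, +1} and at least two points per class.
    """
    n = len(x)
    # Integral of d_hat^2: two Gaussians of width sigma convolve to width sqrt(2)*sigma.
    wide = math.sqrt(2) * sigma
    quad = sum(alpha[i] * alpha[j] * y[i] * y[j] * gauss(x[i], x[j], wide)
               for i in range(n) for j in range(n))
    # Integral of d_hat * d: class-wise sample averages that leave out point i.
    cross = 0.0
    for i in range(n):
        pos = [gauss(x[i], x[j], sigma) for j in range(n) if j != i and y[j] == 1]
        neg = [gauss(x[i], x[j], sigma) for j in range(n) if j != i and y[j] == -1]
        h_i = sum(pos) / len(pos) - gamma * sum(neg) / len(neg)
        cross += alpha[i] * y[i] * h_i
    return quad - 2.0 * cross

def uniform_weights(y, gamma=1.0):
    """Feasible (non-sparse) weights: positives sum to 1, negatives sum to gamma."""
    n_pos = sum(1 for t in y if t == 1)
    n_neg = len(y) - n_pos
    return [1.0 / n_pos if t == 1 else gamma / n_neg for t in y]

# Toy data: two well-separated classes on the real line.
x = [-2.1, -1.9, -2.0, 2.0, 1.9, 2.2]
y = [-1, -1, -1, 1, 1, 1]
alpha = uniform_weights(y)
score = ise_estimate(alpha, x, y, sigma=0.5)
# Classify a new point by the sign of the estimated difference of densities.
label = 1 if d_hat(1.5, alpha, x, y, sigma=0.5) > 0 else -1
```

With uniform weights the estimate reduces to a plain kernel density plug-in rule. A sparse solution would instead come from minimizing `ise_estimate` over nonnegative weights subject to the two sum constraints.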
Citations
Robust Parametric Classification and Variable Selection by a Minimum Distance Criterion
Eric C. Chi, David Scott +1 more
TL;DR: In this paper, a robust penalized logistic regression algorithm based on a minimum distance criterion is proposed to avoid estimation implosion in the presence of many outliers in the important small-n, large-p setting.
• Dissertation
Parametric classification and variable selection by the minimum integrated squared error criterion
Eric C. Chi
- 01 Jan 2012
TL;DR: This dissertation applies the minimum integrated squared error criterion to parametric classification and variable selection.
L₂ Kernel Classification
JooSeuk Kim, Clayton Scott +1 more
TL;DR: This work proposes a kernel classifier that optimizes the L2 or integrated squared error of a “difference of densities” of the Gaussian kernel and extends the method through the introduction of a natural regularization parameter, which allows it to remain competitive with the SVM in high dimensions.
References
Support-Vector Networks
Corinna Cortes, Vladimir Vapnik +1 more
TL;DR: High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated and the performance of the support-vector network is compared to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
Testing Statistical Hypotheses
J. D. Biggins, Erich L. Lehmann +1 more
TL;DR: Testing Statistical Hypotheses (2nd ed.), by E. L. Lehmann, 1986. xx + 600 pp. £44.13.
Probability density estimation from optimally condensed data samples
Mark Girolami, Chao He +1 more
TL;DR: The Reduced Set Density Estimator is presented, which provides a kernel-based density estimator which employs a small percentage of the available data sample and is optimal in the L2 sense.
Parametric Statistical Modeling by Minimum Integrated Square Error
TL;DR: This article investigates the use of integrated square error, or L2 distance, as a theoretical and practical estimation tool for a variety of parametric statistical models and demonstrates by example the well-known result that minimum distance estimators, including L2E, are inherently robust.
Asymptotically optimal discriminant functions for pattern classification
C. Wolverton, T. Wagner +1 more
TL;DR: It is shown that as the number of labeled samples used to construct the approximations increases, the resulting sequence of discriminant functions is asymptotically optimal in the sense that the probability of misclassification when using the approximations in the decision procedure converges in probability or with probability 1, depending on the assumptions made.