Two-dimensional solution path for support vector regression
Gang Wang, Dit-Yan Yeung, Frederick H. Lochovsky
- 25 Jun 2006
- Vol. 148, pp 993-1000
TL;DR: This paper shows that the solution path for ε-SVR is also piecewise linear with respect to ε, and proposes an efficient algorithm for exploring the two-dimensional solution space defined by the regularization and error parameters.
Abstract: Recently, a very appealing approach was proposed to compute the entire solution path for support vector classification (SVC) with very low extra computational cost. This approach was later extended to a support vector regression (SVR) model called ε-SVR. However, the method requires that the error parameter ε be set a priori, which is only possible if the desired accuracy of the approximation can be specified in advance. In this paper, we show that the solution path for ε-SVR is also piecewise linear with respect to ε. We further propose an efficient algorithm for exploring the two-dimensional solution space defined by the regularization and error parameters. As opposed to the algorithm for SVC, our proposed algorithm for ε-SVR initializes the number of support vectors to zero and then increases it gradually as the algorithm proceeds. As such, a good regression function possessing the sparseness property can be obtained after only a few iterations.
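As a rough illustration of the starting point described in the abstract, the sketch below evaluates the ε-insensitive loss and shows that a constant fit leaves no point outside the tube once ε reaches (y_max − y_min)/2. The function names, the midrange choice of the offset, and the toy targets are assumptions made for this example; they are not taken from the paper.

```python
def eps_insensitive_loss(residual, eps):
    """Epsilon-insensitive loss: zero inside the tube, linear outside it."""
    return max(0.0, abs(residual) - eps)


def trivial_start(y):
    """Return a constant fit b and the smallest eps that covers every target.

    Assumption for illustration: with a constant regression function, the
    midrange b = (y_max + y_min) / 2 places every target inside or on the
    tube as soon as eps >= (y_max - y_min) / 2.
    """
    y_max, y_min = max(y), min(y)
    b = (y_max + y_min) / 2.0
    eps0 = (y_max - y_min) / 2.0
    return b, eps0


# Hypothetical toy targets.
y = [1.0, 2.5, 4.0, 3.0]
b, eps0 = trivial_start(y)

# At eps0 no point incurs any loss, so no point is needed as a support vector.
loss_at_start = sum(eps_insensitive_loss(yi - b, eps0) for yi in y)

# Shrinking eps makes the tube narrower, and points start to fall outside it.
loss_after_shrink = sum(eps_insensitive_loss(yi - b, 1.0) for yi in y)
```

Under these assumptions, decreasing ε from this value is what gradually brings points onto or outside the tube, which is consistent with the abstract's description of a support set that grows from zero.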
Figures

Figure 1. Linear SVR results for four different combinations of values for λ and ε. (a) proper values of λ and ε are specified; (b) λ = ∞; (c) ε > (y_max − y_min)/2; (d) ε < (y_max − y_min)/2, but all the data points are inside the ε-tube.
Figure 4. Based on three ε-paths with λ = 1 and γ = 0.05, 0.5, 5, the optimal solution for each path in terms of the mean squared error on the validation set is plotted.
Figure 5. Change in elbow size as a function of ε for three ε-paths with λ = 1 and γ = 0.05, 0.5, 5. Since ε decreases rapidly in the beginning, the horizontal axis is shown in log scale.
Figure 7. Shifting from the ε-path algorithm to the λ-path algorithm at four shifting points with different values of ε. The horizontal axis is in log scale.
Figure 2. The set of data points is partitioned into five subsets according to the ε-insensitive loss function.
Figure 6. Relationships between MSE, ε, and the number of steps in the algorithm for different values of λ. (a) MSE vs. ε, with the horizontal axis in log scale; (b) ε vs. number of steps; (c) MSE vs. number of steps.
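The caption of Figure 2 mentions a five-way partition of the data induced by the ε-insensitive loss. One plausible reading, sketched below, splits points by where the residual y − f(x) falls relative to the tube: above it, on its upper edge, strictly inside, on its lower edge, or below it. The subset names, the tolerance, and the toy values are illustrative assumptions, and the paper's own notation may differ.

```python
def partition_by_residual(y, f, eps, tol=1e-9):
    """Group point indices by the position of y_i - f_i relative to the eps-tube.

    The five group names are hypothetical labels chosen for this sketch.
    """
    groups = {
        "above": [],       # residual >  eps
        "upper_edge": [],  # residual == eps (within tol)
        "inside": [],      # |residual| < eps
        "lower_edge": [],  # residual == -eps (within tol)
        "below": [],       # residual < -eps
    }
    for i, (yi, fi) in enumerate(zip(y, f)):
        r = yi - fi
        if abs(r - eps) <= tol:
            groups["upper_edge"].append(i)
        elif abs(r + eps) <= tol:
            groups["lower_edge"].append(i)
        elif r > eps:
            groups["above"].append(i)
        elif r < -eps:
            groups["below"].append(i)
        else:
            groups["inside"].append(i)
    return groups


# Hypothetical targets and predictions; the residuals are 2, 1, 0, -1, -2.5.
y = [3.0, 2.0, 1.0, 0.0, -1.5]
f = [1.0, 1.0, 1.0, 1.0, 1.0]
groups = partition_by_residual(y, f, eps=1.0)
```

In path-following methods of this kind, the points on the two edges play the role of the "elbow" whose size Figure 5 tracks, though that mapping is an interpretation of the captions rather than something stated on this page.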