TL;DR: A least squares version for support vector machine (SVM) classifiers that follows from solving a set of linear equations, instead of quadratic programming for classical SVM's.
Abstract: In this letter we discuss a least squares version for support vector machine (SVM) classifiers. Due to equality type constraints in the formulation, the solution follows from solving a set of linear equations, instead of quadratic programming for classical SVM‘s. The approach is illustrated on a two-spiral benchmark classification problem.
TL;DR: Support vector machines for dynamic reconstruction of a chaotic system, Klaus-Robert Muller et al pairwise classification and support vector machines, Ulrich Kressel.
Abstract: Introduction to support vector learning roadmap. Part 1 Theory: three remarks on the support vector method of function estimation, Vladimir Vapnik generalization performance of support vector machines and other pattern classifiers, Peter Bartlett and John Shawe-Taylor Bayesian voting schemes and large margin classifiers, Nello Cristianini and John Shawe-Taylor support vector machines, reproducing kernel Hilbert spaces, and randomized GACV, Grace Wahba geometry and invariance in kernel based methods, Christopher J.C. Burges on the annealed VC entropy for margin classifiers - a statistical mechanics study, Manfred Opper entropy numbers, operators and support vector kernels, Robert C. Williamson et al. Part 2 Implementations: solving the quadratic programming problem arising in support vector classification, Linda Kaufman making large-scale support vector machine learning practical, Thorsten Joachims fast training of support vector machines using sequential minimal optimization, John C. Platt. Part 3 Applications: support vector machines for dynamic reconstruction of a chaotic system, Davide Mattera and Simon Haykin using support vector machines for time series prediction, Klaus-Robert Muller et al pairwise classification and support vector machines, Ulrich Kressel. Part 4 Extensions of the algorithm: reducing the run-time complexity in support vector machines, Edgar E. Osuna and Federico Girosi support vector regression with ANOVA decomposition kernels, Mark O. Stitson et al support vector density estimation, Jason Weston et al combining support vector and mathematical programming methods for classification, Bernhard Scholkopf et al.
TL;DR: The output of a lassi(cid:12)er should be a alibrated posterior probability to enable post-pro essing and a method to train a kernel lassi with a logit link and a regularized maximum likelihood is proposed.
TL;DR: SMO breaks this large quadratic programming problem into a series of smallest possible QP problems, which avoids using a time-consuming numerical QP optimization as an inner loop and hence SMO is fastest for linear SVMs and sparse data sets.
TL;DR: In this article, the authors proposed a new algorithm for training Support Vector Machines (SVM) called SMO (Sequential Minimal Optimization), which breaks this large QP problem into a series of smallest possible QP problems.
Abstract: This chapter describes a new algorithm for training Support Vector Machines: Sequential Minimal Optimization, or SMO Training a Support Vector Machine (SVM) requires the solution of a very large quadratic programming (QP) optimization problem SMO breaks this large QP problem into a series of smallest possible QP problems These small QP problems are solved analytically, which avoids using a time-consuming numerical QP optimization as an inner loop The amount of memory required for SMO is linear in the training set size, which allows SMO to handle very large training sets Because large matrix computation is avoided, SMO scales somewhere between linear and quadratic in the training set size for various test problems, while a standard projected conjugate gradient (PCG) chunking algorithm scales somewhere between linear and cubic in the training set size SMO's computation time is dominated by SVM evaluation, hence SMO is fastest for linear SVMs and sparse data sets For the MNIST database, SMO is as fast as PCG chunking; while for the UCI Adult database and linear SVMs, SMO can be more than 1000 times faster than the PCG chunking algorithm
TL;DR: SVM light as discussed by the authors is an implementation of an SVM learner which addresses the problem of large-scale SVM training with many training examples on the shelf, which makes large scale SVM learning more practical.
Abstract: Training a support vector machine SVM leads to a quadratic optimization problem with bound constraints and one linear equality constraint Despite the fact that this type of problem is well understood, there are many issues to be considered in designing an SVM learner In particular, for large learning tasks with many training examples on the shelf optimization techniques for general quadratic programs quickly become intractable in their memory and time requirements SVM light is an implementation of an SVM learner which addresses the problem of large tasks This chapter presents algorithmic and computational results developed for SVM light V 20, which make large-scale SVM training more practical The results give guidelines for the application of SVMs to large domains
TL;DR: An analysis of why Transductive Support Vector Machines are well suited for text classi cation is presented, and an algorithm for training TSVMs, handling 10,000 examples and more is proposed.
Abstract: This paper introduces Transductive Support Vector Machines (TSVMs) for text classi cation. While regular Support Vector Machines (SVMs) try to induce a general decision function for a learning task, Transductive Support Vector Machines take into account a particular test set and try to minimize misclassi cations of just those particular examples. The paper presents an analysis of why TSVMs are well suited for text classi cation. These theoretical ndings are supported by experiments on three test collections. The experiments show substantial improvements over inductive methods, especially for small training sets, cutting the number of labeled training examples down to a twentieth on some tasks. This work also proposes an algorithm for training TSVMs e ciently, handling 10,000 examples and more.
TL;DR: In this article, a non-linear classification technique based on Fisher's discriminant is proposed and the main ingredient is the kernel trick which allows the efficient computation of Fisher discriminant in feature space.
Abstract: A non-linear classification technique based on Fisher's discriminant is proposed. The main ingredient is the kernel trick which allows the efficient computation of Fisher discriminant in feature space. The linear classification in feature space corresponds to a (powerful) non-linear decision function in input space. Large scale simulations demonstrate the competitiveness of our approach.
TL;DR: The algorithm is a natural extension of the support vector algorithm to the case of unlabelled data and is regularized by controlling the length of the weight vector in an associated feature space.
Abstract: Suppose you are given some dataset drawn from an underlying probability distribution P and you want to estimate a "simple" subset S of input space such that the probability that a test point drawn from P lies outside of S equals some a priori specified ν between 0 and 1.
We propose a method to approach this problem by trying to estimate a function f which is positive on S and negative on the complement. The functional form of f is given by a kernel expansion in terms of a potentially small subset of the training data; it is regularized by controlling the length of the weight vector in an associated feature space. We provide a theoretical analysis of the statistical performance of our algorithm.
The algorithm is a natural extension of the support vector algorithm to the case of unlabelled data.
TL;DR: This paper shows the use of a data domain description method, inspired by the support vector machine by Vapnik, called the support vectors domain description (SVDD), which can be used for novelty or outlier detection and is compared with other outlier Detection methods on real data.
TL;DR: This chapter presents algorithmic and computational results developed for SV M light V2.0, which make large-scale SVM training more practical and give guidelines for the application of SVMs to large domains.
Abstract: Training a support vector machine (SVM) leads to a quadratic optimization problem with bound constraints and one linear equality constraint. Despite the fact that this type of problem is well understood, there are many issues to be considered in designing an SVM learner. In particular, for large learning tasks with many training examples, oo-the-shelf optimization techniques for general quadratic programs quickly become intractable in their memory and time requirements. SV M light1 is an implementation of an SVM learner which addresses the problem of large tasks. This chapter presents algorithmic and computational results developed for SV M light V2.0, which make large-scale SVM training more practical. The results give guidelines for the application of SVMs to large domains.
TL;DR: The use of support vector machines in classifying e-mail as spam or nonspam is studied by comparing it to three other classification algorithms: Ripper, Rocchio, and boosting decision trees, which found SVM's performed best when using binary features.
Abstract: We study the use of support vector machines (SVM) in classifying e-mail as spam or nonspam by comparing it to three other classification algorithms: Ripper, Rocchio, and boosting decision trees. These four algorithms were tested on two different data sets: one data set where the number of features were constrained to the 1000 best features and another data set where the dimensionality was over 7000. SVM performed best when using binary features. For both data sets, boosting trees and SVM had acceptable test performance in terms of accuracy and speed. However, SVM had significantly less training time.
TL;DR: It is observed that a simple remapping of the input x(i)-->x(i)(a) improves the performance of linear SVM's to such an extend that it makes them, for this problem, a valid alternative to RBF kernels.
Abstract: Traditional classification approaches generalize poorly on image classification tasks, because of the high dimensionality of the feature space. This paper shows that support vector machines (SVM) can generalize well on difficult image classification problems where the only features are high dimensional histograms. Heavy-tailed RBF kernels of the form K(x, y)=e/sup -/spl rho///spl Sigma//sub i//sup |xia-yia|b/ with a /spl les/1 and b/spl les/2 are evaluated on the classification of images extracted from the Corel stock photo collection and shown to far outperform traditional polynomial or Gaussian radial basis function (RBF) kernels. Moreover, we observed that a simple remapping of the input x/sub i//spl rarr/x/sub i//sup a/ improves the performance of linear SVM to such an extend that it makes them, for this problem, a valid alternative to RBF kernels.
TL;DR: The Relevance Vector Machine is introduced, a Bayesian treatment of a generalised linear model of identical functional form to the SVM, and examples demonstrate that for comparable generalisation performance, the RVM requires dramatically fewer kernel functions.
Abstract: The support vector machine (SVM) is a state-of-the-art technique for regression and classification, combining excellent generalisation properties with a sparse kernel representation. However, it does suffer from a number of disadvantages, notably the absence of probabilistic outputs, the requirement to estimate a trade-off parameter and the need to utilise 'Mercer' kernel functions. In this paper we introduce the Relevance Vector Machine (RVM), a Bayesian treatment of a generalised linear model of identical functional form to the SVM. The RVM suffers from none of the above disadvantages, and examples demonstrate that for comparable generalisation performance, the RVM requires dramatically fewer kernel functions.
TL;DR: Simulation results for both artificial and real data show remarkable improvement of generalization errors, supporting the idea of modifying a kernel function to enlarge the spatial resolution around the separating boundary surface by a conformal mapping, such that the separability between classes is increased.
TL;DR: A formulation of the SVM is proposed that enables a multi-class pattern recognition problem to be solved in a single optimisation and a similar generalization of linear programming machines is proposed.
Abstract: The solution of binary classi cation problems using support vector machines (SVMs) is well developed, but multi-class problems with more than two classes have typically been solved by combining independently produced binary classi ers. We propose a formulation of the SVM that enables a multi-class pattern recognition problem to be solved in a single optimisation. We also propose a similar generalization of linear programming machines. We report experiments using bench-mark datasets in which these two methods achieve a reduction in the number of support vectors and kernel calculations needed. 1. k-Class Pattern Recognition The k-class pattern recognition problem is to construct a decision function given ` iid (independent and identically distributed) samples (points) of an unknown function, typically with noise: (x1; y1); : : : ; (x`; y`) (1) where xi; i = 1; : : : ; ` is a vector of length d and yi 2 f1; : : : ; kg represents the class of the sample. A natural loss function is the number of mistakes made. 2. Solving k-Class Problems with Binary SVMs For the binary pattern recognition problem (case k = 2), the support vector approach has been well developed [3, 5]. The classical approach to solving k-class pattern recognition problems is to consider the problem as a collection of binary classi cation problems. In the one-versus-rest method one constructs k classi ers, one for each class. The n classi er constructs a hyperplane between class n and the k 1 other classes. A particular point is assigned to the class for which the distance from the margin, in the positive direction (i.e. in the direction in which class \one" lies rather than class \rest"), is maximal. This method has been used widely in ESANN'1999 proceedings European Symposium on Artificial Neural Networks Bruges (Belgium), 21-23 April 1999, D-Facto public., ISBN 2-600049-9-X, pp. 219-224
TL;DR: Two schemes for adjusting the sensitivity and speciicity of Support Vector Machines and the description of their performance using receiver operating characteristic (ROC) curves are discussed and their use on real-life medical diagnostic tasks is illustrated.
Abstract: For many applications it is important to accurately distinguish false negative results from false positives. This is particularly important for medical diagnosis where the correct balance between sensitivity and speciicity plays an important role in evaluating the performance of a classiier. In this paper we discuss two schemes for adjusting the sensitivity and speciicity of Support Vector Machines and the description of their performance using receiver operating characteristic (ROC) curves. We then illustrate their use on real-life medical diagnostic tasks.
TL;DR: Successive overrelaxation for symmetric linear complementarity problems and quadratic programs is used to train a support vector machine (SVM) for discriminating between the elements of two massive datasets, each with millions of points.
Abstract: Successive overrelaxation (SOR) for symmetric linear complementarity problems and quadratic programs is used to train a support vector machine (SVM) for discriminating between the elements of two massive datasets, each with millions of points. Because SOR handles one point at a time, similar to Platt's sequential minimal optimization (SMO) algorithm (1999) which handles two constraints at a time and Joachims' SVM/sup light/ (1998) which handles a small number of points at a time, SOR can process very large datasets that need not reside in memory. The algorithm converges linearly to a solution. Encouraging numerical results are presented on datasets with up to 10 000 000 points. Such massive discrimination problems cannot be processed by conventional linear or quadratic programming methods, and to our knowledge have not been solved by other methods. On smaller problems, SOR was faster than SVM/sup light/ and comparable or faster than SMO.
TL;DR: This chapter summarises results that have been obtained for high conndence generalization error bounds for the Support Vector Machine (SVM) and other pattern classiiers related to the SVM and argues that the margin and number of support vectors are both estimators of the degree to which the distribution generating the inputs assists identiication of the target hyperplane.
Abstract: The aim of this chapter is to summarise results that have been obtained for high conndence generalization error bounds for the Support Vector Machine (SVM) and other pattern classiiers related to the SVM. As a by-product of the analysis we argue that the margin and number of support vectors are both estimators of the degree to which the distribution generating the inputs assists identiication of the target hyperplane. 1.1 Introduction Generalization analysis of pattern classiiers is concerned with determining the factors that aaect the accuracy of a pattern classiier. Such an analysis requires assumptions to be made about how the data used to train the classiier was gathered and how subsequent data will be generated. One of the most popular assumptions originally championed by Vapnik and Chervonenkis 12] is to assume that the training and testing data are both generated according to the same probability distribution. The distribution can be viewed as a model of the natural processes which give rise to the observed phenomenon. Since it is usually more diicult to estimate the distribution than to learn the classiication function, it is important that no assumptions are made about the distribution, resulting in a so-called distribution-free analysis. We will consider bounds on the generalization error, that is the probability of
TL;DR: Empirical results using benchmark machine learning datasets are provided to show that support vectors form a svccdnct and suficient set for block-by-block incremental learning.
Abstract: With the increase in the size of real-world databases, there is an ever-increasing need to scale up inductive learning algorithms. Incremental learning techniques are one possible solution to the scalability problem. In this paper, we propose three ctiteria to evaluate the robustness and reliability of incremental learning methods, and use them to study the robustness of an incremental training method for Support Vector Machines. We provide empirical results using benchmark machine learning datasets to show that support vectors form a svccdnct and suficient set for block-by-block incremental learning.
TL;DR: An iterative training algorithm for LS-SVM's which is based on a conjugate gradient method which enables solving large scale classification problems which is illustrated on a multi two-spiral benchmark problem.
Abstract: Support vector machines (SVM's) have been introduced in literature as a method for pattern recognition and function estimation, within the framework of statistical learning theory and structural risk minimization. A least squares version (LSSVM) has been recently reported which expresses the training in terms of solving a set of linear equations instead of quadratic programming as for the standard SVM case. In this paper we present an iterative training algorithm for LS-SVM's which is based on a conjugate gradient method. This enables solving large scale classification problems which is illustrated on a multi two-spiral benchmark problem. Keywords. Support vector machines, classification, neural networks, RBF kernels, conjugate gradient method.
TL;DR: This paper shows that Support Vector Machines (SVM) can generalize well on difficult image classification problems where the only features are high dimensional histograms and observes that a simple remapping of the input xi → x a i improves the performance of linear SVMs to such an extend that it makes them a valid alternative to RBF kernels.
Abstract: Traditional classification approaches generalize poorly on image classification tasks, because of the high dimensionality of the feature space This paper shows that Support Vector Machines (SVM) can generalize well on difficult image classification problems where the only features are high dimensional histograms Heavy-tailed RBF kernels of the form K(x,y) = e−ρ P i |x i −y i | with a ≤ 1 and b ≤ 2 are evaluated on the classification of images extracted from the Corel Stock Photo Collection and shown to far outperform traditional polynomial or Gaussian RBF kernels Moreover, we observed that a simple remapping of the input xi → x a i improves the performance of linear SVMs to such an extend that it makes them, for this problem, a valid alternative to RBF kernels keywords: Support Vector Machines, Radial Basis Functions, Image Histogram, Image Classification, Corel
TL;DR: This work introduces a multi-view nonlinear shape model utilising 2D view-dependent constraint without explicit reference to 3D structures, and adopts Kernel PCA based on Support Vector Machines.
Abstract: Recovering the shape of any 3D object using multiple 2D views requires establishing correspondence between feature points at different views. However changes in viewpoint introduce self-occlusions, resulting nonlinear variations in the shape and inconsistent 2D features between views. Here we introduce a multi-view nonlinear shape model utilising 2D view-dependent constraint without explicit reference to 3D structures. For nonlinear model transformation, we adopt Kernel PCA based on Support Vector Machines.
TL;DR: With the described techniques the recognition performance can be improved by 26% over leading existing approaches, and there is evidence that existing related methods could profit from advanced TIS recognition.
Abstract: MOTIVATION
In order to extract protein sequences from nucleotide sequences, it is an important step to recognize points at which regions start that code for proteins. These points are called translation initiation sites (TIS).
RESULTS
The task of finding TIS can be modeled as a classification problem. We demonstrate the applicability of support vector machines for this task, and show how to incorporate prior biological knowledge by engineering an appropriate kernel function. With the described techniques the recognition performance can be improved by 26% over leading existing approaches. We provide evidence that existing related methods (e.g. ESTScan) could profit from advanced TIS recognition.
TL;DR: A sheet material having a decorative surface and a working surface, for application to a support surface is disclosed, the working surface of which is provided with a continuous coating of tacky, pressure-sensitive, adhesive, which adhesive is provide with a coating of a discontinuous layer of resilient, non-adhesive particles.
TL;DR: The scaling problem of different SVMs is highlighted and various normalization methods are proposed to cope with this problem and their efficiencies are measured empirically.
Abstract: Support vector machines (SVMs) are primarily designed for 2-class classification problems. Although in several papers it is mentioned that the combination of K SVMs can be used to solve a K-class classification problem, such a procedure requires some care. In this paper, the scaling problem of different SVMs is highlighted. Various normalization methods are proposed to cope with this problem and their efficiencies are measured empirically. This simple way of ssing SVMs to learn a K-class classification problem consists in choosing the maximum applied to the outputs of K SVMs solving a one-per-class decomposition of the general problem. In the second part of this paper, more sophisticated techniques are suggested. On the one hand, a stacking of the K SVMs with other classification techniques is proposed. On the other end, the one-per-class decomposition scheme is replaced by more elaborated schemes based on error-correcting codes. An incremental algorithm for the elaboration of pertinent decomposition schemes is mentioned, which exploits the properties of SVMs for an efficient computation.
TL;DR: This paper explores the issues involved in applying SVMs to phonetic classification as a first step to speech recognition and presents results on several standard vowel and phonetic Classification tasks and shows better performance than Gaussian mixture classifiers.
Abstract: Support vector machines (SVMs) represent a new approach to pattern classification which has attracted a great deal of interest in the machine learning community. Their appeal lies in their strong connection to the underlying statistical learning theory, in particular the theory of structural risk minimization. SVMs have been shown to be particularly successful in fields such as image identification and face recognition; in many problems SVM classifiers have been shown to perform much better than other nonlinear classifiers such as artificial neural networks and k-nearest neighbors. This paper explores the issues involved in applying SVMs to phonetic classification as a first step to speech recognition. We present results on several standard vowel and phonetic classification tasks and show better performance than Gaussian mixture classifiers. We also present an analysis of the difficulties we foresee in applying SVMs to continuous speech recognition problems.
TL;DR: An extension of least squares support vector machines (LS-SVMs) to the multiclass case, related to classical neural net approaches for classification where multi-classes are encoded by considering multiple outputs for the network.
Abstract: We present an extension of least squares support vector machines (LS-SVMs) to the multiclass case. While standard SVM solutions involve solving quadratic or linear programming problems, the least squares version of SVMs corresponds to solving a set of linear equations, due to equality instead of inequality constraints in the problem formulation. In LS-SVMs the Mercer condition is still applicable. Hence several type of kernels such as polynomial, RBFs and MLPs can be used. The multiclass case that we discuss here is related to classical neural net approaches for classification where multi-classes are encoded by considering multiple outputs for the network. Efficient methods for solving large scale LS-SVMs are available.
TL;DR: This work investigates the generalization performance of support vector machines (SVMs), which have been recently introduced as a general alternative to neural networks, and finds that SVMs overfit only weakly.
Abstract: Using methods of Statistical Physics, we investigate the generalization performance of support vector machines (SVMs), which have been recently introduced as a general alternative to neural networks. For nonlinear classification rules, the generalization error saturates on a plateau, when the number of examples is too small to properly estimate the coefficients of the nonlinear part. When trained on simple rules, we find that SVMs overfit only weakly. The performance of SVMs is strongly enhanced, when the distribution of the inputs has a gap in feature space.