TL;DR: Decomposition implementations for two "all-together" multiclass SVM methods are given and it is shown that for large problems methods by considering all data at once in general need fewer support vectors.
Abstract: Support vector machines (SVMs) were originally designed for binary classification. How to effectively extend it for multiclass classification is still an ongoing research issue. Several methods have been proposed where typically we construct a multiclass classifier by combining several binary classifiers. Some authors also proposed methods that consider all classes at once. As it is computationally more expensive to solve multiclass problems, comparisons of these methods using large-scale problems have not been seriously conducted. Especially for methods solving multiclass SVM in one step, a much larger optimization problem is required so up to now experiments are limited to small data sets. In this paper we give decomposition implementations for two such "all-together" methods. We then compare their performance with three methods based on binary classifications: "one-against-all," "one-against-one," and directed acyclic graph SVM (DAGSVM). Our experiments indicate that the "one-against-one" and DAG methods are more suitable for practical use than the other methods. Results also show that for large problems methods by considering all data at once in general need fewer support vectors.
TL;DR: This work considers the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative, and concludes by examining factors that make the sentiment classification problem more challenging.
Abstract: We consider the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative. Using movie reviews as data, we find that standard machine learning techniques definitively outperform human-produced baselines. However, the three machine learning methods we employed (Naive Bayes, maximum entropy classification, and support vector machines) do not perform as well on sentiment classification as on traditional topic-based categorization. We conclude by examining factors that make the sentiment classification problem more challenging.
TL;DR: In this paper, the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative, was considered and three machine learning methods (Naive Bayes, maximum entropy classiflcation, and support vector machines) were employed.
Abstract: We consider the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative. Using movie reviews as data, we flnd that standard machine learning techniques deflnitively outperform human-produced baselines. However, the three machine learning methods we employed (Naive Bayes, maximum entropy classiflcation, and support vector machines) do not perform as well on sentiment classiflcation as on traditional topic-based categorization. We conclude by examining factors that make the sentiment classiflcation problem more challenging.
TL;DR: This article used machine learning techniques such as Naive Bayes, maximum entropy classification, and support vector machines (SVM) for sentiment classification of movie reviews, and found that SVM outperformed human-produced baselines.
Abstract: We consider the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative. Using movie reviews as data, we find that standard machine learning techniques definitively outperform human-produced baselines. However, the three machine learning methods we employed (Naive Bayes, maximum entropy classification, and support vector machines) do not perform as well on sentiment classification as on traditional topic-based categorization. We conclude by examining factors that make the sentiment classification problem more challenging.
TL;DR: The goal of this paper is to develop a method that utilizes clickthrough data for training, namely the query-log of the search engine in connection with the log of links the users clicked on in the presented ranking.
Abstract: This paper presents an approach to automatically optimizing the retrieval quality of search engines using clickthrough data. Intuitively, a good information retrieval system should present relevant documents high in the ranking, with less relevant documents following below. While previous approaches to learning retrieval functions from examples exist, they typically require training data generated from relevance judgments by experts. This makes them difficult and expensive to apply. The goal of this paper is to develop a method that utilizes clickthrough data for training, namely the query-log of the search engine in connection with the log of links the users clicked on in the presented ranking. Such clickthrough data is available in abundance and can be recorded at very low cost. Taking a Support Vector Machine (SVM) approach, this paper presents a method for learning retrieval functions. From a theoretical perspective, this method is shown to be well-founded in a risk minimization framework. Furthermore, it is shown to be feasible even for large sets of queries and features. The theoretical results are verified in a controlled experiment. It shows that the method can effectively adapt the retrieval function of a meta-search engine to a particular group of users, outperforming Google in terms of retrieval quality after only a couple of hundred training examples.
TL;DR: Support Vector Machines Basic Methods of Least Squares Support Vector Machines Bayesian Inference for LS-SVM Models Robustness Large Scale Problems LS- sVM for Unsupervised Learning LS- SVM for Recurrent Networks and Control.
Abstract: Support Vector Machines Basic Methods of Least Squares Support Vector Machines Bayesian Inference for LS-SVM Models Robustness Large Scale Problems LS-SVM for Unsupervised Learning LS-SVM for Recurrent Networks and Control.
TL;DR: The problem of automatically tuning multiple parameters for pattern recognition Support Vector Machines (SVMs) is considered by minimizing some estimates of the generalization error of SVMs using a gradient descent algorithm over the set of parameters.
Abstract: The problem of automatically tuning multiple parameters for pattern recognition Support Vector Machines (SVMs) is considered. This is done by minimizing some estimates of the generalization error of SVMs using a gradient descent algorithm over the set of parameters. Usual methods for choosing parameters, based on exhaustive search become intractable as soon as the number of parameters exceeds two. Some experimental results assess the feasibility of our approach for a large number of parameters (more than 100) and demonstrate an improvement of generalization performance.
TL;DR: Support Vector Machines (SVM’s) are intuitive, theoretically wellfounded, and have shown to be practically successful.
Abstract: Support Vector Machines (SVM’s) are a relatively new learning method used for binary classification. The basic idea is to find a hyperplane which separates the d-dimensional data perfectly into its two classes. However, since example data is often not linearly separable, SVM’s introduce the notion of a “kernel induced feature space” which casts the data into a higher dimensional space where the data is separable. Typically, casting into such a space would cause problems computationally, and with overfitting. The key insight used in SVM’s is that the higher-dimensional space doesn’t need to be dealt with directly (as it turns out, only the formula for the dot-product in that space is needed), which eliminates the above concerns. Furthermore, the VC-dimension (a measure of a system’s likelihood to perform well on unseen data) of SVM’s can be explicitly calculated, unlike other learning methods like neural networks, for which there is no measure. Overall, SVM’s are intuitive, theoretically wellfounded, and have shown to be practically successful. SVM’s have also been extended to solve regression tasks (where the system is trained to output a numerical value, rather than “yes/no” classification).
TL;DR: An introduction to the theoretical development of the SVM and an experimental evaluation of its accuracy, stability and training speed in deriving land cover classifications from satellite images are given.
Abstract: The support vector machine (SVM) is a group of theoretically superior machine learning algorithms. It was found competitive with the best available machine learning algorithms in classifying high-dimensional data sets. This paper gives an introduction to the theoretical development of the SVM and an experimental evaluation of its accuracy, stability and training speed in deriving land cover classifications from satellite images. The SVM was compared to three other popular classifiers, including the maximum likelihood classifier (MLC), neural network classifiers (NNC) and decision tree classifiers (DTC). The impacts of kernel configuration on the performance of the SVM and of the selection of training data and input variables on the four classifiers were also evaluated in this experiment.
TL;DR: The proposed extensions of the Support Vector Machine learning approach lead to mixed integer quadratic programs that can be solved heuristic ally and a generalization of SVMs makes a state-of-the-art classification technique, including non-linear classification via kernels, available to an area that up to now has been largely dominated by special purpose methods.
Abstract: This paper presents two new formulations of multiple-instance learning as a maximum margin problem. The proposed extensions of the Support Vector Machine (SVM) learning approach lead to mixed integer quadratic programs that can be solved heuristic ally. Our generalization of SVMs makes a state-of-the-art classification technique, including non-linear classification via kernels, available to an area that up to now has been largely dominated by special purpose methods. We present experimental results on a pharmaceutical data set and on applications in automated image indexing and document categorization.
TL;DR: In this paper, a Gaussian kernel based clustering method using support vector machines (SVM) is proposed to find the minimal enclosing sphere, which can separate into several components, each enclosing a separate cluster of points.
Abstract: We present a novel clustering method using the approach of support vector machines. Data points are mapped by means of a Gaussian kernel to a high dimensional feature space, where we search for the minimal enclosing sphere. This sphere, when mapped back to data space, can separate into several components, each enclosing a separate cluster of points. We present a simple algorithm for identifying these clusters. The width of the Gaussian kernel controls the scale at which the data is probed while the soft margin constant helps coping with outliers and overlapping clusters. The structure of a dataset is explored by varying the two parameters, maintaining a minimal number of support vectors to assure smooth cluster boundaries. We demonstrate the performance of our algorithm on several datasets.
TL;DR: The SVM approach as represented by Schoelkopf was superior to all the methods except the neural network one, where it was, although occasionally worse, essentially comparable.
Abstract: We implemented versions of the SVM appropriate for one-class classification in the context of information retrieval. The experiments were conducted on the standard Reuters data set. For the SVM implementation we used both a version of Schoelkopf et al. and a somewhat different version of one-class SVM based on identifying "outlier" data as representative of the second-class. We report on experiments with different kernels for both of these implementations and with different representations of the data, including binary vectors, tf-idf representation and a modification called "Hadamard" representation. Then we compared it with one-class versions of the algorithms prototype (Rocchio), nearest neighbor, naive Bayes, and finally a natural one-class neural network classification method based on "bottleneck" compression generated filters.The SVM approach as represented by Schoelkopf was superior to all the methods except the neural network one, where it was, although occasionally worse, essentially comparable. However, the SVM methods turned out to be quite sensitive to the choice of representation and kernel in ways which are not well understood; therefore, for the time being leaving the neural network approach as the most robust.
TL;DR: The methods of this paper are illustrated for RBF kernels and demonstrate how to obtain robust estimates with selection of an appropriate number of hidden units, in the case of outliers or non-Gaussian error distributions with heavy tails.
TL;DR: This work shows how to obtain accurate probability estimates for multiclass problems by combining calibrated binary probability estimates, and proposes a new method for obtaining calibrated two-class probability estimates that can be applied to any classifier that produces a ranking of examples.
Abstract: Class membership probability estimates are important for many applications of data mining in which classification outputs are combined with other sources of information for decision-making, such as example-dependent misclassification costs, the outputs of other classifiers, or domain knowledge. Previous calibration methods apply only to two-class problems. Here, we show how to obtain accurate probability estimates for multiclass problems by combining calibrated binary probability estimates. We also propose a new method for obtaining calibrated two-class probability estimates that can be applied to any classifier that produces a ranking of examples. Using naive Bayes and support vector machine classifiers, we give experimental results from a variety of two-class and multiclass domains, including direct marketing, text categorization and digit recognition.
TL;DR: Learning To Classify Text Using Support Vector Machines (LTSVMs) as discussed by the authors is a new approach to generate text classifiers from examples, which combines high performance and efficiency with theoretical understanding and improved robustness.
Abstract: Based on ideas from Support Vector Machines (SVMs), Learning To Classify Text Using Support Vector Machines presents a new approach to generating text classifiers from examples. The approach combines high performance and efficiency with theoretical understanding and improved robustness. In particular, it is highly effective without greedy heuristic components. The SVM approach is computationally efficient in training and classification, and it comes with a learning theory that can guide real-world applications. Learning To Classify Text Using Support Vector Machines gives a complete and detailed description of the SVM approach to learning text classifiers, including training algorithms, transductive text classification, efficient performance estimation, and a statistical learning model of text classification. In addition, it includes an overview of the field of text classification, making it self-contained even for newcomers to the field. This book gives a concise introduction to SVMs for pattern recognition, and it includes a detailed description of how to formulate text-classification tasks for machine learning.
TL;DR: This work introduces kernels defined over shallow parse representations of text, and design efficient algorithms for computing the kernels, and uses the devised kernels in conjunction with Support Vector Machine and Voted Perceptron learning algorithms for the task of extracting person-affiliation and organization-location relations from text.
Abstract: We present an application of kernel methods to extracting relations from unstructured natural language sources. We introduce kernels defined over shallow parse representations of text, and design efficient algorithms for computing the kernels. We use the devised kernels in conjunction with Support Vector Machine and Voted Perceptron learning algorithms for the task of extracting person-affiliation and organization-location relations from text. We experimentally evaluate the proposed methods and compare them with feature-based learning algorithms, with promising results.
TL;DR: Using a set of benchmark data from a KDD (knowledge discovery and data mining) competition designed by DARPA, it is demonstrated that efficient and accurate classifiers can be built to detect intrusions.
Abstract: Information security is an issue of serious global concern. The complexity, accessibility, and openness of the Internet have served to increase the security risk of information systems tremendously. This paper concerns intrusion detection. We describe approaches to intrusion detection using neural networks and support vector machines. The key ideas are to discover useful patterns or features that describe user behavior on a system, and use the set of relevant features to build classifiers that can recognize anomalies and known intrusions, hopefully in real time. Using a set of benchmark data from a KDD (knowledge discovery and data mining) competition designed by DARPA, we demonstrate that efficient and accurate classifiers can be built to detect intrusions. We compare the performance of neural networks based, and support vector machine based, systems for intrusion detection.
TL;DR: This work describes a representation of gait appearance based on simple features such as moments extracted from orthogonal view video silhouettes of human walking motion that contains enough information to perform well on human identification and gender classification tasks.
Abstract: We describe a representation of gait appearance for the purpose of person identification and classification This gait representation is based on simple features such as moments extracted from orthogonal view video silhouettes of human walking motion Despite its simplicity, the resulting feature vector contains enough information to perform well on human identification and gender classification tasks We explore the recognition behaviors of two different methods to aggregate features over time under different recognition tasks We demonstrate the accuracy of recognition using gait video sequences collected over different days and times and under varying lighting environments In addition, we show results for gender classification based our gait appearance features using a support-vector machine
TL;DR: This paper discusses for the first time the problem of designing output codes for multiclass problems, and gives a time and space efficient algorithm for solving the quadratic program.
Abstract: Output coding is a general framework for solving multiclass categorization problems. Previous research on output codes has focused on building multiclass machines given predefined output codes. In this paper we discuss for the first time the problem of designing output codes for multiclass problems. For the design problem of discrete codes, which have been used extensively in previous works, we present mostly negative results. We then introduce the notion of continuous codes and cast the design problem of continuous codes as a constrained optimization problem. We describe three optimization problems corresponding to three different norms of the code matrix. Interestingly, for the l2 norm our formalism results in a quadratic program whose dual does not depend on the length of the code. A special case of our formalism provides a multiclass scheme for building support vector machines which can be solved efficiently. We give a time and space efficient algorithm for solving the quadratic program. We describe preliminary experiments with synthetic data show that our algorithm is often two orders of magnitude faster than standard quadratic programming packages. We conclude with the generalization properties of the algorithm.
TL;DR: Nonlinear support vector machines are investigated for appearance-based gender classification with low-resolution "thumbnail" faces processed from the FERET (FacE REcognition Technology) face database, demonstrating robustness and stability with respect to scale and the degree of facial detail.
Abstract: Nonlinear support vector machines (SVMs) are investigated for appearance-based gender classification with low-resolution "thumbnail" faces processed from 1,755 images from the FERET (FacE REcognition Technology) face database. The performance of SVMs (3.4% error) is shown to be superior to traditional pattern classifiers (linear, quadratic, Fisher linear discriminant, nearest-neighbor) as well as more modern techniques, such as radial basis function (RBF) classifiers and large ensemble-RBF networks. Furthermore, the difference in classification performance with low-resolution "thumbnails" (21/spl times/12 pixels) and the corresponding higher-resolution images (84/spl times/48 pixels) was found to be only 1%, thus demonstrating robustness and stability with respect to scale and the degree of facial detail.
TL;DR: This work reports the recent achievement of the lowest reported test error on the well-known MNIST digit recognition benchmark task, with SVM training times that are also significantly faster than previous SVM methods.
Abstract: Practical experience has shown that in order to obtain the best possible performance, prior knowledge about invariances of a classification problem at hand ought to be incorporated into the training procedure. We describe and review all known methods for doing so in support vector machines, provide experimental results, and discuss their respective merits. One of the significant new results reported in this work is our recent achievement of the lowest reported test error on the well-known MNIST digit recognition benchmark task, with SVM training times that are also significantly faster than previous SVM methods.
TL;DR: A framework for sparse Gaussian process (GP) methods which uses forward selection with criteria based on information-theoretic principles, which allows for Bayesian model selection and is less complex in implementation is presented.
Abstract: We present a framework for sparse Gaussian process (GP) methods which uses forward selection with criteria based on information-theoretic principles, previously suggested for active learning. Our goal is not only to learn d-sparse predictors (which can be evaluated in O(d) rather than O(n), d ≪ n, n the number of training points), but also to perform training under strong restrictions on time and memory requirements. The scaling of our method is at most O(n · d2), and in large real-world classification experiments we show that it can match prediction performance of the popular support vector machine (SVM), yet can be significantly faster in training. In contrast to the SVM, our approximation produces estimates of predictive probabilities ('error bars'), allows for Bayesian model selection and is less complex in implementation.
TL;DR: In this article, a superset of the traditional two-dimensional space vector modulation (SVM) scheme is proposed for controlling the four-leg voltage-source converters, which can effectively provide the neutral connection in three-phase four-wire systems.
Abstract: Four-leg voltage-source converters can effectively provide the neutral connection in three-phase four-wire systems. They can be used in inverter, rectifier, and active filter applications to handle the neutral current caused by the unbalanced and/or nonlinear load or unbalanced source. In this paper, three-dimensional (3-D) space vector modulation (SVM) schemes are proposed for controlling the four-leg voltage-source converters. Important issues for 3-D SVM, such as definition of 3-D vectors, identification of adjacent switching vectors in the 3-D space, and switching vector sequencing schemes and comparisons are addressed. The proposed 3-D SVM is a superset of the traditional two-dimensional (2-D) SVM, and thus it inherits all the merits of the traditional 2-D SVM. A 100 kW 5 kHz four-leg inverter and a 20 kHz four-leg rectifier prototypes are built and controlled by the proposed 3-D SVM. Experimental results are presented to validate the effectiveness of the 3-D SVM.
TL;DR: The performance of both types of classifiers in two-class fault/no-fault recognition examples are examined and the attempts to improve the overall generalisationperformance of both techniques through the use of genetic algorithm based feature selection process are examined.
TL;DR: A brief introduction of SVMs is described and its numerous applications are summarized, which show good generalization performance on many real-life data and the approach is properly motivated theoretically.
Abstract: In this paper, we present a comprehensive survey on applications of Support Vector Machines (SVMs) for pattern recognition. Since SVMs show good generalization performance on many real-life data and the approach is properly motivated theoretically, it has been applied to wide range of applications. This paper describes a brief introduction of SVMs and summarizes its numerous applications.
TL;DR: Experimental results demonstrate the effectiveness of SVMs in texture classification, and it is shown that SVMs can incorporate conventional texture feature extraction methods within their own architecture, while also providing solutions to problems inherent in these methods.
Abstract: This paper investigates the application of support vector machines (SVMs) in texture classification. Instead of relying on an external feature extractor, the SVM receives the gray-level values of the raw pixels, as SVMs can generalize well even in high-dimensional spaces. Furthermore, it is shown that SVMs can incorporate conventional texture feature extraction methods within their own architecture, while also providing solutions to problems inherent in these methods. One-against-others decomposition is adopted to apply binary SVMs to multitexture classification, plus a neural network is used as an arbitrator to make final classifications from several one-against-others SVM outputs. Experimental results demonstrate the effectiveness of SVMs in texture classification.
TL;DR: Two main approaches to the problem of ranking k instances with the use of a "large margin" principle are introduced: the "fixed margin" policy in which the margin of the closest neighboring classes is being maximized and a direct generalization of SVM to ranking learning.
Abstract: We discuss the problem of ranking k instances with the use of a "large margin" principle. We introduce two main approaches: the first is the "fixed margin" policy in which the margin of the closest neighboring classes is being maximized — which turns out to be a direct generalization of SVM to ranking learning. The second approach allows for k - 1 different margins where the sum of margins is maximized. This approach is shown to reduce to v-SVM when the number of classes k - 2. Both approaches are optimal in size of 2l where l is the total number of training examples. Experiments performed on visual classification and "collaborative filtering" show that both approaches outperform existing ordinal regression algorithms applied for ranking and multi-class SVM applied to general multi-class classification.
TL;DR: This work shows how matching pursuit can be extended to use non-squared error loss functions, and how it can be used to build kernel-based solutions to machine learning problems, while keeping control of the sparsity of the solution.
Abstract: Matching Pursuit algorithms learn a function that is a weighted sum of basis functions, by sequentially appending functions to an initially empty basis, to approximate a target function in the least-squares sense. We show how matching pursuit can be extended to use non-squared error loss functions, and how it can be used to build kernel-based solutions to machine learning problems, while keeping control of the sparsity of the solution. We present a version of the algorithm that makes an optimal choice of both the next basis and the weights of all the previously chosen bases. Finally, links to boosting algorithms and RBF training procedures, as well as an extensive experimental comparison with SVMs for classification are given, showing comparable results with typically much sparser models.
TL;DR: It is shown that an NE recognizer based on Support Vector Machines (SVMs) gives better scores than conventional systems, but off-the-shelf SVM classifiers are too inefficient for this task.
Abstract: Named Entity (NE) recognition is a task in which proper nouns and numerical information are extracted from documents and are classified into categories such as person, organization, and date. It is a key technology of Information Extraction and Open-Domain Question Answering. First, we show that an NE recognizer based on Support Vector Machines (SVMs) gives better scores than conventional systems. However, off-the-shelf SVM classifiers are too inefficient for this task. Therefore, we present a method that makes the system substantially faster. This approach can also be applied to other similar tasks such as chunking and part-of-speech tagging. We also present an SVM-based feature selection method and an efficient training method.
TL;DR: This work generalizes SMO so that it can handle regression problems, and addresses problems with several modifications that enable caching to be effectively used with SMO.
Abstract: The sequential minimal optimization algorithm (SMO) has been shown to be an effective method for training support vector machines (SVMs) on classification tasks defined on sparse data sets. SMO differs from most SVM algorithms in that it does not require a quadratic programming solver. In this work, we generalize SMO so that it can handle regression problems. However, one problem with SMO is that its rate of convergence slows down dramatically when data is non-sparse and when there are many support vectors in the solution—as is often the case in regression—because kernel function evaluations tend to dominate the runtime in this case. Moreover, caching kernel function outputs can easily degrade SMO's performance even more because SMO tends to access kernel function outputs in an unstructured manner. We address these problems with several modifications that enable caching to be effectively used with SMO. For regression problems, our modifications improve convergence time by over an order of magnitude.