TL;DR: A new data set, NSL-KDD, is proposed, which consists of selected records of the complete KDD data set and does not suffer from any of the mentioned shortcomings.
Abstract: During the last decade, anomaly detection has attracted the attention of many researchers seeking to overcome the weakness of signature-based IDSs in detecting novel attacks, and KDDCUP'99 is the most widely used data set for the evaluation of these systems. Having conducted a statistical analysis on this data set, we found two important issues which highly affect the performance of evaluated systems and result in a very poor evaluation of anomaly detection approaches. To solve these issues, we have proposed a new data set, NSL-KDD, which consists of selected records of the complete KDD data set and does not suffer from any of the mentioned shortcomings.
TL;DR: An extension of the SPM method is developed, by generalizing vector quantization to sparse coding followed by multi-scale spatial max pooling, and a linear SPM kernel based on SIFT sparse codes is proposed, leading to state-of-the-art performance on several benchmarks by using a single type of descriptors.
Abstract: Recently, SVMs using the spatial pyramid matching (SPM) kernel have been highly successful in image classification. Despite their popularity, these nonlinear SVMs have a complexity of $O(n^2 \sim n^3)$ in training and $O(n)$ in testing, where $n$ is the training size, implying that it is nontrivial to scale up the algorithms to handle more than thousands of training images. In this paper we develop an extension of the SPM method, by generalizing vector quantization to sparse coding followed by multi-scale spatial max pooling, and propose a linear SPM kernel based on SIFT sparse codes. This new approach remarkably reduces the complexity of SVMs to $O(n)$ in training and a constant in testing. In a number of image categorization experiments, we find that, in terms of classification accuracy, the suggested linear SPM based on sparse coding of SIFT descriptors always significantly outperforms the linear SPM kernel on histograms, and is even better than the nonlinear SPM kernels, leading to state-of-the-art performance on several benchmarks by using a single type of descriptor.
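The multi-scale spatial max pooling step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the sparse codes are already computed, that descriptor positions are normalized to [0, 1), that pooling takes the maximum absolute code value per cell, and that the pyramid uses 1x1, 2x2 and 4x4 grids. The function name `spm_max_pool` and its parameters are hypothetical.

```python
import numpy as np

def spm_max_pool(codes, xy, levels=(1, 2, 4)):
    """Pool sparse codes over a spatial pyramid by taking per-cell maxima.

    codes : (m, k) array, one sparse code per local descriptor (assumed given)
    xy    : (m, 2) array of descriptor positions, normalized to [0, 1)
    levels: grid sizes per pyramid level (assumption: 1x1, 2x2, 4x4)
    """
    codes = np.abs(np.asarray(codes, dtype=float))
    xy = np.asarray(xy, dtype=float)
    k = codes.shape[1]
    pooled = []
    for g in levels:
        # Map each descriptor to a cell index on a g x g grid.
        cell = np.minimum((xy * g).astype(int), g - 1)
        flat = cell[:, 1] * g + cell[:, 0]
        for c in range(g * g):
            mask = flat == c
            # Empty cells contribute a zero vector.
            pooled.append(codes[mask].max(axis=0) if mask.any() else np.zeros(k))
    return np.concatenate(pooled)

codes = np.array([[0.0, -2.0, 0.5], [1.0, 0.0, 0.0]])
xy = np.array([[0.1, 0.1], [0.9, 0.9]])
feature = spm_max_pool(codes, xy)
```

The resulting vector has length k times the total number of cells and can be fed to a linear classifier, which is what makes training cost scale linearly with the number of images.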
TL;DR: In this article, the authors proposed a new method, objective perturbation, for privacy-preserving machine learning algorithm design, which perturbs the objective function before optimizing over classifiers.
Abstract: Privacy-preserving machine learning algorithms are crucial for the increasingly common setting in which personal data, such as medical or financial records, are analyzed. We provide general techniques to produce privacy-preserving approximations of classifiers learned via (regularized) empirical risk minimization (ERM). These algorithms are private under the $\epsilon$-differential privacy definition due to Dwork et al. (2006). First we apply the output perturbation ideas of Dwork et al. (2006), to ERM classification. Then we propose a new method, objective perturbation, for privacy-preserving machine learning algorithm design. This method entails perturbing the objective function before optimizing over classifiers. If the loss and regularizer satisfy certain convexity and differentiability criteria, we prove theoretical results showing that our algorithms preserve privacy, and provide generalization bounds for linear and nonlinear kernels. We further present a privacy-preserving technique for tuning the parameters in general machine learning algorithms, thereby providing end-to-end privacy guarantees for the training process. We apply these results to produce privacy-preserving analogues of regularized logistic regression and support vector machines. We obtain encouraging results from evaluating their performance on real demographic and benchmark data sets. Our results show that both theoretically and empirically, objective perturbation is superior to the previous state-of-the-art, output perturbation, in managing the inherent tradeoff between privacy and learning performance.
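Output perturbation can be sketched as adding calibrated noise to an already trained weight vector. The sketch below rests on assumptions that the abstract does not spell out: the classifier minimizes an average loss plus (lam / 2) * ||w||^2, the loss is convex, differentiable and 1-Lipschitz, and every feature vector has norm at most 1. Under those assumptions the minimizer's L2 sensitivity is at most 2 / (n * lam), and noise with density proportional to exp(-(n * lam * eps / 2) * ||b||) is used. The function name `output_perturbation` is hypothetical.

```python
import numpy as np

def output_perturbation(w, n, lam, eps, rng):
    """Return a noisy copy of the ERM solution w.

    Assumes (not stated in the abstract): ||x_i|| <= 1, a 1-Lipschitz convex
    loss, and regularizer (lam / 2) * ||w||^2, so the L2 sensitivity of the
    minimizer is at most 2 / (n * lam).
    """
    w = np.asarray(w, dtype=float)
    d = w.size
    # Uniformly random direction on the unit sphere.
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    # Noise norm follows Gamma(shape=d, scale=2 / (n * lam * eps)).
    norm = rng.gamma(shape=d, scale=2.0 / (n * lam * eps))
    return w + norm * direction

rng = np.random.default_rng(0)
w = np.array([0.5, -1.0, 0.25])
w_private = output_perturbation(w, n=10**9, lam=0.1, eps=1.0, rng=rng)
```

Objective perturbation differs in that the random term is added to the objective before optimization rather than to the solution afterwards.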
TL;DR: A survey of time series prediction applications using a novel machine learning approach: support vector machines (SVM).
Abstract: Time series prediction techniques have been used in many real-world applications such as financial market prediction, electric utility load forecasting, weather and environmental state prediction, and reliability forecasting. The underlying system models and time series data generating processes are generally complex for these applications and the models for these systems are usually not known a priori. Accurate and unbiased estimation of the time series data produced by these systems cannot always be achieved using well known linear techniques, and thus the estimation process requires more advanced time series prediction algorithms. This paper provides a survey of time series prediction applications using a novel machine learning approach: support vector machines (SVM). The underlying motivation for using SVMs is the ability of this methodology to accurately forecast time series data when the underlying system processes are typically nonlinear, non-stationary and not defined a priori. SVMs have also been proven to outperform other non-linear techniques including neural-network based non-linear prediction techniques such as multi-layer perceptrons. The ultimate goal is to provide the reader with insight into the applications using SVM for time series prediction, to give a brief tutorial on SVMs for time series prediction, to outline some of the advantages and challenges in using SVMs for time series prediction, and to provide a source for the reader to locate books, technical journals, and other online SVM research resources.
TL;DR: Of the four SVM variations considered in this paper, the novel granular SVMs-repetitive undersampling algorithm (GSVM-RU) is the best in terms of both effectiveness and efficiency.
Abstract: Traditional classification algorithms can be limited in their performance on highly unbalanced data sets. A popular stream of work for countering the problem of class imbalance has been the application of a variety of sampling strategies. In this paper, we focus on designing modifications to support vector machines (SVMs) to appropriately tackle the problem of class imbalance. We incorporate different “rebalance” heuristics in SVM modeling, including cost-sensitive learning, and over- and undersampling. These SVM-based strategies are compared with various state-of-the-art approaches on a variety of data sets by using various metrics, including G-mean, area under the receiver operating characteristic curve, F-measure, and area under the precision/recall curve. We show that we are able to surpass or match the previously known best algorithms on each data set. In particular, of the four SVM variations considered in this paper, the novel granular SVMs-repetitive undersampling algorithm (GSVM-RU) is the best in terms of both effectiveness and efficiency. GSVM-RU is effective, as it can minimize the negative effect of information loss while maximizing the positive effect of data cleaning in the undersampling process. GSVM-RU is efficient because it extracts far fewer support vectors and, hence, greatly speeds up SVM prediction.
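Two of the metrics named above, G-mean and F-measure, can be computed from a binary confusion matrix. The sketch below uses the standard definitions (G-mean as the geometric mean of sensitivity and specificity, F-measure as the harmonic mean of precision and recall); the function name `imbalance_metrics` and the choice to return 0.0 for undefined ratios are illustrative assumptions.

```python
import math

def imbalance_metrics(tp, fn, fp, tn):
    """Return (G-mean, F-measure) from binary confusion-matrix counts.

    The positive class is taken to be the minority class. Undefined ratios
    (zero denominators) are reported as 0.0, which is an illustrative choice.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0   # recall on positives
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # recall on negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    g_mean = math.sqrt(sensitivity * specificity)
    denom = precision + sensitivity
    f_measure = 2 * precision * sensitivity / denom if denom else 0.0
    return g_mean, f_measure

# 10 positives and 90 negatives: accuracy is 0.90, yet precision is only 0.5.
g_mean, f_measure = imbalance_metrics(tp=9, fn=1, fp=9, tn=81)
```

Both metrics penalize a classifier that ignores the minority class, which plain accuracy does not.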
TL;DR: The feasibility of classifying different human activities based on micro-Doppler signatures is investigated, and the potential for classifying human activities over extended time duration, through a wall, and at oblique angles with respect to the radar is discussed.
Abstract: The feasibility of classifying different human activities based on micro-Doppler signatures is investigated. Measured data of 12 human subjects performing seven different activities are collected using a Doppler radar. The seven activities include running, walking, walking while holding a stick, crawling, boxing while moving forward, boxing while standing in place, and sitting still. Six features are extracted from the Doppler spectrogram. A support vector machine (SVM) is then trained using the measurement features to classify the activities. A multiclass classification is implemented using a decision-tree structure. Optimal parameters for the SVM are found through a fourfold cross-validation. The resulting classification accuracy is found to be more than 90%. The potential for classifying human activities over extended time duration, through a wall, and at oblique angles with respect to the radar is also investigated and discussed.
TL;DR: An uncertainty measure is proposed that generalizes margin-based uncertainty to the multi-class case and is easy to compute, so that active learning can handle a large number of classes and large data sizes efficiently.
Abstract: One of the principal bottlenecks in applying learning techniques to classification problems is the large amount of labeled training data required. Especially for images and video, providing training data is very expensive in terms of human time and effort. In this paper we propose an active learning approach to tackle the problem. Instead of passively accepting random training examples, the active learning algorithm iteratively selects unlabeled examples for the user to label, so that human effort is focused on labeling the most “useful” examples. Our method relies on the idea of uncertainty sampling, in which the algorithm selects unlabeled examples that it finds hardest to classify. Specifically, we propose an uncertainty measure that generalizes margin-based uncertainty to the multi-class case and is easy to compute, so that active learning can handle a large number of classes and large data sizes efficiently. We demonstrate results for letter and digit recognition on datasets from the UCI repository, object recognition results on the Caltech-101 dataset, and scene categorization results on a dataset of 13 natural scene categories. The proposed method gives large reductions in the number of training examples required over random selection to achieve similar classification accuracy, with little computational overhead.
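One way to generalize margin-based uncertainty to many classes is to take the gap between the two highest class-probability estimates and query the examples with the smallest gap. Whether this matches the paper's exact measure is an assumption, as are the function name `select_most_uncertain` and the use of class-probability estimates as input.

```python
import numpy as np

def select_most_uncertain(probs, k):
    """Pick the k unlabeled examples with the smallest best-vs-second-best gap.

    probs : (n_examples, n_classes) class-probability estimates from any
            multi-class classifier (assumed available).
    """
    probs = np.asarray(probs, dtype=float)
    # After sorting each row, the last two columns hold the top two estimates.
    top2 = np.sort(probs, axis=1)[:, -2:]
    margin = top2[:, 1] - top2[:, 0]
    # A small margin means the classifier cannot separate its top two guesses.
    return np.argsort(margin, kind="stable")[:k]

probs = np.array([
    [0.90, 0.05, 0.05],   # confident
    [0.40, 0.35, 0.25],   # top two nearly tied
    [0.50, 0.10, 0.40],   # moderately uncertain
])
query = select_most_uncertain(probs, k=2)
```

The cost per round is one sort over the class scores of each unlabeled example, so the measure stays cheap as the number of classes grows.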
TL;DR: A new family of positive-definite kernel functions that mimic the computation in large, multilayer neural nets is introduced; these kernels can be used in shallow architectures, such as support vector machines (SVMs), or in deep kernel-based architectures that the authors call multilayer kernel machines (MKMs).
Abstract: We introduce a new family of positive-definite kernel functions that mimic the computation in large, multilayer neural nets. These kernel functions can be used in shallow architectures, such as support vector machines (SVMs), or in deep kernel-based architectures that we call multilayer kernel machines (MKMs). We evaluate SVMs and MKMs with these kernel functions on problems designed to illustrate the advantages of deep architectures. On several problems, we obtain better results than previous, leading benchmarks from both SVMs with Gaussian kernels as well as deep belief nets.
TL;DR: A method to identify and localize object classes in images by constructing a classifier on the histogram of local features found in each superpixel using superpixels as the basic unit of a class segmentation or pixel localization scheme.
Abstract: We propose a method to identify and localize object classes in images. Instead of operating at the pixel level, we advocate the use of superpixels as the basic unit of a class segmentation or pixel localization scheme. To this end, we construct a classifier on the histogram of local features found in each superpixel. We regularize this classifier by aggregating histograms in the neighborhood of each superpixel and then refine our results further by using the classifier in a conditional random field operating on the superpixel graph. Our proposed method exceeds the previously published state-of-the-art on two challenging datasets: Graz-02 and the PASCAL VOC 2007 Segmentation Challenge.
TL;DR: A large-margin formulation and algorithm for structured output prediction that allows the use of latent variables is presented, and the generality and performance of the approach are demonstrated through three applications: motif finding, noun-phrase coreference resolution, and optimizing precision at k in information retrieval.
Abstract: We present a large-margin formulation and algorithm for structured output prediction that allows the use of latent variables. Our proposal covers a large range of application problems, with an optimization problem that can be solved efficiently using Concave-Convex Programming. The generality and performance of the approach are demonstrated through three applications including motif finding, noun-phrase coreference resolution, and optimizing precision at k in information retrieval.
TL;DR: A method for detection of steganographic methods that embed in the spatial domain by adding a low-amplitude independent stego signal, an example of which is least significant bit (LSB) matching.
Abstract: This paper presents a novel method for detection of steganographic methods that embed in the spatial domain by adding a low-amplitude independent stego signal, an example of which is LSB matching. First, arguments are provided for modeling differences between adjacent pixels using first-order and second-order Markov chains. Subsets of sample transition probability matrices are then used as features for a steganalyzer implemented by support vector machines. The accuracy of the presented steganalyzer is evaluated on LSB matching and four different databases. The steganalyzer achieves superior accuracy with respect to prior art and provides stable results across various cover sources. Since the feature set based on second-order Markov chain is high-dimensional, we address the issue of curse of dimensionality using a feature selection algorithm and show that the curse did not occur in our experiments.
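The first-order Markov modeling of adjacent-pixel differences can be sketched as an empirical transition matrix. This is a simplified illustration, not the paper's feature extractor: it uses only left-to-right horizontal differences, and it clips differences to the range [-T, T] rather than discarding out-of-range pairs. The function name `difference_transition_matrix` and the threshold `T` are assumptions.

```python
import numpy as np

def difference_transition_matrix(image, T=3):
    """Estimate P(next difference | current difference) along image rows.

    Differences between horizontally adjacent pixels are clipped to [-T, T]
    (a simplification made here), giving a (2T+1) x (2T+1) matrix.
    """
    img = np.asarray(image, dtype=np.int64)
    d = np.clip(img[:, :-1] - img[:, 1:], -T, T)
    cur = d[:, :-1].ravel() + T   # shift values so they index from 0
    nxt = d[:, 1:].ravel() + T
    counts = np.zeros((2 * T + 1, 2 * T + 1))
    np.add.at(counts, (cur, nxt), 1)
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows with no observations stay all-zero.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

P = difference_transition_matrix([[0, 1, 2, 3], [3, 2, 1, 0]], T=3)
```

Flattening such matrices yields a feature vector for a classifier; a second-order model conditions on two preceding differences, so its dimensionality grows cubically in 2T+1.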
TL;DR: A new spectral-spatial classification scheme for hyperspectral images is proposed that improves the classification accuracies and provides classification maps with more homogeneous regions, when compared to pixel wise classification.
Abstract: A new spectral-spatial classification scheme for hyperspectral images is proposed. The method combines the results of a pixel wise support vector machine classification and the segmentation map obtained by partitional clustering using majority voting. The ISODATA algorithm and Gaussian mixture resolving techniques are used for image clustering. Experimental results are presented for two hyperspectral airborne images. The developed classification scheme improves the classification accuracies and provides classification maps with more homogeneous regions, when compared to pixel wise classification. The proposed method performs particularly well for classification of images with large spatial structures and when different classes have dissimilar spectral responses and a comparable number of pixels.
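The majority-voting step that combines the pixel wise classification with the segmentation map can be sketched as below. The sketch assumes both maps are already available as integer arrays of the same shape; the function name `majority_vote` is hypothetical, and ties are broken in favor of the smallest label, which is an arbitrary choice.

```python
import numpy as np

def majority_vote(pixel_labels, segments):
    """Relabel every segment with the most frequent pixel wise class inside it.

    pixel_labels : 2-D integer array from a pixel wise classifier (e.g. an SVM)
    segments     : 2-D integer array of region ids from a clustering step
    """
    pixel_labels = np.asarray(pixel_labels)
    segments = np.asarray(segments)
    out = np.empty_like(pixel_labels)
    for seg_id in np.unique(segments):
        mask = segments == seg_id
        values, counts = np.unique(pixel_labels[mask], return_counts=True)
        # argmax returns the first maximum, so ties go to the smallest label.
        out[mask] = values[np.argmax(counts)]
    return out

pixel_labels = np.array([[1, 1, 2],
                         [1, 2, 2]])
segments = np.array([[0, 0, 0],
                     [1, 1, 1]])
smoothed = majority_vote(pixel_labels, segments)
```

Isolated misclassified pixels inside a region are overwritten by the region's dominant class, which is why the resulting maps look more homogeneous.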
TL;DR: An efficient version of the RLDA recently presented by Ye to cope with critical ill-posed hyperspectral image classification problems is introduced to the remote sensing community, and several LDA-based classifiers are compared theoretically and experimentally with the standard LDA and the RLDA.
Abstract: This paper analyzes the classification of hyperspectral remote sensing images with linear discriminant analysis (LDA) in the presence of a small ratio between the number of training samples and the number of spectral features. In these particular ill-posed problems, a reliable LDA requires one to introduce regularization for problem solving. Nonetheless, in such a challenging scenario, the resulting regularized LDA (RLDA) is highly sensitive to the tuning of the regularization parameter. In this context, we introduce in the remote sensing community an efficient version of the RLDA recently presented by Ye to cope with critical ill-posed problems. In addition, several LDA-based classifiers (i.e., penalized LDA, orthogonal LDA, and uncorrelated LDA) are compared theoretically and experimentally with the standard LDA and the RLDA. Method differences are highlighted through toy examples and are exhaustively tested on several ill-posed problems related to the classification of hyperspectral remote sensing images. Experimental results confirm the effectiveness of the presented RLDA technique and point out the main properties of other analyzed LDA techniques in critical ill-posed hyperspectral image classification problems.
TL;DR: Details of the new paradigm and corresponding algorithms are discussed, some new algorithms are introduced, several specific forms of privileged information are considered, and superiority of the new learning paradigm over the classical learning paradigm when solving practical problems is demonstrated.
Abstract: In the Afterword to the second edition of the book "Estimation of Dependences Based on Empirical Data" by V. Vapnik, an advanced learning paradigm called Learning Using Hidden Information (LUHI) was introduced. This Afterword also suggested an extension of the SVM method (the so-called SVMγ+ method) to implement algorithms which address the LUHI paradigm (Vapnik, 1982-2006, Sections 2.4.2 and 2.5.3 of the Afterword). See also (Vapnik, Vashist, & Pavlovitch, 2008, 2009) for further development of the algorithms. In contrast to the existing machine learning paradigm where a teacher does not play an important role, the advanced learning paradigm considers some elements of human teaching. In the new paradigm, along with examples, a teacher can provide students with hidden information that exists in explanations, comments, comparisons, and so on. This paper discusses details of the new paradigm and corresponding algorithms, introduces some new algorithms, considers several specific forms of privileged information, demonstrates superiority of the new learning paradigm over the classical learning paradigm when solving practical problems, and discusses general questions related to the new ideas.
TL;DR: This study verified the effectiveness and robustness of SVMs in the classification of remotely sensed images and showed that SVMs, especially with the use of radial basis function kernel, outperform the maximum likelihood classifier in terms of overall and individual class accuracies.
TL;DR: Support vector machines are a family of machine learning methods originally introduced for the problem of classification and later generalized to various other situations, and are currently used in various domains of application, including bioinformatics, text categorization, and computer vision.
TL;DR: A least squares version of the recently proposed twin support vector machine (TSVM) for binary classification has comparable classification accuracy to that of TSVM but with considerably less computational time.
Abstract: In this paper we formulate a least squares version of the recently proposed twin support vector machine (TSVM) for binary classification. This formulation leads to an extremely simple and fast algorithm for generating binary classifiers based on two non-parallel hyperplanes. Here we attempt to solve two modified primal problems of TSVM, instead of the two dual problems usually solved. We show that the solution of the two modified primal problems reduces to solving just two systems of linear equations, as opposed to solving two quadratic programming problems along with two systems of linear equations in TSVM. Classification using a nonlinear kernel also leads to systems of linear equations. Our experiments on publicly available datasets indicate that the proposed least squares TSVM has comparable classification accuracy to that of TSVM but with considerably less computational time. Since the linear least squares TSVM can easily handle large datasets, we further investigated its efficiency for text categorization applications. Computational results demonstrate the effectiveness of the proposed method over linear proximal SVM on all the text corpora considered.
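The reduction to two linear systems can be sketched for the linear case. The sketch assumes a specific formulation that the abstract does not write out: each hyperplane minimizes the squared distances to its own class plus a weighted squared slack that pushes the other class to a signed distance of one. Setting the gradient of each objective to zero gives the two systems solved below. The names `fit_lstsvm`, `predict_lstsvm`, `c1` and `c2` are hypothetical.

```python
import numpy as np

def fit_lstsvm(A, B, c1=1.0, c2=1.0):
    """Fit two non-parallel hyperplanes by solving two linear systems.

    A, B : samples of class +1 and class -1, one row per sample.
    Assumed objectives (with E = [A 1], F = [B 1]):
      plane 1: min 0.5*||E u||^2 + 0.5*c1*||F u + 1||^2
      plane 2: min 0.5*||F v||^2 + 0.5*c2*||E v - 1||^2
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    E = np.hstack([A, np.ones((len(A), 1))])
    F = np.hstack([B, np.ones((len(B), 1))])
    # Zero-gradient conditions of the two objectives above.
    u = -np.linalg.solve(F.T @ F + E.T @ E / c1, F.T @ np.ones(len(B)))
    v = np.linalg.solve(E.T @ E + F.T @ F / c2, E.T @ np.ones(len(A)))
    return u, v

def predict_lstsvm(X, u, v):
    """Assign each row of X to the class whose hyperplane is nearer."""
    Z = np.hstack([np.asarray(X, dtype=float), np.ones((len(X), 1))])
    d1 = np.abs(Z @ u) / np.linalg.norm(u[:-1])
    d2 = np.abs(Z @ v) / np.linalg.norm(v[:-1])
    return np.where(d1 <= d2, 1, -1)

A = [[0, 0], [1, 0], [0, 1], [1, 1]]
B = [[5, 5], [6, 5], [5, 6], [6, 6]]
u, v = fit_lstsvm(A, B)
labels = predict_lstsvm([[0.5, 0.5], [5.5, 5.5]], u, v)
```

Each system has only as many unknowns as there are features plus one, so no quadratic programming solver is needed.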
TL;DR: This study investigates several widely-used unsupervised and supervised term weighting methods on benchmark data collections in combination with SVM and kNN algorithms and proposes a new simple supervised term weighting method, tf.rf, to improve the terms' discriminating power for the text categorization task.
Abstract: In the vector space model (VSM), text representation is the task of transforming the content of a textual document into a vector in the term space so that the document can be recognized and classified by a computer or a classifier. Different terms (i.e. words, phrases, or any other indexing units used to identify the contents of a text) have different importance in a text. Term weighting methods assign appropriate weights to the terms to improve the performance of text categorization. In this study, we investigate several widely-used unsupervised (traditional) and supervised term weighting methods on benchmark data collections in combination with SVM and kNN algorithms. In consideration of the distribution of relevant documents in the collection, we propose a new simple supervised term weighting method, i.e. tf.rf, to improve the terms' discriminating power for the text categorization task. In the controlled experiments, the supervised term weighting methods show mixed performance. Specifically, our proposed supervised term weighting method, tf.rf, performs consistently better than other term weighting methods, while other supervised term weighting methods based on information theory or statistical metrics perform the worst in all experiments. On the other hand, the popularly used tf.idf method has not shown uniformly good performance across different data sets.
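The abstract does not state the tf.rf formula. The sketch below uses the form rf = log2(2 + a / max(1, c)), where a is the number of positive-category documents containing the term and c is the number of negative-category documents containing it. That form is recalled from the published method rather than taken from the text here, so treat it as an assumption; the function name `tf_rf` is hypothetical.

```python
import math

def tf_rf(tf, a, c):
    """Weight a term by term frequency times relevance frequency.

    tf : frequency of the term in the document
    a  : number of positive-category documents containing the term
    c  : number of negative-category documents containing the term
    Assumed form: rf = log2(2 + a / max(1, c)).
    """
    rf = math.log2(2 + a / max(1, c))
    return tf * rf

# A term seen in 6 positive documents and 1 negative one is boosted threefold.
weight_discriminative = tf_rf(tf=3, a=6, c=1)
# A term that never occurs in the positive category keeps its raw frequency.
weight_neutral = tf_rf(tf=3, a=0, c=4)
```

Unlike idf, the weight depends on how the term is distributed between the categories, not only on how many documents contain it.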
TL;DR: In this paper, an advanced learning paradigm called Learning Using Hidden Information (LUHI) was introduced, where a teacher can provide students with hidden information that exists in explanations, comments, comparisons, and so on.
TL;DR: This paper describes a human detection method that augments widely used edge-based features with texture and color information, providing us with a much richer descriptor set, and is shown to outperform state-of-the-art techniques on three varied datasets.
Abstract: Significant research has been devoted to detecting people in images and videos. In this paper we describe a human detection method that augments widely used edge-based features with texture and color information, providing us with a much richer descriptor set. This augmentation results in an extremely high-dimensional feature space (more than 170,000 dimensions). In such high-dimensional spaces, classical machine learning algorithms such as SVMs are nearly intractable with respect to training. Furthermore, the number of training samples is much smaller than the dimensionality of the feature space, by at least an order of magnitude. Finally, the extraction of features from a densely sampled grid structure leads to a high degree of multicollinearity. To circumvent these data characteristics, we employ Partial Least Squares (PLS) analysis, an efficient dimensionality reduction technique, one which preserves significant discriminative information, to project the data onto a much lower dimensional subspace (20 dimensions, reduced from the original 170,000). Our human detection system, employing PLS analysis over the enriched descriptor set, is shown to outperform state-of-the-art techniques on three varied datasets including the popular INRIA pedestrian dataset, the low-resolution gray-scale DaimlerChrysler pedestrian dataset, and the ETHZ pedestrian dataset consisting of full-length videos of crowded scenes.
TL;DR: Two active learning algorithms for semiautomatic definition of training samples in remote sensing image classification, based on predefined heuristics, are proposed, which reach the same level of accuracy as larger data sets.
Abstract: In this paper, we propose two active learning algorithms for semiautomatic definition of training samples in remote sensing image classification. Based on predefined heuristics, the classifier ranks the unlabeled pixels and automatically chooses those that are considered the most valuable for its improvement. Once the pixels have been selected, the analyst labels them manually and the process is iterated. Starting with a small and nonoptimal training set, the model itself builds the optimal set of samples which minimizes the classification error. We have applied the proposed algorithms to a variety of remote sensing data, including very high resolution and hyperspectral images, using support vector machines. Experimental results confirm the consistency of the methods. The required number of training samples can be reduced to 10% using the methods proposed, reaching the same level of accuracy as larger data sets. A comparison with a state-of-the-art active learning method, margin sampling, is provided, highlighting advantages of the methods proposed. The effect of spatial resolution and separability of the classes on the quality of the selection of pixels is also discussed.
TL;DR: Experimental results show that the proposed model outperforms the SVR model with non-filtered forecasting variables and a random walk model.
Abstract: As financial time series are inherently noisy and non-stationary, forecasting them is regarded as one of the most challenging applications of time series forecasting. Due to its generalization capability and its ability to obtain a unique solution, support vector regression (SVR) has been successfully applied in financial time series forecasting. In the modeling of financial time series using SVR, one of the key problems is the inherent high noise. Thus, detecting and removing the noise are important but difficult tasks when building an SVR forecasting model. To alleviate the influence of noise, a two-stage modeling approach using independent component analysis (ICA) and support vector regression is proposed for financial time series forecasting. ICA is a novel statistical signal processing technique that was originally proposed to find the latent source signals from observed mixture signals without any prior knowledge of the mixing mechanism. The proposed approach first applies ICA to the forecasting variables to generate the independent components (ICs). After the ICs containing the noise are identified and removed, the remaining ICs are used to reconstruct the forecasting variables, which then contain less noise and serve as the input variables of the SVR forecasting model. In order to evaluate the performance of the proposed approach, the Nikkei 225 opening index and TAIEX closing index are used as illustrative examples. Experimental results show that the proposed model outperforms the SVR model with non-filtered forecasting variables and a random walk model.
TL;DR: A novel learning method, Support Vector Machine (SVM), is applied to different data sets which have two or more classes, and the comparative results using different kernel functions for all data samples are shown.
Abstract: Classification is one of the most important tasks in applications such as text categorization, tone recognition, image classification, micro-array gene expression, protein structure prediction, and data classification. Most of the existing supervised classification methods are based on traditional statistics, which can provide ideal results when the sample size tends to infinity. However, only finite samples can be acquired in practice. In this paper, a novel learning method, Support Vector Machine (SVM), is applied to different data sets (Diabetes data, Heart data, Satellite data and Shuttle data) which have two or more classes. SVM is a powerful machine learning method developed from statistical learning theory, and it has achieved significant results in several fields. Introduced in the early 1990s, SVMs led to an explosion of interest in machine learning. The foundations of SVM were developed by Vapnik, and the method is gaining popularity in the field of machine learning due to its many attractive features and promising empirical performance. The SVM method does not suffer from the limitations of data dimensionality and limited samples [1], [2]. In our experiment, the support vectors, which are critical for classification, are obtained by learning from the training samples. In this paper we show the comparative results using different kernel functions for all data samples.
TL;DR: A novel wrapper Algorithm for Feature Selection, using Support Vector Machines with kernel functions, based on a sequential backward selection, using the number of errors in a validation subset as the measure to decide which feature to remove in each iteration.
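TL;DR: This book covers pattern recognition for analytical data, moving from case studies, exploratory data analysis and preprocessing through two class, one class and multiclass classifiers to validation and optimization, selection of discriminatory variables, Bayesian methods, class separation indices, and methods for comparing different patterns.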
Abstract: Acknowledgements. Preface. 1 Introduction. 1.1 Past, Present and Future. 1.2 About this Book. Bibliography. 2 Case Studies. 2.1 Introduction. 2.2 Datasets, Matrices and Vectors. 2.3 Case Study 1: Forensic Analysis of Banknotes. 2.4 Case Study 2: Near Infrared Spectroscopic Analysis of Food. 2.5 Case Study 3: Thermal Analysis of Polymers. 2.6 Case Study 4: Environmental Pollution using Headspace Mass Spectrometry. 2.7 Case Study 5: Human Sweat Analysed by Gas Chromatography Mass Spectrometry. 2.8 Case Study 6: Liquid Chromatography Mass Spectrometry of Pharmaceutical Tablets. 2.9 Case Study 7: Atomic Spectroscopy for the Study of Hypertension. 2.10 Case Study 8: Metabolic Profiling of Mouse Urine by Gas Chromatography of Urine Extracts. 2.11 Case Study 9: Nuclear Magnetic Resonance Spectroscopy for Salival Analysis of the Effect of Mouthwash. 2.12 Case Study 10: Simulations. 2.13 Case Study 11: Null Dataset. 2.14 Case Study 12: GCMS and Microbiology of Mouse Scent Marks. Bibliography. 3 Exploratory Data Analysis. 3.1 Introduction. 3.2 Principal Components Analysis. 3.2.1 Background. 3.2.2 Scores and Loadings. 3.2.3 Eigenvalues. 3.2.4 PCA Algorithm. 3.2.5 Graphical Representation. 3.3 Dissimilarity Indices, Principal Co-ordinates Analysis and Ranking. 3.3.1 Dissimilarity. 3.3.2 Principal Co-ordinates Analysis. 3.3.3 Ranking. 3.4 Self Organizing Maps. 3.4.1 Background. 3.4.2 SOM Algorithm. 3.4.3 Initialization. 3.4.4 Training. 3.4.5 Map Quality. 3.4.6 Visualization. Bibliography. 4 Preprocessing. 4.1 Introduction. 4.2 Data Scaling. 4.2.1 Transforming Individual Elements. 4.2.2 Row Scaling. 4.2.3 Column Scaling. 4.3 Multivariate Methods of Data Reduction. 4.3.1 Largest Principal Components. 4.3.2 Discriminatory Principal Components. 4.3.3 Partial Least Squares Discriminatory Analysis Scores. 4.4 Strategies for Data Preprocessing. 4.4.1 Flow Charts. 4.4.2 Level 1. 4.4.3 Level 2. 4.4.4 Level 3. 4.4.5 Level 4. Bibliography. 5 Two Class Classifiers. 
5.1 Introduction. 5.1.1 Two Class Classifiers. 5.1.2 Preprocessing. 5.1.3 Notation. 5.1.4 Autoprediction and Class Boundaries. 5.2 Euclidean Distance to Centroids. 5.3 Linear Discriminant Analysis. 5.4 Quadratic Discriminant Analysis. 5.5 Partial Least Squares Discriminant Analysis. 5.5.1 PLS Method. 5.5.2 PLS Algorithm. 5.5.3 PLS-DA. 5.6 Learning Vector Quantization. 5.6.1 Voronoi Tesselation and Codebooks. 5.6.2 LVQ1. 5.6.3 LVQ3. 5.6.4 LVQ Illustration and Summary of Parameters. 5.7 Support Vector Machines. 5.7.1 Linear Learning Machines. 5.7.2 Kernels. 5.7.3 Controlling Complexity and Soft Margin SVMs. 5.7.4 SVM Parameters. Bibliography. 6 One Class Classifiers. 6.1 Introduction. 6.2 Distance Based Classifiers. 6.3 PC Based Models and SIMCA. 6.4 Indicators of Significance. 6.4.1 Gaussian Density Estimators and Chi-Squared. 6.4.2 Hotelling's T². 6.4.3 D-Statistic. 6.4.4 Q-Statistic or Squared Prediction Error. 6.4.5 Visualization of D- and Q-Statistics for Disjoint PC Models. 6.4.6 Multivariate Normality and What to do if it Fails. 6.5 Support Vector Data Description. 6.6 Summarizing One Class Classifiers. 6.6.1 Class Membership Plots. 6.6.2 ROC Curves. Bibliography. 7 Multiclass Classifiers. 7.1 Introduction. 7.2 EDC, LDA and QDA. 7.3 LVQ. 7.4 PLS. 7.4.1 PLS2. 7.4.2 PLS1. 7.5 SVM. 7.6 One against One Decisions. Bibliography. 8 Validation and Optimization. 8.1 Introduction. 8.1.1 Validation. 8.1.2 Optimization. 8.2 Classification Abilities, Contingency Tables and Related Concepts. 8.2.1 Two Class Classifiers. 8.2.2 Multiclass Classifiers. 8.2.3 One Class Classifiers. 8.3 Validation. 8.3.1 Testing Models. 8.3.2 Test and Training Sets. 8.3.3 Predictions. 8.3.4 Increasing the Number of Variables for the Classifier. 8.4 Iterative Approaches for Validation. 8.4.1 Predictive Ability, Model Stability, Classification by Majority Vote and Cross Classification Rate. 8.4.2 Number of Iterations. 8.4.3 Test and Training Set Boundaries. 8.5 Optimizing PLS Models. 
8.5.1 Number of Components: Cross-Validation and Bootstrap. 8.5.2 Thresholds and ROC Curves. 8.6 Optimizing Learning Vector Quantization Models. 8.7 Optimizing Support Vector Machine Models. Bibliography. 9 Determining Potential Discriminatory Variables. 9.1 Introduction. 9.1.1 Two Class Distributions. 9.1.2 Multiclass Distributions. 9.1.3 Multilevel and Multiway Distributions. 9.1.4 Sample Sizes. 9.1.5 Modelling after Variable Reduction. 9.1.6 Preliminary Variable Reduction. 9.2 Which Variables are most Significant? 9.2.1 Basic Concepts: Statistical Indicators and Rank. 9.2.2 T-Statistic and Fisher Weights. 9.2.3 Multiple Linear Regression, ANOVA and the F-Ratio. 9.2.4 Partial Least Squares. 9.2.5 Relationship between the Indicator Functions. 9.3 How Many Variables are Significant? 9.3.1 Probabilistic Approaches. 9.3.2 Empirical Methods: Monte Carlo. 9.3.3 Cost/Benefit of Increasing the Number of Variables. Bibliography. 10 Bayesian Methods and Unequal Class Sizes. 10.1 Introduction. 10.2 Contingency Tables and Bayes' Theorem. 10.3 Bayesian Extensions to Classifiers. Bibliography. 11 Class Separation Indices. 11.1 Introduction. 11.2 Davies Bouldin Index. 11.3 Silhouette Width and Modified Silhouette Width. 11.3.1 Silhouette Width. 11.3.2 Modified Silhouette Width. 11.4 Overlap Coefficient. Bibliography. 12 Comparing Different Patterns. 12.1 Introduction. 12.2 Correlation Based Methods. 12.2.1 Mantel Test. 12.2.2 RV Coefficient. 12.3 Consensus PCA. 12.4 Procrustes Analysis. Bibliography. Index.
TL;DR: In this article, a support vector machine (SVM) is used to predict hourly building cooling load; it achieves better accuracy and generalization than the traditional back-propagation (BP) neural network model.
TL;DR: This work considers regularized support vector machines and shows that they are precisely equivalent to a new robust optimization formulation, thus establishing robustness as the reason regularized SVMs generalize well and gives a new proof of consistency of (kernelized) SVMs.
Abstract: We consider regularized support vector machines (SVMs) and show that they are precisely equivalent to a new robust optimization formulation. We show that this equivalence of robust optimization and regularization has implications for both algorithms and analysis. In terms of algorithms, the equivalence suggests more general SVM-like algorithms for classification that explicitly build in protection against noise and, at the same time, control overfitting. On the analysis front, the equivalence of robustness and regularization provides a robust optimization interpretation for the success of regularized SVMs. We use this new robustness interpretation of SVMs to give a new proof of consistency of (kernelized) SVMs, thus establishing robustness as the reason regularized SVMs generalize well.
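To make the claimed equivalence concrete, the following is a hedged sketch of the kind of statement involved, written for the hinge loss and the Euclidean norm. The symbols (training pairs (x_i, y_i), weight vector w, offset b, disturbance vectors δ_i, budget c) are notation assumed here and do not appear in the abstract; the exact norms and the conditions under which the two problems coincide are those given in the paper itself.

```latex
% Robust formulation: worst case over disturbances with a shared budget c
\min_{w,b}\ \max_{(\delta_1,\dots,\delta_m)\in\mathcal{N}}\
  \sum_{i=1}^{m} \max\bigl[\,1 - y_i\bigl(\langle w,\, x_i - \delta_i\rangle + b\bigr),\ 0\,\bigr],
\qquad
\mathcal{N} = \Bigl\{(\delta_1,\dots,\delta_m) : \sum_{i=1}^{m} \lVert \delta_i \rVert_2 \le c \Bigr\}

% Regularized formulation: hinge loss plus a norm penalty with the same constant c
\min_{w,b}\ c\,\lVert w \rVert_2
  + \sum_{i=1}^{m} \max\bigl[\,1 - y_i\bigl(\langle w,\, x_i\rangle + b\bigr),\ 0\,\bigr]
```

Read this way, the regularization constant c plays the role of the total amount of disturbance the classifier is protected against, which is the intuition behind the robustness interpretation of generalization.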
TL;DR: A support vector classifier was trained that reliably distinguishes healthy volunteers from clinically depressed patients and two feature selection algorithms were implemented that incorporate reliability information into the feature selection process.
Abstract: The application of multivoxel pattern analysis methods has attracted increasing attention, particularly for brain state prediction and real-time functional MRI applications. Support vector classification is the most popular of these techniques, owing to reports that it has better prediction accuracy and is less sensitive to noise. Support vector classification was applied to learn functional connectivity patterns that distinguish patients with depression from healthy volunteers. In addition, two feature selection algorithms were implemented (one filter method, one wrapper method) that incorporate reliability information into the feature selection process. These reliability feature selection methods were compared to two previously proposed feature selection methods. A support vector classifier was trained that reliably distinguishes healthy volunteers from clinically depressed patients. The reliability feature selection methods outperformed previously utilized methods. The proposed framework for applying support vector classification to functional connectivity data is applicable to other disease states beyond major depression.
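TL;DR: An edited volume on kernel methods for remote sensing data analysis, covering supervised and semi-supervised image classification (including SVMs for hyperspectral data and a domain adaptation SVM with a circular validation strategy for land-cover map updating), function approximation and regression, and kernel-based feature extraction.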
Abstract: About the editors. List of authors. Preface. Acknowledgments. List of symbols. List of abbreviations. I Introduction. 1 Machine learning techniques in remote sensing data analysis (Bjorn Waske, Mathieu Fauvel, Jon Atli Benediktsson and Jocelyn Chanussot). 1.1 Introduction. 1.2 Supervised classification: algorithms and applications. 1.3 Conclusion. Acknowledgments. References. 2 An introduction to kernel learning algorithms (Peter V. Gehler and Bernhard Scholkopf). 2.1 Introduction. 2.2 Kernels. 2.3 The representer theorem. 2.4 Learning with kernels. 2.5 Conclusion. References. II Supervised image classification. 3 The Support Vector Machine (SVM) algorithm for supervised classification of hyperspectral remote sensing data (J. Anthony Gualtieri). 3.1 Introduction. 3.2 Aspects of hyperspectral data and its acquisition. 3.3 Hyperspectral remote sensing and supervised classification. 3.4 Mathematical foundations of supervised classification. 3.5 From structural risk minimization to a support vector machine algorithm. 3.6 Benchmark hyperspectral data sets. 3.7 Results. 3.8 Using spatial coherence. 3.9 Why do SVMs perform better than other methods? 3.10 Conclusions. References. 4 On training and evaluation of SVM for remote sensing applications (Giles M. Foody). 4.1 Introduction. 4.2 Classification for thematic mapping. 4.3 Overview of classification by a SVM. 4.4 Training stage. 4.5 Testing stage. 4.6 Conclusion. Acknowledgments. References. 5 Kernel Fisher's Discriminant with heterogeneous kernels (M. Murat Dundar and Glenn Fung). 5.1 Introduction. 5.2 Linear Fisher's Discriminant. 5.3 Kernel Fisher Discriminant. 5.4 Kernel Fisher's Discriminant with heterogeneous kernels. 5.5 Automatic kernel selection KFD algorithm. 5.6 Numerical results. 5.7 Conclusion. References. 6 Multi-temporal image classification with kernels (Jordi Munoz-Mari, Luis Gomez-Chova, Manel Martinez-Ramon, Jose Luis Rojo-Alvarez, Javier Calpe-Maravilla and Gustavo Camps-Valls). 
6.1 Introduction. 6.2 Multi-temporal classification and change detection with kernels. 6.3 Contextual and multi-source data fusion with kernels. 6.4 Multi-temporal/-source urban monitoring. 6.5 Conclusions. Acknowledgments. References. 7 Target detection with kernels (Nasser M. Nasrabadi). 7.1 Introduction. 7.2 Kernel learning theory. 7.3 Linear subspace-based anomaly detectors and their kernel versions. 7.4 Results. 7.5 Conclusion. References. 8 One-class SVMs for hyperspectral anomaly detection (Amit Banerjee, Philippe Burlina and Chris Diehl). 8.1 Introduction. 8.2 Deriving the SVDD. 8.3 SVDD function optimization. 8.4 SVDD algorithms for hyperspectral anomaly detection. 8.5 Experimental results. 8.6 Conclusions. References. III Semi-supervised image classification. 9 A domain adaptation SVM and a circular validation strategy for land-cover maps updating (Mattia Marconcini and Lorenzo Bruzzone). 9.1 Introduction. 9.2 Literature survey. 9.3 Proposed domain adaptation SVM. 9.4 Proposed circular validation strategy. 9.5 Experimental results. 9.6 Discussions and conclusion. References. 10 Mean kernels for semi-supervised remote sensing image classification (Luis Gomez-Chova, Javier Calpe-Maravilla, Lorenzo Bruzzone and Gustavo Camps-Valls). 10.1 Introduction. 10.2 Semi-supervised classification with mean kernels. 10.3 Experimental results. 10.4 Conclusions. Acknowledgments. References. IV Function approximation and regression. 11 Kernel methods for unmixing hyperspectral imagery (Joshua Broadwater, Amit Banerjee and Philippe Burlina). 11.1 Introduction. 11.2 Mixing models. 11.3 Proposed kernel unmixing algorithm. 11.4 Experimental results of the kernel unmixing algorithm. 11.5 Development of physics-based kernels for unmixing. 11.6 Physics-based kernel results. 11.7 Summary. References. 12 Kernel-based quantitative remote sensing inversion (Yanfei Wang, Changchun Yang and Xiaowen Li). 12.1 Introduction. 12.2 Typical kernel-based remote sensing inverse problems. 
12.3 Well-posedness and ill-posedness. 12.4 Regularization. 12.5 Optimization techniques. 12.6 Kernel-based BRDF model inversion. 12.7 Aerosol particle size distribution function retrieval. 12.8 Conclusion. Acknowledgments. References. 13 Land and sea surface temperature estimation by support vector regression (Gabriele Moser and Sebastiano B. Serpico). 13.1 Introduction. 13.2 Previous work. 13.3 Methodology. 13.4 Experimental results. 13.5 Conclusions. Acknowledgments. References. V Kernel-based feature extraction. 14 Kernel multivariate analysis in remote sensing feature extraction (Jeronimo Arenas-Garcia and Kaare Brandt Petersen). 14.1 Introduction. 14.2 Multivariate analysis methods. 14.3 Kernel multivariate analysis. 14.4 Sparse Kernel OPLS. 14.5 Experiments: pixel-based hyperspectral image classification. 14.6 Conclusions. Acknowledgments. References. 15 KPCA algorithm for hyperspectral target/anomaly detection (Yanfeng Gu). 15.1 Introduction. 15.2 Motivation. 15.3 Kernel-based feature extraction in hyperspectral images. 15.4 Kernel-based target detection in hyperspectral images. 15.5 Kernel-based anomaly detection in hyperspectral images. 15.6 Conclusions. Acknowledgments. References. 16 Remote sensing data classification with kernel nonparametric feature extractions (Bor-Chen Kuo, Jinn-Min Yang and Cheng-Hsuan Li). 16.1 Introduction. 16.2 Related feature extractions. 16.3 Kernel-based NWFE and FLFE. 16.4 Eigenvalue resolution with regularization. 16.5 Experiments. 16.6 Comments and conclusions. References. Index.
TL;DR: A simple yet powerful branch and bound scheme that allows efficient maximization of a large class of quality functions over all possible subimages and converges to a globally optimal solution typically in linear or even sublinear time, in contrast to the quadratic scaling of exhaustive or sliding window search.
Abstract: Most successful object recognition systems rely on binary classification, deciding only if an object is present or not, but not providing information on the actual object location. To estimate the object's location, one can take a sliding window approach, but this strongly increases the computational cost because the classifier or similarity function has to be evaluated over a large set of candidate subwindows. In this paper, we propose a simple yet powerful branch and bound scheme that allows efficient maximization of a large class of quality functions over all possible subimages. It converges to a globally optimal solution typically in linear or even sublinear time, in contrast to the quadratic scaling of exhaustive or sliding window search. We show how our method is applicable to different object detection and image retrieval scenarios. The achieved speedup allows the use of classifiers for localization that formerly were considered too slow for this task, such as SVMs with a spatial pyramid kernel or nearest-neighbor classifiers based on the χ² distance. We demonstrate state-of-the-art localization performance of the resulting systems on the UIUC Cars data set, the PASCAL VOC 2006 data set, and in the PASCAL VOC 2007 competition.
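The branch and bound idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the simplest kind of quality function, a sum of per-pixel weights inside the window, and the names `integral_image`, `rect_sum` and `best_subwindow` are hypothetical. Sets of rectangles are described by an interval for each of the four coordinates (top, bottom, left, right). The upper bound used here adds the positive weights inside the largest rectangle of a set to the negative weights inside the smallest one, and the set with the highest bound is always split next.

```python
import heapq


def integral_image(grid):
    """Return S where S[y][x] is the sum of grid rows < y and columns < x."""
    h, w = len(grid), len(grid[0])
    S = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += grid[y][x]
            S[y + 1][x + 1] = S[y][x + 1] + row_sum
    return S


def rect_sum(S, top, bottom, left, right):
    """Sum over an inclusive rectangle; an empty rectangle sums to 0."""
    if top > bottom or left > right:
        return 0.0
    return (S[bottom + 1][right + 1] - S[top][right + 1]
            - S[bottom + 1][left] + S[top][left])


def best_subwindow(weights):
    """Best-first branch and bound for the max-sum rectangle.

    Returns (score, (top, bottom, left, right)) with inclusive coordinates.
    """
    h, w = len(weights), len(weights[0])
    # Separate integral images for the positive and negative parts.
    pos = integral_image([[max(v, 0.0) for v in row] for row in weights])
    neg = integral_image([[min(v, 0.0) for v in row] for row in weights])

    def upper_bound(T, B, L, R):
        # Positive weights of the largest rectangle in the set plus
        # negative weights of the smallest one (possibly empty).
        largest = rect_sum(pos, T[0], B[1], L[0], R[1])
        smallest = rect_sum(neg, T[1], B[0], L[1], R[0])
        return largest + smallest

    start = ((0, h - 1), (0, h - 1), (0, w - 1), (0, w - 1))
    heap = [(-upper_bound(*start), start)]  # max-heap via negated bounds
    while heap:
        neg_bound, state = heapq.heappop(heap)
        widths = [hi - lo for lo, hi in state]
        if max(widths) == 0:
            # A single rectangle: the bound equals its true score.
            (t, _), (b, _), (l, _), (r, _) = state
            return -neg_bound, (t, b, l, r)
        i = widths.index(max(widths))  # split the widest interval
        lo, hi = state[i]
        mid = (lo + hi) // 2
        for part in ((lo, mid), (mid + 1, hi)):
            child = state[:i] + (part,) + state[i + 1:]
            T, B, L, R = child
            if T[0] > B[1] or L[0] > R[1]:
                continue  # the set contains no valid rectangle
            heapq.heappush(heap, (-upper_bound(*child), child))
    raise ValueError("no valid rectangle found")


example = [
    [-1, -1, -1, -1],
    [-1,  2,  3, -1],
    [-1,  1, -4, -1],
    [-1, -1, -1,  6],
]
score, box = best_subwindow(example)
print(score, box)
```

Because the bound is exact for a set containing a single rectangle, the first such set popped from the queue is a global maximizer. How many sets are examined before that happens depends on how tight the bound is for the data at hand.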