TL;DR: In this article, the first 30 m resolution global land-cover maps were produced using Landsat Thematic Mapper (TM) and Enhanced Thematic Mapper Plus (ETM+) data. The authors employed four freely available classifiers: the conventional maximum likelihood classifier (MLC), the J4.8 decision tree classifier, the Random Forest (RF) classifier, and the support vector machine (SVM) classifier.
Abstract: We have produced the first 30 m resolution global land-cover maps using Landsat Thematic Mapper (TM) and Enhanced Thematic Mapper Plus (ETM+) data. We have classified over 6600 scenes of Landsat TM data after 2006, and over 2300 scenes of Landsat TM and ETM+ data before 2006, all selected from the green season. These images cover most of the world's land surface except Antarctica and Greenland. Most of these images came from the United States Geological Survey at processing level L1T (orthorectified). Four freely available classifiers were employed: the conventional maximum likelihood classifier (MLC), the J4.8 decision tree classifier, the Random Forest (RF) classifier, and the support vector machine (SVM) classifier. A total of 91,433 training samples were collected by traversing each scene and finding the most representative and homogeneous samples. A total of 38,664 test samples were collected at preset, fixed locations based on a globally systematic unaligned sampling strategy. Two software tools, Global Analyst and Global Mapper, developed by extending the functionality of Google Earth, were used in developing the training and test sample databases by referencing the Moderate Resolution Imaging Spectroradiometer enhanced vegetation index (MODIS EVI) time series for 2010 and high-resolution images from Google Earth. A unique land-cover classification system was developed that can be crosswalked to the existing United Nations Food and Agriculture Organization (FAO) land-cover classification system as well as the International Geosphere-Biosphere Programme (IGBP) system. Using the four classification algorithms, we obtained the initial set of global land-cover maps. The SVM produced the highest overall classification accuracy (OCA), 64.9% as assessed with our test samples, followed by RF (59.8%), J4.8 (57.9%), and MLC (53.9%).
We also estimated the OCAs using a subset of 8629 of our test samples, each of which represented a homogeneous area greater than 500 m × 500 m. Using this subset, we found the OCA for the SVM to be 71.5%. As a consistent source for estimating the coverage of global land-cover types, the test samples show that only 6.90% of the world is planted for agricultural production. The total area of cropland is 11.51% if unplanted croplands are included. The forests, grasslands, and shrublands cover 28.35%, 13.37%, and 11.49% of the world, respectively. The impervious surface covers only 0.66% of the world. Inland waterbodies, barren lands, and snow and ice cover 3.56%, 16.51%, and 12.81% of the world, respectively.
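The OCA figures above are the share of test samples whose mapped class agrees with the reference class, and the coverage percentages are class proportions among the test samples. A minimal sketch of both calculations follows, assuming each test sample represents an equal area and using hypothetical toy labels rather than the paper's data:

```python
from collections import Counter

def overall_accuracy(reference, predicted):
    """Overall classification accuracy (OCA): share of test samples whose
    mapped class matches the reference class."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("need two equal-length, non-empty label lists")
    correct = sum(r == p for r, p in zip(reference, predicted))
    return correct / len(reference)

def class_shares(labels):
    """Percentage of samples per class. Assuming every sample stands for an
    equal area, this approximates each class's share of the land surface."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: 100.0 * n / total for cls, n in counts.items()}

# Hypothetical reference labels and map labels at five test locations.
reference = ["forest", "crop", "forest", "water", "grass"]
predicted = ["forest", "grass", "forest", "water", "grass"]
print(overall_accuracy(reference, predicted))
print(class_shares(reference))
```

Four of the five toy samples agree, so the OCA here is 0.8; per-class accuracies would come from the same pairs grouped by reference class.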
TL;DR: This paper explores the nature of open set recognition and formalizes its definition as a constrained minimization problem, and introduces a novel “1-vs-set machine,” which sculpts a decision space from the marginal distances of a 1-class or binary SVM with a linear kernel.
Abstract: To date, almost all experimental evaluations of machine learning-based recognition algorithms in computer vision have taken the form of “closed set” recognition, whereby all testing classes are known at training time. A more realistic scenario for vision applications is “open set” recognition, where incomplete knowledge of the world is present at training time, and unknown classes can be submitted to an algorithm during testing. This paper explores the nature of open set recognition and formalizes its definition as a constrained minimization problem. The open set recognition problem is not well addressed by existing algorithms because it requires strong generalization. As a step toward a solution, we introduce a novel “1-vs-set machine,” which sculpts a decision space from the marginal distances of a 1-class or binary SVM with a linear kernel. This methodology applies to several different applications in computer vision where open set recognition is a challenging problem, including object recognition and face verification. We consider both in this work, with large scale cross-dataset experiments performed over the Caltech 256 and ImageNet sets, as well as face matching experiments performed over the Labeled Faces in the Wild set. The experiments highlight the effectiveness of machines adapted for open set evaluation compared to existing 1-class and binary SVMs for the same tasks.
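The decision slab of a 1-vs-set machine can be illustrated with a minimal sketch. It assumes an already trained linear SVM with hypothetical weights `w` and bias `b`, and two hypothetical slab boundaries `near` and `far`; the paper's procedure for positioning and refining these planes is not reproduced here.

```python
def linear_score(w, b, x):
    """Signed decision value of a linear SVM: w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def binary_svm_accept(w, b, x):
    """Closed-set rule: everything on the positive side is accepted."""
    return linear_score(w, b, x) >= 0.0

def one_vs_set_accept(w, b, x, near, far):
    """Open-set rule: accept only scores inside the slab [near, far].

    Points scoring beyond `far` lie far from the training data, in open
    space, so they are rejected as possibly belonging to unknown classes.
    """
    return near <= linear_score(w, b, x) <= far

w, b = [1.0, 0.0], 0.0      # hypothetical trained hyperplane
near, far = 0.5, 3.0        # hypothetical slab boundaries
for x in ([1.0, 5.0], [10.0, 0.0], [-1.0, 0.0]):
    print(x, binary_svm_accept(w, b, x), one_vs_set_accept(w, b, x, near, far))
```

The point `[10.0, 0.0]` is accepted by the ordinary binary rule but rejected by the slab, which is the behavior open set recognition calls for.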
TL;DR: The results using L2-SVMs show that simply replacing softmax with linear SVMs gives significant gains on popular deep learning datasets: MNIST, CIFAR-10, and the ICML 2013 Representation Learning Workshop's face expression recognition challenge.
Abstract: Recently, fully-connected and convolutional neural networks have been trained to achieve state-of-the-art performance on a wide variety of tasks such as speech recognition, image classification, natural language processing, and bioinformatics. For classification tasks, most of these "deep learning" models employ the softmax activation function for prediction and minimize cross-entropy loss. In this paper, we demonstrate a small but consistent advantage of replacing the softmax layer with a linear support vector machine. Learning minimizes a margin-based loss instead of the cross-entropy loss. While there have been various combinations of neural nets and SVMs in prior art, our results using L2-SVMs show that simply replacing softmax with linear SVMs gives significant gains on popular deep learning datasets: MNIST, CIFAR-10, and the ICML 2013 Representation Learning Workshop's face expression recognition challenge.
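The difference between the two output layers lies only in the loss applied to the final scores. A minimal sketch of both losses for a single example follows, assuming one-vs-rest targets (+1 for the true class, -1 otherwise) for the squared hinge (L2-SVM) loss; the weight-regularization term of the full SVM objective and the rest of the network are omitted, and the scores are hypothetical:

```python
import math

def softmax_cross_entropy(scores, label):
    """Softmax cross-entropy loss for one example."""
    m = max(scores)  # subtract the max score for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[label]

def l2_svm_loss(scores, label):
    """One-vs-rest squared hinge loss (L2-SVM) for one example.

    The true class gets target +1 and every other class -1; each margin
    violation max(0, 1 - t * s) is squared and summed.
    """
    loss = 0.0
    for k, s in enumerate(scores):
        t = 1.0 if k == label else -1.0
        loss += max(0.0, 1.0 - t * s) ** 2
    return loss

def predict(scores):
    """Both output layers predict the class with the largest score."""
    return max(range(len(scores)), key=lambda k: scores[k])

scores = [0.5, 0.2, -1.0]   # hypothetical final-layer scores, true class 0
print(predict(scores), softmax_cross_entropy(scores, 0), l2_svm_loss(scores, 0))
```

Unlike cross-entropy, the squared hinge loss becomes exactly zero once every class score is on the correct side of its margin, so well-classified examples stop contributing gradient.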
TL;DR: The experimental results show that the two PSO-based multi-objective algorithms can automatically evolve a set of nondominated solutions and the first algorithm outperforms the two conventional methods, the single objective method, and the two-stage algorithm.
Abstract: Classification problems often have a large number of features in the data sets, but not all of them are useful for classification. Irrelevant and redundant features may even reduce the performance. Feature selection aims to choose a small number of relevant features to achieve similar or even better classification performance than using all features. It has two main conflicting objectives of maximizing the classification performance and minimizing the number of features. However, most existing feature selection algorithms treat the task as a single objective problem. This paper presents the first study on multi-objective particle swarm optimization (PSO) for feature selection. The task is to generate a Pareto front of nondominated solutions (feature subsets). We investigate two PSO-based multi-objective feature selection algorithms. The first algorithm introduces the idea of nondominated sorting into PSO to address feature selection problems. The second algorithm applies the ideas of crowding, mutation, and dominance to PSO to search for the Pareto front solutions. The two multi-objective algorithms are compared with two conventional feature selection methods, a single objective feature selection method, a two-stage feature selection algorithm, and three well-known evolutionary multi-objective algorithms on 12 benchmark data sets. The experimental results show that the two PSO-based multi-objective algorithms can automatically evolve a set of nondominated solutions. The first algorithm outperforms the two conventional methods, the single objective method, and the two-stage algorithm. It achieves comparable results with the existing three well-known multi-objective algorithms in most cases. The second algorithm achieves better results than the first algorithm and all other methods mentioned previously.
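A minimal sketch of the dominance test and nondominated sorting that underlie the Pareto front, assuming each feature subset is scored by two minimized objectives, (classification error rate, number of selected features); the PSO search itself is not shown, and the candidate values are hypothetical:

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better
    on at least one (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def nondominated(solutions):
    """Solutions that no other solution dominates: the Pareto front."""
    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]

def sort_into_fronts(solutions):
    """Nondominated sorting: repeatedly peel off the current Pareto front."""
    remaining = list(solutions)
    fronts = []
    while remaining:
        front = nondominated(remaining)
        fronts.append(front)
        remaining = [s for s in remaining if s not in front]
    return fronts

# Hypothetical (error rate, number of features) pairs for five subsets.
candidates = [(0.10, 5), (0.20, 3), (0.15, 6), (0.30, 1), (0.25, 3)]
print(sort_into_fronts(candidates))
```

The first front trades accuracy for compactness; `(0.15, 6)` and `(0.25, 3)` fall to the second front because `(0.10, 5)` and `(0.20, 3)` beat them on both objectives.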
TL;DR: An empirical comparison between SVM and ANN for document-level sentiment analysis is presented; it indicates that ANN produces results superior or at least comparable to SVM's, except in some unbalanced-data contexts, and on the Movies reviews benchmark ANN outperformed SVM significantly even with unbalanced data.
Abstract: Document-level sentiment classification aims to automate the task of classifying a textual review, which is given on a single topic, as expressing a positive or negative sentiment. In general, supervised methods consist of two stages: (i) extraction/selection of informative features and (ii) classification of reviews by using learning models like Support Vector Machines (SVM) and Naïve Bayes (NB). SVM has been extensively and successfully used as a sentiment learning approach, while Artificial Neural Networks (ANN) have rarely been considered in comparative studies in the sentiment analysis literature. This paper presents an empirical comparison between SVM and ANN regarding document-level sentiment analysis. We discuss requirements, resulting models, and contexts in which both approaches achieve better levels of classification accuracy. We adopt a standard evaluation context with popular supervised methods for feature selection and weighting in a traditional bag-of-words model. Except for some unbalanced data contexts, our experiments indicated that ANN produces superior or at least comparable results to SVM. Especially on the benchmark dataset of Movies reviews, ANN outperformed SVM by a statistically significant difference, even in the context of unbalanced data. Our results have also confirmed some potential limitations of both models that have rarely been discussed in the sentiment classification literature, such as the computational cost of SVM at run time and of ANN at training time.
TL;DR: The SARIMA model coupled with a Kalman filter is the most accurate model; however, the proposed seasonal support vector regressor turns out to be highly competitive when performing forecasts during the most congested periods.
Abstract: The literature on short-term traffic flow forecasting has undergone great development recently. Many works, describing a wide variety of different approaches, which very often share similar features and ideas, have been published. However, publications presenting new prediction algorithms usually employ different settings, data sets, and performance measurements, making it difficult to infer a clear picture of the advantages and limitations of each model. The aim of this paper is twofold. First, we review existing approaches to short-term traffic flow forecasting methods under the common view of probabilistic graphical models, presenting an extensive experimental comparison, which proposes a common baseline for their performance analysis and provides the infrastructure to operate on a publicly available data set. Second, we present two new support vector regression models, which are specifically devised to benefit from typical traffic flow seasonality and are shown to represent an interesting compromise between prediction accuracy and computational efficiency. The SARIMA model coupled with a Kalman filter is the most accurate model; however, the proposed seasonal support vector regressor turns out to be highly competitive when performing forecasts during the most congested periods.
TL;DR: Investigating the Parkinson dataset using well-known machine learning tools, sustained vowels are found to carry more PD-discriminative information and representing the samples of a subject with central tendency and dispersion metrics improves generalization of the predictive model.
Abstract: There has been an increased interest in speech pattern analysis applications of Parkinsonism for building predictive telediagnosis and telemonitoring models. For this purpose, we have collected a wide variety of voice samples, including sustained vowels, words, and sentences compiled from a set of speaking exercises for people with Parkinson's disease. There are two main issues in learning from such a dataset, which consists of multiple speech recordings per subject: 1) How predictive are the various types of voice samples (e.g., sustained vowels versus words) in Parkinson's disease (PD) diagnosis? 2) How well do central tendency and dispersion metrics serve as representatives of all sample recordings of a subject? In this paper, investigating our Parkinson dataset using well-known machine learning tools, as reported in the literature, we find that sustained vowels carry more PD-discriminative information. We have also found that rather than using each voice recording of each subject as an independent data sample, representing the samples of a subject with central tendency and dispersion metrics improves generalization of the predictive model.
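Representing a subject by summary statistics instead of treating each recording as an independent sample can be sketched as follows. The choice of mean and median (central tendency) and population standard deviation (dispersion) is an illustrative assumption, not necessarily the paper's exact set of metrics, and the recordings are hypothetical:

```python
from collections import defaultdict
from statistics import mean, median, pstdev

def summarize_subjects(recordings):
    """Collapse each subject's recordings into one feature vector.

    `recordings` is a list of (subject_id, feature_vector) pairs. The output
    lists the per-feature means, then the per-feature medians, then the
    per-feature population standard deviations across that subject's
    recordings.
    """
    by_subject = defaultdict(list)
    for subject, features in recordings:
        by_subject[subject].append(features)
    summary = {}
    for subject, rows in by_subject.items():
        columns = list(zip(*rows))  # one tuple of values per feature
        summary[subject] = ([mean(c) for c in columns]
                            + [median(c) for c in columns]
                            + [pstdev(c) for c in columns])
    return summary

# Hypothetical acoustic features (two per recording) for two subjects.
recordings = [("s1", [1.0, 2.0]), ("s1", [3.0, 4.0]), ("s2", [5.0, 5.0])]
print(summarize_subjects(recordings))
```

Each subject then contributes exactly one training example, which also prevents recordings of the same person from appearing on both sides of a train/test split.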
TL;DR: Two distance-based classifiers, the k-nearest neighbor (k-NN) and nearest class mean (NCM) classifiers are considered, and a new metric learning approach is introduced for the latter, and an extension of the NCM classifier is introduced to allow for richer class representations.
Abstract: We study large-scale image classification methods that can incorporate new classes and training images continuously over time at negligible cost. To this end, we consider two distance-based classifiers, the k-nearest neighbor (k-NN) and nearest class mean (NCM) classifiers, and introduce a new metric learning approach for the latter. We also introduce an extension of the NCM classifier to allow for richer class representations. Experiments on the ImageNet 2010 challenge dataset, which contains over 10^6 training images of 1,000 classes, show that, surprisingly, the NCM classifier compares favorably to the more flexible k-NN classifier. Moreover, the NCM performance is comparable to that of linear SVMs which obtain current state-of-the-art performance. Experimentally, we study the generalization performance to classes that were not used to learn the metrics. Using a metric learned on 1,000 classes, we show results for the ImageNet-10K dataset which contains 10,000 classes, and obtain performance that is competitive with the current state-of-the-art while being orders of magnitude faster. Furthermore, we show how a zero-shot class prior based on the ImageNet hierarchy can improve performance when few training images are available.
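A minimal sketch of the NCM decision rule, showing why adding a class costs only one mean computation. It uses plain Euclidean distance as a stand-in for the learned metric described in the abstract, and the toy vectors and class names are hypothetical:

```python
import math

class NearestClassMean:
    """Nearest class mean (NCM) classifier with Euclidean distance."""

    def __init__(self):
        self.means = {}  # class label -> mean vector

    def add_class(self, label, vectors):
        """Adding a class only requires averaging its own training vectors;
        the other class means are left untouched."""
        n = len(vectors)
        self.means[label] = [sum(col) / n for col in zip(*vectors)]

    def predict(self, x):
        """Return the label whose mean is closest to x."""
        return min(self.means, key=lambda c: math.dist(x, self.means[c]))

clf = NearestClassMean()
clf.add_class("cat", [[0.0, 0.0], [2.0, 0.0]])        # hypothetical data
clf.add_class("dog", [[10.0, 10.0], [12.0, 10.0]])
print(clf.predict([1.0, 1.0]))
clf.add_class("bird", [[0.0, 10.0], [2.0, 10.0]])     # new class, no retraining
print(clf.predict([1.0, 9.0]))
```

With a learned linear projection, the same rule would compare projected vectors; the cost of adding a class stays one mean per class.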
TL;DR: This work proposes to learn more linearly separable and discriminative features from raw acoustic features and train linear SVMs, which are much easier and faster to train than kernel SVMs.
Abstract: Formulating speech separation as a binary classification problem has been shown to be effective. While good separation performance is achieved in matched test conditions using kernel support vector machines (SVMs), separation in unmatched conditions involving new speakers and environments remains a big challenge. A simple yet effective method to cope with the mismatch is to include many different acoustic conditions into the training set. However, large-scale training is almost intractable for kernel machines due to computational complexity. To enable training on relatively large datasets, we propose to learn more linearly separable and discriminative features from raw acoustic features and train linear SVMs, which are much easier and faster to train than kernel SVMs. For feature learning, we employ standard pre-trained deep neural networks (DNNs). The proposed DNN-SVM system is trained on a variety of acoustic conditions within a reasonable amount of time. Experiments on various test mixtures demonstrate good generalization to unseen speakers and background noises.
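Linear SVMs scale to large training sets partly because they can be trained with cheap stochastic updates. A minimal sketch follows, assuming the features have already been extracted (for example by a DNN) and using a Pegasos-style stochastic sub-gradient solver with no bias term; this is a generic linear SVM trainer, not the solver used in the paper, and the toy data are hypothetical:

```python
import random

def train_linear_svm(data, lam=0.01, iterations=1000, seed=0):
    """Pegasos-style stochastic sub-gradient training of a linear SVM.

    Minimizes lam/2 * ||w||^2 + mean hinge loss, without a bias term.
    `data` is a list of (feature_vector, label) pairs, labels in {-1, +1}.
    """
    rng = random.Random(seed)
    w = [0.0] * len(data[0][0])
    for t in range(1, iterations + 1):
        x, y = rng.choice(data)
        eta = 1.0 / (lam * t)                        # decreasing step size
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        w = [(1.0 - eta * lam) * wi for wi in w]     # regularization shrink
        if margin < 1.0:                             # hinge loss is active
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w

def predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0.0 else -1

# Hypothetical 2-D learned features with labels +1 / -1.
data = [([2.0, 2.0], 1), ([3.0, 1.0], 1), ([2.0, 3.0], 1),
        ([-2.0, -2.0], -1), ([-1.0, -3.0], -1), ([-3.0, -1.0], -1)]
w = train_linear_svm(data)
print(w, [predict(w, x) for x, _ in data])
```

Each update touches one example and costs time linear in the feature dimension, whereas a kernel SVM solver must work with pairwise kernel values between training examples.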
TL;DR: A novel PSO-SVM model that hybridizes particle swarm optimization (PSO) and SVM is proposed to improve EMG signal classification accuracy, and the results validate the superiority of the SVM method compared to conventional machine learning methods.
TL;DR: D-ADMM is proven to converge when the network is bipartite or when all the functions are strongly convex, although in practice, convergence is observed even when these conditions are not met.
Abstract: We propose a distributed algorithm, named Distributed Alternating Direction Method of Multipliers (D-ADMM), for solving separable optimization problems in networks of interconnected nodes or agents. In a separable optimization problem there is a private cost function and a private constraint set at each node. The goal is to minimize the sum of all the cost functions, constraining the solution to be in the intersection of all the constraint sets. D-ADMM is proven to converge when the network is bipartite or when all the functions are strongly convex, although in practice, convergence is observed even when these conditions are not met. We use D-ADMM to solve the following problems from signal processing and control: average consensus, compressed sensing, and support vector machines. Our simulations show that D-ADMM requires fewer communications than state-of-the-art algorithms to achieve a given accuracy level. Algorithms with low communication requirements are important, for example, in sensor networks, where sensors are typically battery-operated and communicating is the most energy-consuming operation.
TL;DR: A new kernel is derived by establishing a connection with the Riemannian geometry of symmetric positive definite matrices, effectively replacing the traditional spatial filtering approach for motor imagery EEG-based classification in brain-computer interface applications.
TL;DR: A forecasting model based on chaotic mapping, firefly algorithm, and support vector regression (SVR) is proposed to predict stock market price and performs best based on two error measures, namely mean squared error (MSE) and mean absolute percent error (MAPE).
Abstract: Due to the inherent non-linearity and non-stationary characteristics of financial stock market price time series, conventional modeling techniques such as the Box-Jenkins autoregressive integrated moving average (ARIMA) are not adequate for stock market price forecasting. In this paper, a forecasting model based on chaotic mapping, firefly algorithm, and support vector regression (SVR) is proposed to predict stock market price. The forecasting model has three stages. In the first stage, a delay coordinate embedding method is used to reconstruct unseen phase space dynamics. In the second stage, a chaotic firefly algorithm is employed to optimize SVR hyperparameters. Finally, in the third stage, the optimized SVR is used to forecast stock market price. The significance of the proposed algorithm is threefold. First, it integrates both chaos theory and the firefly algorithm to optimize SVR hyperparameters, whereas previous studies employ a genetic algorithm (GA) to optimize these parameters. Second, it uses a delay coordinate embedding method to reconstruct phase space dynamics. Third, it has high prediction accuracy due to its implementation of structural risk minimization (SRM). To show the applicability and superiority of the proposed algorithm, we selected the three most challenging stock market time series from NASDAQ historical quotes, namely the daily closing (last) stock prices of Intel, National Bank shares, and Microsoft, and applied the proposed algorithm to these data. Compared with genetic algorithm-based SVR (SVR-GA), chaotic genetic algorithm-based SVR (SVR-CGA), firefly-based SVR (SVR-FA), artificial neural networks (ANNs), and adaptive neuro-fuzzy inference systems (ANFIS), the proposed model performs best based on two error measures, namely mean squared error (MSE) and mean absolute percent error (MAPE).
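The first stage (delay coordinate embedding) and the two error measures can be sketched as follows. The embedding dimension `dim` and delay `tau` are hypothetical parameters chosen for illustration, the one-step-ahead target is an assumption, and the chaotic firefly search and the SVR itself are not shown:

```python
def delay_embed(series, dim, tau):
    """Delay-coordinate embedding with one-step-ahead targets.

    Input vector at time t: [x(t), x(t + tau), ..., x(t + (dim - 1) * tau)];
    its target is the value right after the last coordinate.
    """
    span = (dim - 1) * tau
    inputs, targets = [], []
    for t in range(len(series) - span - 1):
        inputs.append([series[t + k * tau] for k in range(dim)])
        targets.append(series[t + span + 1])
    return inputs, targets

def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percent error (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

prices = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # hypothetical closing prices
X, y = delay_embed(prices, dim=2, tau=2)
print(X, y)   # [[1.0, 3.0], [2.0, 4.0], [3.0, 5.0]] [4.0, 5.0, 6.0]
```

Each row of `X` would become one SVR training example with the matching entry of `y` as its target, and MSE and MAPE would then be computed on held-out forecasts.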
TL;DR: A new multifeature model, aiming to construct a support vector machine (SVM) ensemble combining multiple spectral and spatial features at both pixel and object levels is proposed, which provides more accurate classification results compared to the voting and probabilistic models.
Abstract: In recent years, the resolution of remotely sensed imagery has become increasingly high in both the spectral and spatial domains, which simultaneously provides more plentiful spectral and spatial information. Accordingly, the accurate interpretation of high-resolution imagery depends on effective integration of the spectral, structural and semantic features contained in the images. In this paper, we propose a new multifeature model, aiming to construct a support vector machine (SVM) ensemble combining multiple spectral and spatial features at both pixel and object levels. The features employed in this study include a gray-level co-occurrence matrix, differential morphological profiles, and an urban complexity index. Subsequently, three algorithms are proposed to integrate the multifeature SVMs: certainty voting, probabilistic fusion, and an object-based semantic approach. The proposed algorithms are compared with other multifeature SVM methods, including vector stacking, feature selection, and composite kernels. Experiments are conducted on the hyperspectral digital imagery collection experiment DC Mall data set and two WorldView-2 data sets. It is found that the multifeature model with semantic-based postprocessing provides more accurate classification results (an accuracy improvement of 1-4% for the three experimental data sets) compared to the voting and probabilistic models.
TL;DR: Two important improvements to SVR-based load forecasting are introduced: a procedure for generating model inputs and subsequent model input selection using feature selection algorithms; together with particle swarm optimization of the SVR hyperparameters, these reduce operator interaction.
Abstract: This paper presents a generic strategy for short-term load forecasting (STLF) based on support vector regression machines (SVR). Two important improvements to the SVR-based load forecasting method are introduced, namely a procedure for generating model inputs and subsequent model input selection using feature selection algorithms. One of the objectives of the proposed strategy is to reduce operator interaction in the model-building procedure. The proposed use of feature selection algorithms for automatic model input selection, and of a particle swarm global optimization technique for optimizing the SVR hyperparameters, reduces operator interaction. To confirm the effectiveness of the proposed modeling strategy, the model has been trained and tested on two publicly available and well-known load forecasting data sets and compared with state-of-the-art STLF algorithms, yielding improved accuracy.
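Hyperparameter search by particle swarm optimization can be sketched with a generic global-best PSO. The objective below is a hypothetical smooth stand-in for a validation error over two log-scaled SVR hyperparameters, and the swarm settings are common textbook defaults rather than values from the paper:

```python
import random

def pso_minimize(objective, bounds, n_particles=30, iterations=100,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Global-best particle swarm optimization of `objective` over a box."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]                 # personal bests
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]     # global best
    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # stay in box
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val

def toy_validation_error(p):
    """Hypothetical stand-in for an SVR validation error over two
    log-scaled hyperparameters; its minimum is at (1, -2)."""
    return (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2

best, err = pso_minimize(toy_validation_error, bounds=[(-3.0, 3.0), (-5.0, 1.0)])
print(best, err)
```

In practice the objective would train an SVR with the candidate hyperparameters and return its error on validation data, which is what makes the search automatic rather than operator-driven.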
TL;DR: This work applies two modifications to make one-class SVMs more suitable for unsupervised anomaly detection, robust one-class SVMs and eta one-class SVMs, with the key idea that outliers should contribute less to the decision boundary than normal instances.
Abstract: Support Vector Machines (SVMs) have been one of the most successful machine learning techniques for the past decade. For anomaly detection, a semi-supervised variant, the one-class SVM, also exists. Here, only normal data is required for training before anomalies can be detected. In theory, the one-class SVM could also be used in an unsupervised anomaly detection setup, where no prior training is conducted. Unfortunately, it turns out that a one-class SVM is sensitive to outliers in the data. In this work, we apply two modifications in order to make one-class SVMs more suitable for unsupervised anomaly detection: Robust one-class SVMs and eta one-class SVMs. The key idea of both modifications is that outliers should contribute less to the decision boundary than normal instances. Experiments performed on datasets from the UCI machine learning repository show that our modifications are very promising: compared with other standard unsupervised anomaly detection algorithms, the enhanced one-class SVMs are superior on two out of four datasets. In particular, the proposed eta one-class SVM has shown the most promising results.
TL;DR: A new visualization approach based on a Sensitivity Analysis (SA) to extract human understandable knowledge from supervised learning black box data mining models, such as Neural Networks, Support Vector Machines and ensembles, including Random Forests.
TL;DR: A novel approach based on Support Vector Machine and Bayesian filtering is proposed for online lane change intention prediction that is able to predict driver intention to change lanes on average 1.3 seconds in advance, with a maximum prediction horizon of 3.29 seconds.
Abstract: Predicting driver behavior is a key component for Advanced Driver Assistance Systems (ADAS). In this paper, a novel approach based on Support Vector Machine and Bayesian filtering is proposed for online lane change intention prediction. The approach uses the multiclass probabilistic outputs of the Support Vector Machine as an input to the Bayesian filter, and the output of the Bayesian filter is used for the final prediction of lane changes. A lane tracker integrated in a passenger vehicle is used for real-world data collection for the purpose of training and testing. Data from different drivers on different highways were used to evaluate the robustness of the approach. The results demonstrate that the proposed approach is able to predict driver intention to change lanes on average 1.3 seconds in advance, with a maximum prediction horizon of 3.29 seconds.
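The combination of SVM outputs and Bayesian filtering can be sketched as a discrete Bayes filter over three intentions (left lane change, lane keeping, right lane change). The abstract does not give the filter's details, so the transition matrix is hypothetical and the SVM class probabilities are simply treated as observation likelihoods:

```python
def bayes_filter_step(belief, transition, likelihood):
    """One predict/update cycle of a discrete Bayes filter.

    belief:     current probability of each intention
    transition: transition[i][j] = P(next intention j | current intention i)
    likelihood: SVM class probabilities for the new frame, treated here as
                observation likelihoods (a simplifying assumption)
    """
    n = len(belief)
    predicted = [sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)]
    posterior = [p * l for p, l in zip(predicted, likelihood)]
    total = sum(posterior)
    return [p / total for p in posterior]

# Classes: 0 = change left, 1 = keep lane, 2 = change right.
T = [[0.90, 0.05, 0.05],     # hypothetical transition matrix
     [0.05, 0.90, 0.05],
     [0.05, 0.05, 0.90]]
belief = [1 / 3, 1 / 3, 1 / 3]
belief = bayes_filter_step(belief, T, [0.2, 0.7, 0.1])    # SVM favors keeping lane
history = []
for _ in range(3):                                         # SVM now favors a left change
    belief = bayes_filter_step(belief, T, [0.6, 0.3, 0.1])
    history.append(max(range(3), key=lambda k: belief[k]))
print(history)
```

The first frame favoring a left change does not immediately overturn the lane-keeping belief; the prediction switches only as evidence accumulates, which is the smoothing the filter adds on top of frame-by-frame SVM outputs.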
TL;DR: Novel cooperative spectrum sensing algorithms for cognitive radio (CR) networks based on machine learning techniques which are used for pattern classification outperform the existing state-of-the-art CSS techniques.
Abstract: We propose novel cooperative spectrum sensing (CSS) algorithms for cognitive radio (CR) networks based on machine learning techniques which are used for pattern classification. In this regard, unsupervised (e.g., K-means clustering and Gaussian mixture model (GMM)) and supervised (e.g., support vector machine (SVM) and weighted K-nearest-neighbor (KNN)) learning-based classification techniques are implemented for CSS. For a radio channel, the vector of the energy levels estimated at CR devices is treated as a feature vector and fed into a classifier to decide whether the channel is available or not. The classifier categorizes each feature vector into either of the two classes, namely, the "channel available class" and the "channel unavailable class". Prior to the online classification, the classifier needs to go through a training phase. For classification, the K-means clustering algorithm partitions the training feature vectors into K clusters, where each cluster corresponds to a combined state of primary users (PUs) and then the classifier determines the class the test energy vector belongs to. The GMM obtains a mixture of Gaussian density functions that well describes the training feature vectors. In the case of the SVM, the support vectors (i.e., a subset of training vectors which fully specify the decision function) are obtained by maximizing the margin between the separating hyperplane and the training feature vectors. Furthermore, the weighted KNN classification technique is proposed for CSS for which the weight of each feature vector is calculated by evaluating the area under the receiver operating characteristic (ROC) curve of that feature vector. The performance of each classification technique is quantified in terms of the average training time, the sample classification delay, and the ROC curve. Our comparative results clearly reveal that the proposed algorithms outperform the existing state-of-the-art CSS techniques.
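The AUC-based weighting for the weighted KNN classifier can be sketched as follows. The abstract's wording leaves the exact weighting open; this sketch assumes one weight per feature dimension (the energy reported by one CR device), computed as that feature's AUC on the training set, and uses hypothetical energy values with label 1 for "channel unavailable" and 0 for "channel available":

```python
from collections import Counter

def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic: the chance
    that a random positive scores higher than a random negative (ties = 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def feature_weights(vectors, labels):
    """One weight per feature (per CR device's energy level): its AUC."""
    return [auc([v[d] for v in vectors], labels) for d in range(len(vectors[0]))]

def weighted_knn(train, labels, weights, x, k=3):
    """Majority vote among the k nearest training vectors under a
    feature-weighted squared Euclidean distance."""
    def dist(v):
        return sum(w * (a - b) ** 2 for w, a, b in zip(weights, v, x))
    nearest = sorted(range(len(train)), key=lambda i: dist(train[i]))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

# Hypothetical energy levels at two CR devices; 1 = channel unavailable.
train = [[5.0, 1.0], [6.0, 2.0], [7.0, 1.0], [1.0, 1.5], [2.0, 1.0], [1.0, 2.0]]
labels = [1, 1, 1, 0, 0, 0]
weights = feature_weights(train, labels)
print(weights, weighted_knn(train, labels, weights, [6.0, 1.0]))
```

The first device separates the two classes perfectly (AUC 1.0), so it dominates the distance, while the uninformative second device (AUC below 0.5) contributes little.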
TL;DR: This work proposes Unbiased Metric Learning (UML), a metric learning approach that learns a set of less biased candidate distance metrics on training examples from multiple biased datasets, based on structural SVM.
Abstract: Many standard computer vision datasets exhibit biases due to a variety of sources, including illumination conditions, imaging systems, and the preferences of dataset collectors. Biases like these can have downstream effects on the use of vision datasets in the construction of generalizable techniques, especially for the goal of creating a classification system capable of generalizing to unseen and novel datasets. In this work we propose Unbiased Metric Learning (UML), a metric learning approach, to achieve this goal. UML operates in the following two steps: (1) By varying hyperparameters, it learns a set of less biased candidate distance metrics on training examples from multiple biased datasets. The key idea is to learn a neighborhood for each example, which consists of not only examples of the same category from the same dataset, but also those from other datasets. The learning framework is based on structural SVM. (2) We perform model validation on a set of weakly-labeled web images retrieved by issuing class labels as keywords to a search engine. The metric with the best validation performance is selected. Although the web images sometimes have noisy labels, they often tend to be less biased, which makes them suitable for the validation set in our task. Cross-dataset image classification experiments are carried out. Results show significant performance improvement on four well-known computer vision datasets.
TL;DR: In this paper, a support vector machine (SVM) approach was used to estimate the state of charge (SOC) of a high-capacity LiFeMnPO4 battery cell from an experimental dataset.
Abstract: The aim of this study is to estimate the state of charge (SOC) of a high-capacity lithium iron manganese phosphate (LiFeMnPO4) battery cell from an experimental dataset using a support vector machine (SVM) approach. SVM is a type of learning machine based on statistical learning theory. Many applications require accurate measurement of battery SOC in order to give users an indication of available runtime. It is particularly important for electric vehicles or portable devices. In this paper, the proposed SOC estimator extracts model parameters from battery charging/discharging testing cycles, using cell current, cell voltage, and cell temperature as independent variables. Tests are carried out on a 60 Ah lithium-ion cell with the dynamic stress test cycle to set up the SVM model. The SVM SOC estimator maintains a high level of accuracy, better than 6% over all ranges of operation, whether the battery is charged/discharged at constant current or it is operating in a variable current profile.
TL;DR: A set of features derived from skeleton tracking of the human body and depth maps for the purpose of action recognition are proposed, and a new descriptor for spatio-temporal feature extraction from color and depth images is introduced.
Abstract: We propose a set of features derived from skeleton tracking of the human body and depth maps for the purpose of action recognition. The descriptors proposed are easy to implement, produce relatively small-sized feature sets, and the multi-class classification scheme is fast and suitable for real-time applications. We intuitively characterize actions using pairwise affinities between view-invariant joint-angle features over the performance of an action. Additionally, a new descriptor for spatio-temporal feature extraction from color and depth images is introduced. This descriptor involves an application of a modified histogram of oriented gradients (HOG) algorithm. The application produces a feature set at every frame; these features are collected into a 2D array, to which the same algorithm is applied again (the approach is termed HOG2). Both feature sets are evaluated in a bag-of-words scheme using a linear SVM, showing state-of-the-art results on public datasets from different domains of human-computer interaction.
TL;DR: A framework to classify time series based on a bag-of-features representation (TSBF) that provides a feature-based approach that can handle warping (although differently from DTW), and experimental results show that TSBF provides better results than competitive methods on benchmark datasets from the UCR time series database.
Abstract: Time series classification is an important task with many challenging applications. A nearest neighbor (NN) classifier with dynamic time warping (DTW) distance is a strong solution in this context. On the other hand, feature-based approaches have been proposed both as classifiers and as a way to provide insight into the series, but these approaches have problems handling translations and dilations in local patterns. Considering these shortcomings, we present a framework to classify time series based on a bag-of-features representation (TSBF). Multiple subsequences selected from random locations and of random lengths are partitioned into shorter intervals to capture local information. Consequently, features computed from these subsequences measure properties at different locations and dilations when viewed from the original series. This provides a feature-based approach that can handle warping (although differently from DTW). Moreover, a supervised learner (one that handles mixed data types, different units, etc.) integrates location information into a compact codebook through class probability estimates. Additionally, relevant global features can easily supplement the codebook. TSBF is compared to NN classifiers and other alternatives (bag-of-words strategies, sparse spatial sample kernels, shapelets). Our experimental results show that TSBF provides better results than competitive methods on benchmark datasets from the UCR time series database.
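The sketch below covers only the subsequence and interval step of TSBF. It assumes mean, standard deviation and least-squares slope as the per-interval features, and it appends each subsequence's start and end index as location information. The parameter names `n_subseq`, `min_len` and `n_intervals`, and their values, are illustrative assumptions. The supervised codebook stage is not shown.

```python
import random
import statistics


def slope(values):
    """Least-squares slope of values against their index 0..n-1."""
    n = len(values)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def subsequence_features(series, n_subseq=5, min_len=8, n_intervals=4, seed=0):
    """Random-location, random-length subsequences split into equal intervals.

    Each row is [start, end] followed by (mean, std, slope) for every interval.
    Any leftover points (length % n_intervals) at the end are ignored here.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_subseq):
        length = rng.randint(min_len, len(series))
        start = rng.randint(0, len(series) - length)
        sub = series[start:start + length]
        width = length // n_intervals
        row = [start, start + length]
        for k in range(n_intervals):
            piece = sub[k * width:(k + 1) * width]
            row += [statistics.fmean(piece), statistics.pstdev(piece), slope(piece)]
        rows.append(row)
    return rows


# A linear ramp: every interval of every subsequence has slope 1.
feats = subsequence_features([float(i) for i in range(40)])
```

Because the subsequences have random lengths, the same local pattern can be captured at several scales. This is how the representation tolerates dilations without aligning series the way DTW does.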
TL;DR: In this article, the support vector machine is implemented on a quantum computer. A non-sparse matrix simulation technique is used to efficiently perform principal component analysis and matrix inversion of the training data kernel matrix, yielding exponential speedups in the size of the vectors and the number of training data examples.
Abstract: Supervised machine learning is the classification of new data based on already classified training examples. In this work, we show that the support vector machine, an optimized linear and non-linear binary classifier, can be implemented on a quantum computer, with exponential speedups in the size of the vectors and the number of training data examples. At the core of the algorithm is a non-sparse matrix simulation technique to efficiently perform a principal component analysis and matrix inversion of the training data kernel matrix. We thus provide an example of a quantum big feature and big data algorithm and pave the way for future developments at the intersection of quantum computing and machine learning.
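The quantum algorithm cannot be reproduced in a few lines, but the classical computation it targets can. As we understand it, the quantum SVM works with the least-squares SVM formulation. In that formulation, training reduces to one linear system built from the kernel matrix K: [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]. The sketch below solves that system classically with Gaussian elimination. It assumes a linear kernel and a hypothetical regularization parameter `gamma`, and it only shows which matrix gets inverted, not the quantum procedure.

```python
def solve(matrix, rhs):
    """Solve a dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def linear_kernel(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def train_ls_svm(xs, ys, gamma=10.0, kernel=linear_kernel):
    """Least-squares SVM: solve [[0, 1^T], [1, K + I/gamma]] [b, alpha] = [0, y]."""
    n = len(xs)
    rows = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0] + [kernel(xs[i], xs[j]) for j in range(n)]
        row[i + 1] += 1.0 / gamma  # regularization on the kernel diagonal
        rows.append(row)
    sol = solve(rows, [0.0] + list(ys))
    return sol[0], sol[1:]  # bias b, dual coefficients alpha


def classify(x, xs, b, alpha, kernel=linear_kernel):
    score = sum(a * kernel(xi, x) for a, xi in zip(alpha, xs)) + b
    return 1 if score >= 0 else -1


xs = [[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0], [-2.0, -2.0]]
ys = [1, 1, -1, -1]
b, alpha = train_ls_svm(xs, ys)
print(classify([3.0, 3.0], xs, b, alpha), classify([-3.0, -3.0], xs, b, alpha))  # prints 1 -1
```

The classical solve costs roughly cubic time in the number of training examples. The speedup claimed in the abstract concerns this inversion step.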
TL;DR: It is shown how normal operations of power networks can be statistically distinguished from the case under stealthy attacks, and two machine-learning-based techniques for stealthy attack detection are proposed.
Abstract: Aging power industries, together with increasing demand from industrial and residential customers, are the main incentive for policy makers to define a road map to the next-generation power system, called the smart grid. In the smart grid, overall monitoring costs will decrease, but at the same time the risk of cyber attacks may increase. Recently, a new type of attack, called the stealth attack, has been introduced that cannot be detected by bad data detection based on state estimation. In this paper, we show how normal operations of power networks can be statistically distinguished from the case under stealthy attacks. We devise two machine-learning-based techniques for stealthy attack detection. The first method uses supervised learning over labeled data and trains a support vector machine. The second method requires no training data and detects deviations in the measurements. In both methods, principal component analysis is used to reduce the dimensionality of the data to be processed, which lowers the computational complexity. The results of the proposed detection methods on the IEEE standard test systems demonstrate the effectiveness of both schemes.
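A small numerical check shows why a stealth attack evades residual-based bad data detection. Under a linear (DC) measurement model z = Hx + e, the state estimate is x_hat = (H^T H)^-1 H^T z, and detection thresholds the residual norm ||z - H x_hat||. If an attacker adds a = Hc for any vector c, the estimate shifts by c but the residual is unchanged. The 3-measurement, 2-state matrix `H`, the measurements and `c` below are made-up numbers, and unweighted least squares is assumed for simplicity.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def estimate_state(h, z):
    """Unweighted least-squares estimate x = (H^T H)^-1 H^T z for two state variables."""
    ht = transpose(h)
    g = matmul(ht, h)                   # 2x2 gain matrix H^T H
    rhs = matmul(ht, [[v] for v in z])  # H^T z as a column vector
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    inv = [[g[1][1] / det, -g[0][1] / det], [-g[1][0] / det, g[0][0] / det]]
    return [row[0] for row in matmul(inv, rhs)]


def residual_norm(h, z):
    x = estimate_state(h, z)
    predicted = [sum(hij * xj for hij, xj in zip(row, x)) for row in h]
    return sum((zi - pi) ** 2 for zi, pi in zip(z, predicted)) ** 0.5


H = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]  # hypothetical measurement matrix
z = [1.02, 0.48, 0.55]                     # hypothetical noisy measurements
c = [0.3, -0.2]                            # attacker's chosen state offset
a = [sum(hij * cj for hij, cj in zip(row, c)) for row in H]  # stealthy a = H c
z_attacked = [zi + ai for zi, ai in zip(z, a)]
z_naive = [z[0] + 0.3, z[1], z[2]]  # non-stealthy tampering of one meter

print(residual_norm(H, z), residual_norm(H, z_attacked))  # equal up to rounding
print(residual_norm(H, z_naive))                          # noticeably larger
```

This is why the paper looks beyond the residual: the statistical behaviour of the measurements, learned with an SVM or tracked as a deviation, can still reveal the attack.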
TL;DR: A transductive learning method, referred to as Selective Transfer Machine (STM), is introduced to personalize a generic classifier by attenuating person-specific biases. It achieves this by simultaneously learning a classifier and re-weighting the training samples that are most relevant to the test subject.
Abstract: Automatic facial action unit (AFA) detection from video is a long-standing problem in facial expression analysis. Most approaches emphasize choices of features and classifiers and neglect individual differences in target persons. People vary markedly in facial morphology (e.g., heavy versus delicate brows, smooth versus deeply etched wrinkles) and behavior. Individual differences can dramatically influence how well generic classifiers generalize to previously unseen persons. One possible solution would be to train person-specific classifiers, but that is often neither feasible nor theoretically compelling. The alternative that we propose is to personalize a generic classifier in an unsupervised manner (no additional labels for the test subjects are required). We introduce a transductive learning method, which we refer to as Selective Transfer Machine (STM), to personalize a generic classifier by attenuating person-specific biases. STM achieves this effect by simultaneously learning a classifier and re-weighting the training samples that are most relevant to the test subject. To evaluate the effectiveness of STM, we compared STM to generic classifiers and to cross-domain learning methods on three major databases: CK+, GEMEP-FERA and RU-FACS. STM outperformed generic classifiers on all three.
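STM learns the classifier and the sample weights jointly. The sketch below is a deliberately simplified two-stage stand-in, not STM itself. First, it weights each training sample by a kernel density-ratio estimate of how much it resembles the unlabeled test subject's frames. Then it fits a weighted nearest-centroid classifier. The Gaussian kernel, `sigma` and the 1D toy features for two hypothetical training subjects are all assumptions.

```python
import math


def gaussian(u, v, sigma):
    return math.exp(-sum((a - b) ** 2 for a, b in zip(u, v)) / (2 * sigma ** 2))


def relevance_weights(train_x, test_x, sigma=1.0):
    """Weight = mean similarity to test frames / mean similarity to training frames."""
    raw = []
    for x in train_x:
        to_test = sum(gaussian(x, t, sigma) for t in test_x) / len(test_x)
        to_train = sum(gaussian(x, s, sigma) for s in train_x) / len(train_x)
        raw.append(to_test / to_train)
    mean = sum(raw) / len(raw)
    return [r / mean for r in raw]  # rescale so the weights average to 1


def weighted_centroids(train_x, train_y, weights):
    """Per-class weighted mean feature vector."""
    cents = {}
    for label in set(train_y):
        idx = [i for i, y in enumerate(train_y) if y == label]
        total = sum(weights[i] for i in idx)
        cents[label] = [sum(weights[i] * train_x[i][d] for i in idx) / total
                        for d in range(len(train_x[0]))]
    return cents


def predict_label(x, cents):
    """Assign the label of the nearest (weighted) class centroid."""
    return min(cents, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, cents[lab])))


# Hypothetical 1D features: frames 0-3 come from a subject whose features sit
# near 0-1, frames 4-7 from a subject shifted to 3-4; labels 0/1 per frame.
train_x = [[0.0], [0.2], [1.0], [1.2], [3.0], [3.2], [4.0], [4.2]]
train_y = [0, 0, 1, 1, 0, 0, 1, 1]
test_x = [[2.9], [3.1], [3.9], [4.1]]  # unlabeled frames of the new subject

weights = relevance_weights(train_x, test_x)
weighted = weighted_centroids(train_x, train_y, weights)
unweighted = weighted_centroids(train_x, train_y, [1.0] * len(train_x))
print(predict_label([3.1], unweighted), predict_label([3.1], weighted))  # prints 1 0
```

The generic (unweighted) classifier mislabels the new subject's frame, while up-weighting the training frames that resemble the test subject corrects it. This illustrates the person-specific bias the abstract describes.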
TL;DR: High-resolution aerial sensing shows good prospects for detecting HLB-infected trees. Among the tested classification algorithms, support vector machine (SVM) with a kernel performed better than the other methods, such as linear SVM, linear discriminant analysis and quadratic discriminant analysis.
TL;DR: A novel hierarchical PCA-EELM (principal component analysis-ensemble extreme learning machine) model is presented to predict protein-protein interactions using only protein sequence information.
Abstract: Protein-protein interactions (PPIs) play crucial roles in the execution of various cellular processes and form the basis of biological mechanisms. High-throughput experimental techniques have generated a large amount of PPI data for different species. However, current PPI pairs obtained with experimental methods cover only a fraction of the complete PPI networks, and the experimental methods for identifying PPIs are both time-consuming and expensive. Hence, developing automated computational methods that predict PPIs efficiently and accurately is both urgent and challenging. We present here a novel hierarchical PCA-EELM (principal component analysis-ensemble extreme learning machine) model to predict protein-protein interactions using only protein sequence information. In the proposed method, 11188 protein pairs retrieved from the DIP database were encoded into feature vectors using four kinds of protein sequence information. For dimension reduction, PCA, an effective feature extraction method, was then employed to construct the most discriminative new feature set. Finally, multiple extreme learning machines were trained and then aggregated into a consensus classifier by majority voting. Ensembling the extreme learning machines removes the dependence of the results on the initial random weights and improves prediction performance. On the PPI data of Saccharomyces cerevisiae, the proposed method achieved 87.00% prediction accuracy with 86.15% sensitivity at a precision of 87.59%. Extensive experiments compare our method with the state-of-the-art Support Vector Machine (SVM) technique. Under 5-fold cross-validation, the results demonstrate that the proposed PCA-EELM outperforms the SVM method. In addition, PCA-EELM runs faster than the PCA-SVM based method. Consequently, the proposed approach can be considered a promising and powerful new tool for predicting PPIs with excellent performance in less time.
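The ensemble step can be sketched under a few stated assumptions. Each extreme learning machine draws random, untrained sigmoid hidden-layer weights and solves only its output weights by regularized least squares. Five such models then vote by majority. The hidden-layer size, the regularization constant `c`, the +1/-1 label coding and the 2D toy data are hypothetical. The PCA step and the protein-sequence encodings are omitted.

```python
import math
import random


def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    a = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


class ELM:
    """Single-hidden-layer ELM: random input weights, ridge-solved output weights."""

    def __init__(self, n_hidden=20, c=100.0, seed=0):
        self.n_hidden, self.c, self.rng = n_hidden, c, random.Random(seed)

    def _hidden(self, x):
        return [1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(ws, x)) + b)))
                for ws, b in zip(self.w, self.b)]

    def fit(self, xs, ys):
        dim = len(xs[0])
        # Input weights and biases are drawn at random and never trained.
        self.w = [[self.rng.uniform(-1, 1) for _ in range(dim)] for _ in range(self.n_hidden)]
        self.b = [self.rng.uniform(-1, 1) for _ in range(self.n_hidden)]
        h = [self._hidden(x) for x in xs]
        # Output weights: beta = (H^T H + I/C)^-1 H^T y (regularized least squares).
        hth = [[sum(row[i] * row[j] for row in h) + (1.0 / self.c if i == j else 0.0)
                for j in range(self.n_hidden)] for i in range(self.n_hidden)]
        hty = [sum(row[i] * y for row, y in zip(h, ys)) for i in range(self.n_hidden)]
        self.beta = solve(hth, hty)
        return self

    def predict(self, x):
        score = sum(bi * hi for bi, hi in zip(self.beta, self._hidden(x)))
        return 1 if score >= 0 else -1


def majority_vote(models, x):
    """Consensus label from +1/-1 votes (an odd ensemble size avoids ties)."""
    return 1 if sum(m.predict(x) for m in models) > 0 else -1


points = [(i / 9, j / 9) for i in range(10) for j in range(10) if i + j != 9]
labels = [1 if x0 + x1 > 1 else -1 for x0, x1 in points]
ensemble = [ELM(seed=s).fit(points, labels) for s in range(5)]
accuracy = sum(majority_vote(ensemble, p) == y for p, y in zip(points, labels)) / len(points)
```

Each model differs only in its random seed. Voting over several of them reduces the dependence on any single random draw of hidden weights, which is the motivation the abstract gives for ensembling.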
TL;DR: A new robust twin support vector machine for classification (called R-TWSVM), formulated via second-order cone programming, is proposed. It deals efficiently with data containing measurement noise and overcomes the existing shortcomings of TWSVM.
TL;DR: Results suggest that RF may be a promising pattern recognition method for E-tongue data processing, because it can deal with classification problems of unbalanced, multiclass and small sample data without data preprocessing procedures.
Abstract: Random forest (RF) was proposed by Breiman in 2001 on the basis of classification and regression trees (CART) with an “ensemble learning” strategy. In this paper, RF is introduced and investigated for electronic tongue (E-tongue) data processing. The experiments were designed for type and brand recognition of orange beverage and Chinese vinegar by an E-tongue with seven potentiometric sensors and an Ag/AgCl reference electrode. Principal component analysis (PCA) was used to visualize the distribution of all samples in each data set. Back propagation neural network (BPNN) and support vector machine (SVM) were also applied to the four data sets as comparative methods. Five-fold cross-validation (CV) with twenty replications was applied during modeling, and an external test set was used to validate the prediction performance of the models. The average correct rates (CR) on the CV sets of the four data sets achieved by BPNN, SVM and RF were 86.68%, 66.45% and 99.07%, respectively. RF outperformed BPNN and SVM and has some advantages in such cases, because it can deal with classification problems of unbalanced, multiclass and small sample data without data preprocessing procedures. These results suggest that RF may be a promising pattern recognition method for E-tongues.
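The evaluation protocol, five-fold cross-validation repeated twenty times, can be sketched as an index generator. Each replication reshuffles the samples and splits them into five folds, and each fold serves once as the validation set. The average correct rate is then the mean over all 100 validation folds. The abstract does not say whether folds were stratified by class, so this plain, unstratified version is an assumption, and the sample count of 42 is a placeholder.

```python
import random


def repeated_kfold(n_samples, k=5, repeats=20, seed=0):
    """Yield (replication, fold, train_idx, valid_idx) for repeated k-fold CV."""
    rng = random.Random(seed)
    for rep in range(repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        # Deal shuffled indices round-robin so fold sizes differ by at most one.
        folds = [order[f::k] for f in range(k)]
        for f in range(k):
            valid = sorted(folds[f])
            train = sorted(i for g in range(k) if g != f for i in folds[g])
            yield rep, f, train, valid


splits = list(repeated_kfold(42))  # 20 replications x 5 folds = 100 splits
```

Repeating the split with fresh shuffles averages out the luck of any single partition, which matters for the small sample sizes mentioned above.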