TL;DR: These experiments indicate that the “one-against-one” and DAG methods are more suitable for practical use than the other methods, and show that, for large problems, methods that consider all data at once generally need fewer support vectors.
Abstract: Support vector machines (SVM) were originally designed for binary classification. How to effectively extend it for multi-class classification is still an on-going research issue. Several methods have been proposed where typically we construct a multi-class classifier by combining several binary classifiers. Some authors also proposed methods that consider all classes at once. As it is computationally more expensive to solve multiclass problems, comparisons of these methods using large-scale problems have not been seriously conducted. Especially for methods solving multi-class SVM in one step, a much larger optimization problem is required so up to now experiments are limited to small data sets. In this paper we give decomposition implementations for two such “all-together” methods: [25], [27] and [7]. We then compare their performance with three methods based on binary classifications: “one-against-all,” “one-against-one,” and DAGSVM [23]. Our experiments indicate that the “one-against-one” and DAG methods are more suitable for practical use than the other methods. Results also show that for large problems, methods that consider all data at once in general need fewer support vectors.
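The two binary-decomposition schemes that the experiments favor can be illustrated with a minimal Python sketch. It assumes the k(k-1)/2 pairwise classifiers are already trained and exposed through a hypothetical callable `pairwise(a, b, x)` that returns the winning class; `toy_pairwise` and `CENTROIDS` are illustrative stand-ins, not part of the paper's implementation. One-against-one prediction takes a max-wins vote over all pairs, while the DAG scheme eliminates one class per evaluation and so needs only k-1 evaluations.

```python
from itertools import combinations


def ovo_predict(x, classes, pairwise):
    """One-against-one: max-wins voting over all k(k-1)/2 class pairs."""
    votes = {c: 0 for c in classes}
    for a, b in combinations(classes, 2):
        votes[pairwise(a, b, x)] += 1
    # Ties are broken by the order of `classes` (an arbitrary choice here).
    return max(classes, key=lambda c: votes[c])


def dag_predict(x, classes, pairwise):
    """DAG-style elimination: one class is discarded per evaluation."""
    remaining = list(classes)
    while len(remaining) > 1:
        a, b = remaining[0], remaining[-1]
        if pairwise(a, b, x) == a:
            remaining.pop()      # b lost, drop it from the end
        else:
            remaining.pop(0)     # a lost, drop it from the front
    return remaining[0]


# Hypothetical stand-in for trained binary SVMs: the nearer 1-D centroid wins.
CENTROIDS = {"a": 0.0, "b": 5.0, "c": 10.0}


def toy_pairwise(a, b, x):
    return a if abs(x - CENTROIDS[a]) <= abs(x - CENTROIDS[b]) else b


if __name__ == "__main__":
    classes = ["a", "b", "c"]
    print(ovo_predict(4.0, classes, toy_pairwise))
    print(dag_predict(4.0, classes, toy_pairwise))
```

Both functions share the same trained pairwise models; they differ only in how many of them are consulted at prediction time.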
TL;DR: This framework uses a zero-masking strategy for data fusion to extract complementary information from multiple data modalities to aid the diagnosis of AD and has the potential to require less labeled data.
Abstract: The accurate diagnosis of Alzheimer's disease (AD) is essential for patient care and will be increasingly important as disease-modifying agents become available early in the course of the disease. Although studies have applied machine learning methods for the computer-aided diagnosis of AD, a bottleneck in the diagnostic performance was shown in previous methods, due to the lack of efficient strategies for representing neuroimaging biomarkers. In this study, we designed a novel diagnostic framework with deep learning architecture to aid the diagnosis of AD. This framework uses a zero-masking strategy for data fusion to extract complementary information from multiple data modalities. Compared to the previous state-of-the-art workflows, our method is capable of fusing multimodal neuroimaging features in one setting and has the potential to require less labeled data. A performance gain was achieved in both binary classification and multiclass classification of AD. The advantages and limitations of the proposed framework are discussed.
TL;DR: A Near-Bayesian Support Vector Machine (NBSVM) is proposed for such imbalanced classification problems, by combining the philosophies of decision boundary shift and unequal regularization costs.
TL;DR: A stacked ELMs (S-ELMs) that is specially designed for solving large and complex data problems and can achieve much better testing accuracy than SVM and slightly better accuracy than deep belief network (DBN) with much faster training speed is proposed.
Abstract: Extreme learning machine (ELM) has recently attracted many researchers’ interest due to its very fast learning speed, good generalization ability, and ease of implementation. It provides a unified solution that can be used directly to solve regression, binary, and multiclass classification problems. In this paper, we propose a stacked ELMs (S-ELMs) that is specially designed for solving large and complex data problems. The S-ELMs divides a single large ELM network into multiple stacked small ELMs which are serially connected. The S-ELMs can approximate a very large ELM network with small memory requirement. To further improve the testing accuracy on big data problems, the ELM autoencoder can be implemented during each iteration of the S-ELMs algorithm. The simulation results show that the S-ELMs even with random hidden nodes can achieve similar testing accuracy to support vector machine (SVM) while having low memory requirements. With the help of ELM autoencoder, the S-ELMs can achieve much better testing accuracy than SVM and slightly better accuracy than deep belief network (DBN) with much faster training speed.
TL;DR: A comparative analysis of these multi-classifiers in terms of their advantages, disadvantages and computational complexity is performed.
Abstract: Least Squares Twin Support Vector Machine (LSTSVM) is a binary classifier and the extension of it to multiclass is still an ongoing research issue. In this paper, we extended the formulation of binary LSTSVM classifier to multi-class by using the concepts such as "One-versus-All", "One-versus-One", "All-versus-One" and Directed Acyclic Graph (DAG). This paper performs a comparative analysis of these multi-classifiers in terms of their advantages, disadvantages and computational complexity. The performance of all the four proposed classifiers has been validated on twelve benchmark datasets by using predictive accuracy and training-testing time. All the proposed multi-classifiers have shown better performance as compared to the typical multi-classifiers based on 'Support Vector Machine' and 'Twin Support Vector Machine'. Friedman's statistic and Nemenyi post hoc tests are also used to test significance of predictive accuracy differences between classifiers.
TL;DR: A novel zero-shot classification approach that automatically learns label embeddings from the input data in a semi-supervised large-margin learning framework that tackles the target prediction problem directly without introducing intermediate prediction problems.
Abstract: Given the challenge of gathering labeled training data, zero-shot classification, which transfers information from observed classes to recognize unseen classes, has become increasingly popular in the computer vision community. Most existing zero-shot learning methods require a user to first provide a set of semantic visual attributes for each class as side information before applying a two-step prediction procedure that introduces an intermediate attribute prediction problem. In this paper, we propose a novel zero-shot classification approach that automatically learns label embeddings from the input data in a semi-supervised large-margin learning framework. The proposed framework jointly considers multi-class classification over all classes (observed and unseen) and tackles the target prediction problem directly without introducing intermediate prediction problems. It also has the capacity to incorporate semantic label information from different sources when available. To evaluate the proposed approach, we conduct experiments on standard zero-shot data sets. The empirical results show the proposed approach outperforms existing state-of-the-art zero-shot learning methods.
TL;DR: An adapted state-of-the-art method of processing information known as Reservoir Computing is used to show its utility on the open and time-consuming problem of heartbeat classification, leading to a fast algorithm and approaching a real-time classification solution.
Abstract: An adapted state-of-the-art method of processing information known as Reservoir Computing is used to show its utility on the open and time-consuming problem of heartbeat classification. The MIT-BIH arrhythmia database is used following the guidelines of the Association for the Advancement of Medical Instrumentation. Our approach requires a computationally inexpensive preprocessing of the electrocardiographic signal, leading to a fast algorithm and approaching a real-time classification solution. Our multiclass classification results indicate an average specificity of 97.75% with an average accuracy of 98.43%. Sensitivity and positive predictive value show an average of 84.83% and 88.75%, respectively, which makes our approach significant for its use in a clinical context.
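The four figures quoted above are all ratios of per-class confusion counts. The following minimal Python sketch shows how they relate; the function name `binary_metrics` and the counts in the usage example are made up for illustration and are not taken from the paper.

```python
def binary_metrics(tp, fp, tn, fn):
    """Per-class metrics from one-vs-rest confusion counts.

    Assumes every denominator is non-zero, i.e. the class has at least
    one positive, one negative, and one predicted-positive example.
    """
    return {
        "sensitivity": tp / (tp + fn),   # recall on the positive class
        "specificity": tn / (tn + fp),   # recall on the negative class
        "ppv": tp / (tp + fp),           # positive predictive value
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


if __name__ == "__main__":
    # Hypothetical counts for one rare heartbeat class.
    for name, value in binary_metrics(tp=8, fp=1, tn=89, fn=2).items():
        print(f"{name}: {value:.4f}")
```

With a rare positive class, accuracy and specificity can stay high while sensitivity is noticeably lower, which is why all four values are reported together.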
TL;DR: In this article, a multimodal task-driven dictionary learning algorithm under the joint sparsity constraint (prior) was proposed to enforce collaborations among multiple homogeneous/heterogeneous sources of information.
Abstract: Dictionary learning algorithms have been successfully used for both reconstructive and discriminative tasks, where an input signal is represented with a sparse linear combination of dictionary atoms. While these methods are mostly developed for single-modality scenarios, recent studies have demonstrated the advantages of feature-level fusion based on the joint sparse representation of the multimodal inputs. In this paper, we propose a multimodal task-driven dictionary learning algorithm under the joint sparsity constraint (prior) to enforce collaborations among multiple homogeneous/heterogeneous sources of information. In this task-driven formulation, the multimodal dictionaries are learned simultaneously with their corresponding classifiers. The resulting multimodal dictionaries can generate discriminative latent features (sparse codes) from the data that are optimized for a given task such as binary or multiclass classification. Moreover, we present an extension of the proposed formulation using a mixed joint and independent sparsity prior which facilitates more flexible fusion of the modalities at feature level. The efficacy of the proposed algorithms for multimodal classification is illustrated on four different applications -- multimodal face recognition, multi-view face recognition, multi-view action recognition, and multimodal biometric recognition. It is also shown that, compared to the counterpart reconstructive-based dictionary learning algorithms, the task-driven formulations are more computationally efficient in the sense that they can be equipped with more compact dictionaries and still achieve superior performance.
TL;DR: The novel Multiple Adaptive Reduced Kernel Extreme Learning Machine (MARK-ELM) is introduced, which combines Multiple Kernel Boosting with multiclass KELM to improve the efficacy of network intrusion detection on data that contains instances of multiple classes of attacks.
Abstract: Apply Multiple Kernel Boosting and Multiclass KELM to Network Intrusion Detection. Tested approach on several machine learning datasets and the KDD Cup 99 dataset. Utilized Fractional Polynomial Kernels for the Network ID problem for the first time. Requires no feature selection, minimal pre-processing and works on imbalanced data. Achieves superior detection rates and lower false alarm rates than other approaches. Detection of cyber-based attacks on computer networks continues to be a relevant and challenging area of research. Daily reports of incidents appear in public media including major ex-filtrations of data for the purposes of stealing identities, credit card numbers, and intellectual property as well as to take control of network resources. Methods used by attackers constantly change in order to defeat techniques employed by information technology (IT) teams intended to discover or block intrusions. "Zero Day" attacks whose "signatures" are not yet in IT databases are continually being uncovered. Machine learning approaches have been widely used to increase the effectiveness of intrusion detection platforms. While some machine learning techniques are effective at detecting certain types of attacks, there are no known methods that can be applied universally and achieve consistent results for multiple attack types. The focus of our research is the development of a framework that combines the outputs of multiple learners in order to improve the efficacy of network intrusion detection on data that contains instances of multiple classes of attacks. We have chosen the Extreme Learning Machine (ELM) as the core learning algorithm due to recent research that suggests that ELMs are straightforward to implement, computationally efficient and have excellent learning performance characteristics on par with the Support Vector Machine (SVM), one of the most widely used and best performing machine learning platforms (Liu, Gao, & Li, 2012).
We introduce the novel Multiple Adaptive Reduced Kernel Extreme Learning Machine (MARK-ELM) which combines Multiple Kernel Boosting (Xia & Hoi, 2013) with the Multiple Classification Reduced Kernel ELM (Deng, Zheng, & Zhang, 2013). We tested this approach on several machine learning datasets as well as the KDD Cup 99 (Hettich & Bay, 1999) intrusion detection dataset. Our results indicate that MARK-ELM works well for the majority of University of California, Irvine (UCI) Machine Learning Repository small datasets and is scalable for larger datasets. For UCI datasets we achieved performance similar to the MKBoost Support Vector Machine (SVM) approach. In our experiments we demonstrate that MARK-ELM achieves superior detection rates and much lower false alarm rates than other approaches on intrusion detection data.
TL;DR: This paper presents a multi-class classification model that is based on active learning and support vector machines (MC_SVMA), which can be used to address unlabeled data, and demonstrates that the model is efficient and exhibits good generalization performance.
TL;DR: A distance-based combination strategy, which weights the competence of the outputs of the base classifiers depending on the closeness of the query instance to each one of the classes, to reduce the effect of the non-competent classifiers in One-vs-One strategy.
TL;DR: A least squares version of the Twin K-class support vector machine, called LST-KSVC, is proposed, which has classification accuracy comparable to that of Twin-KSVC but with remarkably less computational time.
TL;DR: A semi-supervised classification algorithm whereby the model is gradually enhanced with unlabeled data collected online and a processing stage is introduced before classification to adaptively reduce the small fluctuations between the features from training and evaluation sessions.
TL;DR: Experimental results show that M3GP can automatically determine a good value for \(d\) depending on the problem, and achieves excellent performance when compared to state-of-the-art methods like Random Forests, Random Subspaces and Multilayer Perceptron on several benchmark and real-world problems.
Abstract: Data classification is one of the most ubiquitous machine learning tasks in science and engineering. However, Genetic Programming is still not a popular classification methodology, partially due to its poor performance in multiclass problems. The recently proposed M2GP (Multidimensional Multiclass Genetic Programming) algorithm achieved promising results in this area, by evolving mappings of the \(p\)-dimensional data into a \(d\)-dimensional space, and applying a minimum Mahalanobis distance classifier. Despite good performance, M2GP employs a greedy strategy to set the number of dimensions \(d\) for the transformed data, and fixes it at the start of the search, an approach that is prone to locally optimal solutions. This work presents the M3GP algorithm, which stands for M2GP with multidimensional populations. M3GP extends M2GP by allowing the search process to progressively search for the optimal number of new dimensions \(d\) that maximizes the classification accuracy. Experimental results show that M3GP can automatically determine a good value for \(d\) depending on the problem, and achieves excellent performance when compared to state-of-the-art methods like Random Forests, Random Subspaces and Multilayer Perceptron on several benchmark and real-world problems.
TL;DR: A multi-class discriminating algorithm based on the fusion of interval type-2 fuzzy logic and ANFIS to improve uncertainty handling and the result shows the competitiveness of this algorithm over other standard ones in the domain of non-stationary and uncertain signal data classification.
TL;DR: The firefly algorithm is employed to train all parameters of the SVM simultaneously, including the penalty parameter, smoothness parameter, and Lagrangian multiplier, called the firefly-based SVM (firefly-SVM).
Abstract: The setting of parameters in support vector machines (SVMs) is very important with regard to their accuracy and efficiency. In this paper, we employ the firefly algorithm to train all parameters of the SVM simultaneously, including the penalty parameter, smoothness parameter, and Lagrangian multiplier. The proposed method is called the firefly-based SVM (firefly-SVM). Feature selection is not considered in this tool, because the SVM combined with feature selection is not suitable for multiclass classification, especially for the one-against-all multiclass SVM. In experiments, binary and multiclass classifications are explored. In the experiments on binary classification, ten of the benchmark data sets of the University of California, Irvine (UCI), machine learning repository are used; additionally, the firefly-SVM is applied to the multiclass diagnosis of ultrasonic supraspinatus images. The classification performance of firefly-SVM is also compared to the original LIBSVM method associated with the grid search method and the particle swarm optimization based SVM (PSO-SVM). The experimental results support the use of firefly-SVM for pattern classification to maximize accuracy.
TL;DR: The mldr package aims to provide the user with the functions needed to perform exploratory analysis of MLDs, determining their main traits both statistically and visually, and brings the proper tools to manipulate this kind of datasets, including the application of the most common transformation methods.
Abstract: Most classification algorithms deal with datasets which have a set of input features, the variables to be used as predictors, and only one output class, the variable to be predicted. However, in recent years many scenarios in which the classifier has to work with several outputs have emerged. Automatic labeling of text documents, image annotation and protein classification are among them. Multilabel datasets are the product of these new needs, and they have many specific traits. The mldr package allows the user to load datasets of this kind, obtain their characteristics, produce specialized plots, and manipulate them. The goal is to provide the exploratory tools needed to analyze multilabel datasets, as well as the transformation and manipulation functions that will make it possible to apply binary and multiclass classification models to this data or to develop new multilabel classifiers. Thanks to its integrated user interface, the exploratory functions will be available even to non-specialized R users. The mldr package aims to provide the user with the functions needed to perform exploratory analysis of MLDs, determining their main traits both statistically and visually. Moreover, it also brings the proper tools to manipulate this kind of dataset, including the application of the most common transformation methods, BR (Binary Relevance) and LP (Label Powerset), that will be described in the following section. These would be the foundation for processing MLDs with traditional classifiers, as well as for developing new multilabel algorithms. The mldr package does not depend on the RWeka package, and it is not linked to MULAN nor MEKA. It has been designed to allow reading both MULAN and MEKA MLDs, but without any external dependencies. In fact, it would be possible to load MLDs stored in other file formats, as well as creating them from scratch.
When loaded, MLDs are wrapped in an S3 type object with class "mldr", which allows for the use of methods. The object will contain the data in the MLD and also a large set of measures obtained from it. The functions provided by the package ease the access to
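The two transformation methods named above, BR and LP, can be illustrated independently of the package. The sketch below is plain Python rather than R and does not use or reproduce the mldr API; the function names `binary_relevance` and `label_powerset`, and the toy label sets, are assumptions made for illustration. BR produces one binary target per label, while LP treats each distinct combination of labels as a single multiclass target.

```python
def binary_relevance(label_sets, labels):
    """BR: one 0/1 target vector per label, each usable by a binary classifier."""
    return {lab: [1 if lab in s else 0 for s in label_sets] for lab in labels}


def label_powerset(label_sets):
    """LP: each distinct label combination becomes one multiclass target value."""
    # The empty label set maps to the empty string in this sketch.
    return ["+".join(sorted(s)) for s in label_sets]


if __name__ == "__main__":
    # Hypothetical multilabel targets for four instances.
    y = [{"x", "y"}, {"y"}, {"x", "y"}, set()]
    print(binary_relevance(y, ["x", "y"]))
    print(label_powerset(y))
```

BR ignores correlations between labels, whereas LP preserves them at the cost of a class count that can grow with the number of distinct label combinations.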
TL;DR: It is shown, backed-up by thorough statistical analysis, that one-class decomposition is a worthwhile approach, especially in case of problems with complex distribution and a large number of classes.
TL;DR: This paper presents new consistent algorithms for multiclass learning with complex performance measures, defined by arbitrary functions of the confusion matrix, and gives two specific instantiations based on the Frank-Wolfe method for concave performance measures and on the bisection method for ratio-of-linear performance measures.
Abstract: This paper presents new consistent algorithms for multiclass learning with complex performance measures, defined by arbitrary functions of the confusion matrix. This setting includes as a special case all loss-based performance measures, which are simply linear functions of the confusion matrix, but also includes more complex performance measures such as the multiclass G-mean and micro F1 measures. We give a general framework for designing consistent algorithms for such performance measures by viewing the learning problem as an optimization problem over the set of feasible confusion matrices, and give two specific instantiations based on the Frank-Wolfe method for concave performance measures and on the bisection method for ratio-of-linear performance measures. The resulting algorithms are provably consistent and outperform a multiclass version of the state-of-the-art SVMperf method in experiments; for large multiclass problems, the algorithms are also orders of magnitude faster than SVMperf.
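The distinction between linear and more complex functions of the confusion matrix can be made concrete with a short Python sketch. It is only an illustration of the two kinds of measure, not the paper's algorithms: accuracy is a linear function of the matrix entries, while the multiclass G-mean, taken here in its usual form as the geometric mean of per-class recalls, is not. The function names and the toy matrix are assumptions.

```python
import math


def accuracy(conf):
    """Linear in the confusion matrix: sum of the diagonal over the total."""
    total = sum(sum(row) for row in conf)
    return sum(conf[i][i] for i in range(len(conf))) / total


def g_mean(conf):
    """Non-linear: geometric mean of per-class recalls.

    conf[i][j] counts examples of true class i predicted as class j.
    Assumes every class has at least one example (non-zero row sums).
    """
    recalls = [conf[i][i] / sum(conf[i]) for i in range(len(conf))]
    return math.prod(recalls) ** (1.0 / len(recalls))


if __name__ == "__main__":
    conf = [[8, 2], [1, 9]]  # hypothetical 2-class confusion matrix
    print(accuracy(conf), g_mean(conf))
```

Because the G-mean multiplies recalls, a single poorly recognized class pulls it toward zero even when accuracy remains high.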
TL;DR: It is demonstrated that under favorable conditions, this work can construct logarithmic depth trees that have leaves with low label entropy, and a new objective function is formulated, which is optimized at each node of the tree and creates dynamic partitions of the data.
Abstract: We study the problem of multiclass classification with an extremely large number of classes (k), with the goal of obtaining train and test time complexity logarithmic in the number of classes. We develop top-down tree construction approaches for constructing logarithmic depth trees. On the theoretical front, we formulate a new objective function, which is optimized at each node of the tree and creates dynamic partitions of the data which are both pure (in terms of class labels) and balanced. We demonstrate that under favorable conditions, we can construct logarithmic depth trees that have leaves with low label entropy. However, the objective function at the nodes is challenging to optimize computationally. We address the empirical problem with a new online decision tree construction procedure. Experiments demonstrate that this online algorithm quickly achieves improvement in test error compared to more common logarithmic training time approaches, which makes it a plausible method in computationally constrained large-k applications.
TL;DR: This paper proposes a novel fast feature selection method based on multiple SVDD and applies it to multi-class microarray data that is faster and more effective than other methods.
TL;DR: This paper uses multiple minimum supports to modify CBS algorithm in order to improve the performance of weather forecasting and shows that the covacc parameter of the modified CBS algorithm is better than the three common algorithms.
Abstract: Weather forecasting is one of the focuses of data mining and uses meteorological data for its process. As the common technique used in forecasting weather is sequential pattern, several algorithms have been developed by scholars. The common algorithms used in forecasting weather are the CBS algorithm, the CBS algorithm using FEAT, and the CBS algorithm using FSGP. Previous studies note the weaknesses of these three algorithms, especially in classifying weather with more than one class. In this paper, we use multiple minimum supports to modify the CBS algorithm in order to improve the performance of weather forecasting. The results show that, by applying multiple minimum supports to the three algorithms, the three modified algorithms are able to classify the weather into six categories from a given minimum support. In addition, the simulation results show that the covacc parameter of the modified CBS algorithm is better than that of the three common algorithms.
TL;DR: A genetic programming (GP) descriptor is proposed for the task of multiclass texture classification that synthesises a set of mathematical formulas relying on the raw pixel values and a sliding window of a predetermined size that has the potential to effectively discriminate between instances of different textures.
Abstract: Texture classification is an essential task in computer vision that aims at grouping instances that have a similar repetitive pattern into one group. Detecting texture primitives can be used to discriminate between materials of different types. The process of detecting prominent features from the texture instances represents a cornerstone step in texture classification. Moreover, building a good model using a few training instances is difficult. In this study, a genetic programming (GP) descriptor is proposed for the task of multiclass texture classification. The proposed method synthesises a set of mathematical formulas relying on the raw pixel values and a sliding window of a predetermined size. Furthermore, only two instances per class are used to automatically evolve a descriptor that has the potential to effectively discriminate between instances of different textures using a simple instance-based classifier to perform the classification task. The performance of the proposed approach is examined using two widely-used data sets, and compared with two GP-based and nine well-known non-GP methods. Furthermore, three hand-crafted domain-expert designed feature extraction methods have been used with the non-GP methods to examine the effectiveness of the proposed method. The results show that the proposed method has significantly outperformed all these other methods on both data sets, and the new method evolves a descriptor that is capable of achieving significantly better performance compared to hand-crafted features.
TL;DR: The proposed method outperforms the methods that are based on single one-class classification algorithms with statistical significance and demonstrates the selective utilization of base classifiers by adopting a stepwise variable selection procedure during stacking.
TL;DR: The classification results of using pathway activity derived from the proposed method show high classification power in three-fold cross-validation and robustness in across dataset validation for all four lung cancer datasets used.
TL;DR: A semi-supervised max-margin learning framework that integrates the semisupervised classification problem over observed classes and the unsupervised clustering problem over unseen classes together to tackle zero-shot multi-class classification is proposed.
Abstract: Due to the dramatic expanse of data categories and the lack of labeled instances, zero-shot learning, which transfers knowledge from observed classes to recognize unseen classes, has started drawing a lot of attention from the research community. In this paper, we propose a semi-supervised max-margin learning framework that integrates the semisupervised classification problem over observed classes and the unsupervised clustering problem over unseen classes together to tackle zero-shot multi-class classification. By further integrating label embedding into this framework, we produce a dual formulation that permits convenient incorporation of auxiliary label semantic knowledge to improve zero-shot learning. We conduct extensive experiments on three standard image data sets to evaluate the proposed approach by comparing to two state-of-the-art methods. Our results demonstrate the efficacy of the proposed framework.
TL;DR: This paper proposes a scheme to improve the performance of noise filters in multi-class classification problems, based on decomposing the dataset into multiple binary subproblems, and adapts the principles of the One-vs-One decomposition strategy to noise filtering, making the noise identification process simpler.
Abstract: Noise filters are preprocessing techniques designed to improve data quality in classification tasks by detecting and eliminating examples that contain errors or noise. However, filtering can also remove correct examples and examples containing valuable information, which could be useful for learning. This fact usually implies a margin of improvement on the noise detection accuracy for almost any noise filter. This paper proposes a scheme to improve the performance of noise filters in multi-class classification problems, based on decomposing the dataset into multiple binary subproblems. Decomposition strategies have proven to be successful in improving classification performance in multi-class problems by generating simpler binary subproblems. Similarly, we adapt the principles of the One-vs-One decomposition strategy to noise filtering, making the noise identification process simpler. In order to integrate the filtering results achieved in the binary subproblems, our proposal uses a soft voting approach considering a reliability level based on the aggregation of the noise degree prediction calculated for each binary classifier. The experimental results show that the One-vs-One decomposition strategy usually increases the performance of the noise filters studied, which can detect more accurately the noisy examples.
TL;DR: This paper proposes a novel approach to active learning designed for one-class classification that does not rely on many of the inappropriate assumptions of its predecessors and leads to more robust classification performance.
Abstract: Active learning is a common solution for reducing labeling costs and maximizing the impact of human labeling efforts in binary and multi-class classification settings. However, when we are faced with extreme levels of class imbalance, a situation in which it is not safe to assume that we have a representative sample of the minority class, it has been shown effective to replace the binary classifiers with one-class classifiers. In such a setting, traditional active learning methods, and many previously proposed in the literature for one-class classifiers, prove to be inappropriate, as they rely on assumptions about the data that no longer stand. In this paper, we propose a novel approach to active learning designed for one-class classification. The proposed method does not rely on many of the inappropriate assumptions of its predecessors and leads to more robust classification performance. The gist of this method consists of labeling, in priority, the instances considered to fit the learned class the least by previous iterations of a one-class classification model. We provide empirical evidence for the merits of the proposed method compared to the available alternatives, and discuss how the method may have an impact in an applied setting.
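The query rule described above, labeling first the instances that fit the learned class the least, can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the authors' method: the one-class model is replaced by a hypothetical centroid-based scorer on 1-D data, and the names `select_queries` and `make_centroid_scorer` are invented for the example.

```python
def select_queries(pool, score, budget):
    """Return indices of the `budget` pool items with the lowest fit score."""
    ranked = sorted(range(len(pool)), key=lambda i: score(pool[i]))
    return ranked[:budget]


def make_centroid_scorer(train):
    """Toy one-class model: fit score is the negative distance to the mean."""
    center = sum(train) / len(train)
    return lambda x: -abs(x - center)


if __name__ == "__main__":
    score = make_centroid_scorer([1.0, 1.2, 0.8])   # hypothetical target-class data
    pool = [1.1, 5.0, 0.9, -2.0]                    # hypothetical unlabeled pool
    print(select_queries(pool, score, budget=2))
```

In a full loop, the queried instances would be labeled, the one-class model refit, and the selection repeated on the remaining pool.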
TL;DR: A novel and fast traffic sign recognition system, a very important part of advanced driver assistance systems and of autonomous driving, is developed and gives results close to existing state-of-the-art techniques.
Abstract: In this work we developed a novel and fast traffic sign recognition system, a very important part of advanced driver assistance systems and of autonomous driving. Traffic signs play a vital role in safe driving and in avoiding accidents. We used image processing and the topic discovery model pLSA to tackle this challenging multiclass classification problem. Our algorithm consists of two parts, shape classification and sign classification, for improved accuracy. For processing and representation of images we used the bag-of-features model with the SIFT local descriptor, where a visual vocabulary of 300 words is formed using the k-means codebook formation algorithm. We exploited the concept that every image is a collection of visual topics and that images having the same topics belong to the same category. Our algorithm is tested on the German Traffic Sign Recognition Benchmark (GTSRB) and gives very promising results, close to existing state-of-the-art techniques.
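The bag-of-features step can be sketched as follows. This is a minimal Python illustration, not the authors' pipeline: it assumes local descriptors and a codebook are already available (no SIFT extraction or k-means training is performed here), and the function name `bof_histogram` and the tiny 2-D toy vectors are made up. Each descriptor is assigned to its nearest visual word, and the image is represented by the normalized word histogram.

```python
def bof_histogram(descriptors, codebook):
    """Normalized histogram of nearest-codeword assignments."""
    hist = [0] * len(codebook)
    for d in descriptors:
        # Index of the codeword with the smallest squared Euclidean distance.
        nearest = min(
            range(len(codebook)),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(d, codebook[j])),
        )
        hist[nearest] += 1
    total = sum(hist)
    # An image with no descriptors keeps an all-zero histogram.
    return [h / total for h in hist] if total else hist


if __name__ == "__main__":
    codebook = [(0, 0), (10, 0), (0, 10)]            # hypothetical 3-word vocabulary
    descriptors = [(1, 1), (9, 1), (0, 9), (1, 0)]   # hypothetical local descriptors
    print(bof_histogram(descriptors, codebook))
```

In the setting described above the vocabulary would hold 300 words, and the resulting histograms would be the input to the topic model.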