TL;DR: The experimental results show that RNN-IDS is very suitable for modeling a classification model with high accuracy and that its performance is superior to that of traditional machine learning classification methods in both binary and multiclass classification.
Abstract: Intrusion detection plays an important role in ensuring information security, and the key technology is to accurately identify various attacks in the network. In this paper, we explore how to model an intrusion detection system based on deep learning, and we propose a deep learning approach for intrusion detection using recurrent neural networks (RNN-IDS). Moreover, we study the performance of the model in binary and multiclass classification, and how the number of neurons and the learning rate affect that performance. We compare its performance with that of J48, artificial neural network, random forest, support vector machine, and other machine learning methods proposed by previous researchers on the benchmark data set. The experimental results show that RNN-IDS is very suitable for modeling a classification model with high accuracy and that its performance is superior to that of traditional machine learning classification methods in both binary and multiclass classification. The RNN-IDS model improves the accuracy of intrusion detection and provides a new research method for intrusion detection.
TL;DR: This paper provides some sufficient conditions on a loss function so that risk minimization under that loss function would be inherently tolerant to label noise for multiclass classification problems, and generalizes the existing results on noise-tolerant loss functions for binary classification.
Abstract: In many applications of classifier learning, training data suffers from label noise. Deep networks are learned using huge training data where the problem of noisy labels is particularly relevant. The current techniques proposed for learning deep networks under label noise focus on modifying the network architecture and on algorithms for estimating true labels from noisy labels. An alternate approach would be to look for loss functions that are inherently noise-tolerant. For binary classification there exist theoretical results on loss functions that are robust to label noise. In this paper, we provide some sufficient conditions on a loss function so that risk minimization under that loss function would be inherently tolerant to label noise for multiclass classification problems. These results generalize the existing results on noise-tolerant loss functions for binary classification. We study some of the widely used loss functions in deep networks and show that the loss function based on mean absolute value of error is inherently robust to label noise. Thus standard back propagation is enough to learn the true classifier even under label noise. Through experiments, we illustrate the robustness of risk minimization with such loss functions for learning neural networks.
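The abstract does not spell out its sufficient conditions, so the following is only a minimal sketch under one assumption: that the relevant property is symmetry, meaning the loss summed over all possible labels is the same constant for every prediction. The helper names (`mae_loss`, `cce_loss`, `loss_sum_over_labels`) and the example scores are hypothetical. The sketch checks numerically that the MAE-based loss has this property while categorical cross-entropy does not.

```python
import math


def softmax(scores):
    """Convert raw scores into a probability vector."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mae_loss(probs, label):
    """L1 distance between the predicted distribution and the one-hot target."""
    return sum(abs(p - (1.0 if j == label else 0.0)) for j, p in enumerate(probs))


def cce_loss(probs, label):
    """Categorical cross-entropy for a single example."""
    return -math.log(probs[label])


def loss_sum_over_labels(loss_fn, probs):
    """Sum the loss over every possible label (assumed symmetry check)."""
    return sum(loss_fn(probs, j) for j in range(len(probs)))


# Two arbitrary, hypothetical predictions over three classes.
probs_a = softmax([2.0, 0.5, -1.0])
probs_b = softmax([-3.0, 0.0, 4.0])

mae_sum_a = loss_sum_over_labels(mae_loss, probs_a)
mae_sum_b = loss_sum_over_labels(mae_loss, probs_b)
cce_sum_a = loss_sum_over_labels(cce_loss, probs_a)
cce_sum_b = loss_sum_over_labels(cce_loss, probs_b)

print(mae_sum_a, mae_sum_b)  # both equal 2 * (K - 1) up to rounding
print(cce_sum_a, cce_sum_b)  # these depend on the prediction
```

For K classes the MAE sum works out to 2(K - 1) regardless of the prediction, because each per-label term equals 2(1 - p_k) and the probabilities sum to one.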
TL;DR: Naïve Bayes and Random Forest were found to be the next most accurate classification algorithms after SVM, and the research shows that time taken to build a model and precision (accuracy) are one factor, while kappa statistic and Mean Absolute Error (MAE) are another.
Abstract: Supervised Machine Learning (SML) is the search for algorithms that reason from externally supplied instances to produce general hypotheses, which then make predictions about future instances. Supervised classification is one of the tasks most frequently carried out by intelligent systems. This paper describes various Supervised Machine Learning (ML) classification techniques, compares various supervised learning algorithms, and determines the most efficient classification algorithm based on the data set, the number of instances and variables (features). Seven different machine learning algorithms were considered: Decision Table, Random Forest (RF), Naïve Bayes (NB), Support Vector Machine (SVM), Neural Networks (Perceptron), JRip and Decision Tree (J48), using the Waikato Environment for Knowledge Analysis (WEKA) machine learning tool. To implement the algorithms, a Diabetes data set was used for the classification, with 786 instances, eight attributes as independent variables and one as the dependent variable. The results show that SVM was the algorithm with the highest precision and accuracy. Naïve Bayes and Random Forest were found to be the next most accurate classification algorithms after SVM. The research shows that time taken to build a model and precision (accuracy) are one factor, while kappa statistic and Mean Absolute Error (MAE) are another. Therefore, ML algorithms require precision, accuracy and minimum error to support supervised predictive machine learning.
TL;DR: This chapter introduces the reader to Keras, which is a library that provides highly powerful and abstract building blocks to build deep learning networks.
Abstract: This chapter introduces the reader to Keras, which is a library that provides highly powerful and abstract building blocks to build deep learning networks.
TL;DR: Comparisons of clustering distribution and classification accuracy with six other features show that the proposed feature mining approach is quite suitable for spindle bearing fault diagnosis with multiclass classification regardless of the load fluctuation.
Abstract: Considering various health conditions under varying operational conditions, mining sensitive features from the measured signals is still a great challenge for intelligent fault diagnosis of spindle bearings. This paper proposes a novel energy-fluctuated multiscale feature mining approach based on wavelet packet energy (WPE) image and deep convolutional network (ConvNet) for spindle bearing fault diagnosis. Different from the vector characteristics applied in intelligent diagnosis of spindle bearings, wavelet packet transform is first combined with phase space reconstruction to rebuild a 2-D WPE image of the frequency subspaces. This special image can reconstruct the local relationship of the WP nodes and hold the energy fluctuation of the measured signal. Then, the identifiable characteristics can be further learned by a special architecture of the deep ConvNet. Unlike the traditional neural network architecture, to maintain the global and local information simultaneously, the deep ConvNet combines the skipping layer with the last convolutional layer as the input of the multiscale layer. The comparisons of clustering distribution and classification accuracy with six other features show that the proposed feature mining approach is quite suitable for spindle bearing fault diagnosis with multiclass classification regardless of the load fluctuation.
TL;DR: A deep convolutional neural network based pipeline for the diagnosis of Alzheimer's disease and its stages using magnetic resonance imaging (MRI) scans and new state-of-the-art results are obtained for multiclass classification of the disease.
Abstract: In the recent years, deep learning has gained huge fame in solving problems from various fields including medical image analysis. This work proposes a deep convolutional neural network based pipeline for the diagnosis of Alzheimer's disease and its stages using magnetic resonance imaging (MRI) scans. Alzheimer's disease causes permanent damage to the brain cells associated with memory and thinking skills. The diagnosis of Alzheimer's in elderly people is quite difficult and requires a highly discriminative feature representation for classification due to similar brain patterns and pixel intensities. Deep learning techniques are capable of learning such representations from data. In this paper, a 4-way classifier is implemented to classify Alzheimer's (AD), mild cognitive impairment (MCI), late mild cognitive impairment (LMCI) and healthy persons. Experiments are performed using ADNI dataset on a high performance graphical processing unit based system and new state-of-the-art results are obtained for multiclass classification of the disease. The proposed technique results in a prediction accuracy of 98.8%, which is a noticeable increase in accuracy as compared to the previous studies and clearly reveals the effectiveness of the proposed method.
TL;DR: The goal of this study is to provide a comprehensive review of different classification techniques in machine learning; it will help both academics and newcomers to the field of machine learning strengthen their grounding in classification methods.
Abstract: Classification is a data mining (machine learning) technique used to predict group membership for data instances. There are several classification techniques that can be used for this purpose. In this paper, we present the basic classification techniques. Later we discuss some major types of classification method including Bayesian networks, decision tree induction, k-nearest neighbor classifier and Support Vector Machines (SVM) with their strengths, weaknesses, potential applications and issues with their available solutions. The goal of this study is to provide a comprehensive review of different classification techniques in machine learning. This work will help both academics and newcomers to the field of machine learning strengthen their grounding in classification methods.
TL;DR: Squares is presented, a performance visualization for multiclass classification problems that supports estimating common performance metrics while displaying instance-level distribution information necessary for helping practitioners prioritize efforts and access data.
Abstract: Performance analysis is critical in applied machine learning because it influences the models practitioners produce. Current performance analysis tools suffer from issues including obscuring important characteristics of model behavior and dissociating performance from data. In this work, we present Squares, a performance visualization for multiclass classification problems. Squares supports estimating common performance metrics while displaying instance-level distribution information necessary for helping practitioners prioritize efforts and access data. Our controlled study shows that practitioners can assess performance significantly faster and more accurately with Squares than a confusion matrix, a common performance analysis tool in machine learning.
TL;DR: A novel feature selection method based on maximizing the sum of relevance and distance (MSRD) is proposed for solving the problem of high dimensionality, together with a PTHS algorithm that employs parallel optimization and candidate model pruning based on k-means and a hierarchical selection framework.
TL;DR: The severity of code smells is an important factor to take into consideration when reporting code smell detection results, since it allows the prioritization of refactoring efforts: code smells with high severity create larger issues for the maintainability of a software system.
Abstract: Several code smell detection tools have been developed providing different results, because smells can be subjectively interpreted and hence detected in different ways. Machine learning techniques have been used for different topics in software engineering, e.g., design pattern detection, code smell detection, bug prediction, recommending systems. In this paper, we focus our attention on the classification of code smell severity through the use of machine learning techniques in different experiments. The severity of code smells is an important factor to take into consideration when reporting code smell detection results, since it allows the prioritization of refactoring efforts. In fact, code smells with high severity can be particularly large and complex, and create larger issues for the maintainability of a software system. In our experiments, we apply several machine learning models, spanning from multinomial classification to regression, plus a method to apply binary classifiers for ordinal classification. In fact, we model code smell severity as an ordinal variable. We take the baseline models from previous work, where we applied binary classification models for code smell detection with good results. We report and compare the performance of the models according to their accuracy and four different performance measures used for the evaluation of ordinal classification techniques. From our results, while the accuracy of the classification of severity is not as high as in the binary classification of absence or presence of code smells, the ranking correlation of the actual and predicted severity for the best models reaches 0.88–0.96, measured through Spearman's ρ.
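The abstract does not say which method it uses to apply binary classifiers to ordinal classification. As an illustration only, the sketch below assumes one well-known scheme: for K ordered severity levels, K - 1 binary classifiers each estimate P(severity > k), and per-level probabilities are recovered by differencing. The function names and the example probabilities are hypothetical and are not taken from the paper.

```python
def ordinal_class_probabilities(prob_greater):
    """Turn estimates of P(y > k), k = 0..K-2, into P(y = c), c = 0..K-1.

    prob_greater[k] is assumed to come from a binary classifier trained to
    separate levels <= k from levels > k (hypothetical inputs here).
    """
    num_levels = len(prob_greater) + 1
    probs = []
    for c in range(num_levels):
        # P(y > c-1) is 1 for the lowest level; P(y > c) is 0 for the highest.
        upper = 1.0 if c == 0 else prob_greater[c - 1]
        lower = 0.0 if c == num_levels - 1 else prob_greater[c]
        probs.append(upper - lower)
    return probs


def predict_severity(prob_greater):
    """Pick the severity level with the largest recovered probability."""
    probs = ordinal_class_probabilities(prob_greater)
    return max(range(len(probs)), key=lambda c: probs[c])


# Hypothetical outputs of three binary classifiers for four severity levels.
example = [0.9, 0.6, 0.1]
level_probs = ordinal_class_probabilities(example)
predicted = predict_severity(example)
print(level_probs, predicted)
```

If the binary estimates are not monotonically decreasing, some differences can come out negative, so a practical implementation would need to clip or renormalize them; the sketch leaves that out.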
TL;DR: Experimental results show that CCR-ELM can achieve better performance for classification problems with imbalanced data distributions than the original ELM and existing ELM imbalance learning approach, and the kernel based CCRs can improve the performance further.
TL;DR: Experimental results show that fault diagnosis based on the Gaussian–Bernoulli deep belief network achieves diagnostic performance superior to that of traditional feature extraction methods.
Abstract: Fault detection and isolation (FDI) is very difficult for electronics-rich analog systems due to their sophisticated mechanisms and variable operational conditions. Traditionally, FDI in such systems is done by monitoring the deviation of output signals in voltage or current at system level, which commonly arises from the degradation of one or more critical components. Therefore, FDI can be transformed into a multiclass classification task given the extracted features of the output signals in voltage or current of the circuit. Traditional feature extraction on the circuit output is mostly based on time-domain, frequency-domain, or time-frequency signal processing, which collapses high-dimensional raw signals into a lower dimensional feature set. Such a low-dimensional feature set usually suffers from information loss, which affects the accuracy of the later fault diagnosis. In order to retain as much information as possible, deep learning is proposed, which employs a hierarchical structure to capture the different levels of semantic representations of the signals. In this paper, a novel fault diagnostic application of the Gaussian–Bernoulli deep belief network (GB-DBN) for electronics-rich analog systems is developed, which can more effectively capture the high-order semantic features within the raw output signals. The novel fault diagnosis is validated experimentally on two typical analog filter circuits. Experimental results show that fault diagnosis based on GB-DBN achieves diagnostic performance superior to that of traditional feature extraction methods.
TL;DR: Experiments on six benchmarking UCI datasets and two artificial datasets demonstrate that the proposed FDAF-score algorithm can not only obtain good results with fewer features than the original datasets as well as fast computation but also deal with the classification problem with noises well.
Abstract: The feature ranking method is discussed based on Fisher discriminant analysis (FDA) and F-score. The relative distribution of different classes is considered in the paper. The method removes all insignificant features at a time, so it can effectively reduce computational cost. The advantages of the proposed method are discussed. F-score is a simple feature selection technique; however, it works only for two classes. This paper proposes a novel feature ranking method based on Fisher discriminant analysis (FDA) and F-score, denoted as FDAF-score, which considers the relative distribution of classes in a multi-dimensional feature space. The main idea is that a proper subset is obtained by maximizing the ratio of the average between-class distance to the relative within-class scatter. Because the method removes all insignificant features at a time, it can effectively reduce computational cost. Experiments on six benchmarking UCI datasets and two artificial datasets demonstrate that the proposed FDAF-score algorithm can not only obtain good results with fewer features than the original datasets as well as fast computation but also deal with the classification problem with noises well.
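For context on the two-class limitation mentioned above, here is a minimal sketch of the plain F-score as it is commonly defined for feature ranking: the squared deviations of the two class means from the overall mean, divided by the sum of the two within-class sample variances. This is the baseline technique only, not the proposed FDAF-score, whose exact formula the abstract does not give. The function names and toy data are hypothetical.

```python
from statistics import mean, variance


def f_score(pos_values, neg_values):
    """Two-class F-score of one feature (commonly used definition).

    Numerator: squared deviation of each class mean from the overall mean.
    Denominator: sum of the within-class sample variances (n - 1 divisor).
    Raises ZeroDivisionError if both classes are constant on this feature.
    """
    overall = mean(pos_values + neg_values)
    between = (mean(pos_values) - overall) ** 2 + (mean(neg_values) - overall) ** 2
    within = variance(pos_values) + variance(neg_values)
    return between / within


def rank_features(pos_rows, neg_rows):
    """Score every feature column and return indices sorted best-first."""
    n_features = len(pos_rows[0])
    scores = [
        f_score([row[i] for row in pos_rows], [row[i] for row in neg_rows])
        for i in range(n_features)
    ]
    order = sorted(range(n_features), key=lambda i: scores[i], reverse=True)
    return order, scores


# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
pos_rows = [[5.0, 1.0], [6.0, 2.0], [7.0, 3.0]]
neg_rows = [[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]]
order, scores = rank_features(pos_rows, neg_rows)
print(order, scores)
```

Because the score is built from exactly one positive and one negative class mean, it has no direct multiclass form, which is the gap the paper's method is meant to address.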
TL;DR: In this article, the authors provide sufficient conditions on a loss function so that risk minimization under that loss function would be inherently tolerant to label noise for multiclass classification problems, and show that standard back propagation is enough to learn the true classifier even under label noise.
Abstract: In many applications of classifier learning, training data suffers from label noise. Deep networks are learned using huge training data where the problem of noisy labels is particularly relevant. The current techniques proposed for learning deep networks under label noise focus on modifying the network architecture and on algorithms for estimating true labels from noisy labels. An alternate approach would be to look for loss functions that are inherently noise-tolerant. For binary classification there exist theoretical results on loss functions that are robust to label noise. In this paper, we provide some sufficient conditions on a loss function so that risk minimization under that loss function would be inherently tolerant to label noise for multiclass classification problems. These results generalize the existing results on noise-tolerant loss functions for binary classification. We study some of the widely used loss functions in deep networks and show that the loss function based on mean absolute value of error is inherently robust to label noise. Thus standard back propagation is enough to learn the true classifier even under label noise. Through experiments, we illustrate the robustness of risk minimization with such loss functions for learning neural networks.
TL;DR: A new ensemble of classifiers that consists of decision trees and random vector functional link network is proposed for multi-class classification that is significantly better than other state-of-the-art classifiers for medium and large sized data sets.
TL;DR: A novel approach, based on weighted one-against-rest SVM (WOAR-SVM), which enables seamless integration of several binary hypotheses into a composite, multiclass hypothesis, where each binary classifier may feature a unique set of classification parameters.
TL;DR: This paper presents a hybrid system in which a supervised deep belief network is trained to select generic features and a kernel-based SVM is trained on the features learned by the DBN; linear kernels are substituted for nonlinear ones without loss of accuracy.
TL;DR: The overall computational complexity of GWLMBSVM is lower than that of the multi-class WLTSVM classifier, since it uses the all-versus-one strategy, which is the key idea of the multiple birth support vector machine.
TL;DR: The results suggest that the proposed MPS allows capturing the performance of a classification with minimum influence from the training and testing conditions, and is demonstrated by its robustness towards imbalanced data and its sensitivity towards class separation in feature space.
TL;DR: It is proved COAL can be efficiently implemented for any regression family that admits squared loss optimization; it also enjoys strong guarantees with respect to predictive performance and labeling effort.
Abstract: We design an active learning algorithm for cost-sensitive multiclass classification: problems where different errors have different costs. Our algorithm, COAL, makes predictions by regressing to each label's cost and predicting the smallest. On a new example, it uses a set of regressors that perform well on past data to estimate possible costs for each label. It queries only the labels that could be the best, ignoring the sure losers. We prove COAL can be efficiently implemented for any regression family that admits squared loss optimization; it also enjoys strong guarantees with respect to predictive performance and labeling effort. We empirically compare COAL to passive learning and several active learning baselines, showing significant improvements in labeling effort and test cost on real-world datasets.
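The two rules described in the abstract, predicting the label with the smallest estimated cost and querying only labels that could still be the best, can be sketched as follows. This is a simplified illustration, not the COAL algorithm itself: it assumes each label already comes with a cost estimate or a (low, high) range of plausible costs, and it omits how COAL derives those ranges from its set of regressors. All names and numbers are hypothetical.

```python
def predict_label(cost_estimates):
    """Predict the label whose estimated cost is smallest."""
    return min(range(len(cost_estimates)), key=lambda y: cost_estimates[y])


def labels_worth_querying(cost_ranges):
    """Return labels that could still have the lowest cost.

    cost_ranges[y] is a hypothetical (low, high) interval of plausible costs
    for label y. A label is a "sure loser" when even its lowest plausible
    cost exceeds the smallest upper bound among all labels.
    """
    best_upper = min(high for _, high in cost_ranges)
    return [y for y, (low, _) in enumerate(cost_ranges) if low <= best_upper]


# Hypothetical per-label cost estimates and ranges for one example.
estimates = [0.7, 0.2, 0.9]
ranges = [(0.1, 0.6), (0.3, 0.5), (0.7, 0.9)]

chosen = predict_label(estimates)
candidates = labels_worth_querying(ranges)
print(chosen, candidates)
```

In this example label 2 is never queried, because its best possible cost (0.7) is already worse than the worst possible cost of label 1 (0.5).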
TL;DR: This work aims to improve multi-class classification and to reduce the number of EEG channels required in motor imagery-based BCI through subject-specific time-frequency selection, using only a few Laplacian EEG channels located around the sensorimotor area for classification.
TL;DR: A new framework, called DeepFood, is proposed which not only extracts rich and effective features from a dataset of food ingredient images using deep learning but also improves the average accuracy of multi-class classification by applying advanced machine learning techniques.
Abstract: Deep learning has brought a series of breakthroughs in image processing. Specifically, there are significant improvements in the application of food image classification using deep learning techniques. However, very little work has addressed the classification of food ingredients. Therefore, this paper proposes a new framework, called DeepFood, which not only extracts rich and effective features from a dataset of food ingredient images using deep learning but also improves the average accuracy of multi-class classification by applying advanced machine learning techniques. First, a set of transfer learning algorithms based on Convolutional Neural Networks (CNNs) are leveraged for deep feature extraction. Then, a multi-class classification algorithm is exploited based on the performance of the classifiers on each deep feature set. The DeepFood framework is evaluated on a multi-class dataset that includes 41 classes of food ingredients and 100 images for each class. Experimental results illustrate the effectiveness of the DeepFood framework for multi-class classification of food ingredients. The model that integrates ResNet deep feature sets, Information Gain (IG) feature selection, and the SMO classifier outperforms several existing works in this area on food ingredient recognition.
TL;DR: A modified term is added to the multiple birth support vector machine to make the variance of the distances from the samples of a given class to their hyperplane as small as possible; the proposed algorithm is efficient and has good classification performance.
TL;DR: An automatic multi-class classification system to predict accident-related causes of death from plaintext autopsy reports through expert-driven feature selection with supervised automatic text classification decision models is proposed and generally applicable to other kinds of plaintext clinical reports.
Abstract: Objectives: Widespread implementation of electronic databases has improved the accessibility of plaintext clinical information for supplementary use. Numerous machine learning techniques, such as supervised machine learning approaches or ontology-based approaches, have been employed to obtain useful information from plaintext clinical data. This study proposes an automatic multi-class classification system to predict accident-related causes of death from plaintext autopsy reports through expert-driven feature selection with supervised automatic text classification decision models. Methods: Accident-related autopsy reports were obtained from one of the largest hospitals in Kuala Lumpur. These reports belong to nine different accident-related causes of death. A master feature vector was prepared by extracting features from the collected autopsy reports by using unigram with lexical categorization. This master feature vector was used to detect cause of death [according to the International Classification of Diseases, version 10 (ICD-10) classification system] through five automated feature selection schemes, the proposed expert-driven approach, five subset sizes of features, and five machine learning classifiers. Model performance was evaluated using precisionM, recallM, F-measureM, accuracy, and area under ROC curve. Four baselines were used to compare the results with the proposed system. Results: Random forest and J48 decision models parameterized using expert-driven feature selection yielded the highest evaluation measures, approaching 85% to 90% for most metrics, by using a feature subset size of 30. The proposed system also showed approximately 14% to 16% improvement in the overall accuracy compared with the existing techniques and four baselines. Conclusion: The proposed system is feasible and practical to use for automatic classification of ICD-10-related cause of death from autopsy reports. The proposed system assists pathologists to accurately and rapidly determine underlying cause of death based on autopsy findings. Furthermore, the proposed expert-driven feature selection approach and the findings are generally applicable to other kinds of plaintext clinical reports.
TL;DR: A semi-supervised classification tree induction algorithm that can exploit both the labelled and unlabeled data, while preserving all of the appealing characteristics of standard supervised decision trees: being non-parametric, efficient, having good predictive performance and producing readily interpretable models.
Abstract: In many real-life problems, obtaining labelled data can be a very expensive and laborious task, while unlabeled data can be abundant. The availability of labeled data can seriously limit the performance of supervised learning methods. Here, we propose a semi-supervised classification tree induction algorithm that can exploit both the labelled and unlabeled data, while preserving all of the appealing characteristics of standard supervised decision trees: being non-parametric, efficient, having good predictive performance and producing readily interpretable models. Moreover, we further improve their predictive performance by using them as base predictive models in random forests. We performed an extensive empirical evaluation on 12 binary and 12 multi-class classification datasets. The results showed that the proposed methods improve the predictive performance of their supervised counterparts. Moreover, we show that, in cases with limited availability of labeled data, the semi-supervised decision trees often yield models that are smaller and easier to interpret than supervised decision trees.
TL;DR: The convolutional neural network model of the winner of the ILSVRC12 competition is implemented, and the method successfully classifies 1.2 million images into 1000 categories.
Abstract: Image classification is one of the important problems in the field of machine learning. Deep learning architectures are used in many machine learning applications such as image classification and object detection. The ability to manipulate large image clusters and implement them quickly makes deep learning a popular method in classifying images. This study points out the success of convolutional neural networks, a deep learning architecture, in solving image classification problems. In the study, the convolutional neural network model of the winner of the ILSVRC12 competition is implemented. The method successfully classifies 1.2 million images into 1000 categories. The application is performed with the Caffe library, and the image classification process is employed. In the application, which uses the speed-up provided by the GPU, the test operation is performed by using the images in the Caltech-101 dataset.
TL;DR: In this article, the authors show that if one can equip positive data with confidence (positive-confidence), one can successfully learn a binary classifier, which they name positive-confidence (Pconf) classification.
Abstract: Can we learn a binary classifier from only positive data, without any negative data or unlabeled data? We show that if one can equip positive data with confidence (positive-confidence), one can successfully learn a binary classifier, which we name positive-confidence (Pconf) classification. Our work is related to one-class classification which is aimed at "describing" the positive class by clustering-related methods, but one-class classification does not have the ability to tune hyper-parameters and their aim is not on "discriminating" positive and negative classes. For the Pconf classification problem, we provide a simple empirical risk minimization framework that is model-independent and optimization-independent. We theoretically establish the consistency and an estimation error bound, and demonstrate the usefulness of the proposed method for training deep neural networks through experiments.
TL;DR: This paper concentrates on determining the emotional state from speech signals by classifying feature vectors into classes, using either a pre-trained Support Vector Machine (SVM) model or Linear Discriminant Analysis (LDA) classifier.
Abstract: Emotions exhibited by a speaker can be detected by analyzing his/her speech, facial expressions and gestures or by combining these properties. This paper concentrates on determining the emotional state from speech signals. Various acoustic features such as energy, zero crossing rate (ZCR), fundamental frequency, Mel Frequency Cepstral Coefficients (MFCCs), etc. are extracted for short-term, overlapping frames derived from the speech signal. A feature vector for every utterance is then constructed by analyzing the global statistics (mean, median, etc.) of the extracted features over all frames. To select a subset of useful features from the full candidate feature vector, the sequential backward selection (SBS) method is used with k-fold cross validation. Detection of emotion in the samples is done by classifying their respective feature vectors into classes, using either a pre-trained Support Vector Machine (SVM) model or a Linear Discriminant Analysis (LDA) classifier. This approach is tested with two acted emotional databases: the Berlin Database of Emotional Speech (EmoDB) and the BML Emotion Database (RED). For multi-class classification, accuracies of 80% for EmoDB and 73% for RED are achieved, which are higher than or comparable to previous works on both databases.
TL;DR: A novel action recognition method named stratified pooling, based on deep convolutional neural networks (SP-CNN), is proposed; it outperforms the state of the art on the HMDB-51 and UCF-101 datasets.
Abstract: Video based human action recognition is an active and challenging topic in computer vision. Over the last few years, deep convolutional neural networks (CNNs) have become the most popular method and achieved state-of-the-art performance on several datasets, such as HMDB-51 and UCF-101. Since each video has a varying number of frame-level features, how to combine these features to acquire a good video-level feature becomes a challenging task. Therefore, this paper proposes a novel action recognition method named stratified pooling, which is based on deep convolutional neural networks (SP-CNN). The process is mainly composed of five parts: (i) fine-tuning a pre-trained CNN on the target dataset; (ii) frame-level feature extraction; (iii) the principal component analysis (PCA) method for feature dimensionality reduction; (iv) stratified pooling of frame-level features to get a video-level feature; and (v) SVM for multiclass classification. Finally, the experimental results on the HMDB-51 and UCF-101 datasets show that the proposed method outperforms the state of the art.
TL;DR: The results demonstrate that the proposed DNN with MCWSVM is efficient in terms of better classification accuracy at a lesser execution time when compared to K-nearest neighbors (KNN), SVM and naive Bayes method (NBM).