TL;DR: The approach does not sacrifice one class in favor of the other, but achieves high predictive performance on both minority and majority classes, and compares well with a base classifier, a standard benchmarking boosting algorithm and three advanced boosting-based algorithms for imbalanced data sets.
Abstract: Learning from imbalanced data sets, where the number of examples of one (majority) class is much higher than that of the others, presents an important challenge to the machine learning community. Traditional machine learning algorithms may be biased towards the majority class, thus producing poor predictive accuracy over the minority class. In this paper, we describe a new approach that combines boosting, an ensemble-based learning algorithm, with data generation to improve the predictive power of classifiers against imbalanced data sets consisting of two classes. In the DataBoost-IM method, hard examples from both the majority and minority classes are identified during execution of the boosting algorithm. Subsequently, the hard examples are used to separately generate synthetic examples for the majority and minority classes. The synthetic data are then added to the original training set, and the class distribution and the total weights of the different classes in the new training set are rebalanced. The DataBoost-IM method was evaluated, in terms of the F-measures, G-mean and overall accuracy, against seventeen highly and moderately imbalanced data sets using decision trees as base classifiers. Our results are promising and show that the DataBoost-IM method compares well with a base classifier, a standard benchmarking boosting algorithm and three advanced boosting-based algorithms for imbalanced data sets. Results indicate that our approach does not sacrifice one class in favor of the other, but achieves high predictive performance on both minority and majority classes.
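The evaluation measures named in the abstract above (F-measure, G-mean, overall accuracy) can be illustrated with a minimal sketch. It assumes the standard two-class definitions: F-measure as the harmonic mean of precision and recall, and G-mean as the geometric mean of the recall on each class. The abstract does not spell out its exact formulas, so these definitions, the function name `imbalance_metrics`, and its parameters are assumptions made for illustration.

```python
import math


def imbalance_metrics(y_true, y_pred, positive=1):
    """Return (f_measure, g_mean, accuracy) for a two-class problem.

    `positive` marks the class treated as the class of interest
    (typically the minority class). Standard definitions are assumed.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    recall = tp / (tp + fn) if tp + fn else 0.0        # accuracy on the positive class
    precision = tp / (tp + fp) if tp + fp else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0   # accuracy on the other class

    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    g_mean = math.sqrt(recall * specificity)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return f_measure, g_mean, accuracy
```

A classifier that always predicts the majority class scores high accuracy but zero F-measure and zero G-mean for the minority class, which is why these measures are preferred on imbalanced data. Calling the function once per class (by changing `positive`) gives one F-measure per class.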
TL;DR: The tests performed using CAIM and six other state-of-the-art discretization algorithms show that discrete attributes generated by the CAIM algorithm almost always have the lowest number of intervals and the highest class-attribute interdependency.
Abstract: The task of extracting knowledge from databases is quite often performed by machine learning algorithms. The majority of these algorithms can be applied only to data described by discrete numerical or nominal attributes (features). In the case of continuous attributes, there is a need for a discretization algorithm that transforms continuous attributes into discrete ones. We describe such an algorithm, called CAIM (class-attribute interdependence maximization), which is designed to work with supervised data. The goal of the CAIM algorithm is to maximize the class-attribute interdependence and to generate a (possibly) minimal number of discrete intervals. The algorithm does not require the user to predefine the number of intervals, as opposed to some other discretization algorithms. The tests performed using CAIM and six other state-of-the-art discretization algorithms show that discrete attributes generated by the CAIM algorithm almost always have the lowest number of intervals and the highest class-attribute interdependency. Two machine learning algorithms, the CLIP4 rule algorithm and the decision tree algorithm, are used to generate classification rules from data discretized by CAIM. For both the CLIP4 and decision tree algorithms, the accuracy of the generated rules is higher and the number of the rules is lower for data discretized using the CAIM algorithm when compared to data discretized using six other discretization algorithms. The highest classification accuracy was achieved for data sets discretized with the CAIM algorithm, as compared with the other six algorithms.
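To make the class-attribute interdependence criterion concrete, here is a minimal sketch of how a CAIM-style score could be computed for one candidate discretization. The abstract does not give the formula; the sketch assumes the commonly cited form, the average over intervals of max_r^2 / M_r, where max_r is the largest class count in interval r and M_r is the number of examples in that interval. The function name `caim_value` and the `quanta` layout are hypothetical.

```python
def caim_value(quanta):
    """Score a discretization from its class-by-interval count matrix.

    quanta[i][r] is the number of examples of class i that fall into
    interval r. Higher scores mean each interval is dominated by a
    single class. Assumed formula: (1/n) * sum_r max_r**2 / M_r.
    """
    n_intervals = len(quanta[0])
    total = 0.0
    for r in range(n_intervals):
        column = [row[r] for row in quanta]
        m_r = sum(column)              # examples in interval r
        if m_r:                        # skip empty intervals
            total += max(column) ** 2 / m_r
    return total / n_intervals
```

For two classes split cleanly into two intervals, `caim_value([[10, 0], [0, 10]])` gives 10.0, while a split that mixes the classes evenly, `caim_value([[5, 5], [5, 5]])`, gives 2.5. Dividing by the number of intervals is what favors discretizations with few intervals.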
TL;DR: It seems that "greedy" algorithms, such as SPAM, SRIDHCR, and TDS, do not perform particularly well for supervised clustering and seem to terminate prematurely too often.
Abstract: This work centers on a novel data mining technique we term supervised clustering. Unlike traditional clustering, supervised clustering assumes that the examples are classified and has the goal of identifying class-uniform clusters that have high probability densities. Four representative-based algorithms for supervised clustering are introduced: a greedy algorithm with random restart, named SRIDHCR, that searches for solutions by inserting and removing single objects from the current solution; SPAM (a variation of the clustering algorithm PAM); an evolutionary computing algorithm named SCEC; and a fast medoid-based top-down splitting algorithm named TDS. The four algorithms were evaluated using a benchmark consisting of four UCI machine learning data sets. In general, it seems that "greedy" algorithms, such as SPAM, SRIDHCR, and TDS, do not perform particularly well for supervised clustering and seem to terminate prematurely too often. We also briefly describe the applications of supervised clustering.
TL;DR: The paper describes a hybrid inductive machine learning algorithm called CLIP4 that first partitions data into subsets using a tree structure and then generates production rules only from subsets stored at the leaf nodes; its unique feature is the generation of rules that involve inequalities.
Abstract: The paper describes a hybrid inductive machine learning algorithm called CLIP4. The algorithm first partitions data into subsets using a tree structure and then generates production rules only from subsets stored at the leaf nodes. The unique feature of the algorithm is the generation of rules that involve inequalities. The algorithm works with data that have a large number of examples and attributes, can cope with noisy data, and can use numerical, nominal, continuous, and missing-value attributes. The algorithm's flexibility and efficiency are shown on several well-known benchmarking data sets, and the results are compared with other machine learning algorithms. The benchmarking results in each instance show CLIP4's accuracy, CPU time, and rule complexity. CLIP4 has built-in features such as tree pruning, methods for partitioning the data (for data with a large number of examples and attributes, and for data containing noise), a data-independent mechanism for dealing with missing values, genetic operators to improve accuracy on small data, and discretization schemes. CLIP4 generates a model of the data that consists of well-generalized rules, and ranks attributes and selectors that can be used for feature selection.
TL;DR: A canonical transformation of the data to a Lattice Latin Hypercube form is developed, which preserves the Pareto property of points but reduces storage space and algorithm run time.
Abstract: The focus in this research is on developing a fast, efficient hybrid algorithm to identify the Pareto frontier in multi-dimensional data sets. The hybrid algorithm is a blend of two different base algorithms: the Simple Cull (SC) algorithm, which has a low overhead but is of overall high computational complexity, and the Divide & Conquer (DC) algorithm, which has a lower computational complexity but a high overhead. The hybrid algorithm employs aspects of each of the two base algorithms, adapting in response to the properties of the data.
Each of the two base algorithms performs better for different classes of data, with the SC algorithm performing best for data sets with few nondominated points, high dimensionality, or fewer total numbers of points, while the DC algorithm performs better otherwise. The general approach to the hybrid algorithm is to execute the following steps in order: (1) Execute one pass of the SC algorithm through the data if merited; (2) Execute the DC algorithm, which recursively splits the data into smaller problem sizes; (3) Switch to the SC algorithm for problem sizes below a certain limit.
In order to determine whether Step 1 should be executed, and to determine at what problem size the switch in Step 3 should be made, estimates of both algorithms' run times as a function of the size of the data set, the dimension of the data set, and the expected number of nondominated points are needed. These are developed in the thesis.
To increase the speed and reduce the computational and storage complexity of the algorithm, and to enable the algorithm to adapt to the data, a canonical transformation of the data to a Lattice Latin Hypercube (LLH) form is developed. The transformation preserves the Pareto property of points but reduces storage space and algorithm run time.
In order to test the three algorithms, three different methods for creating randomized data sets with arbitrary dimensionality and numbers of nondominated points are developed. Each of the methods provides insight into the properties of nondominated sets, along with providing test cases for the algorithms. Additionally, a spacecraft design problem is developed to serve as a source of real-world test data.
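The Simple Cull (SC) base algorithm mentioned in the abstract above can be sketched as a single pass that keeps a running list of nondominated points. This is a generic culling pass written under the assumption that every objective is minimized; it is not taken from the thesis, and the names `dominates` and `simple_cull` are hypothetical.

```python
def dominates(a, b):
    """True if point a is no worse than b in every coordinate and
    strictly better in at least one (all objectives minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def simple_cull(points):
    """Return the nondominated (Pareto) subset of `points`.

    Each new point is compared against the current frontier: it is
    discarded if any frontier point dominates it; otherwise it is
    added and every frontier point it dominates is removed.
    """
    frontier = []
    for p in points:
        dominated = False
        survivors = []
        for q in frontier:
            if dominates(q, p):
                dominated = True       # p cannot be on the frontier
                break
            if not dominates(p, q):
                survivors.append(q)    # q survives the arrival of p
        if dominated:
            continue                   # frontier is left unchanged
        survivors.append(p)
        frontier = survivors
    return frontier
```

The pass needs no preprocessing, which matches the low overhead described above, but in the worst case it compares every point with every frontier member, so its cost grows with the number of nondominated points.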
TL;DR: A new fusion structure and algorithm with incremental learning ability is constructed by adopting the modified RFWR algorithm together with the weighted average algorithm.
Abstract: This paper addresses multi-sensor data fusion with incremental learning ability. A new cost function is proposed for the receptive field weighted regression (RFWR) algorithm based on the idea of back propagation (BP), so that the computation efficiency and the learning strategy of the modified RFWR are much more applicable for multi-sensor data fusion problem. Thus a new fusion structure and algorithm with incremental learning ability is constructed by adopting the modified RFWR algorithm together with the weighted average algorithm. Experiments of a two-camera unified positioning system are implemented successfully to test the proposed computation structure and algorithms.
TL;DR: The no-free-lunch theorem is extended to subsets of functions and it is shown that for algorithm a performing better on a set of functions than algorithm b, there has to be another subset of functions on which b performs better on average than a.
Abstract: In this paper, the no-free-lunch theorem is extended to subsets of functions. It is shown that for algorithm a performing better on a set of functions than algorithm b, there has to be another subset of functions on which b performs better on average than a. To achieve a performance evaluation for an algorithm, it is therefore not sufficient to demonstrate its better performance on a given set of functions. Instead, the diversity of an algorithm is considered in this paper in more detail. The total number of possible algorithms is computed and compared with the number of algorithm instances that a random search or a population-based algorithm can have. It turns out that the number of different random searches is very small in comparison to the total number of algorithms. On the other hand, population-based algorithms are in principle able to cover the set of all possible algorithms. The smaller variance of algorithm performance, measured by the repeated application of the algorithm under different settings on different random sets of functions, turns out to be a value reflecting the higher count of instances.
TL;DR: It is shown that the application of McAllester's PAC-Bayesian bound to rule learning yields a practical learning algorithm, which is based on ensembles of weighted rule sets, and it is proved that the bound can be further improved by allowing the learner to abstain from uncertain predictions.
Abstract: While there is a lot of empirical evidence showing that traditional rule learning approaches work well in practice, it is nearly impossible to derive analytical results about their predictive accuracy. In this paper, we investigate rule-learning from a theoretical perspective. We show that the application of McAllester's PAC-Bayesian bound to rule learning yields a practical learning algorithm, which is based on ensembles of weighted rule sets. Experiments with the resulting learning algorithm show not only that it is competitive with state-of-the-art rule learners, but also that its error rate can often be bounded tightly. In fact, the bound turns out to be tighter than one of the "best" bounds for a practical learning scheme known so far (the Set Covering Machine). Finally, we prove that the bound can be further improved by allowing the learner to abstain from uncertain predictions.
TL;DR: This thesis introduces a new rule induction algorithm for learning classification rules, which broadly follows the approach of algorithms represented by CN2, and proposes a new search method which employs several novel search-space pruning rules and rule-evaluation techniques which results in a highly efficient algorithm with improved induction performance.
Abstract: Machine learning has been studied intensively during the past two decades. One motivation has been the desire to automate the process of knowledge acquisition during the construction of expert systems. The recent emergence of data mining as a major application for machine learning algorithms has led to the need for algorithms that can handle very large data sets. In real data mining applications, data sets with millions of training examples, thousands of attributes and hundreds of classes are common. Designing learning algorithms appropriate for such applications has thus become an important research problem. A great deal of research in machine learning has focused on classification learning. Among the various machine learning approaches developed for classification, rule induction is of particular interest for data mining because it generates models in the form of IF-THEN rules which are more expressive and easier for humans to comprehend. One weakness with rule induction algorithms is that they often scale relatively poorly with large data sets, especially on noisy data. The work reported in this thesis aims to design and develop scalable rule induction algorithms that can process large data sets efficiently while building from them the best possible models. There are two main approaches for rule induction, represented respectively by CN2 and the AQ family of algorithms. These approaches vary in the search strategy employed for examining the space of possible rules, each of which has its own advantages and disadvantages. The first part of this thesis introduces a new rule induction algorithm for learning classification rules, which broadly follows the approach of algorithms represented by CN2. The algorithm presents a new search method which employs several novel search-space pruning rules and rule-evaluation techniques. This results in a highly efficient algorithm with improved induction performance. 
Real-world data do not only contain nominal attributes but also continuous attributes. The ability to handle continuously valued data is thus crucial to the success of any general purpose learning algorithm. Most current discretisation approaches are developed as pre-processes for learning algorithms. The second part of this thesis proposes a new approach which discretises continuous-valued attributes during the learning process. Incorporating discretisation into the learning process has the advantage of taking into account the bias inherent in the learning system as well as the interactions between the different attributes. This in turn leads to improved performance. Overfitting the training data is a major problem in machine learning, particularly when noise is present. Overfitting increases learning time and reduces both the accuracy and the comprehensibility of the generated rules, making learning from large data sets more difficult. Pruning is a technique widely used for addressing such problems and consequently forms an essential component of practical learning algorithms. The third part of this thesis presents three new pruning techniques for rule induction based on the Minimum Description Length (MDL) principle. The result is an effective learning algorithm that not only produces an accurate and compact rule set, but also significantly accelerates the learning process. RULES-3 Plus is a simple rule induction algorithm developed at the author's laboratory which follows a similar approach to the AQ family of algorithms. Despite having been successfully applied to many learning problems, it has some drawbacks which adversely affect its performance. The fourth part of this thesis reports on an attempt to overcome these drawbacks by utilising the ideas presented in the first three parts of the thesis. A new version of RULES-3 Plus is reported that is a general and efficient algorithm with a wide range of potential applications.
TL;DR: The recursive predictive error (RPE) learning algorithm for recurrent neural networks is introduced, and its stability is demonstrated, and in order to overcome the disadvantage of the centralized computing of RPE learning algorithm, a parallel structure algorithm is derived.
TL;DR: A variational approximation to the greedy EM algorithm is proposed which offers speedups that are at least linear in the number of data points and, by strictly increasing a lower bound on the data log-likelihood in every learning step, guarantees convergence.
Abstract: Mixture probability densities are popular models that are used in several data mining and machine learning applications, e.g., clustering. A standard algorithm for learning such models from data is the Expectation-Maximization (EM) algorithm. However, EM can be slow with large datasets, and therefore approximation techniques are needed. In this paper we propose a variational approximation to the greedy EM algorithm which offers speedups that are at least linear in the number of data points. Moreover, by strictly increasing a lower bound on the data log-likelihood in every learning step, our algorithm guarantees convergence. We demonstrate the proposed algorithm on a synthetic experiment where satisfactory results are obtained.
TL;DR: The HGA (hybrid partheno-genetic algorithm) is proposed, that is, the hill-climbing algorithm is integrated to search for a better individual to overcome the shortcoming that the optimal high rank schema can be deserted arbitrarily.
TL;DR: A novel method to learn decision lists (classifiers as sets of rules) with the Discrete Function Learning (DFL) algorithm and introduces a method called 2 function to deal with noises in datasets and to avoid overfitting.
Abstract: In this paper, we propose a novel method to learn decision lists (classifiers as sets of rules) with the Discrete Function Learning (DFL) algorithm. The DFL algorithm works by finding a subset of attributes which gives all the needed information to decide group attribute of the cases. We perform experiments on some datasets of the UCI machine learning repository to validate the DFL algorithm and the proposed prediction method. The experimental results show that the DFL algorithm achieves a competitive accuracy to current classification methods with excellent efficiency. The classifiers learned with the DFL algorithm contain biologically critical attributes of the corresponding datasets. We further introduce a method called 2 function to deal with noises in datasets and to avoid overfitting. The experimental results show that the 2 function method is a good supplement to the DFL algorithm.
TL;DR: The present algorithm is a very efficient network-growing algorithm that was applied to the famous vertical-horizontal lines detection problem, a medical data problem and a road classification problem and confirmed that the method could solve problems that single-layered networks failed to.
Abstract: In this paper, we extend our greedy network-growing algorithm to multi-layered networks. With multi-layered networks, we can solve many complex problems that single-layered networks fail to solve. In addition, the network-growing algorithm is used in conjunction with teacher-directed learning that produces appropriate outputs without computing errors between targets and outputs. Thus, the present algorithm is a very efficient network-growing algorithm. The new algorithm was applied to three problems: the famous vertical-horizontal lines detection problem, a medical data problem and a road classification problem. In all these cases, experimental results confirmed that the method could solve problems that single-layered networks failed to. In addition, information maximization makes it possible to extract salient features in input patterns.
TL;DR: An efficient learning algorithm and its improved algorithm based on local search are proposed, and the results indicate that they achieve much better accuracy and much faster convergence than the BP algorithm.
Abstract: The BP algorithm is frequently applied to train feedforward neural networks, but it often suffers from slow convergence. In this paper, an efficient learning algorithm and its improved algorithm based on local search are proposed. Computer simulations with standard problems such as the XOR, Parity, TwoNorm and MushRoom problems are presented and compared with the BP algorithm. Experimental results indicate that our proposed algorithms achieve much better accuracy and much faster convergence than the BP algorithm, and that the generalization of our proposed algorithms on the TwoNorm and MushRoom problems is comparable to that of the BP algorithm.
TL;DR: This document presents an overview of path search algorithms from literature, and results show that the MLE algorithm outperforms the PDP algorithm in most situations, but requires more computations under all circumstances.
Abstract: The CADTES and SAS groups of the EEMCS faculty are working on the Adaptive Wireless Networking (AWGN) project. As part of this project adaptive algorithms are developed for digital signal processing in W-CDMA systems. One of these algorithms is the path search algorithm that estimates the delays of the paths between a transmitter and a receiver that are caused by reflections. Several options exist for implementing the path search algorithm. One of the questions posed within the AWGN project is: to what extent will it be useful for the path search function to switch between different algorithms, as the conditions between transmitter and receiver change?
First, this document presents an overview of path search algorithms from literature. About twenty papers are discussed, and the algorithms they describe have been compared with each other on sixteen points. Based on the similarities that are discovered, the algorithms are classified in three classes: algorithms using a Power Delay Profile (PDP), algorithms based on a Maximum Likelihood Estimation (MLE) method and subspace-based algorithms.
Next, an algorithm is selected from each class. Both the Power Delay Profile and the Maximum Likelihood algorithms are implemented; the subspace algorithm is analyzed in theory only. In order to set up meaningful simulations, channel models and simulation scenarios are investigated. The available simulator is discussed, as well as the modifications that were made.
The algorithms' performance is determined by simulation; results show that the MLE algorithm outperforms the PDP algorithm in most situations. The MLE algorithm, however, requires more computations under all circumstances. In view of this trade-off between performance and number of computations, the MLE algorithm should be used in case of closely spaced paths, if time-variant path delays need to be estimated and if strong Doppler effects occur. BER simulations will have to be carried out to quantify the benefits of selecting the MLE algorithm in these cases.
TL;DR: This paper presents a method to improve the speed of decision tree learning with the ID3 algorithm and the speed of computers, and it shows good results in terms of speed and efficiency.
Abstract: ID3 is a key algorithm for decision tree learning. This paper presents a method to improve the speed of decision tree learning with the ID3 algorithm and the speed of computers.
TL;DR: The proposed algorithm, named Learn++, takes advantage of synergistic generalization performance of an ensemble of classifiers in which each classifier is trained with a strategically chosen subset of the training databases that subsequently become available.
Abstract: An incremental learning algorithm is introduced for learning new information from additional data that may later become available, after a classifier has already been trained using a previously available database. The proposed algorithm is capable of incrementally learning new information without forgetting previously acquired knowledge and without requiring access to the original database, even when new data include examples of previously unseen classes. Scenarios requiring such a learning algorithm are encountered often in nondestructive evaluation (NDE) in which large volumes of data are collected in batches over a period of time, and new defect types may become available in subsequent databases. The algorithm, named Learn++, takes advantage of synergistic generalization performance of an ensemble of classifiers in which each classifier is trained with a strategically chosen subset of the training databases that subsequently become available. The ensemble of classifiers then is combined through a weighted majority voting procedure. Learn++ is independent of the specific classifier(s) comprising the ensemble, and hence may be used with any supervised learning algorithm. The voting procedure also allows Learn++ to estimate the confidence in its own decision. We present the algorithm and its promising results on two separate ultrasonic weld inspection applications.
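The weighted majority voting step described in the abstract above can be sketched as follows. The sketch takes each classifier's predicted label and voting weight as given; how Learn++ derives those weights is not stated in the abstract and is not modeled here. Reporting the winner's share of the total weight as a confidence value is an assumption made for illustration, and the name `weighted_majority_vote` is hypothetical.

```python
from collections import defaultdict


def weighted_majority_vote(predictions, weights):
    """Combine one label per classifier into a single decision.

    predictions: label predicted by each classifier in the ensemble.
    weights:     non-negative voting weight of each classifier.
    Returns (winning_label, confidence), where confidence is the
    winner's share of the total weight (an illustrative choice).
    Raises ValueError if there are no votes or all weights are zero.
    """
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    total_weight = sum(totals.values())
    if total_weight <= 0:
        raise ValueError("need at least one vote with positive weight")
    winner = max(totals, key=totals.get)
    return winner, totals[winner] / total_weight
```

Because the vote only needs labels and weights, it works with any base classifier, and a label for a previously unseen class simply appears as a new key in the tally.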
TL;DR: This paper studies a simple learning algorithm for binary classification that predicts with a weighted average of all hypotheses, weighted exponentially with respect to their training error, and shows that the prediction is much more stable than the prediction of an algorithm that predicts with the best hypothesis.
Abstract: We study a simple learning algorithm for binary classification. Instead of predicting with the best hypothesis in the hypothesis class, that is, the hypothesis that minimizes the training error, our algorithm predicts with a weighted average of all hypotheses, weighted exponentially with respect to their training error. We show that the prediction of this algorithm is much more stable than the prediction of an algorithm that predicts with the best hypothesis. By allowing the algorithm to abstain from predicting on some examples, we show that the predictions it makes when it does not abstain are very reliable. Finally, we show that the probability that the algorithm abstains is comparable to the generalization error of the best hypothesis in the class.
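The prediction rule described in the abstract above can be illustrated with a minimal sketch for a finite hypothesis class with labels in {-1, +1}. Each hypothesis is weighted by exp(-eta * training errors), the weighted average vote is computed, and the predictor abstains when that average is close to zero. The learning-rate parameter `eta`, the abstention threshold `margin`, and the function name are hypothetical, and the paper's exact rule may differ from this simplified form.

```python
import math


def exp_weighted_predict(hypotheses, train_x, train_y, x, eta=1.0, margin=0.0):
    """Predict the label of x with an exponentially weighted average.

    hypotheses: callables mapping an input to -1 or +1.
    Returns +1 or -1, or 0 to signal abstention when the weighted
    average vote has absolute value <= margin.
    """
    score = 0.0
    total = 0.0
    for h in hypotheses:
        # Number of training mistakes made by this hypothesis.
        errors = sum(1 for xi, yi in zip(train_x, train_y) if h(xi) != yi)
        weight = math.exp(-eta * errors)
        score += weight * h(x)
        total += weight
    average = score / total            # lies in [-1, 1]
    if abs(average) <= margin:
        return 0                       # abstain: hypotheses disagree too much
    return 1 if average > 0 else -1
```

With threshold classifiers as the hypothesis class, points far from every threshold receive a unanimous vote, while points between thresholds produce an average near zero and trigger abstention once `margin` is raised.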
TL;DR: Taking an ensemble containing two of the best known active learning algorithms and a new algorithm, the resulting new active learning master algorithm is empirically shown to consistently perform almost as well as and sometimes outperform the best algorithm in the ensemble on a range of classification problems.
Abstract: This work is concerned with the question of how to combine online an ensemble of active learners so as to expedite the learning progress in pool-based active learning. We develop an active-learning master algorithm, based on a known competitive algorithm for the multi-armed bandit problem. A major challenge in successfully choosing top performing active learners online is to reliably estimate their progress during the learning session. To this end we propose a simple maximum entropy criterion that provides effective estimates in realistic settings. We study the performance of the proposed master algorithm using an ensemble containing two of the best known active-learning algorithms as well as a new algorithm. The resulting active-learning master algorithm is empirically shown to consistently perform almost as well as and sometimes outperform the best algorithm in the ensemble on a range of classification problems.
TL;DR: Applications of Markov chain Monte Carlo (MCMC) methods to estimate the total weight of multiplicative weight-update algorithms such as Winnow and Weighted Majority are explored and empirical results are presented indicating that in practice, the time complexity is much better than what is implied by the worst-case theoretical analysis.