TL;DR: This work presents a novel learning algorithm for efficient construction of the radial basis function (RBF) networks that can deliver the same level of accuracy as the support vector machines (SVMs) in data classification applications and compares the performance of the RBF networks constructed with the proposed learning algorithm and those constructed with a conventional cluster-based learning algorithm.
Abstract: This work presents a novel learning algorithm for efficient construction of the radial basis function (RBF) networks that can deliver the same level of accuracy as the support vector machines (SVMs) in data classification applications. The proposed learning algorithm works by constructing one RBF subnetwork to approximate the probability density function of each class of objects in the training data set. With respect to algorithm design, the main distinction of the proposed learning algorithm is the novel kernel density estimation algorithm that features an average time complexity of O(nlogn), where n is the number of samples in the training data set. One important advantage of the proposed learning algorithm, in comparison with the SVM, is that the proposed learning algorithm generally takes far less time to construct a data classifier with an optimized parameter setting. This feature is of significance for many contemporary applications, in particular, for those applications in which new objects are continuously added into an already large database. Another desirable feature of the proposed learning algorithm is that the RBF networks constructed are capable of carrying out data classification with more than two classes of objects in one single run. In other words, unlike with the SVM, there is no need to resort to mechanisms such as one-against-one or one-against-all for handling datasets with more than two classes of objects. The comparison with SVM is of particular interest, because it has been shown in a number of recent studies that SVM generally are able to deliver higher classification accuracy than the other existing data classification algorithms. As the proposed learning algorithm is instance-based, the data reduction issue is also addressed in this paper. 
One interesting observation in this regard is that, for all three data sets used in data reduction experiments, the number of training samples remaining after a naïve data reduction mechanism is applied is quite close to the number of support vectors identified by the SVM software. This paper also compares the performance of the RBF networks constructed with the proposed learning algorithm and those constructed with a conventional cluster-based learning algorithm. The most interesting observation learned is that, with respect to data classification, the distributions of training samples near the boundaries between different classes of objects carry more crucial information than the distributions of samples in the inner parts of the clusters.
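The idea of approximating one probability density function per class and classifying by the largest estimate can be illustrated with a minimal sketch. This is not the paper's O(n log n) kernel density estimation algorithm: it uses a plain Gaussian kernel with a fixed, hand-picked bandwidth on 1-D data, and every name in it (`gaussian_kde`, `fit_density_classifier`, `bandwidth`) is hypothetical.

```python
import math


def gaussian_kde(points, bandwidth):
    """Return a function estimating the density at x from 1-D sample points.

    Naive estimator: each query costs O(len(points)). The paper's faster
    estimator is not reproduced here.
    """
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points
        )

    return density


def fit_density_classifier(samples, labels, bandwidth=0.5):
    """Build one density model per class and return a predict function."""
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    total = len(samples)
    # Each class keeps its prior (relative frequency) and its density model.
    models = {
        c: (len(pts) / total, gaussian_kde(pts, bandwidth))
        for c, pts in by_class.items()
    }

    def predict(x):
        # Pick the class with the largest prior-weighted density at x.
        return max(models, key=lambda c: models[c][0] * models[c][1](x))

    return predict


# Three classes are handled in a single model, with no one-against-one step.
predict = fit_density_classifier(
    [0.0, 0.2, 0.4, 3.0, 3.2, 5.9, 6.1],
    ["a", "a", "a", "b", "b", "c", "c"],
)
```

Because every class contributes its own density model, adding a third or fourth class needs no change to the decision rule, which is the multi-class property the abstract highlights.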
TL;DR: The method, wrapped progressive sampling, is a combination of classifier wrapping and progressive sampling of training data that shows little improvement for the algorithm which offers few parameter variations, but marked improvements for the algorithms offering many possible testable parameter combinations.
Abstract: We present a heuristic meta-learning search method for finding a set of optimized algorithmic parameters for a range of machine learning algorithms. The method, wrapped progressive sampling, is a combination of classifier wrapping and progressive sampling of training data. A series of experiments on UCI benchmark data sets with nominal features, and five machine learning algorithms to which simple wrapping and wrapped progressive sampling are applied, yields results that show little improvement for the algorithm which offers few parameter variations, but marked improvements for the algorithms offering many possible testable parameter combinations, yielding up to 32.2% error reduction with the winnow learning algorithm.
TL;DR: A novel feature weighted fuzzy clustering algorithm is proposed in this paper, in which the ReliefF algorithm is used to assign the weights for every feature.
Abstract: In the field of cluster analysis, the fuzzy k-means, k-modes and k-prototypes algorithms were designed for numerical, categorical and mixed data sets respectively. However, all of these algorithms assume that each feature of the samples contributes equally to cluster analysis. To account for the particular contributions of different features, a novel feature-weighted fuzzy clustering algorithm is proposed in this paper, in which the ReliefF algorithm is used to assign the weights for every feature. By weighting the features of samples, the above three clustering algorithms can be unified, and better classification results can also be achieved. The experimental results with various real data sets illustrate the effectiveness of the proposed algorithm.
TL;DR: This paper describes a modification to the Learn++ algorithm in which the voting weights of the classifiers are updated dynamically based on the location of the test input in the feature space; the modified algorithm provides improved performance, stronger immunity to catastrophic forgetting and finer balance to the stability-plasticity dilemma than its predecessor.
Abstract: We have previously introduced Learn++, an ensemble based incremental learning algorithm for acquiring new knowledge from data that later become available, even when such data introduce new classes. In this paper, we describe a modification to this algorithm, where the voting weights of the classifiers are updated dynamically based on the location of the test input in the feature space. The new algorithm provides improved performance, stronger immunity to catastrophic forgetting and finer balance to the stability-plasticity dilemma than its predecessor, particularly when new classes are introduced. The modified algorithm and its performance, as compared to AdaBoost.M1 and the original Learn++, on real and benchmark datasets are presented.
TL;DR: A new rule induction algorithm called RULES-6, derived from the RULES-3 Plus algorithm, which employs a fast and noise-tolerant search method for extracting IF-THEN rules from examples and uses simple and effective methods for rule evaluation and continuous attributes handling.
Abstract: RULES-3 Plus is a member of the RULES family of simple inductive learning algorithms with successful engineering applications. However, it requires modification in order to be a practical tool for problems involving large data sets. In particular, efficient mechanisms for handling continuous attributes and noisy data are needed. This paper presents a new rule induction algorithm called RULES-6, derived from the RULES-3 Plus algorithm. The algorithm employs a fast and noise-tolerant search method for extracting IF-THEN rules from examples. It also uses simple and effective methods for rule evaluation and continuous attributes handling. A detailed empirical evaluation of the algorithm is reported in the paper. The results presented demonstrate the strong performance of the algorithm.
TL;DR: The empirical comparison of a recent algorithm RM, its new extensions and three classical classifiers in different aspects including classification accuracy, computational time and storage requirement shows that nominal attributes do have an impact on the performance of those compared learning algorithms.
Abstract: There are many learning algorithms available in the field of pattern classification and people are still discovering new algorithms that they hope will work better. Any new learning algorithm, besides its theoretical foundation, needs to be justified in many aspects including accuracy and efficiency when applied to real life problems. In this paper, we report the empirical comparison of a recent algorithm RM, its new extensions and three classical classifiers in different aspects including classification accuracy, computational time and storage requirement. The comparison is performed in a standardized way and we believe that this would give a good insight into the algorithm RM and its extensions. The experiments also show that nominal attributes do have an impact on the performance of those compared learning algorithms.
TL;DR: The proposed algorithm belongs to the class of Estimation Distribution Algorithms and represents an interesting alternative to approach the Linkage Problem in Genetic Algorithms.
Abstract: This paper extends the FDA - the Factorized Distribution Algorithm - with a structural learning component. The FDA has been extensively investigated for the optimization of additively decomposed discrete functions (ADFs). Now, we are able to deal with more general problems, which are solved by FDA in a blackbox optimization scenario. The key point is the construction of the Junction Tree, which is placed at the centre of the algorithm. Learning the Junction Tree directly from the data is accomplished by making independence tests of the lowest possible order. The proposed algorithm belongs to the class of Estimation Distribution Algorithms and represents an interesting alternative to approach the Linkage Problem in Genetic Algorithms.
TL;DR: This paper presents a weighted-binary-sequential method to predict the status of customer patronage for the next day and shows that time-weighted sequential algorithms are generally superior to the un-weighted sequential algorithm.
Abstract: This paper presents a weighted-binary-sequential method to predict the status of customer patronage for the next day. Most of the research using association rules to mine sequential data focuses on the algorithms and computing efficiency of pattern or rule generation, but little of it considers the time value of the sequential data. It is desirable to weight recent observations more heavily than remote observations in the analysis of time-series data. In this paper, we apply a time-weighted concept to an association algorithm to mine binary time-series data. The weighted binary sequence algorithm gives more weight to the recent data in finding the longest frequent patterns from binary time-series data. There are two weighting methods: dynamic-length weighting and fixed-length weighting. Both algorithms are compared to the un-weighted algorithm to show how time value influences the prediction accuracy. Some performance results with a real-life website application given in this paper show that time-weighted sequential algorithms are generally superior to the un-weighted sequential algorithm.
TL;DR: A browser extension is presented to dynamically learn to filter unwanted images (such as advertisements or flashy graphics) based on minimal user feedback using pieces of the Uniform Resource Locators of such images as predictors.
Abstract: We present a browser extension to dynamically learn to filter unwanted images (such as advertisements or flashy graphics) based on minimal user feedback. To do so, we apply the weighted majority algorithm using pieces of the Uniform Resource Locators of such images as predictors. Experimental results tend to confirm that the accuracy of the predictions converges quickly to very high levels.
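A minimal sketch of the weighted majority algorithm applied to URL pieces may help make the approach concrete. The abstract does not say how URLs are split or how each predictor votes, so the choices here are assumptions: URLs are split on non-alphanumeric characters, each candidate token acts as an expert that votes "block" exactly when it occurs in the URL, and mistaken experts have their weight multiplied by a hypothetical penalty `beta`.

```python
import re


def url_pieces(url):
    """Split a URL into lowercase alphanumeric pieces (assumed tokenization)."""
    return {p for p in re.split(r"[^A-Za-z0-9]+", url.lower()) if p}


class WeightedMajority:
    """Weighted majority voting over token experts (illustrative only)."""

    def __init__(self, tokens, beta=0.5):
        self.weights = {t: 1.0 for t in tokens}  # every expert starts at 1
        self.beta = beta  # multiplicative penalty for a wrong vote

    def _votes(self, url):
        pieces = url_pieces(url)
        # An expert votes "block" (True) when its token appears in the URL.
        return {t: (t in pieces) for t in self.weights}

    def predict(self, url):
        votes = self._votes(url)
        block = sum(w for t, w in self.weights.items() if votes[t])
        allow = sum(w for t, w in self.weights.items() if not votes[t])
        return block > allow

    def update(self, url, should_block):
        # User feedback: shrink the weight of every expert that voted wrongly.
        for token, vote in self._votes(url).items():
            if vote != should_block:
                self.weights[token] *= self.beta


wm = WeightedMajority(["ads", "banner", "com"])
wm.update("http://ads.example.com/banner.gif", True)
wm.update("http://example.com/photos/cat.jpg", False)
```

After these two pieces of feedback only the uninformative `com` expert has been penalized, so a URL containing `ads` is blocked while a plain `example.com` image is allowed.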
TL;DR: The results show that the TIBL algorithm and its combining method improve the performance of the k-nearest neighbor classification, and also achieve higher generalization accuracy than other popular machine learning algorithms.
Abstract: The basic k-nearest-neighbor classification algorithm works well in many domains but has several shortcomings. This paper proposes a tolerant instance-based learning algorithm TIBL and its combining method by simple voting of TIBL, which is an integration of genetic algorithm, tolerant rough sets and k-nearest neighbor classification algorithm. The proposed algorithms seek to reduce storage requirement and increase generalization accuracy when compared to the basic k-nearest neighbor algorithm and other learning models. Experiments have been conducted on some benchmark datasets from the UCI Machine Learning Repository. The results show that the TIBL algorithm and its combining method improve the performance of the k-nearest neighbor classification, and also achieve higher generalization accuracy than other popular machine learning algorithms.
TL;DR: In this new algorithm, the original graph is coarsened, partitioned by the divisive MinMaxCut algorithm and then decoarsened, yielding a faster and more accurate algorithm.
Abstract: The divisive MinMaxCut algorithm of Ding et al. [3] produces more accurate clustering results than existing document clustering methods. Multilevel algorithms [4, 1, 5, 7] have been used to boost the speed of graph partitioning algorithms. We combine these two algorithms to construct a faster and more accurate algorithm. In this new algorithm, the original graph is coarsened, partitioned by the divisive MinMaxCut algorithm and then decoarsened. A refining algorithm is also applied to improve the accuracy at each level.
TL;DR: A new pruning method for the ID3 algorithm, IEEP, is put forward that can prune more unknown nodes without reducing the algorithm's accuracy rate; experimental results show that it performs better than alternatives, especially when dealing with deficient data sets.
Abstract: ID3 is a decision tree induction algorithm, but its pruning method (EEP) is ineffective when the data sets are deficient or uncertain. In this paper we analyze and study the ID3 algorithm and its pruning methods, then improve on the EEP algorithm and put forward a new pruning method, IEEP, which can prune more unknown nodes without reducing the algorithm's accuracy rate. We present experimental results that show the method performs better than alternatives, especially when dealing with deficient data sets.
TL;DR: This paper presents an algorithm which significantly reduces the intensity of computation and is a version of the incremental EM algorithm which cycles through data cases in blocks which has the standard convergence guarantee of EM.
Abstract: The EM algorithm is one of the most popular statistical learning algorithms. It is a method for parameter estimation in various problems involving missing data. However, it is a batch learning method and often requires significant computational resources, so more elaborate methods are needed to handle databases with a large number of records or large dimensionality. In this paper, we present an algorithm which significantly reduces the intensity of computation. The algorithm is based on partial E-steps and has the standard convergence guarantee of EM. It is a version of the incremental EM algorithm which cycles through data cases in blocks. We confirm through its application to large databases that the algorithm can clearly reduce computational costs.
TL;DR: Experiments show that the weighted C-SVM can effectively solve the misclassification problem resulting from the imbalance in the number of training samples of different classes and the problem that important samples are misclassified.
Abstract: This paper proposes a weighted C-SVM algorithm and analyzes its classification performance theoretically. This weighted C-SVM introduces weight factors for classes and samples. Experiments show that the weighted C-SVM can effectively solve the misclassification problem resulting from the imbalance in the number of training samples of different classes and the problem that important samples are misclassified.
TL;DR: Scoring-based and constraint-based algorithms are two approaches for learning BN structure from data; a hybrid algorithm combines these two approaches in order to be more efficient, and the algorithm is then modified to handle incomplete databases.
Abstract: Scoring-based and constraint-based algorithms are two approaches for learning BN structure from data. A hybrid algorithm combines these two approaches in order to be more efficient. Experimental results show its superiority. The algorithm is then modified to handle incomplete databases. It is expected that the proposed algorithm can also show its superiority.
TL;DR: In this paper, an algorithm to solve the dependency on initial conditions of the k-means algorithm with similarity functions is proposed; this algorithm is tested and compared against the k-means algorithm with similarity functions.
Abstract: The k-means algorithm is a frequently used algorithm for solving clustering problems. This algorithm has the disadvantage that it depends on the initial conditions; for that reason, the global k-means algorithm was proposed to solve this problem. On the other hand, the k-means algorithm only works with numerical features. This problem is solved by the k-means algorithm with similarity functions, which allows working with qualitative and quantitative variables and missing data (mixed and incomplete data). However, this algorithm still depends on the initial conditions. Therefore, in this paper an algorithm to solve the dependency on initial conditions of the k-means algorithm with similarity functions is proposed; our algorithm is tested and compared against the k-means algorithm with similarity functions.
TL;DR: This work uses a genetic algorithm to locate sets of models which are not outperformed on all of the tasks and explores the role that the algorithm representation and initial population have on task performance.
Abstract: Exploring multiple classes of learning algorithms for those algorithms which perform best in multiple tasks is a complex problem of multiple-criteria optimisation. We use a genetic algorithm to locate sets of models which are not outperformed on all of the tasks. The genetic algorithm develops a population of multiple types of learning algorithms, with competition between individuals of different types. We find that inherent differences in the convergence time and performance levels of the different algorithms lead to misleading population effects. We explore the role that the algorithm representation and initial population have on task performance. Our findings suggest that separating the representation of different algorithms is beneficial in enhancing performance. Also, initial seeding is required to avoid premature convergence to non-optimal classes of algorithms.
TL;DR: This paper proposes an efficient hybrid optimization algorithm, GA ensemble; experiments on a combinatorial optimization problem and GA-deceptive problems show that the ensemble method greatly improves the performance of the genetic algorithm.
Abstract: The ensemble method has been deeply studied and widely used in the machine learning community. Its basic idea can be stated as follows: a ‘weak’ learning algorithm that performs just slightly better than random guessing can be ‘boosted’ into an arbitrarily accurate ‘strong’ learning algorithm. Inspired by this idea, the paper uses the ensemble method to improve the performance of the genetic algorithm and proposes an efficient hybrid optimization algorithm: GA ensemble. In GA ensemble, a collection of genetic algorithms is designed to solve the same problem, and the population of each algorithm is sampled from a solutions pool using the bagging method. Experiments on a combinatorial optimization problem and GA-deceptive problems show that the ensemble method greatly improves the performance of the genetic algorithm.
TL;DR: The synthetic diagnosing of transformer faults based on dissolved gas-in-oil analysis (DGA) is introduced and the weighted majority algorithm is adopted in terms of decomposing each diagnosis result into discharge and over-heating.
Abstract: The synthetic diagnosing of transformer faults based on dissolved gas-in-oil analysis (DGA) is introduced. The multi-expert system is mainly composed of the three-ratio method, Duval's triangle, a neural network and case-based reasoning. However, the results of multi-expert diagnosing may be inconsistent, and how to successfully integrate those diagnosing results is still unresolved. The weighted majority algorithm is therefore adopted, decomposing each diagnosis result into discharge and over-heating. Furthermore, the weighted coefficient of a particular expert is tentatively determined by its probability of correct diagnosis. Lastly, the proposed method is shown to be feasible in practice.
TL;DR: An algorithm for mining association rules with weighted items based on probability is designed; it solves the problem that the classical Apriori algorithm cannot mine association rules among low-probability items by using an improved model of weighted support measurements.
Abstract: An algorithm for mining association rules with weighted items based on probability was designed. It solves the problem that the classical Apriori algorithm cannot mine association rules among low-probability items. At the same time, the problem of the invalidation of the "downward closure property" in the weighted setting was solved by using an improved model of weighted support measurements. The algorithm is both scalable and efficient in discovering relationships in practical use.
TL;DR: An improved C4.5-based algorithm is introduced by adding boosting techniques, which improves accuracy, and is used to analyze supermarket customer data.
Abstract: The key to the management of a modern company is customer value analysis. Classification algorithms can deal with this analysis very well. The decision tree algorithm, especially C4.5, is an important kind of classification algorithm. However, the C4.5 algorithm has some shortcomings in accuracy. An improved C4.5-based algorithm is introduced by adding boosting techniques, which improves accuracy. The paper uses the new algorithm to analyze supermarket customer data. The experiment shows that the accuracy of the improved algorithm is better than that of C4.5.
TL;DR: This paper enhances the Lempel-Ziv LZ78 algorithm for improved performance from a learning perspective and applies it to the learning of user macros in a computer desktop environment.
Abstract: One application of the Lempel-Ziv LZ78 algorithm, other than compression, is learning repeating sequences in a data stream. One shortcoming of the algorithm though is its slow learning rate. In this paper we enhance the algorithm for improved performance from a learning perspective and apply it to the learning of user macros in a computer desktop environment. Once a macro is learned it can be predicted and offered back at opportune times. With the enhanced algorithm, it is possible for a macro to be learned in as few as two exposures to a sequence.
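The slow learning rate mentioned above can be seen in a short sketch of plain LZ78 dictionary building. This shows only the standard parsing step, not the paper's enhancement, and the function name `lz78_phrases` is hypothetical.

```python
def lz78_phrases(stream):
    """Parse a symbol stream into LZ78 phrases, in the order they are learned.

    Standard LZ78 parsing: extend the current phrase while it is already in
    the dictionary; the first extension that is new is added as a phrase and
    parsing restarts from an empty phrase.
    """
    dictionary = set()
    phrases = []
    current = ""
    for symbol in stream:
        current += symbol
        if current not in dictionary:
            dictionary.add(current)
            phrases.append(current)
            current = ""
    return phrases


# Four repetitions of "ab": each new phrase is only one symbol longer
# than a phrase already in the dictionary.
learned = lz78_phrases("abababab")
```

Each new phrase extends an existing one by a single symbol, so a long repeating sequence needs many exposures before the dictionary contains it in full, which is the shortcoming the paper sets out to address.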
TL;DR: Comparing the performance of an idealized genetic algorithm that uses a fitness function based on the generalization error with that of an empirical genetic algorithm based on Rademacher penalization indicates that the empirical algorithm does almost as well as the idealized algorithm would.
Abstract: We propose an abstract self bounding genetic algorithm that can be applied to various problems of machine learning. The bound on the generalization error that is output by our algorithm is based on Rademacher penalization, a data driven penalization technique. We prove probabilistic oracle inequalities for the theoretical risk of the estimators based on this approach. This is done by comparing the performance of an idealized genetic algorithm that uses a fitness function based on the generalization error with that of an empirical genetic algorithm based on Rademacher penalization. The inequalities indicate that although we are not able to implement the idealized algorithm (because of the inability to compute the generalization error), the empirical algorithm does almost as well as the idealized algorithm would.
TL;DR: This paper proposes an algorithm which decides dynamically which algorithm of ME should be used maintaining a balance between speed and accuracy, and will thus be a combination of different ME algorithms with varying computational complexity.
Abstract: Motion estimation (ME) is one of the most time-consuming parts of video encoding system, and significantly affects the quality of reconstructed image sequences.
It is commonly accepted that an ME algorithm with higher computational complexity (lower speed) yields more accurate results. Such an algorithm can therefore be used only on an as-needed basis if it is to be implemented in a fast environment. Algorithms with lower computational complexity can be used in situations where they provide the required accuracy.
In this paper, we propose an algorithm which decides dynamically which algorithm of ME should be used maintaining a balance between speed and accuracy. The resulting algorithm will thus be a combination of different ME algorithms with varying computational complexity.
TL;DR: This thesis focused on extending Baird's work further by comparing the performance of the residual algorithm against direct application of the Temporal Difference learning algorithm.
Abstract: A number of reinforcement learning algorithms have been developed that are guaranteed to converge to an optimal solution for look-up tables. However, it has also been shown that these algorithms become unstable when used directly with a function approximation system. A new class of algorithms developed by Baird (1995) were created to handle the problem that direct algorithms have with function approximation systems. This thesis focused on extending Baird's work further by comparing the performance of the residual algorithm against direct application of the Temporal Difference learning algorithm. Four benchmark experiments were used to test each algorithm with various values of lambda and alpha over a period of twenty trials. Overall it was shown that the residual algorithm outperformed direct application of the TD learning algorithm on all four experiments.
TL;DR: An efficient Bayesian network learning algorithm, which is an improvement to J. Cheng’s algorithm that uses Mutual Information and Conditional Mutual Information as Conditional Independence (CI) tests, and an efficient method for finding an approximate minimum cut-set is proposed in this algorithm.
Abstract: Generally speaking, dependency analysis based Bayesian network learning algorithms are of higher efficiency. J. Cheng’s algorithm is representative of this kind of algorithm, but its efficiency could be improved further. This paper presents an efficient Bayesian network learning algorithm, which is an improvement to J. Cheng’s algorithm that uses Mutual Information (MI) and Conditional Mutual Information (CMI) as Conditional Independence (CI) tests. By redefining the equations for calculating MI and CMI, our algorithm eliminates a large number of basic operations such as logarithms and divisions, and reduces the number of dataset accesses to the minimum. Moreover, to efficiently calculate CMI, an efficient method for finding an approximate minimum cut-set is proposed in our algorithm. Experimental results show that, at the same accuracy, our algorithm is much more efficient than J. Cheng’s algorithm.
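For reference, MI and CMI can be estimated from co-occurrence counts using their textbook definitions, as in the sketch below. These are the standard formulas, not the redefined equations the paper proposes, and the function names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical MI: sum over (x, y) of p(x,y) * log(p(x,y) / (p(x) p(y)))."""
    n = len(xs)
    cx, cy, cxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # With counts, p(x,y) / (p(x) p(y)) simplifies to c_xy * n / (c_x * c_y).
    return sum(
        (c / n) * math.log(c * n / (cx[x] * cy[y]))
        for (x, y), c in cxy.items()
    )


def conditional_mutual_information(xs, ys, zs):
    """Empirical CMI: sum of p(x,y,z) * log(p(x,y,z) p(z) / (p(x,z) p(y,z)))."""
    n = len(xs)
    cz = Counter(zs)
    cxz = Counter(zip(xs, zs))
    cyz = Counter(zip(ys, zs))
    cxyz = Counter(zip(xs, ys, zs))
    # The factors of n cancel inside the logarithm.
    return sum(
        (c / n) * math.log(c * cz[z] / (cxz[(x, z)] * cyz[(y, z)]))
        for (x, y, z), c in cxyz.items()
    )


# Identical variables share log(2) nats; independent ones share none.
mi_same = mutual_information([0, 0, 1, 1], [0, 0, 1, 1])
mi_indep = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

A CI test in this style declares X and Y conditionally independent given Z when the CMI falls below a small threshold; each call above costs one logarithm and one division per distinct value combination, which is the kind of basic operation the paper aims to reduce.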
TL;DR: The proposed genetic-based expectation-maximization (GA-EM) algorithm for learning Gaussian mixture models from multivariate data is elitist which maintains the monotonic convergence property of the EM algorithm.
Abstract: We propose a genetic-based expectation-maximization (GA-EM) algorithm for learning Gaussian mixture models from multivariate data. This algorithm is capable of selecting the number of components of the model using the minimum description length (MDL) criterion. Our approach benefits from the properties of genetic algorithms (GA) and the EM algorithm by combination of both into a single procedure. The population-based stochastic search of the GA explores the search space more thoroughly than the EM method. Therefore, our algorithm enables escaping from local optimal solutions since the algorithm becomes less sensitive to its initialization. The GA-EM algorithm is elitist which maintains the monotonic convergence property of the EM algorithm. The experiments on simulated and real data show that the GA-EM outperforms the EM method since: (1) we have obtained a better MDL score while using exactly the same termination condition for both algorithms; (2) our approach identifies the number of components which were used to generate the underlying data more often than the EM algorithm.
TL;DR: This paper investigates algorithm control techniques that make decisions based only on observations of the improvement in solution quality achieved by each algorithm, and shows that a low knowledge approach results in a system that achieves significantly better performance than all of the pure algorithms without requiring additional human expertise.
Abstract: This paper addresses the question of allocating computational resources among a set of algorithms to achieve the best performance on scheduling problems. Our primary motivation in addressing this problem is to reduce the expertise needed to apply optimization technology. Therefore, we investigate algorithm control techniques that make decisions based only on observations of the improvement in solution quality achieved by each algorithm. We call our approach “low knowledge” since it does not rely on complex prediction models, either of the problem domain or of algorithm behavior. We show that a low-knowledge approach results in a system that achieves significantly better performance than all of the pure algorithms without requiring additional human expertise. Furthermore the low-knowledge approach achieves performance equivalent to a perfect high-knowledge classification approach.
TL;DR: A coarse-grained algorithm is proposed, AC2001/3.1, that is worst case optimal and preserves as much as possible the ease of its integration into a solver (no heavy data structure to be maintained during search) and is competitive with the best fine-grained algorithms such as AC-6.
TL;DR: This paper presents a novel scheduling algorithm for heterogeneous computing systems known as the Productive Duplication-based Heterogeneous Earliest-Finish-Time (PDHEFT) algorithm, based on a recently proposed list-scheduling heuristic which is proven to perform well with a low time complexity.
Abstract: The scheduling problem has been shown to be NP-complete in general cases, and as a consequence many heuristic algorithms account for a myriad of previously proposed scheduling algorithms. Most of these algorithms are designed for homogeneous computing systems. This paper presents a novel scheduling algorithm for heterogeneous computing systems. The proposed method is known as the Productive Duplication-based Heterogeneous Earliest-Finish-Time (PDHEFT) algorithm. The PDHEFT algorithm is based on a recently proposed list-scheduling heuristic known as the Heterogeneous Earliest-Finish-Time (HEFT) algorithm which is proven to perform well with a low time complexity. However, the major performance gain of the PDHEFT algorithm is achieved through its distinctive duplication policy. The duplication policy is unique in that it takes into account the communication to computation ratio (CCR) of each task and the potential load of processors. The PDHEFT algorithm performs very competitively in terms of both resulting schedules and time complexity. In evaluating the PDHEFT algorithm a comparison is made with another two algorithms that have performed relatively well, namely, the HEFT and LDBS algorithms. It is shown that the proposed algorithm outperforms both of them with a low time complexity.