TL;DR: This work describes new algorithms that take into account the variable cost of learning algorithm experiments and that can leverage the presence of multiple cores for parallel experimentation and shows that these proposed algorithms improve on previous automatic procedures and can reach or surpass human expert-level optimization for many algorithms.
Abstract: The use of machine learning algorithms frequently involves careful tuning of learning parameters and model hyperparameters. Unfortunately, this tuning is often a "black art" requiring expert experience, rules of thumb, or sometimes brute-force search. There is therefore great appeal for automatic approaches that can optimize the performance of any given learning algorithm to the problem at hand. In this work, we consider this problem through the framework of Bayesian optimization, in which a learning algorithm's generalization performance is modeled as a sample from a Gaussian process (GP). We show that certain choices for the nature of the GP, such as the type of kernel and the treatment of its hyperparameters, can play a crucial role in obtaining a good optimizer that can achieve expert-level performance. We describe new algorithms that take into account the variable cost (duration) of learning algorithm experiments and that can leverage the presence of multiple cores for parallel experimentation. We show that these proposed algorithms improve on previous automatic procedures and can reach or surpass human expert-level optimization for many algorithms including latent Dirichlet allocation, structured SVMs and convolutional neural networks.
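The abstract does not name the acquisition function it uses. As a rough illustration of how the variable cost of an experiment could enter the search, the sketch below assumes expected improvement (EI), a common acquisition function in Bayesian optimization, and divides it by a predicted run time. The function names, the two candidate configurations and all numbers are hypothetical, and the GP posterior mean and standard deviation are taken as given rather than computed.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu, sigma, best):
    """EI for minimization: E[max(best - f, 0)] with f ~ N(mu, sigma^2)."""
    if sigma <= 0.0:
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    return (best - mu) * normal_cdf(z) + sigma * normal_pdf(z)


def ei_per_second(mu, sigma, best, predicted_seconds):
    """Cost-aware variant (assumption): EI divided by predicted duration."""
    return expected_improvement(mu, sigma, best) / predicted_seconds


# Hypothetical candidates: (posterior mean, posterior std, predicted seconds).
candidates = {
    "config_a": (0.20, 0.05, 600.0),  # slightly more promising, but slow
    "config_b": (0.22, 0.05, 60.0),   # slightly less promising, but fast
}
best_so_far = 0.21  # lowest validation error observed so far

scores = {
    name: ei_per_second(mu, sigma, best_so_far, seconds)
    for name, (mu, sigma, seconds) in candidates.items()
}
chosen = max(scores, key=scores.get)
print(chosen)
```

With these made-up numbers, plain EI would prefer config_a, while the cost-aware score prefers the much cheaper config_b.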
TL;DR: It is shown that linear regression and alternating decision trees have a very high probability of achieving better performance than always selecting the single best algorithm.
Abstract: Machine learning is an established method of selecting algorithms to solve hard search problems. Despite this, to date no systematic comparison and evaluation of the different techniques has been performed and the performance of existing systems has not been critically compared with other approaches. We compare the performance of a large number of different machine learning techniques from different machine learning methodologies on five data sets of hard algorithm selection problems from the literature. In addition to well-established approaches, for the first time we also apply statistical relational learning to this problem. We demonstrate that there is significant scope for improvement both compared with existing systems and in general. To guide practitioners, we close by giving clear recommendations as to which machine learning techniques are likely to achieve good performance in the context of algorithm selection problems. In particular, we show that linear regression and alternating decision trees have a very high probability of achieving better performance than always selecting the single best algorithm.
TL;DR: The KNN method is shown to be invariant to the parameter K of the KNN algorithm and, in two simulated examples, outperforms other neuro-fuzzy approaches in both performance and network compactness.
Abstract: Three new learning algorithms for Takagi-Sugeno-Kang fuzzy systems based on training error and a genetic algorithm are proposed. The first two algorithms consist of two phases. In the first phase, the initial structure of the neuro-fuzzy network is created by estimating the optimum points of the training data in input-output space using KNN (for the first algorithm) and Mean-Shift (for the second algorithm), and new neurons are then added by an error-based algorithm. In the second phase, redundant neurons are identified and removed using a genetic algorithm. The third algorithm builds the network in one phase using a modified version of the error-based algorithm used in the first two methods. The KNN method is shown to be invariant to the parameter K of the KNN algorithm and, in two simulated examples, outperforms other neuro-fuzzy approaches in both performance and network compactness.
TL;DR: This work proposes a hyper-heuristic evolutionary algorithm for automatically generating decision-tree induction algorithms, named HEAD-DT, and shows that it can generate algorithms that significantly outperform C4.5 and CART regarding predictive accuracy and F-Measure.
Abstract: Decision tree induction is one of the most employed methods to extract knowledge from data, since the representation of knowledge is very intuitive and easily understandable by humans. The most successful strategy for inducing decision trees, the greedy top-down approach, has been continuously improved by researchers over the years. This work, following recent breakthroughs in the automatic design of machine learning algorithms, proposes a hyper-heuristic evolutionary algorithm for automatically generating decision-tree induction algorithms, named HEAD-DT. We perform extensive experiments in 20 public data sets to assess the performance of HEAD-DT, and we compare it to traditional decision-tree algorithms such as C4.5 and CART. Results show that HEAD-DT can generate algorithms that significantly outperform C4.5 and CART regarding predictive accuracy and F-Measure.
TL;DR: This paper uses the ADTree classification algorithm, the Simple K-means algorithm, and the Apriori association rule algorithm to find the best combination of algorithms for recommending courses to students in e-learning.
Abstract: Data mining is the extraction of hidden predictive information from large databases, which can be used in various applications such as bioinformatics and e-commerce. Association rule mining, classification, and clustering are three different data mining techniques. A course recommender system plays an important role in identifying the behavior of students interested in a particular set of courses. We collect course enrollment data for a specific data set, using a learning management system such as Moodle. After collecting the data, we apply different combinations of data mining algorithms: classification with association rules, clustering with association rules, association rule mining on classified and clustered data, clustering combined with classification before association rule mining, or the association rule algorithm alone. In this paper we use the ADTree classification algorithm, the Simple K-means algorithm, and the Apriori association rule algorithm. We thus propose five different methods to find the best combination of algorithms for recommending courses to students in e-learning. We compare the results of these combined approaches with those of the association rule algorithm alone and present the best combination of algorithms for course recommendation in e-learning according to our simulation.
TL;DR: This paper presents a study of the machine learning algorithm and its associated features for the purpose of building a highly accurate meta-recognition system for security and surveillance applications, and achieves levels of accuracy well beyond those of the statistical algorithm, as well as the popular “cohort” model for postrecognition score analysis.
Abstract: In this paper, we consider meta-recognition, an approach for postrecognition score analysis, whereby a prediction of matching accuracy is made from an examination of the tail of the scores produced by a recognition algorithm. This is a general approach that can be applied to any recognition algorithm producing distance or similarity scores. In practice, meta-recognition can be implemented in two different ways: a statistical fitting algorithm based on the extreme value theory, and a machine learning algorithm utilizing features computed from the raw scores. While the statistical algorithm establishes a strong theoretical basis for meta-recognition, the machine learning algorithm is more accurate in its predictions in all of our assessments. In this paper, we present a study of the machine learning algorithm and its associated features for the purpose of building a highly accurate meta-recognition system for security and surveillance applications. Through the use of feature- and decision-level fusion, we achieve levels of accuracy well beyond those of the statistical algorithm, as well as the popular “cohort” model for postrecognition score analysis. In addition, we also explore the theoretical question of why machine learning-based algorithms tend to outperform statistical meta-recognition and provide a partial explanation. We show that our proposed methods are effective for a variety of different recognition applications across security and forensics-oriented computer vision, including biometrics, object recognition, and content-based image retrieval.
TL;DR: A meta-learning approach for periodic algorithm selection when data distribution may change over time is presented, which exploits the knowledge obtained from the induction of models for different data chunks to improve the general predictive performance.
Abstract: When users have to choose a learning algorithm to induce a model for a given dataset, a common practice is to select an algorithm whose bias suits the data distribution. In real-world applications that produce data continuously this distribution may change over time. Thus, a learning algorithm with the adequate bias for a dataset may become unsuitable for new data following a different distribution. In this paper we present a meta-learning approach for periodic algorithm selection when data distribution may change over time. This approach exploits the knowledge obtained from the induction of models for different data chunks to improve the general predictive performance. It periodically applies a meta-classifier to predict the most appropriate learning algorithm for new unlabeled data. Characteristics extracted from past and incoming data, together with the predictive performance from different models, constitute the meta-data, which is used to induce this meta-classifier. Experimental results using data of a travel time prediction problem show its ability to improve the general performance of the learning system. The proposed approach can be applied to other time-changing tasks, since it is domain independent.
TL;DR: This work presents a technique based on the well-established Machine Learning technique of stacking that combines the two approaches into a new hybrid approach and predicts the best algorithm based on predicted run times.
Abstract: Many state of the art Algorithm Selection systems use Machine Learning to either predict the run time or a similar performance measure of each of a set of algorithms and choose the algorithm with the best predicted performance or predict the best algorithm directly. We present a technique based on the well-established Machine Learning technique of stacking that combines the two approaches into a new hybrid approach and predicts the best algorithm based on predicted run times. We demonstrate significant performance improvements of up to a factor of six compared to the previous state of the art. Our approach is widely applicable and does not place any restrictions on the performance measure used, the way to predict it or the Machine Learning used to predict the best algorithm. We investigate different ways of deriving new Machine Learning features from the predicted performance measures and evaluate their effectiveness in increasing performance further. We use five different regression algorithms for performance prediction on five data sets from the literature and present strong empirical evidence that shows the effectiveness of our approach.
TL;DR: The results show that the proposed User-based Slope One algorithm is more accurate than the Slope One and Weighted Slope One algorithms, improves the recommendation quality of the recommender system, and handles user-related tasks better.
Abstract: In the field of collaborative filtering recommendation, the accuracy requirement often makes the recommendation algorithm complex and hard to implement. The Slope One algorithm, a recently proposed algorithm, is not only easy to implement but also efficient and effective. However, Slope One does not perform well on personalized recommendation tasks that concern the relationships between users, because the Slope One scheme and most of its improved variants are item-based collaborative filtering algorithms. To solve these problems, we propose a User-based Slope One algorithm. The algorithm contains three parts. First, we calculate the similarity of users. Second, we define a new variable to indicate the relationship between items and users. Third, we add this new variable to the weight of the Weighted Slope One algorithm to obtain the final recommendation expression. We carry out extensive experiments on the MovieLens data set, and the results show that our algorithm is more accurate than the Slope One algorithm and the Weighted Slope One algorithm. Furthermore, they show that the User-based Slope One algorithm can improve the recommendation quality of the recommender system and handle user-related tasks better.
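The abstract builds on Weighted Slope One but does not define the new user-item variable, so the sketch below covers only the baseline Weighted Slope One predictor that the proposed method extends. The function name, the data layout (a dict of per-user rating dicts) and the tiny rating table are illustrative assumptions.

```python
def weighted_slope_one(ratings, user, target):
    """Predict `user`'s rating of `target` with baseline Weighted Slope One.

    ratings: dict mapping user -> dict mapping item -> rating.
    Returns None when no co-rated item pair is available.
    """
    numerator = 0.0
    denominator = 0
    for item, user_rating in ratings[user].items():
        if item == target:
            continue
        # Rating differences from every user who rated both items.
        diffs = [
            r[target] - r[item]
            for r in ratings.values()
            if target in r and item in r
        ]
        if not diffs:
            continue
        count = len(diffs)
        deviation = sum(diffs) / count  # average of (target - item)
        # Each deviation is weighted by the number of supporting users.
        numerator += (deviation + user_rating) * count
        denominator += count
    return numerator / denominator if denominator else None


# Hypothetical toy data: three users, three items.
ratings = {
    "john": {"A": 5, "B": 3, "C": 2},
    "mark": {"A": 3, "B": 4},
    "lucy": {"B": 2, "C": 5},
}
prediction = weighted_slope_one(ratings, "lucy", "A")
print(round(prediction, 2))
```

A user-based variant as described in the abstract would modify the weight `count` using user similarity; how exactly is not stated there, so it is left out here.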
TL;DR: The weighted majority vote rule is proposed in which weights are represented by interval-valued fuzzy set (IVFS) and in this approach the weights have a lower and upper membership function.
Abstract: This paper presents the recognition algorithm with random selection of features. In the proposed procedure of classification the choice of weights is one of the main problems. In this paper we propose the weighted majority vote rule in which weights are represented by interval-valued fuzzy set (IVFS). In our approach the weights have a lower and upper membership function. The described algorithm was tested on one data set from UCI repository. The obtained results are compared with the most popular majority vote and the weighted majority vote rule.
TL;DR: A rigorous proof is given that two versions of this learning algorithm correctly learn in the limit, and an empirical performance analysis is presented that concludes that IDS is an efficient algorithm for software engineering applications of automata learning.
Abstract: We present a new algorithm IDS for incremental learning of deterministic finite automata (DFA). This algorithm is based on the concept of distinguishing sequences introduced in (Angluin81). We give a rigorous proof that two versions of this learning algorithm correctly learn in the limit. Finally we present an empirical performance analysis that compares these two algorithms, focussing on learning times and different types of learning queries. We conclude that IDS is an efficient algorithm for software engineering applications of automata learning, such as testing and model inference.
TL;DR: An exact anytime algorithm for the inverse power index problem that runs in exponential time is proposed and can be used to find a weighted voting game that optimizes any exponential time computable function.
Abstract: We study the inverse power index problem for weighted voting games: the problem of finding a weighted voting game in which the power of the players is as close as possible to a certain target distribution. Our goal is to find algorithms that solve this problem exactly. Thereto, we study various subclasses of simple games, and their associated representation methods. We survey algorithms and impossibility results for the synthesis problem, i.e., converting a representation of a simple game into another representation. We contribute to the synthesis problem by showing that it is impossible to compute in polynomial time the list of ceiling coalitions of a game from its list of roof coalitions, and vice versa. Then, we proceed by studying the problem of enumerating the set of weighted voting games. We present first a naive algorithm for this, running in doubly exponential time. Using our knowledge of the synthesis problem, we then improve on this naive algorithm, and we obtain an enumeration algorithm that runs in quadratic exponential time. Moreover, we show that this algorithm runs in output-polynomial time, making it the best possible enumeration algorithm up to a polynomial factor. Finally, we propose an exact anytime algorithm for the inverse power index problem that runs in exponential time. By the genericity of our approach, our algorithm can be used to find a weighted voting game that optimizes any exponential time computable function. We implement our algorithm for the case of the normalized Banzhaf index, and we perform experiments in order to study performance and error convergence.
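For readers unfamiliar with the power index used in the experiments, here is a minimal brute-force sketch of the normalized Banzhaf index of a weighted voting game. It illustrates only the forward computation (game to power distribution), not the paper's enumeration or anytime algorithm; the function name and the example game are assumptions, and non-negative weights are assumed.

```python
from itertools import combinations


def normalized_banzhaf(quota, weights):
    """Normalized Banzhaf index by enumerating all coalitions (exponential time).

    A player is a swing in a winning coalition if removing that player
    makes the coalition losing. Assumes non-negative weights.
    """
    n = len(weights)
    swings = [0] * n
    for size in range(1, n + 1):
        for coalition in combinations(range(n), size):
            total = sum(weights[i] for i in coalition)
            if total < quota:
                continue  # losing coalition: no swings
            for i in coalition:
                if total - weights[i] < quota:
                    swings[i] += 1
    total_swings = sum(swings)
    if total_swings == 0:
        return [0.0] * n
    return [s / total_swings for s in swings]


# Hypothetical game: quota 4, weights 3, 2 and 1.
power = normalized_banzhaf(4, [3, 2, 1])
print(power)
```

The inverse problem studied in the abstract goes the other way: given a target distribution, search for a quota and weights whose index is as close to it as possible.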
TL;DR: The result proves that the attack behaviors can be more efficiently found from the network data by the semi-supervised FCM clustering algorithm.
Abstract: The intrusion detection algorithm based on supervised learning has a high detection rate, but it requires fully labeled data, which are hard to collect. Meanwhile, the intrusion detection algorithm based on unsupervised learning has a high false positive rate. In this paper, a semi-supervised learning algorithm for intrusion detection is proposed, combined with the Fuzzy C-Means (FCM) algorithm. The sensitivity to the initial values and the probability of being trapped in a local optimum are greatly reduced by using a small amount of labeled data to improve the learning ability of the FCM algorithm. The KDD CUP99 data set is adopted as the experimental subject. The results show that attack behaviors can be found more efficiently in the network data by the semi-supervised FCM clustering algorithm.
TL;DR: This paper proposes automatic selection of neighboring instances as defined by a dynamic local region unique to each instance, as opposed to the traditional approach of considering the manually specified k nearest neighbors.
Abstract: kNN is a popular lazy-learning algorithm used for a wide variety of machine learning applications. One problem with this algorithm is the choice of k value. Different k values can have a large impact on the predictive accuracy of the algorithm, and picking a good value is generally unintuitive by looking at the data set. Cross-validation over multiple folds is often used to find the best value for k in kNN based on prediction results. In this paper, we propose automatic selection of neighboring instances as defined by a dynamic local region unique to each instance, as opposed to the traditional approach of considering the manually specified k nearest neighbors. Removing the need to select an appropriate k value removes the cross-validation step, which improves the computational performance of the algorithm. Classification accuracy achieved by this approach is only slightly lower than the results of using kNN with an optimally selected k value.
TL;DR: This paper analyzes network recovery algorithms, which allow computer networks to properly function in spite of failures, and demonstrates that although the main components of various check-point/recovery algorithms are recursive algorithms, check-point/recovery algorithms, as a whole, are super-recursive second-level algorithms.
Abstract: In this paper, we analyze network recovery algorithms, which allow computer networks to properly function in spite of failures. In this analysis, we use methods and tools of the theory of super-recursive algorithms. The concept of algorithm of the second level is introduced and studied. It is demonstrated that although the main components of various check-point/recovery algorithms are recursive algorithms, check-point/recovery algorithms, as a whole, are super-recursive second-level algorithms. Treating network recovery algorithms as second level algorithms is oriented at developing more powerful algorithms by combining existing ones in a common schema.
TL;DR: This paper aims at designing better performing feature-projection based classification algorithms and presents two new algorithms, which are batch supervised learning algorithms and represent induced classification knowledge as feature intervals.
Abstract: This paper aims at designing better performing feature-projection based classification algorithms and presents two new such algorithms. These algorithms are batch supervised learning algorithms and represent induced classification knowledge as feature intervals. In both algorithms, each feature participates in the classification by giving real-valued votes to classes. The prediction for an unseen example is the class receiving the highest vote. The first algorithm, OFP.MC, learns on each feature pairwise disjoint intervals which minimize feature classification error. The second algorithm, GFP.MC, constructs feature intervals by greedily improving the feature classification error. The new algorithms are empirically evaluated on twenty datasets from the UCI repository and are compared with the existing feature-projection based classification algorithms (FIL.IF, VFI5, CFP, k-NNFP, and NBC). The experiments demonstrate that the OFP.MC algorithm outperforms other feature-projection based classification algorithms. The GFP.MC algorithm is slightly inferior to the OFP.MC algorithm, but, if it is used for datasets with large number of instances, then it reduces the space requirement of the OFP.MC algorithm. The new algorithms are insensitive to boundary noise unlike the other feature-projection based classification algorithms considered here.
TL;DR: A new hybrid learning algorithm, called DTGP, to construct cost-sensitive classifiers that uses a decision tree as its basic classifier and the constructed decision tree will be pruned by a genetic programming algorithm using a fitness function that is sensitive to misclassification costs.
Abstract: In this paper, we introduce a new hybrid learning algorithm, called DTGP, to construct cost-sensitive classifiers. This algorithm uses a decision tree as its basic classifier, and the constructed decision tree is pruned by a genetic programming algorithm using a fitness function that is sensitive to misclassification costs. The proposed learning algorithm has been examined on six cost-sensitive problems. The experimental results show that the proposed learning algorithm outperforms some other well-known learning algorithms such as C4.5 and naive Bayes.
TL;DR: This paper proposes a clustering-based transfer learning algorithm for document categorization, describes the main idea and the steps of the algorithm, and experimentally compares it with a no-transfer algorithm.
Abstract: Traditional machine learning and data mining have achieved significant success in many knowledge engineering areas, including classification, regression, and clustering, but a major assumption behind them is that the training and test data must be in the same feature space and follow the same distribution. In real applications, however, this assumption cannot always be satisfied. In this case, the role of transfer learning comes to the fore, because transfer learning does not make the same distributional assumptions as traditional machine learning, reduces the dependence between the target task and the training data, and allows a wider transfer of knowledge. In this paper we propose a transfer learning algorithm for document categorization based on clustering. We describe the main idea and the steps of the algorithm. We then test the algorithm experimentally and compare it with a no-transfer algorithm. The experiments demonstrate that the proposed algorithm is, to some extent, better than the others.
TL;DR: Example analysis and experimental results show that the algorithm reduces the computation needed to form weighted frequent item sets and improves the efficiency of weighted frequent item set generation.
Abstract: This paper presents a new algorithm for mining weighted frequent item sets without generating candidates. The weight set of attributes is normalized so that the weighted support never exceeds 1. The new algorithm is shown to satisfy the weighted downward closure property. An effective pruning strategy for mining, based on a weighted FP-tree, is constructed. Example analysis and experimental results show that the algorithm reduces the computation needed to form weighted frequent item sets and improves the efficiency of weighted frequent item set generation.
TL;DR: A new covering algorithm based on competition (CAC) is proposed, in which sphere neighborhoods can be adjusted gradually and ill-suited sphere neighborhoods are removed, so that the whole neural network performs more stably.
TL;DR: This work consists in proposing an algorithm which learns each class characteristics in a sequential way: each new observation will improve object knowledge, particularly well suited to real time applications such as shape recognition or classification.
Abstract: The issue addressed in this paper is the unsupervised learning of observed shapes. More precisely, we aim at learning the main features of an object seen in different scenarios. We adapt the statistical framework from [1] to propose a model in which an object is described by independent classes representing its variability. Our work consists in proposing an algorithm which learns each class's characteristics in a sequential way: each new observation improves our knowledge of the object. This algorithm is particularly well suited to real-time applications such as shape recognition or classification, but designing it turns out to be a challenging problem. Indeed, the so-called classic machine learning algorithms for missing data problems, such as the Expectation Maximization (EM) algorithm, are not designed to learn from sequentially acquired observations. Moreover, the so-called hidden data simulation in a mixture model cannot be achieved properly using the classic Markov Chain Monte Carlo (MCMC) algorithms, such as the Gibbs sampler. Our proposal, among others, takes advantage of the contribution of Cappe and Moulines [2] for a sequential adaptation of the EM algorithm and of the work of Carlin and Chib [3] for the simulation of the hidden data posterior distribution.
TL;DR: A new, more effective algorithm for detecting encrypted peer-to-peer flows is presented; it has been prototyped and evaluated on a test-bed.
Abstract: Peer-to-peer flow detection has been studied for several years. Port-based classification, regular expressions, graphlets, and various machine learning based algorithms have been proposed as solutions. Unfortunately, all previous algorithms fall short in various respects, especially for encrypted peer-to-peer traffic. In this paper, we present a new, more effective algorithm. We have also prototyped our algorithm and evaluated it on a test-bed. The performance evaluation demonstrates that our algorithm is more effective than the previous ones.
TL;DR: A discretization algorithm based on attribute importance and incompatible degrees is proposed and implemented, and its correctness and validity are demonstrated.
Abstract: This paper analyzes the advantages and disadvantages of the discretization algorithms commonly used in association rule analysis of equipment combat simulation data. It then proposes and implements a discretization algorithm based on attribute importance and incompatible degrees. Theoretical analysis of the algorithm and experimental comparison demonstrate the correctness and validity of the algorithm.
TL;DR: The Integral Chi2 algorithm, based on the Chi2 algorithm, is proposed; it ties the interval-merging criterion to its probabilistic and statistical meaning and outperforms existing algorithms overall.
Abstract: The ChiMerge algorithm and its extensions have been shown to be efficient and effective for discretization of continuous attributes. However, all these algorithms have a vital drawback: the probabilistic meaning is not fully carried over when two intervals are merged. To overcome this drawback, this paper proposes the Integral Chi2 algorithm, based on the Chi2 algorithm. In the proposed algorithm, the interval-merging criterion is tied to its probabilistic and statistical meaning. Extensive experiments are conducted to evaluate the performance of the proposed algorithm by comparing it with existing algorithms. The experimental results show that our algorithm outperforms existing algorithms overall.
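As background for the merging criterion discussed above, the sketch below computes the chi-square statistic that ChiMerge-style discretizers evaluate for a pair of adjacent intervals. It shows the standard baseline statistic only, not the Integral Chi2 criterion proposed in the paper, which the abstract does not specify. The function name and example counts are illustrative, and cells with zero expected count are simply skipped here, which is a simplification.

```python
def chi2_adjacent(counts_a, counts_b):
    """Chi-square statistic for two adjacent intervals.

    counts_a, counts_b: per-class example counts in each interval,
    listed in the same class order. A low value suggests the class
    distributions are similar, so the intervals are merge candidates.
    """
    n = sum(counts_a) + sum(counts_b)
    if n == 0:
        return 0.0
    col_totals = [a + b for a, b in zip(counts_a, counts_b)]
    chi2 = 0.0
    for row in (counts_a, counts_b):
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / n
            if expected > 0:  # simplification: skip empty expected cells
                chi2 += (observed - expected) ** 2 / expected
    return chi2


# Identical class distributions: nothing separates the intervals.
similar = chi2_adjacent([5, 5], [5, 5])
# Perfectly separated classes: the boundary is informative.
different = chi2_adjacent([10, 0], [0, 10])
print(similar, different)
```

A bottom-up discretizer would repeatedly merge the adjacent pair with the smallest statistic until every remaining pair exceeds a chosen significance threshold.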
TL;DR: Experiments show that the improved BP algorithm, compared with the standard BP algorithm, improves the network accuracy, accelerates training, and is a better parameter estimation method.
Abstract: Focusing on the deficiencies of the existing IRT parameter estimation algorithm, the Resilient Backpropagation algorithm and a variable learning rate algorithm are used on the basis of the artificial neural network algorithm to improve the network convergence speed, and the genetic algorithm is used to solve the local minima problem, yielding the improved BP algorithm. Finally, the standard BP algorithm and the improved BP algorithm are implemented in MATLAB. Experiments show that the improved BP algorithm, compared with the standard BP algorithm, improves the network accuracy, accelerates training, and is a better parameter estimation method.
TL;DR: To resolve the learning problem in which the instances are labeled by vectors, an ensemble learning algorithm for direction prediction was proposed, with the aim of minimizing the direction error between the direction represented by the prediction vector and the direction represented by the actual vector.
Abstract: To resolve the learning problem in which the instances are labeled by vectors, an ensemble learning algorithm for direction prediction was proposed, with the aim of minimizing the direction error between the direction represented by the prediction vector and the direction represented by the actual vector. Methods to construct multiple prediction functions and to combine them to realize optimized prediction of instance directions were put forward. This algorithm is very general. When the different classes are labeled by the different direction vectors of the axes, the proposed algorithm degenerates to the real AdaBoost algorithm for multi-class classification, guaranteeing that the training error of the combined classifier is reduced as the number of trained classifiers increases. When the instances are labeled by the vector composed of the classification costs of all classes, the proposed algorithm degenerates to an ensemble learning algorithm for cost-sensitive classification that can minimize the average classification cost. The theoretical analysis and experimental results show that the proposed algorithm is reasonable and effective.
TL;DR: The scope for improvements in the VCD algorithm has been identified and the enhancement proposed to eliminate the user inputs for the parameters λ and zp has been implemented and compared based on the outcomes of the two experiments.
Abstract: The scope for improvements in the VCD algorithm [1] has been identified in this paper. The proposed enhancement eliminates the user inputs for the parameters λ and zp. The base algorithm and the proposed algorithm have been implemented and compared based on the outcomes of two experiments. The traditional Silhouette coefficient has also been modified as an evaluation parameter. Results of both algorithms are compared by an independent t-test. Although no significant difference was observed between the performance of the base algorithm and the proposed algorithm in experiment 1, the user need not set parameters for the enhanced algorithm. As most clustering algorithms are sensitive to the order of the data input, the results of experiment 2 show that the outcome of the proposed algorithm is not affected by the order of data input. Moreover, the proposed algorithm yields cluster outcomes in a hierarchical manner.
TL;DR: This work proposes the first algorithm that utilizes transfer learning for the label space, presents theoretical verification of the method and demonstrates the effectiveness of the framework with several real-world experiments.
Abstract: Small datasets pose a tremendous challenge in machine learning due to the few available training examples compounded with the relative rarity of certain labels, which can potentially impede the development of a representative hypothesis. We define "Rare Datasets" as ones with a low samples/features ratio and a skewed label distribution. Since a generalized training model cannot be theoretically guaranteed, a method to leverage similar data is needed. We propose the first algorithm that utilizes transfer learning for the label space, present theoretical verification of our method and demonstrate the effectiveness of our framework with several real-world experiments. In addition, we formally describe what constitutes a "Rare Dataset" and present a detailed characterization of related methods.
TL;DR: A weighted Boolean association rule mining algorithm and a weighted fuzzy association rule mining algorithm are presented, which use the pruning strategy of the Apriori algorithm to improve the efficiency of frequent itemset generation.
Abstract: To address the problem that most weighted association rule mining algorithms lack anti-monotonicity, this paper presents a weighted support-confidence framework that supports anti-monotonicity. On this basis, a weighted Boolean association rule mining algorithm and a weighted fuzzy association rule mining algorithm are presented, which use the pruning strategy of the Apriori algorithm to improve the efficiency of frequent itemset generation. Experimental results show that both algorithms perform well.
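The abstract above does not define its weighted support-confidence framework, so the sketch below uses one commonly described way of obtaining anti-monotonicity as a stand-in: each transaction is weighted by the mean weight of its items, and the weighted support of an itemset is the weight share of the transactions that contain it. Any superset of an itemset is contained in a subset of those transactions, so its support cannot be larger, and Apriori-style pruning remains valid. All names, weights, and data here are hypothetical.

```python
from itertools import combinations

def transaction_weight(transaction, item_weights):
    """Mean item weight of a transaction (assumed weighting scheme)."""
    return sum(item_weights[i] for i in transaction) / len(transaction)

def weighted_support(itemset, transactions, item_weights):
    """Weight share of the transactions that contain the itemset."""
    total = sum(transaction_weight(t, item_weights) for t in transactions)
    covered = sum(transaction_weight(t, item_weights)
                  for t in transactions if itemset <= t)
    return covered / total

def frequent_itemsets(transactions, item_weights, min_support):
    """Level-wise search with Apriori pruning on weighted support."""
    def is_frequent(s):
        return weighted_support(s, transactions, item_weights) >= min_support

    items = sorted(set().union(*transactions))
    level = [s for s in (frozenset([i]) for i in items) if is_frequent(s)]
    frequent = list(level)
    k = 2
    while level:
        prev = set(level)
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Apriori pruning: drop candidates that have an infrequent (k-1)-subset.
        candidates = [c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))]
        level = [c for c in candidates if is_frequent(c)]
        frequent.extend(level)
        k += 1
    return frequent

transactions = [frozenset("ab"), frozenset("abc"), frozenset("ac"), frozenset("b")]
item_weights = {"a": 0.9, "b": 0.5, "c": 0.2}
result = frequent_itemsets(transactions, item_weights, min_support=0.5)
```

With these toy weights, the low-weight item `c` falls below the threshold at the first level, so no candidate containing it is ever counted.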
TL;DR: A flexible, large-scale experimental framework for a metacontroller that supports exploration of the algorithm-parameter space and recommends algorithms for a given dataset; experiments show that the framework offers a friendly way of setting up a machine learning experiment while providing accurate rankings of recommended algorithms based on past behaviors.
Abstract: We are working on the problem of developing a flexible, generic metalearning process that supports algorithm selection based on studying the algorithms' past performance behaviors. State-of-the-art machine learning systems display limitations in that they require a great deal of human supervision to select an effective algorithm with corresponding options for a specific domain. Additionally, very little guidance is available for algorithm-parameter selection, and the number of available choices is overwhelming. In this paper, we develop a flexible, large-scale experimental framework for a metacontroller that supports exploration of the algorithm-parameter space and recommends algorithms for a given dataset. First, we aim to facilitate an easy-to-use process for creating a search space for algorithm selection by automatically exploring some possible combinations of algorithms and key parameters. Second, our goal is to produce an algorithm recommendation by looking at the past behaviors of related datasets. Our main contribution is the implemented framework itself, which is based on the use of a wide variety of strategies to automatically generate a search space and recommend algorithms for a specific dataset. We evaluate our system with 40 major algorithms on 20 datasets from the UCI repository. Each dataset is represented by 25 data characteristics. We generate and run 7510 combinations of algorithms, parameters and datasets. Our experiments show that our framework offers a friendly way of setting up a machine learning experiment while providing accurate rankings of recommended algorithms based on past behaviors. Specifically, 88% of recommended algorithm rankings were significantly correlated with the true rankings for a given dataset.
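The abstract above reports that recommended rankings correlate with true rankings but does not name the statistic used. Assuming a rank correlation such as Spearman's rho, the comparison for a single dataset could be sketched as follows; the algorithm names and ranks are invented for illustration, and the significance test itself is not shown.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman's rho for two rankings of the same items, assuming no ties."""
    n = len(rank_a)
    d2 = sum((rank_a[item] - rank_b[item]) ** 2 for item in rank_a)
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical rankings of four algorithms on one dataset (1 = best).
recommended = {"tree": 1, "knn": 2, "svm": 3, "bayes": 4}
actual = {"tree": 1, "knn": 3, "svm": 2, "bayes": 4}
rho = spearman_rho(recommended, actual)
```

A value of 1 means the two rankings agree exactly and -1 means one is the reverse of the other; here two adjacent algorithms are swapped, so the value is high but below 1.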