TL;DR: This work uses an evolutionary algorithm to evolve instances that are uniquely easy or hard for each algorithm, thus providing a more direct method for studying the relative strengths and weaknesses of each algorithm.
Abstract: The suitability of an optimisation algorithm selected from within an algorithm portfolio depends upon the features of the particular instance to be solved. Understanding the relative strengths and weaknesses of different algorithms in the portfolio is crucial for effective performance prediction, automated algorithm selection, and to generate knowledge about the ideal conditions for each algorithm to influence better algorithm design. Relying on well-studied benchmark instances, or randomly generated instances, limits our ability to truly challenge each of the algorithms in a portfolio and determine these ideal conditions. Instead we use an evolutionary algorithm to evolve instances that are uniquely easy or hard for each algorithm, thus providing a more direct method for studying the relative strengths and weaknesses of each algorithm. The proposed methodology ensures that the meta-data is sufficient to be able to learn the features of the instances that uniquely characterise the ideal conditions for each algorithm. A case study is presented based on a comprehensive study of the performance of two heuristics on the Travelling Salesman Problem. The results show that prediction of search effort as well as the best performing algorithm for a given instance can be achieved with high accuracy.
TL;DR: It is observed that whereas (as expected) on a two-junction corridor the full state representation algorithm shows the best results, that algorithm is not implementable on larger road networks, and the proposed PG-AC-TLC algorithm shows the best overall performance.
Abstract: We propose for the first time two reinforcement learning algorithms with function approximation for average cost adaptive control of traffic lights. One of these algorithms is a version of Q-learning with function approximation while the other is a policy gradient actor-critic algorithm that incorporates multi-timescale stochastic approximation. We show performance comparisons on various network settings of these algorithms with a range of fixed timing algorithms, as well as a Q-learning algorithm with full state representation that we also implement. We observe that whereas (as expected) on a two-junction corridor, the full state representation algorithm shows the best results, this algorithm is not implementable on larger road networks. The algorithm PG-AC-TLC that we propose is seen to show the best overall performance.
TL;DR: A dynamic factor is incorporated into TrAdaBoost to make it meet its intended design of incorporating the advantages of both AdaBoost and the "Weighted Majority Algorithm", and is applied as a "correction factor" that significantly improves the classification performance.
Abstract: Instance-based transfer learning methods utilize labeled examples from one domain to improve learning performance in another domain via knowledge transfer. Boosting-based transfer learning algorithms are a subset of such methods and have been applied successfully within the transfer learning community. In this paper, we address some of the weaknesses of such algorithms and extend the most popular transfer boosting algorithm, TrAdaBoost. We incorporate a dynamic factor into TrAdaBoost to make it meet its intended design of incorporating the advantages of both AdaBoost and the "Weighted Majority Algorithm". We theoretically and empirically analyze the effect of this important factor on the boosting performance of TrAdaBoost and we apply it as a "correction factor" that significantly improves the classification performance. Our experimental results on several real-world datasets demonstrate the effectiveness of our framework in obtaining better classification results.
TL;DR: In this article, the authors proposed a method for transforming a non-uniform local algorithm into a uniform one, and the resulting algorithm enjoys the same asymptotic running time as the original local algorithm.
Abstract: Numerous sophisticated local algorithms were suggested in the literature for various fundamental problems. Notable examples are the MIS and (Δ+1)-coloring algorithms by Barenboim and Elkin [6], by Kuhn [22], and by Panconesi and Srinivasan [33], as well as the O(Δ²)-coloring algorithm by Linial [27]. Unfortunately, most known local algorithms (including, in particular, the aforementioned algorithms) are non-uniform, that is, they assume that all nodes know good estimations of one or more global parameters of the network, e.g., the maximum degree Δ or the number of nodes n. This paper provides a rather general method for transforming a non-uniform local algorithm into a uniform one. Furthermore, the resulting algorithm enjoys the same asymptotic running time as the original non-uniform algorithm. Our method applies to a wide family of both deterministic and randomized algorithms. Specifically, it applies to almost all of the state-of-the-art non-uniform algorithms regarding MIS and Maximal Matching, as well as to many results concerning the coloring problem. (In particular, it applies to all aforementioned algorithms.) To obtain our transformations we introduce a new distributed tool called pruning algorithms, which we believe may be of independent interest.
TL;DR: This paper gives an introduction to AdaBoost and analyzes several aspects of the algorithm itself.
Abstract: The AdaBoost algorithm enables weak classifiers to improve their performance by combining them into an ensemble of multiple classifiers. Because it automatically adapts to the error rate of the base algorithm during training by dynamically adjusting the weight of each sample, it has attracted wide attention. This paper gives an introduction to AdaBoost and analyzes several aspects of the algorithm itself.
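The per-sample weight adjustment mentioned in this abstract can be illustrated with a minimal sketch of one round of the standard discrete AdaBoost update. The function name `adaboost_reweight` and its argument layout are hypothetical, chosen for illustration; the abstract itself gives no formulas, so this follows the textbook form rather than anything specific to the paper.

```python
import math


def adaboost_reweight(weights, correct):
    """One round of the textbook discrete AdaBoost re-weighting step.

    weights: current sample weights, assumed to sum to 1.
    correct: list of bools, True where the weak classifier was right.
    Returns (alpha, new_weights).
    """
    # Weighted error of the weak classifier on this round.
    err = sum(w for w, c in zip(weights, correct) if not c)
    if not 0.0 < err < 0.5:
        raise ValueError("weak learner must have weighted error in (0, 0.5)")
    # Vote weight of this classifier in the final ensemble.
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Misclassified samples are up-weighted, correct ones down-weighted.
    raw = [w * math.exp(-alpha if c else alpha) for w, c in zip(weights, correct)]
    total = sum(raw)
    # Renormalize so the weights form a distribution again.
    return alpha, [r / total for r in raw]


alpha, new_weights = adaboost_reweight([0.25] * 4, [True, True, True, False])
```

After the update, the misclassified samples together carry half of the total weight, which is what forces the next weak classifier to focus on them.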
TL;DR: This paper proposes two algorithms that can obtain better results within the given time period in the Quality-of-Service (QoS)-aware replica placement problem in a general graph model and uses random heuristic algorithms to generate initial population to avoid enormous useless searching.
TL;DR: Simulation results demonstrate that the 3PSMO algorithm significantly outperforms the 2PSMO algorithm in both execution time and computational complexity, which implies that the maximum can be attained more efficiently by the 3PSMO algorithm.
TL;DR: A new algorithm for learning polyhedral classifiers is proposed; it is a Perceptron-like algorithm that updates the parameters only when the current classifier misclassifies a training example.
Abstract: In this paper we propose a new algorithm for learning polyhedral classifiers, which we call Polyceptron. It is a Perceptron-like algorithm that updates the parameters only when the current classifier misclassifies a training example. We give both batch and online versions of the Polyceptron algorithm. Finally, we give experimental results to show the effectiveness of our approach.
TL;DR: The second Greedy algorithm addresses the question of finding those nodes which are the most sensitive to variations in pressure and are thereby ideal places to monitor the hydraulic state of a water distribution network.
Abstract: Positioning sensors in a water supply network is an NP-hard task. We propose three algorithms: one based on integer linear programming (ILP) and the other two based on the Greedy paradigm. We apply these algorithms to real case networks and compare the results of these algorithms with the results of an algorithm based on NSGA II, a genetic algorithm. We come to the conclusion that our algorithms outperform NSGA II in every single case. The algorithm based on integer linear programming may be applied as a competitor to the algorithm implemented in TEVA-SPOT (Berry, 2009), while the first Greedy algorithm may replace the ILP algorithm in large networks due to its faster running time. The second Greedy algorithm addresses the question of finding those nodes which are the most sensitive to variations in pressure and are thereby ideal places to monitor the hydraulic state of a water distribution network. KEYWORDS Graph Theory, Sensor location layout, Greedy Algorithm, Genetic Algorithm, Integer Linear Programming, Sensitivity
TL;DR: This paper describes and analyzes a new algorithm for learning linear or kernel predictors with respect to the 0-1 loss function, and proves a hardness result, showing that under a certain cryptographic assumption, no algorithm can learn such classifiers in time polynomial in L.
Abstract: Some of the most successful machine learning algorithms, such as Support Vector Machines, are based on learning linear and kernel predictors with respect to a convex loss function, such as the hinge loss. For classification purposes, a more natural loss function is the 0-1 loss. However, using it leads to a non-convex problem for which there is no known efficient algorithm. In this paper, we describe and analyze a new algorithm for learning linear or kernel predictors with respect to the 0-1 loss function. The algorithm is parameterized by L, which quantifies the effective width around the decision boundary in which the predictor may be uncertain. We show that without any distributional assumptions, and for any fixed L, the algorithm runs in polynomial time, and learns a classifier which is worse than the optimal such classifier by at most ε. We also prove a hardness result, showing that under a certain cryptographic assumption, no algorithm can learn such classifiers in time polynomial in L.
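To make the contrast between the two loss functions named in this abstract concrete, here is a minimal sketch. It assumes the usual convention, not stated in the abstract, that a prediction is scored by its signed margin y·⟨w, x⟩ with labels y in {-1, +1}; the function names are illustrative only.

```python
def zero_one_loss(margin):
    """0-1 loss: 1 for a misclassification (non-positive margin), else 0."""
    return 0.0 if margin > 0 else 1.0


def hinge_loss(margin):
    """Hinge loss: a convex upper bound on the 0-1 loss."""
    return max(0.0, 1.0 - margin)


# Compare both losses over a few sample margins.
margins = [-2.0, -0.5, 0.0, 0.3, 1.0, 3.0]
table = [(m, zero_one_loss(m), hinge_loss(m)) for m in margins]
```

The hinge loss is at least as large as the 0-1 loss at every margin, which is why minimizing it is a tractable surrogate; the price is that it also penalizes correct predictions whose margin is below 1.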
TL;DR: An Adaptive Variable Learning Rate EBP algorithm is proposed to attack the challenging problem of reducing the convergence time of the EBP algorithm, aiming for faster convergence than the standard EBP algorithm.
Abstract: A critical issue for Neural Network based large-scale data mining algorithms is how to speed up their learning algorithm. This problem is particularly challenging for the Error Back-Propagation (EBP) algorithm in Multi-Layered Perceptron (MLP) Neural Networks due to their significant applications in many scientific and engineering problems. In this paper, we propose an Adaptive Variable Learning Rate EBP (AVLR-EBP) algorithm to attack the challenging problem of reducing the convergence time of the EBP algorithm, aiming for faster convergence than the standard EBP algorithm. The idea is inspired by adaptive filtering, which led us to two semi-similar methods of calculating the learning rate. Mathematical analysis of the AVLR-EBP algorithm confirms its convergence property. The AVLR-EBP algorithm is utilized for data classification applications. Simulation results on many well-known data sets demonstrate that this algorithm achieves a considerable reduction in convergence time in comparison to the standard EBP algorithm. In classifying the IRIS, Wine, Breast Cancer, Semeion and SPECT Heart datasets, the proposed algorithm shows a reduction in the number of learning epochs relative to the standard EBP algorithm.
TL;DR: It is shown that the authors' suboptimal solutions can be interpreted as the solution of a perturbed optimization problem derived from the original one, and some theoretical analyses of the algorithm are provided based on this novel interpretation.
Abstract: We consider a suboptimal solution path algorithm for the Support Vector Machine. The solution path algorithm is an effective tool for solving a sequence of parametrized optimization problems in machine learning. The solutions along the path provided by this algorithm are very accurate and they satisfy the optimality conditions more strictly than other SVM optimization algorithms. In many machine learning applications, however, this strict optimality is often unnecessary, and it adversely affects the computational efficiency. Our algorithm can generate the path of suboptimal solutions within an arbitrary user-specified tolerance level. It allows us to control the trade-off between the accuracy of the solution and the computational cost. Moreover, we show that our suboptimal solutions can be interpreted as the solution of a perturbed optimization problem derived from the original one. We provide some theoretical analyses of our algorithm based on this novel interpretation. The experimental results also demonstrate the effectiveness of our algorithm.
TL;DR: The results show that this algorithm can significantly reduce the size and the cost of the test suite, and achieves higher effectiveness of test-suite minimization.
Abstract: The ant colony algorithm is a bionic optimization algorithm that can solve combinatorial problems effectively. For the test-suite reduction problem, this algorithm can find a balance between the speed and the accuracy of the solution. Unlike other existing algorithms, this algorithm uses test cost criteria as well as test coverage criteria. Finally, the paper presents results comparing this algorithm with other classical algorithms. The results show that this algorithm can significantly reduce the size and the cost of the test suite, and achieves higher effectiveness of test-suite minimization.
TL;DR: The experimental results show that the proposed algorithm is highly efficient computationally and that in terms of diversity, it consistently outperforms the two competitive algorithms and converges to the optimal solutions on cases run with the exhaustive algorithm in under 100 ms.
Abstract: This paper proposes a new approach, and studies an algorithm to address the Maximum Diversity Problem (MDP) of recommendations for composite products or services. First, the proposed approach is based on constructing and using a multi-dimensional diversity feature space, which is separate from the utility space used for utility elicitation. Second, we introduce a randomized algorithm, which is based on iterative relaxation of selections by the Greedy algorithm with an exponential probability distribution. The algorithm produces a competitive solution with respect to finding a diverse set from candidate recommendations. Finally, we conduct an experimental study to compare the efficacy and efficiency of the proposed algorithm with two broadly used diversity algorithms, as well as with the exhaustive algorithm, which we could only compute for sets of up to seven returned recommendations. The experimental results show that the proposed algorithm is highly efficient computationally and that in terms of diversity, it consistently outperforms the two competitive algorithms and converges to the optimal solutions on cases run with the exhaustive algorithm in under 100 ms.
TL;DR: By relaxing the worst case competitive ratio of the online algorithm to 2+ε, where ε is an arbitrarily small constant, the algorithm automatically tunes itself to slackness degree and gives better performance than the optimal 2-competitive algorithm for real world inputs.
Abstract: We consider the classical power management problem: There is a device which has two states ON and OFF and one has to develop a control algorithm for changing between these states so as to minimize (energy) cost when given a sequence of service requests. Although an optimal 2-competitive algorithm exists, that algorithm does not have good performance in many practical situations, especially in case the device is not used frequently. To take the frequency of device usage into account, we construct an algorithm based on the concept of "slackness degree." Then by relaxing the worst case competitive ratio of our online algorithm to 2+ε, where ε is an arbitrarily small constant, we make the algorithm flexible to slackness. The algorithm thus automatically tunes itself to slackness degree and gives better performance than the optimal 2-competitive algorithm for real world inputs. In addition to worst case competitive ratio analysis, a queueing model analysis is given and computer simulations are reported, confirming that the performance of the algorithm is high.
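As background for the 2-competitive baseline mentioned in this abstract, the following is a minimal sketch of the textbook break-even (ski-rental style) policy for a two-state device. It is not the slackness-degree algorithm of the paper. The cost model is an assumption made for illustration: staying ON while idle costs `idle_power` per time unit, and switching OFF and back ON costs a fixed `wake_cost`.

```python
def online_idle_cost(gap, idle_power, wake_cost):
    """Cost of the break-even policy over one idle gap of length `gap`.

    The device stays ON until the idle energy spent equals the wake-up
    cost, then switches OFF (and pays wake_cost at the next request).
    """
    threshold = wake_cost / idle_power
    if gap <= threshold:
        # Request arrived before the break-even point: only idle energy.
        return idle_power * gap
    # Idled up to the threshold, then paid the OFF/ON transition.
    return idle_power * threshold + wake_cost


def offline_idle_cost(gap, idle_power, wake_cost):
    """Cost of the optimal offline policy, which knows the gap in advance."""
    return min(idle_power * gap, wake_cost)


# Example: idle power 1.0 per time unit, wake-up cost 3.0.
gaps = [0.5, 1.0, 3.0, 5.0, 100.0]
ratios = [
    online_idle_cost(g, 1.0, 3.0) / offline_idle_cost(g, 1.0, 3.0) for g in gaps
]
```

Under this cost model the ratio never exceeds 2, and it reaches 2 on every gap longer than the break-even point. That is consistent with the abstract's remark that the 2-competitive policy performs poorly when the device is used infrequently.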
TL;DR: In this study, the improved NaiveBayes algorithm was the base classifier, and the base classifiers were fused by the AdaBoost algorithm with improved weights for voting to improve the classification performance for the minority class in an unbalanced dataset.
Abstract: To improve the classification performance for the minority class in an unbalanced dataset, an improved AdaBoost algorithm (the UnAdaBoost algorithm) for unbalanced datasets was proposed. This algorithm could improve the base classifier in order to raise the classification efficiency for the minority class, while to a certain extent losing accuracy for the majority class. This algorithm could also ensemble the base classifiers to make up for the loss of accuracy in the majority class. The performance for the minority class could be improved and the accuracy for the majority class would not be lost. In this study, the improved NaiveBayes algorithm was the base classifier, and the base classifiers were fused by the AdaBoost algorithm with improved weights for voting. Experimental results showed that the UnAdaBoost algorithm was effective for an unbalanced dataset compared with the AdaBoost algorithm.
TL;DR: This paper develops an algorithm that can efficiently and exactly update the weighted SVM solutions for arbitrary change of instance weights and introduces a parametrization which allows us to find the breakpoints in high-dimensional space easily.
Abstract: An instance-weighted variant of the support vector machine (SVM) has attracted considerable attention recently since it is useful in various machine learning tasks such as non-stationary data analysis, heteroscedastic data modeling, transfer learning, learning to rank, and transduction. An important challenge in these scenarios is to overcome the computational bottleneck: instance weights often change dynamically or adaptively, and thus the weighted SVM solutions must be repeatedly computed. In this paper, we develop an algorithm that can efficiently and exactly update the weighted SVM solutions for arbitrary change of instance weights. Technically, this contribution can be regarded as an extension of the conventional solution-path algorithm for a single regularization parameter to multiple instance-weight parameters. However, this extension gives rise to a significant problem that breakpoints (at which the solution path turns) have to be identified in high-dimensional space. To facilitate this, we introduce a parametric representation of instance weights which allows us to find the breakpoints in high-dimensional space easily. Despite its simplicity, our parametrization covers various important machine learning tasks and it widens the applicability of the solution-path algorithm. Through extensive experiments on various practical applications, we demonstrate the usefulness of the proposed algorithm.
TL;DR: This paper proposes a machine learning based approach for the real-time selection of computational resources (algorithms) based on both the high level objectives of the robot as well as on the low level environmental requirements (image quality, etc.).
Abstract: In robotics, it is a common problem that for a given task many algorithms are available. For a particular environmental context and some computational constraints some algorithms will perform better and others will perform worse. Consequently, a robot evolving in a real world environment, where both the context and the constraints change in real time, should be able to select in real time the algorithms that provide it with the most accurate world description and allow it to extract the currently most vital information and artifacts. In this paper we propose a machine learning based approach for the real-time selection of computational resources (algorithms) based on both the high level objectives of the robot as well as on the low level environmental requirements (image quality, etc.). The learning mechanism described uses a Genetic Algorithm and the learning method is based on supervised learning; an initial set of algorithms with input data is provided as examples that are used for learning.
TL;DR: The impact factors of in-degree and out-degree are introduced into community detection, and the directed weighted degree is used to measure the importance of the node; the detection results of the proposed algorithm follow the trend of standard entropy better than those of the classical algorithm.
Abstract: In this paper, the impact factors of in-degree and out-degree are introduced into community detection, and the directed weighted degree is used to measure the importance of the node. Based on the core nodes, a community detecting algorithm for directed and weighted networks is proposed. Then the community detection on the blog site of Sciencenet is conducted with standard structure entropy as a measure. Experimental results demonstrate that in directed and weighted networks, the proposed algorithm is efficient with shorter execution time. By comparing with the classical algorithm, the detecting results of our algorithm meet the trend of standard entropy better. It means the algorithm proposed is improved to some extent.
TL;DR: Experimental results show that the proposed k-medoid algorithm may reduce the number of distance calculations by a factor of more than a thousand when compared to existing algorithms while producing clusters of comparable quality.
Abstract: Scalable data mining algorithms have become crucial to efficiently support KDD processes on large datasets. The k-medoid algorithm is one of the partitioning algorithms used for clustering. We show that the basic k-medoid algorithm is very time consuming for large datasets. Instead we present an advanced algorithm which performs much better than the known algorithm. In addition to presenting detailed experimental results for the advanced k-medoid algorithm, we also conduct an experimental study with real life data sets to demonstrate the effectiveness of our technique. We address the task of scaling up the k-medoids based algorithm through the use of a memoization technique. Experimental results based on several datasets, including synthetic and real data, show that the proposed algorithm may reduce the number of distance calculations by a factor of more than a thousand when compared to existing algorithms while producing clusters of comparable quality.
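The memoization idea mentioned in this abstract can be sketched as a cache around the pairwise distance function, so that repeated medoid evaluations reuse earlier results. The abstract does not describe the paper's actual scheme; the class `CachedDistance`, the helper `assignment_cost`, and the use of Euclidean distance are all assumptions made for illustration.

```python
import math


class CachedDistance:
    """Pairwise distance with memoization; counts real computations."""

    def __init__(self, points):
        self.points = points
        self.cache = {}
        self.computed = 0  # number of distances actually calculated

    def __call__(self, i, j):
        if i == j:
            return 0.0
        # Distance is symmetric, so store each unordered pair once.
        key = (i, j) if i < j else (j, i)
        if key not in self.cache:
            self.computed += 1
            self.cache[key] = math.dist(self.points[i], self.points[j])
        return self.cache[key]


def assignment_cost(dist, medoids, n):
    """Total distance from each of n points to its nearest medoid."""
    return sum(min(dist(i, m) for m in medoids) for i in range(n))


points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
dist = CachedDistance(points)
first = assignment_cost(dist, [0], len(points))
second = assignment_cost(dist, [0], len(points))  # served from the cache
```

Evaluating the same candidate medoid set a second time triggers no new distance calculations, which is the effect a k-medoid search benefits from when it repeatedly re-scores overlapping medoid sets.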
TL;DR: A new algorithm based on incremental learning is introduced that composes new knowledge from new training data with previous knowledge by combining classifiers based on weighted majority voting and outperforms other related incremental algorithms and non-incremental algorithms.
Abstract: A large volume of e-mail is spam. Several algorithms based on batch learning have been presented for spam detection. In this paper, a new algorithm based on incremental learning is introduced. The algorithm combines new knowledge from new training data with previous knowledge by combining classifiers based on weighted majority voting. The experimental results show that the proposed algorithm outperforms other related incremental algorithms and non-incremental algorithms.
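Weighted majority voting, the combination rule named in this abstract, can be sketched in a few lines. How the paper assigns the classifier weights is not stated in the abstract, so the weights below are made-up illustrative values and the function name is hypothetical.

```python
from collections import defaultdict


def weighted_majority_vote(votes, weights):
    """Return the label with the largest total weight.

    votes:   one predicted label per classifier.
    weights: one non-negative weight per classifier (same order).
    """
    totals = defaultdict(float)
    for label, weight in zip(votes, weights):
        totals[label] += weight
    # On a tie, the label that was seen first wins.
    return max(totals, key=totals.get)


# Three classifiers: one heavily weighted vote can outweigh two light ones.
decision = weighted_majority_vote(["spam", "ham", "ham"], [0.6, 0.25, 0.25])
```

In an incremental setting, a classifier trained on a new batch of data would simply be appended to the list with its own weight, leaving earlier classifiers untouched.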
TL;DR: A new approach, HAP-Growth (Hub-Averaging Pattern-Growth), is proposed for WARM without pre-assigned weights; for large datasets, there is a drastic reduction in the computational time of the proposed algorithm and at the same time the drift effect is reduced to a great extent.
Abstract: The concept of finding frequent itemsets without pre-assigned weights is of great importance in Association Rule Mining (ARM). The prime advantage of this approach is that weights can be derived from the dataset itself rather than being given by a domain expert. The modification of the Apriori algorithm for Weighted Association Rule Mining (WARM) without pre-assigned weights using the HITS algorithm has been attempted in the past. However, the drift effect is a major limitation of the HITS algorithm. In this paper, a new approach, HAP-Growth (Hub-Averaging Pattern-Growth), is proposed for WARM without pre-assigned weights. The HAP-Growth algorithm generates frequent itemsets using Hub-Averaging in conjunction with a pattern tree approach. The performance of the proposed algorithm has been compared with the HITS algorithm in conjunction with a pattern tree approach and with the existing algorithm. Experimental studies have been carried out on a large number of synthetic datasets of varying sizes (generated using the IBM Synthetic Data Generator) and real life datasets (taken from the UCI Machine Learning Repository and other sources). It is observed that for large datasets, there is a drastic reduction in the computational time of the proposed algorithm and at the same time the drift effect is reduced to a great extent.
TL;DR: This paper broadly analyzes and categorizes algorithms to identify their limitations and scope, and concludes that the Greedy algorithm is comparatively better than the other algorithms with regard to the optimal solution.
Abstract: An algorithm solves complex problems more efficiently and consistently. The traditional ways of solving problems have been replaced by several new algorithms. The selection of an appropriate algorithm for any given task is an important issue because different algorithms are based on different concepts. One problem can be solved in more than one way; in this regard, many alternative algorithms have been developed with computational proficiency. This review presents the evaluation and utilization of different algorithms such as the Simple Recursive Algorithm, Backtracking Algorithm, Divide and Conquer Algorithm, Dynamic Algorithm, Branch and Bound Algorithm, Brute Force Algorithm and Randomized Algorithm. This paper broadly analyzes and categorizes the algorithms to identify their limitations and scope. This review emphasizes the effects and use of different algorithms in different image processing applications. The Minimum Spanning Tree (MST), the most functional algorithm, is defined exclusively on an undirected graph in which all nodes are connected. A Greedy algorithm is a simple solution algorithm that chooses a locally optimal solution at each step in order to achieve a global optimum. We considered the drawbacks and advantages of the various algorithms and concluded that the Greedy algorithm is comparatively better than the other algorithms with regard to the optimal solution.
TL;DR: A new algorithm named REX-1C is derived from the REX-1 algorithm, which uses entropy, in order to test the effects of cross-entropy on the learning phenomenon (by using accuracy and rule number), and it was observed that the REX-1C algorithm produced better results compared to the Rules-3 Plus, Rules-6, REX-1 and C5.0 algorithms with respect to accuracy.
Abstract: This study suggests a new method for selecting attributes in algorithms used for generating rules for data mining. The most common measure used for attribute selection is entropy. Entropy is defined as a measure of uncertainty. Accordingly, the higher the uncertainty in a system, the higher its entropy. Entropy is usually used to measure the uncertainty of attributes in data mining algorithms such as C4.5, CN2 and CART, whereas cross-entropy is not used frequently. Therefore a new algorithm named REX-1C is derived from the REX-1 algorithm, which uses entropy, in order to test the effects of cross-entropy on the learning phenomenon (by using accuracy and rule number). Twenty data sets of different specifications and sizes, which are commonly used in the machine learning field and sampled from real life, were chosen to test the success of the algorithm. Using those data sets, the effects of norms on the accuracy of the algorithm and the number of rules it produces were calculated, and the results were compared to the Rules-3 Plus, Rules-6, REX-1 and C5.0 algorithms. According to the results achieved, it was observed that the REX-1C algorithm produced better results compared to the Rules-3 Plus, Rules-6, REX-1 and C5.0 algorithms with respect to accuracy.
TL;DR: This research uses the COD (Classifier Output Difference) distance metric for measuring how similar or different learning algorithms are, and constructs a distance matrix from the individual COD values, and uses the matrix to show the spectrum of differences among families of learning algorithms.
Abstract: Many learning algorithms have been developed to solve various problems. Machine learning practitioners must use their knowledge of the merits of the algorithms they know to decide which to use for each task. This process often raises questions such as: (1) If performance is poor after trying certain algorithms, which should be tried next? (2) Are some learning algorithms the same in terms of actual task classification? (3) Which algorithms are most different from each other? (4) How different? (5) Which algorithms should be tried for a particular problem? This research uses the COD (Classifier Output Difference) distance metric for measuring how similar or different learning algorithms are. The COD quantifies the difference in output behavior between pairs of learning algorithms. We construct a distance matrix from the individual COD values, and use the matrix to show the spectrum of differences among families of learning algorithms. Results show that individual algorithms tend to cluster along family and functional lines. Our focus, however, is on the structure of relationships among algorithm families in the space of algorithms, rather than on individual algorithms. A number of visualizations illustrate these results. The uniform numerical representation of COD data lends itself to human visualization techniques.
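A distance matrix of the kind described in this abstract can be sketched as follows. The abstract does not give the COD formula, so this sketch assumes COD is the fraction of instances on which two classifiers produce different outputs; the function names and the toy predictions are hypothetical.

```python
def classifier_output_difference(preds_a, preds_b):
    """Assumed COD: fraction of instances where two classifiers disagree."""
    if len(preds_a) != len(preds_b) or not preds_a:
        raise ValueError("prediction lists must be non-empty and equal length")
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)


def cod_matrix(predictions):
    """Build a symmetric distance matrix from {name: predictions}."""
    names = sorted(predictions)
    matrix = [
        [classifier_output_difference(predictions[r], predictions[c]) for c in names]
        for r in names
    ]
    return names, matrix


# Toy predictions of three hypothetical learners on four instances.
predictions = {
    "tree": [0, 1, 1, 0],
    "knn": [0, 1, 0, 0],
    "bayes": [1, 0, 0, 1],
}
names, matrix = cod_matrix(predictions)
```

Note that this measure ignores the true labels entirely: two classifiers with the same accuracy can still be far apart if they err on different instances, which is what makes it usable for comparing algorithm families.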
TL;DR: This paper presents some improvements to the K-Means algorithm, including how to choose the initial center points and how to calculate the means, which improve the algorithm to a certain extent.
Abstract: The K-Means algorithm is one of the most widely used foundational algorithms in data mining; it is based on a greedy clustering approach. This paper introduces and analyzes this algorithm, then proves its correctness, and then shows its productivity. Finally, this paper presents some improvements to the K-Means algorithm, including how to choose the initial center points and how to calculate the means. These improve the algorithm to a certain extent.
TL;DR: The paper gives an ILP model to solve the MDBCS problem, as well as a genetic algorithm that calculates a good enough solution for input graphs with a larger number of nodes.
Abstract: Degree-limited graph problems consider the weights of the vertices or of the edges, with the aim of finding the optimal weighted subgraph subject to certain restrictions on the degrees of its vertices. This class of combinatorial problems has been extensively studied because of its applications in network design, connection of networks, and routing algorithms. It is likely that a solution to the MDBCS problem will find application in these areas. The paper gives an ILP model to solve the MDBCS problem, as well as a genetic algorithm that calculates a good enough solution for input graphs with a larger number of nodes. An important feature of heuristic algorithms is that they can find approximate, but still good enough, solutions to problems of exponential complexity. However, heuristic algorithms may not lead to a satisfactory solution, and for some problems they give relatively poor results. This is particularly true of problems for which no exact algorithm of polynomial complexity exists. Also, heuristic algorithms are not all the same, because some of their parts differ depending on the situation and the problem in which they are used. These parts are usually the objective function (transformation), and their definition significantly affects the efficiency of the algorithm. By their mode of action, genetic algorithms belong to the methods of directed random search of the solution space that look for a global optimum.
TL;DR: To address the defects of Bayesian network structure learning algorithms based on the conditional independence test, the paper proposes an improved algorithm that adds the mutual information between nodes x and y, and experiments show that it is effective and feasible.
Abstract: To address the defects of Bayesian network structure learning algorithms based on the conditional independence test, the paper proposes an improved algorithm that adds the mutual information between nodes x and y. The algorithm adequately takes into account the three existing graphical structures in the theory of d-separation. The algorithm can reduce the triangulated cliques and the probability of cyclic routes in a directed graph. The network structure produced by the algorithm is closer to the solution. The experimental results show that the algorithm is effective and feasible.
TL;DR: A reduction algorithm combining SVM with the KNN algorithm is presented, and the experimental results show that the algorithm can reduce the size of the training dataset and the number of support vectors while keeping the classification accuracy of the original training dataset.
Abstract: Support vector machines are a new field of machine learning. Generalization accuracy and response time are two important criteria for support vector machines used in practical applications. It is desirable to minimize the size of the training dataset and the number of support vectors, and to simplify the implementation of the algorithm, while keeping the classification accuracy. Based on these considerations, a reduction algorithm combining SVM with the KNN algorithm is presented. The experimental results show that the algorithm can reduce the size of the training dataset and the number of support vectors while keeping the classification accuracy of the original training dataset.
TL;DR: The proposed modification, CN2-R, substitutes the star concept of the original algorithm with a technique of randomly generated complexes in order to substantially improve on running times without significant loss in accuracy.
Abstract: Among the rule induction algorithms, the classic CN2 is still one of the most popular; the large number of enhancements and improvements to it bears witness to this. Despite the growth in computing capacity since the algorithm was proposed, one of its main issues is resource demand. The proposed modification, CN2-R, substitutes the star concept of the original algorithm with a technique of randomly generated complexes in order to substantially improve on running times without significant loss in accuracy.