TL;DR: A new penalized K-means algorithm is proposed, and results show that it outperforms the K-means algorithm in some respects.
Abstract: The K-means algorithm is a popular method in cluster analysis. After reviewing different K-means algorithms, we propose a new penalized K-means algorithm. Inspired by the maximum likelihood (ML) method, we identify the prior probability distribution that the classic K-means algorithm implicitly assumes about the data set being clustered, and from it derive a new objective function for the penalized K-means algorithm. Minimizing this function with a genetic algorithm, we show that the method outperforms the K-means algorithm in some respects.
TL;DR: A new algorithm is presented which provides the simple greedy method with a recent path heuristic, and, though some other methods have a better theoretical performance, it ranks among the best algorithms.
Abstract: We present a systematic study of approximation algorithms for the maximum weight matching problem. This includes a new algorithm which provides the simple greedy method with a recent path heuristic. Surprisingly, this quite simple algorithm performs very well, both in terms of running time and solution quality, and, though some other methods have a better theoretical performance, it ranks among the best algorithms.
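For orientation, the "simple greedy method" that the abstract builds on can be sketched as follows. This is only the classic greedy baseline (scan edges by decreasing weight and keep an edge when both endpoints are still free), which is known to give at least half the optimal weight. The path heuristic from the abstract is not specified there and is not shown; the function name and edge format are illustrative assumptions.

```python
def greedy_matching(edges):
    """Classic greedy matching. edges: iterable of (u, v, weight) tuples."""
    matched = set()    # vertices already covered by a chosen edge
    matching = []
    total = 0
    # Consider the heaviest edges first.
    for u, v, w in sorted(edges, key=lambda e: e[2], reverse=True):
        if u != v and u not in matched and v not in matched:
            matching.append((u, v, w))
            matched.update((u, v))
            total += w
    return matching, total


# Path a-b-c-d: greedy takes the middle edge (weight 4), although the two
# outer edges together weigh 6.
edges = [("a", "b", 3), ("b", "c", 4), ("c", "d", 3)]
matching, total = greedy_matching(edges)
```

The example shows why heuristics that revisit short paths around a chosen edge can improve on plain greedy: swapping the middle edge for the two outer ones raises the weight from 4 to 6.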
TL;DR: This paper develops a fully dynamic distributed algorithm for maintaining sparse spanners that improves drastically the quiescence time and improves significantly upon the state-of-the-art algorithm in all efficiency parameters.
Abstract: Currently, there are no known explicit algorithms for the great majority of graph problems in the dynamic distributed message-passing model. Instead, most state-of-the-art dynamic distributed algorithms are constructed by composing a static algorithm for the problem at hand with a simulation technique that converts static algorithms to dynamic ones. We argue that this powerful methodology does not provide satisfactory solutions for many important dynamic distributed problems, and this necessitates developing algorithms for these problems from scratch. In this paper we develop a fully dynamic distributed algorithm for maintaining sparse spanners. Our algorithm improves drastically the quiescence time of the state-of-the-art algorithm for the problem. Moreover, we show that the quiescence time of our algorithm is optimal up to a small constant factor. In addition, our algorithm improves significantly upon the state-of-the-art algorithm in all efficiency parameters, specifically, it has smaller quiescence message and space complexities, and smaller local processing time. Finally, our algorithm is self-contained and fairly simple, and is, consequently, amenable to implementation on unsophisticated network devices.
TL;DR: A new algorithm for proving the validity or invalidity of a pre/postcondition pair for a program by iteratively randomly selecting a program point and updating the current abstract state representation to make it more locally consistent.
Abstract: In this paper, we propose a new algorithm for proving the validity or invalidity of a pre/postcondition pair for a program. The algorithm is motivated by the success of the algorithms for probabilistic inference developed in the machine learning community for reasoning in graphical models. The validity or invalidity proof consists of providing an invariant at each program point that can be locally verified. The algorithm works by iteratively randomly selecting a program point and updating the current abstract state representation to make it more locally consistent (with respect to the abstractions at the neighboring points). We show that this simple algorithm has some interesting aspects: (a) It brings together the complementary powers of forward and backward analyses; (b) The algorithm has the ability to recover itself from excessive under-approximation or over-approximation that it may make. (Because the algorithm does not distinguish between the forward and backward information, the information could get both under-approximated and over-approximated at any step.) (c) The randomness in the algorithm ensures that the correct choice of updates is eventually made as there is no single deterministic strategy that would provably work for any interesting class of programs. In our experiments we use this algorithm to produce the proof of correctness of a small (but non-trivial) example. In addition, we empirically illustrate several important properties of the algorithm.
TL;DR: An incremental learning algorithm designed to learn in challenging non-stationary environments, where the underlying data distribution that governs the classification problem changes at an unknown rate, is described and shown to be able to track the changing environment.
Abstract: We describe an incremental learning algorithm designed to learn in challenging non-stationary environments, where the underlying data distribution that governs the classification problem changes at an unknown rate. The algorithm is based on a multiple classifier system that generates a new classifier every time a new dataset becomes available from the changing environment. We consider the particularly challenging form of this problem, where we assume that the previously generated data points are no longer available, even if some of those points may still be relevant in the new environment. The algorithm employs a strategic weighting mechanism to determine the error of each classifier on the current data distribution, and then combines the classifiers using a dynamically weighted majority voting. We describe the implementation details of the algorithm, and track its performance as a function of the environment's rate of change. We show that the algorithm is able to track the changing environment, even when the environment changes drastically over a short period of time.
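The combination step described above can be sketched roughly as follows. The abstract does not give the weighting formula, so this sketch assumes log-odds weights computed from each classifier's error on the current data (a common choice in ensemble methods), and clips the weight of any classifier that is worse than chance to zero. All names are illustrative.

```python
import math
from collections import defaultdict


def voting_weights(errors, eps=1e-6):
    """Map each classifier's error on the current data to a voting weight.

    Assumption: weight = log((1 - e) / e), clipped at zero so that a
    classifier doing worse than chance on the current data gets no vote.
    """
    weights = []
    for e in errors:
        e = min(max(e, eps), 1 - eps)   # avoid log(0) and division by zero
        weights.append(max(0.0, math.log((1 - e) / e)))
    return weights


def weighted_majority_vote(predictions, weights):
    """Return the label with the largest total weight."""
    totals = defaultdict(float)
    for label, w in zip(predictions, weights):
        totals[label] += w
    return max(totals, key=totals.get)


# Three classifiers; the first is much more accurate on the current data,
# so its vote outweighs the other two combined.
weights = voting_weights([0.10, 0.40, 0.45])
decision = weighted_majority_vote(["A", "B", "B"], weights)
```

Because the weights are recomputed on each new dataset, an old classifier regains influence if the environment drifts back toward the distribution it was trained on.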
TL;DR: The K-centers clustering algorithm is proposed to handle mixed type data, focusing on effects of attribute values with different frequencies on clustering accuracy, and a new update method for centroids is proposed in this paper.
Abstract: The K-modes and K-prototypes algorithms both apply a frequency-based update method for centroids, which keeps only the attribute values with the highest frequency and neglects the other attribute values, and this affects the accuracy of clustering results. To solve this problem, the K-centers clustering algorithm is proposed to handle mixed-type data. As extensions of the K-prototypes algorithm, hard and fuzzy K-centers algorithms are presented that focus on the effects of attribute values with different frequencies on clustering accuracy, and a new update method for centroids is proposed. Experiments on many UCI machine-learning databases show that the K-centers algorithm can cluster categorical and mixed-type data more efficiently and effectively than the K-modes and K-prototypes algorithms.
TL;DR: An algorithm that enumerates all strings that produce a score higher than a given score threshold when aligned against a weighted pattern and then searches for all these strings using a standard exact multipattern algorithm is presented.
Abstract: We consider the matching of weighted patterns against an unweighted text. We adapt the shift-add algorithm for this problem. We also present an algorithm that enumerates all strings that produce a score higher than a given score threshold when aligned against a weighted pattern and then searches for all these strings using a standard exact multipattern algorithm. We show that both of these approaches are faster than previous algorithms on patterns of moderate length and high significance levels while the good performance of the shift-add algorithm continues with lower significance levels.
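The enumeration approach can be sketched as below, under some assumptions: the weighted pattern is represented as one symbol-to-score table per position, the score of a string is the sum of its per-position scores, and a plain set lookup over text windows stands in for the standard exact multipattern algorithm (such as Aho-Corasick) that the abstract refers to. Function names and the data layout are illustrative.

```python
def enumerate_matches(pattern, threshold):
    """List all strings whose score against the weighted pattern exceeds threshold.

    pattern: list of dicts, one per position, mapping symbol -> score.
    """
    m = len(pattern)
    # suffix_best[i] = best score still achievable from position i onward.
    suffix_best = [0] * (m + 1)
    for i in range(m - 1, -1, -1):
        suffix_best[i] = suffix_best[i + 1] + max(pattern[i].values())

    found = []

    def extend(prefix, score, i):
        # Prune: even the best completion cannot exceed the threshold.
        if score + suffix_best[i] <= threshold:
            return
        if i == m:
            found.append(prefix)
            return
        for symbol, s in pattern[i].items():
            extend(prefix + symbol, score + s, i + 1)

    extend("", 0, 0)
    return found


def search(text, strings, m):
    """Stand-in for a multipattern matcher: report start positions of any string."""
    wanted = set(strings)
    return [i for i in range(len(text) - m + 1) if text[i:i + m] in wanted]


pattern = [{"A": 2, "C": 0}, {"A": 0, "C": 1}]
strings = enumerate_matches(pattern, threshold=2)
hits = search("AACAC", strings, m=len(pattern))
```

The number of enumerated strings grows quickly as the threshold drops, which is consistent with the abstract's remark that this approach suits high significance levels.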
TL;DR: Experimental results show that the GBAN algorithm performs better than the TAN algorithm and achieves better accuracy when the relationships between the attributes of a data set are relatively complicated.
Abstract: The paper addresses the problem of classification. GBAN, a learning algorithm for a restricted BAN classifier based on a genetic algorithm, is proposed. The genetic algorithm is used to learn the network structure, which substantially reduces the computational complexity. The network structure of the TAN classifier is extended, while the structural complexity of the BAN classifier is restricted, to obtain a restricted BAN classifier. To learn the structure of this kind of classifier, a fitness function based on the log-likelihood, the corresponding genetic operators, and an encoding scheme for the network structure are designed. As a result, the algorithm can converge to the globally optimal structure. Experimental results show that the GBAN algorithm performs better than the TAN algorithm and achieves better accuracy when the relationships between the attributes of a data set are relatively complicated.
TL;DR: A novel Bayesian network learning algorithm, MRMRG (Max Relevance-Min Redundancy Greedy), is proposed, which has much better efficiency and accuracy than most existing algorithms on limited datasets.
Abstract: Existing algorithms for learning Bayesian networks require a lot of computation on high-dimensional itemsets, which affects accuracy, especially on limited datasets, and takes up a large amount of time. To address this problem, we propose a novel Bayesian network learning algorithm, MRMRG (Max Relevance-Min Redundancy Greedy). The MRMRG algorithm is a variant of K2, a well-known BN learning algorithm. We also analyze the time complexity of MRMRG. The experimental results show that the MRMRG algorithm has much better efficiency and accuracy than most existing algorithms on limited datasets.
TL;DR: An improved BP neural network algorithm is proposed that starts from the training algorithm and dynamically adjusts the weights according to the trend of the error to speed up convergence; its validity is verified experimentally.
Abstract: The BP algorithm is currently the most widely applied neural network learning algorithm, but the original algorithm converges slowly, its training process easily falls into local minima, and the number of hidden-layer nodes is difficult to choose. Many improvements have been proposed to address these problems. This paper proposes an improved BP neural network algorithm that builds on the BP algorithm and, starting from the training algorithm, dynamically adjusts the weights according to the trend of the error in order to speed up the convergence of the network; mathematical reasoning verifies the validity of the algorithm theoretically. The improved algorithm is simulated in MATLAB and compared with other methods. The results indicate that the improved algorithm performs very well in terms of convergence speed and noise control, which verifies its validity experimentally.
TL;DR: A novel algorithm, CarpeDiem, is presented that significantly improves on the time complexity of the Viterbi algorithm while preserving the optimality of the result.
Abstract: In this paper we present a novel algorithm, CarpeDiem. It significantly improves on the time complexity of the Viterbi algorithm while preserving the optimality of the result. This has consequences for Machine Learning systems that use the Viterbi algorithm during learning or classification. We show how the algorithm applies to the Supervised Sequential Learning task and, in particular, to the HMPerceptron algorithm. We describe CarpeDiem in full detail and provide experimental results that support the proposed approach.
TL;DR: A new Bayesian network learning algorithm, MRMRG (Max Relevance-Min Redundancy Greedy), is proposed, which has much better efficiency and better accuracy than most existing learning algorithms for limited sample datasets.
Abstract: Existing algorithms for learning Bayesian networks require a lot of computation on high-dimensional itemsets, which affects the reliability, robustness and accuracy of these algorithms and takes up a large amount of time. To address this problem, we propose a new Bayesian network learning algorithm, MRMRG (Max Relevance-Min Redundancy Greedy). The MRMRG algorithm is a variant of K2, a well-known BN learning algorithm. We also analyze the time complexity of MRMRG. The experimental results show that the MRMRG algorithm has much better efficiency. It is also shown that the MRMRG algorithm has better accuracy than most existing learning algorithms for limited sample datasets.
TL;DR: WRS achieves better results than classical rough sets in class imbalance learning, and the evaluation of extracted rules has greater influence than the selection of attributes on weighted rough set learning.
Abstract: The class imbalance problem has recently been said to hinder the performance of learning systems. Most traditional learning algorithms are designed with the assumption of well-balanced datasets; they are biased towards the majority class and thus may predict the minority class examples poorly. In this paper, we develop weighted rough sets (WRS) to deal with this problem. In weighted rough sets, weighted entropy is introduced and extended to compute the information content introduced by attributes. A forward greedy weighted attribute reduction algorithm based on the weighted entropy and a weighted rule extraction algorithm are provided. The factors of weighted strength, weighted certainty and weighted cover are employed to evaluate the extracted rules. Finally, a decision algorithm based on the weighted strength factor is constructed. Based on weighted rough sets, a series of experiments on class imbalance learning are conducted on 20 UCI data sets. In terms of AUC and minority class accuracy, WRS achieves better results than classical rough sets in class imbalance learning. Moreover, the evaluation of extracted rules has greater influence than the selection of attributes on weighted rough set learning.
TL;DR: An improved BP neural network training algorithm based on a genetic algorithm is proposed, and its convergence speed and precision are better than those of the standard BP algorithm and the BP algorithm with momentum.
Abstract: In order to overcome the disadvantage that the structure and parameters of neural networks are decided stochastically or by experience, an improved BP neural network training algorithm based on a genetic algorithm is proposed in this paper. The algorithm combines the global property of the genetic algorithm with the parallelism of the neural network. First, the genetic algorithm is used to evolve and design the structure, the initial weights and thresholds, the training ratio, and the momentum factor of the neural network, locating a better search region within the solution space. Then, training samples are used by the evolved neural network to search for the optimal solution again. The effectiveness of the algorithm is demonstrated by solving the XOR problem: its convergence speed and precision are better than those of the standard BP algorithm and the BP algorithm with momentum.
TL;DR: A new learning algorithm with boosting is proposed that attempts to boost correctly learned data by repeatedly learning incorrectly learned data; numerical simulations are performed to show its effectiveness.
Abstract: Many learning algorithms based on the steepest descent method have been proposed for fuzzy reasoning models. However, even a learning algorithm regarded as superior does not always work well. This paper proposes a new learning algorithm with boosting. Boosting is a general method which attempts to boost the accuracy of any given learning algorithm. The proposed method consists of three sub-learners. The first sub-learner is constructed by performing the conventional learning algorithm with data randomly selected from the given data space. The second sub-learner is constructed by performing the conventional learning algorithm with data selected with equal probability from the data learned correctly and incorrectly by the first sub-learner. The third sub-learner is constructed with the data on which either the first or the second sub-learner learned incorrectly. The output for any input is given by majority decision among the outputs of the three sub-learners. That is, the method attempts to boost correctly learned data by learning incorrectly learned data repeatedly. In order to show the effectiveness of the proposed algorithm, numerical simulations are performed.
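The three-stage construction can be sketched as follows, as a reading of the abstract rather than the authors' implementation. Several things here are assumptions: `fit` is a hypothetical base learner (the paper uses a steepest-descent learner for fuzzy reasoning models, which is replaced here by a trivial placeholder), "equal probability" is approximated by drawing equal numbers of correct and incorrect examples, and empty training sets fall back to the full data.

```python
import random
from collections import Counter


def majority(a, b, c):
    """Majority decision among three outputs (returns b if all three differ)."""
    return a if a == b or a == c else b


def train_three(data, fit, rng):
    """data: list of (x, y); fit: maps a list of samples to a predictor."""
    # First sub-learner: trained on randomly selected data.
    h1 = fit(rng.sample(data, max(1, len(data) // 2)))
    right = [d for d in data if h1(d[0]) == d[1]]
    wrong = [d for d in data if h1(d[0]) != d[1]]
    # Second sub-learner: equal numbers of correctly and incorrectly learned data.
    n = min(len(right), len(wrong))
    second = rng.sample(right, n) + rng.sample(wrong, n)
    h2 = fit(second or data)
    # Third sub-learner: data on which the first or the second is incorrect.
    third = [d for d in data if h1(d[0]) != d[1] or h2(d[0]) != d[1]]
    h3 = fit(third or data)
    return lambda x: majority(h1(x), h2(x), h3(x))


def fit_majority_class(samples):
    """Placeholder base learner: always predicts the most common label."""
    label = Counter(y for _, y in samples).most_common(1)[0][0]
    return lambda x: label


data = [(x, 1) for x in range(6)]
predict = train_three(data, fit_majority_class, random.Random(0))
```

Any learner with the same calling convention can be passed as `fit`; the placeholder only keeps the sketch self-contained.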
TL;DR: The concept of knowledge states is introduced; many well-known algorithms can be viewed as knowledge state algorithms, and the knowledge state approach can be used to construct competitive randomized online algorithms and study the tradeoff between competitiveness and memory.
Abstract: We introduce the concept of knowledge states; many well-known algorithms can be viewed as knowledge state algorithms. The knowledge state approach can be used to construct competitive randomized online algorithms and study the tradeoff between competitiveness and memory. A knowledge state simply states conditional obligations of an adversary, by fixing a work function, and gives a distribution for the algorithm. When a knowledge state algorithm receives a request, it calculates one or more "subsequent" knowledge states, together with a probability of transition to each. The algorithm then uses randomization to select one of those subsequents to be the new knowledge state. We apply the method to the paging problem. We present optimally competitive algorithms for paging for the cases where the cache sizes are k=2 and k=3. These algorithms use only a very limited number of bookmarks.
TL;DR: The new SQKF algorithm is intended to solve two problems in multi-robot Q-learning, credit assignment and behavior conflicts, and empirical results show that the algorithm performs better than the conventional single-agent Q-learning algorithm or the Team Q-learning algorithm in the multi-robot domain.
TL;DR: A weighted association rule algorithm based on the FP-tree is put forward; using Web log files and taking the frequency with which users visit each Web page as its weight, the algorithm is applied to personalized recommendation.
Abstract: Conventional association rule mining does not consider the importance of each item, so in practice its results lack pertinence. Based on the weighted support of the New-Apriori algorithm and combining ideas from the FP-growth algorithm, a weighted association rule algorithm based on the FP-tree is put forward, and the general process of association-rule-based personalized recommendation is given. Using Web log files, with the frequency with which users visit each Web page as its weight, the algorithm is applied to personalized recommendation. The experimental results show that the algorithm has high accuracy and efficiency.
TL;DR: A new fuzzy clustering algorithm is applied to a prediction tool of a third generation (3G) cellular radio network and it is shown that the differences observed between simulations and measurements can be considerably diminished and the generalization capacity is enhanced.
Abstract: We have used measurements taken on a real network to enhance the performance of our radio network planning tool. A distribution learning technique is adopted to accomplish this challenging task. To ensure better generalization capabilities of the learning algorithm, a preprocessing of data is required and involves the use of a clustering algorithm that divides the whole learning space into subspaces. In this paper we apply a new fuzzy clustering algorithm to a prediction tool of a third generation (3G) cellular radio network. Results show that the differences observed between simulations and measurements can be considerably diminished and the generalization capacity is enhanced thanks to the proposed clustering algorithm. This algorithm performs better than the classical c-means algorithm. We can then predict with enhanced accuracy new configurations for which we do not have measurements, as long as they are not very different from the learned configurations.
TL;DR: The purpose of this research is to improve the robustness of reinforcement learning algorithms theoretically: a generalized average operator is used in place of the usual optimal operator max to study an important class of learning algorithms, dynamic programming algorithms, and their convergence is discussed from a theoretical point of view.
Abstract: A new algorithm is proposed that potentially sacrifices the optimality of control policies to obtain robust solutions. Robustness can become a very important property for a learning system when the theoretical model does not match the practical physical system, when the practical system is not static, or when the availability of a control action changes over time. The main contribution is a set of approximation algorithms and their convergence results. A generalized average operator, in place of the usual optimal operator max (or min), is applied to study an important class of learning algorithms, dynamic programming algorithms, and their convergence is discussed from a theoretical point of view. The purpose of this research is to improve the robustness of reinforcement learning algorithms theoretically.
TL;DR: An efficient BN learning algorithm is presented that combines the EMI method with a scoring function based on mutual information theory and is much more efficient than two EM-based algorithms, SEM and EM-EA.
Abstract: At present, most algorithms for learning Bayesian Networks (BNs) use the EM algorithm to deal with incomplete data. They are inefficient because the EM algorithm has to perform an iterative process of probabilistic reasoning to complete the incomplete data. In this paper we present an efficient BN learning algorithm that combines the EMI method with a scoring function based on mutual information theory. The algorithm first uses the EMI method to estimate, from incomplete data, probability distributions over local structures of BNs, then evaluates BN structures with the scoring function and searches for the best one. The detailed procedure of the algorithm is described in the paper. The experimental results on the Asia and Alarm networks show that, while achieving high accuracy, the algorithm is much more efficient than two EM-based algorithms, SEM and EM-EA.
TL;DR: A learning algorithm that combines linear least-squares with gradient descent is presented, and its performance is illustrated by applying it to several examples on well-known data sets, in which it is compared with other learning algorithms.
Abstract: Ever since the first gradient-based algorithm, the brilliant backpropagation proposed by Rumelhart, a variety of new training algorithms have emerged to improve different aspects of the learning process for feed-forward neural networks. One of these aspects is the learning speed. In this paper, we present a learning algorithm that combines linear least-squares with gradient descent. The theoretical basis for the method is given, and its performance is illustrated by applying it to several examples on well-known data sets, in which it is compared with other learning algorithms. Results show that the proposed algorithm improves the learning speed of the basic backpropagation algorithm by several orders of magnitude while maintaining good optimization accuracy. Its performance and low computational cost make it an interesting alternative even to second-order methods, especially when dealing with large networks and training sets.
TL;DR: In this paper, the adaptive/cooperative analysis (ACA) framework is used to analyze on-line algorithms for the paging and list update problems, and it is shown that the ability of the adaptive algorithm to ignore pathological worst cases can lead to algorithms that are more efficient in practice.
Abstract: On-line algorithms are usually analyzed using competitive analysis, in which the performance of an on-line algorithm on a sequence is normalized by the performance of the optimal off-line algorithm on that sequence. In this paper we introduce adaptive/cooperative analysis as an alternative general framework for the analysis of on-line algorithms. This model gives promising results when applied to two well-known on-line problems, paging and list update. The idea is to normalize the performance of an on-line algorithm by a measure other than the performance of the optimal off-line algorithm OPT. We show that in many instances the performance of OPT on a sequence is a coarse approximation of the difficulty or complexity of a given input. Using a finer, more natural measure we can separate paging and list update algorithms which were otherwise indistinguishable under the classical model. This creates a performance hierarchy of algorithms which better reflects the intuitive relative strengths between them. Lastly, we show that, surprisingly, certain randomized algorithms which are superior to MTF in the classical model are not so in the adaptive case. This confirms that the ability of the on-line adaptive algorithm to ignore pathological worst cases can lead to algorithms that are more efficient in practice.
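As background for the list update results, the Move-To-Front (MTF) rule mentioned above can be sketched as follows. The sketch assumes the standard cost model in which accessing the item at position i (counting from 1) costs i and moving it to the front is free. The adaptive/cooperative measure itself is not defined in the abstract and is not shown; the function name is illustrative.

```python
def mtf_cost(initial, requests):
    """Serve requests with Move-To-Front; return (total cost, final list)."""
    lst = list(initial)
    cost = 0
    for item in requests:
        i = lst.index(item)           # 0-based position of the requested item
        cost += i + 1                 # access cost is the 1-based position
        lst.insert(0, lst.pop(i))     # move the accessed item to the front
    return cost, lst


# c is at position 3 (cost 3), then at the front (cost 1); b is then last (cost 3).
cost, final = mtf_cost(["a", "b", "c"], ["c", "c", "b"])
```

Sequences with high locality of reference, such as the repeated request to c here, are where MTF does well, and a measure finer than the cost of OPT can make that advantage visible.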
TL;DR: It is shown that learning updating in the form of matrix transformation permits finite-scale learning and, at the same time, maintains the orthonormal property of the separation matrix, in contrast to the case with gradient-based algorithms.
Abstract: This paper presents a new type of algorithm for solving independent component analysis (ICA) problems. Instead of being based on additive updating, which is used in conventional algorithms, this new algorithm is based on an effective updating scheme in which learning updating acts as a series of orthonormal matrix transformations (i.e., power iteration). The criterion for the independence between outputs is based on diagonality of a non-linearized covariance matrix, which is defined by ICA outputs and their non-linear mappings, and the Bussgang property. One attractive feature of the algorithm is that it does not include any predetermined parameters, such as a learning step size, as do gradient-based algorithms, which is especially expected for ICA applications with unknown types of sources (but with the condition that at most one source is Gaussian distributed). Another feature is that the convergence rate is faster, even for very short observations. If the same ICA criteria are applied to the proposed and gradient-based algorithms, the relationship between these algorithms is quite similar to the relationship between the least-mean-square (LMS) algorithm and the recursive least-square (RLS) algorithm in the batch mode for supervised adaptive filtering. In this paper, we also analyze the algorithm mathematically to determine why and how the algorithm works. We show that learning updating in the form of matrix transformation permits finite-scale learning, and at the same time, maintains the orthonormal property of the separation matrix. This is essentially different from the case with gradient-based algorithms, which permit only a small-scale learning that is controlled by the learning step size. We also analyze the relationship between the new algorithm and other well-known algorithms, such as the Bussgang algorithm, the non-linear principal component analysis (PCA), and the FastICA.
TL;DR: Theoretical analysis and numerical simulation show that the TS-PageRank algorithm can avoid the topic-drift problem and effectively improve the quality of web search without adding any extra text information or increasing the time and space complexity.
Abstract: The PageRank algorithm is a key algorithm used in the well-known search engine Google, but it suffers from the problem of topic drift, which causes the list of web pages returned by the algorithm to contain too many pages unrelated to the user's search topic. After analysing the PageRank algorithm and its modified versions, we build a similarity model based on virtual file vectors and cosine similarity, and put forward a TS-PageRank algorithm framework. Using different similarity models in the framework yields many different TS-PageRank algorithms, which together form a family of TS-PageRank algorithms. Theoretical analysis and numerical simulation show that the TS-PageRank algorithm can avoid the topic-drift problem and effectively improve the quality of web search without adding any extra text information or increasing the time and space complexity.
TL;DR: A general framework for active algorithm selection is presented by extending the idea of the Hedge algorithm, incorporating the correlation information among unlabeled examples to accurately estimate the change in the weighted loss function, and using Maximum Entropy Discrimination to automatically determine the combination weights used by the Hedge algorithm.
Abstract: Most previous studies on active learning focused on the problem of model selection, i.e., how to identify the optimal classification model from a family of predefined models using a small, carefully selected training set. In this paper, we address the problem of active algorithm selection. The goal of this problem is to efficiently identify the optimal learning algorithm for a given dataset from a set of algorithms using a small training set. In this study, we present a general framework for active algorithm selection by extending the idea of the Hedge algorithm. It employs the worst case analysis to identify the example that can effectively increase the weighted loss function defined in the Hedge algorithm. We further extend the framework by incorporating the correlation information among unlabeled examples to accurately estimate the change in the weighted loss function, and Maximum Entropy Discrimination to automatically determine the combination weights used by the Hedge algorithm. Our empirical study with the datasets of WCCI 2006 performance prediction challenge shows promising performance of the proposed framework for active algorithm selection.
TL;DR: This paper presents a weighted fuzzy reasoning algorithm based on the weighted fuzzy Petri net, a knowledge representation that can describe fuzzy production rules; in practical application it shows advantages over the results obtained by existing algorithms.
Abstract: This paper presents a weighted fuzzy reasoning algorithm. The algorithm is based on the weighted fuzzy Petri net, a knowledge representation that can describe fuzzy production rules. The algorithm is suitable for a class of rule-based systems; that is, it can deal with the weighted fuzzy Petri net model derived from the original system. Given the weighted fuzzy Petri net model, one can use the algorithm to calculate the fuzzy tokens of the designated places, which correspond to the truth values of the relevant propositions. In practical application, the algorithm shows advantages over the results obtained by existing algorithms.
TL;DR: The generalisation of meta-learning to other domains including forecasting, optimisation, bioinformatics, etc will be explored, helping in the design of better algorithms, as well as automated algorithm selection methods.
Abstract: Summary form only given: The goal of meta-learning is to model the relationships between the performance of various learning algorithms and the characteristics of problems being learned. In this sense, we are focused on learning about learning. Under what conditions can we expect a certain algorithm to perform well? The field of meta-learning has been very well developed in the machine learning community over the last 15 years or so, where the focus has been on the study of supervised learning methods such as support vector machines and neural networks, and their performance on classification problems. But the goal of seeking a greater understanding of the relationship between problem characteristics and algorithm performance is not limited to machine learning or classification problems. In this talk we will explore the generalisation of meta-learning to other domains including forecasting, optimisation, bioinformatics, etc. The common factor in these diverse fields is the availability of a large number of algorithms for solving the problems, the availability of large benchmark datasets, and the existence of suitable metrics to characterise the properties of the datasets. In each case, great insights into the conditions under which various algorithms perform best can be derived using a meta-learning framework, helping in the design of better algorithms, as well as automated algorithm selection methods.
TL;DR: This work is using machine learning to construct a failure-susceptibility ranking of feeders that supply electricity to the boroughs of New York City, and is able to adapt to a changing environment by periodically building and adding new machine learning models (or "experts") based on the latest data.
Abstract: We are using machine learning to construct a failure-susceptibility ranking of feeders that supply electricity to the boroughs of New York City. The electricity system is inherently dynamic and driven by environmental conditions and other unpredictable factors, and thus the ability to cope with concept drift in real time is central to our solution. Our approach builds on the ensemble-based notion of learning from expert advice as formulated in the continuous version of the Weighted Majority algorithm (16). Our method is able to adapt to a changing environment by periodically building and adding new machine learning models (or "experts") based on the latest data, and letting the online learning framework choose what experts to use as predictors based on recent performance. Our system is currently deployed and being tested by New York City's electricity distribution company.
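The expert-advice mechanism that the approach builds on can be sketched as below. This shows one common form of the continuous Weighted Majority update, in which each expert's weight is multiplied by beta raised to its absolute loss; the value of beta, the function names, and the use of a plain weighted average as the prediction are assumptions, and the abstract's mechanism for adding and selecting experts is not shown.

```python
def predict(weights, advice):
    """Weighted average of the experts' predictions (each in [0, 1])."""
    return sum(w * a for w, a in zip(weights, advice)) / sum(weights)


def update(weights, advice, outcome, beta=0.5):
    """Shrink each expert's weight by beta ** |prediction - outcome|."""
    return [w * beta ** abs(a - outcome) for w, a in zip(weights, advice)]


# Two experts start with equal weight; the second one is wrong and is halved.
weights = [1.0, 1.0]
advice = [1.0, 0.0]
first = predict(weights, advice)
weights = update(weights, advice, outcome=1.0)
second = predict(weights, advice)
```

A newly built expert can be appended to `weights` with an initial weight of its own, so that recent performance then decides how much it contributes.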
TL;DR: A novel algorithm for finding the k-nearest patterns, called the k-tuned approximating and eliminating search algorithm (kTAESA), is proposed and used to implement kNN classifiers, which are applied to three databases from the UCI machine learning benchmark repository.
Abstract: The computational cost associated with the k-nearest neighbor classifier depends on the number of available patterns, which makes this method impractical in many real-time applications. This motivates the study of fast algorithms for finding the k-nearest patterns, such as the kLAESA algorithm. In this paper we propose a novel algorithm for finding the k-nearest patterns, called the k-tuned approximating and eliminating search algorithm (kTAESA). The algorithm is used to implement kNN classifiers, which are applied to three databases from the UCI machine learning benchmark repository. Results are compared with those achieved by exhaustive search and by the kAESA and kLAESA algorithms, in terms of the number of distances to evaluate, the number of simple operations (sums, comparisons and products) needed to classify each pattern, and the amount of memory required. Results demonstrate that the proposal performs best, mainly when the number of operations is considered.