TL;DR: In this paper, the authors studied on-line learning in the linear regression framework and proposed adaptive tunings for generalized linear regression and Weighted Majority over a finite set of experts.
TL;DR: It is shown that appropriate techniques can empower users to create models that compete with classifiers built by state-of-the-art learning algorithms, and that small expert-defined models offer the additional advantage that they will generally be more intelligible than those generated by automatic techniques.
Abstract: According to standard procedure, building a classifier using machine learning is a fully automated process that follows the preparation of training data by a domain expert. In contrast, interactive machine learning engages users in actually generating the classifier themselves. This offers a natural way of integrating background knowledge into the modelling stage—as long as interactive tools can be designed that support efficient and effective communication. This paper shows that appropriate techniques can empower users to create models that compete with classifiers built by state-of-the-art learning algorithms. It demonstrates that users—even users who are not domain experts—can often construct good classifiers, without any help from a learning algorithm, using a simple two-dimensional visual interface. Experiments on real data demonstrate that, not surprisingly, success hinges on the domain: if a few attributes can support good predictions, users generate accurate classifiers, whereas domains with many high-order attribute interactions favour standard machine learning techniques. We also present an artificial example where domain knowledge allows an “expert user” to create a much more accurate model than automatic learning algorithms. These results indicate that our system has the potential to produce highly accurate classifiers in the hands of a domain expert who has a strong interest in the domain and therefore some insights into how to partition the data. Moreover, small expert-defined models offer the additional advantage that they will generally be more intelligible than those generated by automatic techniques.
TL;DR: In this paper, the authors define the edit distance of two distributions of strings given by two weighted automata and present a synchronization algorithm for weighted transducers which, combined with ε-removal, can be used to normalize weighted transducers with bounded delays.
Abstract: The edit-distance of two strings is the minimal cost of a sequence of symbol insertions, deletions, or substitutions transforming one string into the other. The definition is used in various contexts to give a measure of the difference or similarity between two strings. This definition can be extended to measure the similarity between two sets of strings. In particular, when these sets are represented by automata, their edit-distance can be computed using the general algorithm of composition of weighted transducers combined with a single-source shortest-paths algorithm. More generally, in some applications such as speech recognition and computational biology, the strings may represent a range of alternative hypotheses with associated probabilities. Thus, we introduce the definition of the edit-distance of two distributions of strings given by two weighted automata. We show that general weighted automata algorithms over the appropriate semirings can be used to compute the edit-distance of two weighted automata exactly. The algorithm for exactly computing the edit-distance of weighted automata can be used to improve the word accuracy of automatic speech recognition systems. More generally, the algorithm can be extended to provide an edit-distance automaton useful for rescoring and other post-processing purposes in the context of large-vocabulary speech recognition. In the course of the presentation of our algorithm, we also introduce a new and general synchronization algorithm for weighted transducers which, combined with ε-removal, can be used to normalize weighted transducers with bounded delays.
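The string-level definition in the first sentence can be illustrated with the classic dynamic program. This is a minimal sketch, assuming per-operation costs that do not depend on the symbols involved (unit costs by default); it covers only two plain strings, not the weighted-automata algorithm the paper proposes, and the function name `edit_distance` is illustrative.

```python
def edit_distance(s: str, t: str, ins: float = 1.0, dele: float = 1.0, sub: float = 1.0) -> float:
    """Minimal total cost of insertions, deletions and substitutions turning s into t."""
    # prev[j] holds the distance between the current prefix of s and t[:j]
    prev = [j * ins for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        cur = [i * dele] + [0.0] * len(t)
        for j in range(1, len(t) + 1):
            cost = 0.0 if s[i - 1] == t[j - 1] else sub
            cur[j] = min(prev[j] + dele,      # delete s[i-1]
                         cur[j - 1] + ins,    # insert t[j-1]
                         prev[j - 1] + cost)  # match or substitute
        prev = cur
    return prev[len(t)]


print(edit_distance("kitten", "sitting"))  # 3.0
```

Only two rows of the table are kept, so memory grows with the length of `t` rather than with the product of both lengths.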
TL;DR: This algorithm is applied to detecting computer-user mouse pressure patterns during episodes likely to be frustrating to the user and achieves on average 10.6% user-independent test error rate.
Abstract: In this paper, we propose a new context-sensitive Bayesian learning algorithm. By modeling the distributions of data locations by a mixture of Gaussians, the new algorithm can utilize different classifier complexities for different contexts/locations and, at the same time, keep the optimality of Bayesian solutions. This algorithm is also an online learning algorithm that is efficient in training and makes it easy to incorporate new knowledge from data sets that become available in the future. We apply this algorithm to detecting computer-user mouse pressure patterns during episodes likely to be frustrating to the user. By modeling user identity as hidden context, this algorithm achieves an average user-independent test error rate of 10.6%.
TL;DR: A text classification algorithm is developed that classifies financial news articles using a combination of a reduced but highly informative word feature set and a variant of the weighted majority algorithm; it achieves better accuracy than naive Bayes classification boosted by EM.
Abstract: In the application domain of stock portfolio management, software agents that evaluate the risks associated with the individual companies in a portfolio should be able to read electronic news articles that are written to give investors an indication of a company's financial outlook. There is a positive correlation between news reports on a company's financial outlook and the company's attractiveness as an investment. However, because of the volume of such reports, it is impossible for financial analysts or investors to track and read each one. Therefore, it would be very helpful to have a system that automatically classifies news reports as reflecting positively or negatively on a company's financial outlook. To accomplish this task, we treat the analysis of news articles as a text classification problem. We developed a text classification algorithm that classifies financial news articles using a combination of a reduced but highly informative word feature set and a variant of the weighted majority algorithm. By clustering words represented in a latent semantic vector space by LSA into groups with similar concepts, we are able to find semantically coherent word groups. A learning method for unlabeled data, Self-Confident sampling, was proposed to handle the problem of expensive data labeling. Vote entropy serves as the information-theoretic criterion for assigning a label to an unlabeled document. In comparison with naive Bayes classification boosted by Expectation Maximization (EM), the proposed method showed better performance in terms of accuracy. Two criteria are used to evaluate the methods: (1) how well they improve their performance with unlabeled data after being initially trained on a small number of human-labeled articles, and (2) how well they classify the latest financial news articles, most of which were not seen during training.
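The abstract names vote entropy as the labeling criterion but does not give the formula. A minimal sketch, assuming the standard committee-based definition VE = -Σ_c (V(c)/k) log(V(c)/k), where V(c) is the number of the k committee members voting for label c; the `confident_label` rule and its `threshold` value are hypothetical and are not the paper's Self-Confident sampling procedure.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Vote entropy of a committee's labels for one document (0 means full agreement)."""
    k = len(votes)
    counts = Counter(votes)
    return -sum((c / k) * math.log(c / k) for c in counts.values())


def confident_label(votes, threshold=0.5):
    """Hypothetical rule: return the majority label only if the committee largely agrees."""
    if vote_entropy(votes) <= threshold:
        return Counter(votes).most_common(1)[0][0]
    return None  # too uncertain; leave the document unlabeled
```

With natural logarithms, a two-way split between two labels gives log 2 ≈ 0.69, the maximum for two labels, so a threshold below that value rejects evenly split committees.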
TL;DR: The Al-Alaoui algorithm, a weighted mean-square error approach to pattern recognition, is extended to multilayer neural networks, which may be used as nonlinear classifiers; the extension also speeds up the convergence of the back-propagation algorithm.
Abstract: The Al-Alaoui algorithm is a weighted mean-square error (MSE) approach to pattern recognition. It employs cloning of the erroneously classified samples to increase the population of their corresponding classes. The algorithm was originally developed for linear classifiers. In this paper, the algorithm is extended to multilayer neural networks which may be used as nonlinear classifiers. It is also shown that the application of the Al-Alaoui algorithm to multilayer neural networks speeds up the convergence of the back-propagation algorithm.
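The cloning idea can be sketched for the original linear, two-class case. This is a minimal illustration under stated assumptions, not the published algorithm: labels are ±1, the MSE solution is computed by least squares, and each pass adds one more copy of every misclassified sample. Cloning a sample k times is equivalent to weighting its squared error by k, which is why the code reweights rows instead of physically duplicating them; the function name and `max_iter` are hypothetical.

```python
import numpy as np


def al_alaoui_linear(X, y, max_iter=20):
    """Sketch: least-squares linear classifier with cloning of misclassified samples.

    X: (n, d) features; y: (n,) labels in {-1, +1}.
    Returns weights w for the augmented input [x, 1].
    """
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    counts = np.ones(len(y))                        # number of copies of each sample
    for _ in range(max_iter):
        # Weighted MSE: scaling rows by sqrt(count) is the same as duplicating them.
        sw = np.sqrt(counts)
        w, *_ = np.linalg.lstsq(Xa * sw[:, None], y * sw, rcond=None)
        wrong = np.sign(Xa @ w) != y
        if not wrong.any():
            break
        counts[wrong] += 1                          # clone each misclassified sample once more
    return w
```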
TL;DR: This article reviews training algorithms for the support vector machine (SVM), the newest branch of statistical learning theory; these fall into three categories: decomposition algorithms, represented by SVMlight; sequential algorithms; and online training algorithms.
Abstract: This article reviews training algorithms for the support vector machine (SVM), the newest branch of statistical learning theory. These algorithms fall into three categories: decomposition algorithms, represented by SVMlight; sequential algorithms; and online training algorithms. The advantages and disadvantages of all three kinds of algorithms are analysed, and other algorithms, including multi-class algorithms, are also introduced. Finally, future directions and applications of SVMs in pattern recognition, data mining, and other areas are discussed.
TL;DR: A novel approach to incremental support vector machine (SVM) learning is presented, in which useless samples are discarded and knowledge is accumulated; the algorithm is more effective than a traditional SVM while classification precision is still guaranteed.
Abstract: This paper presents a novel approach to incremental support vector machine (SVM) learning. It analyses how the support vector set may change after new samples are added to the training set. Based on this analysis, a novel algorithm is presented. In this algorithm, useless samples are discarded and knowledge is accumulated. Experimental results show that this algorithm is more effective than a traditional SVM while classification precision is also guaranteed.
TL;DR: This study examines, through Monte-Carlo simulations, the relative efficiency of a local search algorithm compared with 8 stochastic global algorithms, and shows that, even ignoring the computational requirements of the global algorithms, there is little evidence to support the use of the global algorithms examined for training neural networks.
Abstract: Training a neural network is a difficult optimization problem because of the nonconvex objective function. Therefore, as an alternative to local search algorithms, many global search algorithms have been used to train neural networks. However, local search algorithms are more efficient with computational resources, and therefore numerous random restarts with a local algorithm may be more effective than a global algorithm at obtaining a low value of the objective function. This study examines, through Monte-Carlo simulations, the relative efficiency of a local search algorithm to 8 stochastic global algorithms: 2 simulated annealing algorithms, 1 simple random stochastic algorithm, 1 genetic algorithm and 4 evolutionary strategy algorithms. The results show that even ignoring the computational requirements of the global algorithms, there is little evidence to support the use of the global algorithms examined for training neural networks.
TL;DR: This paper proposes a novel learning algorithm for constructing data classifiers with radial basis function (RBF) networks that works by constructing one RBF network to approximate the probability density function of each class of objects in the training data set.
Abstract: This paper proposes a novel learning algorithm for constructing data classifiers with radial basis function (RBF) networks. The RBF networks constructed with the proposed learning algorithm are generally able to deliver the same level of classification accuracy as support vector machines (SVMs). One important advantage of the proposed learning algorithm, in comparison with support vector machines, is that it normally takes far less time to determine optimal parameter values with cross-validation. A comparison with the SVM is of interest because a number of recent studies have shown that the SVM is generally able to deliver a higher level of accuracy than other existing data classification algorithms. The proposed learning algorithm works by constructing one RBF network to approximate the probability density function of each class of objects in the training data set. The main distinction of the proposed learning algorithm is how it exploits local distributions of the training samples in determining the optimal parameter values of the basis functions. As the proposed learning algorithm is instance-based, the data reduction issue is also addressed in this paper. One interesting observation is that, for all three data sets used in data reduction experiments, the number of training samples remaining after a naive data reduction mechanism is applied is quite close to the number of support vectors identified by the SVM software.
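The idea of one density-approximating network per class can be sketched with a plain Gaussian kernel density estimate per class and a Bayes-style decision. This is a simplified illustration, assuming one kernel centred on every training sample and a single shared width `sigma`; the paper's distinctive step, choosing the basis-function parameters from local sample distributions, is not reproduced, and all names are hypothetical.

```python
import numpy as np


def fit_class_densities(X, y):
    """Group training samples by class; each sample becomes one RBF centre."""
    return {c: X[y == c] for c in np.unique(y)}


def predict_by_class_density(centres_by_class, Xq, sigma=1.0):
    """Assign each query to the class whose kernel density estimate times prior is largest."""
    n_total = sum(len(C) for C in centres_by_class.values())
    classes = list(centres_by_class)
    scores = []
    for c in classes:
        C = centres_by_class[c]
        d2 = ((Xq[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)  # squared distances to centres
        density = np.exp(-d2 / (2 * sigma ** 2)).mean(axis=1)    # unnormalised Gaussian KDE
        scores.append(density * len(C) / n_total)                 # weight by class prior
    return np.array(classes)[np.argmax(np.stack(scores, axis=1), axis=1)]
```

Because every class uses the same `sigma` and dimensionality, the Gaussian normalising constant is identical across classes and can be dropped from the comparison.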
TL;DR: This study revealed that Differential Evolution (DE) algorithms are a class of evolutionary algorithms that do not share several theoretical and practical limitations that other Genetic Algorithms have, and are significantly more efficient than other genetic algorithms, when applied to multi-modal optimal control problems.
Abstract: If optimal control problems are solved by means of gradient-based local search methods, convergence to local solutions is likely. Recently, there has been increasing interest in the use of global optimisation algorithms to solve optimal control problems, which are expected to have local solutions. Evolutionary Algorithms (EAs) are global optimisation algorithms that have mainly been applied to static optimisation problems. Only rarely have Evolutionary Algorithms been used to solve optimal control problems. This may be due to the belief that their computational efficiency is insufficient for this type of problem. In addition, the application of Evolutionary Algorithms is a relatively young area of research. As demonstrated in this thesis, Evolutionary Algorithms exist that have significant advantages over other global optimisation methods for optimal control, while their efficiency is comparable. The purpose of this study was to investigate and search for efficient evolutionary algorithms to solve optimal control problems that are expected to have local solutions. These optimal control problems are called multi-modal. An important additional requirement for the practical application of these algorithms is that they should preferably not require any algorithm parameter tuning. Therefore, algorithms with fewer algorithm parameters should be preferred. In addition, guidelines for the choice of algorithm parameter values, and the possible development of automatic algorithm parameter adjustment strategies, are important issues. This study revealed that Differential Evolution (DE) algorithms are a class of evolutionary algorithms that do not share several theoretical and practical limitations that other Genetic Algorithms have. As a result, they are significantly more efficient than other Genetic Algorithms, such as Breeder Genetic Algorithms (BGA), when applied to multi-modal optimal control problems.
Their efficiency is comparable to that of Iterative Dynamic Programming (IDP), a global optimisation approach specifically designed for optimal control. Moreover, the DE algorithms turned out to be significantly less sensitive to problems concerning the selection or tuning of algorithm parameters and the initialisation of the algorithm. Although it is not a DE algorithm, the GENOCOP algorithm is considered to be one of the most efficient genetic algorithms with real-valued individuals and specialized evolutionary operators. This algorithm was the starting point of our research. In Chapter 2, it was applied to some optimal control problems from chemical engineering. These problems were high-dimensional, non-linear, multivariable, multi-modal, and non-differentiable. Basically, GENOCOP obtained the same solutions as Iterative Dynamic Programming. Moreover, GENOCOP was more successful in locating the global solution than local optimisation algorithms. GENOCOP's efficiency, however, is rather poor, and its algorithm parameter tuning is rather complicated. This motivated us to seek more efficient evolutionary algorithms. Mathematical arguments found in the literature state that DE algorithms outperform other Evolutionary Algorithms in terms of computational efficiency. Therefore, in Chapter 3, DE algorithms, generally used to solve continuous parameter optimisation problems, were used to solve two multi-modal (benchmark) optimal control problems. Some Breeder Genetic Algorithms (BGA) were also applied to these problems. The results obtained with these algorithms were compared with one another and with the results obtained with IDP. The comparison confirmed that DE algorithms stand out in terms of efficiency compared with the Breeder Genetic Algorithms.
Moreover, in contrast to the majority of Evolutionary Algorithms, which have many algorithm parameters that need to be selected or tuned, DE has only three algorithm parameters that have to be selected or tuned: the population size (µ), the crossover constant (CR), and the differential variation amplification (F). The population size plays a crucial role in solving multi-modal optimal control problems. Selecting a smaller population size enhances the computational efficiency but reduces the probability of finding the global solution. During our investigations, we tried to find the best trade-off. One of the most efficient DE algorithms is DE/best/2/bin. All the investigated DE algorithms solved the two benchmark multi-modal optimal control problems properly and efficiently. The computational efficiency achieved by the DE algorithms in solving the first, low multi-modal problem was comparable to that of IDP. When applied to the second, highly multi-modal problem, the computational efficiency of DE was slightly inferior to that of IDP after tuning of the algorithm parameters. However, the selection or tuning of the algorithm parameters for IDP is more difficult and more involved. Our investigation yielded the following guidelines for the selection of the DE algorithm parameters. Take a population size no larger than twice the number of variables to be optimised that result from the control parameterisation of the original optimal control problem (µ ≤ 2n_u). Highly multi-modal optimal control problems require a large value of the differential variation amplification (F ≥ 0.9) and a very small or zero value for the crossover constant (0 ≤ CR ≤ 0.2). Low multi-modal optimal control problems need a medium value for the differential variation amplification (0.4 ≤ F ≤ 0.6) and a large or medium value for the crossover constant (0.2 ≤ CR ≤ 0.5).
In contrast to IDP, finding near-optimal values for the algorithm parameters is very simple for DE algorithms. Generally, the DE algorithm parameters are kept constant during the optimization process. A more effective and efficient algorithm may be obtained if they are adjusted on-line. In Chapter 4, a strategy was proposed that adjusts the differential variation amplification (F) and the crossover constant (CR) on-line using a measure of the diversity of the individuals in the population. Roughly, the proposed strategy takes large values for F and small values for CR at the beginning of the optimization in order to promote a global search. When the population approaches the solution, F is decreased in order to promote a local search, and the crossover parameter CR is enlarged to increase the speed of convergence. When this strategy was implemented on the DE algorithm DE/rand/1/bin and applied to the two benchmark multi-modal optimal control problems, both the computational efficiency and the probability of locating the global solution improved significantly. To judge the opportunities and advantages of using Evolutionary Algorithms to solve problems related to optimal control, Chapter 5 considers several engineering applications concerning optimal greenhouse cultivation control. In Chapter 5.1, genetic algorithms with binary individuals (Simple Genetic Algorithm) and with floating-point individuals (GENOCOP) are used to estimate some of the parameters of a two-state dynamic model of a lettuce crop, the so-called NICOLET model. This model is intended to predict dry weight and nitrate content of lettuce at harvest time. Parameter estimation problems usually suffer from local minima. This study showed that Evolutionary Algorithms are suitable for calibrating the parameters of a dynamic model. However, the required computation time is significant.
Partly this is due to the high computational load of a single objective function evaluation, which for parameter optimisation problems involves a system simulation. Even though parameter optimisation is very often performed off-line, which perhaps makes computation time less important, more efficient evolutionary algorithms such as DE are to be preferred. In Chapter 5.2, an optimal control problem of nitrate concentration in a lettuce crop was solved by means of two different algorithms: the ACW (Adjustable Control-variation Weight) gradient algorithm, which searches for local solutions, and the DE algorithm DE/best/2/bin, which searches for a global solution. The dynamic system is a modified two-state dynamic model of a lettuce crop (NICOLET B3), and the control problem has a fixed final time as well as control and terminal state constraints. The DE algorithm was extended to deal with these constraints. The results showed that this problem probably does not have local solutions and that the control parameterisation required by the DE algorithm causes some difficulties in accurately approximating the continuous solution obtained by the ACW algorithm. On the other hand, the computational efficiency of the evolutionary algorithm turned out to be impressive. An almost natural conclusion, therefore, is to combine a DE algorithm with a gradient algorithm. In Chapter 5.3, the combination of a DE algorithm and a first-order gradient algorithm is used to solve an optimal control problem. The DE algorithm is used to approach the global solution sufficiently closely, after which the gradient algorithm can converge to it efficiently. This approach was successfully tried on the optimal control of nitrate in lettuce, which, unfortunately, seems to have no local solutions in this case. Still, the approach was clearly shown to be feasible, which matters for all optimal control problems for which it is unknown a priori whether they have local solutions.
Finally, Chapter 6 ends the thesis with an overall discussion, conclusions, and suggestions for future research.
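The DE/rand/1/bin scheme mentioned for Chapter 4 can be sketched as below, with µ, CR, and F appearing as `mu`, `CR`, and `F`. This is a minimal sketch under stated assumptions: F and CR are held constant (the diversity-based on-line adjustment is not shown), box constraints are enforced by clipping, and the objective is a generic function of a parameter vector rather than a simulated optimal control problem; the function name is hypothetical.

```python
import numpy as np


def de_rand_1_bin(f, bounds, mu=20, F=0.9, CR=0.1, generations=200, seed=0):
    """Minimise f over a box using DE/rand/1/bin (requires mu >= 4).

    bounds: sequence of (lower, upper) limits, one pair per variable.
    mu: population size, F: differential variation amplification, CR: crossover constant.
    """
    rng = np.random.default_rng(seed)
    bounds = np.asarray(bounds, dtype=float)
    lo, hi = bounds[:, 0], bounds[:, 1]
    n = len(bounds)
    pop = lo + rng.random((mu, n)) * (hi - lo)
    cost = np.array([f(x) for x in pop])
    for _ in range(generations):
        for i in range(mu):
            # three distinct random individuals, all different from the target i
            r1, r2, r3 = rng.choice([j for j in range(mu) if j != i], size=3, replace=False)
            mutant = pop[r1] + F * (pop[r2] - pop[r3])
            # binomial crossover; one forced index guarantees at least one mutant gene
            mask = rng.random(n) < CR
            mask[rng.integers(n)] = True
            trial = np.clip(np.where(mask, mutant, pop[i]), lo, hi)
            c = f(trial)
            if c <= cost[i]:  # greedy one-to-one selection, applied in place
                pop[i], cost[i] = trial, c
    best = int(np.argmin(cost))
    return pop[best], cost[best]
```

The defaults F = 0.9 and CR = 0.1 follow the guideline given above for highly multi-modal problems. Replacing the base vector `pop[r1]` with the current best individual and adding a second scaled difference of two further random individuals would turn this into the DE/best/2/bin variant.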
TL;DR: A method is described for improving the accuracy of rules generated by an inductive machine learning algorithm: it generates an ensemble of classifiers using the CLIP4 algorithm and combines them using a voting scheme.
Abstract: Machine learning, one of the data mining and knowledge discovery tools, addresses the automated extraction of knowledge from data, expressed in the form of production rules. The paper describes a method for improving the accuracy of rules generated by an inductive machine learning algorithm by generating an ensemble of classifiers. It generates multiple classifiers using the CLIP4 algorithm and combines them using a voting scheme. The set of different classifiers is generated by injecting controlled randomness into the learning algorithm, without modifying the training data set. Our method is based on the characteristic properties of the CLIP4 algorithm. The case study of the SPECT heart image analysis system is used as an example where improving accuracy is very important. Benchmarking results on other well-known machine learning datasets, and a comparison with an algorithm that uses boosting to improve its accuracy, are also presented. Unlike boosting, the proposed method always improves accuracy compared with a single classifier generated by the CLIP4 algorithm. The obtained results are comparable with those of other state-of-the-art machine learning algorithms.
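The abstract does not specify the voting scheme, so the sketch below assumes simple plurality voting over class labels, with ties going to the label encountered first. How CLIP4 injects controlled randomness to produce diverse classifiers is not modeled, and both function names are hypothetical.

```python
from collections import Counter


def vote(predictions):
    """Plurality vote over the labels that the ensemble members give to one example.

    Counter.most_common lists equal counts in first-encountered order,
    so ties go to the label that appears first.
    """
    return Counter(predictions).most_common(1)[0][0]


def ensemble_predict(classifier_outputs):
    """classifier_outputs[k][i] is classifier k's label for example i."""
    return [vote(column) for column in zip(*classifier_outputs)]
```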
TL;DR: This paper presents a new independent-MDL-based approach to learning Bayesian network structures that limits the search space using a set of lower-order independence tests and then executes the MDL-based search algorithm B&B-MDL to obtain the final graph.
Abstract: Bayesian network structure learning based on model selection is an NP-hard problem, and none of the existing algorithms fully solves the problem of search efficiency, especially when learning complex systems. This paper presents a new independent-MDL-based approach to learning Bayesian network structures. The proposed algorithm limits the search space using a set of lower-order independence tests and then executes the MDL-based search algorithm B&B-MDL to obtain the final graph. A precision analysis of the algorithm is presented, and the problem of parameter design is also addressed. Experimental results show that the new algorithm, I-B&B-MDL, is more time-efficient than the B&B-MDL algorithm.
TL;DR: An improved algorithm for mining weighted association rules is provided, based on the mining algorithms MINWAL(O) and MINWAL(W), that can effectively take into account the importance of Boolean attributes and the number of attributes in the rule.
Abstract: Algorithms for mining weighted association rules are discussed in this paper. For Boolean attributes, an improved algorithm for mining weighted association rules is provided, based on the mining algorithms MINWAL(O) and MINWAL(W). This algorithm can effectively take into account the importance of Boolean attributes and the number of attributes in the rule. Quantitative attributes are divided into several fuzzy sets by the competitive agglomeration algorithm, and an algorithm for mining weighted fuzzy association rules is then provided. This algorithm can effectively take into account the importance of quantitative attributes and the number of attributes in the rule, and it is suitable for large databases.
TL;DR: A new evolutionary algorithm is described for the single machine total weighted tardiness problem; it works in three stages (a cluster-forming stage and two local search stages) and improves the accuracy of the approximate solutions with a local search procedure while periodically generating new solutions.
Abstract: In this paper, a new evolutionary algorithm is described for the single machine total weighted tardiness problem. The operation of this method can be divided into three stages: a cluster-forming stage and two local search stages. In the first stage, it approaches some locally optimal solutions by grouping based on similarity. In the second stage, it improves the accuracy of the approximate solutions with a local search procedure while periodically generating new solutions. In the third stage, the algorithm continues applying the local search procedure. We tested our algorithm on all the benchmark problems of ORLIB. Within an acceptable time limit, the algorithm found the best-known solution for the problems, or found solutions within 1% of the best-known solutions, in 99% of the tasks.
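For reference, the objective being minimised is the total weighted tardiness Σ_j w_j max(0, C_j − d_j), where C_j is the completion time of job j in the chosen sequence, p_j its processing time, w_j its weight, and d_j its due date. The sketch below computes it and adds an adjacent-swap descent as a stand-in local search; the abstract does not describe its local search procedure, so `adjacent_swap_descent` is a hypothetical example, not the authors' method.

```python
def total_weighted_tardiness(seq, p, w, d):
    """Objective of the single machine problem for the job order seq.

    p, w, d: processing times, weights and due dates, indexed by job.
    """
    t, total = 0, 0
    for j in seq:
        t += p[j]                          # completion time C_j
        total += w[j] * max(0, t - d[j])   # weighted tardiness of job j
    return total


def adjacent_swap_descent(seq, p, w, d):
    """Hypothetical local search: swap neighbouring jobs while that lowers the objective."""
    seq = list(seq)
    best = total_weighted_tardiness(seq, p, w, d)
    improved = True
    while improved:
        improved = False
        for k in range(len(seq) - 1):
            seq[k], seq[k + 1] = seq[k + 1], seq[k]
            cost = total_weighted_tardiness(seq, p, w, d)
            if cost < best:
                best, improved = cost, True
            else:
                seq[k], seq[k + 1] = seq[k + 1], seq[k]  # undo the swap
    return seq, best
```

For three unit-weight jobs with p = [3, 2, 1] and d = [3, 2, 1], the order [0, 1, 2] has total weighted tardiness 8, and the descent reaches [2, 1, 0] with value 4.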
TL;DR: Extensive simulations demonstrate that the CNN-ART algorithm outperforms other algorithms such as LBG, SOFM, and DCL.
Abstract: In this paper, a novel unsupervised competitive learning algorithm, called the centroid neural network adaptive resonance theory (CNN-ART) algorithm, is proposed to reduce the dependence on the initial codewords of the codebook, in contrast to conventional vector quantization algorithms for lossy image compression. The design of the CNN-ART algorithm is mainly based on the adaptive resonance theory (ART) structure, and a gradient-descent-based learning rule is derived so that the CNN-ART algorithm does not require a predetermined learning-rate schedule. The initial weights obtained from the CNN-ART algorithm can be used as the initial codebook of the Linde-Buzo-Gray (LBG) algorithm, which can greatly improve compression performance. In this paper, extensive simulations demonstrate that the CNN-ART algorithm outperforms other algorithms such as LBG, SOFM, and DCL.
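The abstract feeds the CNN-ART weights to LBG as its initial codebook. The sketch below shows only the LBG (generalized Lloyd) refinement step, assuming squared-error distortion; the initial codebook is simply passed in, CNN-ART itself is not reproduced, and the function names are hypothetical.

```python
import numpy as np


def _assign(data, codebook):
    """Nearest-codeword partition and its mean squared distortion."""
    d2 = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    return nearest, d2[np.arange(len(data)), nearest].mean()


def lbg(data, codebook, iters=50, tol=1e-9):
    """Refine an initial codebook with generalized Lloyd (LBG) iterations.

    data: (n, d) training vectors; codebook: (k, d) initial codewords.
    Returns the refined codebook and its mean squared distortion.
    """
    codebook = np.array(codebook, dtype=float)
    nearest, distortion = _assign(data, codebook)
    for _ in range(iters):
        for k in range(len(codebook)):
            members = data[nearest == k]
            if len(members):                 # empty cells keep their old codeword
                codebook[k] = members.mean(axis=0)
        nearest, new_distortion = _assign(data, codebook)
        converged = distortion - new_distortion <= tol * max(new_distortion, 1e-12)
        distortion = new_distortion
        if converged:
            break
    return codebook, distortion
```

Because LBG only improves the codebook locally, its final distortion depends on where it starts, which is the motivation the abstract gives for supplying a better initial codebook.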
TL;DR: The Weighted Majority algorithm applied to an ensemble of branch predictors yields a prediction scheme that results in a 5-11% reduction in mispredictions, and a variant of the Weighted Majority algorithm that is simplified for efficient hardware implementation still achieves misprediction rates that are within 1.2% of the ideal case.
Abstract: Accurately predicting the outcome of conditional branch instructions is a prerequisite for high performance in modern processors. It has been shown that combining different branch predictors can yield more accurate prediction schemes, but existing research examines only selection-based approaches, in which one predictor is chosen without considering the actual predictions of the available predictors. The machine learning literature contains many papers addressing the problem of predicting a binary sequence in the presence of an ensemble of predictors or experts. We show that the Weighted Majority algorithm applied to an ensemble of branch predictors yields a prediction scheme that results in a 5-11% reduction in mispredictions. We also demonstrate that a variant of the Weighted Majority algorithm that is simplified for efficient hardware implementation still achieves misprediction rates that are within 1.2% of the ideal case.
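The basic (deterministic) Weighted Majority algorithm of Littlestone and Warmuth can be sketched for the binary taken/not-taken setting. This is an illustration under stated assumptions: each expert starts with weight 1, the weights of experts that mispredict are multiplied by a hypothetical `beta` = 0.5, and ties go to "taken". The paper's hardware-simplified variant is not reproduced, and the function name is illustrative.

```python
def weighted_majority(expert_predictions, outcomes, beta=0.5):
    """Run deterministic Weighted Majority over a sequence of binary outcomes.

    expert_predictions[t][i] is expert i's prediction (0 or 1) at step t,
    e.g. taken (1) / not taken (0) from several branch predictors.
    Returns the master predictions and the number of master mistakes.
    """
    n = len(expert_predictions[0])
    weights = [1.0] * n
    master, mistakes = [], 0
    for preds, outcome in zip(expert_predictions, outcomes):
        vote1 = sum(w for w, p in zip(weights, preds) if p == 1)
        vote0 = sum(w for w, p in zip(weights, preds) if p == 0)
        guess = 1 if vote1 >= vote0 else 0  # ties broken toward "taken"
        master.append(guess)
        mistakes += int(guess != outcome)
        # demote every expert that was wrong on this step
        weights = [w * beta if p != outcome else w for w, p in zip(weights, preds)]
    return master, mistakes
```

A small `beta` makes the scheme abandon unreliable predictors quickly; a `beta` close to 1 reacts more slowly but is less disturbed by occasional mispredictions of a good predictor.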
TL;DR: A cut-model-based real-time tracking algorithm is introduced; results show that, when the target is very small within the model, the tracking algorithm achieves better matching precision and real-time performance.
Abstract: The correlation matching algorithm matches the model against the grabbed image by calculating their correlation values. Because the algorithm has high precision and strong adaptability and is robust to linear changes in gray scale, it is widely used in practical projects as a classic matching algorithm. However, correlation matching is very time-consuming and can hardly satisfy real-time requirements. Besides, when the target is only a small part of the model, it is hard to confirm the model position. In this paper, a cut-model-based real-time tracking algorithm is introduced. The model is segmented into several parts, each with its own weight coefficient. The matching score is the weighted sum of the matching scores of the parts. A weighted model is used in the algorithm to resist disturbances, and the algorithm is accelerated by a pyramid algorithm. Results show that, when the target is very small within the model, the tracking algorithm achieves better matching precision and real-time performance.
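The per-part weighted score can be sketched with normalized cross-correlation (NCC) as the per-part matching score; NCC is one common choice and has the robustness to linear gray-level change mentioned above. The part layout, weights, and function names below are hypothetical, and the pyramid acceleration and the search over image positions are omitted.

```python
import numpy as np


def ncc(a, b):
    """Normalized cross-correlation of two equally sized patches, in [-1, 1]."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def weighted_part_score(model, window, parts, weights):
    """Weighted sum of per-part NCC scores between the model and an image window.

    parts: list of (row_slice, col_slice) regions of the model (hypothetical split);
    weights: one coefficient per part, assumed to sum to 1.
    """
    return sum(wt * ncc(model[r, c], window[r, c]) for (r, c), wt in zip(parts, weights))
```

Giving a larger weight to the part that contains the small target lets that part dominate the score even when the rest of the model is cluttered or partially occluded.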
TL;DR: When these algorithms are combined in complex ways, their performance is much better than when they are used alone or in pairs, and so there is strong evidence that the current approach to optimization followed by many current practitioners could be improved on if more complex algorithm topologies were used.
Abstract: In this paper we show how the performance of two meta-heuristic algorithms and two simple search routines varies as these algorithms are applied singly, in pairwise combinations, and in larger, finer-grained combinations. The area of application is f6 and f17, two well-known optimization benchmark problems. Our conclusion is that when these algorithms are combined in complex ways, their performance is much better than when they are used alone or in pairs, and so there is strong evidence that the current approach to optimization followed by many current practitioners with, for instance, an evolutionary algorithm succeeded by a hill-climber, could be improved on if more complex algorithm topologies were used.
TL;DR: A novel parallel learning algorithm for text classification, based on the combination of the EM algorithm and the naive Bayes classifier, is presented; results indicate that the proposed parallel algorithm is capable of handling large document collections.
Abstract: Text classification is the process of classifying documents into predefined categories based on their content. Existing supervised learning algorithms for automatic text classification need sufficient labeled documents to learn accurately. Applying the Expectation-Maximization (EM) algorithm to this problem is an alternative approach that uses a large pool of unlabeled documents to augment the available labeled documents. Unfortunately, learning from such a large pool of unlabeled documents takes too much time. This paper introduces a novel parallel learning algorithm for the text classification task. The parallel algorithm is based on the combination of the EM algorithm and the naive Bayes classifier. Our goal is to reduce the computational time of the learning and classification process. We studied the performance of our parallel algorithm on a large Linux PC cluster called PIRUN Cluster. We report both timing and accuracy results. These results indicate that the proposed parallel algorithm is capable of handling large document collections.
TL;DR: This work describes a new family of topic-ranking algorithms for multi-labeled documents and outlines a formal analysis of the algorithms in the mistake bound model; to the authors' knowledge, it is the first work to report performance results with the entire new Reuters corpus.
Abstract: We describe a new family of topic-ranking algorithms for multi-labeled documents. The motivation for the algorithms stems from recent advances in online learning algorithms. The algorithms we present are simple to implement and are time and memory efficient. We evaluate the algorithms on the Reuters-21578 corpus and the new corpus released by Reuters in 2000. On both corpora the algorithms we present outperform adaptations to topic-ranking of Rocchio's algorithm and the Perceptron algorithm. We also outline the formal analysis of the algorithm in the mistake bound model. To our knowledge, this work is the first to report performance results with the entire new Reuters corpus.