Top 12 papers published in the topic of Weighted Majority Algorithm in 1994

Showing papers on "Weighted Majority Algorithm" published in 1994

Journal Article•10.1006/INCO.1994.1009•

The weighted majority algorithm

Nick Littlestone¹, Manfred K. Warmuth¹•Institutions (1)

01 Feb 1994-Information & Computation

TL;DR: A simple and effective method, based on weighted voting, is introduced for constructing a compound algorithm, which is robust in the presence of errors in the data, and is called the Weighted Majority Algorithm.

Abstract: We study the construction of prediction algorithms in a situation in which a learner faces a sequence of trials, with a prediction to be made in each, and the goal of the learner is to make few mistakes. We are interested in the case where the learner has reason to believe that one of some pool of known algorithms will perform well, but the learner does not know which one. A simple and effective method, based on weighted voting, is introduced for constructing a compound algorithm in such a circumstance. We call this method the Weighted Majority Algorithm. We show that this algorithm is robust in the presence of errors in the data. We discuss various versions of the Weighted Majority Algorithm and prove mistake bounds for them that are closely related to the mistake bounds of the best algorithms of the pool. For example, given a sequence of trials, if there is an algorithm in the pool A that makes at most m mistakes, then the Weighted Majority Algorithm will make at most c(log |A| + m) mistakes on that sequence, where c is a fixed constant.
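The weighted-voting idea in this abstract can be illustrated with a minimal sketch. It assumes binary (0/1) predictions, a penalty factor `beta` of 0.5, ties broken toward 1, and a weight update on every trial; the function name and all of these choices are illustrative assumptions, not the exact variants analysed in the paper.

```python
def weighted_majority(expert_predictions, outcomes, beta=0.5):
    """Sketch of weighted voting over a pool of binary predictors.

    expert_predictions: one list of 0/1 predictions per trial (one entry per expert).
    outcomes: the true 0/1 label for each trial.
    beta: assumed penalty factor in [0, 1) applied to experts that err.
    Returns (number of mistakes made by the combined vote, final weights).
    """
    n_experts = len(expert_predictions[0])
    weights = [1.0] * n_experts          # every expert starts with equal weight
    mistakes = 0
    for preds, y in zip(expert_predictions, outcomes):
        # Total weight voting for each label.
        w1 = sum(w for w, p in zip(weights, preds) if p == 1)
        w0 = sum(w for w, p in zip(weights, preds) if p == 0)
        guess = 1 if w1 >= w0 else 0     # ties go to 1 (an arbitrary choice)
        if guess != y:
            mistakes += 1
        # Shrink the weight of every expert that predicted wrongly.
        weights = [w * beta if p != y else w for w, p in zip(weights, preds)]
    return mistakes, weights


# Expert 0 is always right; experts 1 and 2 are always wrong.
outcomes = [1, 0, 1, 1, 0]
trials = [[y, 1 - y, 1 - y] for y in outcomes]
mistakes, weights = weighted_majority(trials, outcomes)
print(mistakes, weights)
```

In this toy run the combined vote errs only while the two bad experts still outweigh the good one, which is the behaviour the c(log |A| + m) bound describes.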

2,219 citations

Journal Article•10.1023/A:1018348209754•

On-line prediction and conversion strategies

Nicolò Cesa-Bianchi¹, Yoav Freund², David P. Helmbold³, Manfred K. Warmuth³•Institutions (3)

University of Milan¹, Bell Labs², University of California, Santa Cruz³

27 Oct 1994

TL;DR: A deterministic algorithm using binomial weights that has a better worst case mistake bound than the best deterministic algorithms using exponential weights is presented.

Abstract: We study the problem of deterministically predicting boolean values by combining the boolean predictions of several experts. Previous on-line algorithms for this problem predict with the weighted majority of the experts' predictions. These algorithms give each expert an exponential weight beta^m where beta is a constant in [0,1) and m is the number of mistakes made by the expert in the past. We show that it is better to use sums of binomials as weights. In particular, we present a deterministic algorithm using binomial weights that has a better worst case mistake bound than the best deterministic algorithm using exponential weights. The binomial weights naturally arise from a version space argument. We also show how both exponential and binomial weighting schemes can be used to make prediction algorithms robust against noise.

79 citations

Unsupervised Classification Learning from Cross-Modal Environmental Structure

Virginia R. de Sa¹•Institutions (1)

University of Rochester¹

1 Nov 1994

TL;DR: A model that is based on gross cortical anatomy and implements biologically plausible computations is developed and shown to have classification power approaching that of a supervised discriminant algorithm and can be used as an efficient method for dealing with learning from data with missing values.

Abstract: This dissertation addresses the problem of unsupervised learning for pattern classification or category learning. A model that is based on gross cortical anatomy and implements biologically plausible computations is developed and shown to have classification power approaching that of a supervised discriminant algorithm. The advantage of supervised learning is that the final error metric is available during training. Unfortunately, when modeling human category learning, or in constructing classifiers for autonomous robots, one must deal with not having an omniscient entity labeling all incoming sensory patterns. We show that we can substitute for the labels by making use of structure between the pattern distributions to different sensory modalities. For example, the co-occurrence of a visual image of a cow with a "moo" sound can be used to simultaneously develop appropriate visual features for distinguishing the cow image and appropriate auditory features for recognizing the moo. We model human category learning as a process of minimizing the disagreement between outputs of sensory modalities processing temporally coincident patterns. We relate this mathematically to the optimal goal of minimizing the number of misclassifications in each modality and apply the idea to derive an algorithm for piecewise linear classifiers in which each network uses the output of the other networks as a supervisory signal. Using the Peterson-Barney vowel dataset we show that the algorithm finds appropriate placement for the classification boundaries. The algorithm is then demonstrated on the task of learning to recognize acoustic and visual speech from images of lips and their emanating sounds. Performance on these tasks is within 1-7% of the related supervised algorithm (LVQ2.1). Finally we compare the algorithm to Becker's IMAX algorithm and give suggestions as to how the algorithm may be implemented in the brain using physiological results concerning the relationship between two types of neural plasticity, LTP and LTD, observed in visual cortical cells. We also show how the algorithm can be used as an efficient method for dealing with learning from data with missing values.

34 citations

Journal Article•10.1007/BF01205054•

An algorithm to learn read-once threshold formulas, and transformations between learning models

Nader H. Bshouty¹, Thomas R. Hancock², Lisa Hellerstein³, Marek Karpinski⁴•Institutions (4)

University of Calgary¹, Princeton University², Northwestern University³, University of Bonn⁴

01 Jan 1994-Computational Complexity

TL;DR: A membership query (i.e. black box interpolation) algorithm for exactly identifying the class of read-once formulas over the basis of Boolean threshold functions and a catalogue of generic transformations that can be used to convert an algorithm in one learning model into an algorithm in a different model are presented.

Abstract: We present a membership query (i.e. black box interpolation) algorithm for exactly identifying the class of read-once formulas over the basis of Boolean threshold functions. We also present a catalogue of generic transformations that can be used to convert an algorithm in one learning model into an algorithm in a different model.

19 citations

Book Chapter•10.1016/B978-1-55860-335-6.50040-4•

On the Worst-Case Analysis of Temporal-Difference Learning Algorithms.

Robert E. Schapire¹, Manfred K. Warmuth²•Institutions (2)

Bell Labs¹, University of California²

1 Jan 1994

TL;DR: The worst-case behavior of a family of learning algorithms based on Sutton's method of temporal differences is studied, and general upper bounds on the performance of a slightly modified version of Sutton's so-called TD(λ) algorithm are proved.

Abstract: We study the worst-case behavior of a family of learning algorithms based on Sutton's [7] method of temporal differences. In our on-line learning framework, learning takes place in a sequence of trials, and the goal of the learning algorithm is to estimate a discounted sum of all the reinforcements that will be received in the future. In this setting, we are able to prove general upper bounds on the performance of a slightly modified version of Sutton's so-called TD(λ) algorithm. These bounds are stated in terms of the performance of the best linear predictor on the given training sequence, and are proved without making any statistical assumptions of any kind about the process producing the learner's observed training sequence. We also prove lower bounds on the performance of any algorithm for this learning problem, and give a similar analysis of the closely related problem of learning to predict in a model in which the learner must produce predictions for a whole batch of observations before receiving reinforcement.
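For orientation, the following is a minimal sketch of the textbook temporal-difference update with eligibility traces for a linear predictor, commonly written TD(λ). It is not the modified variant whose bounds are proved in the paper; the function name, the parameters `gamma`, `lam`, and `lr`, and the list-based feature representation are all assumptions made for illustration.

```python
def td_lambda(features, rewards, gamma=0.9, lam=0.5, lr=0.1):
    """Textbook linear TD(lambda) sketch (not the paper's modified variant).

    features: feature vectors x_0 .. x_T (one more entry than rewards).
    rewards: reinforcement r_t received on the step from x_t to x_{t+1}.
    gamma: discount rate; lam: trace decay; lr: learning rate (all assumed values).
    Returns the learned weight vector of the linear predictor.
    """
    n = len(features[0])
    w = [0.0] * n   # weights of the linear predictor
    z = [0.0] * n   # eligibility trace
    for t in range(len(rewards)):
        x, x_next = features[t], features[t + 1]
        v = sum(wi * xi for wi, xi in zip(w, x))
        v_next = sum(wi * xi for wi, xi in zip(w, x_next))
        # Temporal-difference error between successive predictions.
        delta = rewards[t] + gamma * v_next - v
        # Decay the trace, then add the current features.
        z = [gamma * lam * zi + xi for zi, xi in zip(z, x)]
        # Move the weights along the trace in proportion to the error.
        w = [wi + lr * delta * zi for wi, zi in zip(w, z)]
    return w


print(td_lambda([[1.0], [1.0]], [1.0], gamma=0.0, lam=0.0, lr=0.5))
```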

11 citations

Journal Article•10.1162/NECO.1994.6.5.927•

Probabilistic winner-take-all learning algorithm for radial-basis-function neural classifiers

Hossam M. Osman¹, M.M. Fahmy¹•Institutions (1)

Queen's University¹

01 Sep 1994-Neural Computation

TL;DR: When all three algorithms are used to train the hidden layer of radial-basis-function classifiers, experiments indicate that classifiers trained with the probabilistic winner-take-all outperform those trained with the other two algorithms.

Abstract: This paper proposes a new adaptive competitive learning algorithm called "the probabilistic winner-take-all." The algorithm is based on a learning scheme developed by Agrawala within the statistical pattern recognition literature (Agrawala 1970). Its name stems from the fact that for a given input pattern once each competitor computes the probability of being the one that generated this pattern, the computed probabilities are utilized to probabilistically choose a winner. Then, only this winner is permitted to learn. The learning rule of the algorithm is derived for three different cases. Its properties are discussed and compared to those of two other competitive learning algorithms, namely the standard winner-take-all and the maximum-likelihood soft competition. Experimental comparison is also given. When all three algorithms are used to train the hidden layer of radial-basis-function classifiers, experiments indicate that classifiers trained with the probabilistic winner-take-all outperform those trained with the other two algorithms.

11 citations

Proceedings Article•10.1109/ANZIIS.1994.396990•

The low prediction accuracy problem in learning

Honghua Dai¹•Institutions (1)

Monash University, Clayton campus¹

29 Nov 1994

TL;DR: The factors which could prevent a learning algorithm from achieving a higher prediction accuracy rate are presented, and it is indicated that overfitting on low-quality data and being misled by this are two important factors.

Abstract: Achieving a higher prediction accuracy rate is crucial for all learning algorithms, particularly for real application purposes. This paper presents the factors which could prevent a learning algorithm from achieving a higher prediction accuracy rate, and indicates that overfitting on low-quality data and being misled by this are two important factors. It also presents strategies for dealing with this problem. A new approach, called field learning, is described, by which the learnt rules can overcome this problem and achieve a higher prediction accuracy on new unseen cases. Our experiments show that this approach can achieve a higher prediction accuracy rate on new unseen cases, but it achieved a lower accuracy rate on some of the training data sets.

4 citations

Proceedings Article•10.1109/ICPR.1994.577140•

A fast implementation of two-dimensional weighted median filters

G. Angelopoulos¹, Ioannis Pitas•Institutions (1)

Aristotle University of Thessaloniki¹

9 Oct 1994

TL;DR: Experimental results prove the superiority of the proposed algorithm over that of finding the weighted median either by sorting with the Quick Sort or by selecting the r-th order statistic.

Abstract: This paper deals with the implementation of a fast algorithm for two-dimensional weighted median filtering. Because of the vast amount of data that must be handled, the development of fast algorithms is very important. A fast running algorithm for weighted median filtering, which is based on using a histogram and updating it, is proposed. Experimental results prove the superiority of the proposed algorithm over that of finding the weighted median either by sorting with the Quick Sort or by selecting the r-th order statistic.
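A rough sketch of the histogram idea, for a single filter window only: weights are accumulated per gray level and the histogram is scanned until half of the total weight is reached. The function name, the 256-level assumption, positive integer weights, and the lower-median convention are illustrative assumptions; the paper's running algorithm additionally updates the histogram as the window slides, which is not shown here.

```python
def weighted_median_hist(values, weights, levels=256):
    """Weighted median of one window via a weight histogram (illustrative sketch).

    values: integer gray levels in range(levels).
    weights: positive integer weight for each value.
    Returns the smallest level whose cumulative weight reaches half the total.
    """
    if not values:
        raise ValueError("window must not be empty")
    hist = [0] * levels
    for v, w in zip(values, weights):
        hist[v] += w                 # accumulate weight per gray level
    half = sum(weights) / 2
    acc = 0
    for level, count in enumerate(hist):
        acc += count
        if acc >= half:              # first level covering half the weight
            return level
    raise ValueError("weights must be positive")


# Equivalent to the median of [10, 20, 30, 30, 30].
print(weighted_median_hist([10, 20, 30], [1, 1, 3]))
```

The scan costs a number of steps bounded by the number of gray levels rather than by the cost of sorting the replicated samples, which is the kind of saving the abstract attributes to the histogram approach.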

2 citations

Proceedings Article•10.1145/180139.181020•

An optimal-control application of two paradigms of on-line learning

V. G. Vovk

16 Jul 1994

TL;DR: This paper describes and compares two paradigms of on-line learning, Bayesian and Popperian, represented respectively by Littlestone and Warmuth's Weighted Majority Algorithm and by Rivest and Schapire's reset-free algorithm for exact learning of finite automata with membership and equivalence queries.

Abstract: We describe and compare two paradigms of on-line learning, which we call Bayesian and Popperian. In this paper the Bayesian paradigm is represented by Littlestone and Warmuth's Weighted Majority Algorithm, and the Popperian paradigm is represented by Rivest and Schapire's reset-free algorithm for exact learning of finite automata with membership and equivalence queries. Both algorithms are applied to the problem of optimal control of a finite-state plant in a finite-state environment. The advantage of the control strategy based on the Weighted Majority Algorithm is its robustness and better performance (actually, its performance is nearly optimal in the class of deterministic control strategies), and the advantage of the control strategy based on Rivest and Schapire's algorithm is its computational efficiency.

1 citation

Journal Article•10.1080/09528139408953785•

Ignoring data may be the only way to learn efficiently

Rolf Wiehagen¹, Thomas Zeugmann•Institutions (1)

Kaiserslautern University of Technology¹

01 Jan 1994-Journal of Experimental and Theoretical Artificial Intelligence

TL;DR: A natural learning problem is presented and it is proved that it can be solved in polynomial time if and only if the algorithm is allowed to ignore data.

Abstract: In designing learning algorithms it seems quite reasonable to construct them in a way such that all data the algorithm already has obtained are correctly and completely reflected in the hypothesis the algorithm outputs on these data. However, this approach may totally fail, i.e. it may lead to the unsolvability of the learning problem, or it may exclude any efficient solution of it. In particular, we present a natural learning problem and prove that it can be solved in polynomial time if and only if the algorithm is allowed to ignore data.

Journal Article•10.1162/NECO.1994.6.2.307•

Smooth on-line learning algorithms for hidden Markov models

Pierre Baldi¹, Yves Chauvin²•Institutions (2)

California Institute of Technology¹, Stanford University²

01 Mar 1994-Neural Computation

TL;DR: A simple learning algorithm for Hidden Markov Models (HMMs) is presented together with a number of variations, proved to be exact or approximate gradient optimization algorithms with respect to likelihood, log-likelihood, or cross-entropy functions, and as such are usually convergent.

Abstract: A simple learning algorithm for Hidden Markov Models (HMMs) is presented together with a number of variations. Unlike other classical algorithms such as the Baum-Welch algorithm, the algorithms described are smooth and can be used on-line (after each example presentation) or in batch mode, with or without the usual Viterbi most likely path approximation. The algorithms have simple expressions that result from using a normalized-exponential representation for the HMM parameters. All the algorithms presented are proved to be exact or approximate gradient optimization algorithms with respect to likelihood, log-likelihood, or cross-entropy functions, and as such are usually convergent. These algorithms can also be cast in the more general EM (Expectation-Maximization) framework where they can be viewed as exact or approximate GEM (Generalized Expectation-Maximization) algorithms. The mathematical properties of the algorithms are derived in the appendix.

Proceedings Article•10.1145/180139.181176•

Learning linear threshold functions in the presence of classification noise

Tom Bylander¹•Institutions (1)

University of Texas at San Antonio¹

16 Jul 1994

TL;DR: It is shown that the linear threshold functions are polynomially learnable in the presence of classification noise, i.e., polynomial in n, 1/ε, 1/δ, and 1/σ, where n is the number of Boolean attributes, ε and δ are the usual accuracy and confidence parameters, and σ indicates the minimum distance of any example from the target hyperplane.

Abstract: I show that the linear threshold functions are polynomially learnable in the presence of classification noise, i.e., polynomial in n, 1/ε, 1/δ, and 1/σ, where n is the number of Boolean attributes, ε and δ are the usual accuracy and confidence parameters, and σ indicates the minimum distance of any example from the target hyperplane, which is assumed to be lower than the average distance of the examples from any hyperplane. This result is achieved by modifying the Perceptron algorithm: for each update, a weighted average of a sample of misclassified examples and a correction vector is added to the current weight vector. Similar modifications are shown for the Weighted Majority algorithm. The correction vector is simply the mean of the normalized examples. In the special case of Boolean threshold functions, the modified Perceptron algorithm performs O(n²ε⁻²) iterations over O(n⁴ε⁻² ln(n/(δε))) examples. This improves on the previous classification-noise result of Angluin and Laird to a much larger concept class with a similar number of examples, but with multiple iterations over the examples.
