Scispace (Formerly Typeset)
Showing papers on "Weighted Majority Algorithm" published in 1994
Journal Article•10.1006/INCO.1994.1009•
The weighted majority algorithm

Nick Littlestone1, Manfred K. Warmuth1•
Harvard University1
01 Feb 1994-Information & Computation
TL;DR: A simple and effective method, based on weighted voting, is introduced for constructing a compound algorithm, which is robust in the presence of errors in the data, and is called the Weighted Majority Algorithm.
Abstract: We study the construction of prediction algorithms in a situation in which a learner faces a sequence of trials, with a prediction to be made in each, and the goal of the learner is to make few mistakes. We are interested in the case where the learner has reason to believe that one of some pool of known algorithms will perform well, but the learner does not know which one. A simple and effective method, based on weighted voting, is introduced for constructing a compound algorithm in such a circumstance. We call this method the Weighted Majority Algorithm. We show that this algorithm is robust in the presence of errors in the data. We discuss various versions of the Weighted Majority Algorithm and prove mistake bounds for them that are closely related to the mistake bounds of the best algorithms of the pool. For example, given a sequence of trials, if there is an algorithm in the pool A that makes at most m mistakes then the Weighted Majority Algorithm will make at most c(log |A| + m) mistakes on that sequence, where c is a fixed constant.

2,219 citations
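The weighted-voting scheme described in the abstract above can be illustrated with a minimal Python sketch. It assumes binary predictions, a penalty factor `beta` of 0.5, and ties broken in favor of predicting 1; the function names `weighted_majority` and `mistake_bound` are hypothetical and chosen only for this example. The bound computed by `mistake_bound` is the commonly quoted form (ln n + m ln(1/beta)) / ln(2/(1+beta)), which is one instance of the c(log |A| + m) statement.

```python
import math

def weighted_majority(expert_predictions, outcomes, beta=0.5):
    """Deterministic weighted voting over a pool of binary predictors.

    expert_predictions: one list of 0/1 predictions per trial (one entry per expert).
    outcomes: the true 0/1 label of each trial.
    Returns (master_mistakes, expert_mistakes, final_weights).
    """
    n = len(expert_predictions[0])
    weights = [1.0] * n          # every expert starts with weight 1
    expert_mistakes = [0] * n
    master_mistakes = 0
    for preds, y in zip(expert_predictions, outcomes):
        # Total weight voting for each label.
        w1 = sum(w for w, p in zip(weights, preds) if p == 1)
        w0 = sum(w for w, p in zip(weights, preds) if p == 0)
        guess = 1 if w1 >= w0 else 0   # tie-breaking toward 1 is an assumption
        if guess != y:
            master_mistakes += 1
        # Multiply the weight of every expert that erred by beta.
        for i, p in enumerate(preds):
            if p != y:
                weights[i] *= beta
                expert_mistakes[i] += 1
    return master_mistakes, expert_mistakes, weights

def mistake_bound(n_experts, best_mistakes, beta=0.5):
    """Upper bound on master mistakes given the best expert's mistake count."""
    return (math.log(n_experts) + best_mistakes * math.log(1 / beta)) / math.log(2 / (1 + beta))

# Usage: expert 0 is always right, expert 1 always wrong, expert 2 alternates 0/1.
outcomes = [1, 0, 1, 1, 0, 0]
trials = [[y, 1 - y, t % 2] for t, y in enumerate(outcomes)]
master, per_expert, _ = weighted_majority(trials, outcomes)
print(master, per_expert, mistake_bound(3, min(per_expert)))
```

With beta = 0.5 the constant c works out to roughly 2.41 when logarithms are taken base 2, so the master stays within a constant factor of the best expert in the pool plus a term logarithmic in the pool size.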

Journal Article•10.1023/A:1018348209754•
On-line prediction and conversion strategies

Nicolò Cesa-Bianchi1, Yoav Freund2, David P. Helmbold3, Manfred K. Warmuth3•
University of Milan1, Bell Labs2, University of California, Santa Cruz3
27 Oct 1994
TL;DR: A deterministic algorithm using binomial weights that has a better worst case mistake bound than the best deterministic algorithms using exponential weights is presented.
Abstract: We study the problem of deterministically predicting boolean values by combining the boolean predictions of several experts. Previous on-line algorithms for this problem predict with the weighted majority of the experts' predictions. These algorithms give each expert an exponential weight beta^m where beta is a constant in [0,1) and m is the number of mistakes made by the expert in the past. We show that it is better to use sums of binomials as weights. In particular, we present a deterministic algorithm using binomial weights that has a better worst case mistake bound than the best deterministic algorithm using exponential weights. The binomial weights naturally arise from a version space argument. We also show how both exponential and binomial weighting schemes can be used to make prediction algorithms robust against noise.

79 citations

Unsupervised Classification Learning from Cross-Modal Environmental Structure

Virginia R. de Sa1•
University of Rochester1
1 Nov 1994
TL;DR: A model that is based on gross cortical anatomy and implements biologically plausible computations is developed and shown to have classification power approaching that of a supervised discriminant algorithm and can be used as an efficient method for dealing with learning from data with missing values.
Abstract: This dissertation addresses the problem of unsupervised learning for pattern classification or category learning. A model that is based on gross cortical anatomy and implements biologically plausible computations is developed and shown to have classification power approaching that of a supervised discriminant algorithm. The advantage of supervised learning is that the final error metric is available during training. Unfortunately, when modeling human category learning, or in constructing classifiers for autonomous robots, one must deal with not having an omniscient entity labeling all incoming sensory patterns. We show that we can substitute for the labels by making use of structure between the pattern distributions to different sensory modalities. For example, the co-occurrence of a visual image of a cow with a "moo" sound can be used to simultaneously develop appropriate visual features for distinguishing the cow image and appropriate auditory features for recognizing the moo. We model human category learning as a process of minimizing the disagreement between outputs of sensory modalities processing temporally coincident patterns. We relate this mathematically to the optimal goal of minimizing the number of misclassifications in each modality and apply the idea to derive an algorithm for piecewise linear classifiers in which each network uses the output of the other networks as a supervisory signal. Using the Peterson-Barney vowel dataset we show that the algorithm finds appropriate placement for the classification boundaries. The algorithm is then demonstrated on the task of learning to recognize acoustic and visual speech from images of lips and their emanating sounds. Performance on these tasks is within 1-7% of the related supervised algorithm (LVQ2.1). Finally we compare the algorithm to Becker's IMAX algorithm and give suggestions as to how the algorithm may be implemented in the brain using physiological results concerning the relationship between two types of neural plasticity, LTP and LTD, observed in visual cortical cells. We also show how the algorithm can be used as an efficient method for dealing with learning from data with missing values.

34 citations

Journal Article•10.1007/BF01205054•
An algorithm to learn read-once threshold formulas, and transformations between learning models

Nader H. Bshouty1, Thomas R. Hancock2, Lisa Hellerstein3, Marek Karpinski4•
University of Calgary1, Princeton University2, Northwestern University3, University of Bonn4
01 Jan 1994-Computational Complexity
TL;DR: A membership query (i.e. black box interpolation) algorithm for exactly identifying the class of read-once formulas over the basis of Boolean threshold functions and a catalogue of generic transformations that can be used to convert an algorithm in one learning model into an algorithm in a different model are presented.
Abstract: We present a membership query (i.e. black box interpolation) algorithm for exactly identifying the class of read-once formulas over the basis of Boolean threshold functions. We also present a catalogue of generic transformations that can be used to convert an algorithm in one learning model into an algorithm in a different model.

19 citations

Book Chapter•10.1016/B978-1-55860-335-6.50040-4•
On the Worst-Case Analysis of Temporal-Difference Learning Algorithms.

Robert E. Schapire1, Manfred K. Warmuth2•
Bell Labs1, University of California2
1 Jan 1994
TL;DR: The worst-case behavior of a family of learning algorithms based on Sutton's method of temporal differences is studied, and general upper bounds on the performance of a slightly modified version of Sutton's so-called TD(λ) algorithm are proved.
Abstract: We study the worst-case behavior of a family of learning algorithms based on Sutton's [7] method of temporal differences. In our on-line learning framework, learning takes place in a sequence of trials, and the goal of the learning algorithm is to estimate a discounted sum of all the reinforcements that will be received in the future. In this setting, we are able to prove general upper bounds on the performance of a slightly modified version of Sutton's so-called TD(λ) algorithm. These bounds are stated in terms of the performance of the best linear predictor on the given training sequence, and are proved without making any statistical assumptions of any kind about the process producing the learner's observed training sequence. We also prove lower bounds on the performance of any algorithm for this learning problem, and give a similar analysis of the closely related problem of learning to predict in a model in which the learner must produce predictions for a whole batch of observations before receiving reinforcement.

11 citations
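For readers unfamiliar with temporal-difference learning, the following Python sketch shows a linear predictor trained on-line to estimate a discounted sum of future reinforcements. It is the standard textbook form with accumulating eligibility traces, not the slightly modified variant analyzed in the chapter above; the function name `td_lambda` and the values of `gamma`, `lam`, and `alpha` are assumptions made for illustration.

```python
def td_lambda(features, rewards, gamma=0.9, lam=0.5, alpha=0.1):
    """Textbook linear TD with eligibility traces (illustrative sketch).

    features: feature vectors x_0 .. x_T (one more than the number of rewards).
    rewards:  reinforcements r_0 .. r_{T-1}, where r_t is received after x_t.
    Returns the learned weight vector w, so that w . x estimates the
    discounted sum of future reinforcements from x.
    """
    dim = len(features[0])
    w = [0.0] * dim
    trace = [0.0] * dim
    for t, r in enumerate(rewards):
        x, x_next = features[t], features[t + 1]
        v = sum(wi * xi for wi, xi in zip(w, x))
        v_next = sum(wi * xi for wi, xi in zip(w, x_next))
        # Temporal-difference error: one-step target minus current estimate.
        delta = r + gamma * v_next - v
        # Decay the trace, then add the current feature vector.
        trace = [gamma * lam * e + xi for e, xi in zip(trace, x)]
        # Move every weight in proportion to its eligibility.
        w = [wi + alpha * delta * e for wi, e in zip(w, trace)]
    return w

# Usage: a single constant feature with reward 1 at every step and gamma = 0.5.
# The discounted sum 1 + 0.5 + 0.25 + ... equals 2, so w[0] should approach 2.
steps = 500
w = td_lambda([[1.0]] * (steps + 1), [1.0] * steps, gamma=0.5)
print(w)
```

Setting `lam` to 0 reduces the update to one-step TD, while values closer to 1 spread each error over more of the recently visited feature vectors.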

Journal Article•10.1162/NECO.1994.6.5.927•
Probabilistic winner-take-all learning algorithm for radial-basis-function neural classifiers

Hossam M. Osman1, M.M. Fahmy1•
Queen's University1
01 Sep 1994-Neural Computation
TL;DR: When all three algorithms are used to train the hidden layer of radial-basis-function classifiers, experiments indicate that classifiers trained with the probabilistic winner-take-all outperform those trained with the other two algorithms.
Abstract: This paper proposes a new adaptive competitive learning algorithm called "the probabilistic winner-take-all." The algorithm is based on a learning scheme developed by Agrawala within the statistical pattern recognition literature (Agrawala 1970). Its name stems from the fact that for a given input pattern once each competitor computes the probability of being the one that generated this pattern, the computed probabilities are utilized to probabilistically choose a winner. Then, only this winner is permitted to learn. The learning rule of the algorithm is derived for three different cases. Its properties are discussed and compared to those of two other competitive learning algorithms, namely the standard winner-take-all and the maximum-likelihood soft competition. Experimental comparison is also given. When all three algorithms are used to train the hidden layer of radial-basis-function classifiers, experiments indicate that classifiers trained with the probabilistic winner-take-all outperform those trained with the other two algorithms.

11 citations
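The selection step described in the abstract above (each competitor computes its probability of having generated the pattern, a winner is drawn from those probabilities, and only the winner learns) can be sketched in Python as follows. The abstract does not give the learning rule, so this sketch assumes isotropic Gaussian competitors with equal priors and a simple "move the winning center toward the input" update; `pwta_step`, `sigma`, and `lr` are hypothetical names for illustration and are not the paper's derivation.

```python
import math
import random

def pwta_step(centers, x, sigma=1.0, lr=0.1, rng=random):
    """One probabilistic winner-take-all update (illustrative sketch).

    centers: list of center vectors, modified in place.
    x: the input pattern.
    Returns (winner_index, probabilities).
    """
    # Squared distance from the input to every center.
    sq = [sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in centers]
    # Gaussian likelihoods, shifted by the minimum distance for numerical stability.
    m = min(sq)
    likes = [math.exp(-(d - m) / (2 * sigma ** 2)) for d in sq]
    total = sum(likes)
    probs = [l / total for l in likes]
    # Draw the winner at random according to the computed probabilities.
    winner = rng.choices(range(len(centers)), weights=probs, k=1)[0]
    # Only the winner learns (assumed rule: step toward the input).
    centers[winner] = [ci + lr * (xi - ci) for xi, ci in zip(x, centers[winner])]
    return winner, probs

# Usage with a seeded generator so the run is repeatable.
centers = [[0.0, 0.0], [10.0, 10.0]]
winner, probs = pwta_step(centers, [0.5, 0.5], rng=random.Random(0))
print(winner, probs, centers)
```

Replacing the random draw with an argmax over `probs` would give the standard winner-take-all that the abstract uses as a comparison point.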

Proceedings Article•10.1109/ANZIIS.1994.396990•
The low prediction accuracy problem in learning

Honghua Dai1•
Monash University, Clayton campus1
29 Nov 1994
TL;DR: The factors which could prevent a learning algorithm from achieving a higher prediction accuracy rate are presented, and it is indicated that overfitting on low-quality data and being misled by this are two important factors.
Abstract: Achieving a higher prediction accuracy rate is crucial for all learning algorithms, particularly for real application purposes. This paper presents the factors which could prevent a learning algorithm from achieving a higher prediction accuracy rate, and indicates that overfitting on low-quality data and being misled by this are two important factors. It also presents strategies for dealing with this problem. A new approach, called field learning, is described, by which the learnt rules can overcome this problem and achieve a higher prediction accuracy on new unseen cases. Our experiments show that this approach can achieve a higher prediction accuracy rate on new unseen cases, but it achieved a lower accuracy rate on some of the training data sets.

4 citations

Proceedings Article•10.1109/ICPR.1994.577140•
A fast implementation of two-dimensional weighted median filters

G. Angelopoulos1, Ioannis Pitas•
Aristotle University of Thessaloniki1
9 Oct 1994
TL;DR: Experimental results prove the superiority of the proposed algorithm over that of finding the weighted median either by sorting with the Quick Sort or by selecting the r-th order statistic.
Abstract: This paper deals with the implementation of a fast algorithm for two-dimensional weighted median filtering. Because of the vast amount of data that must be handled, the development of fast algorithms is very important. A fast running algorithm for weighted median filtering, which is based on using a histogram and updating it, is proposed. Experimental results prove the superiority of the proposed algorithm over that of finding the weighted median either by sorting with the Quick Sort or by selecting the r-th order statistic.

2 citations
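The histogram idea mentioned in the abstract above can be illustrated for a single filter window with the Python sketch below. It assumes 8-bit pixel values and positive integer weights, and it returns the lower weighted median when the total weight is even; the function name `weighted_median_hist` is hypothetical, and the paper's running update of the histogram between neighboring windows is not reproduced here.

```python
def weighted_median_hist(values, weights, levels=256):
    """Weighted median of integer values via a weight histogram (sketch).

    values:  integer samples in the range 0 .. levels-1 (one filter window).
    weights: positive integer weight of each sample.
    Equivalent to repeating each value `weight` times and taking the median,
    but without sorting: one pass to fill the histogram, one scan to read it.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive integer")
    hist = [0] * levels
    for v, w in zip(values, weights):
        hist[v] += w            # accumulate weight per gray level
    threshold = (total + 1) // 2
    cum = 0
    for level, count in enumerate(hist):
        cum += count
        if cum >= threshold:    # first level whose cumulative weight reaches half
            return level
    raise AssertionError("unreachable when weights are positive")

# Usage: a center-weighted window, where the middle sample counts three times.
print(weighted_median_hist([1, 2, 3, 4, 5], [1, 1, 3, 1, 1]))
```

The scan cost depends on the number of gray levels rather than on the window size, which is why histogram methods tend to pay off for large windows.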

Proceedings Article•10.1145/180139.181020•
An optimal-control application of two paradigms of on-line learning

V. G. Vovk
16 Jul 1994
TL;DR: This paper describes and compares two paradigms of on-line learning, Bayesian and Popperian, represented respectively by Littlestone and Warmuth's Weighted Majority Algorithm and Rivest and Schapire's reset-free algorithm for exact learning of finite automata with membership and equivalence queries.
Abstract: We describe and compare two paradigms of on-line learning, which we call Bayesian and Popperian. In this paper the Bayesian paradigm is represented by Littlestone and Warmuth's Weighted Majority Algorithm, and the Popperian paradigm is represented by Rivest and Schapire's reset-free algorithm for exact learning of finite automata with membership and equivalence queries. Both algorithms are applied to the problem of optimal control of a finite-state plant in a finite-state environment. The advantage of the control strategy based on the Weighted Majority Algorithm is its robustness and better performance (actually, its performance is nearly optimal in the class of deterministic control strategies), and the advantage of the control strategy based on Rivest and Schapire's algorithm is its computational efficiency.

1 citation

Journal Article•10.1080/09528139408953785•
Ignoring data may be the only way to learn efficiently

Rolf Wiehagen1, Thomas Zeugmann•
Kaiserslautern University of Technology1
01 Jan 1994-Journal of Experimental and Theoretical Artificial Intelligence
TL;DR: A natural learning problem is presented and it is proved that it can be solved in polynomial time if and only if the algorithm is allowed to ignore data.
Abstract: In designing learning algorithms it seems quite reasonable to construct them in a way such that all data the algorithm already has obtained are correctly and completely reflected in the hypothesis the algorithm outputs on these data. However, this approach may totally fail, i.e. it may lead to the unsolvability of the learning problem, or it may exclude any efficient solution of it. In particular, we present a natural learning problem and prove that it can be solved in polynomial time if and only if the algorithm is allowed to ignore data.

Journal Article•10.1162/NECO.1994.6.2.307•
Smooth on-line learning algorithms for hidden Markov models

Pierre Baldi1, Yves Chauvin2•
California Institute of Technology1, Stanford University2
01 Mar 1994-Neural Computation
TL;DR: A simple learning algorithm for Hidden Markov Models (HMMs) is presented together with a number of variations, proved to be exact or approximate gradient optimization algorithms with respect to likelihood, log-likelihood, or cross-entropy functions, and as such are usually convergent.
Abstract: A simple learning algorithm for Hidden Markov Models (HMMs) is presented together with a number of variations. Unlike other classical algorithms such as the Baum-Welch algorithm, the algorithms described are smooth and can be used on-line (after each example presentation) or in batch mode, with or without the usual Viterbi most likely path approximation. The algorithms have simple expressions that result from using a normalized-exponential representation for the HMM parameters. All the algorithms presented are proved to be exact or approximate gradient optimization algorithms with respect to likelihood, log-likelihood, or cross-entropy functions, and as such are usually convergent. These algorithms can also be cast in the more general EM (Expectation-Maximization) framework where they can be viewed as exact or approximate GEM (Generalized Expectation-Maximization) algorithms. The mathematical properties of the algorithms are derived in the appendix.

Proceedings Article•10.1145/180139.181176•
Learning linear threshold functions in the presence of classification noise

Tom Bylander1•
University of Texas at San Antonio1
16 Jul 1994
TL;DR: It is shown that the linear threshold functions are polynomially learnable in the presence of classification noise, i.e., polynomial in n, 1/ε, 1/δ, and 1/σ, where n is the number of Boolean attributes, ε and δ are the usual accuracy and confidence parameters, and σ indicates the minimum distance of any example from the target hyperplane.
Abstract: I show that the linear threshold functions are polynomially learnable in the presence of classification noise, i.e., polynomial in n, 1/ε, 1/δ, and 1/σ, where n is the number of Boolean attributes, ε and δ are the usual accuracy and confidence parameters, and σ indicates the minimum distance of any example from the target hyperplane, which is assumed to be lower than the average distance of the examples from any hyperplane. This result is achieved by modifying the Perceptron algorithm: for each update, a weighted average of a sample of misclassified examples and a correction vector is added to the current weight vector. Similar modifications are shown for the Weighted Majority algorithm. The correction vector is simply the mean of the normalized examples. In the special case of Boolean threshold functions, the modified Perceptron algorithm performs O(n^2 ε^{-2}) iterations over O(n^4 ε^{-2} ln(n/(δε))) examples. This improves on the previous classification-noise result of Angluin and Laird to a much larger concept class with a similar number of examples, but with multiple iterations over the examples.
