TL;DR: In this article, an active learning master algorithm based on a known competitive algorithm for the multi-armed bandit problem and a novel semi-supervised performance evaluation statistic is proposed.
Abstract: This paper is concerned with the question of how to combine an ensemble of active learners online so as to expedite the learning progress during a pool-based active learning session. We develop a powerful active learning master algorithm, based on a known competitive algorithm for the multi-armed bandit problem and a novel semi-supervised performance evaluation statistic. Taking an ensemble containing two of the best known active learning algorithms and a new algorithm, the resulting new active learning master algorithm is empirically shown to consistently perform almost as well as, and sometimes outperform, the best algorithm in the ensemble on a range of classification problems.
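As a rough illustration of the bandit view, the sketch below treats each active learner in the ensemble as one arm and uses EXP3, a standard adversarial bandit algorithm, to decide which learner chooses the next query. This is a simplification swapped in for illustration: the abstract does not name the bandit algorithm or define the evaluation statistic, so the `reward` passed to `update` is a hypothetical stand-in for that statistic, and the names `Exp3` and `simulate` as well as the reward values are assumptions.

```python
import math
import random


class Exp3:
    """EXP3 bandit; each arm stands for one active learner in the ensemble."""

    def __init__(self, n_arms, gamma=0.1, seed=0):
        self.n_arms = n_arms
        self.gamma = gamma              # exploration rate in (0, 1]
        self.weights = [1.0] * n_arms
        self.rng = random.Random(seed)

    def probabilities(self):
        # Mix the weight-proportional distribution with uniform exploration.
        total = sum(self.weights)
        return [(1 - self.gamma) * w / total + self.gamma / self.n_arms
                for w in self.weights]

    def select(self):
        probs = self.probabilities()
        arm = self.rng.choices(range(self.n_arms), weights=probs)[0]
        return arm, probs[arm]

    def update(self, arm, prob, reward):
        # Importance-weighted estimate: only the pulled arm is updated.
        estimate = reward / prob
        self.weights[arm] *= math.exp(self.gamma * estimate / self.n_arms)


def simulate(rewards, rounds=500):
    """Run EXP3 against fixed per-arm rewards (hypothetical values in [0, 1])."""
    bandit = Exp3(len(rewards))
    for _ in range(rounds):
        arm, prob = bandit.select()
        bandit.update(arm, prob, rewards[arm])
    return bandit.probabilities()
```

In a real active learning session the reward would be computed after each query from the current classifier, rather than fixed per arm as in `simulate`.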
TL;DR: This work builds an iterative algorithm on top of the Schlesinger–Kozinec algorithm (S–K-algorithm) from 1981 which finds a maximal margin hyperplane with a given precision for separable data and suggests a generalization to the non-linear case using kernel functions.
TL;DR: It is shown that with the methodology currently used in comparative machine learning experiments, the results may often not be reliable because of the role of and interaction between feature selection and algorithm parameter optimization.
Abstract: Comparative machine learning experiments have become an important methodology in empirical approaches to natural language processing (i) to investigate which machine learning algorithms have the 'right bias' to solve specific natural language processing tasks, and (ii) to investigate which sources of information add to accuracy in a learning approach. Using automatic word sense disambiguation as an example task, we show that with the methodology currently used in comparative machine learning experiments, the results may often not be reliable because of the role of and interaction between feature selection and algorithm parameter optimization. We propose genetic algorithms as a practical approach to achieve both higher accuracy within a single approach, and more reliable comparisons.
TL;DR: This dissertation proposes a machine learning-based inductive approach using experimental algorithmic methods and machine learning techniques to solve the algorithm selection problem and discovers multifractal properties of the joint probability distributions of Bayesian networks.
Abstract: The algorithm selection problem aims at selecting the best algorithm for a given computational problem instance according to some characteristics of the instance. In this dissertation, we first introduce some results from theoretical investigation of the algorithm selection problem. We show, by Rice's theorem, the nonexistence of an automatic algorithm selection program based only on the description of the input instance and the competing algorithms. We also use notions of instance hardness and algorithm performance based on Kolmogorov complexity to show that algorithm selection for search is also incomputable. Driven by the theoretical results, we propose a machine learning-based inductive approach using experimental algorithmic methods and machine learning techniques to solve the algorithm selection problem.
Experimentally, we have applied the proposed methodology to algorithm selection for sorting and the MPE problem. In sorting, instances with an existing order are easier for some algorithms. We have studied different presortedness measures, designed algorithms to generate permutations with a specified existing order uniformly at random, and applied various learning algorithms to induce sorting algorithm selection models from runtime experimental results. In the MPE problem, the instance characteristics we have studied include size and topological type of the network, network connectedness, skewness of the distributions in Conditional Probability Tables (CPTs), and the proportion and distribution of evidence variables. The MPE algorithms considered include an exact algorithm (clique-tree propagation), two stochastic sampling algorithms (MCMC Gibbs sampling and importance forward sampling), two search-based algorithms (multi-restart hill-climbing and tabu search), and one hybrid algorithm combining both sampling and search (ant colony optimization).
Another major contribution of this dissertation is the discovery of multifractal properties of the joint probability distributions of Bayesian networks. With sufficient asymmetry in individual prior and conditional probability distributions, the joint distribution is not only highly skewed, but it also has clusters of high-probability instantiations at all scales. We present a two-phase hybrid random sampling and search algorithm that solves the MPE problem by exploiting this clustering property. Since the MPE problem (decision version) is NP-complete, the multifractal meta-heuristic can be applied to solve other NP-hard combinatorial optimization problems as well.
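To make the notion of a presortedness measure used in the sorting experiments concrete, the sketch below counts inversions, i.e. pairs that appear in the wrong order. The abstract does not say which measures were studied, so choosing inversions is an assumption, and `count_inversions` is a hypothetical helper name. The count is obtained as a by-product of merge sort in O(n log n) time.

```python
def count_inversions(seq):
    """Return (sorted copy, number of pairs i < j with seq[i] > seq[j])."""
    items = list(seq)
    if len(items) <= 1:
        return items, 0
    mid = len(items) // 2
    left, inv_left = count_inversions(items[:mid])
    right, inv_right = count_inversions(items[mid:])
    merged = []
    inversions = inv_left + inv_right
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] jumps ahead of every element still waiting in left.
            merged.append(right[j])
            j += 1
            inversions += len(left) - i
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inversions
```

A sorted input gives 0 inversions and a reversed input of length n gives n(n-1)/2, so the value can serve as one instance feature for an algorithm selection model.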
TL;DR: An incremental learning algorithm based on weighted majority voting of an ensemble of classifiers is introduced for supervised neural networks, where the voting weights are updated dynamically based on the current test input of unknown class.
Abstract: An incremental learning algorithm based on weighted majority voting of an ensemble of classifiers is introduced for supervised neural networks, where the voting weights are updated dynamically based on the current test input of unknown class. The algorithm's dynamic voting weight update feature is an enhancement to our previously introduced incremental learning algorithm, Learn++. The algorithm is capable of incrementally learning new information from additional datasets that may later become available, even when the new datasets include instances from additional classes that were not previously seen. Furthermore, the algorithm retains formerly acquired knowledge without requiring access to datasets used earlier, attaining a delicate balance on the stability-plasticity dilemma. The algorithm creates additional ensembles of classifiers based on an iteratively updated distribution function on the training data that favors training with increasingly difficult to learn, previously not learned and/or unseen instances. The final classification is made by weighted majority voting of all classifier outputs in the ensemble, where the voting weights are determined dynamically during actual testing, based on the estimated performance of each classifier on the current test data instance. We present the algorithm in its entirety, as well as its promising simulation results on two real world applications.
TL;DR: This paper advocates the use of graph-based generative probability models and their associated inference and learning algorithms for computer vision and scene analysis and describes how each technique can be applied to an illustrative example of inference and learning in models of multiple, occluding objects.
Abstract: Computer vision is currently one of the most exciting areas of artificial intelligence research, largely because it has recently become possible to record, store and process large amounts of visual data. Impressive results have been obtained by applying discriminative techniques in an ad hoc fashion to large amounts of data, e.g., using support vector machines for detecting face patterns in images. However, it is even more exciting that researchers may be on the verge of introducing computer vision systems that perform realistic scene analysis, decomposing a video into its constituent objects, lighting conditions, motion patterns, and so on. In our view, two of the main challenges in computer vision are finding efficient models of the physics of visual scenes and finding efficient algorithms for inference and learning in these models. In this paper, we advocate the use of graph-based generative probability models and their associated inference and learning algorithms for computer vision and scene analysis. We review exact techniques and various approximate, computationally efficient techniques, including iterative conditional modes, the expectation maximization algorithm, the mean field method, variational techniques, structured variational techniques, Gibbs sampling, the sum-product algorithm and “loopy” belief propagation. We describe how each technique can be applied to an illustrative example of inference and learning in models of multiple, occluding objects, and compare the performances of the techniques.
TL;DR: The extensive simulations demonstrate that the CNN-ART algorithm does outperform other algorithms like LBG, self-organizing feature map and differential competitive learning.
TL;DR: It is shown that the K-NN algorithm has options for weight setting, normalization, and editing the data, and that it can be used to develop hybrid systems for data mining, while the C4.5 algorithm can generate rules from a single tree with the ability to transform multiple decision trees into a set of classification rules.
Abstract: Summary form only given. Data mining is considered a fast growing technology as a result of the combination of some existing technologies such as machine learning, database systems, statistics and visualization. Some data mining algorithms have been used to offer a solution to classification problems in databases. To explain this task, a comparison between the k-nearest neighbor (K-NN) and C4.5 algorithms in terms of their performance as classifiers is carried out. While the K-NN is a supervised learning algorithm, C4.5 is an inductive learning algorithm. It is shown that the K-NN algorithm has options for weight setting, normalization, and editing the data, and that it can be used to develop hybrid systems for data mining. It is also shown that the C4.5 algorithm can generate rules from a single tree with the ability to transform multiple decision trees into a set of classification rules, and that it can be used to better scale up rule generation in terms of size and number of rules and learning time.
TL;DR: A DNA-based massively parallel exhaustive search is applied to solving the computational learning problems of DNF (disjunctive normal form) Boolean formulae and it is shown that the class of k-term DNF formulae and the class of general DNF formulae are efficiently learnable on a DNA computer.
Abstract: We apply a DNA-based massively parallel exhaustive search to solving the computational learning problems of DNF (disjunctive normal form) Boolean formulae. Learning DNF formulae from examples is one of the most important open problems in computational learning theory, and the problem of learning 3-term DNF formulae is known to be intractable if RP ≠ NP. We propose new methods to encode any k-term DNF formula to a DNA strand, evaluate the encoded DNF formula for a truth-value assignment by using hybridization and primer extension with DNA polymerase, and find a DNF formula consistent with the given examples. By employing these methods, we show that the class of k-term DNF formulae (for any constant k) and the class of general DNF formulae are efficiently learnable on a DNA computer.
Second, in order for the DNA-based learning algorithm to be robust to errors in the data, we implement the weighted majority algorithm on DNA computers, called the DNA-based majority algorithm via amplification (DNAMA), which takes a strategy of "amplifying" the consistent (correct) DNA strands. We give a theoretical analysis of the mistake bound of the DNA-based majority algorithm via amplification, which implies that the amplification to "double the volumes" of the correct DNA strands in the test tube works well.
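The amplification idea can be mimicked in software. The sketch below is a plain in-memory variant of the weighted majority algorithm in which, instead of halving the weight of experts that err, the weight of every correct expert is doubled, loosely mirroring "doubling the volumes" of correct strands. It is only an analogy under that assumption: nothing here models the DNA operations, and `weighted_majority_amplify` is a hypothetical name.

```python
def weighted_majority_amplify(expert_predictions, labels):
    """expert_predictions[t][i] is expert i's 0/1 prediction in round t.

    Returns the number of mistakes of the weighted vote and the final weights.
    """
    n_experts = len(expert_predictions[0])
    weights = [1.0] * n_experts
    mistakes = 0
    for preds, label in zip(expert_predictions, labels):
        vote_one = sum(w for w, p in zip(weights, preds) if p == 1)
        vote_zero = sum(w for w, p in zip(weights, preds) if p == 0)
        guess = 1 if vote_one >= vote_zero else 0
        if guess != label:
            mistakes += 1
        # Amplify (double) every expert that agreed with the true label.
        weights = [w * 2.0 if p == label else w
                   for w, p in zip(weights, preds)]
    return mistakes, weights
```

Doubling the correct experts and halving the incorrect ones lead to the same votes, because only the ratios between weights matter for the majority decision.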
TL;DR: It is shown that owing to "hill-climbing" problem solving, the characteristics of the classifier made with the help of the new algorithm became significantly better.
Abstract: We describe experiments with machine learning algorithms (ID3, C4.5, Bagged-C4.5, Boosted-C4.5 and Naive Bayes) and an algorithm made on the basis of a combination of genetic algorithms (GA) and ID3. To perform the experiments, the latter algorithm is implemented as an extension of the MLC++ library of Stanford University. The behaviour of the algorithm is tested using 24 databases, including databases with a large number of attributes. It is shown that owing to "hill-climbing" problem solving, the characteristics of the classifier made with the help of the new algorithm became significantly better. The behaviour of the algorithm is also examined when constructing pruned classifiers. Ways to improve standard machine learning algorithms are suggested.
TL;DR: The main contribution of this work is to prove that, in the on-line mistake-bounded model of learning, a multi-class sub-expert learning algorithm has the same mistake bounds as a related two class linear-threshold algorithm.
Abstract: We present a new type of multi-class learning algorithm called a linear-max algorithm. Linear-max algorithms learn with a special type of attribute called a sub-expert. A sub-expert is a vector attribute that has a value for each output class. The goal of the multi-class algorithm is to learn a linear function combining the sub-experts and to use this linear function to make correct class predictions. The main contribution of this work is to prove that, in the on-line mistake-bounded model of learning, a multi-class sub-expert learning algorithm has the same mistake bounds as a related two class linear-threshold algorithm. We apply these techniques to three linear-threshold algorithms: Perceptron, Winnow, and Romma. We show these algorithms give good performance on artificial and real datasets.
TL;DR: It is illustrated how the local algorithm provides a very minimal approach when determining the fixed-points, reminiscent of, but improving upon, what is known as Pending Analysis, by tailoring the general algorithm to concrete examples in such (apparently) diverse areas as type inference, model checking, and strictness analysis.
Abstract: We present a very simple, yet general algorithm for computing simultaneous, minimum fixed-points of monotonic functions, or turning the viewpoint slightly, an algorithm for computing minimum solutions to a system of monotonic equations. The algorithm is local (demand-driven, lazy, ...), i.e. it will try to determine the value of a single component in the simultaneous fixed-point by investigating only certain necessary parts of the description of the monotonic function, or in terms of the equational presentation, it will determine the value of a single variable by investigating only a part of the equational system. In the worst case this involves inspecting the complete system, and the algorithm will be a logarithmic factor worse than a global algorithm (computing the values of all variables simultaneously). But despite its simplicity the local algorithm has some advantages which promise much better performance on typical cases. The algorithm should be seen as a schema that for any particular application needs to be refined to achieve better efficiency, but the general mechanism remains the same. As such it seems to achieve performance comparable to, and for some examples improving upon, carefully designed ad hoc algorithms, still maintaining the benefits of being local. We illustrate this point by tailoring the general algorithm to concrete examples in such (apparently) diverse areas as type inference, model checking, and strictness analysis. Especially in connection with the last example, strictness analysis, and more generally abstract interpretation, it is illustrated how the local algorithm provides a very minimal approach when determining the fixed-points, reminiscent of, but improving upon, what is known as Pending Analysis. In the case of model checking a specialised version of the algorithm has already improved on earlier known local algorithms.
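The idea of solving only the part of an equation system that a query needs can be sketched as follows for Boolean variables. This is not the algorithm of the paper: it assumes that each equation declares its dependencies up front (the paper's algorithm discovers them on demand), and the representation of `equations` and the name `local_least_fixed_point` are hypothetical. Starting from False everywhere and re-evaluating with a worklist yields the minimum solution when every right-hand side is monotonic.

```python
def local_least_fixed_point(equations, query):
    """equations maps a variable to (dependencies, function(env) -> bool).

    Returns the minimum solution restricted to the variables that `query`
    (transitively) depends on; unrelated equations are never evaluated.
    """
    # Step 1: collect only the variables reachable from the query.
    needed, stack = set(), [query]
    while stack:
        var = stack.pop()
        if var in needed:
            continue
        needed.add(var)
        stack.extend(equations[var][0])

    # Step 2: worklist iteration from the bottom element (all False).
    env = {var: False for var in needed}
    dependents = {var: [] for var in needed}
    for var in needed:
        for dep in equations[var][0]:
            dependents[dep].append(var)
    worklist = list(needed)
    while worklist:
        var = worklist.pop()
        value = equations[var][1](env)
        if value != env[var]:
            env[var] = value
            # Anything that reads this variable must be re-evaluated.
            worklist.extend(dependents[var])
    return env
```

With monotonic right-hand sides each variable can change at most once (from False to True), so the loop terminates.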
TL;DR: A faster implementation of the ISOCLUS algorithm is developed, based on a recent acceleration to the k-means algorithm, and it is shown that it is possible to achieve essentially the same results as ISOCLUS on large data sets, but with significantly lower running times.
Abstract: Unsupervised clustering is a fundamental building block in numerous image processing applications. One of the most popular and widely used clustering schemes for remote sensing applications is the ISOCLUS algorithm, which is based on the ISODATA method. The algorithm is given a set of n data points in d-dimensional space, an integer k indicating the initial number of clusters, and a number of additional parameters. The general goal is to compute the coordinates of a set of cluster centers in d-space, such that those centers minimize the mean squared distance from each data point to its nearest center. This clustering algorithm is similar to another well-known clustering method, called k-means. One significant feature of ISOCLUS over k-means is that the actual number of clusters reported might be fewer or more than the number supplied as part of the input. The algorithm uses different heuristics to determine whether to merge or split clusters. As ISOCLUS can run very slowly, particularly on large data sets, there has been a growing interest in the remote sensing community in computing it efficiently. We have developed a faster implementation of the ISOCLUS algorithm. Our improvement is based on a recent acceleration to the k-means algorithm of Kanungo et al. They showed that, by using a kd-tree data structure for storing the data, it is possible to reduce the running time of k-means. We have adapted this method for the ISOCLUS algorithm, and we show that it is possible to achieve essentially the same results as ISOCLUS on large data sets, but with significantly lower running times. This adaptation involves computing a number of cluster statistics that are needed for ISOCLUS but not for k-means. Both the k-means and ISOCLUS algorithms are based on iterative schemes, in which nearest neighbors are calculated until some convergence criterion is satisfied. Each iteration requires that the nearest center for each data point be computed.
Naively, this requires O(kn) time, where k denotes the current number of centers. Traditional techniques for accelerating nearest neighbor searching involve storing the k centers in a data structure. However, because of the iterative nature of the algorithm, this data structure would need to be rebuilt with each new iteration. Our approach is to store the data points in a kd-tree data structure. The assignment of points to nearest neighbors is carried out by a filtering process, which successively eliminates centers that cannot possibly be the nearest neighbor for a given region of space. This algorithm is significantly faster, because large groups of data points can be assigned to their nearest center in a single operation. Preliminary results on a number of real Landsat datasets show that our revised ISOCLUS-like scheme runs about twice as fast.
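For reference, the naive iteration that the kd-tree filtering is meant to speed up can be written in a few lines. The sketch below shows only this O(kn) baseline step (assign every point to its nearest center, then recompute each center as the mean of its points); it does not implement the kd-tree filtering or the ISOCLUS merge and split heuristics, and `assign_and_update` is a hypothetical name.

```python
def assign_and_update(points, centers):
    """One naive iteration: k distance evaluations for each of the n points."""
    assignments = []
    for p in points:
        # Squared Euclidean distance to every center: the O(k) inner loop.
        distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
        assignments.append(distances.index(min(distances)))

    new_centers = []
    for j, old in enumerate(centers):
        members = [p for p, a in zip(points, assignments) if a == j]
        if not members:
            new_centers.append(old)  # leave the center of an empty cluster as is
            continue
        new_centers.append(tuple(sum(m[d] for m in members) / len(members)
                                 for d in range(len(old))))
    return assignments, new_centers
```

Calling the function repeatedly until the assignments stop changing gives the plain k-means loop; the filtering approach replaces the per-point inner loop with per-region decisions over the kd-tree.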
TL;DR: The experimental results show that quality rankings based on time may be heavily influenced by the choice of operational scenario and code quality, and possible alternative ranking schemes for the specific case of Dijkstra graph search algorithms are explored.
Abstract: Given two algorithms that perform the same task, one may ask which is better. One simple answer is that the algorithm that delivers the "best" answer is the better algorithm. But what if both algorithms deliver results of similar quality? In this case, a common metric that is utilized to differentiate between the two algorithms is the time to find a solution. Measurements, however, must be performed using an implementation of an algorithm (not an abstract algorithm) and must be taken using specific test data. Because the effects of implementation quality and test data selection may be large, the measured time metric is an insufficient measure of algorithm performance and quality. In this paper we present the specific case of several different implementations of the same Dijkstra graph search algorithm applied to graphs with various branching factors. Our experimental results show that quality rankings based on time may be heavily influenced by the choice of operational scenario and code quality. In addition, we explore possible alternative ranking schemes for the specific case of Dijkstra graph search algorithms.
TL;DR: This algorithm is applied for the image recognition of two special animal fibers and shows that this algorithm is more effective than the traditional SVM while the classification precision is also guaranteed.
TL;DR: An evolutionary algorithm for constrained optimization with equality and inequality constraints is proposed, based on a single operator called stochastic weighted learning, i.e., in each generation each individual learns from other individuals specified with stochastic weight coefficients.
Abstract: In this paper, we propose an evolutionary algorithm based on a single operator called stochastic weighted learning for continuous optimization. Unlike most other EAs that have different selection strategies, mutation rules and crossover operators, the proposed algorithm uses only one operator that mimics the strategy learning process of rational economic agents, i.e., each agent in a population updates its strategy to improve its fitness by learning from other agents' strategies specified with stochastic weight coefficients, to achieve the objective of optimization. Experimental results on several optimization problems and comparisons with other evolutionary algorithms show the efficiency of the proposed algorithm.
TL;DR: The granularity calculating of quotient space theory, applied to classification problems in machine learning, can recognize different sorts of instances whose features are very similar, improve the generalization of recognition, and reduce the complexity of calculation.
Abstract: This paper puts forward the use of granularity calculating of quotient space theory for dealing with classification problems in machine learning. According to the prior knowledge or the clustering of the training examples, training data are reorganized with granularity to form new instances, and learning is performed on the new instances. The different sorts of instances, which are combined in granularity, are classified through different layers of classifiers. In this way, the difficulty of the learning is reduced, the capacity of learning from instances is increased, and the classifying accuracy is improved. At the same time, the method can recognize the different sorts of instances whose features are very similar, improve its generalization of recognition, and reduce the complexity of calculation. The detailed procedures of the method using the covering algorithm and its experimental results are presented. The results show that the method is effective.
TL;DR: This short report focuses on the learning of regular grammars (those languages accepted by finite state machines) by making use of heuristic search algorithms to direct the search.
Abstract: When humans efficiently infer complex functions from relatively few but well-chosen examples, something beyond exhaustive search must probably be at work. Different heuristics are often made use of during this learning process in order to efficiently infer target functions. Our current research focuses on different heuristics through which regular grammars can be efficiently inferred from a minimal amount of examples. A brief introduction to the theory of grammatical inference is given, followed by a brief discussion of the current state of the art in automata learning and methods currently under development which we believe can improve automata learning when using sparse data.
1 Grammatical Inference
A typical definition for learning would be the act, process, or experience of gaining knowledge. Within the field of machine learning this process of gaining knowledge is achieved by applying a number of techniques, mainly those relying on heuristic search algorithms, rule-based systems, neural networks and genetic algorithms. This short report focuses on the learning of regular grammars (those languages accepted by finite state machines) by making use of heuristic search algorithms to direct the search. The process of learning grammars from a given set of data is referred to as grammatical inference (GI). Automata learning is the process of generalizing from a finite set of labelled examples, the language (FSA) which generated them. Let us say that we’ve got the +ve example set {10, 20, 30, 80}. These examples are positive since they are labelled "accepted" by the target language. We can immediately infer that the target language is that of integers divisible by 10 (or rather strings whose length is divisible by 10). However, by overgeneralizing we can also infer that the language is that of even integers (strings whose length is divisible by 2).
Both are correct; however, as we’ll be outlining in the next section, this example illustrates how vital the training sample is (both +ve and -ve samples) for efficient, correct grammatical inference. The field of grammatical inference finds practical applications within areas such as syntactic pattern recognition, adaptive intelligent agents, computational biology, natural language acquisition and knowledge discovery as illustrated in [6]. In the next section we will be discussing some theoretical background.
2 Preliminaries
Automata learning or identification can be formally expressed as a decision problem.
Sandro Spina
Given an integer n and two disjoint sets of words D+ and D− over a finite alphabet Σ, does there exist a DFA consistent with D+ and D− and having a number of states less than or equal to n?
The most classical and frequently used paradigm for language learning is that proposed by Gold [3], namely language identification in the limit. There are two main variations of this paradigm. In the first one the learner can make use of as much data as necessary. The learning algorithm is supplied with a growing sequence of examples compatible with the target automaton. At each step the learner proposes a hypothesis DFA, representing the guessed solution. The algorithm is said to have the identification in the limit property if the hypothesis (consistent with all learning data) remains unchanged after a finite number of guesses. In the second case the number of available learning examples is fixed and the learning algorithm must propose one hypothesis from this set of examples. This algorithm is said to have the identification in the limit property if, for any target machine A, it is possible to define a set D_A of training examples called the representative sample (characteristic set) of L(A) [4].
Our work currently focuses on this second variation, where we are trying to determine lower bounds on the sparsity of the training data that still allow certain classes of regular languages to be identified. Gold [3] has proved that this decision problem is NP-complete; however, if the sets D+ and D− are somehow representative of the target automaton, there exist a number of algorithms that solve the considered problem in deterministic polynomial time. In the next section we’ll be describing two main GI algorithms.
3 Learning Algorithms
The first algorithm is due to Trakhtenbrot and Barzdin [5]. A uniformly complete data set is required for their algorithm to find the smallest DFA that recognizes the language. Their algorithm was rediscovered by Gold in 1978 and applied to the grammatical inference problem; however, in this case uniformly complete samples are not required. A second algorithm, RPNI (Regular Positive and Negative Inference), was proposed by Oncina and Garcia in 1992. Lang [5] proposed another algorithm that behaves in exactly the same way as RPNI during the same year. The RPNI algorithm is based on merging states in the prefix tree acceptor of the sample. Both algorithms are based on searching for equivalent states. These algorithms had a major impact in the field, since now languages of infinite size became learnable. Lang proved empirically that the average case is tractable. Different control strategies (heuristics) can be adopted to explore the space of DFA constructions. At each step a number of merges are possible, thus the merging order of equivalent states determines the correctness of the generated target language hypothesis. To make things clear, let us consider the regular expression ab∗a, with D+ = {aba, aa, abbbba} and D− = {b, ab, abbb}. The Augmented Prefix Tree Acceptor (APTA) for these training sets is shown in figure 1.
Note that final (accepting) states are labelled 1, non-final (rejecting) states are labelled 0 and unknown states are marked ?. The task of the learning algorithms is to determine the correct labelling for the states marked with a ?. The learning algorithm proceeds by merging states in the APTA, until no more merges are possible. Rodney Price [2] proposed an evidence driven heuristic for merging states. Essentially this algorithm (EDSM) works as follows:
1. Evaluate all possible pairings of nodes within the APTA
A Risk Driven State Merging Algorithm for Learning DFAs
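Since figure 1 is not reproduced here, the APTA for the sample sets D+ = {aba, aa, abbbba} and D− = {b, ab, abbb} can be rebuilt with a short sketch. It follows the labelling convention stated in the text (1 for accepting, 0 for rejecting, ? for unknown); the state numbering, the data layout and the function name `build_apta` are assumptions made for illustration. State merging itself is not shown.

```python
def build_apta(positive, negative):
    """Build an augmented prefix tree acceptor.

    Returns (transitions, labels) where transitions[state][symbol] is the
    next state and labels[state] is 1, 0 or "?". State 0 is the root.
    """
    transitions = [{}]
    labels = ["?"]

    def insert(word, label):
        state = 0
        for symbol in word:
            if symbol not in transitions[state]:
                # Create a fresh, still unlabelled state for this prefix.
                transitions.append({})
                labels.append("?")
                transitions[state][symbol] = len(transitions) - 1
            state = transitions[state][symbol]
        if labels[state] not in ("?", label):
            raise ValueError(f"{word!r} is labelled both positive and negative")
        labels[state] = label

    for word in positive:
        insert(word, 1)
    for word in negative:
        insert(word, 0)
    return transitions, labels


if __name__ == "__main__":
    trans, labs = build_apta(["aba", "aa", "abbbba"], ["b", "ab", "abbb"])
    for state, (edges, label) in enumerate(zip(trans, labs)):
        print(state, label, edges)
```

For this sample the tree has ten states, four of which (the prefixes ε, a, abb and abbbb) remain marked ?; these are the states whose labelling a state merging algorithm has to decide.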
TL;DR: Results suggest that the ensemble method learns drifting concepts almost as well as the base algorithms learn each concept individually, and the reported results are the best overall for these problems to date.
Abstract: Algorithms for tracking concept drift are important for many applications. We present a general method based on the weighted majority algorithm for using any online learner for concept drift. Dynamic weighted majority (DWM) maintains an ensemble of base learners, predicts using a weighted-majority vote of these "experts", and dynamically creates and deletes experts in response to changes in performance. We empirically evaluated two experimental systems based on the method using incremental naive Bayes and incremental tree inducer [ITI] as experts. For the sake of comparison, we also included Blum's implementation of weighted majority. On the STAGGER concepts and on the SEA concepts, results suggest that the ensemble method learns drifting concepts almost as well as the base algorithms learn each concept individually. Indeed, we report the best overall results for these problems to date.
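The mechanism described above (weighted vote, weight decay for experts that err, creation and deletion of experts) can be sketched roughly as follows. The parameter values `beta` and `theta`, the choice to update after every example, and the toy base learner `MajorityLabelLearner` are all assumptions made for illustration and are not taken from the abstract; the base learner simply predicts the most frequent label it has seen and ignores the input.

```python
class MajorityLabelLearner:
    """Toy online learner (hypothetical stand-in for naive Bayes or ITI)."""

    def __init__(self):
        self.counts = {}

    def predict(self, x):
        return max(self.counts, key=self.counts.get) if self.counts else 0

    def learn(self, x, y):
        self.counts[y] = self.counts.get(y, 0) + 1


def dynamic_weighted_majority(stream, make_learner, beta=0.5, theta=0.01):
    """Process (x, y) pairs; return (ensemble mistakes, final experts)."""
    experts = [[make_learner(), 1.0]]          # [learner, weight] pairs
    mistakes = 0
    for x, y in stream:
        votes = {}
        for expert in experts:
            guess = expert[0].predict(x)
            if guess != y:
                expert[1] *= beta              # penalise an expert that errs
            votes[guess] = votes.get(guess, 0.0) + expert[1]
        prediction = max(votes, key=votes.get)
        if prediction != y:
            mistakes += 1
        # Normalise so the best weight is 1, then drop very weak experts.
        top = max(weight for _, weight in experts)
        experts = [[learner, weight / top] for learner, weight in experts
                   if weight / top >= theta]
        if prediction != y:
            experts.append([make_learner(), 1.0])   # add a fresh expert
        for learner, _ in experts:
            learner.learn(x, y)
    return mistakes, experts
```

On a stream whose label switches abruptly, the old expert keeps losing weight until it is removed, while the expert created at the moment of the first ensemble mistake takes over.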
TL;DR: This paper describes a family of additive ultraconservative algorithms where each algorithm in the family updates its prototypes by finding a feasible solution for a set of linear constraints that depend on the instantaneous similarity-scores.
Abstract: In this paper we study a paradigm to generalize online classification algorithms for binary classification problems to multiclass problems. The particular hypotheses we investigate maintain one prototype vector per class. Given an input instance, a multiclass hypothesis computes a similarity-score between each prototype and the input instance and sets the predicted label to be the index of the prototype achieving the highest similarity. To design and analyze the learning algorithms in this paper we introduce the notion of ultraconservativeness. Ultraconservative algorithms are algorithms that update only the prototypes attaining similarity-scores which are higher than the score of the correct label's prototype. We start by describing a family of additive ultraconservative algorithms where each algorithm in the family updates its prototypes by finding a feasible solution for a set of linear constraints that depend on the instantaneous similarity-scores. We then discuss a specific online algorithm that seeks a set of prototypes which have a small norm. The resulting algorithm, which we term MIRA (for Margin Infused Relaxed Algorithm) is ultraconservative as well. We derive mistake bounds for all the algorithms and provide further analysis of MIRA using a generalized notion of the margin for multiclass problems. We discuss the form the algorithms take in the binary case and show that all the algorithms from the first family reduce to the Perceptron algorithm while MIRA provides a new Perceptron-like algorithm with a margin-dependent learning rate. We then return to multiclass problems and describe an analogous multiplicative family of algorithms with corresponding mistake bounds. We end the formal part by deriving and analyzing a multiclass version of Li and Long's ROMMA algorithm. We conclude with a discussion of experimental results that demonstrate the merits of our algorithms.
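A minimal sketch of one additive ultraconservative update may help fix ideas. It keeps one prototype per class, predicts the class with the highest similarity-score, and on an error adds the instance to the correct prototype while subtracting it, in equal shares, from the prototypes whose score is at least that of the correct label. The uniform split is one admissible choice assumed here for illustration; MIRA and the multiplicative family are not shown, and the function names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_multiclass_perceptron(examples, n_classes, n_features, epochs=10):
    """examples: list of (feature vector, class index). Returns prototypes."""
    prototypes = [[0.0] * n_features for _ in range(n_classes)]
    for _ in range(epochs):
        for x, y in examples:
            scores = [dot(w, x) for w in prototypes]
            # Error set: wrong classes scoring at least as high as the correct one.
            errors = [r for r in range(n_classes)
                      if r != y and scores[r] >= scores[y]]
            if not errors:
                continue  # no update when the correct class wins outright
            share = 1.0 / len(errors)
            for r in errors:
                prototypes[r] = [w - share * xi
                                 for w, xi in zip(prototypes[r], x)]
            prototypes[y] = [w + xi for w, xi in zip(prototypes[y], x)]
    return prototypes


def predict(prototypes, x):
    scores = [dot(w, x) for w in prototypes]
    return max(range(len(scores)), key=scores.__getitem__)
```

With two classes the error set has at most one element, and the update moves the two prototypes in opposite directions by the instance, which matches the reduction to the Perceptron algorithm mentioned in the abstract.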
TL;DR: A problem space genetic algorithm is proposed for single machine total weighted tardiness scheduling problems; it uses global and time-dependent local dominance rules to improve the neighborhood structure of the search space.
Abstract: We propose a problem space genetic algorithm to solve single machine total weighted tardiness scheduling problems. The proposed algorithm utilizes global and time-dependent local dominance rules to improve the neighborhood structure of the search space. These rules are also a powerful exploitation (intensifying) tool, since the global optimum is one of the local optima. Furthermore, the problem space search method significantly enhances the exploration (diversification) capability of the genetic algorithm. In summary, we can improve both solution quality and robustness over the other local search algorithms reported in the literature.
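For reference, the objective named in this abstract can be computed as follows. This is a minimal sketch of the standard total weighted tardiness definition (each job's tardiness is its completion time minus its due date, floored at zero, then weighted and summed). It does not implement the genetic algorithm or the dominance rules, and the `Job` type and function name are hypothetical.

```python
from typing import NamedTuple, Sequence


class Job(NamedTuple):
    processing_time: int
    due_date: int
    weight: int


def total_weighted_tardiness(jobs: Sequence[Job], order: Sequence[int]) -> int:
    """Sum of weight * max(0, completion - due_date) for jobs run in `order`."""
    clock = 0
    total = 0
    for idx in order:
        job = jobs[idx]
        clock += job.processing_time  # completion time on the single machine
        total += job.weight * max(0, clock - job.due_date)
    return total


if __name__ == "__main__":
    jobs = [Job(3, 4, 2), Job(2, 2, 1), Job(4, 5, 3)]
    for order in ([0, 1, 2], [1, 0, 2]):
        print(order, total_weighted_tardiness(jobs, order))
```

A search method such as the one described would evaluate many candidate orders with a function of this kind and keep the sequence with the lowest value.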
TL;DR: A meta-learning method that uses a k-Nearest Neighbor algorithm to identify the datasets that are most similar to the one at hand and leads to significantly better rankings than the baseline ranking method.
Abstract: We present a meta-learning method to support selection of candidate learning algorithms. It uses a k-Nearest Neighbor algorithm to identify the datasets that are most similar to the one at hand. The distance between datasets is assessed using a relatively small set of data characteristics, which was selected to represent properties that affect algorithm performance. The performance of the candidate algorithms on those datasets is used to generate a recommendation to the user in the form of a ranking. The performance is assessed using a multicriteria evaluation measure that takes not only accuracy, but also time into account. As it is not common in Machine Learning to work with rankings, we had to identify and adapt existing statistical techniques to devise an appropriate evaluation methodology. Using that methodology, we show that the meta-learning method presented leads to significantly better rankings than the baseline ranking method. The evaluation methodology is general and can be adapted to other ranking problems. Although here we have concentrated on ranking classification algorithms, the meta-learning framework presented can provide assistance in the selection of combinations of methods or more complex problem solving strategies.
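The recommendation step described above can be sketched roughly as follows. The sketch assumes Euclidean distance over the data characteristics and aggregates the neighbours' results by average rank; both are illustrative choices, and the multicriteria measure combining accuracy and time is not reproduced. All names and the toy data are hypothetical.

```python
def euclidean(u, v):
    """Distance between two datasets' characteristic vectors (assumed metric)."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


def nearest_datasets(target, meta_features, k):
    """Return the names of the k stored datasets closest to `target`."""
    return sorted(meta_features, key=lambda name: euclidean(target, meta_features[name]))[:k]


def recommend_ranking(target, meta_features, ranks, k):
    """Rank algorithms by their mean rank on the k most similar datasets.

    `ranks[dataset][algorithm]` is the algorithm's rank on that dataset
    (1 = best). Ties in mean rank are broken alphabetically.
    """
    neighbours = nearest_datasets(target, meta_features, k)
    algorithms = sorted(ranks[neighbours[0]])
    mean_rank = {
        a: sum(ranks[d][a] for d in neighbours) / len(neighbours)
        for a in algorithms
    }
    return sorted(algorithms, key=lambda a: (mean_rank[a], a))


if __name__ == "__main__":
    meta_features = {"d1": [0.0, 0.0], "d2": [1.0, 0.0], "d3": [10.0, 10.0]}
    ranks = {
        "d1": {"A": 1, "B": 2, "C": 3},
        "d2": {"A": 2, "B": 3, "C": 1},
        "d3": {"A": 3, "B": 2, "C": 1},
    }
    print(recommend_ranking([0.2, 0.0], meta_features, ranks, k=2))
```

In practice the characteristics would need to be normalized before computing distances, since they can differ widely in scale.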
TL;DR: Building on known techniques, this work shows that, given a collection of k algorithms, one can construct a master algorithm that performs similarly to the best of the k for the acceptance problem and another for the rejection problem; the two master algorithms are then combined into a single algorithm that guarantees both rejection and acceptance competitiveness.
Abstract: Resource allocation and admission control are critical tasks in a communication network that often must be performed online. Algorithms for these types of problems have been considered both under benefit models (e.g., with a goal of approximately maximizing the number of calls accepted) and under cost models (e.g., with a goal of approximately minimizing the number of calls rejected). Unfortunately, algorithms designed for these two measures can often be quite different, even polar opposites (e.g., [1, 8]). In this work we consider the problem of combining algorithms designed for each of these objectives in a way that is simultaneously good under both measures. More formally, we are given an algorithm A which is $c_A$-competitive w.r.t. the number of accepted calls and an algorithm R which is $c_R$-competitive w.r.t. the number of rejected calls. We derive a combined algorithm whose competitive ratio is $O(c_R c_A)$ for rejection and $O(c_A^2)$ for acceptance. We also show, building on known techniques, that given a collection of k algorithms, we can construct one master algorithm which performs similarly to the best algorithm among the k for the acceptance problem and another master algorithm which performs similarly to the best algorithm among the k for the rejection problem. Using our main result, we can combine the two master algorithms into a single algorithm which guarantees both rejection and acceptance competitiveness.
TL;DR: A new family of topic-ranking algorithms for multi-labeled documents is described; the algorithms achieve state-of-the-art results and outperform topic-ranking adaptations of Rocchio's algorithm and of the Perceptron algorithm.
Abstract: We describe a new family of topic-ranking algorithms for multi-labeled documents. The motivation for the algorithms stems from recent advances in online learning algorithms. The algorithms are simple to implement and are also time and memory efficient. We provide a unified analysis of the family of algorithms in the mistake bound model. We then discuss experiments with the proposed family of topic-ranking algorithms on the Reuters-21578 corpus and the new corpus released by Reuters in 2000. On both corpora, the algorithms we present achieve state-of-the-art results and outperform topic-ranking adaptations of Rocchio's algorithm and of the Perceptron algorithm.