TL;DR: This work designs efficient on-line algorithms that predict nearly as well as the best pruning of a planar decision graph and implicitly maintains one weight for each of the prunings.
Abstract: We design efficient on-line algorithms that predict nearly as well as the best pruning of a planar decision graph. We assume that the graph has no cycles. As in the previous work on decision trees, we implicitly maintain one weight for each of the prunings (exponentially many). The method works for a large class of algorithms that update their weights multiplicatively. It can also be used to design algorithms that predict nearly as well as the best convex combination of prunings.
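As a rough illustration of what a multiplicative weight update looks like, the sketch below runs an exponential-weights scheme over a small, explicit set of experts. It is not the paper's algorithm: the paper maintains the weights of exponentially many prunings implicitly, whereas this sketch stores one weight per expert. The function names, the absolute loss, and the learning rate `eta` are assumptions made for the example.

```python
import math

def hedge_predict(weights, expert_preds):
    """Weighted average of the experts' predictions (each in [0, 1])."""
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, expert_preds)) / total

def hedge_update(weights, expert_preds, outcome, eta=0.5):
    """Multiply each weight by exp(-eta * loss), using absolute loss."""
    return [w * math.exp(-eta * abs(p - outcome))
            for w, p in zip(weights, expert_preds)]

# Three hypothetical experts standing in for prunings; expert 0 is always right.
weights = [1.0, 1.0, 1.0]
rounds = [([1, 0, 1], 1), ([0, 0, 1], 0), ([1, 1, 0], 1)]
for preds, outcome in rounds:
    weights = hedge_update(weights, preds, outcome)
```

After these rounds the expert that never erred keeps its initial weight, while the others are discounted in proportion to their accumulated loss.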
TL;DR: It is shown that, when both positive and negative data are available, restrictions on the accessibility of the input data do not limit the learning capabilities if and only if the relevant iterative learners are allowed to query the history of the learning process or to store at least one carefully selected data element.
Abstract: This paper provides a systematic study of incremental learning from noise-free and from noisy data. As usual, we distinguish between learning from positive data and learning from positive and negative data, synonymously called learning from text and learning from informant. Our study relies on the notion of noisy data introduced by Stephan. The basic scenario, named iterative learning, is as follows. In every learning stage, an algorithmic learner takes as input one element of an information sequence for some target concept and its previously made hypothesis and outputs a new hypothesis. The sequence of hypotheses has to converge to a hypothesis describing the target concept correctly. We study the following refinements of this basic scenario. Bounded example-memory inference generalizes iterative inference by allowing an iterative learner to additionally store an a priori bounded number of carefully chosen data elements, while feedback learning generalizes it by allowing the iterative learner to additionally ask whether or not a particular data element did already appear in the input data seen so far. For the case of learning from noise-free data, we show that, when both positive and negative data are available, restrictions on the accessibility of the input data do not limit the learning capabilities if and only if the relevant iterative learners are allowed to query the history of the learning process or to store at least one carefully selected data element. This insight nicely contrasts with the fact that, in case only positive data are available, restrictions on the accessibility of the input data seriously affect the learning capabilities of all versions of incremental learners. For the case of learning from noisy data, we present characterizations of all kinds of incremental learning in terms that are independent of learning theory. The relevant conditions are purely structural ones.
Surprisingly, when learning from noisy text and noisy informant is concerned, even iterative learners are exactly as powerful as unconstrained learning devices.
TL;DR: An overview of an inter-disciplinary research project whose goal is to elucidate the complex phenomenon of expressive music performance with the help of machine learning and automated discovery methods is given.
Abstract: The paper gives an overview of an inter-disciplinary research project whose goal is to elucidate the complex phenomenon of expressive music performance with the help of machine learning and automated discovery methods. The general research questions that guide the project are laid out, and some of the most important results achieved so far are briefly summarized (with an emphasis on the most recent and still very speculative work). A broad view of the discovery process is given, from data acquisition issues through data visualization to inductive model building and pattern discovery. It is shown that it is indeed possible for a machine to make novel and interesting discoveries even in a domain like music. The report closes with a few general lessons learned and with the identification of a number of open and challenging research problems.
TL;DR: In this paper, a linear ordered term tree, a rooted tree pattern consisting of ordered tree structures and internal structured variables with distinct variable labels, is proposed to represent structural features common to semistructured data.
Abstract: In the fields of data mining and knowledge discovery, many semistructured data such as HTML/XML files are represented by rooted trees t such that all children of each internal vertex of t are ordered and t has edge labels. In order to represent structural features common to such semistructured data, we propose a linear ordered term tree, which is a rooted tree pattern consisting of ordered tree structures and internal structured variables with distinct variable labels. For a set of edge labels Λ, let OTTΛ be the set of all linear ordered term trees. For a linear ordered term tree t in OTTΛ, the term tree language of t, denoted by LΛ (t), is the set of all ordered trees obtained from t by substituting arbitrary ordered trees for all variables in t. Given a set of ordered trees S, the minimal language problem for OTTLΛ = {LΛ (t) | t ∈ OTTΛ} is to find a linear ordered term tree t in OTTΛ such that LΛ (t) is minimal among all term tree languages which contain all ordered trees in S. We show that the class OTTLΛ is polynomial time inductively inferable from positive data, by giving a polynomial time algorithm for solving the minimal language problem for OTTLΛ.
TL;DR: A boosting algorithm is constructed, which is the first both smooth and adaptive booster, and eventually a boosting "tandem", which allows solving adaptively problems whose solution is based on smooth boosting, preserving the original solution's complexity.
Abstract: We construct a boosting algorithm, which is the first both smooth and adaptive booster. These two features make it possible to achieve performance improvement for many learning tasks whose solutions use a boosting technique. Originally, the boosting approach was suggested for the standard PAC model; we analyze possible applications of boosting in the model of agnostic learning (which is "more realistic" than PAC). We derive a lower bound for the final error achievable by boosting in the agnostic model; we show that our algorithm actually achieves that accuracy (within a constant factor of 2): When the booster faces distribution $D$, its final error is bounded above by $\frac{1}{1/2-s}\,\mathrm{err}_D(F) + \zeta$, where $\mathrm{err}_{D'}(F) + s$ is an upper bound on the error of a hypothesis received from the (agnostic) weak learner when it faces distribution $D'$ and $\zeta$ is any real, so that the complexity of the boosting is polynomial in $1/\zeta$. We note that the idea of applying boosting in the agnostic model was first suggested by Ben-David, Long and Mansour and the above accuracy is an exponential improvement w.r.t. $s$ over their result ($\frac{1}{1/2-s}\,\mathrm{err}_D(F)^{2(1/2-s)^2/\ln(1/s-1)} + \zeta$). Eventually, we construct a boosting "tandem", thus approaching in terms of O the lowest number of the boosting iterations possible, as well as in terms of O the best possible smoothness. This allows solving adaptively problems whose solution is based on smooth boosting (like noise tolerant boosting and DNF membership learning), preserving the original solution's complexity.
TL;DR: This paper discusses the main principles of learning graphical models from data and considers briefly some algorithms that have been proposed for this task as well as data preprocessing methods and evaluation measures.
Abstract: Data Mining, or Knowledge Discovery in Databases, is a fairly young research area that has emerged as a reply to the flood of data we are faced with nowadays. It tries to meet the challenge to develop methods that can help human beings to discover useful patterns in their data. One of these techniques -- and definitely one of the most important, because it can be used for such frequent data mining tasks as classifier construction and dependence analysis -- is learning graphical models from datasets of sample cases. In this paper we review the ideas underlying graphical models, with a special emphasis on the less well known possibilistic networks. We discuss the main principles of learning graphical models from data and consider briefly some algorithms that have been proposed for this task as well as data preprocessing methods and evaluation measures.
TL;DR: The conventional view on nondeterminism in patterns inspired by formal language theory is transformed into an approach that meets the requirements of inductive inference and will lead to some useful learnability criteria for classes of terminal-free extended pattern languages.
Abstract: The question of learnability of the class of extended pattern languages is considered to be one of the oldest and outstanding open problems in inductive inference of formal languages. This paper provides an appropriate answer presenting a subclass - the terminal-free extended pattern languages - that is not learnable in the limit. In order to achieve this result we will have to limit the respective alphabet of terminal symbols to exactly two letters. In addition we will focus on the impact of ambiguity of pattern languages on inductive inference of terminal-free extended pattern languages. The conventional view on nondeterminism in patterns inspired by formal language theory is transformed into an approach that meets the requirements of inductive inference. These studies will lead to some useful learnability criteria for classes of terminal-free extended pattern languages.
TL;DR: A new combinatorial characterization of polynomial learnability from equivalence queries is proved and two models of query learning in which there is a probability distribution on the instance space are proposed.
Abstract: We prove a new combinatorial characterization of polynomial learnability from equivalence queries, and state some of its consequences relating the learnability of a class with the learnability via equivalence and membership queries of its subclasses obtained by restricting the instance space. Then we propose and study two models of query learning in which there is a probability distribution on the instance space, both as an application of the tools developed from the combinatorial characterization and as models of independent interest.
TL;DR: It is possible to construct a TCM (guaranteed to be well-calibrated even if the assumption is wrong) that performs asymptotically as well as the best region predictor under P.
Abstract: Transductive Confidence Machine (TCM) is a way of converting standard machine-learning algorithms into algorithms that output predictive regions rather than point predictions. It has been shown recently that TCM is well-calibrated when used in the on-line mode: at any confidence level 1 - δ, the long-run relative frequency of errors is guaranteed not to exceed δ provided the examples are generated independently from the same probability distribution P. Therefore, the number of "uncertain" predictive regions (i.e., those containing more than one label) becomes the sole measure of performance. The main result of this paper is that for any probability distribution P (assumed to generate the examples), it is possible to construct a TCM (guaranteed to be well-calibrated even if the assumption is wrong) that performs asymptotically as well as the best region predictor under P.
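To make the idea of a predictive region concrete, here is a simplified sketch of a p-value based region predictor. It is an assumption-laden simplification rather than TCM itself: the nonconformity scores of earlier examples are treated as fixed numbers, whereas a full TCM recomputes them for each tentative label. The function names, the score values, and the symbol `delta` for the significance level are illustrative choices.

```python
def p_value(previous_scores, candidate_score):
    """Fraction of scores (candidate included) at least as nonconforming."""
    n = len(previous_scores) + 1
    at_least = 1 + sum(1 for s in previous_scores if s >= candidate_score)
    return at_least / n

def predictive_region(scores_by_label, previous_scores, delta):
    """Keep every label whose p-value exceeds the significance level delta."""
    return {label for label, score in scores_by_label.items()
            if p_value(previous_scores, score) > delta}

# Hypothetical nonconformity scores; larger means "stranger".
previous = [0.1, 0.2, 0.3, 0.4]
candidates = {"a": 0.15, "b": 0.5}
narrow = predictive_region(candidates, previous, delta=0.25)
wide = predictive_region(candidates, previous, delta=0.1)
```

Lowering `delta` (demanding higher confidence) can only enlarge the region, which is why the number of multi-label regions is the natural performance measure once calibration is guaranteed.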
TL;DR: In this paper, the authors consider using online large margin classification algorithms in a setting where the target classifier may change over time and show that an aggressive tuning is often useful even if the goal is just to minimise the number of mistakes.
Abstract: We consider using online large margin classification algorithms in a setting where the target classifier may change over time. The algorithms we consider are Gentile's ALMA, and an algorithm we call NORMA which performs a modified online gradient descent with respect to a regularised risk. The update rule of ALMA includes a projection-based regularisation step, whereas NORMA has a weight decay type of regularisation. For ALMA we can prove mistake bounds in terms of the total distance the target moves during the trial sequence. For NORMA, we need the additional assumption that the movement rate stays sufficiently low uniformly over time. In addition to the movement of the target, the mistake bounds for both algorithms depend on the hinge loss of the target. Both algorithms use a margin parameter which can be tuned to make them mistake-driven (update only when classification error occurs) or more aggressive (update when the confidence of the classification is below the margin). We get similar mistake bounds both for the mistake-driven and a suitable aggressive tuning. Experiments on artificial data confirm that an aggressive tuning is often useful even if the goal is just to minimise the number of mistakes.
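The weight-decay style of update described for NORMA can be sketched for a plain linear classifier as follows. This is a minimal illustration under stated assumptions, not the authors' exact algorithm: it uses a linear model instead of a kernel expansion, and the parameter names `eta` (learning rate), `lam` (regularisation strength), and `margin` are hypothetical.

```python
def norma_step(w, x, y, eta=0.1, lam=0.01, margin=1.0):
    """One online gradient step on the regularised hinge loss (linear case)."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    # Weight decay from the regulariser is applied on every round.
    w = [(1.0 - eta * lam) * wi for wi in w]
    # Aggressive tuning: update whenever the margin is not reached.
    # Setting margin=0.0 updates (roughly) only on mistakes.
    if y * score <= margin:
        w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w

w_after_update = norma_step([0.0, 0.0], [1.0, 0.0], 1)
w_after_decay_only = norma_step([10.0, 0.0], [1.0, 0.0], 1)
```

The decay term lets old information fade, which is what makes this style of update plausible for a target that moves over time.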
TL;DR: It is proved that if arbitrary unions of pattern languages with fixed length substitutions can be learned efficiently then DNFs are efficiently learnable in the mistake bound model.
Abstract: We present efficient on-line algorithms for learning unions of a constant number of tree patterns, unions of a constant number of one-variable pattern languages, and unions of a constant number of pattern languages with fixed length substitutions. By fixed length substitutions we mean that each occurrence of variable xi must be substituted by terminal strings of fixed length l(xi). We prove that if arbitrary unions of pattern languages with fixed length substitutions can be learned efficiently then DNFs are efficiently learnable in the mistake bound model. Since we use a reduction to Winnow, our algorithms are robust against attribute noise. Furthermore, they can be modified to handle concept drift. Also, our approach is quite general and we give results to learn a class that generalizes pattern languages.
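For readers unfamiliar with Winnow, the sketch below shows the standard promotion and demotion update on Boolean attributes. It illustrates only the generic algorithm; how the paper maps patterns to attributes is not reproduced here, and the function names, the threshold `theta`, and the factor `alpha` are assumptions of this example.

```python
def winnow_predict(w, x, theta):
    """Predict 1 if the total weight of active attributes reaches theta."""
    return 1 if sum(wi for wi, xi in zip(w, x) if xi) >= theta else 0

def winnow_update(w, x, y, theta, alpha=2.0):
    """Multiplicative update applied only when a mistake is made."""
    if winnow_predict(w, x, theta) == y:
        return w
    # Promote active attributes on a false negative, demote on a false positive.
    factor = alpha if y == 1 else 1.0 / alpha
    return [wi * factor if xi else wi for wi, xi in zip(w, x)]

w0 = [1.0, 1.0, 1.0, 1.0]
w1 = winnow_update(w0, [1, 0, 0, 0], 1, theta=4)   # false negative
w2 = winnow_update(w1, [1, 1, 1, 1], 0, theta=4)   # false positive
```

Because only the weights of active attributes change, irrelevant attributes that are occasionally flipped have limited influence, which is commonly cited as the reason for Winnow's tolerance of attribute noise.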
TL;DR: The embeddability of a given concept class into a class of Euclidean half spaces of low dimension, or of arbitrarily large dimension but realizing a large margin, is studied.
Abstract: This paper discusses theoretical limitations of classification systems that are based on feature maps and use a separating hyperplane in the feature space. In particular, we study the embeddability of a given concept class into a class of Euclidean half spaces of low dimension, or of arbitrarily large dimension but realizing a large margin. New bounds on the smallest possible dimension or on the largest possible margin are presented. In addition, we present new results on the rigidity of matrices and briefly mention applications in complexity and learning theory.
TL;DR: A mathematics whose foundation is itself learning theoretic, called Limit-Computable Mathematics, is introduced; it was originally conceived as a means for "Proof Animation," which is expected to make interactive formal proof development easier.
Abstract: Learning theoretic aspects of mathematics and logic have been studied by many authors. They study how mathematical and logical objects are algorithmically "learned" (inferred) from finite data. Although the subjects of these studies are mathematical objects, the objective of the studies is learning. In this paper, a mathematics whose foundation is itself learning theoretic will be introduced. It is called Limit-Computable Mathematics. It was originally introduced as a means for "Proof Animation," which is expected to make interactive formal proof development easier. Although the original objective was not learning theoretic at all, learning theory is indispensable for our research.
TL;DR: In this paper, a simple method to perform gradient descent efficiently with feedforward neural networks in reinforcement learning is proposed, and it was tested successfully on an original task of learning to swim by a complex simulated articulated robot with 4 control variables and 12 independent state variables.
Abstract: Local linear function approximators are often preferred to feedforward neural networks to estimate value functions in reinforcement learning. Still, motor tasks usually solved by this kind of method have a low-dimensional state space. This article demonstrates that feedforward neural networks can be applied successfully to high-dimensional problems. The main difficulties of using backpropagation networks in reinforcement learning are reviewed, and a simple method to perform gradient descent efficiently is proposed. It was tested successfully on an original task of learning to swim by a complex simulated articulated robot, with 4 control variables and 12 independent state variables.
TL;DR: We show that connections can actually be made at a fundamental level, and result in a parametrized logic that needs topological notions for its early developments, and notions from learning theory for interpretation and applicability.
Abstract: Many connections have been established between learning and logic, or learning and topology, or logic and topology. Still, the connections are not at the heart of these fields. Each of them is fairly independent of the others when attention is restricted to basic notions and main results. We show that connections can actually be made at a fundamental level, and result in a parametrized logic that needs topological notions for its early developments, and notions from learning theory for interpretation and applicability. One of the key properties of first-order logic is that the classical notion of logical consequence is compact. We generalize the notion of logical consequence, and we generalize compactness to s-weak compactness where s is an ordinal. The effect is to stratify the set of generalized logical consequences of a theory into levels, and levels into layers. Deduction corresponds to the lower layer of the first level above the underlying theory, learning with less than s mind changes to layer s of the first level, and learning in the limit to the first layer of the second level. Refinements of Borel-like hierarchies provide the topological tools needed to develop the framework.
TL;DR: The class RPk of sets of at most k regular patterns is shown to be polynomial time inferable from positive examples using the efficient algorithm of the MINL problem due to Arimura et al.[5], provided the number of constant symbols is greater than k + 1.
Abstract: A regular pattern is a string of constant symbols and distinct variables. A semantics of a set P of regular patterns is a union L(P) of erasing pattern languages generated by patterns in P. The paper deals with the class RPk of sets of at most k regular patterns, and an efficient learning from positive examples of the language class defined by RPk. In efficient learning of languages, the complexity of the MINL problem, which is to find one of the minimal languages containing a given sample, is one of the most important keys. Arimura et al.[5] introduced a notion of compactness w.r.t. containment for a more general framework than RPk, called generalization systems, which guarantees the equivalence between the semantic containment L(P) ⊆ L(Q) and the syntactic containment P ⊑ Q, where ⊑ is a syntactic subsumption over the generalization systems. Under the compactness, the MINL problem reduces to finding one of the minimal sets in RPk for a given sample under the subsumption ⊑. They gave an efficient algorithm to find such minimal sets under the assumption of compactness and some conditions. We first show that for each k ≥ 1, the class RPk has compactness if and only if the number of constant symbols is greater than k+1. Moreover, we prove that for each P ∈ RPk, a finite subset S2(P) is a characteristic set of L(P) within the class, where S2(P) consists of strings obtained from P by substituting strings with length two for each variable. Then our class RPk is shown to be polynomial time inferable from positive examples using the efficient algorithm of the MINL problem due to Arimura et al.[5], provided the number of constant symbols is greater than k + 1.
TL;DR: It is shown that the cost associated with the well-known add-one rule equals ln(1+(m-1)/(n+1)), thereby extending a result of Forster and Warmuth [3,2] to m ≥ 3.
Abstract: We consider a problem that is related to the "Universal Encoding Problem" from information theory. The basic goal is to find rules that map "partial information" about a distribution $X$ over an m-letter alphabet into a guess $\hat{X}$ for $X$ such that the Kullback-Leibler divergence between $X$ and $\hat{X}$ is as small as possible. The cost associated with a rule is the maximal expected Kullback-Leibler divergence between $X$ and $\hat{X}$. First, we show that the cost associated with the well-known add-one rule equals ln(1+(m-1)/(n+1)) thereby extending a result of Forster and Warmuth [3,2] to m ≥ 3. Second, we derive an absolute (as opposed to asymptotic) lower bound on the smallest possible cost. Technically, this is done by determining (almost exactly) the Bayes error of the add-one rule with a uniform prior (where the asymptotics for n → ∞ was known before). Third, we hint at tools from approximation theory and support the conjecture that there exists a rule whose cost asymptotically matches the theoretical barrier from the lower bound.
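The add-one rule and the stated cost formula can be checked numerically on a simple case. The sketch below assumes that n denotes the sample size and that the "partial information" is the vector of letter counts; both readings are assumptions, as are the function names. For a point-mass distribution the sample is deterministic, so the expected divergence is just the divergence for the single possible count vector.

```python
import math

def add_one_rule(counts):
    """Add-one (Laplace) estimate: (count + 1) / (n + m) for each letter."""
    n, m = sum(counts), len(counts)
    return [(c + 1) / (n + m) for c in counts]

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in nats, with 0 * log 0 = 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def add_one_cost(m, n):
    """Closed form quoted in the abstract: ln(1 + (m - 1) / (n + 1))."""
    return math.log(1 + (m - 1) / (n + 1))

# Point mass on the first of m = 3 letters, observed n = 5 times.
point_mass = [1.0, 0.0, 0.0]
estimate = add_one_rule([5, 0, 0])
divergence = kl_divergence(point_mass, estimate)
```

Here the divergence is ln((n+m)/(n+1)), which coincides with the closed form, so a point-mass distribution already attains the stated cost.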
TL;DR: In this paper, the authors investigate reflective inductive inference of recursive functions and compare the learning power of reflective IIMs with each other as well as with the one of standard IIMs.
Abstract: In this paper, we investigate reflective inductive inference of recursive functions. A reflective IIM is a learning machine that is additionally able to assess its own competence. First, we formalize reflective learning from arbitrary example sequences. Here, we arrive at four different types of reflection: reflection in the limit, optimistic, pessimistic and exact reflection. Then, for learning in the limit, for consistent learning of three different types and for finite learning, we compare the learning power of reflective IIMs with each other as well as with the one of standard IIMs. Finally, we compare reflective learning from arbitrary input sequences with reflective learning from canonical input sequences. In this context, an open question regarding total-consistent identification could be solved: it holds T-CONS^a ⊂ T-CONS.
TL;DR: In this article, it was shown that if the curvature of the boundary of the set of superpredictions for a game vanishes in a nontrivial way, then there is no predictive complexity for the game.
Abstract: This paper shows that if the curvature of the boundary of the set of superpredictions for a game vanishes in a nontrivial way, then there is no predictive complexity for the game. This is the first result concerning the absence of complexity for games with convex sets of superpredictions. The proof is further employed to show that for some games there are no certain variants of weak predictive complexity. In the case of the absolute-loss game we reach a tight demarcation between the existing and non-existing variants of weak predictive complexity.
TL;DR: It is shown that the class B of recursive approximations to the halting problem K considered by Blum and Blum is not in NUM, which resolves an old open problem, and that monotonicity of the complexity functions is necessary for making learnability independent of the underlying complexity measure.
Abstract: Blum and Blum (Inform. and Control 28 (1975) 125-155) showed that a class B of suitable recursive approximations to the halting problem K is reliably EX-learnable but left it open whether or not B is in NUM. By showing B to be not in NUM we resolve this old problem. Moreover, variants of this problem obtained by approximating any given recursively enumerable set A instead of the halting problem K are studied. All corresponding function classes U(A) are still EX-inferable but may fail to be reliably EX-learnable, for example if A is non-high and hypersimple. Blum and Blum (1975) considered only approximations to K defined by monotone complexity functions. We prove this condition to be necessary for making learnability independent of the underlying complexity measure. The class B of all recursive approximations to K generated by all total complexity functions is shown to be not even behaviorally correct learnable for a class of natural complexity measures. On the other hand, there are complexity measures such that B is EX-learnable. A similar result is obtained for all classes U(A). For natural complexity measures, B is shown to be not robustly learnable, but again there are complexity measures such that B and, more generally, every class U(A) is robustly EX-learnable. This result extends the criticism of Jain et al. (J. Comput. System Sci. 62(1) (2001) 178-212), since the classes defined by artificial complexity measures turn out to be robustly learnable while those defined by natural complexity measures are not robustly learnable.
TL;DR: In this article, a new formal framework of learning by consistency queries is introduced and studied, and the theoretical approach is implemented as the core technology of a prototypical development system named LExIKON which supports interactive information extraction in practically relevant cases exactly in the way described in the present paper.
Abstract: A new formal framework of learning - learning by consistency queries - is introduced and studied. The theoretical approach outlined here is implemented as the core technology of a prototypical development system named LExIKON which supports interactive information extraction in practically relevant cases exactly in the way described in the present paper. The overall scenario of learning by consistency queries for information extraction is formalized and different constraints on the query learners are discussed and formulated. The principal learning power of the resulting types of query learners is analyzed by comparing it to the power of well-known types of standard learning devices including unconstrained inductive inference machines as well as consistent, total, finite, and iterative learners.
TL;DR: In this article, the authors extend the notion of general dimension, a combinatorial characterization of learning complexity for arbitrary query protocols, to encompass approximate learning and derive close upper and lower bounds on the number of statistical queries needed.
Abstract: We extend the notion of general dimension, a combinatorial characterization of learning complexity for arbitrary query protocols, to encompass approximate learning. This immediately yields a characterization of the learning complexity in the statistical query model. As a further application, we consider approximate learning of DNF formulas and we derive close upper and lower bounds on the number of statistical queries needed. In particular, we show that with respect to the uniform distribution, and for any constant error parameter $\epsilon < 1/2$, the number of statistical queries needed to approximately learn DNF formulas (over n variables and s terms) with tolerance $\tau = \Theta(1/s)$ is $n^{\Theta(\log s)}$.
TL;DR: In this article, it is shown that a significant improvement in accuracy can be obtained if class probabilities are calculated based on the intersection of the overlapping rules, or in case of an empty intersection, based on as few intersecting regions as possible.
Abstract: Several rule induction schemes generate hypotheses in the form of unordered rule sets. One important problem that has to be addressed when classifying examples with such hypotheses is how to deal with overlapping rules that predict different classes. Previous approaches to this problem calculate class probabilities based on the union of examples covered by the overlapping rules (as in CN2) or assume rule independence (using naive Bayes). It is demonstrated that a significant improvement in accuracy can be obtained if class probabilities are calculated based on the intersection of the overlapping rules, or in case of an empty intersection, based on as few intersecting regions as possible.
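The difference between union-based and intersection-based class probabilities can be seen on a toy data set. The sketch below is an illustration under assumptions, not the paper's procedure: rules are modelled as Boolean predicates, the data and helper names are made up, and the fallback for an empty intersection ("as few intersecting regions as possible") is not implemented.

```python
from collections import Counter

def class_distribution(examples):
    """Relative class frequencies among (features, label) pairs."""
    counts = Counter(label for _, label in examples)
    total = sum(counts.values())
    return {label: c / total for label, c in counts.items()} if total else {}

def covered_by_any(rules, examples):
    """Union: examples covered by at least one rule."""
    return [(x, y) for x, y in examples if any(r(x) for r in rules)]

def covered_by_all(rules, examples):
    """Intersection: examples covered by every rule."""
    return [(x, y) for x, y in examples if all(r(x) for r in rules)]

# Two hypothetical overlapping rules that predict different classes.
rule_a = lambda x: x["a"] == 1   # predicts "pos"
rule_b = lambda x: x["b"] == 1   # predicts "neg"
train = ([({"a": 1, "b": 0}, "pos")] * 3
         + [({"a": 0, "b": 1}, "neg")] * 4
         + [({"a": 1, "b": 1}, "pos")] * 2)

union_dist = class_distribution(covered_by_any([rule_a, rule_b], train))
inter_dist = class_distribution(covered_by_all([rule_a, rule_b], train))
```

In this toy data the union mixes in examples that resemble the query in only one respect, giving a near-even split, while the intersection looks only at examples covered by both rules and is unanimous.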
TL;DR: This article showed that the simple bottom-up counterpart to the top-down hill-climbing algorithm is unable to learn in domains with dispersed examples, and that guided greedy generalization is impossible if the seed example differs in more than one attribute value from its nearest neighbor.
Abstract: In this paper, we close the gap between the simple and straight-forward implementations of top-down hill-climbing that can be found in the literature, and the rather complex strategies for greedy bottom-up generalization. Our main result is that the simple bottom-up counterpart to the top-down hill-climbing algorithm is unable to learn in domains with dispersed examples. In particular, we show that guided greedy generalization is impossible if the seed example differs in more than one attribute value from its nearest neighbor. We also perform an empirical study of how common this problem is in popular benchmark datasets, and present average-case and worst-case results for the probability of drawing a pathological seed example in binary domains.
TL;DR: The central topic of the paper is the learnability of the recursively enumerable subspaces of V∞/V, where V∞ is the standard recursive vector space over the rationals with countably infinite dimension, and V is a given recursively enumerable subspace of V∞.
Abstract: The central topic of the paper is the learnability of the recursively enumerable subspaces of V∞/V, where V∞ is the standard recursive vector space over the rationals with countably infinite dimension, and V is a given recursively enumerable subspace of V∞. It is shown that certain types of vector spaces can be characterized in terms of learnability properties: V∞/V is behaviourally correct learnable from text iff V is finite-dimensional, V∞/V is behaviourally correct learnable from switching type of information iff V is finite-dimensional, 0-thin, or 1-thin. On the other hand, learnability from an informant does not correspond to similar algebraic properties of a given space. There are 0-thin spaces W1 and W2 such that W1 is not explanatorily learnable from informant and the infinite product (W1)^∞ is not behaviourally correct learnable, while W2 and the infinite product (W2)^∞ are both explanatorily learnable from informant.
TL;DR: In this article, the authors established versions of Descartes' rule of signs for radial basis function (RBF) neural networks and derived tight bounds for the Vapnik-Chervonenkis (VC) dimension and pseudo-dimension of these networks.
Abstract: We establish versions of Descartes' rule of signs for radial basis function (RBF) neural networks. These RBF rules of signs provide tight bounds for the number of zeros of univariate networks with certain parameter restrictions. Moreover, they can be used to derive tight bounds for the Vapnik-Chervonenkis (VC) dimension and pseudo-dimension of these networks. In particular, we show that these dimensions are no more than linear. This result contrasts with previous work showing that RBF neural networks with two and more input nodes have superlinear VC dimension. The rules give rise also to lower bounds for network sizes, thus demonstrating the relevance of network parameters for the complexity of computing with RBF neural networks.
TL;DR: Residue hypotheses, which satisfy a relevance criterion that can be regarded as 'logical smallness' of hypotheses in contrast to 'syntactical smallness', are presented, together with a further refinement, so-called minimized residue hypotheses, which constitute an interesting trade-off between these two types of smallness.
Abstract: In the field of deductive logic, relevant logic has been investigated for a long time, as a means to derive only conclusions which are related to all premises. Our proposal is to apply this concept of relevance as a criterion of appropriateness to hypotheses in inductive logic, and in this paper we present some special hypotheses called residue hypotheses, which satisfy this kind of appropriateness. This concept of relevance is different from those often introduced in the field of Inductive Logic Programming. While those aimed at the reduction of search spaces, which went hand in hand with postulating criteria which restricted the appropriateness of formulae as hypotheses, the relevance concept presented in this paper can be regarded as 'logical smallness' of hypotheses, in contrast to 'syntactical smallness'. We also give a further refinement, so-called minimized residue hypotheses, which constitute an interesting trade-off between these two types of smallness. We also give some results on bottom clauses and relevance.
TL;DR: The notion of general dimension is used to show that any p-evaluatable concept class with polynomial query complexity can be learned in polynomial time with the help of an oracle in the polynomial hierarchy, where the complexity of the required oracle depends on the query-types used by the learning algorithm.
Abstract: We use the notion of general dimension to show that any p-evaluatable concept class with polynomial query complexity can be learned in polynomial time with the help of an oracle in the polynomial hierarchy, where the complexity of the required oracle depends on the query-types used by the learning algorithm. In particular, we show that for subset and superset queries an oracle in Σ₃ᴾ suffices. Since the concept class of DNF formulas has polynomial query complexity with respect to subset and superset queries with DNF formulas as hypotheses, it follows that DNF formulas are properly learnable in polynomial time with subset and superset queries and the help of an oracle in Σ₃ᴾ. We also show that the required oracle in our main theorem cannot be replaced by an oracle in a lower level of the polynomial-time hierarchy, unless the hierarchy collapses.
TL;DR: This work considers the problem of Probably Approximately Correct (PAC) learning embedded midbit functions, where the set S ⊂ {x1,...,xn} of relevant variables on which the midbit depends is unknown to the learner.
Abstract: A midbit function on l binary inputs x1,...,xl outputs the middle bit in the binary representation of x1+...+xl. We consider the problem of Probably Approximately Correct (PAC) learning embedded midbit functions, where the set S ⊂ {x1,...,xn} of relevant variables on which the midbit depends is unknown to the learner. To motivate this problem, we first point out that a result of Green et al. implies that a polynomial time learning algorithm for the class of embedded midbit functions would immediately yield a fairly efficient (quasipolynomial time) (PAC) learning algorithm for the entire complexity class ACC. We then give two different subexponential learning algorithms, each of which learns embedded midbit functions under any probability distribution in 2^(√n log n) time. Finally, we give a polynomial time algorithm for learning embedded midbit functions under the uniform distribution.
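The definition of an embedded midbit function can be made concrete with a small sketch. The abstract does not say which position counts as "the middle bit", so the indexing below is an assumption: the sum is written with as many bits as are needed to represent l, and the middle is taken at index width // 2 counting from the least significant bit. The names `midbit` and `embedded_midbit` are hypothetical.

```python
def midbit(bits):
    """Middle bit of the binary representation of sum(bits).

    Assumption (not fixed by the abstract): the sum is written with
    len(bits).bit_length() bits, and the middle bit sits at index
    width // 2, counting from the least significant bit.
    """
    total = sum(bits)
    width = max(len(bits).bit_length(), 1)
    return (total >> (width // 2)) & 1


def embedded_midbit(x, relevant):
    """Midbit of the inputs selected by `relevant`, a list of indices into x.

    In the learning problem this index set S is unknown to the learner.
    """
    return midbit([x[i] for i in relevant])


# Four relevant variables out of six; their sum is 4 = 0b100.
value = embedded_midbit([1, 0, 1, 1, 0, 1], relevant=[0, 2, 3, 5])
```

The function only depends on the number of ones among the relevant variables, which is what makes the hidden set S the hard part of the learning problem.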
TL;DR: In this article, the authors study the learnability of a mixture of lines model, which is of great importance in machine vision, computer graphics, and computer aided design applications, and describe an efficient probably approximately correct (PAC) algorithm for solving the problem.
Abstract: In this paper we study the learnability of a mixture of lines model, which is of great importance in machine vision, computer graphics, and computer aided design applications. The mixture of lines is a partially-probabilistic model for an image composed of line-segments. Observations are generated by choosing one of the lines at random and picking a point at random from the chosen line. Each point is contaminated with some noise whose distribution is unknown, but which is bounded in magnitude. Our goal is to discover efficiently and rather accurately the line-segments that generated the noisy observations. We describe and analyze an efficient probably approximately correct (PAC) algorithm for solving the problem. Our algorithm combines techniques from planar geometry with simple large deviation tools and is simple to implement.
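The generative process described in this abstract can be sketched as a sampler. This is only an illustration of the data model, not the paper's learning algorithm. Several choices are assumptions: lines are chosen uniformly, points are uniform along the chosen segment, and the noise is drawn from a disc of radius `noise_bound` (the abstract only says the noise distribution is unknown and bounded in magnitude). The function name and parameters are hypothetical.

```python
import math
import random


def sample_mixture_of_lines(segments, n, noise_bound, rng=None):
    """Draw n noisy observations from a mixture of line segments.

    segments: list of ((x0, y0), (x1, y1)) endpoint pairs.
    Assumptions: uniform choice of segment, uniform position on it,
    and noise drawn from a disc of radius noise_bound.
    """
    rng = rng or random.Random()
    points = []
    for _ in range(n):
        (x0, y0), (x1, y1) = rng.choice(segments)
        t = rng.random()  # position along the chosen segment
        px = x0 + t * (x1 - x0)
        py = y0 + t * (y1 - y0)
        r = rng.uniform(0.0, noise_bound)  # noise magnitude, at most noise_bound
        theta = rng.uniform(0.0, 2.0 * math.pi)  # noise direction
        points.append((px + r * math.cos(theta), py + r * math.sin(theta)))
    return points


observations = sample_mixture_of_lines(
    [((0.0, 0.0), (1.0, 0.0))], n=50, noise_bound=0.1, rng=random.Random(0)
)
```

A learner in this setting sees only `observations` and must recover segments close to the ones that generated them.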