TL;DR: In this article, error-correcting output codes are employed as a distributed output representation to improve the generalization performance of C4.5 and backpropagation on multiclass learning problems.
Abstract: Multiclass learning problems involve finding a definition for an unknown function f(x) whose range is a discrete set containing k > 2 values (i.e., k "classes"). The definition is acquired by studying collections of training examples of the form (xi, f(xi)). Existing approaches to multiclass learning problems include direct application of multiclass algorithms such as the decision-tree algorithms C4.5 and CART, application of binary concept learning algorithms to learn individual binary functions for each of the k classes, and application of binary concept learning algorithms with distributed output representations. This paper compares these three approaches to a new technique in which error-correcting codes are employed as a distributed output representation. We show that these output representations improve the generalization performance of both C4.5 and backpropagation on a wide range of multiclass learning tasks. We also demonstrate that this approach is robust with respect to changes in the size of the training sample, the assignment of distributed representations to particular classes, and the application of overfitting avoidance techniques such as decision-tree pruning. Finally, we show that--like the other methods--the error-correcting code technique can provide reliable class probability estimates. Taken together, these results demonstrate that error-correcting output codes provide a general-purpose method for improving the performance of inductive learning programs on multiclass problems.
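The decoding step behind an error-correcting output code can be sketched as follows. This is a minimal illustration, not the paper's implementation: the 4-class, 7-bit code matrix, the class labels "A" to "D", and the function names are all assumptions chosen for the example. Each class is assigned a codeword, one binary learner would be trained per bit position, and a prediction is decoded to the class whose codeword is nearest in Hamming distance.

```python
from itertools import combinations

# Hypothetical code matrix (an assumption for illustration):
# one 7-bit codeword per class; one binary learner per column.
CODE = {
    "A": (1, 1, 1, 1, 1, 1, 1),
    "B": (0, 0, 0, 0, 1, 1, 1),
    "C": (0, 0, 1, 1, 0, 0, 1),
    "D": (0, 1, 0, 1, 0, 1, 0),
}


def hamming(u, v):
    """Number of positions at which two bit tuples differ."""
    return sum(a != b for a, b in zip(u, v))


def min_distance(code):
    """Smallest Hamming distance between any two codewords."""
    return min(hamming(u, v) for u, v in combinations(code.values(), 2))


def decode(bits, code=CODE):
    """Return the class whose codeword is closest to the predicted bits."""
    return min(code, key=lambda label: hamming(bits, code[label]))


# The bits would come from the binary learners; here one bit
# of the codeword for "C" has been flipped by hand.
noisy = (0, 0, 1, 1, 0, 1, 1)
print(min_distance(CODE), decode(noisy))
```

A code with minimum distance d can correct up to floor((d - 1) / 2) wrong bits, which is why a single mistaken binary learner does not change the decoded class in this example.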
TL;DR: It is demonstrated that error-correcting output codes provide a general-purpose method for improving the performance of inductive learning programs on multiclass problems.
TL;DR: It is shown how to estimate the optimal weights of the ensemble members using unlabeled data and how the ambiguity can be used to select new training data to be labeled in an active learning scheme.
Abstract: Learning of continuous valued functions using neural network ensembles (committees) can give improved accuracy, reliable estimation of the generalization error, and active learning. The ambiguity is defined as the variation of the output of ensemble members averaged over unlabeled data, so it quantifies the disagreement among the networks. It is discussed how to use the ambiguity in combination with cross-validation to give a reliable estimate of the ensemble generalization error, and how this type of ensemble cross-validation can sometimes improve performance. It is shown how to estimate the optimal weights of the ensemble members using unlabeled data. By a generalization of query by committee, it is finally shown how the ambiguity can be used to select new training data to be labeled in an active learning scheme.
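The ambiguity described above can be computed without any labels, as in the following sketch. It assumes a weighted-average ensemble with weights that sum to one and measures ambiguity as the weighted squared deviation of each member from the ensemble output; the function names and the toy numbers are assumptions made for illustration. Under that definition, the ensemble squared error equals the weighted average member error minus the ambiguity, which the last lines check on toy data.

```python
def ensemble_output(outputs, weights):
    """Weighted average of the member outputs for one input."""
    return sum(w * o for w, o in zip(weights, outputs))


def ambiguity(member_preds, weights):
    """Mean ensemble ambiguity over a set of (unlabeled) inputs.

    member_preds[a][i] is the output of member a on input i;
    weights are assumed to be non-negative and to sum to one.
    """
    n = len(member_preds[0])
    total = 0.0
    for i in range(n):
        outs = [m[i] for m in member_preds]
        fbar = ensemble_output(outs, weights)
        total += sum(w * (o - fbar) ** 2 for w, o in zip(weights, outs))
    return total / n


def mean_sq_error(preds, targets):
    """Mean squared error of one prediction list against targets."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


# Toy data (assumed): two members, two inputs.
preds = [[1.0, 2.0], [3.0, 2.0]]
weights = [0.5, 0.5]
targets = [2.5, 2.0]

ens = [ensemble_output([m[i] for m in preds], weights) for i in range(2)]
avg_member_err = sum(w * mean_sq_error(m, targets) for w, m in zip(weights, preds))
print(mean_sq_error(ens, targets), avg_member_err - ambiguity(preds, weights))
```

Only the ambiguity term is label-free, which is what makes it usable on unlabeled data; the member errors still need labeled or held-out examples.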
TL;DR: In this paper, a generalization of the stability of additive mappings has been proposed in the spirit of Hyers, Ulam, and Rassias, and the stability has been shown to be robust.
TL;DR: The learning and generalization characteristics of the random vector version of the Functional-link net are explored and compared with those attainable with the GDR algorithm, and it seems that "overtraining" occurs for stochastic mappings.
TL;DR: The problem of model selection, or determination of the number of hidden units, can be approached statistically, by generalizing Akaike's information criterion (AIC) to be applicable to unfaithful models with general loss criteria including regularization terms.
Abstract: The problem of model selection, or determination of the number of hidden units, can be approached statistically, by generalizing Akaike's information criterion (AIC) to be applicable to unfaithful (i.e., unrealizable) models with general loss criteria including regularization terms. The relation between the training error and the generalization error is studied in terms of the number of the training examples and the complexity of a network, which reduces to the number of parameters in the ordinary statistical theory of AIC. This relation leads to a new network information criterion which is useful for selecting the optimal network model based on a given training set.
TL;DR: Inductive Learning Algorithms for Complex Systems Modeling is a professional monograph that surveys new types of learning algorithms for modeling complex systems in science and engineering.
Abstract: Introduction: Systems and Cybernetics. Inductive Learning Algorithms: Self-Organization Method. Network Structures. Long Term Quantitative Predictions. Dialogue Language Generalization. Noise Immunity and Convergence: Analogy with Information Theory. Classification and Analysis of Criteria. Improvement of Noise Immunity. Asymptotic Properties of Criteria. Balance Criterion of Predictions. Convergence of Algorithms. Physical Fields and Modeling: Finite-Difference Pattern Schemes. Comparative Studies. Cyclic Processes. Clusterization and Recognition: Self-Organization Modeling and Clustering. Methods of Self-Organization Clustering. Objective Computer Clustering Algorithm. Levels of Discretization and Balance Criterion. Forecasting Methods of Analogues. Applications: Fields of Application. Weather Modeling. Ecological System Studies. Modeling of Economical Systems. Agricultural System Studies. Modeling of Solar Activity. Inductive and Deductive Networks: Self-Organization Mechanism in the Networks. Network Techniques. Generalization. Comparison and Simulation Results. Basic Algorithms and Program Listings: Computational Aspects of Multilayered Algorithm. Computational Aspects of Combinatorial Algorithm. Computational Aspects of Harmonical Algorithm.
TL;DR: This paper substantially improves RFCM by generalizing it to the case of arbitrary (symmetric) dissimilarity data, and is applicable to any numerical relational data that are positive, reflexive (or anti-reflexive) and symmetric.
TL;DR: In this article, a natural category of locally and topologically ringed spaces is defined, which contains both the category of locally noetherian formal schemes and the category of rigid analytic varieties as full subcategories.
Abstract: In this paper we construct a natural category of locally and topologically ringed spaces which contains both the category of locally noetherian formal schemes and the category of rigid analytic varieties as full subcategories. This category has applications in algebraic geometry and rigid analytic geometry. The idea of the definition of this category is the following. From a formal point of view there is a certain similarity in constructing formal schemes and rigid analytic varieties. In both cases one starts with a certain class of topological rings (the adic rings in formal geometry and Tate algebras in rigid geometry), associates to every topological ring of this class a locally and topologically ringed space, and gluing such spaces gives formal schemes or rigid analytic varieties. There is a natural class of topological rings which contains both the noetherian adic rings and the Tate algebras and which suggests itself, namely the class of topological rings which have an open adic subring with a finitely generated ideal of definition. We call such a ring f-adic. The points of the formal scheme Spf A associated with an adic ring A are the open prime ideals of A, and the points of the rigid analytic variety Sp A associated with a Tate algebra A are the maximal ideals of A. In both cases one can consider the points as continuous valuations of A. (A valuation v: A → Γ_v ∪ {0} of a topological ring A is called continuous if the mapping v is continuous with respect to the ring topology of A and the order-induced topology of Γ_v ∪ {0}.) Namely, if p is an open prime ideal of an adic ring A then the trivial valuation v_p of A with v_p(a) = 0 iff a ∈ p is continuous, and if p is a maximal ideal of a Tate algebra A over a valued field k then the
TL;DR: In this article, it was shown that the motion of a curve selects hierarchies of integrable dynamics, such as the Korteweg-de Vries hierarchy, the Schrödinger hierarchy, and the Schroff hierarchy.
TL;DR: The main objectives of the paper are to develop the concept of convex directions for quasipolynomials and to exploit this concept to construct testing sets for quasipolynomial families.
Abstract: There are two fundamental results available when we study stability of a polynomial family that is described by a convex polytope in the coefficient space: the edge theorem and the theory based on the concept of convex directions. Many known results can be explained with these two results. This paper deals with a generalization of this line of research to the case of quasipolynomials, that is, entire functions which include both powers of the independent variable and exponential functions. The main objectives of the paper are to develop the concept of convex directions for quasipolynomials and to exploit this concept to construct testing sets for quasipolynomial families. One of the primary sources of motivation for the class of problems considered in this paper is derived from process control. A typical problem formulation almost always includes a delay element in each subsystem process block. When we interconnect a number of such blocks in a feedback system, the study of robust stability involves quasipolynomials of the sort considered in this paper.
TL;DR: It is demonstrated that a simple pruning/retraining method effectively improves the generalization performance of recurrent neural networks trained to recognize regular languages and permits the extraction of symbolic knowledge in the form of deterministic finite-state automata which are more consistent with the rules to be learned.
Abstract: The experimental results in this paper demonstrate that a simple pruning/retraining method effectively improves the generalization performance of recurrent neural networks trained to recognize regular languages. The technique also permits the extraction of symbolic knowledge in the form of deterministic finite-state automata (DFA) which are more consistent with the rules to be learned. Weight decay has also been shown to improve a network's generalization performance. Simulations with two small DFA (≤10 states) and a large finite-memory machine (64 states) demonstrate that the performance improvement due to pruning/retraining is generally superior to the improvement due to training with weight decay. In addition, there is no need to guess a 'good' decay rate.
TL;DR: This paper considers a least-squares approach to function approximation and generalization and shows that better generalization will occur if the error criterion used in training the generalizer is modified by the addition of a specific regularization term.
Abstract: This paper considers a least-squares approach to function approximation and generalization. The particular problem addressed is one in which the training data are noiseless and the requirement is to define a mapping that approximates the data and that generalizes to situations in which data samples are corrupted by noise in the input variables. The least-squares approach produces a generalizer that has the form of a radial basis function network for a finite number of training samples. The finite sample approximation is valid provided that the perturbations due to noise on the expected operating conditions are large compared to the sample spacing in the data space. In the other extreme of small noise perturbations, a particular parametric form must be assumed for the generalizer. It is shown that better generalization will occur if the error criterion used in training the generalizer is modified by the addition of a specific regularization term. This is illustrated by an approximator that has a feedforward architecture and is applied to the problem of point-source location using the outputs of an array of receivers in the focal plane of a lens.
TL;DR: A linear-time algorithm is presented for computing a centerpoint of a set of n points in the plane, which is optimal compared with the O(n log^3 n) complexity of the previously best-known algorithm.
Abstract: The notion of a centerpoint of a finite set of points in two and higher dimensions is a generalization of the concept of the median of a set of reals. In this paper we present a linear-time algorithm for computing a centerpoint of a set of n points in the plane, which is optimal compared with the O(n log^3 n) complexity of the previously best-known algorithm. We use suitable modifications of the ham-sandwich cut algorithm in [Me2] and the prune-and-search technique of Megiddo [Me1] to achieve this improvement.
TL;DR: The proposed evaluation technique has been used to investigate the generalization ability of back propagation, radial basis function and probabilistic neural network (PNN) classifiers for three test problems.
Abstract: This correspondence presents a method for evaluation of artificial neural network (ANN) classifiers. In order to find the performance of the network over all possible input ranges, a probabilistic input model is defined. The expected error of the output over this input range is taken as a measure of generalization ability. Two essential elements for carrying out the proposed evaluation technique are estimation of the input probability density and numerical integration. A nonparametric method, which depends on the nearest M neighbors, is used to locally estimate the distribution around each training pattern. An orthogonalization procedure is utilized to determine the covariance matrices of local densities. A Monte Carlo method is used to perform the numerical integration. The proposed evaluation technique has been used to investigate the generalization ability of back propagation (BP), radial basis function (RBF) and probabilistic neural network (PNN) classifiers for three test problems.
TL;DR: This paper investigates several aspects of the resulting generalized temporal relations, including the ability to query a predecessor relation from a successor relation, and the presented framework for generalization and specialization allows one to precisely characterize and compare temporal relations and the application systems in which they are embedded.
Abstract: A standard relation has two dimensions: attributes and tuples. A temporal relation contains two additional orthogonal time dimensions: valid time records when facts are true in the modeled reality, and transaction time records when facts are stored in the temporal relation. Although there are no restrictions between the valid time and transaction time associated with each fact, in many practical applications the valid and transaction times exhibit restricted interrelationships that define several types of specialized temporal relations. This paper examines areas where different specialized temporal relations are present. In application systems with multiple, interconnected temporal relations, multiple time dimensions may be associated with facts as they flow from one temporal relation to another. The paper investigates several aspects of the resulting generalized temporal relations, including the ability to query a predecessor relation from a successor relation. The presented framework for generalization and specialization allows one to precisely characterize and compare temporal relations and the application systems in which they are embedded. The framework's comprehensiveness and its use in understanding temporal relations are demonstrated by placing previously proposed temporal data models within the framework. The practical relevance of the defined specializations and generalizations is illustrated by sample realistic applications in which they occur. The additional semantics of specialized relations are especially useful for improving the performance of query processing.
TL;DR: The Small Span Theorem of Juedes and Lutz is taken as an example and a generalization for bounded query reductions is proved, and it is shown that, for any c ≥ 1, the class of n^c-generic sets has p-measure 1.
Abstract: Recently Lutz [14,15] introduced a polynomial time bounded version of Lebesgue measure. He and others (see e.g. [11,13,14,15,16,17,18,20]) used this concept to investigate the quantitative structure of Exponential Time (E = DTIME(2^lin)). Previously, Ambos-Spies, Fleischhack and Huwig [2,3] introduced polynomial time bounded genericity concepts and used them for the investigation of structural properties of NP (under appropriate assumptions) and E. Here we relate these concepts to each other. We show that, for any c ≥ 1, the class of n^c-generic sets has p-measure 1. This allows us to simplify and extend certain p-measure 1-results. To illustrate the power of generic sets we take the Small Span Theorem of Juedes and Lutz [11] as an example and prove a generalization for bounded query reductions.
TL;DR: In this paper, a generalization of the Filippov theorem for stochastic differential inclusions is presented, with an application to linearization of differential-inclusions and to infinitesimal behavior of solutions.
Abstract: We prove a generalization of the Filippov Theorem [4] for stochastic differential inclusions, and present an application to linearization of differential inclusions and to infinitesimal behaviour of solutions.
TL;DR: A formal analysis of the universal suffrage operator is presented, providing theoretical explanations of the experimentally observed behaviour, and a long term control strategy, called "Tories and Whigs", is proposed in order to overcome the problem of lethal matings between incompatible disjuncts.
Abstract: REGAL is a Distributed Genetic Algorithm designed for learning concept descriptions from examples, in First Order Logic. In particular, each individual in the population represents a conjunctive formula in the VL2 language. In order to increase the efficiency of the generalization process, REGAL has been provided with a new selection operator, called the Universal Suffrage operator, which guarantees (in probability) to maintain a population covering all the learning events. As generalization mostly takes place when two individuals covering different sets of examples are crossed, the global generalization capability of the system is increased. Moreover, in the case of disjunctive or multiple concepts, the universal suffrage algorithm allows the formation of different species, each one corresponding to a different disjunct. In this way, all the disjuncts can be learned in parallel, obtaining, on average, more general solutions than by learning them one at a time. A formal analysis of the universal suffrage operator is presented, providing theoretical explanations of the experimentally observed behaviour. A comparison with the classical selection algorithm and with the sharing function method is also made. Finally, a long term control strategy, called "Tories and Whigs", is proposed in order to overcome the problem of lethal matings between incompatible disjuncts. The effectiveness of REGAL is demonstrated on several learning problems.
TL;DR: A syntactic notion of minimal multiple generalizations (mmg for short) is introduced to study the inferability of unions of a bounded number of pattern languages, and several classes of such unions are shown to be polynomial time inferable from positive data.
Abstract: A pattern is a string of constant symbols and variables. The language defined by a pattern p is the set of constant strings obtained from p by substituting nonempty constant strings for variables in p. In this paper we are concerned with polynomial time inference from positive data of the class of unions of a bounded number of pattern languages. We introduce a syntactic notion of minimal multiple generalizations (mmg for short) to study the inferability of classes of unions. If a pattern p is obtained from another pattern q by substituting nonempty patterns for variables in q, q is said to be more general than p. A set of patterns defines a union of their languages. A set Q of patterns is said to be more general than a set P of patterns if for any pattern p in P there exists a pattern q in Q that is more general than p. Clearly, a more general set of patterns defines a larger union. A k-minimal multiple generalization (k-mmg) of a set S of strings is a minimally general set of at most k patterns that defines a union containing S. The syntactic notion of minimality enables us to efficiently compute a candidate for a semantically minimal concept. We present a general methodology for designing an efficient algorithm to find a k-mmg. Under some conditions an mmg can be used as an appropriate hypothesis for inductive inference from positive data. As a result, several classes of unions of pattern languages are shown to be polynomial time inferable from positive data.
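The definition of a pattern language given above can be made concrete with a small membership test. This is a sketch under assumptions, not the paper's algorithm: a pattern is represented as a list of tokens, upper-case tokens such as "X" are treated as variables and all other tokens as constants, and the function names are invented for the example. Each variable must be replaced by the same nonempty string at every occurrence, which a regular expression with named backreferences expresses directly.

```python
import re


def pattern_to_regex(pattern):
    """Compile a token-list pattern into a regex.

    Assumed convention: upper-case tokens ("X", "Y1") are variables,
    every other token is a constant string.
    """
    seen = set()
    parts = []
    for tok in pattern:
        if tok.isupper():
            if tok in seen:
                # Later occurrence: must repeat the same substitution.
                parts.append(f"(?P={tok})")
            else:
                # First occurrence: any nonempty string.
                seen.add(tok)
                parts.append(f"(?P<{tok}>.+)")
        else:
            parts.append(re.escape(tok))
    return re.compile("".join(parts), re.DOTALL)


def in_language(pattern, s):
    """True if s is obtained from pattern by nonempty substitutions."""
    return pattern_to_regex(pattern).fullmatch(s) is not None


def in_union(patterns, s):
    """True if s belongs to the union of the pattern languages."""
    return any(in_language(p, s) for p in patterns)


print(in_language(["a", "X", "b", "X"], "acbc"))
print(in_union([["a", "X"], ["X", "b"]], "cb"))
```

The regex engine searches for a substitution by backtracking, so this check is meant for short strings and small patterns only.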
TL;DR: This book on moving finite elements focuses on finite element methods for time-dependent partial differential equations with moving grids, and on grid generation techniques.
Abstract: This book is mainly concerned with finite element methods for time-dependent partial differential equations when the grids are allowed to move in time, but also describes grid generation techniques which include grid adjustment. The mechanism for grid movement derives from a generalization of the residual minimization technique which is familiar from the Galerkin finite element method. The book brings together most of the work done over the last decade or so which has been stimulated by Miller's original idea, and discusses the interrelationships between the techniques of the method and the established ideas of the method of characteristics, Hamilton's equations, the Legendre transformation, and grid equidistribution. The book highlights the issues involved and should provide the reader with a clear view of the current state of the subject and prompt further research.
TL;DR: This thesis proposes a generalization of FIR modeling by replacing the usual delay operator with discrete so-called Laguerre or Kautz filters and shows how constructive methods from commutative and differential algebra can be applied.
Abstract: One of the hardest problems in system identification is that of model structure selection. In this thesis two different kinds of a priori process knowledge are used to address this fundamental problem. Concentrating on linear model structures, the first prior taken advantage of is knowledge about the system's dominating time constants and resonance frequencies. The idea is to generalize FIR modeling by replacing the usual delay operator with discrete so-called Laguerre or Kautz filters. The generalization is such that stability, the linear regression structure and the approximation ability of the FIR model structure are retained, whereas the prior is used to reduce the number of parameters needed to arrive at a reasonable model. Tailored and efficient system identification algorithms for these model structures are detailed in this work. The usefulness of the proposed methods is demonstrated through concrete simulation and application studies. The other approach is referred to as semi-physical modeling. The main idea is to use simple physical insight into the application, often in terms of a set of unstructured equations, in order to come up with suitable nonlinear transformations of the raw measurements, so as to allow for a good model structure. Semi-physical modeling is less "ambitious" than physical modeling in that no complete physical structure is sought, just combinations of inputs and outputs that can be subjected to more or less standard model structures, such as linear regressions. The suggested modeling procedure includes a first step in which symbolic computations are employed to determine a suitable model structure, that is, a set of regressors. We show how constructive methods from commutative and differential algebra can be applied for this. Subsequently, different numerical schemes for finding a subset of "good" regressors and for estimating the corresponding linear-in-the-parameters model are discussed. We also deal with more informal tools such as the programming environment. Finally, and perhaps more importantly, software tools supporting the suggested approaches have been designed and implemented.
TL;DR: In this paper, the authors define the property of kinematic fault tolerance and develop a general constructive proof of the existence of fault tolerant manipulators, based on which a planar manipulator with a minimal kinematics structure is designed.
TL;DR: This paper derives a measure of criticality of examples and presents an incremental learning algorithm that uses this measure to select a critical subset of given examples for solving the particular task.
Abstract: Much previous work on training multilayer neural networks has attempted to speed up the backpropagation algorithm using more sophisticated weight modification rules, whereby all the given training examples are used in a random or predetermined sequence. In this paper we investigate an alternative approach in which the learning proceeds on an increasing number of selected training examples, starting with a small training set. We derive a measure of criticality of examples and present an incremental learning algorithm that uses this measure to select a critical subset of given examples for solving the particular task. Our experimental results suggest that the method can significantly improve training speed and generalization performance in many real applications of neural networks. This method can be used in conjunction with other variations of gradient descent algorithms.
TL;DR: The systematic derivation of the generalization algorithms from the modal truth criterion obviates the need for carrying out a separate formal proof of correctness of the EBG algorithms, and provides an empirical demonstration of the relative utility of EBG in partial ordering, as opposed to total ordering, planning frameworks.
TL;DR: This study introduces potential influence diagrams, a generalization of standard influence diagrams in which each chance node is associated with an arbitrary nonnegative function (called a potential) instead of a conditional probability table, and develops a new reduction algorithm for computing optimal strategies.
TL;DR: It is shown that pruning arises naturally within both adaptive regularization schemes, and shown explicitly that both methods may in some cases increase the generalization error.
Abstract: Inspired by the recent upsurge of interest in Bayesian methods we consider adaptive regularization. A generalization based scheme for adaptation of regularization parameters is introduced and compared to Bayesian regularization. We show that pruning arises naturally within both adaptive regularization schemes. As a model example we have chosen the simplest possible: estimating the mean of a random variable with known variance. Marked similarities are found between the two methods in that they both involve a "noise limit," below which they regularize with infinite weight decay, i.e., they prune. However, pruning is not always beneficial. We show explicitly that both methods in some cases may increase the generalization error. This corresponds to situations where the underlying assumptions of the regularizer are poorly matched to the environment.
TL;DR: A pitch predictor exploiting the present interpolation strategy, with an update rate of 50 Hz, provides a subjective speech quality similar to a conventional pitch predictor where the parameters are updated for every pitch cycle.
Abstract: The pitch predictor contributes greatly to the efficiency of current analysis-by-synthesis speech coders by mapping the past reconstructed signal into the present. However, for good performance, its parameters must be updated often (once every 2.5-7.5 ms). A slower update rate of the pitch-predictor delay results in time misalignment between the original signal and the pitch-predictor contribution to the reconstructed signal. The authors introduce a new procedure that allows a slow update rate of the pitch-predictor parameters without this problem. In this method the original signal is modified in a closed-loop fashion such that the parameter values obtained by interpolation of open-loop estimates form the optimal encoding of the modified signal. This new paradigm is a generalization of the familiar analysis-by-synthesis principle. The generalized analysis-by-synthesis principle can be used for interpolation of both the pitch-predictor delay and gain. The authors compare, by means of a subjective test, speech signals encoded with different versions of the code-excited linear predictor (CELP) coder. The comparison shows that a pitch predictor exploiting the present interpolation strategy, with an update rate of 50 Hz, provides a subjective speech quality similar to a conventional pitch predictor where the parameters are updated for every pitch cycle.
TL;DR: A new model of diagnostic reasoning called the “possible cause-effect graphs” (PCEG) model is proposed, and can be viewed as a generalization of the signed digraph model of process diagnosis.