TL;DR: A training algorithm that maximizes the margin between the training patterns and the decision boundary is presented, applicable to a wide variety of the classification functions, including Perceptrons, polynomials, and Radial Basis Functions.
Abstract: A training algorithm that maximizes the margin between the training patterns and the decision boundary is presented. The technique is applicable to a wide variety of the classification functions, including Perceptrons, polynomials, and Radial Basis Functions. The effective number of parameters is adjusted automatically to match the complexity of the problem. The solution is expressed as a linear combination of supporting patterns. These are the subset of training patterns that are closest to the decision boundary. Bounds on the generalization performance based on the leave-one-out method and the VC-dimension are given. Experimental results on optical character recognition problems demonstrate the good generalization obtained when compared with other learning algorithms.
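To make the notion of supporting patterns concrete, here is a minimal Python sketch that measures each training pattern's distance to a given linear decision boundary and reports the closest ones. The weight vector `w`, bias `b`, toy data, and function names are assumptions made for illustration. The sketch only inspects a fixed boundary; it does not perform the margin-maximizing training the abstract describes.

```python
import math

def geometric_margins(X, y, w, b):
    """Signed distance of each pattern to the hyperplane w.x + b = 0.

    A positive value means the pattern lies on the side matching its label (+1 or -1).
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return [yi * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
            for x, yi in zip(X, y)]

def supporting_patterns(X, y, w, b, tol=1e-9):
    """Return the smallest margin and the indices of the patterns that attain it."""
    margins = geometric_margins(X, y, w, b)
    smallest = min(margins)
    return smallest, [i for i, m in enumerate(margins) if m - smallest <= tol]

# Hypothetical toy data: two classes separated by the vertical line x1 = 0.
X = [(2.0, 0.0), (3.0, 1.0), (-2.0, 0.0), (-4.0, -1.0)]
y = [1, 1, -1, -1]
w, b = (1.0, 0.0), 0.0

margin, support = supporting_patterns(X, y, w, b)
print(margin, support)
```

Only the patterns returned in `support` touch the margin; in this toy setup, moving the other patterns farther from the boundary leaves the result unchanged.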
TL;DR: A generalization of the numerical renormalization-group procedure used first by Wilson for the Kondo problem is presented and it is shown that this formulation is optimal in a certain sense.
Abstract: A generalization of the numerical renormalization-group procedure used first by Wilson for the Kondo problem is presented. It is shown that this formulation is optimal in a certain sense. As a demonstration of the effectiveness of this approach, results from numerical real-space renormalization-group calculations for Heisenberg chains are presented.
TL;DR: The conclusion is that for almost any real-world generalization problem one should use some version of stacked generalization to minimize the generalization error rate.
TL;DR: A generalization of Allen's interval-based approach to temporal reasoning is presented and the notion of ‘conceptual neighborhood’ of qualitative relations between events is central to the presented approach, using semi-intervals rather than intervals as the basic units of knowledge.
TL;DR: It is shown that for smooth networks, i.e., those with continuously varying weights and smooth transfer functions, the generalization curve asymptotically obeys an inverse power law, while for nonsmooth networks other behaviors can appear, depending on the nature of the nonlinearities as well as the realizability of the rule.
Abstract: Learning from examples in feedforward neural networks is studied within a statistical-mechanical framework. Training is assumed to be stochastic, leading to a Gibbs distribution of networks characterized by a temperature parameter T. Learning of realizable rules as well as of unrealizable rules is considered. In the latter case, the target rule cannot be perfectly realized by a network of the given architecture. Two useful approximate theories of learning from examples are studied: the high-temperature limit and the annealed approximation. Exact treatment of the quenched disorder generated by the random sampling of the examples leads to the use of the replica theory. Of primary interest is the generalization curve, namely, the average generalization error $\epsilon_g$ versus the number of examples P used for training. The theory implies that, for a reduction in $\epsilon_g$ that remains finite in the large-N limit, P should generally scale as $\alpha N$, where N is the number of independently adjustable weights in the network. We show that for smooth networks, i.e., those with continuously varying weights and smooth transfer functions, the generalization curve asymptotically obeys an inverse power law. In contrast, for nonsmooth networks other behaviors can appear, depending on the nature of the nonlinearities as well as the realizability of the rule. In particular, a discontinuous learning transition from a state of poor to a state of perfect generalization can occur in nonsmooth networks learning realizable rules. We illustrate both gradual and continuous learning with a detailed analytical and numerical study of several single-layer perceptron models. Comparing with the exact replica theory of perceptron learning, we find that for realizable rules the high-temperature and annealed theories provide very good approximations to the generalization performance. Assuming this to hold for multilayer networks as well, we propose a classification of possible asymptotic forms of learning curves in general realizable models. For unrealizable rules we find that the above approximations fail in general to predict correctly the shapes of the generalization curves. Another indication of the important role of quenched disorder for unrealizable rules is that the generalization error is not necessarily a monotonically increasing function of temperature. Also, unrealizable rules can possess genuine spin-glass phases indicative of degenerate minima separated by high barriers.
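The inverse power law mentioned for smooth networks can be checked on a measured learning curve by a straight-line fit in log-log coordinates. The Python sketch below is one assumed way to do this; the functional form `eps = c * P**(-gamma)`, the function name, and the synthetic data are illustrative choices, not results from the paper.

```python
import math

def fit_power_law(P, eps):
    """Least-squares fit of log(eps) = log(c) - gamma * log(P); returns (c, gamma)."""
    xs = [math.log(p) for p in P]
    ys = [math.log(e) for e in eps]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope

# Synthetic curve (assumed, for illustration only): eps = 5 / P.
P = [100, 200, 400, 800, 1600]
eps = [5.0 / p for p in P]

c, gamma = fit_power_law(P, eps)
print(c, gamma)
```

On real data the fit should be restricted to the large-P tail, since the abstract's claim concerns asymptotic behavior only.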
TL;DR: The assumptions of conventional effective-mass theory, especially continuity of the envelope function at an abrupt interface, are reviewed critically to show the need for a fresh approach, and a new envelope-function method is reviewed.
Abstract: The assumptions of conventional effective-mass theory, especially the one of continuity of the envelope function at an abrupt interface, are reviewed critically so that the need for a fresh approach becomes apparent. A new envelope-function method, developed by the author over the past few years, is reviewed. This new method is based on both a generalization and a novel application to microstructures of the Luttinger-Kohn envelope-function expansion. The differences between this new method and the conventional envelope-function method are emphasized. An alternative derivation of the new envelope-function equations, which are exact, to that already published is provided. A new and improved derivation of the author's effective-mass equation is given, in which the differences in the zone-centre eigenstates of the constituent crystals are taken into account. The cause of the kinks in the conventional effective-mass envelope function, at abrupt effective-mass changes, is identified.
TL;DR: It is shown that even high-order polynomial classifiers in high dimensional spaces can be trained with a small amount of training data and yet generalize better than classifiers with a smaller VC-dimension.
Abstract: Large VC-dimension classifiers can learn difficult tasks, but are usually impractical because they generalize well only if they are trained with huge quantities of data. In this paper we show that even high-order polynomial classifiers in high dimensional spaces can be trained with a small amount of training data and yet generalize better than classifiers with a smaller VC-dimension. This is achieved with a maximum margin algorithm (the Generalized Portrait). The technique is applicable to a wide variety of classifiers, including Perceptrons, polynomial classifiers (sigma-pi unit networks) and Radial Basis Functions. The effective number of parameters is adjusted automatically by the training algorithm to match the complexity of the problem. It is shown to equal the number of training patterns that are closest to the decision boundary (supporting patterns). Bounds on the generalization error and the speed of convergence of the algorithm are given. Experimental results on handwritten digit recognition demonstrate good generalization compared to other algorithms.
TL;DR: A new set of algorithms for locally-adaptive line generalization based on the so-called natural principle of objective generalization is described, which is compared with benchmarks based on both manual cartographic procedures and a standard method found in many geographical information systems.
Abstract: This article describes a new set of algorithms for locally-adaptive line generalization based on the so-called natural principle of objective generalization. The drawbacks of existing methods of line generalization are briefly discussed and the algorithms described. The performance of these new methods is compared with benchmarks based on both manual cartographic procedures and a standard method found in many geographical information systems.
TL;DR: This paper summarizes the results of a retrospective review of generalization in the context of social skills research with preschool children and reveals some differences concerning the practices employed by studies within each group.
Abstract: This paper summarizes the results of a retrospective review of generalization in the context of social skills research with preschool children. A review of studies from 22 journals (1976 to 1990) that assessed generalization as part of social interaction research provided information concerning the prevalence of studies that have assessed generalization, common practices concerning the production and assessment of generalization, and the overall success of obtaining generalization and maintenance of social behaviors. A comparison of the most and least successful studies, with respect to generalization, revealed some differences concerning the practices employed by studies within each group. Differences differentially related to the production of generalization are discussed and recommendations are provided to guide and support future research efforts.
TL;DR: This paper examines the generalization performance of backpropagation learning on a syllabification task, in the context of connectionism and natural language processing.
Abstract: Citation for published version (APA): Daelemans, W. M. P., & Bosch, A. P. J. (1992). Generalization performance of backpropagation learning on a syllabification task. In M. F. J. Drossaers, & A. Nijholt (Eds.), Connectionism and natural language processing: Proceedings of the third Twente Workshop on Language Technology, TWLT3, Enschede, May 12-13, 1992 (organized by Project Parlevink) (Vol. 3, pp. 27-38). (Memoranda informatica; Vol. 3, No. 92-64). University of Twente, Department of Computer Science.
TL;DR: In this article, a generalization of the standard normalized quadratic form has been proposed, which can provide a local second-order approximation while maintaining the correct curvature globally.
Abstract: In this paper, the authors propose and estimate a system of producer output supply and input demand functions that generalizes the standard normalized quadratic form. The generalization adds either linear or quadratic splines in a time (or technical change) variable, yet retains the main attractive property of the normalized quadratic, which is that it can provide a local second order approximation while maintaining the correct curvature globally. However, the generalization has additional desirable approximation properties with respect to the splined variable and, thus, permits a more flexible treatment of technical change than is provided by standard flexible functional forms. Copyright 1992 by Economics Department of the University of Pennsylvania and the Osaka University Institute of Social and Economic Research Association.
TL;DR: An improved version of a self-organizing network model which has been proposed at the ICANN-91 and since then has been applied to various problems is described, with the generalization of the model to arbitrary dimension and the introduction of a local estimate of the probability density.
Abstract: In this paper an improved version of a self-organizing network model is described which has been proposed at the ICANN-91[3] and since then has been applied to various problems [1,2,5]. The improvements presented here are the generalization of the model to arbitrary dimension and the introduction of a local estimate of the probability density. The latter leads to a very clear distinction between necessary and superfluous neurons with respect to modeling a given probability distribution. This makes it possible to automatically generate network structures that are nearly optimally suited for the distribution at hand.
TL;DR: In this paper, the problem of estimating the parameters of complete systems of simultaneous equations is considered, and a class of estimates is proposed, based on Aitken's generalized minimum-variance approach.
Abstract: In this paper the problem of estimating the parameters of complete systems of simultaneous equations is considered. A class of estimates is proposed, based on Aitken’s generalized minimum-variance approach. It is proved, under certain conditions, that a subclass yields consistent estimates, and moreover that a subclass of this yields estimates with the same asymptotic covariance matrix as that of the limited-information maximum-likelihood estimates. The joint covariance matrix of estimates of different equations is also given. Both this matrix and the covariance matrix of the estimates of one equation remain valid if the system is nonlinear, under suitable conditions. It is proved that the asymptotic generalized variance of the estimates of one equation is never smaller, and in general larger, than the value obtained for this variance by a mechanical application of Fisher’s formula. Finally, a coefficient of simultaneous correlation for a complete linear system is proposed, which is the natural generalization of the coefficient of multiple correlation; it is closely related to Hotelling’s vector alienation coefficient. Section 1 contains a brief introduction to the subject. In section 2 the estimation procedure is described and justified in an elementary way; only those members of the above-mentioned class are considered which appear to be especially promising. In section 2.6 the computational requirements are exposed; it appears that the procedure proposed is much simpler than limited-information maximum-likelihood. Both section 1 and section 2 are written in an expository style. Assumptions and theorems are to be found in section 3.
TL;DR: The fundamental updating process in the transferable belief model is related to the concept of specialization and can be described by a specialization matrix, and it is shown that Dempster's rule of conditioning corresponds essentially to the least committed specialization.
Abstract: The fundamental updating process in the transferable belief model is related to the concept of specialization and can be described by a specialization matrix. The degree of belief in the truth of a proposition is a degree of justified support. The Principle of Minimal Commitment implies that one should never give more support to the truth of a proposition than justified. We show that Dempster's rule of conditioning corresponds essentially to the least committed specialization, and that Dempster's rule of combination results essentially from commutativity requirements. The concept of generalization, dual to the concept of specialization, is described.
TL;DR: The minimality and realization theory for discrete time-varying finite dimensional linear systems with time-varying state spaces is developed, and the results appear as a natural generalization of the corresponding theory for the time-independent case.
Abstract: The minimality and realization theory is developed for discrete time-varying finite dimensional linear systems with time-varying state spaces. The results appear as a natural generalization of the corresponding theory for the time-independent case. Special attention is paid to periodical systems. The case when the state space dimensions do not change in time is re-examined.
TL;DR: The project described in this article had two primary objectives: to design a strategy for terrain generalization that is adaptive to different terrain types, scales, and map purposes, and to implement and evaluate some components of this approach to assess its potential.
Abstract: The project described in this article had two primary objectives: to design a strategy for terrain generalization that is adaptive to different terrain types, scales, and map purposes, and to implement and evaluate some components of this approach to assess its potential. The strategy includes three different generalization methods: a global filtering procedure, a selective (iterative) filtering method, and a heuristic approach based on the generalization of the terrain's structure lines. For a given generalization problem that is constrained by the terrain character, map objective, scale, graphic limits, and data quality, the appropriate technique is selected through structure and process recognition procedures. Some of the key components of the strategy have been implemented and some experiments were conducted. Other parts were covered by proposing models that could serve as implementation guidelines. Our work was intended to break ground for future research. Recommendations for appropriate parameter se...
TL;DR: It is proved that unless NP ⊆ non-uniform P, not all theories have small Horn least-upper-bound approximations.
Abstract: Knowledge compilation speeds inference by creating tractable approximations of a knowledge base, but this advantage is lost if the approximations are too large. We show how learning concept generalizations can allow for a more compact representation of the tractable theory. We also give a general induction rule for generating such concept generalizations. Finally, we prove that unless NP ⊆ non-uniform P, not all theories have small Horn least-upper-bound approximations.
TL;DR: A stochastic learning algorithm based on simulated annealing in weight space is presented and the authors verify the convergence properties and feasibility of the algorithm.
Abstract: The authors discuss the requirements of learning for generalization, where the traditional methods based on gradient descent have limited success. A stochastic learning algorithm based on simulated annealing in weight space is presented. The authors verify the convergence properties and feasibility of the algorithm. An implementation of the algorithm and validation experiments are described.
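For readers unfamiliar with simulated annealing in weight space, the following Python sketch shows the general idea on a two-weight linear model. It is a generic textbook-style annealer, not the authors' algorithm: the loss, the cooling schedule `t0 / (1 + k)`, the Gaussian proposal, and all names and parameter values are assumptions chosen for illustration.

```python
import math
import random

def loss(w, data):
    """Mean squared error of the model w[0] * x + w[1] on (x, target) pairs."""
    return sum((w[0] * x + w[1] - t) ** 2 for x, t in data) / len(data)

def anneal(data, steps=5000, t0=1.0, step_size=0.1, seed=0):
    """Random search in weight space with Metropolis acceptance; returns the best weights seen."""
    rng = random.Random(seed)
    w = [0.0, 0.0]
    current = loss(w, data)
    best_w, best = list(w), current
    for k in range(steps):
        temp = t0 / (1 + k)  # assumed cooling schedule
        candidate = [wi + rng.gauss(0.0, step_size) for wi in w]
        cand_loss = loss(candidate, data)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if cand_loss <= current or rng.random() < math.exp(-(cand_loss - current) / temp):
            w, current = candidate, cand_loss
            if current < best:
                best_w, best = list(w), current
    return best_w, best

# Hypothetical training set generated by the rule t = 2x + 1.
data = [(x, 2.0 * x + 1.0) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]
initial_loss = loss([0.0, 0.0], data)
best_w, best_loss = anneal(data)
print(initial_loss, best_loss, best_w)
```

Unlike gradient descent, the acceptance rule lets the search occasionally move uphill at high temperature, which is what allows escape from poor local minima on less benign loss surfaces than this one.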
TL;DR: It is shown that reward-penalty style techniques emerge as a special case of system identification as part of the process of generalization in the n-dimensional hypercube.
TL;DR: A nonlinear generalization of the family of autoregressive signal models is introduced that leads to an interpolation strategy resembling a predictive counterpart to vector quantization for minimum mean-square error prediction.
Abstract: A nonlinear generalization of the family of autoregressive signal models is introduced. This generalization can be viewed as an autoregressive model with state-varying parameters. For such signals, minimum mean-square error prediction can be reformulated as an interpolation problem. A novel interpretation of the signal as a codebook for its own prediction leads to an interpolation strategy resembling a predictive counterpart to vector quantization. The applicability of this model is then demonstrated empirically for a variety of signals.
TL;DR: The present paper clarifies the asymptotic properties of, and the relation between, two learning curves, one concerning the predictive loss or generalization loss and the other the training loss, which gives a natural definition of the complexity of a neural network.
Abstract: Learning curves show how a neural network is improved as the number of training examples increases and how it is related to the network complexity. The present paper clarifies the asymptotic properties of, and the relation between, two learning curves, one concerning the predictive loss or generalization loss and the other the training loss. The result gives a natural definition of the complexity of a neural network. Moreover, it provides a new criterion of model selection.
TL;DR: In this paper, a generalization of symmetric functions to a negative number of variables was proposed, which is similar to our generalization to the Stirling number in this direction.
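TL;DR: For an additively written finite abelian group $G$, Davenport's constant $D(G)$ is defined as the maximal length $d$ of a sequence $(g_1, \ldots, g_d)$ in $G$ such that $\sum_{j=1}^{d} g_j = 0$ and $\sum_{j \in J} g_j \neq 0$ for all $\emptyset \neq J \subsetneq \{1, \ldots, d\}$. Arithmetically, $D(G)$ is the maximal number of prime ideals (counted with multiplicity) which can divide an irreducible element of the ring of integers of an algebraic number field with ideal class group $G$.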
Abstract: For an additively written finite abelian group $G$, Davenport’s constant $D(G)$ is defined as the maximal length $d$ of a sequence $(g_1, \ldots, g_d)$ in $G$ such that $\sum_{j=1}^{d} g_j = 0$ and $\sum_{j \in J} g_j \neq 0$ for all $\emptyset \neq J \subsetneq \{1, \ldots, d\}$. It has the following arithmetical meaning: Let $K$ be an algebraic number field, $R$ its ring of integers and $G$ the ideal class group of $R$. Then $D(G)$ is the maximal number of prime ideals (counted with multiplicity) which can divide an irreducible element of $R$. This fact was first observed by H. Davenport (1966) and worked out by W. Narkiewicz [8] and A. Geroldinger [4]. For a subset $Z \subset R$ and $x > 1$ we denote by $Z(x)$ the number of principal ideals $(\alpha)$ of $R$ with $\alpha \in Z$ and $(R : (\alpha)) \le x$. If $M$ denotes the set of irreducible integers of $R$, then it was proved by P. Rémond [12] that, as $x \to \infty$, $M(x) \sim C x (\log x)^{-1} (\log \log x)^{D(G)-1}$,
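The definition of Davenport's constant can be checked by brute force on very small groups. The Python sketch below does so for the cyclic group Z_n, represented as integers modulo n. The function names are illustrative, and the search stops at length n, which relies on the standard bound D(G) ≤ |G|; that bound is taken here as an assumption rather than proved.

```python
from itertools import combinations_with_replacement

def is_minimal_zero_sum(seq, n):
    """True if seq sums to 0 mod n and no nonempty proper subsequence does."""
    d = len(seq)
    if sum(seq) % n != 0:
        return False
    # Each mask strictly between 0 and 2**d - 1 selects a nonempty proper index set J.
    for mask in range(1, (1 << d) - 1):
        if sum(seq[j] for j in range(d) if (mask >> j) & 1) % n == 0:
            return False
    return True

def davenport_cyclic(n):
    """Largest d for which Z_n has a sequence of length d satisfying the definition."""
    best = 0
    for d in range(1, n + 1):  # assumes D(G) <= |G|
        # Order does not matter for the definition, so multisets suffice.
        for seq in combinations_with_replacement(range(n), d):
            if is_minimal_zero_sum(seq, n):
                best = d
                break
    return best

print([davenport_cyclic(n) for n in range(1, 7)])
```

The search is exponential in d and is meant only for tiny n. For cyclic groups it returns n, attained by the sequence consisting of n copies of the generator 1.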
TL;DR: In this article, a generalization of the stochastic linearization method is proposed; namely, the nonlinear system is suggested to be replaced by a linear system equivalent to the original one in the following sense: the two systems should share common mean-square values of potential energies, as well as have coincident mean square values of energy dissipation function.
Abstract: In this study a generalization of the stochastic linearization method is proposed; namely the nonlinear system is suggested to be replaced by a linear system equivalent to the original one in the following sense: The two systems should share common mean-square values of potential energies, as well as have coincident mean square values of energy dissipation function. An example of a system with nonlinear damping and nonlinear stiffness is numerically evaluated, to elucidate the proposed method.
TL;DR: This paper explores formal aspects of contour, especially the mathematics of an abstract definition of contour itself, taking a "non-stylistic" approach in the hope of establishing a general formulation that will be of use to more specific, style- and genre-related theoretical work.
Abstract: This paper explores some formal aspects of contour, especially the mathematics of an abstract definition of contour itself. In the hope of establishing a general formulation that will be of use to more specific, style- and genre-related theoretical work in contour, a "non-stylistic" approach is taken. Specific musical situations (like the equivalence classes generated by elementary transformations, or musical assumptions made by ethnomusicological contour studies) are not invoked. Further generalization of the theory of "the number of possible contours" includes the formulation of a theory of contour for asymmetrical and non-ternary contour descriptions, one we believe to be of musical interest.
TL;DR: In this paper, the authors determine the two-group linear discriminant function which minimizes the total probability of misclassification when a priori probabilities of the two groups are specified.
Abstract: In this paper we determine the two-group linear discriminant function which minimizes the total probability of misclassification when a priori probabilities of the two groups are specified. On one hand, this problem is shown to be a special case of the projection pursuit technique in discriminant analysis, which provides efficient algorithms for solving optimization problems of this kind. On the other hand, this linear function is shown to be a generalization of the best linear discriminant function (the version concerned with the total probability of misclassification) introduced by Anderson & Bahadur (1962). Kernel estimation of the discrimination rule and a computer implementation are discussed, and it is shown that the estimate is consistent.