TL;DR: Presenting a method for determining the necessary and sufficient conditions for consistency of the learning process, the author covers function estimation from small data pools, the application of these estimates to real-life problems, and much more.
Abstract: A comprehensive look at learning and generalization theory. The statistical theory of learning and generalization concerns the problem of choosing desired functions on the basis of empirical data. Highly applicable to a variety of computer science and robotics fields, this book offers lucid coverage of the theory as a whole. Presenting a method for determining the necessary and sufficient conditions for consistency of the learning process, the author covers function estimation from small data pools, the application of these estimates to real-life problems, and much more.
TL;DR: Mixed logit as mentioned in this paper is a generalization of standard logit that does not exhibit the restrictive independence from irrelevant alternatives property and explicitly accounts for correlations in unobserved utility over repeated choices by each customer.
Abstract: Mixed logit models, also called random-parameters or error-components logit, are a generalization of standard logit that do not exhibit the restrictive “independence from irrelevant alternatives” property and explicitly account for correlations in unobserved utility over repeated choices by each customer. Mixed logits are estimated for households' choices of appliances under utility-sponsored programs that offer rebates or loans on high-efficiency appliances.
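The following is a minimal sketch of how a mixed logit choice probability can be simulated: the standard logit formula is averaged over random draws of the coefficient. It assumes a single attribute per alternative and a normally distributed coefficient; these choices, and the names `logit_probs` and `mixed_logit_probs`, are illustrative assumptions and are not taken from the paper.

```python
import math
import random
from typing import List


def logit_probs(utilities: List[float]) -> List[float]:
    """Standard logit choice probabilities (softmax over utilities)."""
    top = max(utilities)  # subtract the maximum for numerical stability
    exps = [math.exp(u - top) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_logit_probs(x: List[float], beta_mean: float, beta_sd: float,
                      draws: int = 2000, seed: int = 0) -> List[float]:
    """Simulated mixed logit probabilities.

    x holds one attribute value per alternative (hypothetical setup).
    The coefficient beta is drawn from Normal(beta_mean, beta_sd), and the
    logit probabilities are averaged over the draws.
    """
    rng = random.Random(seed)
    acc = [0.0] * len(x)
    for _ in range(draws):
        beta = rng.gauss(beta_mean, beta_sd)
        for i, p in enumerate(logit_probs([beta * xi for xi in x])):
            acc[i] += p
    return [a / draws for a in acc]
```

With `beta_sd = 0` every draw equals `beta_mean`, so the simulation collapses to standard logit. A positive `beta_sd` lets tastes vary across decision makers, which is what relaxes the independence from irrelevant alternatives property.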
TL;DR: Results in this paper show that if a large neural network is used for a pattern classification problem and the learning algorithm finds a network with small weights that has small squared error on the training patterns, then the generalization performance depends on the size of the weights rather than the number of weights.
Abstract: Sample complexity results from computational learning theory, when applied to neural network learning for pattern classification problems, suggest that for good generalization performance the number of training examples should grow at least linearly with the number of adjustable parameters in the network. Results in this paper show that if a large neural network is used for a pattern classification problem and the learning algorithm finds a network with small weights that has small squared error on the training patterns, then the generalization performance depends on the size of the weights rather than the number of weights. For example, consider a two-layer feedforward network of sigmoid units, in which the sum of the magnitudes of the weights associated with each unit is bounded by A and the input dimension is n. We show that the misclassification probability is no more than a certain error estimate (that is related to squared error on the training set) plus $A^3 \sqrt{(\log n)/m}$ (ignoring log A and log m factors), where m is the number of training patterns. This may explain the generalization performance of neural networks, particularly when the number of training examples is considerably smaller than the number of weights. It also supports heuristics (such as weight decay and early stopping) that attempt to keep the weights small during training. The proof techniques appear to be useful for the analysis of other pattern classifiers: when the input domain is a totally bounded metric space, we use the same approach to give upper bounds on misclassification probability for classifiers with decision boundaries that are far from the training examples.
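To make the scaling in the bound concrete, the sketch below evaluates only the weight-dependent term, A^3 * sqrt(log(n) / m), with all constants and the log A and log m factors dropped. The function name `weight_complexity_term` is an illustrative assumption, and the values it returns indicate relative scaling only, not an actual error bound.

```python
import math


def weight_complexity_term(A: float, n: int, m: int) -> float:
    """Return A^3 * sqrt(log(n) / m).

    A: bound on the sum of weight magnitudes per unit
    n: input dimension
    m: number of training patterns
    Constants and the log A, log m factors are omitted (assumption).
    """
    if A <= 0 or n < 2 or m < 1:
        raise ValueError("need A > 0, n >= 2 and m >= 1")
    return A ** 3 * math.sqrt(math.log(n) / m)
```

The term does not depend on the number of weights. Doubling A multiplies it by 8, while quadrupling the number of training patterns m halves it.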
TL;DR: A result is presented that allows one to trade off errors on the training sample against improved generalization performance, and a more general result in terms of "luckiness" functions, which provides a quite general way for exploiting serendipitous simplicity in observed data to obtain better prediction accuracy from small training sets.
Abstract: The paper introduces some generalizations of Vapnik's (1982) method of structural risk minimization (SRM). As well as making explicit some of the details on SRM, it provides a result that allows one to trade off errors on the training sample against improved generalization performance. It then considers the more general case when the hierarchy of classes is chosen in response to the data. A result is presented on the generalization performance of classifiers with a "large margin". This theoretically explains the impressive generalization performance of the maximal margin hyperplane algorithm of Vapnik and co-workers (which is the basis for their support vector machines). The paper concludes with a more general result in terms of "luckiness" functions, which provides a quite general way for exploiting serendipitous simplicity in observed data to obtain better prediction accuracy from small training sets. Four examples are given of such functions, including the Vapnik-Chervonenkis (1971) dimension measured on the sample.
TL;DR: In this paper, reliability generalization characterizes the typical reliability of scores for a given test across studies, the amount of variability in reliability coefficients for given measures, and the sources of variability of reliability coefficients across studies.
Abstract: Because tests are not reliable, it is important to explore score reliability in virtually all studies. The present article proposes and illustrates a new method, reliability generalization, that can be used in a meta-analysis application similar to validity generalization. Reliability generalization characterizes (a) the typical reliability of scores for a given test across studies, (b) the amount of variability in reliability coefficients for given measures, and (c) the sources of variability in reliability coefficients across studies. The use of reliability generalization is illustrated here by analyzing 87 reliability coefficients reported for the two scales of the Bem Sex Role Inventory (BSRI).
TL;DR: In this paper, a definition of weak ω-categories based on a higher-order generalization of the apparatus of operads is presented, where weak ω is defined as a class of weak operads.
TL;DR: In this paper, the generalization of the Steiger-Lind root mean square error of approximation fit indexes and interval estimation procedure to models based on multiple independent samples is discussed; the authors suggest an approach that seems both reasonable and workable, and caution against one that definitely seems inappropriate.
Abstract: Generalization of the Steiger-Lind root mean square error of approximation fit indexes and interval estimation procedure to models based on multiple independent samples is discussed. In this article, we suggest an approach that seems both reasonable and workable, and caution against one that definitely seems inappropriate.
TL;DR: The Abelian sandpile model is the simplest analytically tractable model of self-organized criticality; its abelian group structure allows exact calculation of many of its properties, including all the critical exponents for the directed model in all dimensions.
Abstract: The Abelian sandpile model is the simplest analytically tractable model of self-organized criticality. This paper presents a brief review of known results about the model. The abelian group structure allows an exact calculation of many of its properties. In particular, one can calculate all the critical exponents for the directed model in all dimensions. For the undirected case, the model is related to the q = 0 Potts model. This enables exact calculation of some exponents in two dimensions, and there are some conjectures about others. We also discuss a generalization of the model to a network of communicating reactive processors. This includes sandpile models with stochastic toppling rules as a special case. We also consider a non-abelian stochastic variant, which lies in a different universality class, related to directed percolation.
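The abelian property can be illustrated with a small sketch of the deterministic sandpile on a square grid, where adding grains at two sites gives the same stable configuration regardless of order. The grid size, the toppling threshold of 4, the open boundaries (grains that leave the grid are lost), and the function names are all assumptions chosen for illustration; the paper itself covers far more general settings.

```python
from typing import List, Tuple

Grid = List[List[int]]
THRESHOLD = 4  # a site topples once it holds at least 4 grains (assumption)


def stabilize(grid: Grid) -> Grid:
    """Topple unstable sites until every site holds fewer than THRESHOLD grains."""
    rows, cols = len(grid), len(grid[0])
    g = [row[:] for row in grid]
    unstable = [(r, c) for r in range(rows) for c in range(cols)
                if g[r][c] >= THRESHOLD]
    while unstable:
        r, c = unstable.pop()
        if g[r][c] < THRESHOLD:
            continue  # stale entry: this site was already relaxed
        g[r][c] -= THRESHOLD
        if g[r][c] >= THRESHOLD:
            unstable.append((r, c))
        # Send one grain to each neighbor; grains crossing the edge are lost.
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                g[nr][nc] += 1
                if g[nr][nc] >= THRESHOLD:
                    unstable.append((nr, nc))
    return g


def add_grain(grid: Grid, site: Tuple[int, int]) -> Grid:
    """Drop one grain at `site` and relax the pile to a stable configuration."""
    g = [row[:] for row in grid]
    g[site[0]][site[1]] += 1
    return stabilize(g)
```

For example, starting from a 3 x 3 grid filled with 3 grains per site, `add_grain(add_grain(start, (0, 0)), (1, 1))` and `add_grain(add_grain(start, (1, 1)), (0, 0))` yield the same configuration, which is the commutativity that the exact calculations rely on.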
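TL;DR: In this paper, the central idea of DFT, replacing the dependence on the external potential v(r) by a dependence on the density distribution n(r), is presented as a generalization of the Legendre transform from the chemical potential to the number of particles (N); this approach is used to introduce the Hohenberg-Kohn energy functional and to obtain the corresponding theorems.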
Abstract: Density Functional Theory (DFT) is one of the most widely used methods for "ab initio" calculations of the structure of atoms, molecules, crystals, surfaces, and their interactions. Unfortunately, the customary introduction to DFT is often considered too lengthy to be included in various curricula. An alternative introduction to DFT is presented here, drawing on ideas which are well-known from thermodynamics, especially the idea of switching between different independent variables. The central theme of DFT, i.e. the notion that it is possible and beneficial to replace the dependence on the external potential v(r) by a dependence on the density distribution n(r), is presented as a straightforward generalization of the familiar Legendre transform from the chemical potential (\mu) to the number of particles (N). This approach is used here to introduce the Hohenberg-Kohn energy functional and to obtain the corresponding theorems, using classical nonuniform fluids as simple examples. The energy functional for electronic systems is considered next, and the Kohn-Sham equations are derived. The exchange-correlation part of this functional is discussed, including both the local density approximation to it, and its formally exact expression in terms of the exchange-correlation hole. A very brief survey of various applications and extensions is included.
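The analogy described above can be written out in a short worked form. The block below uses common textbook conventions, which are an assumption here; the paper's own notation and sign conventions may differ.

```latex
% Thermodynamics: switch from the chemical potential \mu to the particle number N
N = -\frac{\partial \Omega}{\partial \mu}, \qquad
F(N) = \Omega(\mu) + \mu N .

% DFT: switch from the external potential v(\mathbf{r}) to the density n(\mathbf{r})
n(\mathbf{r}) = \frac{\delta E[v]}{\delta v(\mathbf{r})}, \qquad
F[n] = E[v] - \int v(\mathbf{r})\, n(\mathbf{r})\, d\mathbf{r} .

% Inverting the transform gives the variational (Hohenberg-Kohn) form
E[v] = \min_{n} \left\{ F[n] + \int v(\mathbf{r})\, n(\mathbf{r})\, d\mathbf{r} \right\} .
```

In both cases the conjugate variable is obtained as a derivative of the original function, and the transformed function depends on that conjugate variable alone.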
TL;DR: In this paper, the problem of generalizing values of a case frame slot for a verb is viewed as that of estimating a conditional probability distribution over a partition of words, and a new generalization method based on the minimum description length (MDL) principle is proposed.
Abstract: A new method for automatically acquiring case frame patterns from large corpora is proposed. In particular, the problem of generalizing values of a case frame slot for a verb is viewed as that of estimating a conditional probability distribution over a partition of words, and a new generalization method based on the Minimum Description Length (MDL) principle is proposed. In order to assist with efficiency, the proposed method makes use of an existing thesaurus and restricts its attention to those partitions that are present as "cuts" in the thesaurus tree, thus reducing the generalization problem to that of estimating a "tree cut model" of the thesaurus tree. An efficient algorithm is given, which provably obtains the optimal tree cut model for the given frequency data of a case slot, in the sense of MDL. Case frame patterns obtained by the method were used to resolve PP-attachment ambiguity. Experimental results indicate that the proposed method improves upon or is at least comparable with existing methods.
TL;DR: This work investigates the problem of learning a classification task on data represented in terms of their pairwise proximities, which does not refer to an explicit feature representation of the data items and is thus more general than the standard approach of using Euclidean feature vectors.
Abstract: We investigate the problem of learning a classification task on data represented in terms of their pairwise proximities. This representation does not refer to an explicit feature representation of the data items and is thus more general than the standard approach of using Euclidean feature vectors, from which pairwise proximities can always be calculated. Our first approach is based on a combined linear embedding and classification procedure resulting in an extension of the Optimal Hyperplane algorithm to pseudo-Euclidean data. As an alternative we present another approach based on a linear threshold model in the proximity values themselves, which is optimized using Structural Risk Minimization. We show that prior knowledge about the problem can be incorporated by the choice of distance measures and examine different metrics with respect to their generalization. Finally, the algorithms are successfully applied to protein structure data and to data from the cat's cerebral cortex. They show better performance than K-nearest-neighbor classification.
TL;DR: It is claimed that it is always possible to classify objects according to existence dependency, thus removing the necessity for the Part Of relation and other kinds of associations between object types.
Abstract: In object-oriented conceptual modeling, the generalization/specialization hierarchy and the whole/part relationship are prevalent classification schemes for object types. This paper presents an object-oriented conceptual model where, in the end, object types are classified according to two relationships only: existence dependency and generalization/specialization. Existence dependency captures some of the interesting semantics that are usually associated with the concept of aggregation (also called composition or Part Of relation), but in contrast with the latter concept, the semantics of existence dependency are very precise and its use clear cut. The key advantages of classifying object types according to existence dependency are the simplicity of the concept, its absolute unambiguity, and the fact that it enables checking conceptual schemas for semantic integrity and consistency. We will first define the notion of existence dependency and claim that it is always possible to classify objects according to this relationship, thus removing the necessity for the Part Of relation and other kinds of associations between object types. The second claim of this paper is that existence dependency is the key to semantic integrity checking to a level unknown to current object-oriented analysis methods. In other words: existence dependency allows us to track and solve inconsistencies in an object-oriented conceptual schema.
TL;DR: The aim of this paper is to observe school-case solutions available in standard cartographic books and to replicate them automatically, preserving the overall line structure with line bends that are mathematically defined according to size, shape, and context.
Abstract: Many solutions for line generalizations have already been proposed. Most of them, however, are geometric solutions, not cartographic ones. The position we take in this paper is to observe school-case solutions available in standard cartographic books and try to replicate those automatically. A central criterion guiding the process of cartographic generalization is line structure, which itself can be decomposed into a series of line bends. Hence our solution is to preserve the overall structure with line bends which are mathematically defined according to size, shape, and context. Rules are subsequently applied using operators such as elimination, combination, and exaggeration. The algorithms that were used are both procedural and knowledge based. Various experiments were conducted on physical and political geographic lines, and we show the graphical results so that readers may visually assess them. Further research to improve the present solutions is discussed, particularly options for avoiding conflicts ...
TL;DR: The authors showed 9- and 11-month-old infants events in which animal or vehicle properties were demonstrated, such as a dog drinking from a cup or a car giving a ride, and tested imitation of these properties. Infants generalized the properties broadly to both typical and novel exemplars within the appropriate domain, and rarely to exemplars from the inappropriate domain.
Abstract: Using little models, we showed 9- and 11-month-old infants events in which animal or vehicle properties were demonstrated, such as a dog drinking from a cup or a car giving a ride. The infants were tested on imitation of these properties on the same exemplars as used for the modeling, on generalization to other exemplars from the same domain, and on generalization to exemplars from an inappropriate domain. Infants generalized the properties broadly to both typical and novel exemplars within the appropriate domain, and rarely to exemplars from the inappropriate domain. It is concluded that at least by 9 months infants have formed global concepts of animals and vehicles that control the way infants learn the characteristic properties of these classes.
TL;DR: In this paper, the results of 30 years of investigation by the author into the creation of a new theory on statistical analysis of observations, based on the principle of random arrays of random vectors and matrices of increasing dimensions, are described.
Abstract: This book contains the results of 30 years of investigation by the author into the creation of a new theory on statistical analysis of observations, based on the principle of random arrays of random vectors and matrices of increasing dimensions. It describes limit phenomena of sequences of random observations, which occupy a central place in the theory of random matrices. This is the first book to explore statistical analysis of random arrays and provides the necessary tools for such analysis. This book is a natural generalization of multidimensional statistical analysis and aims to provide its readers with new, improved estimators of this analysis. The book consists of 14 chapters and opens with the theory of sample random matrices of fixed dimension, which allows one to encompass not only the problems of multidimensional statistical analysis, but also some important problems of mechanics, physics and economics. The second chapter deals with all 50 known canonical equations of the new statistical analysis, which form the basis for finding new and improved statistical estimators. Chapters 3-5 contain detailed proofs of the three main laws of the theory of sample random matrices. In chapters 6-10, detailed, strong proofs of the Circular and Elliptic Laws and their generalization are given. In chapters 11-13, the convergence rates of spectral functions are given for the practical application of new estimators, and important questions on random matrix physics are considered. The final chapter contains 54 new statistical estimators, which generalize the main estimators of statistical analysis.
TL;DR: This article elaborates global control for partial deduction, using the concept of a characteristic tree, encapsulating specialization behavior rather than syntactic structure, to guide generalization and polyvariance, and shows how this can be done in a correct and elegant way.
Abstract: Given a program and some input data, partial deduction computes a specialized program handling any remaining input more efficiently. However, controlling the process well is a rather difficult problem. In this article, we elaborate global control for partial deduction: for which atoms, among possibly infinitely many, should specialized relations be produced, meanwhile guaranteeing correctness as well as termination? Our work is based on two ingredients. First, we use the concept of a characteristic tree, encapsulating specialization behavior rather than syntactic structure, to guide generalization and polyvariance, and we show how this can be done in a correct and elegant way. Second, we structure combinations of atoms and associated characteristic trees in global trees registering “causal” relationships among such pairs. This allows us to spot looming nontermination and perform proper generalization in order to avert the danger, without having to impose a depth bound on characteristic trees. The practical relevance and benefits of the work are illustrated through extensive experiments. Finally, a similar approach may improve upon current (on-line) control strategies for program transformation in general such as (positive) supercompilation of functional programs. It also seems valuable in the context of abstract interpretation to handle infinite domains of infinite height with more precision.
TL;DR: In this paper, the authors studied the optimal cutting strategy for an ongoing forest, using stochastic impulse control, and showed how Faustmann's formula can be generalized to growing forests.
TL;DR: In this article, the second derivatives of the prepotential with respect to Whitham time-variables in the Seiberg-Witten theory are expressed in terms of Riemann theta-functions.
Abstract: The second derivatives of the prepotential with respect to Whitham time-variables in the Seiberg-Witten theory are expressed in terms of Riemann theta-functions. These formulas give a direct transcendental generalization of algebraic ones for the Kontsevich matrix model. In a particular case they provide an explicit derivation of the renormalization group (RG) equation proposed recently in papers on the Donaldson theory.
TL;DR: It is shown that evolutionary algorithms are able to converge to the set of minimal elements in finite time with probability one, provided that the search space is finite, the time-invariant variation operator is associated with a positive transition probability function and that the selection operator obeys the so-called ‘elite preservation strategy.’
Abstract: The task of finding minimal elements of a partially ordered set is a generalization of the task of finding the global minimum of a real-valued function or of finding Pareto-optimal points of a multicriteria optimization problem. It is shown that evolutionary algorithms are able to converge to the set of minimal elements in finite time with probability one, provided that the search space is finite, the time-invariant variation operator is associated with a positive transition probability function and that the selection operator obeys the so-called ‘elite preservation strategy.’
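The three conditions named above (finite search space, a time-invariant variation operator with positive transition probabilities, and elite preservation) can be sketched in a toy setting. The search space, the componentwise partial order, uniform resampling as the variation operator, and all names below are illustrative assumptions and not the construction used in the paper.

```python
import random
from typing import List, Set, Tuple

Point = Tuple[int, int]


def precedes(a: Point, b: Point) -> bool:
    """Strict partial order: a is componentwise <= b and a != b."""
    return a != b and a[0] <= b[0] and a[1] <= b[1]


def minimal_elements(space: List[Point]) -> Set[Point]:
    """Brute-force reference: elements that no other element precedes."""
    return {x for x in space if not any(precedes(y, x) for y in space)}


def evolve(space: List[Point], steps: int, seed: int = 0) -> Set[Point]:
    """Toy evolutionary search with an elitist archive.

    Variation: draw an offspring uniformly from the finite space, so every
    element is reachable with positive probability at every step.
    Selection: keep only mutually non-dominated elements (elite preservation).
    """
    rng = random.Random(seed)
    archive: Set[Point] = {rng.choice(space)}
    for _ in range(steps):
        child = rng.choice(space)
        if any(precedes(e, child) for e in archive):
            continue  # child is dominated by an elite element: reject it
        archive = {e for e in archive if not precedes(child, e)}
        archive.add(child)
    return archive


# Hypothetical finite search space: grid points on or above the line i + j = 5.
SPACE: List[Point] = [(i, j) for i in range(6) for j in range(6) if i + j >= 5]
```

Once a minimal element enters the archive it can never be removed, because nothing precedes it. The archive therefore equals the set of minimal elements as soon as each of them has been sampled at least once, which happens in finite time with probability one.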
TL;DR: The study shows that a set of sophisticated generalization operators can be constructed for generalization of complex data objects, a dimension-based class generalization mechanism can be developed for object cube construction, and sophisticated rule formation methods can be developed for extraction of different kinds of knowledge from data.
Abstract: Data mining is the discovery of knowledge and useful information from the large amounts of data stored in databases. With the increasing popularity of object-oriented database systems in advanced database applications, it is important to study the data mining methods for object-oriented databases because mining knowledge from such databases may improve understanding, organization, and utilization of the data stored there. In this paper, issues on generalization-based data mining in object-oriented databases are investigated in three aspects: (1) generalization of complex objects, (2) class-based generalization, and (3) extraction of different kinds of rules. An object cube model is proposed for class-based generalization, on-line analytical processing, and data mining. The study shows that (i) a set of sophisticated generalization operators can be constructed for generalization of complex data objects, (ii) a dimension-based class generalization mechanism can be developed for object cube construction, and (iii) sophisticated rule formation methods can be developed for extraction of different kinds of knowledge from data, including characteristic rules, discriminant rules, association rules, and classification rules. Furthermore, the application of such discovered knowledge may substantially enhance the power and flexibility of browsing databases, organizing databases and querying data and knowledge in object-oriented databases.
TL;DR: A generalization of Hadamard's inequality to r-convex functions is given in this article, and a corresponding generalization to Fink-Mond-Pecaric inequalities is established.
TL;DR: A statistically based methodology is introduced for the design of neural networks when the dimension d of the network input is comparable to the size n of the training set; it is illustrated in detail in the context of short-term forecasting of the demand for electric power from an electric utility.
Abstract: We introduce a statistically based methodology for the design of neural networks when the dimension d of the network input is comparable to the size n of the training set. If one proceeds straightforwardly, then one is committed to a network of complexity exceeding n. The result will be good performance on the training set but poor generalization performance when the network is presented with new data. To avoid this we need to select carefully the network architecture, including control over the input variables. Our approach to selecting a network architecture first selects a subset of input variables (features) using the nonparametric statistical process of difference-based variance estimation and then selects a simple network architecture using projection pursuit regression (PPR) ideas combined with the statistical idea of slicing inverse regression (SIR). The resulting network, which is then retrained without regard to the PPR/SIR determined parameters, is one of moderate complexity (number of parameters significantly less than n) whose performance on the training set can be expected to generalize well. The application of this methodology is illustrated in detail in the context of short-term forecasting of the demand for electric power from an electric utility.
TL;DR: Testing whether viewpoint-specific representations for some members of a class facilitate the recognition of other members of that class supports the hypothesis that image-based representations are viewpoint dependent, but that these representations generalize across members of perceptually-defined classes.
TL;DR: A general capacity formula for the case of more than one major lane is derived, using the M3 distribution and allowing the major lanes to differ in critical gaps and follow-up times.
Abstract: A general capacity formula for the case of more than one major lane is derived, using the M3 distribution and allowing the major lanes to differ in critical gaps and follow-up times. The derivation can be further extended to there being different minimum headways for the major lanes and to distributions other than the M3 distribution. It is shown that the resulting formulas provide more accurate capacity estimations.
TL;DR: A full generalization is presented where both the autocorrelation function and power spectral density are defined in terms of a general basis set, and a partial generalization where the density is the Fourier transform of the characteristic function but the characteristic function is defined in terms of an arbitrary basis set.
Abstract: We generalize the concept of the autocorrelation function and give the generalization of the Wiener-Khinchin theorem. A full generalization is presented where both the autocorrelation function and power spectral density are defined in terms of a general basis set. In addition, we present a partial generalization where the density is the Fourier transform of the characteristic function but the characteristic function is defined in terms of an arbitrary basis set. Both the deterministic and random cases are considered.
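For orientation, the classical result being generalized can be stated for a deterministic finite-energy signal as follows. This is the standard Fourier-basis form with one common sign and normalization convention, given here as background; it is an assumption about conventions and does not reproduce the paper's generalized definitions.

```latex
% Autocorrelation function of a finite-energy signal x(t)
R(\tau) = \int_{-\infty}^{\infty} x(t)\, x^{*}(t - \tau)\, dt

% Fourier transform of the signal
X(\omega) = \int_{-\infty}^{\infty} x(t)\, e^{-i\omega t}\, dt

% Wiener-Khinchin: the spectral density is the Fourier transform of R
S(\omega) = \int_{-\infty}^{\infty} R(\tau)\, e^{-i\omega\tau}\, d\tau
          = X(\omega)\, X^{*}(\omega) = |X(\omega)|^{2}
```

The last equality follows from substituting s = t - τ in the double integral, which factors it into X(ω) and its complex conjugate. In the random case, R(τ) is replaced by the expectation E[x(t) x*(t - τ)] of a stationary process.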
TL;DR: An algorithm for the determination of the dominant paths for indoor wave propagation is presented and three different prediction models are presented and compared with one another and with measurements.
Abstract: An algorithm for the determination of the dominant paths for indoor wave propagation is presented. The algorithm computes a tree of the relations between the rooms inside the building and the branches of the tree are used for the determination of the dominant paths. Based on these dominant paths, three different prediction models are presented and compared with one another and with measurements. Two of the three models are based on neural networks, trained with measurements and the third model is an empirical model. With the neural prediction models a good generalization is achieved and they are very accurate in buildings not used for the training of the neural network.
TL;DR: The author considers how Ptolemy's Theorem can be used to prove that Snell's Law and Fermat's Principle of Least Time both lead to the same geometry of refraction.
Abstract: The author discusses two interesting relationships between circles and lines. There should be a computer graphics application that can use the first topic to run faster or better. The second topic is Ptolemy's Theorem, which is a generalization of the triangle inequality. The author shows how it can be used to derive the angle addition formulas. He considers how Ptolemy's Theorem can be used to prove that Snell's Law and Fermat's Principle of Least Time both lead to the same geometry of refraction.
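Ptolemy's relation is easy to check numerically. The sketch below computes AB·CD + BC·DA − AC·BD for four points; the quantity is zero when the points lie on a circle in order (Ptolemy's theorem) and non-negative in general (Ptolemy's inequality). The function names are illustrative assumptions and do not come from the article.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def dist(p: Point, q: Point) -> float:
    """Euclidean distance between two points in the plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def ptolemy_gap(a: Point, b: Point, c: Point, d: Point) -> float:
    """Return AB*CD + BC*DA - AC*BD for the quadrilateral ABCD.

    Zero (up to rounding) when A, B, C, D lie on a circle in that order;
    non-negative for any four points in the plane.
    """
    return (dist(a, b) * dist(c, d) + dist(b, c) * dist(d, a)
            - dist(a, c) * dist(b, d))


def on_unit_circle(theta: float) -> Point:
    """Point on the unit circle at angle theta (radians)."""
    return (math.cos(theta), math.sin(theta))
```

Calling `ptolemy_gap` on four points produced by `on_unit_circle` with increasing angles gives a value that vanishes up to floating-point error, while a non-cyclic quadrilateral gives a strictly positive gap.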
TL;DR: It is shown that a prescriptive learning procedure where the weights are simply read off based on the training data can provide good generalization.
TL;DR: This paper proposes and discusses three experiments on knowledge acquisition using unsupervised and supervised learning techniques and results are promising with a prediction rate higher than 80% having been obtained.
Abstract: The nature of map generalization may be non-uniform along the length of an individual line, requiring the application of methods that adapt to the local geometry and the geographical context. Geographical databases need to be enriched in terms of shape description structures (geometrical knowledge), knowledge of appropriate order of operations and of appropriate algorithms (procedural knowledge). Stored knowledge should take account of semantic and morphological characteristics, and of cartographic constraints.
This paper proposes and discusses three experiments on knowledge acquisition using unsupervised and supervised learning techniques. In order to exploit geometrical shape knowledge, classifications were computed according to a set of morphological measures using unsupervised learning. Choice of appropriate operations was determined by the results of a test with IGN cartographers considering line characteristics. These results were given to a supervised learning algorithm, along with corresponding computed measures in order to discover rules. The approach and the resulting rules are presented and discussed. Tests have also been conducted on the tuning of parameter values, applying a Gaussian smoothing tolerance value to a set of lines using the supervised learning algorithm. The values obtained by means of the learning algorithm have been compared with interactive choices of an expert. Results are promising with a prediction rate higher than 80% having been obtained.
TL;DR: In this paper, a tensorial approach is employed to describe a k-way array, and the singular value decomposition of this type of multiarray is established; the associated algorithm, based on a generalization of the transition formulae, has a Gauss-Seidel form.