TL;DR: The data cube operator as discussed by the authors generalizes the histogram, cross-tabulation, roll-up, drill-down, and sub-total constructs found in most report writers.
Abstract: Data analysis applications typically aggregate data across many dimensions looking for unusual patterns. The SQL aggregate functions and the GROUP BY operator produce zero-dimensional or one-dimensional answers. Applications need the N-dimensional generalization of these operators. The paper defines that operator, called the data cube or simply cube. The cube operator generalizes the histogram, cross-tabulation, roll-up, drill-down, and sub-total constructs found in most report writers. The cube treats each of the N aggregation attributes as a dimension of N-space. The aggregate of a particular set of attribute values is a point in this space. The set of points forms an N-dimensional cube. Super-aggregates are computed by aggregating the N-cube to lower dimensional spaces. Aggregation points are represented by an "infinite value": ALL, so the point (ALL,ALL,...,ALL, sum(*)) represents the global sum of all items. Each ALL value actually represents the set of values contributing to that aggregation.
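The cube construction described in this abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function name `cube`, the `ALL` sentinel string, and the `sales` rows are hypothetical, and the only aggregate shown is a sum. Each input row contributes to every point obtained by either keeping a dimension's value or collapsing it to ALL.

```python
from collections import defaultdict
from itertools import product

ALL = "ALL"  # hypothetical sentinel standing in for the abstract's ALL value


def cube(rows, dims, measure):
    """Sum `measure` at every point of the cube spanned by `dims`."""
    totals = defaultdict(int)
    for row in rows:
        # Each dimension either keeps its own value or collapses to ALL.
        choices = [(row[d], ALL) for d in dims]
        for key in product(*choices):
            totals[key] += row[measure]
    return dict(totals)


# Made-up example data, for illustration only.
sales = [
    {"model": "Chevy", "year": 1994, "units": 50},
    {"model": "Chevy", "year": 1995, "units": 85},
    {"model": "Ford", "year": 1994, "units": 60},
]

result = cube(sales, ["model", "year"], "units")
print(result[(ALL, ALL)])      # global sum over all items
print(result[("Chevy", ALL)])  # sub-total for one model across all years
```

With N dimensions each row touches 2^N points, so this naive version is only meant to show the semantics, not an efficient way to compute a cube.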
TL;DR: It is shown that input noise and weight noise encourage the neural-network output to be a smooth function of the input or its weights, respectively, and that, in the weak-noise limit, noise added to the output of the neural network changes the objective function only by a constant and therefore cannot improve generalization.
Abstract: We study the effects of adding noise to the inputs, outputs, weight connections, and weight changes of multilayer feedforward neural networks during backpropagation training. We rigorously derive and analyze the objective functions that are minimized by the noise-affected training processes. We show that input noise and weight noise encourage the neural-network output to be a smooth function of the input or its weights, respectively. In the weak-noise limit, noise added to the output of the neural networks only changes the objective function by a constant. Hence, it cannot improve generalization. Input noise introduces penalty terms in the objective function that are related to, but distinct from, those found in the regularization approaches. Simulations have been performed on a regression and a classification problem to further substantiate our analysis. Input noise is found to be effective in improving the generalization performance for both problems. However, weight noise is found to be effective in improving the generalization performance only for the classification problem. Other forms of noise have practically no effect on generalization.
TL;DR: In this paper, an extension of the concept of quantiles in multidimensions that uses the geometry of multivariate data clouds has been considered, based on blending and generalization of the key ideas used in the construction of spatial median and regression quantiles, both of which have been extensively studied in the literature.
Abstract: An extension of the concept of quantiles in multidimensions that uses the geometry of multivariate data clouds has been considered. The approach is based on blending as well as generalization of the key ideas used in the construction of spatial median and regression quantiles, both of which have been extensively studied in the literature. These geometric quantiles are potentially useful in constructing trimmed multivariate means as well as many other L estimates of multivariate location, and they lead to a directional notion of central and extreme points in a multidimensional setup. Such quantiles can be defined as meaningful and natural objects even in infinite-dimensional Hilbert and Banach spaces, and they yield an effective generalization of quantile regression in multiresponse linear model problems. Desirable equivariance properties are shown to hold for these multivariate quantiles, and issues related to their computation for data in finite-dimensional spaces are discussed. n 1/2 consistenc...
TL;DR: By using a generalization of the optical tomography technique, the authors describe the dynamics of a quantum system in terms of equations for a purely classical probability distribution which contains complete information about the system.
Abstract: By using a generalization of the optical tomography technique we describe the dynamics of a quantum system in terms of equations for a purely classical probability distribution which contains complete information about the system.
TL;DR: This work presents a method of incorporating prior knowledge about transformation invariances by applying transformations to support vectors, the training examples most critical for determining the classification boundary.
Abstract: Developed only recently, support vector learning machines achieve high generalization ability by minimizing a bound on the expected test error; however, so far there existed no way of adding knowledge about invariances of a classification problem at hand. We present a method of incorporating prior knowledge about transformation invariances by applying transformations to support vectors, the training examples most critical for determining the classification boundary.
TL;DR: This paper shows that if a large neural network is used for a pattern classification problem, and the learning algorithm finds a network with small weights that has small squared error on the training patterns, then the generalization performance depends on the size of the weights rather than the number of weights.
Abstract: This paper shows that if a large neural network is used for a pattern classification problem, and the learning algorithm finds a network with small weights that has small squared error on the training patterns, then the generalization performance depends on the size of the weights rather than the number of weights. More specifically, consider an l-layer feed-forward network of sigmoid units, in which the sum of the magnitudes of the weights associated with each unit is bounded by A. The misclassification probability converges to an error estimate (that is closely related to squared error on the training set) at rate $O((cA)^{l(l+1)/2} \sqrt{(\log n)/m})$ ignoring log factors, where m is the number of training patterns, n is the input dimension, and c is a constant. This may explain the generalization performance of neural networks, particularly when the number of training examples is considerably smaller than the number of weights. It also supports heuristics (such as weight decay and early stopping) that attempt to keep the weights small during training.
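To get a feel for the rate quoted in this abstract, the sketch below evaluates the term (cA)^(l(l+1)/2) * sqrt((log n)/m). This reading of the exponent and the square root is an assumption, the function name `rate_term` is hypothetical, and the constant c is not given in the abstract, so the numbers are only illustrative. The point is that the term depends on the weight bound A, the depth l, the input dimension n, and the sample size m, but not on the number of weights.

```python
import math


def rate_term(c, A, l, n, m):
    """Evaluate (c*A)**(l*(l+1)/2) * sqrt(log(n)/m), ignoring log factors.

    c: unspecified constant (value assumed for illustration)
    A: bound on the sum of weight magnitudes per unit
    l: number of layers, n: input dimension, m: number of training patterns
    """
    return (c * A) ** (l * (l + 1) / 2) * math.sqrt(math.log(n) / m)


# Quadrupling the number of training patterns halves the term.
small_sample = rate_term(c=1.0, A=2.0, l=2, n=100, m=1_000)
large_sample = rate_term(c=1.0, A=2.0, l=2, n=100, m=4_000)
print(small_sample / large_sample)
```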
TL;DR: Advantages of this approach include its inherent ability for one-class generalization, freedom from characterizing the non-target class, and the ability to form closed decision boundaries for multi-modal classes that are more complex than hyperspheres without requiring inversion of large matrices.
TL;DR: The authors develop a self-contained theory for linear estimation in Krein spaces that is based on simple concepts such as projections and matrix factorizations and that leads to an interesting connection between Krein space projection and the recursive computation of the stationary points of certain second-order (or quadratic) forms.
Abstract: The authors develop a self-contained theory for linear estimation in Krein spaces. The derivation is based on simple concepts such as projections and matrix factorizations and leads to an interesting connection between Krein space projection and the recursive computation of the stationary points of certain second-order (or quadratic) forms. The authors use the innovations process to obtain a general recursive linear estimation algorithm. When specialized to a state-space structure, the algorithm yields a Krein space generalization of the celebrated Kalman filter with applications in several areas such as $H^{\infty}$-filtering and control, game problems, risk sensitive control, and adaptive filtering.
TL;DR: MacKay's Bayesian framework for backpropagation is a practical and powerful means to improve the generalization ability of neural networks and is applied in the prediction of fat content in minced meat from near infrared spectra.
Abstract: MacKay's (1992) Bayesian framework for backpropagation is a practical and powerful means to improve the generalization ability of neural networks. It is based on a Gaussian approximation to the posterior weight distribution. The framework is extended, reviewed, and demonstrated in a pedagogical way. The notation is simplified using the ordinary weight decay parameter, and a detailed and explicit procedure for adjusting several weight decay parameters is given. Bayesian backprop is applied in the prediction of fat content in minced meat from near infrared spectra. It outperforms "early stopping" as well as quadratic regression. The evidence of a committee of differently trained networks is computed, and the corresponding improved generalization is verified. The error bars on the predictions of the fat content are computed. There are three contributors: The random noise, the uncertainty in the weights, and the deviation among the committee members. The Bayesian framework is compared to Moody's GPE (1992). Finally, MacKay and Neal's automatic relevance determination, in which the weight decay parameters depend on the input number, is applied to the data with improved results.
TL;DR: In this paper, a teaching experiment using generalization activities is presented, and two generalizing activities are described in some detail, looking at the behavior of adults in the experimental group in the light of research results of high school students on tests and interviews involving the same activities.
Abstract: Considering algebra as a culture, this chapter looks at the introduction of algebra as an initiation process where generalization activities can be extremely effective. After a reflection on my own immersion into algebra and the evolution of attitudes toward the teaching of algebra, a teaching experiment using generalization activities is presented. Two generalizing activities are described in some detail, looking at the behavior of adults in the experimental group in the light of research results of high school students on tests and interviews involving the same activities. The paper concludes with a “cultural” reflection on the teaching experiment and a more general consideration of the role of generalization in the introduction of algebra.
TL;DR: Theoretical results show that applying a controlled amount of noise during training may improve convergence and generalization performance, and it is predicted that best overall performance can be achieved by injecting additive noise at each time step.
Abstract: Concerns the effect of noise on the performance of feedforward neural nets. We introduce and analyze various methods of injecting synaptic noise into dynamically driven recurrent nets during training. Theoretical results show that applying a controlled amount of noise during training may improve convergence and generalization performance. We analyze the effects of various noise parameters and predict that best overall performance can be achieved by injecting additive noise at each time step. Noise contributes a second-order gradient term to the error function which can be viewed as an anticipatory agent to aid convergence. This term appears to find promising regions of weight space in the beginning stages of training when the training error is large and should improve convergence on error surfaces with local minima. The first-order term is a regularization term that can improve generalization. Specifically, it can encourage internal representations where the state nodes operate in the saturated regions of the sigmoid discriminant function. While this effect can improve performance on automata inference problems with binary inputs and target outputs, it is unclear what effect it will have on other types of problems. To substantiate these predictions, we present simulations on learning the dual parity grammar from temporal strings for all noise models, and present simulations on learning a randomly generated six-state grammar using the predicted best noise model.
TL;DR: In this article, the central elements of the universal enveloping algebra of the general linear algebra which are called quantum immanants are considered and expressed in terms of generators and differential operators on the space of matrices.
Abstract: We consider remarkable central elements of the universal enveloping algebra of the general linear algebra which we call quantum immanants. We express them in terms of generators $E_{ij}$ and as differential operators on the space of matrices. These expressions are a direct generalization of the classical Capelli identities. They result in many nontrivial properties of quantum immanants.
TL;DR: In this paper, the authors established sharp capacitary estimates for Carnot-Caratheodory rings associated to a system of vector fields of Hormander type, which are instrumental to the study of the local behavior of singular solutions of a wide class of nonlinear subelliptic equations.
Abstract: We establish sharp capacitary estimates for Carnot-Caratheodory rings associated to a system of vector fields of Hormander type. Such estimates are instrumental to the study of the local behavior of singular solutions of a wide class of nonlinear subelliptic equations. One of the main results is a generalization of fundamental estimates obtained independently by Sanchez-Calle and Nagel, Stein and Wainger.
TL;DR: The main contribution of the paper is the development of Algorithm GenCom (Generalization for Commonality extraction) that makes use of concept generalization to effectively derive many meaningful commonalities that cannot be found otherwise.
Abstract: Studies two spatial knowledge discovery problems involving proximity relationships between clusters and features. The first problem is: given a cluster of points, how can we efficiently find features (represented as polygons) that are closest to the majority of points in the cluster? We measure proximity in an aggregate sense due to the nonuniform distribution of points in a cluster (e.g. houses on a map), and the different shapes and sizes of features (e.g. natural or man-made geographic features). The second problem is: given n clusters of points, how can we extract the aggregate proximity commonalities (i.e. features) that apply to most, if not all, of the n clusters? Regarding the first problem, the main contribution of the paper is the development of Algorithm CRH (Circle, Rectangle and Hull), which uses geometric approximations (i.e. encompassing circles, isothetic rectangles and convex hulls) to filter and select features. The highly scalable and incremental Algorithm CRH can examine over 50,000 features and their spatial relationships with a given cluster in approximately one second of CPU time. Regarding the second problem, the key contribution is the development of Algorithm GenCom (Generalization for Commonality extraction) that makes use of concept generalization to effectively derive many meaningful commonalities that cannot be found otherwise.
TL;DR: In this paper, the authors studied the optimal cutting strategy for an ongoing forest, using stochastic impulse control, and showed how Faustmann's formula can be generalized to growing forests.
Abstract: In the present paper we study the optimal cutting strategy $\{\tau_1, \tau_2, \ldots\}$ for an ongoing forest. By using stochastic impulse control we show how Faustmann's formula can be generalized to stochastic growing forests. The paper extends and clarifies previous studies by Miller and Voltaire (1983), Clarke and Reed (1989), and Reed and Clarke (1990).
TL;DR: In this paper, the central elements of the universal enveloping algebra U(gl(n)) which are called quantum immanants are considered and expressed in terms of generators and differential operators on the space of matrices.
Abstract: We consider some remarkable central elements of the universal enveloping algebra U(gl(n)) which we call quantum immanants. We express them in terms of generators $E_{ij}$ of U(gl(n)) and as differential operators on the space of matrices. These expressions are a direct generalization of the classical Capelli identities. They result in many nontrivial properties of quantum immanants.
TL;DR: In this article, the authors investigate cardinality questions concerning sums of finite sets and present a generalization of Freiman's famous theorem that describes the structure of those sets A for which A + A is small, to the case of different summands.
Abstract: We investigate numerous cardinality questions concerning sums of finite sets. A typical problem looks like the following: if A has n elements, A + B has cn, what can we deduce about A and B? How can we estimate the cardinalities of other sets like A − B and A + B + A? This is in quest of a generalization of Freiman’s famous theorem that describes the structure of those sets A for which A + A is small, to the case of different summands.
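The cardinality questions in this abstract are easy to experiment with on small sets. The sketch below is only an illustration: the helper names `sumset` and `diffset` and the two example sets are hypothetical. It contrasts an arithmetic progression, whose sumset is as small as possible, with a set of powers of two, whose pairwise sums are all distinct.

```python
def sumset(A, B):
    """Return A + B = {a + b : a in A, b in B}."""
    return {a + b for a in A for b in B}


def diffset(A, B):
    """Return A - B = {a - b : a in A, b in B}."""
    return {a - b for a in A for b in B}


progression = set(range(10))            # {0, 1, ..., 9}
powers = {2 ** k for k in range(10)}    # {1, 2, 4, ..., 512}

print(len(sumset(progression, progression)))  # 2n - 1 for a progression
print(len(sumset(powers, powers)))            # n(n + 1)/2, the maximum
print(len(diffset(progression, progression)))
print(len(sumset(sumset(progression, progression), progression)))
```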
TL;DR: In this article, a non-local small-slope approximation (NLSSA) is proposed for wave scattering from rough surfaces; it is valid for an arbitrary wavelength of radiation provided that the slopes of the undulations are small enough.
Abstract: A new general analytical approach to solving the problems of wave scattering from rough surfaces, referred to as the non-local small-slope approximation (NLSSA), is suggested. It is formulated in the general form both for vector and scalar waves. This approach is valid for an arbitrary wavelength of radiation provided that the slopes of the undulations are small enough. The NLSSA represents a generalization of the small-slope approximation to situations where double scattering (in the optical sense) appears. It is demonstrated that with appropriate approximations the NLSSA of the lowest order reduces to the small-slope approximation of the second order.
TL;DR: A generalization of the original idea of rough sets as introduced by Pawlak, called the Variable Precision Rough Sets Model with Asymmetric Bounds, is aimed at modeling decision situations characterized by uncertain information expressed in terms of probability distributions estimated from frequency distributions observed in empirical data.
Abstract: We present a generalization of the original idea of rough sets as introduced by Pawlak. The generalization, called the Variable Precision Rough Sets Model with Asymmetric Bounds, is aimed at modeling decision situations characterized by uncertain information expressed in terms of probability distributions estimated from frequency distributions observed in empirical data. The model presented is a direct extension of the previous concept, the Variable Precision Rough Sets Model. The properties of the extended model are investigated and compared to the original model. Also, a real life problem of identifying the factors which most affect the likelihoods of specified events in the steel industry is discussed in the context of this theory.
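One way to picture asymmetric bounds is the sketch below. It assumes a common formulation, not confirmed by the abstract, in which each equivalence class E is assigned to a region by comparing the estimated conditional probability P(X | E) with a lower bound and an upper bound. The function name `vprs_regions`, the thresholds, and the data are all hypothetical.

```python
from collections import defaultdict


def vprs_regions(records, lower, upper):
    """Classify equivalence classes by estimated P(target | class).

    records: iterable of (class_label, in_target) pairs
    lower, upper: assumed asymmetric bounds with 0 <= lower < upper <= 1
    """
    counts = defaultdict(lambda: [0, 0])  # class -> [hits, total]
    for label, in_target in records:
        counts[label][0] += int(in_target)
        counts[label][1] += 1

    regions = {"positive": set(), "negative": set(), "boundary": set()}
    for label, (hits, total) in counts.items():
        p = hits / total  # frequency estimate of the conditional probability
        if p >= upper:
            regions["positive"].add(label)
        elif p <= lower:
            regions["negative"].add(label)
        else:
            regions["boundary"].add(label)
    return regions


# Made-up frequencies: class "a" is 9/10 in the target, "b" 1/10, "c" 5/10.
records = (
    [("a", True)] * 9 + [("a", False)]
    + [("b", True)] + [("b", False)] * 9
    + [("c", True)] * 5 + [("c", False)] * 5
)
regions = vprs_regions(records, lower=0.2, upper=0.8)
print(regions)
```

Setting lower to 0 and upper to 1 recovers the strict inclusion tests of the original rough set idea under this formulation.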
TL;DR: In this article, a generalization of non-commutative geometry and gauge theories based on ternary Z_3-graded structures is proposed, where all products of two entities are left free, imposing relations on ternary products only.
Abstract: We propose a generalization of non-commutative geometry and gauge theories based on ternary Z_3-graded structures. In the new algebraic structures we define, we leave all products of two entities free, imposing relations on ternary products only. These relations reflect the action of the Z_3-group, which may be either trivial, i.e. abc=bca=cab, generalizing the usual commutativity, or non-trivial, i.e. abc=jbca, with j=e^{(2\pi i)/3}. The usual Z_2-graded structures such as Grassmann, Lie and Clifford algebras are generalized to the Z_3-graded case. Certain suggestions concerning the eventual use of these new structures in physics of elementary particles are exposed.
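The non-trivial relation abc = j bca is self-consistent only because j is a cube root of unity: cycling the three factors three times returns to abc with an overall factor j^3. A minimal numerical check, with the variable name `j3` chosen here for illustration, is:

```python
import cmath

# j = e^{2*pi*i/3}, the primitive cube root of unity from the abstract.
j3 = cmath.exp(2j * cmath.pi / 3)

# Applying abc = j*bca three times gives abc = j**3 * abc, so j**3 must be 1.
cube = j3 ** 3
print(abs(cube - 1) < 1e-12)

# The three cube roots of unity sum to zero: 1 + j + j**2 = 0.
root_sum = 1 + j3 + j3 ** 2
print(abs(root_sum) < 1e-12)
```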
TL;DR: This paper investigates the case of higher-order patterns as introduced by Miller and sketches an efficient implementation of the abstract algorithm and its generalization to constraint simplification, which has yielded good experimental results at the core of a higher-order constraint logic programming language.
Abstract: In [6] we have proposed a general higher-order unification method using a theory of explicit substitutions and we have proved its completeness. In this paper, we investigate the case of higher-order patterns as introduced by Miller. We show that our general algorithm specializes in a very convenient way to patterns. We also sketch an efficient implementation of the abstract algorithm and its generalization to constraint simplification, which has yielded good experimental results at the core of a higher-order constraint logic programming language.
TL;DR: Numerous important lattices possess algebraic structures over various Euclidean rings, e.g. Eisenstein integers or Hurwitz quaternions, and one obtains efficient algorithms by performing within this frame the usual reduction procedures, including the well known LLL-algorithm.
Abstract: Numerous important lattices (D4, E8, the Coxeter-Todd lattice K12, the Barnes-Wall lattice Λ16, the Leech lattice Λ24, as well as the 2-modular 32-dimensional lattices found by Quebbemann and Bachoc) possess algebraic structures over various Euclidean rings, e.g. Eisenstein integers or Hurwitz quaternions. One obtains efficient algorithms by performing within this frame the usual reduction procedures, including the well known LLL-algorithm.
TL;DR: Methods that are faster than direct iterations on the Riccati equation and are more reliable than solutions based on eigenvalue–eigenvector decompositions of the state–costate evolution equation are discussed in the chapter.
Abstract: This paper describes the recent advances for rapidly and accurately solving matrix Riccati and Sylvester equations and applies them to devise efficient computational methods for solving and estimating dynamic linear economies. The chapter explores the most promising solution methods available and compares their speed and accuracy for some particular economic examples. Except for the simplest dynamic linear models, it is necessary to compute solutions numerically. In estimation contexts, computation speed is important because climbing a likelihood function can require that a model be solved many times. Methods that are faster than direct iterations on the Riccati equation and are more reliable than solutions based on eigenvalue–eigenvector decompositions of the state–costate evolution equation are discussed in the chapter. Two generalizations are presented in the chapter: The first generalization introduces forcing sequences or “uncontrollable states” into the deterministic regulator problem, while the second generalization introduces, among other things, discounting and uncertainty into the augmented regulator problem.
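For orientation, the baseline that the chapter's methods are compared against, direct iteration on the Riccati equation, can be sketched in the scalar case. This is not one of the chapter's faster methods. It assumes the standard undiscounted discrete-time regulator with dynamics x' = a x + b u and per-period cost q x^2 + r u^2, and the function name `riccati_iterate` and the parameter values are hypothetical.

```python
def riccati_iterate(a, b, q, r, tol=1e-12, max_iter=10_000):
    """Iterate p <- q + a^2 p - (a b p)^2 / (r + b^2 p) from p = 0.

    Scalar form of the discrete-time algebraic Riccati equation for the
    assumed regulator problem; returns the approximate fixed point.
    """
    p = 0.0
    for _ in range(max_iter):
        p_next = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        if abs(p_next - p) < tol:
            return p_next
        p = p_next
    raise RuntimeError("Riccati iteration did not converge")


# With a = b = q = r = 1 the fixed point solves p^2 - p - 1 = 0.
p_star = riccati_iterate(a=1.0, b=1.0, q=1.0, r=1.0)
print(p_star)
```

In the matrix case each step costs several matrix products and a linear solve, which is why the number of iterations matters when a model is solved many times inside a likelihood search.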
TL;DR: It is found that increasing the mutation rate can significantly improve the generalization capabilities of GP and greatly extends the number of generations the GP system can train before the population converges.
Abstract: Ordinarily, Genetic Programming uses little or no mutation. Crossover is the predominant operator. This study tests the effect of a very aggressive use of the mutation operator on the generalization performance of our Compiling Genetic Programming System (‘CGPS’). We ran our tests on two benchmark classification problems on very sparse training sets. In all, we performed 240 complete runs of population 3000 for each of the problems, varying mutation rate between 5% and 80%. We found that increasing the mutation rate can significantly improve the generalization capabilities of GP. The mechanism by which mutation affects the generalization capability of GP is not entirely clear. What is clear is that changing the balance between mutation and crossover affects the course of GP training substantially; for example, increasing mutation greatly extends the number of generations for which the GP system can train before the population converges.
TL;DR: The article describes the way to overcome the strong correlation between the input signals of the various channels and derives an efficient algorithm that makes use of additional orthogonal projections.
Abstract: A straightforward generalization of the so-called affine projection algorithm (APA) to the multichannel (MC) case is easily obtained. However, due to the strong correlation between the input signals of the various channels, the resulting algorithm converges very slowly. The article describes the way to overcome this problem and derives an efficient algorithm that makes use of additional orthogonal projections.
TL;DR: A grasp synthesis algorithm that can use any grasp prototype as the starting point in a search for a good grasp, and makes the given prototype more effective by generalizing it for a specific task.
Abstract: This paper introduces a grasp synthesis algorithm that can use any grasp prototype as the starting point in a search for a good grasp. The algorithm makes the given prototype more effective by generalizing it for a specific task. This generalization step expands the range of application of the prototype to a wide variety of target object geometries, while ensuring that the resulting grasps are appropriate for the intended task. An example whole-hand grasp synthesized for the Salisbury hand is shown at the end of the paper.