TL;DR: This paper addresses two crucial issues which have been considered to be a 'black art' in classification tasks ever since the introduction of stacked generalization: the type of generalizer that is suitable to derive the higher-level model, and the kind of attributes that should be used as its input.
Abstract: Stacked generalization is a general method of using a high-level model to combine lower-level models to achieve greater predictive accuracy. In this paper we address two crucial issues which have been considered to be a 'black art' in classification tasks ever since the introduction of stacked generalization in 1992 by Wolpert: the type of generalizer that is suitable to derive the higher-level model, and the kind of attributes that should be used as its input. We find that best results are obtained when the higher-level model combines the confidence (and not just the predictions) of the lower-level ones.
We demonstrate the effectiveness of stacked generalization for combining three different types of learning algorithms for classification tasks. We also compare the performance of stacked generalization with majority vote and published results of arcing and bagging.
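The abstract's central point is that the level-1 model should see the lower-level models' confidences and not just their predicted labels. That idea can be sketched as follows. This is a minimal illustration, not the paper's experimental setup. Several pieces are assumptions made for the example: the two toy level-0 learners (`nearest_centroid`, `knn`), the five-fold split by index, and the choice of a multi-response linear regression (one ridge least-squares fit per class) as the level-1 generalizer. The key step is `level1_attributes`. It builds each example's level-1 input from the class-probability outputs of level-0 models that were trained without that example.

```python
from collections import Counter


def nearest_centroid(train):
    """Toy level-0 learner: confidences from inverse distance to each class mean."""
    groups = {}
    for x, y in train:
        groups.setdefault(y, []).append(x)
    means = {c: sum(xs) / len(xs) for c, xs in groups.items()}

    def predict_proba(x):
        inv = {c: 1.0 / (abs(x - m) + 1e-9) for c, m in means.items()}
        total = sum(inv.values())
        return {c: v / total for c, v in inv.items()}
    return predict_proba


def knn(train, k=3):
    """Toy level-0 learner: confidences as class fractions among the k nearest points."""
    def predict_proba(x):
        nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
        votes = Counter(y for _, y in nearest)
        return {c: n / k for c, n in votes.items()}
    return predict_proba


def level1_attributes(data, learners, classes, folds=5):
    """Cross-validated confidences: the level-1 attributes of example i come from
    level-0 models trained on the folds that do not contain i."""
    rows = [None] * len(data)
    for f in range(folds):
        train = [d for i, d in enumerate(data) if i % folds != f]
        models = [learn(train) for learn in learners]
        for i, (x, _) in enumerate(data):
            if i % folds == f:
                rows[i] = [m(x).get(c, 0.0) for m in models for c in classes]
    return rows


def solve(A, b):
    """Gauss-Jordan elimination with partial pivoting for a small square system."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_mlr(Z, labels, classes, ridge=1e-3):
    """Level-1 model: one ridge least-squares regression per class on the confidences.
    The ridge term is needed because each model's confidences sum to 1,
    which makes them collinear with the bias column."""
    X = [[1.0] + z for z in Z]
    d = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) + (ridge if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    weights = {}
    for c in classes:
        Xty = [sum(r[i] for r, y in zip(X, labels) if y == c) for i in range(d)]
        weights[c] = solve(XtX, Xty)

    def predict(z):
        feats = [1.0] + z
        return max(classes, key=lambda c: sum(w * f for w, f in zip(weights[c], feats)))
    return predict


def fit_stack(data, learners, classes):
    """Train level-1 on cross-validated confidences, then refit level-0 on all data."""
    meta = fit_mlr(level1_attributes(data, learners, classes), [y for _, y in data], classes)
    full = [learn(data) for learn in learners]
    return lambda x: meta([m(x).get(c, 0.0) for m in full for c in classes])


data = [(x, 0) for x in range(5)] + [(x, 1) for x in range(10, 15)]
predict = fit_stack(data, [nearest_centroid, knn], classes=[0, 1])
print(predict(1.5), predict(12.5))
```

Swapping the confidence vectors for one-hot predicted labels in `level1_attributes` gives the "predictions only" variant that the abstract compares against.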
TL;DR: In this article, a simple generalization of the traditional query optimization algorithm is proposed to optimize queries in the presence of materialized views.
Abstract: While much work has addressed the problem of maintaining materialized views, the important question of optimizing queries in the presence of materialised views has not been resolved. In this paper, we analyze the optimization question and provide a comprehensive and efficient solution. Our solution has the desirable property that it is a simple generalization of the traditional query optimization algorithm.
TL;DR: This chapter summarises results that have been obtained for high confidence generalization error bounds for the Support Vector Machine (SVM) and other pattern classifiers related to the SVM and argues that the margin and number of support vectors are both estimators of the degree to which the distribution generating the inputs assists identification of the target hyperplane.
Abstract: The aim of this chapter is to summarise results that have been obtained for high confidence generalization error bounds for the Support Vector Machine (SVM) and other pattern classifiers related to the SVM. As a by-product of the analysis we argue that the margin and number of support vectors are both estimators of the degree to which the distribution generating the inputs assists identification of the target hyperplane.
1.1 Introduction
Generalization analysis of pattern classifiers is concerned with determining the factors that affect the accuracy of a pattern classifier. Such an analysis requires assumptions to be made about how the data used to train the classifier was gathered and how subsequent data will be generated. One of the most popular assumptions, originally championed by Vapnik and Chervonenkis [12], is to assume that the training and testing data are both generated according to the same probability distribution. The distribution can be viewed as a model of the natural processes which give rise to the observed phenomenon. Since it is usually more difficult to estimate the distribution than to learn the classification function, it is important that no assumptions are made about the distribution, resulting in a so-called distribution-free analysis. We will consider bounds on the generalization error, that is the probability of
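Margin-based bounds of this kind are typically stated in terms of two sample quantities: the geometric margin γ achieved on the training data and the radius R of a ball containing the data. They often enter through the ratio (R/γ)². The sketch below only computes those two quantities for a given hyperplane (w, b). The function names are hypothetical, and no particular bound from the chapter is reproduced.

```python
import math


def geometric_margin(w, b, points, labels):
    """Smallest signed distance y_i * (w . x_i + b) / ||w|| over the sample;
    positive only if the hyperplane (w, b) separates the labelled points."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
               for x, y in zip(points, labels))


def data_radius(points):
    """Radius R of the smallest origin-centred ball containing the sample."""
    return max(math.sqrt(sum(xi * xi for xi in x)) for x in points)


points = [(2.0, 0.0), (3.0, 1.0), (-2.0, 0.0), (-3.0, -1.0)]
labels = [1, 1, -1, -1]
gamma = geometric_margin((1.0, 0.0), 0.0, points, labels)
print(gamma, (data_radius(points) / gamma) ** 2)
```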
TL;DR: The Analytic Hierarchy Process (AHP) provides objective mathematics to process the inescapably subjective and personal preferences of an individual or a group in making a decision; the Analytic Network Process (ANP) is a generalization of the AHP.
Abstract: The Analytic Hierarchy Process (AHP) provides the objective mathematics to process the inescapably subjective and personal preferences of an individual or a group in making a decision. With the AHP and its generalization, the Analytic Network Process (ANP), one constructs hierarchies or feedback networks, then makes judgments or performs measurements on pairs of elements with respect to a controlling element to derive ratio scales that are then synthesized throughout the structure to select the best alternative.
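A common way to derive the ratio scale from a single positive reciprocal pairwise-comparison matrix is Saaty's principal-eigenvector method. It comes with the consistency index CI = (λ_max − n)/(n − 1). The sketch below uses plain power iteration. The function name and the example judgments are illustrative assumptions, and the sketch covers a single comparison matrix, not the synthesis over a whole hierarchy or network.

```python
def ahp_priorities(matrix, iterations=100):
    """Return (priority vector, lambda_max estimate, consistency index) for a
    positive reciprocal comparison matrix, using power iteration."""
    n = len(matrix)
    v = [1.0 / n] * n
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        total = sum(w)
        v = [x / total for x in w]  # keep the vector normalised to sum 1
    av = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
    lam = sum(av[i] / v[i] for i in range(n)) / n  # A v ≈ lambda_max v
    ci = (lam - n) / (n - 1) if n > 1 else 0.0
    return v, lam, ci


# Hypothetical judgments: element 0 is 3x as important as 1 and 5x as important as 2.
judgments = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 2.0],
    [1 / 5, 1 / 2, 1.0],
]
weights, lam, ci = ahp_priorities(judgments)
print([round(x, 3) for x in weights], round(ci, 4))
```

A perfectly consistent matrix (a_ij = w_i / w_j) returns w itself with λ_max = n and CI = 0.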
TL;DR: This paper extends the concept of M-convex function to functions on generalized polymatroids with a view to providing a unified framework for efficiently solvable nonlinear discrete optimization problems.
Abstract: The concept of M-convex function, introduced by Murota (1996), is a quantitative generalization of the set of integral points in an integral base polyhedron as well as an extension of the valuated matroids of Dress and Wenzel (1990). In this paper, we extend this concept to functions on generalized polymatroids with a view to providing a unified framework for efficiently solvable nonlinear discrete optimization problems.
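Murota's M-convex functions are defined by an exchange axiom, with χ_k denoting the k-th unit vector. For all x, y in dom f and every i with x_i > y_i, there is some j with x_j < y_j such that f(x) + f(y) ≥ f(x − χ_i + χ_j) + f(y + χ_i − χ_j). The brute-force checker below tests that axiom on a small finite domain. It covers the original M-convex concept, not the paper's extension to generalized polymatroids, and the function name is hypothetical.

```python
import itertools
import math


def is_m_convex(f, domain):
    """Brute-force test of the M-convex exchange axiom on a finite set of
    integer points; points outside `domain` are treated as f = +infinity."""
    dom = set(domain)
    n = len(next(iter(dom)))

    def val(z):
        return f(z) if z in dom else math.inf

    for x, y in itertools.product(dom, repeat=2):
        for i in range(n):
            if x[i] <= y[i]:
                continue  # i must lie in supp+(x - y)
            found = False
            for j in range(n):
                if x[j] >= y[j]:
                    continue  # j must lie in supp-(x - y)
                x_new = list(x)
                x_new[i] -= 1
                x_new[j] += 1
                y_new = list(y)
                y_new[i] += 1
                y_new[j] -= 1
                if val(x) + val(y) >= val(tuple(x_new)) + val(tuple(y_new)):
                    found = True
                    break
            if not found:
                return False
    return True


# Separable convex function on the integer points of a scaled simplex.
simplex = [p for p in itertools.product(range(4), repeat=3) if sum(p) == 3]
print(is_m_convex(lambda p: sum((pi - 1) ** 2 for pi in p), simplex))
```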
TL;DR: In this paper, the authors studied a sequence of generalizations of the Tsetlin library and developed a formula analogous to Theorem 1.1 for the distinct eigenvalues and multiplicities for this more general class of Markov chains.
Abstract: 1. Introduction. Imagine a collection of books labeled 1 through n arranged in a row in some order. We reorganize the row of books by successively choosing a book at random: choosing book i with probability $w_i$ and moving it to the front of the row. This "move-to-front rule" determines an interesting Markov chain on the set of arrangements of the books. If σ and τ denote any two orderings of the books, then the probability of transition from σ to τ is $w_i$ if and only if τ is obtained from σ by moving book i to the front. This Markov chain is commonly called the Tsetlin library or move-to-front scheme. Due to its use in computer science as a standard scheme for dynamic file maintenance as well as cache maintenance (cf. [Do], [FHo], and [P]), the move-to-front rule is a very well-studied Markov chain. A primary resource for this problem is Fill's comprehensive paper [F], which derives the transition probabilities for any number of steps of the chain and the eigenvalues with corresponding idempotents and discusses the rate of convergence to stationarity. Its thorough bibliography contains a wealth of pointers to the relevant literature. Of particular interest is the spectrum of this Markov chain. In general, knowledge of the eigenvalues for the transition matrix of a Markov chain can give some indication of the rate at which the chain converges to its equilibrium distribution. In the case of the Tsetlin library, the eigenvalues have an elegant formula, discovered (independently) […].
Theorem 1.1. The distinct eigenvalues for the move-to-front rule are indexed by subsets $A \subseteq \{1, \ldots, n\}$ and given by $\lambda_A = \sum_{i \in A} w_i$. The multiplicity of $\lambda_A$ is the number of derangements (permutations with no fixed points) of the set $\{1, \ldots, n - |A|\}$.
In this paper we study a sequence of generalizations of the Tsetlin library, culminating in a generalization of the setting of central hyperplane arrangements.
In each case we develop a formula analogous to Theorem 1.1 for the distinct eigenvalues and multiplicities for this more general class of Markov chains. Our first generalization comes from viewing move-to-front as the operation of moving the books in the subset {i} to the front and then moving the subset [n] − {i} behind {i} while retaining their relative order; this is all done with probability $w_i$ …
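Theorem 1.1 can be checked numerically on small cases. The trace of P^k equals the sum of the k-th powers of the eigenvalues counted with multiplicity. It is therefore enough to compare tr(P^k) for the move-to-front transition matrix P against the power sums predicted by the theorem, summing over all subsets A. The sketch below does this in exact rational arithmetic. The weights and helper names are illustrative choices, and the check covers only the classical Tsetlin library, not the paper's generalizations.

```python
from fractions import Fraction
from itertools import combinations, permutations


def derangements(m):
    """d(m): permutations of m items with no fixed point (d(0) = 1, d(1) = 0)."""
    d = [1, 0]
    for k in range(2, m + 1):
        d.append((k - 1) * (d[k - 1] + d[k - 2]))
    return d[m]


def move_to_front_matrix(weights):
    """Transition matrix of the Tsetlin library on all orderings of the books."""
    n = len(weights)
    states = list(permutations(range(n)))
    index = {s: k for k, s in enumerate(states)}
    P = [[Fraction(0)] * len(states) for _ in states]
    for s in states:
        for book in range(n):
            t = (book,) + tuple(b for b in s if b != book)  # move `book` to the front
            P[index[s]][index[t]] += weights[book]
    return P


def trace_powers(P, kmax):
    """[tr(P), tr(P^2), ..., tr(P^kmax)]."""
    cols = list(zip(*P))
    traces, M = [], P
    for _ in range(kmax):
        traces.append(sum(M[i][i] for i in range(len(M))))
        M = [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in M]
    return traces


def predicted_power_sums(weights, kmax):
    """Power sums of the Theorem 1.1 spectrum: eigenvalue sum_{i in A} w_i
    with multiplicity d(n - |A|), for every subset A."""
    n = len(weights)
    sums = [Fraction(0)] * kmax
    for size in range(n + 1):
        mult = derangements(n - size)
        for A in combinations(range(n), size):
            lam = sum((weights[i] for i in A), Fraction(0))
            for k in range(1, kmax + 1):
                sums[k - 1] += mult * lam ** k
    return sums


w = [Fraction(3, 5), Fraction(1, 4), Fraction(3, 20)]
print(trace_powers(move_to_front_matrix(w), 4) == predicted_power_sums(w, 4))
```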
TL;DR: In this paper, a generalized resolution criterion is defined and used for assessing non-regular fractional factorials, notably Plackett-Burman designs; it is intended to capture projection properties, complementing that of Webb (1964), whose concept of resolution concerns the estimability of lower order effects under the assumption that higher order effects are negligible.
Abstract: Resolution has been the most widely used criterion for comparing regular fractional factorials since it was introduced in 1961 by Box and Hunter. In this paper, we examine how a generalized resolution criterion can be defined and used for assessing nonregular fractional factorials, notably Plackett-Burman designs. Our generalization is intended to capture projection properties, complementing that of Webb (1964) whose concept of resolution concerns the estimability of lower order effects under the assumption that higher order effects are negligible. Our generalized resolution provides a fruitful criterion for ranking different designs while Webb's resolution is mainly useful as a classification rule. An additional advantage of our approach is that the idea leads to a natural generalization of minimum aberration. Examples are given to illustrate the usefulness of the new criteria.
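One way to make a projection-based generalized resolution concrete uses J-characteristics. This formulation is assumed here, not taken from the abstract. Code each factor as ±1 and let J(s) = |Σ_runs Π_{j∈s} x_j| for a set s of columns. Take r as the smallest order with some non-zero J, and set R = r + 1 − max_{|s|=r} J(s)/N for an N-run design. Regular designs then recover their usual integer resolution, while non-regular designs can land strictly between integers. The function names and the Plackett-Burman generator row used in the usage lines are illustrative assumptions.

```python
from itertools import combinations, product
from math import prod


def j_characteristic(design, cols):
    """J(s) = |sum over runs of the product of the chosen ±1 columns|."""
    return abs(sum(prod(run[c] for c in cols) for run in design))


def generalized_resolution(design):
    """R = r + 1 - max_{|s| = r} J(s) / N, r = smallest order with a non-zero J."""
    n_runs, n_factors = len(design), len(design[0])
    for r in range(1, n_factors + 1):
        worst = max(j_characteristic(design, s) for s in combinations(range(n_factors), r))
        if worst > 0:
            return r + 1 - worst / n_runs
    return float("inf")  # all J vanish, e.g. a full factorial


# Regular 2^(4-1) fraction with defining relation I = ABCD (resolution IV).
half = [(a, b, c, a * b * c) for a, b, c in product((-1, 1), repeat=3)]
print(generalized_resolution(half))

# 12-run Plackett-Burman design: cyclic shifts of a commonly tabulated
# generator row plus a row of minus signs (11 two-level factors).
gen = [1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1]
pb12 = [tuple(gen[(j - i) % 11] for j in range(11)) for i in range(11)] + [tuple([-1] * 11)]
print(generalized_resolution(pb12))
```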
TL;DR: Empirical comparisons between model selection using VC-bounds and classical methods are performed for various noise levels, sample size, target functions and types of approximating functions, demonstrating the advantages of VC-based complexity control with finite samples.
Abstract: It is well known that for a given sample size there exists a model of optimal complexity corresponding to the smallest prediction (generalization) error. Hence, any method for learning from finite samples needs to have some provisions for complexity control. Existing implementations of complexity control include penalization (or regularization), weight decay (in neural networks), and various greedy procedures (aka constructive, growing, or pruning methods). There are numerous proposals for determining optimal model complexity (aka model selection) based on various (asymptotic) analytic estimates of the prediction risk and on resampling approaches. Nonasymptotic bounds on the prediction risk based on Vapnik-Chervonenkis (VC)-theory have been proposed by Vapnik. This paper describes application of VC-bounds to regression problems with the usual squared loss. An empirical study is performed for settings where the VC-bounds can be rigorously applied, i.e., linear models and penalized linear models where the VC-dimension can be accurately estimated, and the empirical risk can be reliably minimized. Empirical comparisons between model selection using VC-bounds and classical methods are performed for various noise levels, sample size, target functions and types of approximating functions. Our results demonstrate the advantages of VC-based complexity control with finite samples.
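For linear estimators the VC-dimension h is usually taken as the number of free parameters. The bound can then be turned into a model-selection rule: multiply the empirical risk by a penalization factor that grows with p = h/n and pick the minimizer. The factor below is one practical form of Vapnik's regression bound that appears in the VC model-selection literature. Its exact constants are an assumption here and may differ from those used in the paper, and the empirical risks in the usage lines are made up for illustration.

```python
import math


def vc_penalization(h, n):
    """r(p, n) = 1 / (1 - sqrt(p - p ln p + ln(n) / (2n))), with p = h / n.
    Returns inf when the bracket is non-positive, i.e. the bound is vacuous."""
    p = h / n
    term = p - p * math.log(p) + math.log(n) / (2 * n)
    denom = 1.0 - math.sqrt(term)
    return 1.0 / denom if denom > 0 else math.inf


def select_by_vc(empirical_risks, n):
    """empirical_risks maps a VC-dimension h to the training MSE of the model
    with that complexity; return the h with the smallest penalized risk."""
    return min(empirical_risks, key=lambda h: empirical_risks[h] * vc_penalization(h, n))


# Hypothetical training errors for models with h = 2, 5 and 50 parameters, n = 100.
risks = {2: 1.0, 5: 0.5, 50: 0.49}
print(select_by_vc(risks, n=100))
```

The largest model has the lowest training error but is rejected because its penalization factor is large.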
TL;DR: A generalization of the classical Gruss's integral inequality in inner product spaces is given in this article, where applications for positive linear functionals and integrals are also pointed out.
TL;DR: This paper proposes a framework for time modeling in production workflows, including a duration algorithm that generalizes two categories of algorithms: the shortest-path partitioning algorithm and the Critical Path Method.
Abstract: The dynamic nature of events, in particular business processes, is a natural and accepted feature of today’s business environment. Therefore, workflow systems, if they are to successfully model portions of the real world, need to acknowledge the temporal aspect of business processes. This is particularly true for processes where any deviation from the prescribed model is either very expensive, dangerous or even illegal. Such processes include legal processes, airline maintenance or hazardous material handling. However, time modeling in workflows is still an open research problem. This paper proposes a framework for time modeling in production workflows. Relevant temporal constraints are presented, and rules for their verification are defined. Furthermore, to enable visualization of some temporal constraints, a concept of “duration space” is introduced. The duration algorithm, which calculates the shortest/longest workflow instance, is presented. It is a generalization of two categories of algorithms: the shortest-path partitioning algorithm and the Critical Path Method (CPM). Based on the duration algorithm, the verification algorithm is designed to check the consistency of introduced temporal constraints.
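The abstract does not give the duration algorithm itself. The hedged sketch below illustrates why shortest-path and Critical Path computations can be two faces of one pass over the workflow graph. For an acyclic workflow it computes both the shortest and the longest completion time. At an AND-join every incoming branch must finish, which is the CPM maximum. At an exclusive OR-join only one branch runs, so the shortest instance takes the minimum. The data layout and names are assumptions, not the paper's model, and every predecessor is assumed to appear in `durations`.

```python
from graphlib import TopologicalSorter


def instance_durations(durations, preds, or_joins=frozenset()):
    """Shortest and longest completion time of every activity in an acyclic workflow.

    durations: {activity: duration}; preds: {activity: [predecessors]};
    or_joins: activities whose incoming branches are exclusive alternatives.
    All other joins are AND-joins, where every incoming branch must finish.
    """
    shortest, longest = {}, {}
    graph = {a: preds.get(a, []) for a in durations}
    for a in TopologicalSorter(graph).static_order():
        ps = graph[a]
        if not ps:
            lo = hi = 0
        elif a in or_joins:
            lo = min(shortest[p] for p in ps)  # fastest alternative
            hi = max(longest[p] for p in ps)   # slowest alternative
        else:
            lo = max(shortest[p] for p in ps)  # CPM rule: wait for all branches
            hi = max(longest[p] for p in ps)
        shortest[a] = lo + durations[a]
        longest[a] = hi + durations[a]
    return shortest, longest


durations = {"start": 1, "A": 3, "B": 5, "end": 0}
preds = {"A": ["start"], "B": ["start"], "end": ["A", "B"]}
print(instance_durations(durations, preds, or_joins={"end"}))  # exclusive choice of A or B
print(instance_durations(durations, preds))                    # A and B run in parallel
```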
TL;DR: Triple systems are a natural generalization of graphs and have connections with various fields, with applications in coding theory, cryptography, and computer science. The study of triple systems is extensive but sometimes disjointed, and the book attempts to survey current knowledge and gather common themes.
Abstract: Triple systems are among the simplest combinatorial designs, and are a natural generalization of graphs. They have connections with geometry, algebra, group theory, finite fields, and cyclotomy; they have applications in coding theory, cryptography, computer science, and statistics. Triple systems provide in many cases the prototype for deep results in combinatorial design theory; this design theory is permeated by problems that were first understood in the context of triple systems and then generalized. Such a rich set of connections has made the study of triple systems an extensive, but sometimes disjointed, field of combinatorics. This book attempts to survey current knowledge on the subject, to gather together common themes, and to provide an accurate portrait of the huge variety of problems and results. Representative samples of the major styles of proof technique are included, as is a comprehensive bibliography.
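The most basic triple systems are Steiner triple systems STS(v). These are collections of 3-element blocks on v points such that every pair of points lies in exactly one block. By Kirkman's theorem they exist exactly when v ≡ 1 or 3 (mod 6). The sketch below builds the 7-point system from the cyclic difference set {0, 1, 3} mod 7 and checks the defining property. The helper names are illustrative.

```python
from itertools import combinations


def cyclic_sts7():
    """STS(7): translates of the difference set {0, 1, 3} modulo 7."""
    return [frozenset((b + s) % 7 for b in (0, 1, 3)) for s in range(7)]


def is_steiner_triple_system(triples, v):
    """True if every pair of the points 0..v-1 lies in exactly one triple."""
    counts = {frozenset(pair): 0 for pair in combinations(range(v), 2)}
    for t in triples:
        for pair in combinations(sorted(t), 2):
            counts[frozenset(pair)] += 1
    return all(c == 1 for c in counts.values())


def sts_exists(v):
    """Kirkman's condition: an STS(v) exists iff v is 1 or 3 modulo 6."""
    return v % 6 in (1, 3)


print(sorted(sorted(t) for t in cyclic_sts7()))
print(is_steiner_triple_system(cyclic_sts7(), 7), [v for v in range(3, 20) if sts_exists(v)])
```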
TL;DR: This work investigates the generalization performance of support vector machines (SVMs), which have been recently introduced as a general alternative to neural networks, and finds that SVMs overfit only weakly.
Abstract: Using methods of Statistical Physics, we investigate the generalization performance of support vector machines (SVMs), which have been recently introduced as a general alternative to neural networks. For nonlinear classification rules, the generalization error saturates on a plateau, when the number of examples is too small to properly estimate the coefficients of the nonlinear part. When trained on simple rules, we find that SVMs overfit only weakly. The performance of SVMs is strongly enhanced, when the distribution of the inputs has a gap in feature space.
TL;DR: A generalization of Ostrowski's inequality for Lipschitzian mappings is given in this paper, together with applications in Numerical Analysis and to Euler's Beta function.
Abstract: A generalization of Ostrowski's inequality for Lipschitzian mappings and applications in Numerical Analysis and for Euler's Beta function are given.
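The abstract does not spell out the generalization itself. As a baseline, the classical Ostrowski inequality for an L-Lipschitz f on [a, b] reads |f(x) − (1/(b − a))∫_a^b f| ≤ [1/4 + ((x − (a + b)/2)/(b − a))²](b − a)L. The sketch below checks it numerically with a midpoint-rule average. The function names and the test function are illustrative, and this is the classical inequality, not the paper's generalization.

```python
def mean_value(f, a, b, steps=100_000):
    """Midpoint-rule approximation of (1 / (b - a)) * integral of f over [a, b]."""
    h = (b - a) / steps
    return sum(f(a + (k + 0.5) * h) for k in range(steps)) * h / (b - a)


def ostrowski_bound(x, a, b, lipschitz):
    """[1/4 + ((x - (a + b)/2) / (b - a))^2] * (b - a) * L."""
    return (0.25 + ((x - (a + b) / 2) / (b - a)) ** 2) * (b - a) * lipschitz


def kink(t):
    return abs(t - 0.3)  # 1-Lipschitz on [0, 1]


avg = mean_value(kink, 0.0, 1.0)
for x in (0.0, 0.3, 0.5, 1.0):
    print(x, abs(kink(x) - avg), ostrowski_bound(x, 0.0, 1.0, 1.0))
```

The bound is sharp: for f(t) = t on [0, 1] at x = 0, both sides equal 1/2.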
TL;DR: In this article, it was shown that when the function is convex, the generalized Bernstein polynomials Bn are monotonic in n, as in the classical case.
Abstract: This paper is concerned with a generalization of the classical Bernstein polynomials where the function is evaluated at intervals which are in geometric progression. It is shown that, when the function is convex, the generalized Bernstein polynomials Bn are monotonic in n, as in the classical case.
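The sketch assumes the q-integer construction commonly used for these polynomials. The nodes are [r]/[n] with [r] = 1 + q + ... + q^{r−1}, so consecutive nodes differ by q^r/[n], a geometric progression. The polynomial is B_n(f; x) = Σ_r f([r]/[n]) [n choose r]_q x^r Π_{s=0}^{n−r−1}(1 − q^s x), which reduces to the classical Bernstein polynomial at q = 1. The sketch evaluates it directly so that the monotonicity in n for a convex f can be observed. The names are illustrative.

```python
import math


def q_int(k, q):
    """q-integer [k] = 1 + q + ... + q^(k-1)."""
    return sum(q ** j for j in range(k))


def q_binom(n, r, q):
    """Gaussian binomial coefficient [n choose r]_q = [n]...[n-r+1] / ([1]...[r])."""
    num = math.prod(q_int(n - j, q) for j in range(r))
    den = math.prod(q_int(j + 1, q) for j in range(r))
    return num / den


def q_bernstein(f, n, q, x):
    """B_n(f; x) = sum_r f([r]/[n]) [n choose r]_q x^r prod_{s < n-r} (1 - q^s x)."""
    total = 0.0
    for r in range(n + 1):
        node = q_int(r, q) / q_int(n, q)
        weight = q_binom(n, r, q) * x ** r * math.prod(1 - q ** s * x for s in range(n - r))
        total += f(node) * weight
    return total


# For convex f and 0 < q <= 1 the values should not increase with n.
for n in range(1, 6):
    print(n, q_bernstein(math.exp, n, 0.7, 0.4))
```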
TL;DR: In this article, it was shown that universal fault-tolerant computation is possible with any higher-dimensional stabilizer code for d-dimensional systems with prime d, generalizing the usual setting in which the fundamental units are 2-dimensional qubits.
Abstract: Instead of a quantum computer where the fundamental units are 2-dimensional qubits, we can consider a quantum computer made up of d-dimensional systems. There is a straightforward generalization of the class of stabilizer codes to d-dimensional systems, and I will discuss the theory of fault-tolerant computation using such codes. I prove that universal fault-tolerant computation is possible with any higher-dimensional stabilizer code for prime d.
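For d-dimensional systems, the Pauli group is usually generalized by two operators. The shift acts as X|j⟩ = |j+1 mod d⟩ and the clock as Z|j⟩ = ω^j|j⟩, with ω = e^{2πi/d}. They satisfy ZX = ωXZ and X^d = Z^d = I, and qudit stabilizer codes are built from products of such operators. The sketch below only verifies these relations numerically for small d. It is background for the abstract, not the paper's fault-tolerance construction, and the helper names are illustrative.

```python
import cmath


def shift_and_clock(d):
    """Generalized Pauli operators on C^d: X|j> = |j+1 mod d>, Z|j> = w^j |j>."""
    w = cmath.exp(2j * cmath.pi / d)
    X = [[1 if i == (j + 1) % d else 0 for j in range(d)] for i in range(d)]
    Z = [[w ** i if i == j else 0 for j in range(d)] for i in range(d)]
    return X, Z, w


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def matpow(A, e):
    result = [[1 if i == j else 0 for j in range(len(A))] for i in range(len(A))]
    for _ in range(e):
        result = matmul(result, A)
    return result


def close(A, B, tol=1e-9):
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))


d = 3
X, Z, w = shift_and_clock(d)
identity = matpow(X, 0)
print(close(matmul(Z, X), [[w * v for v in row] for row in matmul(X, Z)]))  # ZX = w XZ
print(close(matpow(X, d), identity), close(matpow(Z, d), identity))
```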
TL;DR: Three results are obtained which provide partial remedies for the shortcomings of Hilbert series and degree bounds in the modular case; the last of them is a generalization of Göbel’s degree bound to the case of monomial representations.
Abstract: The Hilbert series and degree bounds play significant roles in computational invariant theory. In the modular case, neither of these tools is available in general. In this article three results are obtained, which provide partial remedies for these shortcomings. First, it is shown that the so-called extended Hilbert series, which can always be calculated by a Molien-type formula, yields strong constraints on the degrees of primary invariants. Then it is shown that for a trivial source module the (ordinary) Hilbert series coincides with that of a lift to characteristic 0 and can hence be calculated by Molien’s formula. The last result is a generalization of Göbel’s degree bound to the case of monomial representations.
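For a finite group G of matrices in characteristic 0, Molien's formula gives the Hilbert series of the invariant ring as H(t) = (1/|G|) Σ_{g∈G} 1/det(I − t g). For a permutation representation, det(I − t g) is the product of (1 − t^ℓ) over the cycles of g, so the series can be expanded exactly with rational arithmetic. Permutation modules are trivial source modules, so by the second result quoted above the same series also applies in the modular case. The sketch below expands the series for the symmetric group S_3 permuting three variables. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def cycle_lengths(perm):
    """Cycle type of a permutation given as a tuple of images."""
    seen, lengths = set(), []
    for start in range(len(perm)):
        if start not in seen:
            length, i = 0, start
            while i not in seen:
                seen.add(i)
                i = perm[i]
                length += 1
            lengths.append(length)
    return lengths


def molien_coefficients(group, degree):
    """Coefficients of t^0..t^degree in (1/|G|) * sum_g 1/det(I - t g) for a group
    of permutation matrices, where det(I - t g) = prod over cycles of (1 - t^len)."""
    total = [Fraction(0)] * (degree + 1)
    for g in group:
        series = [Fraction(1)] + [Fraction(0)] * degree
        for length in cycle_lengths(g):
            # Multiply in place by 1/(1 - t^length) = 1 + t^length + t^(2 length) + ...
            for k in range(length, degree + 1):
                series[k] += series[k - length]
        for k in range(degree + 1):
            total[k] += series[k]
    return [c / len(group) for c in total]


S3 = list(permutations(range(3)))
print(molien_coefficients(S3, 10))  # invariants of S_3 = symmetric polynomials
```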
TL;DR: A generalization of Blaschke's Rolling Theorem for not necessarily convex sets is proved in this article, which exhibits an intimate connection between a generalized notion of convexity, various concepts in mathematical morphology and image processing, and a certain smoothness condition.
TL;DR: In this paper, a generalization of Shephard's distance functions is proposed, extending the usefulness of distance functions in economic analysis and applications to efficiency measurements and productivity analysis are presented.
Abstract: A generalization of Shephard’s distance functions is proposed, extending the usefulness of distance functions in economic analysis. Applications to efficiency measurements and productivity analysis are presented. New indexes of productivity and technical, allocative, and scale efficiency are proposed and analyzed. Interpretation of these indexes in terms of ray-average cost, ray-average revenue, and cost-to-revenue ratio is discussed.
TL;DR: This paper argues that two apparently distinct modes of generalizing concepts - abstracting rules and computing similarity to exemplars - should both be seen as special cases of a more general Bayesian learning framework.
Abstract: This paper argues that two apparently distinct modes of generalizing concepts - abstracting rules and computing similarity to exemplars - should both be seen as special cases of a more general Bayesian learning framework. Bayes explains the specific workings of these two modes - which rules are abstracted, how similarity is measured - as well as why generalization should appear rule- or similarity-based in different situations. This analysis also suggests why the rules/similarity distinction, even if not computationally fundamental, may still be useful at the algorithmic level as part of a principled approximation to fully Bayesian learning.
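One standard way to instantiate such a Bayesian framework is assumed here rather than taken from the abstract. Each candidate concept h is scored by its prior times the "size principle" likelihood (1/|h|)^n for n consistent examples. Generalization to a new item y then averages over hypotheses. With few examples the posterior spreads over several hypotheses, giving graded, similarity-like generalization. With more examples it concentrates on the smallest consistent hypothesis, giving rule-like behaviour. The toy number-concept hypotheses below are illustrative.

```python
def posterior(hypotheses, prior, examples):
    """p(h | X) proportional to p(h) * (1 / |h|)^n for hypotheses containing every
    example (the 'size principle' likelihood, assumed here); 0 otherwise."""
    scores = {}
    for name, extension in hypotheses.items():
        if all(x in extension for x in examples):
            scores[name] = prior[name] * (1.0 / len(extension)) ** len(examples)
        else:
            scores[name] = 0.0
    z = sum(scores.values())
    return {name: s / z for name, s in scores.items()}


def generalization_prob(y, hypotheses, post):
    """p(y in C | X): posterior mass of hypotheses whose extension contains y."""
    return sum(p for name, p in post.items() if y in hypotheses[name])


hypotheses = {
    "even numbers": set(range(2, 101, 2)),
    "powers of two": {2 ** k for k in range(7)},
    "multiples of ten": set(range(10, 101, 10)),
}
prior = {name: 1 / 3 for name in hypotheses}
post = posterior(hypotheses, prior, [16, 8, 2, 64])
print(post)
print(generalization_prob(32, hypotheses, post), generalization_prob(10, hypotheses, post))
```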
TL;DR: A general bound for a class of approximation schemes that include radial basis functions and multilayer perceptrons is proved, and it is shown how the total error can be decomposed into two parts: an approximation part that is due to the finite number of parameters of the approximation scheme used, and an estimation part that is due to the finite number of data available.
Abstract: We consider the problem of approximating functions from scattered data using linear superpositions of non-linearly parameterized functions. We show how the total error (generalization error) can be decomposed into two parts: an approximation part that is due to the finite number of parameters of the approximation scheme used; and an estimation part that is due to the finite number of data available. We bound each of these two parts under certain assumptions and prove a general bound for a class of approximation schemes that include radial basis functions and multilayer perceptrons.
TL;DR: This paper concerns Mertens' theorem about the partial product of the Riemann zeta function at s = 1.
Abstract: Mertens proved an interesting and useful theorem about the partial product of the Riemann zeta function at s = 1, namely …
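The statement itself is cut off in this copy. For reference, the classical form of Mertens' product theorem is ∏_{p≤x} (1 − 1/p)^{−1} ∼ e^γ ln x. Here γ is the Euler-Mascheroni constant and the product is the partial Euler product of ζ(s) at s = 1. The sketch below checks this asymptotic numerically. It is an illustration, not the paper's argument, and the helper names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def primes_up_to(x):
    """Sieve of Eratosthenes."""
    sieve = bytearray([1]) * (x + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(x) + 1):
        if sieve[p]:
            sieve[p * p::p] = bytearray(len(range(p * p, x + 1, p)))
    return [p for p in range(x + 1) if sieve[p]]


def mertens_ratio(x):
    """prod_{p <= x} (1 - 1/p)^(-1) divided by e^gamma * ln x; tends to 1."""
    product = 1.0
    for p in primes_up_to(x):
        product /= 1.0 - 1.0 / p
    return product / (math.exp(EULER_GAMMA) * math.log(x))


for x in (10**2, 10**4, 10**6):
    print(x, mertens_ratio(x))
```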
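TL;DR: In this paper, a generalization of the Busemann-Petty problem is studied in which the inequalities for the volumes of central sections are replaced by inequalities for the derivatives of the parallel section functions at zero; the original problem has a negative answer if n ≥ 5 and an affirmative one when n ≤ 4, and the dimension up to which the answer is affirmative goes up with the order of the derivatives.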
Abstract: The 1956 Busemann-Petty problem asks whether symmetric convex bodies in $\mathbb{R}^n$ with larger central hyperplane sections must also have greater volume. The solution to the problem has recently been completed, and the answer is negative if n ≥ 5 and affirmative when n ≤ 4. We show a more general result, where the inequalities for the volume of central sections are replaced by similar inequalities for the derivatives of the parallel section functions at zero. The dimension of affirmative answer goes up together with the order of the derivatives. The proof is based on a version of Parseval's formula.
TL;DR: In this article, a unified approach to hypergeometric functions, a generalization of exponential functions that can also be manipulated analytically, is given; as a result, some potentially useful general applications emerge in a number of areas such as econometrics and economic theory.
Abstract: Hypergeometric functions are a generalization of exponential functions. They are explicit, computable functions that can also be manipulated analytically. The functions and series we use in quantitative economics are all special cases of them. In this paper, a unified approach to hypergeometric functions is given. As a result, some potentially useful general applications emerge in a number of areas such as in econometrics and economic theory. The greatest benefit from using these functions stems from the fact that they provide parsimonious explicit (and interpretable) solutions to a wide range of general problems.
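The generalized hypergeometric series is pFq(a_1..a_p; b_1..b_q; x) = Σ_k [(a_1)_k ⋯ (a_p)_k / ((b_1)_k ⋯ (b_q)_k)] x^k / k!, where (a)_k is the rising factorial. The exponential is the case p = q = 0. The sketch below sums the series through the ratio of consecutive terms and checks a few classical special cases. It assumes the series converges at x (for example |x| < 1 when p = q + 1) and that no b_j is a non-positive integer. It is not the paper's unified approach, and the function name is hypothetical.

```python
import math


def hypergeometric_pfq(a, b, x, terms=200):
    """Partial sum of pFq(a; b; x) = sum_k prod_i (a_i)_k / prod_j (b_j)_k * x^k / k!,
    built from the ratio of consecutive terms (assumes the series converges)."""
    term, total = 1.0, 1.0
    for k in range(terms):
        ratio = x / (k + 1)
        for ai in a:
            ratio *= ai + k
        for bj in b:
            ratio /= bj + k
        term *= ratio
        total += term
    return total


print(hypergeometric_pfq([], [], 1.0), math.e)                          # 0F0(;;x) = e^x
print(hypergeometric_pfq([2.0], [], 0.3), 0.7 ** -2)                    # 1F0(a;;x) = (1 - x)^(-a)
print(hypergeometric_pfq([1.0], [2.0], 2.0), (math.exp(2.0) - 1) / 2)   # 1F1(1;2;x) = (e^x - 1)/x
```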
TL;DR: In this paper, the authors examine some assumptions and results of cartographic line simplification in the digital realm, focusing upon two major aspects of map generalization: scale-specificity and the concept of characteristic points.
Abstract: This paper examines some assumptions and results of cartographic line simplification in the digital realm, focusing upon two major aspects of map generalization: scale-specificity and the concept of characteristic points. These are widely regarded as critical controls to generalization but, in our estimation, they are rarely well considered or properly applied. First, a look at how scale and shape are treated in various research papers identifies some important conceptual and methodological issues that have been either misconstrued or inadequately treated. We then conduct an empirical analysis with a set of line generalization experiments that control resolution, detail, and sinuosity using four source datasets. The tests yield about 100 different versions of two island coastlines digitized at two scales, exploring systematically the consequences of linking scale with spatial resolution as well as a variety of point selection strategies. The generalized results are displayed (at scale and enlarged) along w...
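The abstract does not name a simplification algorithm. As a hedged illustration of what point selection and characteristic points mean in practice, here is the widely used Douglas-Peucker method. It keeps the endpoints and recursively retains the point farthest from the current chord whenever its offset exceeds a tolerance. The tolerance plays the role of a scale-dependent resolution. This is not claimed to be one of the strategies tested in the paper.

```python
import math


def perpendicular_distance(p, a, b):
    """Distance from point p to the line through a and b."""
    (x, y), (x1, y1), (x2, y2) = p, a, b
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        return math.hypot(x - x1, y - y1)
    return abs(dy * x - dx * y + x2 * y1 - y2 * x1) / math.hypot(dx, dy)


def douglas_peucker(points, tolerance):
    """Keep the endpoints; recursively keep the point farthest from the chord
    whenever its distance exceeds `tolerance`."""
    if len(points) < 3:
        return list(points)
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], points[0], points[-1])
        if d > dmax:
            index, dmax = i, d
    if dmax <= tolerance:
        return [points[0], points[-1]]
    left = douglas_peucker(points[: index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right  # drop the duplicated split point


line = [(0, 0), (1, 1), (2, 0), (3, 1), (4, 0)]
for tol in (0.5, 2.0):
    print(tol, douglas_peucker(line, tol))
```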
TL;DR: A necessary and sufficient condition is given for the exact reduction of systems modeled by linear fractional transformations on structured operator sets, based on the existence of a rank-deficient solution to either of a pair of linear matrix inequalities which generalize Lyapunov equations.
Abstract: A necessary and sufficient condition is given for the exact reduction of systems modeled by linear fractional transformations (LFTs) on structured operator sets. This condition is based on the existence of a rank-deficient solution to either of a pair of linear matrix inequalities which generalize Lyapunov equations; the notion of Gramians is thus also generalized to uncertain systems, as well as Kalman-like decomposition structures. A related minimality condition, the converse of the reducibility condition, may then be inferred from these results and the equivalence class of all minimal LFT realizations defined. These results comprise the first stage of a complete generalization of realization theory concepts to uncertain systems. Subsequent results, such as the definition of and rank tests on structured controllability and observability matrices are also given. The minimality results described are applicable to multidimensional system realizations as well as to uncertain systems; connections to formal powers series representations also exist.
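These results generalize the classical setting of an ordinary linear time-invariant state-space system (A, B, C). There, a realization can be reduced exactly when it is not minimal, that is, when it fails to be controllable or observable, which the Kalman rank tests detect. The sketch below implements those rank tests in exact arithmetic as a point of reference. It does not handle LFT or uncertain systems, and the function names are hypothetical.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix with rational entries, by Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def controllability_matrix(A, B):
    """Rows of the block matrix [B, AB, ..., A^(n-1) B]."""
    n = len(A)
    blocks, current = [], [list(row) for row in B]
    for _ in range(n):
        blocks.append(current)
        current = [[sum(A[i][k] * current[k][j] for k in range(n))
                    for j in range(len(B[0]))] for i in range(n)]
    return [sum((block[i] for block in blocks), []) for i in range(n)]


def is_minimal(A, B, C):
    """Controllable and observable (observability via the dual pair A^T, C^T)."""
    n = len(A)
    At = [list(col) for col in zip(*A)]
    Ct = [list(col) for col in zip(*C)]
    return (rank(controllability_matrix(A, B)) == n
            and rank(controllability_matrix(At, Ct)) == n)


A = [[1, 0], [0, 2]]
print(is_minimal(A, [[1], [1]], [[1, 1]]))  # both modes excited and observed
print(is_minimal(A, [[1], [0]], [[1, 1]]))  # second mode unreachable: reducible
```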
TL;DR: A new generalization of the DST is put forward that gives a fuzzy-valued definition for the belief, plausibility, and probability functions over a finite referential set; it is capable of modeling the uncertainties in the real world and eliminates the need for extra preassumptions and preprocessing.
Abstract: The Dempster-Shafer theory (DST) may be considered as a generalization of probability theory, which assigns mass values to the subsets of the referential set and suggests an interval-valued probability measure. There have been several attempts at a fuzzy generalization of the DST by assigning mass (probability) values to the fuzzy subsets of the referential set. The interval-valued probability measures thus obtained are not equivalent to the original fuzzy body of evidence. In this paper, a new generalization of the DST is put forward that gives a fuzzy-valued definition for the belief, plausibility, and probability functions over a finite referential set. These functions are all equivalent to one another and to the original fuzzy body of evidence. The advantage of the proposed model is shown in three application examples. It can be seen that the proposed generalization is capable of modeling the uncertainties in the real world and eliminates the need for extra preassumptions and preprocessing.
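In the classical (crisp) theory that the paper generalizes, a mass function m on subsets of the referential set induces the lower and upper bounds Bel(A) = Σ_{B⊆A} m(B) and Pl(A) = Σ_{B∩A≠∅} m(B). These form the interval-valued probability mentioned above. The sketch below computes them for a toy frame {a, b, c} with exact fractions. It does not implement the paper's fuzzy-valued generalization, and the masses are illustrative.

```python
from fractions import Fraction


def belief(mass, event):
    """Bel(A): total mass of focal sets contained in A."""
    return sum((m for focal, m in mass.items() if focal <= event), Fraction(0))


def plausibility(mass, event):
    """Pl(A): total mass of focal sets that intersect A."""
    return sum((m for focal, m in mass.items() if focal & event), Fraction(0))


mass = {
    frozenset("a"): Fraction(1, 2),
    frozenset("ab"): Fraction(3, 10),
    frozenset("abc"): Fraction(1, 5),
}
for event in (frozenset("a"), frozenset("b"), frozenset("ab")):
    print(sorted(event), belief(mass, event), plausibility(mass, event))
```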
TL;DR: A generalization of the k-order additive discrete fuzzy measures recently introduced by Grabisch is presented, and its connection with Shafer's general Möbius transform is shown.
Abstract: A generalization of the k-order additive discrete fuzzy measures recently introduced by Grabisch is presented. k-order additive fuzzy measures on general spaces are introduced. The connection of the proposed generalization with Shafer's general Möbius transform is shown. A general evaluation formula for the Choquet integral is given. Further generalizations concerning the type of applied arithmetic are proposed.
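In the discrete case, Grabisch's k-order additive measures are most easily described through the Möbius transform m(A) = Σ_{B⊆A} (−1)^{|A∖B|} μ(B). A measure μ is k-additive when m vanishes on every set with more than k elements. The Choquet integral can then be evaluated as Σ_A m(A) · min_{i∈A} f(i). The sketch below checks this on a three-element set. It covers only the classical discrete case that the paper generalizes, and the toy measure is an assumption.

```python
from itertools import combinations


def subsets(s):
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def mobius(mu, universe):
    """m(A) = sum over subsets B of A of (-1)^|A minus B| * mu(B)."""
    return {A: sum((-1) ** len(A - B) * mu[B] for B in subsets(A)) for A in subsets(universe)}


def additivity_order(m, tol=1e-12):
    """Largest |A| with non-zero Mobius mass: mu is k-additive for this k."""
    return max((len(A) for A, v in m.items() if abs(v) > tol), default=0)


def choquet(mu, f):
    """Choquet integral of a non-negative f, using the ascending-order formula."""
    order = sorted(f, key=f.get)
    total, prev = 0.0, 0.0
    for k, x in enumerate(order):
        total += (f[x] - prev) * mu[frozenset(order[k:])]
        prev = f[x]
    return total


def choquet_from_mobius(m, f):
    """Equivalent form: sum over non-empty A of m(A) * min of f over A."""
    return sum(v * min(f[i] for i in A) for A, v in m.items() if A)


universe = {1, 2, 3}
m_true = {frozenset({1}): 0.2, frozenset({2}): 0.3, frozenset({3}): 0.1,
          frozenset({1, 2}): 0.2, frozenset({2, 3}): 0.2}
mu = {A: sum(m_true.get(B, 0.0) for B in subsets(A)) for A in subsets(universe)}
f = {1: 0.5, 2: 0.2, 3: 0.9}
m = mobius(mu, universe)
print(additivity_order(m), choquet(mu, f), choquet_from_mobius(m, f))
```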
TL;DR: This paper generalizes the interpolant using Voronoi diagrams in two directions: one is to general-dimensional data, and the other is to data distributed continuously on curves.
Abstract: Recently, the authors found an interpolant using Voronoi diagrams that differs from Sibson's interpolant. This paper generalizes our interpolant in two directions: one is to general-dimensional data, and the other is to data distributed continuously on curves. Minkowski's theorem is used as the basic principle in generalization.