TL;DR: A statistical ranking theory is introduced, which can describe different learning-to-rank algorithms, and be used to analyze their query-level generalization abilities.
Abstract: Learning to rank for Information Retrieval (IR) is the task of automatically constructing a ranking model from training data, such that the model can sort new objects according to their degrees of relevance, preference, or importance. Many IR problems are by nature ranking problems, and many IR technologies can potentially be enhanced by learning-to-rank techniques. The objective of this tutorial is to give an introduction to this research direction. Specifically, the existing learning-to-rank algorithms are reviewed and categorized into three approaches: the pointwise, pairwise, and listwise approaches. The advantages and disadvantages of each approach are analyzed, and the relationships between the loss functions used in these approaches and IR evaluation measures are discussed. Then empirical evaluations of typical learning-to-rank methods are shown, with the LETOR collection as a benchmark dataset; these seem to suggest that the listwise approach is the most effective of the three. After that, a statistical ranking theory is introduced, which can describe different learning-to-rank algorithms and be used to analyze their query-level generalization abilities. At the end of the tutorial, we provide a summary and discuss potential future work on learning to rank.
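To make the three approaches concrete, here is a minimal sketch with one representative loss per approach, computed for a single query. The choice of losses is an assumption made for illustration (squared error for pointwise, a logistic loss over ordered pairs in the style of RankNet for pairwise, and a top-one softmax cross-entropy in the style of ListNet for listwise); the tutorial covers many more, and the function names are hypothetical.

```python
import math


def pointwise_loss(scores, labels):
    """Mean squared error between each score and its relevance label."""
    return sum((s - y) ** 2 for s, y in zip(scores, labels)) / len(scores)


def pairwise_loss(scores, labels):
    """Mean logistic loss over pairs (i, j) where item i is more relevant than j."""
    total, count = 0.0, 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if labels[i] > labels[j]:
                # Small when the more relevant item gets the higher score.
                total += math.log1p(math.exp(-(scores[i] - scores[j])))
                count += 1
    return total / count if count else 0.0


def _softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    z = sum(exps)
    return [e / z for e in exps]


def listwise_loss(scores, labels):
    """Cross-entropy between softmax(labels) and softmax(scores) over the whole list."""
    target = _softmax(labels)
    predicted = _softmax(scores)
    return -sum(p * math.log(q) for p, q in zip(target, predicted))


labels = [2, 1, 0]          # graded relevance for three documents of one query
good = [3.0, 2.0, 1.0]      # scores that order the documents correctly
bad = [1.0, 2.0, 3.0]       # scores that reverse the order
for loss in (pointwise_loss, pairwise_loss, listwise_loss):
    print(loss.__name__, loss(good, labels), loss(bad, labels))
```

The pointwise loss looks at each document alone, the pairwise loss only at relative order within pairs, and the listwise loss at the full score list of the query.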
TL;DR: This paper develops a general framework for robust and efficient recovery of nonlinear but structured signal models, in which x lies in a union of subspaces, and presents an equivalence condition under which the proposed convex algorithm is guaranteed to recover the original signal.
Abstract: Traditional sampling theories consider the problem of reconstructing an unknown signal x from a series of samples. A prevalent assumption which often guarantees recovery from the given measurements is that x lies in a known subspace. Recently, there has been growing interest in nonlinear but structured signal models, in which x lies in a union of subspaces. In this paper, we develop a general framework for robust and efficient recovery of such signals from a given set of samples. More specifically, we treat the case in which x lies in a sum of k subspaces, chosen from a larger set of m possibilities. The samples are modeled as inner products with an arbitrary set of sampling functions. To derive an efficient and robust recovery algorithm, we show that our problem can be formulated as that of recovering a block-sparse vector whose nonzero elements appear in fixed blocks. We then propose a mixed ℓ2/ℓ1 program for block sparse recovery. Our main result is an equivalence condition under which the proposed convex algorithm is guaranteed to recover the original signal. This result relies on the notion of block restricted isometry property (RIP), which is a generalization of the standard RIP used extensively in the context of compressed sensing. Based on RIP, we also prove stability of our approach in the presence of noise and modeling errors. A special case of our framework is that of recovering multiple measurement vectors (MMV) that share a joint sparsity pattern. Adapting our results to this context leads to new MMV recovery methods as well as equivalence conditions under which the entire set can be determined efficiently.
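The objective that a mixed ℓ2/ℓ1 program minimizes can be illustrated with a short sketch. It only evaluates the mixed norm (the sum of the Euclidean norms of the blocks) and counts nonzero blocks; it does not solve the recovery problem. Equal-length consecutive blocks and the function names are assumptions made for the example.

```python
import math


def split_blocks(x, block_size):
    """Split x into consecutive blocks of equal length (an assumption of this sketch)."""
    if block_size <= 0 or len(x) % block_size != 0:
        raise ValueError("len(x) must be a positive multiple of block_size")
    return [x[i:i + block_size] for i in range(0, len(x), block_size)]


def mixed_l2_l1_norm(x, block_size):
    """Sum over blocks of the Euclidean (l2) norm of each block."""
    return sum(math.sqrt(sum(v * v for v in block))
               for block in split_blocks(x, block_size))


def block_sparsity(x, block_size):
    """Number of blocks that contain at least one nonzero entry."""
    return sum(1 for block in split_blocks(x, block_size)
               if any(v != 0 for v in block))


x = [3.0, 4.0, 0.0, 0.0, 0.0, 5.0]
print(mixed_l2_l1_norm(x, 2))   # norms of the blocks [3, 4], [0, 0], [0, 5] added up
print(block_sparsity(x, 2))     # how many of those blocks are nonzero
```

Taking the ℓ2 norm inside each block and the ℓ1 norm across blocks penalizes the number of active blocks rather than the number of individual nonzero entries.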
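TL;DR: In this article, a recently proposed generalization of the Hukuhara difference for compact convex sets is used to introduce a generalized Hukuhara differentiability for interval-valued functions, and the connections between several proposed definitions of the derivative, together with their properties, are studied.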
Abstract: In the present paper we use a recently proposed generalization of the Hukuhara difference for compact convex sets, to introduce and study a generalization of the Hukuhara differentiability to the case of interval-valued functions. We also consider several possible definitions for the derivative of an interval-valued function, recently proposed in the literature, and we study connections between them and their properties. Using these concepts we study interval differential equations. Local existence and uniqueness of two solutions is obtained together with characterizations of the solutions of an interval differential equation by ODE systems. Several examples are also shown.
TL;DR: It is proved that Gower's distance can be extended to include new types of data, and an evaluation of the real contribution of each variable to the mixed distance is proposed; the authors conclude that such a generalized index will be crucial for analyzing functional diversity at small and large scales.
Abstract: Functional diversity is at the heart of current research in the field of conservation biology. Most of the indices that measure diversity depend on variables that have various statistical types (e.g. circular, fuzzy, ordinal) and that go through a matrix of distances among species. We show how to compute such distances from a generalization of Gower's distance, which is dedicated to the treatment of mixed data. We prove Gower's distance can be extended to include new types of data. The impact of this generalization is illustrated on a real data set containing 80 plant species and 13 various traits. Gower's distance allows an efficient treatment of missing data and the inclusion of variable weights. An evaluation of the real contribution of each variable to the mixed distance is proposed. We conclude that such a generalized index will be crucial for analyzing functional diversity at small and large scales.
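For orientation, the classical Gower distance that the paper starts from can be sketched as follows. This is a minimal version covering only numeric and nominal variables, with missing values skipped and optional variable weights; the extension to circular, fuzzy, or ordinal data described in the abstract is not implemented here. The function name, the `kinds` labels, and the example traits are hypothetical.

```python
def gower_distance(a, b, kinds, ranges, weights=None):
    """Weighted mean of per-variable distances in [0, 1], ignoring missing values.

    a, b    : sequences of trait values; None marks a missing value
    kinds   : "numeric" or "nominal" for each variable
    ranges  : observed range of each numeric variable (ignored for nominal ones)
    weights : optional non-negative weight per variable (defaults to 1)
    """
    if weights is None:
        weights = [1.0] * len(a)
    num, den = 0.0, 0.0
    for x, y, kind, rng, w in zip(a, b, kinds, ranges, weights):
        if x is None or y is None:
            continue  # a missing value drops the variable from this comparison
        if kind == "numeric":
            d = abs(x - y) / rng if rng else 0.0
        elif kind == "nominal":
            d = 0.0 if x == y else 1.0
        else:
            raise ValueError(f"unsupported variable kind: {kind}")
        num += w * d
        den += w
    if den == 0:
        raise ValueError("no variable is observed for both items")
    return num / den


# Two hypothetical species described by height, growth form, and leaf size.
species_a = (10.0, "tree", None)
species_b = (20.0, "shrub", 3.0)
kinds = ("numeric", "nominal", "numeric")
ranges = (40.0, None, 5.0)
print(gower_distance(species_a, species_b, kinds, ranges))
```

Because each per-variable distance is scaled to [0, 1] before averaging, variables of different statistical types can be combined in one index.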
TL;DR: The authors investigate the relevant levels of generalization in adult language, how and why generalizations are learned by children, and how to account for cross-linguistic generalizations.
Abstract: This paper provides a concise overview of Constructions at Work (Goldberg 2006). The book aims to investigate the relevant levels of generalization in adult language, how and why generalizations are learned by children, and how to account for cross-linguistic generalizations.
TL;DR: The difficulty in fulfilling the user requirements related to geo-information generalisation is analyzed and the current state of the art is identified and descriptions of further research and development directions in generalisation are provided.
Abstract: This paper analyses the difficulty in fulfilling the user requirements related to geo-information generalisation. Despite the fact that this is a long-standing research topic, the results are not satisfactory and therefore there is a very active research community trying to better meet the expectations of the users, both at the side of the geo-information producers and at the side of the geo-information users. It is argued that part of the difficulty is due to the fact that the generalization problem is not specified formally enough. Therefore, currently the most important benchmark for generalization software is the work of human cartographers doing manual generalization, supported by automated tools; this work includes subjective aspects such as taste, resulting in artistic solutions. So, a very important intermediate research goal is formalizing the generalization problem. In addition, the expectations of the users have been growing over the past years and will continue to do so in the future: faster updates propagated between different scales, ever growing size of geo-information, support for vario-scale (instead of just multiple fixed scales), integration of formal semantics and computational geometry techniques, support for 3D representations, and so on. This paper identifies the current state of the art and provides descriptions of further research and development directions in generalisation.
TL;DR: Novel and distinct stability-based generalization bounds for stationary ϕ-mixing and β-mixing sequences are proved, which can be viewed as the first theoretical basis for the use of these algorithms in non-i.i.d. scenarios.
Abstract: Most generalization bounds in learning theory are based on some measure of the complexity of the hypothesis class used, independently of any algorithm. In contrast, the notion of algorithmic stability can be used to derive tight generalization bounds that are tailored to specific learning algorithms by exploiting their particular properties. However, as in much of learning theory, existing stability analyses and bounds apply only in the scenario where the samples are independently and identically distributed. In many machine learning applications, however, this assumption does not hold. The observations received by the learning algorithm often have some inherent temporal dependence. This paper studies the scenario where the observations are drawn from a stationary ϕ-mixing or β-mixing sequence, a widely adopted assumption in the study of non-i.i.d. processes that implies a dependence between observations weakening over time. We prove novel and distinct stability-based generalization bounds for stationary ϕ-mixing and β-mixing sequences. These bounds strictly generalize the bounds given in the i.i.d. case and apply to all stable learning algorithms, thereby extending the use of stability-bounds to non-i.i.d. scenarios. We also illustrate the application of our ϕ-mixing generalization bounds to general classes of learning algorithms, including Support Vector Regression, Kernel Ridge Regression, and Support Vector Machines, and many other kernel regularization-based and relative entropy-based regularization algorithms. These novel bounds can thus be viewed as the first theoretical basis for the use of these algorithms in non-i.i.d. scenarios.
TL;DR: In this article, a new generalization $\{q_n\}$ of the Fibonacci sequence is studied, with initial conditions $q_0 = 0$ and $q_1 = 1$, generated by the recurrence relation $q_n = a q_{n-1} + q_{n-2}$ (when $n$ is even) or $q_n = b q_{n-1} + q_{n-2}$ (when $n$ is odd), where $a$ and $b$ are nonzero real numbers.
Abstract: Consider the Fibonacci sequence having initial conditions $F_0 = 0$, $F_1 = 1$ and recurrence relation $F_n = F_{n-1} + F_{n-2}$ ($n \ge 2$). The Fibonacci sequence has been generalized in many ways, some by preserving the initial conditions, and others by preserving the recurrence relation. In this article, we study a new generalization $\{q_n\}$, with initial conditions $q_0 = 0$ and $q_1 = 1$, which is generated by the recurrence relation $q_n = a q_{n-1} + q_{n-2}$ (when $n$ is even) or $q_n = b q_{n-1} + q_{n-2}$ (when $n$ is odd), where $a$ and $b$ are nonzero real numbers. Some well-known sequences are special cases of this generalization. The Fibonacci sequence is a special case of $\{q_n\}$ with $a = b = 1$. Pell's sequence is $\{q_n\}$ with $a = b = 2$ and the $k$-Fibonacci sequence is $\{q_n\}$ with $a = b = k$. We produce an extended Binet's formula for the sequence $\{q_n\}$ and, thereby, identities such as Cassini's, Catalan's, d'Ocagne's, etc.
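The recurrence can be checked directly with a few lines of code. The sketch below generates the first terms of the sequence from the definition in the abstract (a at even indices, b at odd indices); the function name is hypothetical, and the extended Binet formula is not implemented.

```python
def generalized_fibonacci(a, b, count):
    """Return the first `count` terms q_0, q_1, ... of the sequence.

    q_0 = 0, q_1 = 1, and for n >= 2:
        q_n = a * q_{n-1} + q_{n-2}   if n is even
        q_n = b * q_{n-1} + q_{n-2}   if n is odd
    """
    seq = [0, 1][:count]
    for n in range(2, count):
        coeff = a if n % 2 == 0 else b
        seq.append(coeff * seq[n - 1] + seq[n - 2])
    return seq


print(generalized_fibonacci(1, 1, 10))  # a = b = 1: Fibonacci numbers
print(generalized_fibonacci(2, 2, 7))   # a = b = 2: Pell numbers
print(generalized_fibonacci(1, 2, 7))   # a != b: coefficients alternate with parity
```

Setting a = b = k gives the k-Fibonacci sequence mentioned in the abstract.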
TL;DR: In this paper, the initial-boundary value problems for an integrable generalization of the nonlinear Schrödinger equation formulated on the half-line were analyzed and the so-called linearizable boundary conditions, which in this case are of Robin type, were investigated.
Abstract: We analyze initial-boundary value problems for an integrable generalization of the nonlinear Schrödinger equation formulated on the half-line. In particular, we investigate the so-called linearizable boundary conditions, which in this case are of Robin type. Furthermore, we use a particular solution to verify explicitly all the steps needed for the solution of a well-posed problem.
TL;DR: A new family of nonextensive mutual information kernels is defined, which allow weights to be assigned to their arguments and which include the Boolean, JS, and linear kernels as particular cases; nonextensive string kernels that generalize the p-spectrum kernel are also defined.
Abstract: Positive definite kernels on probability measures have been recently applied to classification problems involving text, images, and other types of structured data. Some of these kernels are related to classic information theoretic quantities, such as (Shannon's) mutual information and the Jensen-Shannon (JS) divergence. Meanwhile, there have been recent advances in nonextensive generalizations of Shannon's information theory. This paper bridges these two trends by introducing nonextensive information theoretic kernels on probability measures, based on new JS-type divergences. These new divergences result from extending the two building blocks of the classical JS divergence: convexity and Shannon's entropy. The notion of convexity is extended to the wider concept of q-convexity, for which we prove a Jensen q-inequality. Based on this inequality, we introduce Jensen-Tsallis (JT) q-differences, a nonextensive generalization of the JS divergence, and define a k-th order JT q-difference between stochastic processes. We then define a new family of nonextensive mutual information kernels, which allow weights to be assigned to their arguments, and which includes the Boolean, JS, and linear kernels as particular cases. Nonextensive string kernels are also defined that generalize the p-spectrum kernel. We illustrate the performance of these kernels on text categorization tasks, in which documents are modeled both as bags of words and as sequences of characters.
TL;DR: This work generalizes sparse coding to data drawn from any exponential family distribution, presents an algorithm for the resulting L1-regularized optimization problem that is especially efficient when the optimal solution is sparse, and shows significantly improved self-taught learning performance.
Abstract: Sparse coding is an unsupervised learning algorithm for finding concise, slightly higher-level representations of inputs, and has been successfully applied to self-taught learning, where the goal is to use unlabeled data to help on a supervised learning task, even if the unlabeled data cannot be associated with the labels of the supervised task [Raina et al., 2007]. However, sparse coding uses a Gaussian noise model and a quadratic loss function, and thus performs poorly if applied to binary valued, integer valued, or other non-Gaussian data, such as text. Drawing on ideas from generalized linear models (GLMs), we present a generalization of sparse coding to learning with data drawn from any exponential family distribution (such as Bernoulli, Poisson, etc.). This gives a method that we argue is much better suited to model other data types than Gaussian. We present an algorithm for solving the L1-regularized optimization problem defined by this model, and show that it is especially efficient when the optimal solution is sparse. We also show that the new model results in significantly improved self-taught learning performance when applied to text classification and to a robotic perception task.
TL;DR: In this article, the authors generalize the poverty ordering criteria available for one-dimensional income poverty to the case of multi-dimensional welfare attributes and define general classes of poverty measures based on these properties.
Abstract: This paper generalizes the poverty ordering criteria available for one-dimensional income poverty to the case of multi-dimensional welfare attributes. A set of properties to be satisfied by multi-dimensional poverty measures is first discussed. Then general classes of poverty measures based on these properties are defined. Finally, dominance criteria are derived such that a distribution of multi-dimensional attributes exhibits less poverty than another for all multi-dimensional poverty indices belonging to a given class. These criteria may be seen as a generalization of the single dimension poverty-line criterion. However, it turns out that the way this generalization is made depends on whether attributes are complements or substitutes.
TL;DR: The method uses a statistical linear regression technique which is based on the orthogonal least squares (OLS) algorithm, substituting a QR algorithm for the traditional Gram-Schmidt algorithm, to find the connected weight of the hidden layer neurons.
Abstract: In this paper we present a method for improving the generalization performance of a radial basis function (RBF) neural network. The method uses a statistical linear regression technique which is based on the orthogonal least squares (OLS) algorithm. We first discuss a modified way to determine the center and width of the hidden layer neurons. Then, substituting a QR algorithm for the traditional Gram-Schmidt algorithm, we find the connected weight of the hidden layer neurons. Cross-validation is utilized to determine the stop training criterion. The generalization performance of the network is further improved using a bootstrap technique. Finally, the solution method is used to solve a simulation and a real problem. The results demonstrate the improved generalization performance of our algorithm over the existing methods.
TL;DR: Using a generalization of the level statistics analysis of quantum disordered systems, an approach able to extract automatically keywords in literary texts is presented and it is shown that the method works also in generic symbolic sequences (continuous texts without spaces), suggesting its general applicability.
Abstract: Using a generalization of the level statistics analysis of quantum disordered systems, we present an approach able to extract automatically keywords in literary texts. Our approach takes into account not only the frequencies of the words present in the text but also their spatial distribution along the text, and is based on the fact that relevant words are significantly clustered (i.e., they self-attract each other), while irrelevant words are distributed randomly in the text. Since a reference corpus is not needed, our approach is especially suitable for single documents for which no a priori information is available. In addition, we show that our method works also in generic symbolic sequences (continuous texts without spaces), thus suggesting its general applicability.
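The clustering idea can be illustrated with a very small sketch. As an assumption for this example, each word is scored by the standard deviation of the gaps between its successive occurrences divided by the mean gap, so that evenly spread words score near zero and words that bunch together score higher. This is only a simple stand-in in the spirit of the abstract, not necessarily the statistic the authors use; the function name and the `min_count` parameter are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def clustering_scores(tokens, min_count=3):
    """Score each token by how unevenly its occurrences are spaced.

    For every token occurring at least `min_count` times, compute the gaps
    between consecutive positions and return stdev(gaps) / mean(gaps).
    """
    positions = defaultdict(list)
    for index, token in enumerate(tokens):
        positions[token].append(index)

    scores = {}
    for token, pos in positions.items():
        if len(pos) < min_count:
            continue  # too few occurrences to say anything about spacing
        gaps = [later - earlier for earlier, later in zip(pos, pos[1:])]
        scores[token] = pstdev(gaps) / mean(gaps)
    return scores


# Toy sequence: "a" is evenly spaced, "k" appears in two tight clusters.
tokens = ["a", "k", "k", "a", "x", "x", "a", "x", "x", "a", "k", "k"]
scores = clustering_scores(tokens)
for token in sorted(scores, key=scores.get, reverse=True):
    print(token, round(scores[token], 3))
```

No reference corpus is involved: the score depends only on positions inside the single input sequence, which is why the same computation applies to any symbolic sequence.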
TL;DR: In this article, generalized Maxwell models for nonlinear kinetic equations are considered from a very general point of view, including those with arbitrary polynomial nonlinearities and in any space dimension.
Abstract: Maxwell models for nonlinear kinetic equations have many applications in physics, dynamics of granular gases, economics, etc. In the present manuscript we consider such models from a very general point of view, including those with arbitrary polynomial nonlinearities and in any space dimension. It is shown that the whole class of generalized Maxwell models satisfies properties, one of which can be interpreted as an operator generalization of the usual Lipschitz conditions. This property allows us to describe in detail the behavior of solutions to the corresponding initial value problem. In particular, we prove in the most general case the existence of self-similar solutions and study the convergence, in the sense of probability measures, of dynamically scaled solutions to the Cauchy problem to those self-similar solutions, as time goes to infinity. A new application of multi-linear models to economics and social dynamics is discussed.
TL;DR: The main results obtained in this paper not only extend the previously known results for i.i.d. observations to the case of exponentially strongly mixing observations, but also improve the previous results for strongly mixing samples.
Abstract: The generalization performance is the main concern of machine learning theoretical research. The previous main bounds describing the generalization ability of the Empirical Risk Minimization (ERM) algorithm are based on independent and identically distributed (i.i.d.) samples. In order to study the generalization performance of the ERM algorithm with dependent observations, we first establish the exponential bound on the rate of relative uniform convergence of the ERM algorithm with exponentially strongly mixing observations, and then we obtain the generalization bounds and prove that the ERM algorithm with exponentially strongly mixing observations is consistent. The main results obtained in this paper not only extend the previously known results for i.i.d. observations to the case of exponentially strongly mixing observations, but also improve the previous results for strongly mixing samples. Because the ERM algorithm is usually very time-consuming and overfitting may happen when the complexity of the hypothesis space is high, as an application of our main results we also explore a new strategy to implement the ERM algorithm in high complexity hypothesis space.
TL;DR: In this paper, the authors define the category of B-branes in a (not necessarily affine) Landau-Ginzburg B-model, incorporating the notion of R-charge.
Abstract: We define the category of B-branes in a (not necessarily affine) Landau-Ginzburg B-model, incorporating the notion of R-charge. Our definition is a direct generalization of the category of perfect complexes. We then consider pairs of Landau-Ginzburg B-models that arise as different GIT quotients of a vector space by a one-dimensional torus, and show that for each such pair the two categories of B-branes are quasi-equivalent. In fact we produce a whole set of quasi-equivalences indexed by the integers, and show that the resulting auto-equivalences are all spherical twists.
TL;DR: This work investigates the properties of a linear family of scoring rules that are intended specifically for quantile assessment (including the assessment of multiple quantiles) and can be related to a realistic decision-making problem.
Abstract: Quantile assessments are commonly encountered in the elicitation of probability distributions in decision analysis, forecasting, and risk analysis. Scoring rules have been developed to provide ex ante incentives for careful and truthful assessments and ex post evaluation measures in the context of probability assessment. We show that these scoring rules designed for probability assessment provide inappropriate incentives if used for quantile assessment. We investigate the properties of a linear family of scoring rules that are intended specifically for quantile assessment (including the assessment of multiple quantiles) and can be related to a realistic decision-making problem. These rules provide proper incentives for quantile assessment and yield higher expected scores for distributions that are more informative in the sense of having less dispersion. We discuss the special case of interval forecasts and a generalization involving transformations, and we briefly mention other possible extensions.
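A well-known piecewise-linear rule for quantile assessment is the pinball (check) loss, sketched below in loss form, where lower is better. Treating it as a representative of the linear family discussed in the abstract is an assumption of this example, and the function names are hypothetical.

```python
def pinball_loss(q, x, alpha):
    """Piecewise-linear loss for reporting q as the alpha-quantile when x is observed.

    Under-prediction (x >= q) is charged at rate alpha and over-prediction
    (x < q) at rate 1 - alpha, so the expected loss is minimized by reporting
    a true alpha-quantile of the assessor's distribution.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return alpha * (x - q) if x >= q else (1.0 - alpha) * (q - x)


def interval_loss(lower, upper, x, alpha_low, alpha_high):
    """Loss for a pair of quantiles (an interval forecast): sum of the two pinball losses."""
    return pinball_loss(lower, x, alpha_low) + pinball_loss(upper, x, alpha_high)


print(pinball_loss(10.0, 12.0, 0.25))   # outcome above the reported quantile
print(pinball_loss(10.0, 8.0, 0.25))    # outcome below the reported quantile
print(interval_loss(5.0, 15.0, 12.0, 0.25, 0.75))
```

Summing the losses of several quantiles, as `interval_loss` does, is one simple way to assess multiple quantiles at once.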
TL;DR: The dressing method is implemented for a novel integrable generalization of the nonlinear Schrödinger equation, and a by-product of the analysis is a simplification of the formulas for the N-solitons of the derivative nonlinear Schrödinger equation given by Huang and Chen.
Abstract: We implement the dressing method for a novel integrable generalization of the nonlinear Schrödinger equation. As an application, explicit formulas for the $N$-soliton solutions are derived. As a by-product of the analysis, we find a simplification of the formulas for the $N$-solitons of the derivative nonlinear Schrödinger equation given by Huang and Chen.
TL;DR: A projective generalization of expected utility along the lines of the quantum-mechanical generalization of probability theory is introduced, which accommodates the dominant paradoxes, while retaining significant simplicity and tractability.
TL;DR: This work asymptotically determines the size of the largest family $\cal F$ of subsets of $\{1,\dots,n\}$ not containing a given poset $P$ if the Hasse diagram of $P$ is a tree.
Abstract: We asymptotically determine the size of the largest family $\cal F$ of subsets of $\{1,\dots,n\}$ not containing a given poset $P$ if the Hasse diagram of $P$ is a tree. This is a qualitative generalization of several known results including Sperner's theorem.
TL;DR: This paper provides a formal link between maximum likelihood and score matching and develops a generalization of score matching, which shows that score matching finds model parameters that are more robust with noisy training data.
Abstract: Score matching is a recently developed parameter learning method that is particularly effective for complicated high dimensional density models with intractable partition functions. In this paper, we study two issues that have not been completely resolved for score matching. First, we provide a formal link between maximum likelihood and score matching. Our analysis shows that score matching finds model parameters that are more robust with noisy training data. Second, we develop a generalization of score matching. Based on this generalization, we further demonstrate an extension of score matching to models of discrete data.
TL;DR: In this paper, the authors presented an efficient generalization of the $k$-space interpolation scheme for electronic structure presented by Shirley, which reduced the number of required initial electronic-structure calculations, enabling accurate interpolation over the entire Brillouin zone.
Abstract: We present an efficient generalization of the $k$-space interpolation scheme for electronic structure presented by Shirley [Phys. Rev. B 54, 16464 (1996)]. The method permits the construction of a compact $k$-dependent Hamiltonian using a numerically optimal basis derived from a coarse-grained set of effective single-particle electronic-structure calculations (based on density-functional theory in this work). We provide some generalizations of the initial approach which reduce the number of required initial electronic-structure calculations, enabling accurate interpolation over the entire Brillouin zone based on calculations at the zone center only for large systems. We also generalize the representation of nonlocal Hamiltonians, leading to a more efficient implementation which permits the use of both norm-conserving and ultrasoft pseudopotentials in the input calculations. Numerically interpolated electronic eigenvalues with accuracy that is within 0.01 eV can be produced at very little computational cost. Furthermore, accurate eigenfunctions---expressed in the optimal basis---provide easy access to useful matrix elements for simulating spectroscopy and we provide details for computing optical transition amplitudes. The approach is also applicable to other theoretical frameworks such as the Dyson equation for quasiparticle excitations or the Bethe-Salpeter equation for optical responses.
TL;DR: In this article, the authors considered the case of a general perturbation, for r large enough, of an a priori unstable Hamiltonian system of 2 + 1/2 degrees of freedom, and provided explicit conditions on it, which turn out to be generic and are verifiable in concrete examples, which guarantee the existence of Arnold diffusion.
Abstract: In this paper we consider the case of a general perturbation, for r large enough, of an a priori unstable Hamiltonian system of 2 + 1/2 degrees of freedom, and we provide explicit conditions on it, which turn out to be generic and are verifiable in concrete examples, which guarantee the existence of Arnold diffusion. This is a generalization of the result in Delshams et al (2006 Mem. Am. Math. Soc.) where the case of a perturbation with a finite number of harmonics in the angular variables was considered. The method of proof is based on a careful analysis of the geography of resonances created by a generic perturbation and it contains a deep quantitative description of the invariant objects generated by the resonances therein. The scattering map is used as an essential tool to construct transition chains of objects of different topology. The combination of quantitative expressions for both the geography of resonances and the scattering map provides, in a natural way, explicit computable conditions for instability.
TL;DR: In this paper, the authors present a rational model that transparently identifies the inductive biases that a process model should seek to capture, and they find that it explains several phenomena, including knowledge partitioning and iterated learning data.
Abstract: (A rational model of function learning. Christopher Lucas, UC Berkeley; Thomas Griffiths, UC Berkeley; Michael Kalish, University of Lafayette.) People often face the problem of learning what value a variable will take, given information about the values of other variables. Categorization and causal prediction are special cases, each the subject of extensive research dealing exclusively with discrete variables. With continuous variables, this problem is known as function learning. Most function learning research has been concerned with specifying representations and processes by which people understand the functional relationship between pairs of continuous variables. In contrast, we present a rational model that transparently identifies the inductive biases that a process model should seek to capture. The foundation of our approach is an infinite mixture of Gaussian process experts. It extends our previous Gaussian process model, which outperforms several well-known alternatives and has been shown to be a generalization of both associative and rule-based (i.e., regression-like) function-learning models. We find that it explains several phenomena, including knowledge partitioning and iterated learning data.
TL;DR: In this paper, the representation theory of restricted Lie superalgebras over an algebraically closed field of characteristic p>2 was initiated, and a superalgebra generalization of the celebrated Kac-Weisfeiler Conjecture was formulated, which exhibits a mixture of p-power and 2-power divisibilities of dimensions.
Abstract: We initiate the representation theory of restricted Lie superalgebras over an algebraically closed field of characteristic p>2. A superalgebra generalization of the celebrated Kac-Weisfeiler Conjecture is formulated, which exhibits a mixture of p-power and 2-power divisibilities of dimensions of modules. We establish the Conjecture for basic classical Lie superalgebras.
TL;DR: This paper improves and extends previous results in three ways: the algorithms proposed work in one stage, which saves time for testing, the test complexity is lower than previous results, and the number of elements which need to be tested is sufficiently large.