TL;DR: In this paper, a unified framework for establishing consistency and convergence rates for regularized $M$-estimators under high-dimensional scaling was provided, which can be used to re-derive some existing results.
Abstract: High-dimensional statistical inference deals with models in which the number of parameters $p$ is comparable to or larger than the sample size $n$. Since it is usually impossible to obtain consistent procedures unless $p/n\rightarrow0$, a line of recent work has studied models with various types of low-dimensional structure, including sparse vectors, sparse and structured matrices, low-rank matrices and combinations thereof. In such settings, a general approach to estimation is to solve a regularized optimization problem, which combines a loss function measuring how well the model fits the data with some regularization function that encourages the assumed structure. This paper provides a unified framework for establishing consistency and convergence rates for such regularized $M$-estimators under high-dimensional scaling. We state one main theorem and show how it can be used to re-derive some existing results, and also to obtain a number of new results on consistency and convergence rates, in both $\ell_{2}$-error and related norms. Our analysis also identifies two key properties of loss and regularization functions, referred to as restricted strong convexity and decomposability, that ensure corresponding regularized $M$-estimators have fast convergence rates that are optimal in many well-studied cases.
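As a concrete instance of the regularized $M$-estimators described above, the Lasso pairs a least-squares loss with the $\ell_1$ norm, a decomposable regularizer. The sketch below assumes NumPy and fits the Lasso by proximal gradient descent (ISTA); the helper names, the choice $\lambda = 2\sigma\sqrt{2\log p / n}$ and the synthetic data are illustrative assumptions, not the paper's construction.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (coordinate-wise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def lasso_ista(X, y, lam, n_iter=2000):
    """Minimize (1/(2n)) * ||y - X b||_2^2 + lam * ||b||_1 by proximal gradient."""
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n  # Lipschitz constant of the loss gradient
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n  # gradient of the squared-error loss
        b = soft_threshold(b - grad / L, lam / L)
    return b


# Synthetic high-dimensional example: p > n with an s-sparse truth.
rng = np.random.default_rng(0)
n, p, s, sigma = 100, 200, 5, 0.1
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:s] = 1.0
y = X @ beta + sigma * rng.standard_normal(n)
lam = 2 * sigma * np.sqrt(2 * np.log(p) / n)
beta_hat = lasso_ista(X, y, lam)
print("l2 error:", np.linalg.norm(beta_hat - beta))
```

Because the $\ell_1$ norm decomposes across coordinates, its proximal step reduces to the coordinate-wise shrinkage in `soft_threshold`.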
TL;DR: The algorithm can be used to provide an efficient parametric estimation of the quantum state and therefore can be applied as an alternative to full quantum-state tomography given a fault tolerant quantum computer.
Abstract: We provide a new quantum algorithm that efficiently determines the quality of a least-squares fit over an exponentially large data set by building upon an algorithm for solving systems of linear equations efficiently [Harrow et al., Phys. Rev. Lett. 103, 150502 (2009)]. In many cases, our algorithm can also efficiently find a concise function that approximates the data to be fitted and bound the approximation error. In cases where the input data are pure quantum states, the algorithm can be used to provide an efficient parametric estimation of the quantum state and therefore can be applied as an alternative to full quantum-state tomography given a fault tolerant quantum computer.
TL;DR: In this paper, it was shown that minimizing a particular form of regularized reconstruction error yields a reconstruction function that locally characterizes the shape of the data generating density, and this was confirmed in sampling experiments.
Abstract: What do auto-encoders learn about the underlying data generating distribution? Recent work suggests that some auto-encoder variants do a good job of capturing the local manifold structure of data. This paper clarifies some of these previous observations by showing that minimizing a particular form of regularized reconstruction error yields a reconstruction function that locally characterizes the shape of the data generating density. We show that the auto-encoder captures the score (derivative of the log-density with respect to the input). This contradicts previous interpretations of reconstruction error as an energy function. Unlike previous results, the theorems provided here are completely generic and do not depend on the parametrization of the auto-encoder: they show what the auto-encoder would tend to if given enough capacity and examples. These results are for a contractive training criterion that we show to be similar to the denoising auto-encoder training criterion with small corruption noise, but with contraction applied on the whole reconstruction function rather than just the encoder. Similarly to score matching, one can consider the proposed training criterion as a convenient alternative to maximum likelihood because it does not involve a partition function. Finally, we show how an approximate Metropolis-Hastings MCMC can be set up to recover samples from the estimated distribution, and this is confirmed in sampling experiments.
TL;DR: In this article, the authors discuss the connection between spatial point, count, and presence-absence methods and how their parameter estimates and predictions should be interpreted and illustrate that under certain assumptions, each method can be motivated by the same underlying spatial inhomogeneous Poisson point process (IPP) model in which the intensity function is modelled as a log-linear function of covariates.
Abstract: 1. The need to understand the processes shaping population distributions has resulted in a vast increase in the diversity of spatial wildlife data, leading to the development of many novel analytical techniques that are fit-for-purpose. One may aggregate location data into spatial units (e.g. grid cells) and model the resulting counts or presence–absences as a function of environmental covariates. Alternatively, the point data may be modelled directly, by combining the individual observations with a set of random or regular points reflecting habitat availability, a method known as a use-availability design (or, alternatively, a presence–pseudo-absence or case–control design). 2. Although these spatial point, count and presence–absence methods are widely used, the ecological literature is not explicit about their connections and how their parameter estimates and predictions should be interpreted. The objective of this study is to recapitulate some recent statistical results and illustrate that under certain assumptions, each method can be motivated by the same underlying spatial inhomogeneous Poisson point process (IPP) model in which the intensity function is modelled as a log-linear function of covariates. 3. The Poisson likelihood used for count data is a discrete approximation of the IPP likelihood. Similarly, the presence–absence design will approximate the IPP likelihood, but only when spatial units (i.e. pixels) are extremely small (Electronic Journal of Statistics, 2010, 4, 1151–1201). For larger pixel sizes, presence–absence designs do not differentiate between one or multiple observations within each pixel, hence leading to information loss. 4. Logistic regression is often used to estimate the parameters of the IPP model using point data. Although the response variable is defined as 0 for the availability points, these zeros do not serve as true absences as is often assumed; rather, their role is to approximate the integral of the denominator in the IPP likelihood (The Annals of Applied Statistics, 2010, 4, 1383–1402). Because of this common misconception, the estimated exponential function of the linear predictor (i.e. the resource selection function) is often assumed to be proportional to occupancy. Like IPP and count models, this function is proportional to the expected density of observations. 5. Understanding these (dis-)similarities between different species distribution modelling techniques should improve biological interpretation of spatial models and therefore advance ecological and methodological cross-fertilization.
TL;DR: In this article, the authors introduce multiplicative drift analysis as a suitable way to analyze the runtime of randomized search heuristics such as evolutionary algorithms, and give a relatively simple proof for the fact that any linear function is optimized in expected time O(n log n), where n is the length of the bit string.
Abstract: We introduce multiplicative drift analysis as a suitable way to analyze the runtime of randomized search heuristics such as evolutionary algorithms. Our multiplicative version of the classical drift theorem allows easier analyses in the often encountered situation that the optimization progress is roughly proportional to the current distance to the optimum. To display the strength of this tool, we regard the classical problem of how the (1+1) Evolutionary Algorithm optimizes an arbitrary linear pseudo-Boolean function. Here, we first give a relatively simple proof for the fact that any linear function is optimized in expected time $O(n \log n)$, where $n$ is the length of the bit string. Afterwards, we show that in fact any such function is optimized in expected time at most $(1+o(1))\,1.39\,e\,n \ln n$, again using multiplicative drift analysis. We also prove a corresponding lower bound of $(1-o(1))\,e\,n \ln n$ which actually holds for all functions with a unique global optimum. We further demonstrate how our drift theorem immediately gives natural proofs (with better constants) for the best known runtime bounds for the (1+1) Evolutionary Algorithm on combinatorial problems like finding minimum spanning trees, shortest paths, or Euler tours in graphs.
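The (1+1) Evolutionary Algorithm analyzed above is short enough to simulate. The following is a minimal sketch, assuming standard bit-wise mutation with rate $1/n$ and elitist acceptance of equal-or-better offspring; the function name and the OneMax demo (a linear function with all weights equal to 1) are illustrative choices. The empirical mean runtime can be compared against $e\,n \ln n$.

```python
import math
import random


def one_plus_one_ea(weights, rng, max_iters=10**7):
    """(1+1) EA maximizing f(x) = sum_i w_i * x_i with all w_i > 0.

    Returns the number of iterations until the all-ones optimum is found.
    """
    n = len(weights)

    def f(bits):
        return sum(w for w, b in zip(weights, bits) if b)

    x = [rng.randint(0, 1) for _ in range(n)]
    fx = f(x)
    t = 0
    while sum(x) < n and t < max_iters:
        # Standard bit mutation: flip each bit independently with probability 1/n.
        y = [1 - b if rng.random() < 1.0 / n else b for b in x]
        fy = f(y)
        if fy >= fx:  # elitist acceptance
            x, fx = y, fy
        t += 1
    return t


rng = random.Random(1)
n = 50
runtimes = [one_plus_one_ea([1.0] * n, rng) for _ in range(30)]
mean_t = sum(runtimes) / len(runtimes)
print(f"mean runtime {mean_t:.0f} vs e*n*ln(n) = {math.e * n * math.log(n):.0f}")
```

Any positive weight vector can be passed in place of `[1.0] * n`; for such weights the all-ones string is the unique optimum.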
TL;DR: This paper considers the case where each univariate component function $f_j^*$ lies in a reproducing kernel Hilbert space (RKHS), and analyzes a method for estimating the unknown function $f^*$ based on kernels combined with $\ell_1$-type convex regularization, obtaining optimal minimax rates for many interesting classes of sparse additive models.
Abstract: Sparse additive models are families of $d$-variate functions with the additive decomposition $f^* = \sum_{j \in S} f_j^*$, where $S$ is an unknown subset of cardinality $s < d$. In this paper, we consider the case where each univariate component function $f_j^*$ lies in a reproducing kernel Hilbert space (RKHS), and analyze a method for estimating the unknown function $f^*$ based on kernels combined with $\ell_1$-type convex regularization. Working within a high-dimensional framework that allows both the dimension $d$ and sparsity $s$ to increase with $n$, we derive convergence rates in the $L^2(\mathbb{P})$ and $L^2(\mathbb{P}_n)$ norms over the class $\mathcal{F}_{d,s,\mathcal{H}}$ of sparse additive models with each univariate function $f_j^*$ in the unit ball of a univariate RKHS with bounded kernel function. We complement our upper bounds by deriving minimax lower bounds on the $L^2(\mathbb{P})$ error, thereby showing the optimality of our method. Thus, we obtain optimal minimax rates for many interesting classes of sparse additive models, including polynomials, splines, and Sobolev classes. We also show that if, in contrast to our univariate conditions, the $d$-variate function class is assumed to be globally bounded, then much faster estimation rates are possible for any sparsity $s = \Omega(\sqrt{n})$, showing that global boundedness is a significant restriction in the high-dimensional setting.
TL;DR: A finite difference semi-implicit scheme, which has an optimal control formulation, is proposed for the optimal planning problem, together with a strategy based on Newton iterations for solving the resulting system of equations.
Abstract: Mean field games describe the asymptotic behavior of differential games in which the number of players tends to $+\infty$. Here we focus on the optimal planning problem, i.e., the problem in which the positions of a very large number of identical rational agents, with a common value function, evolve from a given initial spatial density to a desired target density at the final horizon time. We propose a finite difference semi-implicit scheme for the optimal planning problem, which has an optimal control formulation. The latter leads to the existence and uniqueness of a solution to the discrete control problem. We also study a penalized version of the semi-implicit scheme. For solving the resulting system of equations, we propose a strategy based on Newton iterations. We describe some numerical experiments.
TL;DR: In this article, the cosmological reconstruction in modified f(R, T) gravity is investigated, where R is the Ricci scalar and T the trace of the stress–energy tensor.
Abstract: We investigate the cosmological reconstruction in modified $f(R, T)$ gravity, where $R$ is the Ricci scalar and $T$ the trace of the stress–energy tensor. Special attention is paid to the case in which the function $f$ is given by $f(R, T) = f_1(R) + f_2(T)$. The use of an auxiliary scalar field is considered with two known examples for the scale factor corresponding to an expanding universe. In the first example, where ordinary matter is usually neglected to obtain the unification of the matter-dominated and accelerated phases in $f(R)$ gravity, it is shown in this paper that this unification can be obtained without neglecting ordinary matter. In the second example, as in $f(R)$ gravity, a model of $f(R, T)$ gravity with a transition from the matter-dominated phase to the accelerated phase is obtained. In both cases, a linear function of the trace is assumed for $f_2(T)$, and $f_1(R)$ is found to be proportional to a power of $R$ with exponents depending on the input parameters.
TL;DR: In this paper, the authors generalize the single-field consistency relations to capture not only the leading term in the squeezed limit, but also the subleading one, going as $1/q^2$.
Abstract: We generalize the single-field consistency relations to capture not only the leading term in the squeezed limit (going as $1/q^3$, where $q$ is the small wavevector) but also the subleading one, going as $1/q^2$. This term, for an $(n+1)$-point function, is fixed in terms of the variation of the $n$-point function under a special conformal transformation; this parallels the fact that the $1/q^3$ term is related to the scale dependence of the $n$-point function. For the squeezed limit of the 3-point function, this conformal consistency relation implies that there are no terms going as $1/q^2$. We verify that the squeezed limit of the 4-point function is related to the conformal variation of the 3-point function both in the case of canonical slow-roll inflation and in models with reduced speed of sound. In the second case the conformal consistency conditions capture, at the level of observables, the relation among operators induced by the non-linear realization of Lorentz invariance in the Lagrangian. These results mean that, in any single-field model, primordial correlation functions of ζ are endowed with an SO(4,1) symmetry, with dilations and special conformal transformations non-linearly realized by ζ. We also verify the conformal consistency relations for any $n$-point function in models with a modulation of the inflaton potential, where the scale dependence is not negligible. Finally, we generalize (some of) the consistency relations involving tensors and soft internal momenta.
TL;DR: Although little known, it is possible to construct an expansion of the objective function in its original complex variables, rather than redefining it as a function of the real and imaginary parts of its complex argument.
Abstract: Nonlinear optimization problems in complex variables are frequently encountered in applied mathematics and engineering applications such as control theory, signal processing, and electrical engineering. Optimization of these problems often requires a first- or second-order approximation of the objective function to generate a new step or descent direction. However, such methods cannot be applied to real functions of complex variables because they are necessarily nonanalytic in their argument, i.e., the Taylor series expansion in their argument alone does not exist. To overcome this problem, the objective function is usually redefined as a function of the real and imaginary parts of its complex argument so that standard optimization methods can be applied. However, this approach may needlessly disguise any inherent structure present in the derivatives of such complex problems. Although little known, it is possible to construct an expansion of the objective function in its original complex variables by noti...
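The abstract is truncated before it says how the expansion is built. One standard way to differentiate a real function of complex variables without splitting it into real and imaginary parts is Wirtinger calculus, which treats $z$ and $\bar{z}$ as independent variables; whether this matches the paper's construction is an assumption. For $f(z) = \|Az - b\|_2^2$, the derivative with respect to $\bar{z}$ is $A^H(Az - b)$, which gives a descent direction directly in complex arithmetic, as in this NumPy sketch (the function name is hypothetical):

```python
import numpy as np


def wirtinger_descent_lstsq(A, b, n_iter=2000):
    """Minimize f(z) = ||A z - b||_2^2 over complex z, stepping along
    -df/d(conj z) = -A^H (A z - b), computed entirely in complex arithmetic."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / largest eigenvalue of A^H A
    z = np.zeros(A.shape[1], dtype=complex)
    for _ in range(n_iter):
        grad_conj = A.conj().T @ (A @ z - b)  # Wirtinger derivative w.r.t. conj(z)
        z = z - step * grad_conj
    return z


rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5)) + 1j * rng.standard_normal((20, 5))
b = rng.standard_normal(20) + 1j * rng.standard_normal(20)
z = wirtinger_descent_lstsq(A, b)
z_ref = np.linalg.lstsq(A, b, rcond=None)[0]  # closed-form least-squares reference
print(np.allclose(z, z_ref))
```

The iteration never forms the real and imaginary parts separately, yet converges to the same minimizer as the standard least-squares solver.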
TL;DR: In this article, a novel gradient algorithm is proposed to learn a policy from an expert's observed behavior assuming that the expert behaves optimally with respect to some unknown reward function of a Markovian Decision Problem.
Abstract: In this paper we propose a novel gradient algorithm to learn a policy from an expert's observed behavior assuming that the expert behaves optimally with respect to some unknown reward function of a Markovian Decision Problem. The algorithm's aim is to find a reward function such that the resulting optimal policy closely matches the expert's observed behavior. The main difficulty is that the mapping from the parameters to policies is both nonsmooth and highly redundant. Resorting to subdifferentials solves the first difficulty, while the second one is overcome by computing natural gradients. We tested the proposed method in two artificial domains and found it to be more reliable and efficient than some previous methods.
TL;DR: The behavior of the Newton-based nonlinear solver as a function of timestep size for different variable sets and for different nonlinear updating strategies is analyzed.
TL;DR: In this paper, a mobile terminal and application sharing method are provided to transmit changed use information to a terminal sharing an application if the use information of a user about an executing application is changed.
Abstract: PURPOSE: A mobile terminal and an application sharing method thereof are provided to transmit changed use information to a terminal sharing an application if the use information of a user about an executing application is changed. CONSTITUTION: A control unit executes an application sharing function (S201). The control unit displays an application list within a terminal on a display screen (S202). If one application on the displayed application list is selected, the control unit outputs a popup window for a sharing type selection on a display unit (S203, S204). The control unit transmits the selected application information (S205).
TL;DR: In this paper, a penalty for choosing the number of change-points in the kernel-based method of Harchaoui and Cappé was proposed.
Abstract: We tackle the change-point problem with data belonging to a general set. We build a penalty for choosing the number of change-points in the kernel-based method of Harchaoui and Cappé (2007). This penalty generalizes the one proposed by Lebarbier (2005) for one-dimensional signals. We prove a non-asymptotic oracle inequality for the proposed method, thanks to a new concentration result for some function of Hilbert-space valued random variables. Experiments on synthetic data illustrate the accuracy of our method, showing that it can detect changes in the whole distribution of data, even when the mean and variance are constant.
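A minimal sketch of a penalized kernel change-point procedure, assuming NumPy, a Gaussian kernel on scalar data, and dynamic programming over the within-segment kernel dispersion (a kernel least-squares criterion). The linear penalty `penalty * D` on the number of segments $D$ is a hypothetical placeholder, not the penalty constructed in the paper, and all function names are illustrative.

```python
import numpy as np


def gaussian_gram(x, bandwidth=1.0):
    """Gaussian-kernel Gram matrix for scalar observations x."""
    d2 = (x[:, None] - x[None, :]) ** 2
    return np.exp(-d2 / (2.0 * bandwidth ** 2))


def segment_costs(K):
    """cost[a, b] = sum over i in [a, b) of ||phi(x_i) - segment mean||^2 in the RKHS."""
    n = K.shape[0]
    S = np.zeros((n + 1, n + 1))
    S[1:, 1:] = K.cumsum(axis=0).cumsum(axis=1)  # S[i, j] = sum of K[:i, :j]
    diag = np.concatenate([[0.0], np.cumsum(np.diag(K))])
    cost = np.full((n + 1, n + 1), np.inf)
    for a in range(n):
        for b in range(a + 1, n + 1):
            block = S[b, b] - S[a, b] - S[b, a] + S[a, a]
            cost[a, b] = (diag[b] - diag[a]) - block / (b - a)
    return cost


def best_segmentations(cost, max_segments):
    """Dynamic programming: best[d, t] = min cost of splitting x[:t] into d segments."""
    n = cost.shape[0] - 1
    best = np.full((max_segments + 1, n + 1), np.inf)
    prev = np.zeros((max_segments + 1, n + 1), dtype=int)
    best[0, 0] = 0.0
    for d in range(1, max_segments + 1):
        for t in range(d, n + 1):
            cand = best[d - 1, :t] + cost[:t, t]
            prev[d, t] = int(np.argmin(cand))
            best[d, t] = cand[prev[d, t]]
    return best, prev


def penalized_change_points(x, max_segments=6, penalty=5.0, bandwidth=1.0):
    """Pick D minimizing best[D, n] + penalty * D (hypothetical penalty), then backtrack."""
    n = len(x)
    best, prev = best_segmentations(segment_costs(gaussian_gram(x, bandwidth)), max_segments)
    crit = [best[d, n] + penalty * d for d in range(1, max_segments + 1)]
    d_hat = 1 + int(np.argmin(crit))
    cps, t = [], n
    for d in range(d_hat, 0, -1):
        t = int(prev[d, t])
        cps.append(t)
    return sorted(c for c in cps if c > 0)


rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 0.3, 60), rng.normal(3.0, 0.3, 60)])
cps = penalized_change_points(x)
print(cps)
```

Because the cost is computed in the kernel feature space, swapping in a characteristic kernel lets the same dynamic program respond to distributional changes beyond a mean shift; the demo uses a mean shift only to keep the output easy to check.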
TL;DR: In this article, a scaling gain is introduced into the output feedback controller; by tuning this gain, the problem of global output feedback stabilization can be solved for a class of upper-triangular systems.
TL;DR: In this article, it was shown that for certain classes of admissible inputs the existence of an ISS-Lyapunov function implies the input-to-state stability of a system.
Abstract: We develop tools for the investigation of input-to-state stability (ISS) of infinite-dimensional control systems. We show that for certain classes of admissible inputs the existence of an ISS-Lyapunov function implies the input-to-state stability of a system. Then, for the case of systems described by abstract equations in Banach spaces, we develop two methods for constructing local and global ISS-Lyapunov functions. We prove a linearization principle that allows the construction of a local ISS-Lyapunov function for a system whose linear approximation is ISS. In order to study interconnections of nonlinear infinite-dimensional systems, we generalize the small-gain theorem to the case of infinite-dimensional systems and provide a way to construct an ISS-Lyapunov function for an entire interconnection, if ISS-Lyapunov functions for subsystems are known and the small-gain condition is satisfied. We illustrate the theory on examples of linear and semilinear reaction-diffusion equations.
TL;DR: This work presents an alternating augmented Lagrangian method for convex optimization problems where the cost function is the sum of two terms, one that is separable in the variable blocks, and a second that is separable in the difference between consecutive variable blocks.
Abstract: We present an alternating augmented Lagrangian method for convex optimization problems where the cost function is the sum of two terms, one that is separable in the variable blocks, and a second that is separable in the difference between consecutive variable blocks. Examples of such problems include Fused Lasso estimation, total variation denoising, and multi-period portfolio optimization with transaction costs. In each iteration of our method, the first step involves separately optimizing over each variable block, which can be carried out in parallel. The second step is not separable in the variables, but can be carried out very efficiently. We apply the algorithm to segmentation of data based on changes in mean ($\ell_1$ mean filtering) or changes in variance ($\ell_1$ variance filtering). In a numerical example, we show that our implementation is around 10000 times faster than the generic optimization solver SDPT3.
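To make the problem class concrete, the sketch below solves 1-D total variation denoising, $\min_x \tfrac12\|y - x\|_2^2 + \lambda\|Dx\|_1$ with $D$ the first-difference operator, using a generic scaled-form ADMM with the splitting $z = Dx$. This is a standard textbook ADMM, not the paper's alternating method or its block-parallel implementation; NumPy, the parameter values, and the synthetic piecewise-constant signal are assumptions.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def tv_denoise_admm(y, lam, rho=1.0, n_iter=500):
    """Minimize 0.5 * ||y - x||^2 + lam * ||D x||_1 with scaled-form ADMM on z = D x."""
    n = len(y)
    D = np.diff(np.eye(n), axis=0)  # (n-1) x n, (D x)_i = x_{i+1} - x_i
    A = np.eye(n) + rho * D.T @ D   # fixed system matrix for the x-update
    z = np.zeros(n - 1)
    u = np.zeros(n - 1)             # scaled dual variable
    for _ in range(n_iter):
        x = np.linalg.solve(A, y + rho * D.T @ (z - u))  # quadratic x-update
        Dx = D @ x
        z = soft_threshold(Dx + u, lam / rho)            # l1 prox on the differences
        u = u + Dx - z                                   # dual update
    return x


rng = np.random.default_rng(0)
truth = np.concatenate([np.zeros(50), 2.0 * np.ones(50)])
y = truth + 0.3 * rng.standard_normal(100)
x_hat = tv_denoise_admm(y, lam=2.0)
mse_noisy = np.mean((y - truth) ** 2)
mse_tv = np.mean((x_hat - truth) ** 2)
print(f"MSE noisy {mse_noisy:.4f}, MSE after TV denoising {mse_tv:.4f}")
```

Factoring `A` once (for example with a banded or Cholesky solver) instead of calling `solve` every iteration is the obvious speed-up; the dense solve keeps the sketch short.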
TL;DR: A new measure to quantify circular-linear associations is introduced that leads to a robust estimate of the slope and phase offset of the regression line, and it provides a correlation coefficient for circular-linear data that is a natural analog of Pearson's product-moment correlation coefficient for linear-linear data.
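Only the TL;DR is available here, so the details below are assumptions: the sketch estimates the slope $a$ by maximizing the mean resultant length of the residual phases $\phi - 2\pi a x$ over a grid, takes the phase offset as the angle of that mean, and then computes a circular-circular correlation between $\phi$ and $2\pi|a|x$. This is one plausible reading of the described measure, not a verified reproduction; NumPy and all names are illustrative.

```python
import numpy as np


def circ_lin_fit(x, phi, slopes):
    """Slope maximizing the mean resultant length of phi - 2*pi*a*x, plus the phase offset."""
    resid = phi[None, :] - 2 * np.pi * slopes[:, None] * x[None, :]
    R = np.abs(np.mean(np.exp(1j * resid), axis=1))
    a = slopes[np.argmax(R)]
    offset = np.angle(np.mean(np.exp(1j * (phi - 2 * np.pi * a * x))))
    return a, offset


def circ_circ_corr(alpha, beta):
    """Circular-circular correlation from sines of deviations about the circular means."""
    sa = np.sin(alpha - np.angle(np.mean(np.exp(1j * alpha))))
    sb = np.sin(beta - np.angle(np.mean(np.exp(1j * beta))))
    return np.sum(sa * sb) / np.sqrt(np.sum(sa ** 2) * np.sum(sb ** 2))


def circ_lin_corr(x, phi, slopes):
    a, offset = circ_lin_fit(x, phi, slopes)
    theta = np.mod(2 * np.pi * abs(a) * x, 2 * np.pi)  # linear variable mapped to a circle
    return circ_circ_corr(phi, theta), a, offset


rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 200)
phi = np.mod(2 * np.pi * 0.3 * x + 1.0 + 0.1 * rng.standard_normal(200), 2 * np.pi)
rho, a, offset = circ_lin_corr(x, phi, np.linspace(-1.0, 1.0, 2001))
print(f"slope {a:.3f}, offset {offset:.2f}, rho {rho:.2f}")
```

Maximizing the resultant length avoids having to unwrap the phases, which is what makes a least-squares line fit awkward for circular responses.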
TL;DR: In this article, the authors present a quantum algorithm and a scalable quantum circuit design which approximate the solution of the Poisson equation on a grid with error $\varepsilon$.
Abstract: The Poisson equation occurs in many areas of science and engineering. Here we focus on its numerical solution for an equation in $d$ dimensions. In particular, we present a quantum algorithm and a scalable quantum circuit design which approximates the solution of the Poisson equation on a grid with error $\varepsilon$. We assume we are given a superposition of function evaluations of the right-hand side of the Poisson equation. The algorithm produces a quantum state encoding the solution. The number of quantum operations and the number of qubits used by the circuit are almost linear in $d$ and polylog in $\varepsilon^{-1}$. We present quantum circuit modules together with performance guarantees which can also be used for other problems.
TL;DR: In this paper, the conditional quantile for each fixed quantile index is modeled as a linear functional of the covariate, and an estimator for the slope function based on the principal component basis is obtained by a plug-in method.
Abstract: This paper studies estimation in functional linear quantile regression in which the dependent variable is scalar while the covariate is a function, and the conditional quantile for each fixed quantile index is modeled as a linear functional of the covariate. Here we suppose that covariates are discretely observed and sampling points may differ across subjects, where the number of measurements per subject increases with the sample size. Also, we allow the quantile index to vary over a given subset of the open unit interval, so the slope function is a function of two variables: (typically) time and quantile index. Likewise, the conditional quantile function is a function of the quantile index and the covariate. We consider an estimator for the slope function based on the principal component basis. An estimator for the conditional quantile function is obtained by a plug-in method. Since the so-constructed plug-in estimator does not necessarily satisfy the monotonicity constraint with respect to the quantile index, we also consider a class of monotonized estimators for the conditional quantile function. We establish rates of convergence for these estimators under suitable norms, showing that these rates are optimal in a minimax sense under some smoothness assumptions on the covariance kernel of the covariate and the slope function. Empirical choice of the cutoff level is studied by using simulations.
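The abstract does not specify the class of monotonized estimators. Two common ways to make estimated quantiles nondecreasing in the quantile index, shown here as assumptions rather than the paper's choice, are monotone rearrangement (sorting the estimates over the quantile grid) and isotonic least-squares projection via pool-adjacent-violators. A NumPy sketch operating on estimates evaluated on an increasing grid of quantile indices:

```python
import numpy as np


def rearrange(q):
    """Monotone rearrangement: sort estimates over an increasing quantile-index grid."""
    return np.sort(np.asarray(q, dtype=float))


def isotonic(q):
    """Least-squares projection onto nondecreasing sequences (pool adjacent violators)."""
    blocks = []  # each block holds [mean, count]
    for v in np.asarray(q, dtype=float):
        blocks.append([v, 1])
        # Merge backwards while the monotonicity constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, c2 = blocks.pop()
            m1, c1 = blocks.pop()
            blocks.append([(m1 * c1 + m2 * c2) / (c1 + c2), c1 + c2])
    return np.concatenate([np.full(c, m) for m, c in blocks])


# Hypothetical plug-in estimates of one conditional quantile curve on a tau grid.
q_hat = np.array([0.10, 0.35, 0.30, 0.60, 0.55, 0.90])
print(rearrange(q_hat))
print(isotonic(q_hat))
```

Rearrangement keeps the same set of values and only reorders them, whereas isotonic projection replaces each violating run by its average.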
TL;DR: In this paper, an improved interpolating moving least-square (IIMLS) method is presented, where the shape function of the IIMLS method satisfies the property of the Kronecker δ function.
Abstract: In this paper, an improved interpolating moving least-square (IIMLS) method is presented. The shape function of the IIMLS method satisfies the property of the Kronecker δ function. The weight function used in the IIMLS method is nonsingular, so the IIMLS method can overcome the difficulties caused by the singularity of the weight function in the IMLS method. The number of unknown coefficients in the trial function of the IIMLS method is smaller than that of the moving least-square (MLS) approximation. By combining the IIMLS method with the Galerkin weak form of the potential problem, the improved interpolating element-free Galerkin (IIEFG) method for two-dimensional potential problems is then presented. Compared with the conventional element-free Galerkin (EFG) method, the IIEFG method can directly apply the essential boundary conditions and, as a result, has higher accuracy. For demonstration, three numerical examples are solved using the IIEFG method.
TL;DR: In this paper, the authors show how to construct garbled RAM programs (GRAM) whose size depends only on a fixed polynomial in the security parameter times the program running time.
Abstract: Assuming solely the existence of one-way functions, we show how to construct Garbled RAM Programs (GRAM) whose size depends only on a fixed polynomial in the security parameter times the program running time. We stress that we avoid converting the RAM programs into circuits. As an example, our techniques imply the first garbled binary search program (searching over sorted encrypted data stored in a cloud) which is poly-logarithmic in the data size instead of linear. Our result requires the existence of one-way functions and enjoys the same non-interactive properties as Yao’s original garbled circuits.
TL;DR: In this paper, the authors introduce two new subclasses of the function class Σ of bi-univalent functions defined in the open unit disc, and find estimates on the coefficients $a_2$ and $a_3$ for functions in these subclasses.
Abstract: In this paper, we introduce two new subclasses of the function class Σ of bi-univalent functions defined in the open unit disc. We find estimates on the coefficients $a_2$ and $a_3$ for functions in these new subclasses.
TL;DR: An approach based on the IG-IFOWA and IFWA (intuitionistic fuzzy weighted averaging) operators is developed to solve MAGDM problems with intuitionistic fuzzy information.
Abstract: With respect to multi-attribute group decision making (MAGDM) problems in which both the attribute weights and the decision makers' (DMs') weights take the form of real numbers, and the attribute values provided by the DMs take the form of intuitionistic fuzzy numbers, a new group decision making method is developed. Some operational laws, the score function and the accuracy function of intuitionistic fuzzy numbers are first introduced. Then a new aggregation operator called the induced generalized intuitionistic fuzzy ordered weighted averaging (IG-IFOWA) operator is proposed, which extends the induced generalized ordered weighted averaging (IGOWA) operator introduced by Merigo and Gil-Lafuente [Merigo, J. M., & Gil-Lafuente, A. M. (2009). The induced generalized OWA operator. Information Sciences, 179, 729-741] to accommodate the environment in which the given arguments are intuitionistic fuzzy sets that are characterized by a membership function and a non-membership function. Some desirable properties of the IG-IFOWA operator are studied, such as commutativity, idempotency, monotonicity and boundary. Then, an approach based on the IG-IFOWA and IFWA (intuitionistic fuzzy weighted averaging) operators is developed to solve MAGDM problems with intuitionistic fuzzy information. Finally, a numerical example is used to illustrate the developed approach.
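The score and accuracy functions and the IFWA operator have commonly used forms for an intuitionistic fuzzy number $\alpha = (\mu, \nu)$: $S(\alpha) = \mu - \nu$, $H(\alpha) = \mu + \nu$, and $\mathrm{IFWA}(\alpha_1, \dots, \alpha_n) = \bigl(1 - \prod_j (1 - \mu_j)^{w_j},\ \prod_j \nu_j^{w_j}\bigr)$ with weights summing to 1. The sketch below assumes these forms; the paper's own definitions may differ, and the IG-IFOWA operator itself is not reproduced. Function names are illustrative.

```python
import math


def score(a):
    """Score function S(a) = mu - nu for an intuitionistic fuzzy number a = (mu, nu)."""
    mu, nu = a
    return mu - nu


def accuracy(a):
    """Accuracy function H(a) = mu + nu."""
    mu, nu = a
    return mu + nu


def ifwa(values, weights):
    """Intuitionistic fuzzy weighted average of (mu, nu) pairs; weights should sum to 1."""
    mu = 1.0 - math.prod((1.0 - m) ** w for (m, _), w in zip(values, weights))
    nu = math.prod(v ** w for (_, v), w in zip(values, weights))
    return (mu, nu)


def rank(alternatives):
    """Best first: higher score wins; equal scores are broken by higher accuracy."""
    return sorted(alternatives, key=lambda a: (score(a), accuracy(a)), reverse=True)


w = [0.2, 0.3, 0.5]
agg = ifwa([(0.6, 0.2), (0.5, 0.3), (0.7, 0.1)], w)
print(agg, score(agg))
print(rank([(0.625, 0.125), (0.75, 0.25), (0.25, 0.5)]))
```

Aggregating identical arguments returns that argument unchanged, which is the idempotency property listed in the abstract.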
TL;DR: In this paper, a stable measure of sparsity s(x) is proposed, which is a sharp lower bound on the sparsity of the unknown signal x. The estimation procedure uses only a small number of linear measurements, does not rely on any sparsity assumptions and requires very little computation.
Abstract: In the theory of compressed sensing (CS), the sparsity $\|x\|_0$ of the unknown signal $x \in \mathbb{R}^p$ is commonly assumed to be a known parameter. However, it is typically unknown in practice. Because many aspects of CS depend on knowing $\|x\|_0$, it is important to estimate this parameter in a data-driven way. A second practical concern is that $\|x\|_0$ is a highly unstable function of $x$. In particular, for real signals with entries not exactly equal to 0, the value $\|x\|_0 = p$ is not a useful description of the effective number of coordinates. In this paper, we propose to estimate a stable measure of sparsity $s(x) := \|x\|_1^2 / \|x\|_2^2$, which is a sharp lower bound on $\|x\|_0$. Our estimation procedure uses only a small number of linear measurements, does not rely on any sparsity assumptions, and requires very little computation. A confidence interval for $s(x)$ is provided, and its width is shown to have no dependence on the signal dimension $p$. Moreover, this result extends naturally to the matrix recovery setting, where a soft version of matrix rank can be estimated with analogous guarantees. Finally, we show that the use of randomized measurements is essential to estimating $s(x)$. This is accomplished by proving that the minimax risk for estimating $s(x)$ with deterministic measurements is large when n<
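The quantity $s(x)$ and the bound $s(x) \le \|x\|_0$ (a consequence of Cauchy–Schwarz on the support of $x$) are easy to check numerically. The estimator below is an assumption about how random measurements can be used, not a transcription of the paper's procedure: with i.i.d. standard Cauchy rows, each measurement $\langle a, x\rangle$ is Cauchy with scale $\|x\|_1$, so the median of $|y|$ estimates $\|x\|_1$; with standard Gaussian rows, the mean of $y^2$ estimates $\|x\|_2^2$. NumPy and the function names are illustrative.

```python
import numpy as np


def numerical_sparsity(x):
    """s(x) = ||x||_1^2 / ||x||_2^2, which never exceeds ||x||_0."""
    x = np.asarray(x, dtype=float)
    return np.sum(np.abs(x)) ** 2 / np.sum(x ** 2)


def estimate_sparsity(x, m, rng):
    """Estimate s(x) from 2*m random linear measurements (assumed sketching scheme)."""
    p = len(x)
    y_cauchy = rng.standard_cauchy((m, p)) @ x  # each entry ~ Cauchy(0, ||x||_1)
    l1_hat = np.median(np.abs(y_cauchy))        # median of |Cauchy(0, g)| equals g
    y_gauss = rng.standard_normal((m, p)) @ x   # each entry ~ N(0, ||x||_2^2)
    l2sq_hat = np.mean(y_gauss ** 2)
    return l1_hat ** 2 / l2sq_hat


x = np.zeros(200)
x[:10] = 1.0                     # 10 equal nonzeros: s(x) = ||x||_0 = 10
x_decay = np.zeros(200)
x_decay[:10] = np.arange(1, 11)  # unequal nonzeros: s(x) < ||x||_0
rng = np.random.default_rng(0)
s_hat = estimate_sparsity(x, 4000, rng)
print(numerical_sparsity(x), numerical_sparsity(x_decay), s_hat)
```

The bound is tight exactly when all nonzero entries share the same magnitude, which is why the first test vector attains $s(x) = \|x\|_0$.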
TL;DR: In this paper, the authors introduce a class of deformations of the values of the Goss zeta function and prove a formula for their value at 1, and some arithmetic properties of values at other positive integers.
Abstract: We introduce a class of deformations of the values of the Goss zeta function. We prove, with the use of the theory of deformations of vectorial modular forms as well as with other techniques, a formula for their value at 1, and some arithmetic properties of values at other positive integers. Our formulas involve Anderson and Thakur’s function ω. We discuss how our formulas may be used to investigate the existence of a kind of functional equation for the Goss zeta function.
TL;DR: In this article, the existence of analytically strong solutions to stochastic partial differential equations (SPDE) with drift given by the subdifferential of a quasi-convex function and with general multiplicative noise is proven.
TL;DR: This work presents a one-parameter family of divergence functions for measuring distances between Hermitian positive-definite matrices, and studies the invariance properties of these divergence functions as well as the matrix means based on them.