TL;DR: The Histogram as a Frequency Counting Curve and as a Maximum Likelihood Estimate; Keeping the Kernel Bias the Same and Keeping the Support of the Kernel the Same.
Abstract: I. Density Smoothing.- 1. The Histogram.- 1.0 Introduction.- 1.1 Definitions of the Histogram.- The Histogram as a Frequency Counting Curve.- The Histogram as a Maximum Likelihood Estimate.- Varying the Binwidth.- 1.2 Statistics of the Histogram.- 1.3 The Histogram in S.- 1.4 Smoothing the Histogram by WARPing.- WARPing Algorithm.- WARPing in S.- Exercises.- 2. Kernel Density Estimation.- 2.0 Introduction.- 2.1 Definition of the Kernel Estimate.- Varying the Kernel.- Varying the Bandwidth.- 2.2 Kernel Density Estimation in S.- Direct Algorithm.- Implementation in S.- 2.3 Statistics of the Kernel Density.- Speed of Convergence.- Confidence Intervals and Confidence Bands.- 2.4 Approximating Kernel Estimates by WARPing.- 2.5 Comparison of Computational Costs.- 2.6 Comparison of Smoothers Between Laboratories.- Keeping the Kernel Bias the Same.- Keeping the Support of the Kernel the Same.- Canonical Kernels.- 2.7 Optimizing the Kernel Density.- 2.8 Kernels of Higher Order.- 2.9 Multivariate Kernel Density Estimation.- Same Bandwidth in Each Component.- Nonequal Bandwidths in Each Component.- A Matrix of Bandwidths.- Exercises.- 3. Further Density Estimators.- 3.0 Introduction.- 3.1 Orthogonal Series Estimators.- 3.2 Maximum Penalized Likelihood Estimators.- Exercises.- 4. Bandwidth Selection in Practice.- 4.0 Introduction.- 4.1 Kernel Estimation Using Reference Distributions.- 4.2 Plug-In Methods.- 4.3 Cross-Validation.- 4.3.1 Maximum Likelihood Cross-Validation.- Direct Algorithm.- 4.3.2 Least-Squares Cross-Validation.- Direct Algorithm.- 4.3.3 Biased Cross-Validation.- Algorithm.- 4.4 Cross-Validation for WARPing Density Estimation.- 4.4.1 Maximum Likelihood Cross-Validation.- 4.4.2 Least-Squares Cross-Validation.- Algorithm.- Implementation in S.- 4.4.3 Biased Cross-Validation.- Algorithm.- Implementation in S.- Exercises.- II. Regression Smoothing.- 5. Nonparametric Regression.- 5.0 Introduction.- 5.1 Kernel Regression Smoothing.- 5.1.1 The Nadaraya-Watson Estimator.- Direct Algorithm.- Implementation in S.- 5.1.2 Statistics of the Nadaraya-Watson Estimator.- 5.1.3 Confidence Intervals.- 5.1.4 Fixed Design Model.- 5.1.5 The WARPing Approximation.- Basic Algorithm.- Implementation in S.- 5.2 k-Nearest Neighbor (k-NN).- 5.2.1 Definition of the k-NN Estimate.- 5.2.2 Statistics of the k-NN Estimate.- 5.3 Spline Smoothing.- Exercises.- 6. Bandwidth Selection.- 6.0 Introduction.- 6.1 Estimates of the Averaged Squared Error.- 6.1.0 Introduction.- 6.1.1 Penalizing Functions.- 6.1.2 Cross-Validation.- Direct Algorithm.- 6.2 Bandwidth Selection with WARPing.- Penalizing Functions.- Cross-Validation.- Basic Algorithm.- Implementation in S.- Applications.- Exercises.- 7. Simultaneous Error Bars.- 7.1 Golden Section Bootstrap.- Algorithm for Golden Section Bootstrapping.- Implementation in S.- 7.2 Construction of Confidence Intervals.- Exercises.- Tables.- Solutions.- List of Used S Commands.- Symbols and Notation.- References.
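The WARPing idea listed in this outline (Weighted Averaging of Rounded Points) can be sketched as follows. This is a minimal illustration under stated assumptions, not the book's S implementation: the data are rounded into small bins of width delta, and the bin counts are then smoothed with kernel weights K(j/M), which gives a kernel estimate with bandwidth h = M * delta evaluated at the bin centres. The quartic kernel, the `origin` parameter and the name `warp_density` are illustrative choices, not taken from the text.

```python
import math


def quartic(u):
    """Quartic (biweight) kernel, supported on [-1, 1]."""
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0


def warp_density(data, delta, M, kernel=quartic, origin=0.0):
    """Binned (WARPing-style) density estimate with bandwidth h = M * delta.

    Returns a dict mapping bin index k to the estimate at the bin centre
    origin + (k + 0.5) * delta.
    """
    n = len(data)
    h = M * delta
    # Step 1: round the data into small bins of width delta (bin counts n_k).
    counts = {}
    for x in data:
        k = math.floor((x - origin) / delta)
        counts[k] = counts.get(k, 0) + 1
    # Step 2: precompute the kernel weights K(j / M) for |j| < M.
    weights = {j: kernel(j / M) for j in range(1 - M, M)}
    # Step 3: weighted averaging of the counts of neighbouring bins.
    estimate = {}
    for k, nk in counts.items():
        for j, w in weights.items():
            estimate[k + j] = estimate.get(k + j, 0.0) + w * nk
    return {k: v / (n * h) for k, v in estimate.items()}
```

Because only the bin counts are smoothed, the cost of the smoothing step depends on the number of non-empty bins and on M rather than on the sample size n.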
TL;DR: This document includes some detailed supplemental derivations used in the bandwidth estimation for the online Kernel Density Estimator proposed in the paper "Multivariate Online Kernel Density Estimation with Gaussian Kernels".
Abstract: This document includes some detailed supplemental derivations used in the bandwidth estimation for the online Kernel Density Estimator which was proposed in the paper "Multivariate Online Kernel Density Estimation with Gaussian Kernels" by authors Matej Kristan, Ale…
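The derivations referred to here are multivariate. As a hedged one-dimensional illustration only, the sketch below computes the AMISE-optimal bandwidth of a Gaussian kernel when the unknown density in the roughness term R(f'') is replaced by a reference Gaussian mixture, using the closed form R(f'') = sum_i sum_j w_i w_j phi^(4)_{s_i^2 + s_j^2}(mu_i - mu_j), where phi^(4)_v is the fourth derivative of the N(0, v) density. The function names are hypothetical, and this is not claimed to be the exact estimator of the oKDE paper.

```python
import math


def gauss_pdf4(d, var):
    """Fourth derivative of the N(0, var) density evaluated at d."""
    phi = math.exp(-d * d / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)
    return phi * (d**4 - 6.0 * d * d * var + 3.0 * var * var) / var**4


def roughness_f2(weights, means, variances):
    """R(f'') = integral of f''(x)^2 dx for a 1-D Gaussian mixture f."""
    total = 0.0
    for wi, mi, vi in zip(weights, means, variances):
        for wj, mj, vj in zip(weights, means, variances):
            # Convolving two Gaussians adds their variances.
            total += wi * wj * gauss_pdf4(mi - mj, vi + vj)
    return total


def amise_bandwidth(n, weights, means, variances):
    """AMISE-optimal Gaussian-kernel bandwidth for a reference mixture."""
    r_k = 1.0 / (2.0 * math.sqrt(math.pi))  # R(K) for the standard Gaussian kernel
    mu2 = 1.0  # second moment of the standard Gaussian kernel
    r_f2 = roughness_f2(weights, means, variances)
    return (r_k / (mu2**2 * r_f2 * n)) ** 0.2
```

For a single N(0, sigma^2) reference this reduces to the familiar normal-reference rule h = (4 / (3n))^{1/5} sigma.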
TL;DR: It is proved that finding a maximum weight spanning forest with restricted tree size is NP-hard, and an approximation algorithm is developed for this problem.
Abstract: We study graph estimation and density estimation in high dimensions, using a family of density estimators based on forest structured undirected graphical models. For density estimation, we do not assume the true distribution corresponds to a forest; rather, we form kernel density estimates of the bivariate and univariate marginals, and apply Kruskal's algorithm to estimate the optimal forest on held-out data. We prove an oracle inequality on the excess risk of the resulting estimator relative to the risk of the best forest. For graph estimation, we consider the problem of estimating forests with restricted tree sizes. We prove that finding a maximum weight spanning forest with restricted tree size is NP-hard, and develop an approximation algorithm for this problem. Viewing the tree size as a complexity parameter, we then select a forest using data splitting, and prove bounds on excess risk and structure selection consistency of the procedure. Experiments with simulated data and microarray data indicate that the methods are a practical alternative to Gaussian graphical models.
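The forest-fitting step described above relies on Kruskal's algorithm. Below is a minimal sketch, assuming the edge weights (for example, mutual information estimates computed from kernel density estimates of the bivariate and univariate marginals) have already been computed. The held-out pruning and the size-restricted variant studied in the paper are not shown, and `max_weight_forest` is a hypothetical name.

```python
def max_weight_forest(num_nodes, edge_weights):
    """Kruskal's algorithm for a maximum-weight spanning forest.

    edge_weights: dict mapping (i, j) pairs to weights, e.g. estimated
    mutual information between variables i and j. Edges with
    non-positive weight are never added, so the result may be a forest.
    """
    parent = list(range(num_nodes))

    def find(u):
        # Union-find root lookup with path halving.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    forest = []
    # Consider edges from heaviest to lightest.
    for (i, j), w in sorted(edge_weights.items(), key=lambda kv: -kv[1]):
        if w <= 0:
            break
        ri, rj = find(i), find(j)
        if ri != rj:  # adding (i, j) does not create a cycle
            parent[ri] = rj
            forest.append((i, j))
    return forest
```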
TL;DR: The central aim of this paper is to provide educators with material that can be used in the classroom to teach statistical estimation methods, goodness of fit analysis and importantly statistical computing in the context of insurance and risk management.
Abstract: This paper presents an analysis of motor vehicle insurance claims relating to vehicle damage and to associated medical expenses. We use univariate severity distributions estimated with parametric and non-parametric methods. The methods are implemented using the statistical package R. Parametric analysis is limited to estimation of normal and lognormal distributions for each of the two claim types. The nonparametric analysis presented involves kernel density estimation. We illustrate the benefits of applying transformations to data prior to employing kernel based methods. We use a log-transformation and an optimal transformation amongst a class of transformations that produces symmetry in the data. The central aim of this paper is to provide educators with material that can be used in the classroom to teach statistical estimation methods, goodness of fit analysis and importantly statistical computing in the context of insurance and risk management. To this end, we have included in the Appendix of this paper all the R code that has been used in the analysis so that readers, both students and educators, can fully explore the techniques described.
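The log-transformation approach mentioned in the abstract can be sketched as follows, in Python rather than the paper's R code, and assuming a Gaussian kernel and a user-supplied bandwidth `h` on the log scale: estimate the density of Y = log X with an ordinary kernel estimate and map it back with the change-of-variables formula f_X(x) = f_Y(log x) / x. The optimal symmetrising transformation studied in the paper is not shown, and the function names are hypothetical.

```python
import math


def gaussian_kde(sample, h):
    """Return a Gaussian kernel density estimate as a callable."""
    n = len(sample)
    c = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def f(y):
        return c * sum(math.exp(-0.5 * ((y - s) / h) ** 2) for s in sample)

    return f


def log_transformed_kde(claims, h):
    """KDE for positive data via Y = log X and back-transformation.

    f_X(x) = f_Y(log x) / x for x > 0, and 0 otherwise.
    """
    f_y = gaussian_kde([math.log(c) for c in claims], h)

    def f_x(x):
        return f_y(math.log(x)) / x if x > 0 else 0.0

    return f_x
```

Because the kernel estimate is computed on the log scale, the back-transformed estimate puts no mass on negative claim sizes.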
TL;DR: In this article, the authors introduce a specific class of product kernels whose order is suitably defined so as to obtain L2 risk formulas whose structure can be compared to their Euclidean counterparts.
TL;DR: DETs empirically exhibit the interpretability, adaptability and feature selection properties of supervised decision trees while incurring slight loss in accuracy over other nonparametric density estimators, suggesting they might be able to avoid the curse of dimensionality if the true density is sparse in dimensions.
Abstract: In this paper we develop density estimation trees (DETs), the natural analog of classification trees and regression trees, for the task of density estimation. We consider the estimation of a joint probability density function of a d-dimensional random vector X and define a piecewise constant estimator structured as a decision tree. The integrated squared error is minimized to learn the tree. We show that the method is nonparametric: under standard conditions of nonparametric density estimation, DETs are shown to be asymptotically consistent. In addition, being decision trees, DETs perform automatic feature selection. They empirically exhibit the interpretability, adaptability and feature selection properties of supervised decision trees while incurring slight loss in accuracy over other nonparametric density estimators. Hence they might be able to avoid the curse of dimensionality if the true density is sparse in dimensions. We believe that density estimation trees provide a new tool for exploratory data analysis with unique capabilities.
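The training criterion described above can be made concrete in one dimension. For a leaf t holding n_t of the N points and having volume V_t, the piecewise-constant estimate n_t / (N V_t) contributes -n_t^2 / (N^2 V_t) to the empirical integrated squared error, so a split is worthwhile when the children's total is smaller than the parent's. The sketch below grows a tree greedily with that rule; the 1-D restriction, the minimum leaf size and the absence of cross-validated pruning are simplifying assumptions, not the paper's full procedure, and the function names are hypothetical.

```python
def leaf_error(n_t, n, volume):
    """Training ISE contribution of a leaf: -(n_t / n)^2 / volume."""
    return -((n_t / n) ** 2) / volume


def build_det(points, lo, hi, n, min_leaf=5):
    """Greedy 1-D density estimation tree on [lo, hi).

    Returns a list of leaves (lo, hi, density). Ties between data points
    are not handled specially.
    """
    pts = sorted(points)
    best = None
    # Candidate split points: midpoints between consecutive sorted points,
    # keeping at least min_leaf points in each child.
    for i in range(min_leaf, len(pts) - min_leaf + 1):
        s = 0.5 * (pts[i - 1] + pts[i])
        if not lo < s < hi:
            continue
        err = leaf_error(i, n, s - lo) + leaf_error(len(pts) - i, n, hi - s)
        if best is None or err < best[0]:
            best = (err, i, s)
    if best is not None and best[0] < leaf_error(len(pts), n, hi - lo):
        _, i, s = best
        return (build_det(pts[:i], lo, s, n, min_leaf)
                + build_det(pts[i:], s, hi, n, min_leaf))
    # No improving split: constant density n_t / (n * volume) on this leaf.
    return [(lo, hi, len(pts) / (n * (hi - lo)))]
```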
TL;DR: In this paper, a histogram-like estimator of a conditional density is presented that uses cross-validation to estimate the histogram probabilities as well as the optimal number and position of the bins.
Abstract: In this paper, we present a histogram-like estimator of a conditional density that uses cross-validation to estimate the histogram probabilities, as well as the optimal number and position of the bins. This estimator is an alternative to kernel density estimators when the dimension of the covariate vector is large. We demonstrate its applicability to estimation of Marginal Structural Model (MSM) parameters in which an initial estimator of the exposure mechanism is needed. MSM estimation based on the proposed density estimator results in less biased estimates, when compared to estimates based on a misspecified parametric model.
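The estimator above is conditional and selects its bins by cross-validation. As a simpler, unconditional illustration of the same idea (a sketch under assumptions, not the paper's algorithm), the number of equal-width bins on [lo, hi] can be chosen by minimising the least-squares cross-validation score J(h) = 2 / ((n - 1) h) - (n + 1) / ((n - 1) h) * sum_j p_j^2, where h is the binwidth and p_j is the fraction of points in bin j. The function names are hypothetical.

```python
def lscv_score(data, m, lo, hi):
    """Least-squares cross-validation score of an m-bin histogram on [lo, hi]."""
    n = len(data)
    h = (hi - lo) / m
    counts = [0] * m
    for x in data:
        j = min(int((x - lo) / h), m - 1)  # put x == hi into the last bin
        counts[j] += 1
    sum_p2 = sum((c / n) ** 2 for c in counts)
    return 2.0 / ((n - 1) * h) - (n + 1) / ((n - 1) * h) * sum_p2


def choose_bins(data, max_bins=50):
    """Number of equal-width bins (1..max_bins) minimising the LSCV score."""
    lo, hi = min(data), max(data)
    return min(range(1, max_bins + 1), key=lambda m: lscv_score(data, m, lo, hi))
```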
TL;DR: This letter employs a squared-loss variant of mutual information as an independence measure and gives its estimation method, and develops an ICA algorithm, named least-squares independent component analysis.
Abstract: Accurately evaluating statistical independence among random variables is a key element of independent component analysis (ICA). In this letter, we employ a squared-loss variant of mutual information as an independence measure and give its estimation method. Our basic idea is to estimate the ratio of probability densities directly without going through density estimation, thereby avoiding the difficult task of density estimation. In this density ratio approach, a natural cross-validation procedure is available for hyperparameter selection. Thus, all tuning parameters such as the kernel width or the regularization parameter can be objectively optimized. This is an advantage over recently developed kernel-based independence measures and is a highly useful property in unsupervised learning problems such as ICA. Based on this novel independence measure, we develop an ICA algorithm, named least-squares independent component analysis.
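The density-ratio idea can be sketched in a few lines. Assuming Gaussian basis functions phi_l(x, y) centred at a subset of the sample pairs, the ratio r(x, y) = p(x, y) / (p(x) p(y)) is fitted as sum_l alpha_l phi_l(x, y) by regularised least squares, alpha = (H + lambda I)^{-1} h, and the squared-loss mutual information is then estimated as h^T alpha / 2 - 1/2. The kernel width `sigma`, the regulariser `lam` and the number of basis functions are fixed here for brevity, whereas the letter selects such parameters by cross-validation; the function names are hypothetical.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def lsmi(xs, ys, sigma=0.5, lam=1e-3, num_basis=20):
    """Least-squares estimate of squared-loss mutual information (SMI)."""
    n = len(xs)
    b = min(num_basis, n)
    # Basis functions are centred at the first b sample pairs (xs[l], ys[l]).
    kx = [[math.exp(-(x - xs[l]) ** 2 / (2 * sigma**2)) for l in range(b)] for x in xs]
    ky = [[math.exp(-(y - ys[l]) ** 2 / (2 * sigma**2)) for l in range(b)] for y in ys]
    # h_l: mean of phi_l over the paired (joint) samples.
    h = [sum(kx[i][l] * ky[i][l] for i in range(n)) / n for l in range(b)]
    # H_lm: mean of phi_l * phi_m over all n^2 unpaired (x_i, y_j) combinations;
    # for product kernels it factorises into an x-part times a y-part.
    H = [[sum(kx[i][l] * kx[i][m] for i in range(n))
          * sum(ky[j][l] * ky[j][m] for j in range(n)) / n**2
          for m in range(b)] for l in range(b)]
    for l in range(b):
        H[l][l] += lam
    alpha = solve(H, h)
    return 0.5 * sum(hl * al for hl, al in zip(h, alpha)) - 0.5
```

Larger values indicate stronger dependence; for independent samples the estimate should be close to zero, up to estimation error.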
TL;DR: In this paper, a root-n consistent estimator of the probability density function of the response variable in a nonparametric regression model is proposed, which has a (uniform) asymptotic normal distribution and is computationally very simple to calculate.
Abstract: The paper introduces a root-n consistent estimator of the probability density function of the response variable in a nonparametric regression model. The proposed estimator is shown to have a (uniform) asymptotic normal distribution, and it is computationally very simple to calculate. A Monte Carlo experiment confirms our theoretical results, and an empirical application demonstrates its usefulness. The results derived in the paper adapt general U-process theory to the inclusion of infinite-dimensional nuisance parameters.
TL;DR: In this paper, a multivariate and multimodal wind distribution (MMWD) model was developed to estimate the wind conditions and design optimal wind farm configurations. However, the model is not suitable for large-scale wind farms due to the non-uniform distribution of wind speed, wind direction and air density.
TL;DR: The self‐consistent estimate is defined as a prior candidate density that precisely reproduces itself and is applied to artificial data generated from various distributions and reaches the theoretical limit for the scaling of the square error with the size of the data set.
Abstract: The estimation of a density profile from experimental data points is a challenging problem, usually tackled by plotting a histogram. Prior assumptions on the nature of the density, from its smoothness to the specification of its form, allow the design of more accurate estimation procedures, such as Maximum Likelihood. Our aim is to construct a procedure that makes no explicit assumptions but still provides an accurate estimate of the density. We introduce the self-consistent estimate: the power spectrum of a candidate density is given, and an estimation procedure is constructed on the assumption, to be released a posteriori, that the candidate is correct. The self-consistent estimate is defined as a prior candidate density that precisely reproduces itself. Our main result is to derive the exact expression of the self-consistent estimate for any given dataset, and to study its properties. Applications of the method require neither priors on the form of the density nor the subjective choice of parameters. A cutoff frequency, akin to a bin size or a kernel bandwidth, emerges naturally from the derivation. We apply the self-consistent estimate to artificial data generated from various distributions and show that it reaches the theoretical limit for the scaling of the square error with the dataset size.
TL;DR: In this article, the authors prove the asymptotic normality of the kernel density estimator in the context of stationary strongly mixing random fields; the proof is based on the Lindeberg method rather than on Bernstein's small-block/large-block technique and the coupling arguments widely used in previous work on nonparametric estimation for spatial processes.
Abstract: We prove the asymptotic normality of the kernel density estimator (introduced by Rosenblatt, Proc Natl Acad Sci USA 42:43–47, 1956 and Parzen, Ann Math Stat 33:1065–1076, 1962) in the context of stationary strongly mixing random fields. Our approach is based on Lindeberg’s method rather than on Bernstein’s small-block-large-block technique and coupling arguments widely used in previous works on nonparametric estimation for spatial processes. Our method allows us to consider only minimal conditions on the bandwidth parameter and provides a simple criterion on the strong mixing coefficients that does not depend on the bandwidth.
TL;DR: Experimental results show that the proposed method outperforms the Nadaraya-Watson estimator in terms of revised mean integrated squared error (RMISE) and is an effective method for estimating the conditional densities.
TL;DR: In this paper, the authors study the problem of estimating density functions with support in [0, 1] from an asymptotic minimax point of view and prove that for very regular density functions or for certain losses, these estimators are not minimax.
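TL;DR: In this article, kernel estimators of a density and of a regression function are studied, together with limits for varying-bandwidth estimators and applications to quantiles, stochastic processes, semi-parametric regression models, diffusion processes and time series.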
Abstract: Introduction.- Kernel Estimator of a Density.- Kernel Estimator of a Regression Function.- Limits for the Varying Bandwidths Estimators.- Nonparametric Estimation of Quantiles.- Nonparametric Estimation for Stochastic Processes.- Estimation in Semi-Parametric Regression Models.- Diffusion Processes.- Applications to Time Series.
TL;DR: In this article, a class of Fourier series-based direct plug-in bandwidth selectors for kernel density estimation is considered, and the proposed bandwidth estimators have a relative convergence rate n^{-1}...
Abstract: A class of Fourier series-based direct plug-in bandwidth selectors for kernel density estimation is considered in this paper. The proposed bandwidth estimators have a relative convergence rate n^{-1}...
TL;DR: In this article, a rescaled generalized Bernstein polynomial is proposed for approximating any continuous function defined on the closed interval [0, Δ]; its coefficients are probabilities of the binomial random variable with parameters (m − 1, x/Δ), which depend on the location x ∈ [0, Δ] where the density estimation is made.
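For orientation, the classical Bernstein-polynomial density estimator (Vitale's construction), rescaled to [0, Δ], already uses binomial weights with parameters (m − 1, x/Δ). The sketch below implements that classical version under the assumption that all observations lie in [0, Δ]; it is not the article's generalized estimator, whose exact form is not given here, and `bernstein_density` is a hypothetical name.

```python
import math


def bernstein_density(data, m, delta):
    """Bernstein-polynomial density estimate on [0, delta] (Vitale-type).

    f_hat(x) = (m / delta) * sum_{k=0}^{m-1} [F_n((k+1) delta / m) - F_n(k delta / m)]
               * C(m-1, k) (x / delta)^k (1 - x / delta)^(m-1-k)
    where F_n is the empirical distribution function. Assumes 0 <= data <= delta.
    """
    n = len(data)
    # Empirical mass of each of the m equal subintervals of [0, delta].
    mass = [0.0] * m
    for v in data:
        k = min(int(v / delta * m), m - 1)
        mass[k] += 1.0 / n

    def f_hat(x):
        if not 0.0 <= x <= delta:
            return 0.0
        t = x / delta
        return (m / delta) * sum(
            mass[k] * math.comb(m - 1, k) * t**k * (1 - t) ** (m - 1 - k)
            for k in range(m))

    return f_hat
```

Larger m yields a rougher estimate, so m plays the role of an inverse bandwidth.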
TL;DR: For finite-order moving average and nonlinear autoregressive processes, a kernel density estimator of a bootstrap series is presented that estimates their marginal densities root-$n$ consistently; this rate equals that of the best known convolution estimators and is faster than that of the standard kernel density estimator.
Abstract: This thesis is concerned with nonparametric techniques for inferring properties of time series. First, we consider finite-order moving average and nonlinear autoregressive processes with no parametric assumption on the innovation distribution, and present a kernel density estimator of a bootstrap series that estimates their marginal densities root-$n$ consistently. This rate is equal to that of the best known convolution estimators, and faster than that of the standard kernel density estimator. We also conduct simulations to check the finite sample properties of our estimator, and the results are generally better than corresponding results for the standard kernel density estimator. Next, given stationary time series data, we study the problem of finding the best linear combination of a set of lag window spectral density estimators with respect to the mean squared risk. We present an aggregation procedure and prove a sharp oracle inequality for its risk. We also provide simulations demonstrating the performance of our aggregation procedure, given Bartlett and other estimators of varying bandwidths as input. This extends work by Rigollet and Tsybakov on aggregation of density estimators. The last part of this thesis introduces a class of robust autocorrelation estimators based on interpreting the sample autocorrelation function as a linear regression. We investigate the efficiency and robustness properties of the estimators that result from plugging in three common robust regression techniques. Construction of robust autocovariance and positive definite autocorrelation estimates is discussed, as well as application of the estimators to AR model fitting. We finish with simulations, which suggest that the estimators are especially well suited for AR model fitting.
TL;DR: In this article, a kernel density estimator whose kernel is adapted to the data rather than fixed is proposed and studied; this naturally leads to an adaptive choice of the smoothing parameters that avoids asymptotic expansions.
TL;DR: In this article, the authors developed a test for log-concavity of multivariate densities using kernel density estimation, where the test statistic is the smallest bandwidth for which the estimate is log-concave.
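The test statistic described here can be approximated numerically. The sketch below is a one-dimensional simplification with a Gaussian kernel. A Gaussian KDE at a larger bandwidth equals the smaller-bandwidth estimate convolved with another Gaussian, and convolution with a log-concave density preserves log-concavity, so log-concavity is monotone in h and bisection applies. Concavity is only checked through second differences on a grid, the calibration of the test (how critical values are obtained) is not shown, and all names are hypothetical.

```python
import math


def log_kde(x, data, h):
    """Log of a 1-D Gaussian KDE at x, computed stably via log-sum-exp."""
    terms = [-0.5 * ((x - d) / h) ** 2 for d in data]
    top = max(terms)
    s = sum(math.exp(t - top) for t in terms)
    return top + math.log(s) - math.log(len(data) * h * math.sqrt(2.0 * math.pi))


def is_log_concave(data, h, grid_size=400, tol=1e-9):
    """Check concavity of log f_h via second differences on a grid."""
    lo, hi = min(data) - 3.0 * h, max(data) + 3.0 * h
    step = (hi - lo) / (grid_size - 1)
    vals = [log_kde(lo + i * step, data, h) for i in range(grid_size)]
    return all(vals[i - 1] - 2.0 * vals[i] + vals[i + 1] <= tol
               for i in range(1, grid_size - 1))


def critical_bandwidth(data, h_lo=1e-3, h_hi=None, iters=40):
    """Smallest h (up to bisection and grid accuracy) with a log-concave KDE."""
    if h_hi is None:
        # For an equal-variance Gaussian mixture, h >= range / 2 already
        # guarantees log-concavity, so this upper end is safe.
        h_hi = max(data) - min(data) + 1.0
    for _ in range(iters):
        mid = 0.5 * (h_lo + h_hi)
        if is_log_concave(data, mid):
            h_hi = mid
        else:
            h_lo = mid
    return h_hi
```

For two observations at -1 and 1 the exact critical bandwidth is 1, which gives a quick sanity check of the numerical approximation.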
TL;DR: The asymptotic normality of the error density estimator and its rate-optimality are investigated, and optimal choices of the first- and second-step bandwidths, used for estimating the regression function and the error density respectively, are proposed.
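Below is a minimal sketch of the two-step construction, assuming a Nadaraya-Watson first step and Gaussian kernels in both steps (choices made here for illustration). The regression function is estimated with bandwidth h1, residuals are formed, and a kernel density estimate with bandwidth h2 is computed from them. The optimal choices of h1 and h2 proposed in the paper are not reproduced, and the function names are hypothetical.

```python
import math


def gauss(u):
    """Standard Gaussian kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nadaraya_watson(x0, xs, ys, h1):
    """Step 1: Nadaraya-Watson estimate of the regression function at x0."""
    w = [gauss((x0 - x) / h1) for x in xs]
    return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)


def error_density(xs, ys, h1, h2):
    """Step 2: Gaussian KDE (bandwidth h2) of the fitted residuals."""
    resid = [y - nadaraya_watson(x, xs, ys, h1) for x, y in zip(xs, ys)]
    n = len(resid)

    def f_hat(e):
        return sum(gauss((e - r) / h2) for r in resid) / (n * h2)

    return f_hat
```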
TL;DR: In this article, the authors considered a general framework to jointly model continuous, count and categorical variables under a nonparametric prior, which is induced through rounding latent variables having an unknown density with respect to Lebesgue measure.
Abstract: Although continuous density estimation has received abundant attention in the Bayesian nonparametrics literature, there is limited theory on multivariate mixed scale density estimation. In this note, we consider a general framework to jointly model continuous, count and categorical variables under a nonparametric prior, which is induced through rounding latent variables having an unknown density with respect to Lebesgue measure. For the proposed class of priors, we provide sufficient conditions for large support, strong consistency and rates of posterior contraction. These conditions allow one to convert sufficient conditions obtained in the setting of multivariate continuous density estimation to the mixed scale case. To illustrate the procedure a rounded multivariate nonparametric mixture of Gaussians is introduced and applied to a crime and communities dataset.
TL;DR: A data-driven estimator is developed that adapts to unknown anisotropic smoothness of the joint density and, whenever the density depends on a smaller number of variables, performs a dimension reduction that implies the corresponding optimal rate of the mean integrated squared error (MISE) convergence.
TL;DR: In this paper, statistics are developed to test for the presence of an asymptotic discontinuity (or infinite density or peakedness) in a probability density at the median.
Abstract: Statistics are developed to test for the presence of an asymptotic discontinuity (or infinite density or peakedness) in a probability density at the median. The approach makes use of work by Knight (1998) on L1 estimation asymptotics in conjunction with non-parametric kernel density estimation methods. The size and power of the tests are assessed, and conditions under which the tests have good performance are explored in simulations. The new methods are applied to stock returns of leading companies across major U.S. industry groups. The results confirm the presence of infinite density at the median, which constitutes significant new empirical evidence about stock return distributions.
TL;DR: In this article, the variable location kernel (VLK) method was used to fit line transect data in order to estimate the density of a biological population, and it improved upon the performance of the classical kernel estimator.
TL;DR: In this article, the authors propose to learn a full Euclidean metric through an expectation-minimization (EM) procedure, which can be seen as an unsupervised counterpart to neighbourhood component analysis (NCA).
Abstract: Kernel density estimation, a.k.a. Parzen windows, is a popular density estimation method, which can be used for outlier detection or clustering. With multivariate data, its performance is heavily reliant on the metric used within the kernel. Most earlier work has focused on learning only the bandwidth of the kernel (i.e., a scalar multiplicative factor). In this paper, we propose to learn a full Euclidean metric through an expectation-minimization (EM) procedure, which can be seen as an unsupervised counterpart to neighbourhood component analysis (NCA). In order to avoid overfitting with a fully nonparametric density estimator in high dimensions, we also consider a semi-parametric Gaussian-Parzen density model, where some of the variables are modelled through a jointly Gaussian density, while others are modelled through Parzen windows. For these two models, EM leads to simple closed-form updates based on matrix inversions and eigenvalue decompositions. We show empirically that our method leads to density estimators with higher test-likelihoods than natural competing methods, and that the metrics may be used within most unsupervised learning techniques that rely on such metrics, such as spectral clustering or manifold learning methods. Finally, we present a stochastic approximation scheme which allows for the use of this method in a large-scale setting.
TL;DR: A method for nonparametric density estimation that exhibits robustness to contamination of the training sample is analyzed, achieving robustness by combining a traditional kernel density estimator (KDE) with ideas from classical M-estimation.
Abstract: We analyze a method for nonparametric density estimation that exhibits robustness to contamination of the training sample. This method achieves robustness by combining a traditional kernel density estimator (KDE) with ideas from classical M-estimation. The KDE based on a Gaussian kernel is interpreted as a sample mean in the associated reproducing kernel Hilbert space (RKHS). This mean is estimated robustly through the use of a robust loss, yielding the so-called robust kernel density estimator (RKDE). This robust sample mean can be found via a kernelized iteratively re-weighted least squares (IRWLS) algorithm. Our contributions are summarized as follows. First, we present a representer theorem for the RKDE, which gives insight into the robustness of the RKDE. Second, we provide necessary and sufficient conditions for kernel IRWLS to converge to the global minimizer, in the Gaussian RKHS, of the objective function defining the RKDE. Third, we characterize and provide a method for computing the influence function associated with the RKDE. Fourth, we illustrate the robustness of the RKDE through experiments on several data sets.
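The kernelized IRWLS iteration can be sketched as follows, assuming a one-dimensional Gaussian kernel and the Huber loss with a user-chosen threshold `a` (both illustrative choices; the paper treats more general robust losses). Each iteration computes the RKHS distance between every feature map Phi(x_i) and the current estimate using only kernel evaluations, turns it into a weight psi(d)/d, and renormalises. Starting from uniform weights corresponds to starting from the ordinary KDE. The function names are hypothetical.

```python
import math


def gauss_kernel(x, y, sigma):
    """Normalised 1-D Gaussian kernel k_sigma(x, y)."""
    return math.exp(-((x - y) ** 2) / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)


def rkde_weights(data, sigma, a, iters=50):
    """Kernelized IRWLS for a robust KDE with Huber loss (threshold a).

    Returns weights w_i >= 0 summing to 1; the RKDE is sum_i w_i k_sigma(., x_i).
    """
    n = len(data)
    K = [[gauss_kernel(xi, xj, sigma) for xj in data] for xi in data]
    w = [1.0 / n] * n  # start from the ordinary KDE
    for _ in range(iters):
        # Squared RKHS distance ||Phi(x_i) - sum_j w_j Phi(x_j)||^2.
        ww = sum(w[j] * w[k] * K[j][k] for j in range(n) for k in range(n))
        dist = []
        for i in range(n):
            d2 = K[i][i] - 2 * sum(w[j] * K[i][j] for j in range(n)) + ww
            dist.append(math.sqrt(max(d2, 0.0)))
        # Huber loss: psi(d) / d = 1 if d <= a else a / d.
        phi = [1.0 if d <= a else a / d for d in dist]
        total = sum(phi)
        w = [p / total for p in phi]
    return w
```

A data-driven threshold, such as the median of the distances computed in the first iteration, is one possible choice for `a`; points far from the bulk of the data then receive weights below 1/n.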
TL;DR: A scheme is developed for estimating state-dependent drift and diffusion coefficients in a stochastic differential equation from time-series data using a maximum likelihood method combined with a concept based on a kernel density estimation.
Abstract: A scheme is developed for estimating state-dependent drift and diffusion coefficients in a stochastic differential equation from time-series data. The scheme does not require specifying parametric forms for the drift and diffusion coefficients in advance. In order to perform the nonparametric estimation, a maximum likelihood method is combined with a concept based on kernel density estimation. In order to deal with discrete observation or sparsity of the time-series data, a local linearization method is employed, which enables fast estimation.
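As a simpler point of comparison (not the paper's maximum-likelihood scheme with local linearization), state-dependent drift and diffusion can also be estimated nonparametrically from kernel-weighted conditional moments of the increments. Here drift(x) is approximated by E[ΔX | X = x] / Δt and the squared diffusion coefficient by E[(ΔX)^2 | X = x] / Δt, using a Gaussian weight of bandwidth h around x. The function name and the choice of weights are assumptions made for illustration.

```python
import math


def kernel_moments(path, dt, x0, h):
    """Kernel-weighted conditional moments of increments at state x0.

    drift(x0)     ~ E[dX | X = x0] / dt
    diffusion(x0) ~ E[dX^2 | X = x0] / dt   (squared diffusion coefficient)
    """
    num1 = num2 = den = 0.0
    for x_now, x_next in zip(path[:-1], path[1:]):
        # Gaussian weight: how close the current state is to x0.
        w = math.exp(-0.5 * ((x_now - x0) / h) ** 2)
        dx = x_next - x_now
        num1 += w * dx
        num2 += w * dx * dx
        den += w
    return num1 / (den * dt), num2 / (den * dt)
```

For an Ornstein-Uhlenbeck path dX = -X dt + dW, the estimated drift at x = 1 should be roughly -1 (slightly biased towards zero by the kernel smoothing), and the squared diffusion coefficient should be close to 1.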
TL;DR: In this article, a density estimator and an estimator of the distribution function in the uniform deconvolution model were constructed based on inversion formulas and kernel estimators of the density of the observations and its derivative.
Abstract: We construct a density estimator and an estimator of the distribution function in the uniform deconvolution model. The estimators are based on inversion formulas and kernel estimators of the density of the observations and its derivative. Initially the inversions yield two different estimators of the density and two estimators of the distribution function. We construct asymptotically optimal convex combinations of these two estimators. We also derive pointwise asymptotic normality of the resulting estimators, the pointwise asymptotic biases and an expansion of the mean integrated squared error of the density estimator. It turns out that the pointwise limit distribution of the density estimator is the same as the pointwise limit distribution of the density estimator introduced by Groeneboom and Jongbloed (Neerlandica, 57, 2003, 136), a kernel smoothed nonparametric maximum likelihood estimator of the distribution function.