TL;DR: This paper concludes with a discussion of the inherent advantages of kernel estimation techniques and systematic errors associated with the estimation of parent distributions.
TL;DR: This article proposes assessing the capability of a process using a nonparametric estimator that is based on a kernel estimate of an integral of a multivariate density and a smoothed bootstrap estimate of the mean squared error of the estimator.
Abstract: The determination of the capability of a stable process with a multivariate quality characteristic using standard methods usually requires the assumption that the quality characteristic of interest follows a multivariate normal distribution. Unfortunately, multivariate normality is a difficult assumption to assess. Further, departures from this assumption can result in erroneous conclusions. In this article, I propose assessing the capability of a process using a nonparametric estimator. This estimator is based on a kernel estimate of an integral of a multivariate density. Bandwidth selection for this method is based on a smoothed bootstrap estimate of the mean squared error of the estimator. I also address the issue of constructing approximate confidence intervals. An example is presented that applies the proposed method to bivariate nonnormal process data. The performance of the resulting estimator is then compared to the sample proportion and a normal parametric estimate in a simulation study.
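As a rough illustration of the kind of estimator this abstract describes, the sketch below integrates a Gaussian product-kernel density estimate over a rectangular specification region, which gives a smoothed counterpart of the sample proportion. The rectangular region, the single shared bandwidth `h`, the gamma test data, and all function names are assumptions made for this example. The article's bandwidth selector (a smoothed bootstrap estimate of the mean squared error) is not reproduced; only the smoothed resampling step it relies on is shown.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def norm_cdf(z):
    """Standard normal CDF, built from math.erf to avoid extra dependencies."""
    return 0.5 * (1.0 + _erf(np.asarray(z, dtype=float) / math.sqrt(2.0)))

def kernel_proportion(data, lower, upper, h):
    """Integral of a Gaussian product-kernel density estimate over the box
    [lower, upper]; data has shape (n, d) and h is a common bandwidth (assumed)."""
    data = np.asarray(data, dtype=float)
    per_dim = norm_cdf((upper - data) / h) - norm_cdf((lower - data) / h)
    return float(per_dim.prod(axis=1).mean())

def smoothed_bootstrap_sample(data, h, rng):
    """One smoothed bootstrap sample: resample rows, then add kernel noise."""
    data = np.asarray(data, dtype=float)
    idx = rng.integers(0, len(data), size=len(data))
    return data[idx] + h * rng.standard_normal(data.shape)

rng = np.random.default_rng(0)
# Hypothetical bivariate, nonnormal process data (independent gamma components).
x = rng.gamma(shape=2.0, scale=1.0, size=(200, 2))
lower, upper = np.array([0.0, 0.0]), np.array([5.0, 5.0])

p_kernel = kernel_proportion(x, lower, upper, h=0.3)
p_sample = float(np.mean(np.all((x >= lower) & (x <= upper), axis=1)))
p_boot = kernel_proportion(smoothed_bootstrap_sample(x, 0.3, rng), lower, upper, h=0.3)
```

Repeating the last line many times and comparing the resampled values with `p_kernel` is one way a bootstrap estimate of the estimator's mean squared error could be formed for a candidate bandwidth. As `h` shrinks toward zero, `kernel_proportion` reduces to the plain sample proportion.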
TL;DR: In this article, the authors discuss a number of issues in the smoothed nonparametric estimation of kernel conditional probability density functions for stationary processes, and point out the different implications of leading choices of bandwidths in numerator and denominator for the ability of the estimate to integrate to one and to have finite moments.
Abstract: We discuss a number of issues in the smoothed nonparametric estimation of kernel conditional probability density functions for stationary processes. The kernel conditional density estimate is a ratio of joint and marginal density estimates. We point out the different implications of leading choices of bandwidths in numerator and denominator for the ability of the estimate to integrate to one and to have finite moments. Again bearing in mind different bandwidth possibilities, we discuss asymptotic theory for the estimate: asymptotic bias and variance are calculated under various conditions, an extended discussion of bandwidth choice is included, and a central limit theorem is given.
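The point about bandwidths in the numerator and denominator can be checked numerically. The sketch below, which assumes Gaussian kernels, a simulated AR(1) series, and hypothetical function names, builds the ratio estimate and integrates it over y. When the x-bandwidth is the same in the joint and marginal estimates the result integrates to one; when the two differ, the total mass equals the ratio of the two marginal estimates instead.

```python
import numpy as np

def gauss_kernel(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def marginal_estimate(x0, xs, h):
    """Kernel estimate of the marginal density of X at the point x0."""
    return float(np.mean(gauss_kernel((x0 - xs) / h) / h))

def conditional_density(y_grid, x0, xs, ys, h_y, h_x_num, h_x_den):
    """Ratio estimate of f(y | x0): joint kernel estimate over marginal estimate."""
    wx = gauss_kernel((x0 - xs) / h_x_num) / h_x_num                  # shape (n,)
    wy = gauss_kernel((y_grid[:, None] - ys[None, :]) / h_y) / h_y    # shape (m, n)
    joint = (wy * wx[None, :]).mean(axis=1)
    return joint / marginal_estimate(x0, xs, h_x_den)

rng = np.random.default_rng(1)
# Hypothetical stationary AR(1) series; (X_t, X_{t+1}) pairs play the role of (x, y).
z = np.zeros(501)
for t in range(500):
    z[t + 1] = 0.5 * z[t] + rng.standard_normal()
xs, ys = z[:-1], z[1:]

y_grid = np.linspace(-15.0, 15.0, 6001)
dy = y_grid[1] - y_grid[0]
same = conditional_density(y_grid, 0.0, xs, ys, 0.4, 0.4, 0.4)
diff = conditional_density(y_grid, 0.0, xs, ys, 0.4, 0.4, 0.8)
mass_same = float(same.sum() * dy)   # close to 1
mass_diff = float(diff.sum() * dy)   # ratio of the two marginal estimates
```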
TL;DR: This paper proposes an adaptive and simultaneous estimation procedure for all additive components in additive regression models and proposes a regularization algorithm which guarantees an adaptive solution to the multivariate estimation problem.
Abstract: It is well-known that multivariate curve estimation suffers from the “curse of dimensionality.” However, reasonable estimators are possible, even in several dimensions, under appropriate restrictions on the complexity of the curve. In the present paper we explore how much appropriate wavelet estimators can exploit a typical restriction on the curve such as additivity. We first propose an adaptive and simultaneous estimation procedure for all additive components in additive regression models and discuss rate of convergence results and data-dependent truncation rules for wavelet series estimators. To speed up computation we then introduce a wavelet version of the functional ANOVA algorithm for additive regression models and propose a regularization algorithm which guarantees an adaptive solution to the multivariate estimation problem. Some simulations indicate that wavelet methods nicely complement the existing methodology for nonparametric multivariate curve estimation.
TL;DR: A new class of density estimators, termed look-ahead density estimators, is introduced for performance measures associated with a Markov chain; these estimators empirically give visually superior results relative to more standard estimators, such as kernel density estimators.
Abstract: We introduce a new class of density estimators, termed look-ahead density estimators, for performance measures associated with a Markov chain. Look-ahead density estimators are given for both transient and steady-state quantities. Look-ahead density estimators converge faster (especially in multidimensional problems) and empirically give visually superior results relative to more standard estimators, such as kernel density estimators. Several numerical examples that demonstrate the potential applicability of look-ahead density estimation are given.
TL;DR: A novel computationally efficient adaptive algorithm to accomplish edge detection in multidimensional and color images is presented, based on local, non-parametric kernel density estimation.
Abstract: A novel computationally efficient adaptive algorithm to accomplish edge detection in multidimensional and color images is presented. It is a statistical approach, based on local, non-parametric kernel density estimation. The location of the edge discontinuity coincides with the image density function minimum and it is determined by appropriate resampling of the locally defined probability space. The operator is radially symmetric and can be easily adapted to cope with signals of any dimensionality.
TL;DR: The presentation of data by the histogram compared to the empirical distribution function is illustrated and the more efficient kernel density estimator is introduced and illustrated with real data.
Abstract: The probability density function is useful for summarizing and exploring a set of data. When the parametric form is unknown, then a nonparametric estimator such as the histogram is appropriate. This article illustrates the presentation of data by the histogram compared to the empirical distribution function. The steps required to construct a histogram are described, and data-based methods to accomplish this task are presented. Finally, the more efficient kernel density estimator is introduced and illustrated with real data.
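A minimal sketch of the two estimators this abstract contrasts, using NumPy only. The synthetic two-component sample stands in for the article's real data, and the bin rule (`"fd"`, Freedman-Diaconis) and the normal-reference bandwidth are assumed defaults chosen for illustration, not the data-based methods the article presents.

```python
import numpy as np

def kde(grid, data, h):
    """Gaussian kernel density estimate evaluated on grid."""
    u = (grid[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(data) * h * np.sqrt(2.0 * np.pi))

def rule_of_thumb_bandwidth(data):
    """Silverman's normal-reference rule (an assumed default, not from the article)."""
    sd = data.std(ddof=1)
    q75, q25 = np.percentile(data, [75, 25])
    return 0.9 * min(sd, (q75 - q25) / 1.34) * len(data) ** (-0.2)

rng = np.random.default_rng(2)
# Synthetic bimodal sample used as placeholder data.
data = np.concatenate([rng.normal(-2.0, 0.5, 150), rng.normal(1.0, 1.0, 150)])

# Histogram scaled to a density: bar heights times bin widths sum to one.
heights, edges = np.histogram(data, bins="fd", density=True)

h = rule_of_thumb_bandwidth(data)
grid = np.linspace(data.min() - 5 * h, data.max() + 5 * h, 2001)
density = kde(grid, data, h)
```

The histogram is piecewise constant and depends on both bin width and bin origin, while the kernel estimate is smooth and depends only on the bandwidth `h`.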
TL;DR: In this paper, the authors derived optimal bandwidths for kernel density estimators of functions of observations proposed in Frees (J. Amer. Statist. Assoc. 89 (1994) 517-525).
TL;DR: Results are presented on synthetic and real-world data, which show that the QDE can improve the generalization of the kernel density estimator although its estimate is based on significantly lower-dimensional projection indices of the data.
Abstract: We suggest a nonparametric framework for unsupervised learning of projection models in terms of density estimation on quantized sample spaces. The objective is not to optimally reconstruct the data but instead the quantizer is chosen to optimally reconstruct the density of the data. For the resulting quantizing density estimator (QDE) we present a general method for parameter estimation and model selection. We show how projection sets which correspond to traditional unsupervised methods like vector quantization or PCA appear in the new framework. For a principal component quantizer we present results on synthetic and real-world data, which show that the QDE can improve the generalization of the kernel density estimator although its estimate is based on significantly lower-dimensional projection indices of the data.
TL;DR: This article showed that the cumulative distribution function corresponding to a kernel density estimator with optimal bandwidth lies outside any confidence interval, around the empirical distribution function, with probability tending to 1 as the sample size increases.
TL;DR: In this article, the authors assess the performance of several important univariate density estimation methods focusing on the robustness of the methods to heavy tailed target densities and conclude that the logspline and adaptive kernel methods are superior for fitting heavy-tailed densities.
Abstract: Motivated by finance applications, the objective of this paper is to assess the performance of several important methods for univariate density estimation focusing on the robustness of the methods to heavy tailed target densities. We consider four approaches: a fixed bandwidth kernel estimator, an adaptive bandwidth kernel estimator, the Hermite series (SNP) estimator of Gallant and Nychka, and the logspline estimator of Kooperberg and Stone. We conclude that the logspline and adaptive kernel methods are superior for fitting heavy tailed densities. Evaluation of the convergence rates of the SNP estimator for the family of Student-t densities reveals poor performance, measured by Hellinger error. In contrast, the logspline estimator exhibits good convergence independent of the tail behavior of the target density. These findings are confirmed in a small Monte-Carlo experiment.
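To make the contrast between fixed and adaptive bandwidths concrete, the sketch below implements one common adaptive scheme: a fixed-bandwidth pilot estimate sets a per-observation bandwidth that grows where the pilot density is low, so tail observations are smoothed more heavily. The abstract does not specify which adaptive estimator the paper uses, so the square-root rule (`alpha=0.5`), the pilot bandwidth, and the Student-t test data are all assumptions.

```python
import numpy as np

def gauss_kernel(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def fixed_kde(points, data, h):
    """Fixed-bandwidth Gaussian kernel estimate."""
    return (gauss_kernel((points[:, None] - data[None, :]) / h) / h).mean(axis=1)

def local_bandwidths(data, h, alpha=0.5):
    """Per-observation bandwidths h_i = h * (pilot(x_i) / g) ** (-alpha),
    with g the geometric mean of the pilot density at the data points."""
    pilot = fixed_kde(data, data, h)
    g = np.exp(np.mean(np.log(pilot)))
    return h * (pilot / g) ** (-alpha), pilot

def adaptive_kde(points, data, h_i):
    """Kernel estimate in which observation i carries its own bandwidth h_i."""
    u = (points[:, None] - data[None, :]) / h_i[None, :]
    return (gauss_kernel(u) / h_i[None, :]).mean(axis=1)

rng = np.random.default_rng(3)
data = rng.standard_t(df=3, size=300)       # heavy-tailed placeholder sample
h_i, pilot = local_bandwidths(data, h=0.4)

pad = 10.0 * h_i.max()
grid = np.linspace(data.min() - pad, data.max() + pad, 8001)
density = adaptive_kde(grid, data, h_i)
```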
TL;DR: In this paper, the integrated square error of the kernel estimator with a data-dependent bandwidth was derived and the central limit theorem was established for Jn under some regularity conditions.
TL;DR: In this article, the authors address the question of nonparametric estimation of the asymptotic variance of the kernel density estimator, an unknown quantity dependent on the marginal density of the continuous time process.
TL;DR: It is found that a weighted version of the log likelihood function has desirable robust properties in detecting the order of a general time series process via conditional density estimation.
Abstract: The study focuses on the selection of the order of a general time series process via the conditional density of the latter, a characteristic of which is that it remains constant for every order beyond the true one. Using simulated time series from various nonlinear models we illustrate how this feature can be traced from conditional density estimation. We study whether two statistics derived from the likelihood function can serve as univariate statistics to determine the order of the process. It is found that a weighted version of the log likelihood function has desirable robust properties in detecting the order of the process.
TL;DR: In this article, a relationship between plant density and the probability density function of the squared point-to-plant distance is found when a design-based approach is considered, and the estimation of the probability density function and consequently of plant density is performed using a boundary kernel estimator.
Abstract: A relationship between plant density and the probability density function of the squared point-to-plant distance is found when a design-based approach is considered. The estimation of the probability density function (and consequently of plant density) is performed using a boundary kernel estimator. Accordingly, by means of a simulation study, the performance of the proposed estimator is evaluated with respect to some existing density estimators assuming some patterns of plant populations. Finally, an example from field data is considered.
TL;DR: This paper considers an alternative that uses a local approach to bandwidth selection to not only reduce the bias, but to eliminate it entirely, and so-called zero-bias bandwidths are shown to exist for univariate and multivariate kernel density estimation as well as kernel regression.
Abstract: A great deal of research has focused on improving the bias properties of kernel estimators. One proposal involves removing the restriction of non-negativity on the kernel to construct “higher-order” kernels that eliminate additional terms in the Taylor's series expansion of the bias. This paper considers an alternative that uses a local approach to bandwidth selection to not only reduce the bias, but to eliminate it entirely. These so-called “zero-bias bandwidths” are shown to exist for univariate and multivariate kernel density estimation as well as kernel regression. Implications of the existence of such bandwidths are discussed. An estimation strategy is presented, and the extent of the reduction or elimination of bias in practice is studied through simulation and example.
TL;DR: In this paper, a nonparametric estimate for the logarithmic probability density derivatives is proposed, i.e., an estimate that is stable to observation and based on a piecewise-smooth approximation.
Abstract: In nonparametric signal estimation, there is a need for estimating the logarithmic probability density derivatives. This problem is complex, because the logarithmic density derivative is a function with a singularity: a ratio containing the density in the denominator. Since the density estimate can take values close or even equal to zero, the estimate of the logarithmic derivative becomes unstable. This difficulty is surmounted by constructing a new nonparametric estimate for the logarithmic derivative, i.e., an estimate that is stable to observation and based on a piecewise-smooth approximation. Its properties for dependent observations generated by stationary processes satisfying the strong mixing condition are studied. The rate of convergence of the nonparametric estimate and the principal part of the expansion of the mean-square estimation error are determined.
TL;DR: A new combined parametric and non-parametric approach, applying both the bootstrap and the structural risk-minimization method, is proposed for the estimation of heavy-tailed probability density functions (p.d.f.s) and their mixtures.
Abstract: The paper is devoted to the estimation of heavy-tailed probability density functions (p.d.f.s) and their mixtures. We propose a new combined parametric and non-parametric approach applying both the bootstrap and the structural risk-minimization method. It is illustrated using some relevant mixtures of heavy-tailed p.d.f.s. and its effectiveness is shown by an application to real data arising from Web-traffic characteristics.
TL;DR: This paper presents a simulation study of the behavior of a particular kernel density estimator that is sharp in the sense of the minimax adaptive theory, and applies the estimator successfully to i.i.d. simulated data from different probability densities.
Abstract: We present here a simulation study of the behavior of a particular kernel density estimator. It was previously proven that this nonparametric estimator is sharp in the sense of the minimax adaptive theory, which means that it is equally well performing for very smooth or unsmooth densities. The method selects locally both the bandwidth and the kernel function according to the evaluated smoothness of the underlying density. In this paper we describe the method and apply it successfully to i.i.d. simulated data of different probability densities.
TL;DR: In this article, the authors further developed the theory of vertical density representation (VDR) in the multivariate case and provided a formula for the calculation of the conditional probability density of a random vector when its density value is given.
Abstract: In this paper we further develop the theory of vertical density representation (VDR) in the multivariate case and provide a formula for the calculation of the conditional probability density of a random vector when its density value is given. An application to random vector generation is also given.
TL;DR: In this article, the central limit theorem for the integrated square error of multivariate box-spline density estimators is proved.
Abstract: We prove the central limit theorem for the integrated square error of multivariate box-spline density estimators.
TL;DR: The proposed kernel density estimation model belongs to the class of data-driven approaches, avoids assuming a form for the probability distribution or the dependence, and is suitable for stochastic simulation of hydrology time series.
Abstract: In this paper, a kernel density estimation model based on kernel density estimation theory is established for time series of a single variable. It belongs to the class of data-driven approaches and avoids assuming the form of the probability distribution (normal or pm) or the form of dependence (linear or nonlinear). The model has a clear concept and a simple structure. The model is applied to the stochastic generation of a daily discharge time series at a single station. The results indicate that the suggested model is suitable for stochastic simulation of hydrology time series.
TL;DR: In this article, a kernel estimate of the probability function is constructed from bivariate data when a component is subject to random left-truncation, and its consistency and asymptotic normality are established using a strong approximation result.
Abstract: In this article we construct a kernel estimate of the probability function from bivariate data when a component is subject to random left-truncation. We establish consistency and asymptotic normality of the proposed estimator using a strong approximation result. Simulation studies show that the proposed procedure gives a good estimate of the true density function even when the sample size is moderate.
TL;DR: In this paper algorithms for bandwidth selection and kernel density estimation are proposed for non-negative random variables and are compared with some of the principal solutions in the literature through a simulation study.
Abstract: Kernel-based density estimation algorithms are inefficient in the presence of discontinuities at support endpoints. This is largely because classic kernel density estimators give positive estimates beyond the endpoints. If a nonparametric estimate of a density functional is required in determining the bandwidth, then the problem also affects the bandwidth selection procedure. In this paper algorithms for bandwidth selection and kernel density estimation are proposed for non-negative random variables. Furthermore, the methods we propose are compared with some of the principal solutions in the literature through a simulation study.
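The endpoint problem described in this abstract is easy to reproduce. The sketch below measures how much probability mass a standard Gaussian kernel estimate places below zero for exponential data, then applies the reflection method, a standard textbook correction that folds that mass back onto the non-negative half-line. Reflection is used here only as a familiar example of a boundary correction; it is not claimed to be the algorithm the paper proposes, and the data, bandwidth, and function names are assumptions.

```python
import math
import numpy as np

def gauss_kernel(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def kde(points, data, h):
    """Standard Gaussian kernel estimate; ignores the support boundary."""
    return (gauss_kernel((points[:, None] - data[None, :]) / h) / h).mean(axis=1)

def leaked_mass(data, h):
    """Probability mass the standard Gaussian estimate places below zero."""
    return float(np.mean([0.5 * math.erfc(x / (h * math.sqrt(2.0))) for x in data]))

def reflected_kde(points, data, h):
    """Reflection about 0: fold the mass below zero back onto [0, inf)."""
    points = np.asarray(points, dtype=float)
    est = kde(points, data, h) + kde(-points, data, h)
    return np.where(points >= 0.0, est, 0.0)

def trapezoid(values, step):
    """Trapezoidal rule on an equally spaced grid."""
    return float(step * (values.sum() - 0.5 * (values[0] + values[-1])))

rng = np.random.default_rng(4)
data = rng.exponential(1.0, size=400)   # non-negative placeholder sample
h = 0.3

grid = np.linspace(0.0, data.max() + 10.0 * h, 4001)
step = grid[1] - grid[0]
lost = leaked_mass(data, h)                               # mass outside the support
mass_standard = trapezoid(kde(grid, data, h), step)       # about 1 - lost
mass_reflected = trapezoid(reflected_kde(grid, data, h), step)  # about 1
```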
TL;DR: In this paper, a multivariate linear functional relationship model, where the covariance matrix of the observational errors is not restricted, is considered, and the parameter estimation of this model is discussed.
Abstract: In this paper, a multivariate linear functional relationship model, where the covariance matrix of the observational errors is not restricted, is considered. The parameter estimation of this model is discussed. The estimators are shown to be strongly consistent under some mild conditions on the incidental parameters.
TL;DR: The rate of divergence depends on the qth generalized Renyi dimension of u, and estimators of the dimension spectrum are developed, and strong consistency is established.
Abstract: We consider relations between Renyi's and Hentschel-Procaccia's definitions of generalized dimensions of a probability measure u and give conditions under which the two concepts are equivalent/different. Estimators of the dimension spectrum are developed, and strong consistency is established. Particular cases of our estimators are methods based on the sample correlation integral and box counting. Then we discuss the relation between generalized dimensions and kernel density estimators f. It was shown in Frigyesi and Hossjer (1998) that ∫ f^{1+q} dx diverges with increasing sample size and decreasing bandwidth if the marginal distribution u has a singular part and q > 0. In this paper, we show that the rate of divergence depends on the qth generalized Renyi dimension of u.
TL;DR: In this paper, an asymptotic mean absolute error expression for nonparametric kernel density estimators from right-censored data is developed, which is used to obtain local and global bandwidths that are optimal in the sense that they minimize asymptotic mean absolute error and integrated asymptotic mean absolute error, respectively.
Abstract: In kernel density estimation, a criticism of bandwidth selection techniques which minimize squared error expressions is that they perform poorly when estimating tails of probability density functions. Techniques minimizing absolute error expressions are thought to result in more uniform performance and be potentially superior. An asymptotic mean absolute error expression for nonparametric kernel density estimators from right-censored data is developed here. This expression is used to obtain local and global bandwidths that are optimal in the sense that they minimize asymptotic mean absolute error and integrated asymptotic mean absolute error, respectively. These estimators are illustrated for eight data sets from known distributions. Computer simulation results are discussed, comparing the estimation methods with squared-error-based bandwidth selection for right-censored data.