TL;DR: This work proposes a solution to the problem of whether or not observed features, such as bumps, are “really there” as opposed to being artifacts of the natural sampling variability using the graphical technique of significance in scale space.
Abstract: An important problem in the use of density estimation for data analysis is whether or not observed features, such as bumps, are “really there” as opposed to being artifacts of the natural sampling variability. Here we propose a solution to this problem, in the challenging two-dimensional case, using the graphical technique of significance in scale space. Color and dynamic graphics form an important part of the visualization method.
TL;DR: Two new techniques based on nonparametric estimation of probability densities are introduced which improve on the performance of equivalent robust methods currently employed in computer vision.
Abstract: Two new techniques based on nonparametric estimation of probability densities are introduced which improve on the performance of equivalent robust methods currently employed in computer vision. The first technique draws from the projection pursuit paradigm in statistics, and carries out regression M-estimation with a weak dependence on the accuracy of the scale estimate. The second technique exploits the properties of the multivariate adaptive mean shift, and accomplishes the fusion of uncertain measurements arising from an unknown number of sources. As an example, the two techniques are extensively used in an algorithm for the recovery of multiple structures from heavily corrupted data.
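The mean shift idea mentioned in this abstract can be illustrated with a minimal sketch. This is not the paper's multivariate adaptive procedure: it assumes one-dimensional data, a Gaussian kernel, and a single fixed bandwidth, and the function name `mean_shift_mode` is hypothetical. Each step moves the current point to the kernel-weighted mean of the data, which climbs toward a nearby mode of the kernel density estimate.

```python
import math


def mean_shift_mode(points, start, bandwidth, tol=1e-8, max_iter=500):
    """Climb from `start` to a nearby mode of a Gaussian KDE of 1-D `points`.

    Simplifying assumptions (not from the paper): fixed bandwidth, 1-D data.
    """
    x = start
    for _ in range(max_iter):
        # Gaussian kernel weight of every data point relative to x.
        weights = [math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points]
        total = sum(weights)
        if total == 0.0:
            break  # x is too far from all data for the kernel to see it
        # The mean shift update: move to the weighted mean of the data.
        new_x = sum(w * p for w, p in zip(weights, points)) / total
        if abs(new_x - x) < tol:
            return new_x
        x = new_x
    return x


# Two well-separated clusters; each start converges to its own cluster mode.
points = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
low = mean_shift_mode(points, start=0.5, bandwidth=0.5)
high = mean_shift_mode(points, start=5.5, bandwidth=0.5)
```

Running the procedure from many starting points and grouping the limits is one common way to find an unknown number of modes, which is the role the abstract assigns to mean shift when fusing measurements from an unknown number of sources.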
TL;DR: A new method of kernel density estimation with a varying adaptive window size, based on the so-called intersection of confidence intervals (ICI) rule, is proposed, and the quality of the adaptive density estimate is assessed by means of numerical simulations.
TL;DR: In this paper, the authors show that the minimax risk for estimating the value of the spectral density at frequency zero is infinite regardless of sample size, and that confidence sets are close to being uninformative.
Abstract: Important estimation problems in econometrics like estimating the value of a spectral density at frequency zero, which appears in the econometrics literature in the guises of heteroskedasticity and autocorrelation consistent variance estimation and long run variance estimation, are shown to be "ill-posed" estimation problems. A prototypical result obtained in the paper is that the minimax risk for estimating the value of the spectral density at frequency zero is infinite regardless of sample size, and that confidence sets are close to being uninformative. In this result the maximum risk is over commonly used specifications for the set of feasible data generating processes. The consequences for inference on unit roots and cointegration are discussed. Similar results for persistence estimation and estimation of the long memory parameter are given. All these results are obtained as special cases of a more general theory developed for abstract estimation problems, which readily also allows for the treatment of other ill-posed estimation problems such as, e.g., nonparametric regression or density estimation.
TL;DR: It is proved that the $L_\infty$ convergence to the true density for both the density estimation and random variate generation techniques occurs at a rate $O((\log\log N/N)^{(1-\epsilon)/2})$, where $N$ is the number of data points and $\epsilon$ can be made arbitrarily small for sufficiently smooth target densities, very close to the optimally achievable convergence rate under similar smoothness conditions.
Abstract: In this paper we consider two important topics: density estimation and random variate generation. We present a framework that is easily implemented using the familiar multilayer neural network. First, we develop two new methods for density estimation, a stochastic method and a related deterministic method. Both methods are based on approximating the distribution function, the density being obtained by differentiation. In the second part of the paper, we develop new random number generation methods. Our methods do not suffer from some of the restrictions of existing methods in that they can be used to generate numbers from any density provided that certain smoothness conditions are satisfied. One of our methods is based on an observed inverse relationship between the density estimation process and random number generation. We present two variants of this method, a stochastic and a deterministic version. We propose a second method that is based on a novel control formulation of the problem, where a "controller network" is trained to shape a given density into the desired density. We justify the use of all the methods that we propose by providing theoretical convergence results. In particular, we prove that the $L_\infty$ convergence to the true density for both the density estimation and random variate generation techniques occurs at a rate $O((\log\log N/N)^{(1-\epsilon)/2})$, where $N$ is the number of data points and $\epsilon$ can be made arbitrarily small for sufficiently smooth target densities. This bound is very close to the optimally achievable convergence rate under similar smoothness conditions. Also, for comparison, the $L_2$ root mean square (rms) convergence rate of a positive kernel density estimator is $O(N^{-2/5})$ when the optimal kernel width is used. We present numerical simulations to illustrate the performance of the proposed density estimation and random variate generation methods. In addition, we present an extended introduction and bibliography that serves as an overview and reference for the practitioner.
TL;DR: In this paper, the authors derived rates of uniform strong convergence for kernel density estimators and hazard rate estimators in the presence of right censoring, assuming that the failure times (survival times) form a stationary α-mixing sequence.
Abstract: We derive rates of uniform strong convergence for kernel density estimators and hazard rate estimators in the presence of right censoring. It is assumed that the failure times (survival times) form a stationary α-mixing sequence. Moreover, we show that, by an appropriate choice of the bandwidth, both estimators attain the optimal strong convergence rate known from independent complete samples. The results represent an improvement over those of Cai's paper (cf. Cai (1998b, J. Multivariate Anal., 67, 23–34)).
TL;DR: In this paper, the kernel estimation of the multivariate density of a random field indexed by Z N is investigated and the loss between the estimator and the unknown density is measured by means of mean squared and mean integrated squared errors.
TL;DR: The α → PDE method is based on a direct construction of joint probability densities of known variables and the parameters to be estimated; the posterior densities and best-value estimates for the parameters of interest are then obtained by a straightforward manipulation of these densities.
TL;DR: Results for Volterra kernel estimation with random multi-tone inputs and with random Gaussian input are compared, and kernel estimation with multi-tones is shown to be very accurate and efficient compared to the latter.
Abstract: We consider the problem of frequency domain kernel estimation using random multi-tone (harmonic) excitation for 2nd-order Volterra models. The basic approach is based on least squares minimization of model output error, and results for the Volterra kernel estimations with random multi-tone inputs and random Gaussian input are compared. We show that kernel estimation with multi-tones is very accurate and efficient compared to the latter. As an illustration, the proposed method is applied to a discrete input–output system obtained from the numerical simulation of a representative hydrodynamic system for modeling semiconductor device transport. We also consider the effect of noise in the kernel estimation.
TL;DR: A new method of discriminating between observations using a set of mixed variables is proposed, using Bozdogan’s information-theoretic measure of complexity, ICOMP, to select both the window width of the kernel density estimator and the dimension of the object scores matrix.
Abstract: Linear discriminant analysis is a well-known procedure for discrimination in which the linear predictors define one set of variables and a set of dummy variables representing class membership defines the other. Here we propose a new method of discriminating between observations using a set of mixed (i.e., categorical and/or continuous) variables. This nonparametric discriminant procedure optimally scales the data and estimates the distribution of the object scores using multivariate kernel density estimation. We propose using Bozdogan’s information-theoretic measure of complexity, ICOMP, to select both the window width of the kernel density estimator and the dimension of the object scores matrix.
TL;DR: In this paper, the kernel estimation of probability density functions is considered when ranked-set samples are available, and the properties of the resulting estimators are derived for small and large samples, while performance with respect to the usual simple random sample estimators is investigated.
Abstract: Kernel estimation of probability density functions is considered when ranked-set samples are available. The properties of the resulting estimators are derived for small and large samples, while performance with respect to the usual simple random sample estimators is investigated for a range of probability density models.
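As a rough illustration of the setting in this abstract, the sketch below draws a balanced ranked-set sample and applies an ordinary Gaussian kernel estimator to it. Several things here are assumptions rather than details from the paper: ranking is taken to be perfect (units are ranked by their true values), the bandwidth is fixed by hand, and the function names `ranked_set_sample` and `gaussian_kde` are hypothetical.

```python
import math
import random


def ranked_set_sample(draw, set_size, cycles, rng):
    """Balanced ranked-set sample: in each cycle, for each rank r, draw
    `set_size` units, rank them, and keep only the r-th smallest.

    Assumes perfect ranking, which real applications only approximate.
    """
    sample = []
    for _ in range(cycles):
        for rank in range(set_size):
            candidates = sorted(draw(rng) for _ in range(set_size))
            sample.append(candidates[rank])
    return sample


def gaussian_kde(data, bandwidth):
    """Return a function evaluating the Gaussian kernel density estimate."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data
        )

    return density


rng = random.Random(0)
sample = ranked_set_sample(
    lambda r: r.gauss(0.0, 1.0), set_size=3, cycles=40, rng=rng
)
density = gaussian_kde(sample, bandwidth=0.4)
```

Comparing `density` against the same estimator built from a simple random sample of equal size, over repeated draws, is one way to reproduce the kind of performance comparison the abstract describes.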
TL;DR: This paper proposed a modification of kernel time series regression estimators that improves efficiency when the innovation process is autocorrelated, based on a pre-whitening transformation of the dependent variable that has to be estimated from the data.
Abstract: We propose a modification of kernel time series regression estimators that improves efficiency when the innovation process is autocorrelated. The procedure is based on a pre-whitening transformation of the dependent variable that has to be estimated from the data. We establish the asymptotic distribution of our estimator under weak dependence conditions. It is shown that the proposed estimation procedure is more efficient than the conventional kernel method. We also provide simulation evidence to suggest that gains can be achieved in moderate sized samples.
TL;DR: By considering a multivariate version of Hille's theorem, the technique developed by Chaubey and Sen (Statist. Decisions 14 (1996) 1) is extended to estimate a multivariate survival distribution and its associated density.
TL;DR: This paper compares nonparametric kernel estimates with smoothed histograms as methods for displaying logarithmically transformed dwell-time distributions; kernel estimates have the advantage of being smoothed in a well-specified, carefully controlled manner.
TL;DR: Experimental results demonstrate that multiscale methods can outperform wavelet and kernel based density estimation methods.
Abstract: This paper introduces a new multiscale method for nonparametric piecewise polynomial intensity and density estimation of point processes. Fast, piecewise polynomial, maximum penalized likelihood methods for intensity and density estimation are developed. The recursive partitioning scheme underlying these methods is based on multiscale likelihood factorizations which, unlike conventional wavelet decompositions, are very well suited to applications with point process data. Experimental results demonstrate that multiscale methods can outperform wavelet and kernel based density estimation methods.
TL;DR: In this paper, it was shown that identical data transformations can be used in each case, regardless of whether the data involve censoring. This dramatically simplifies the application of data sharpening to problems involving hazard rate estimation.
Abstract: Data sharpening is a general tool for enhancing the performance of statistical estimators, by altering the data before substituting them into conventional methods. In one of the simplest forms of data sharpening, available for curve estimation, an explicit empirical transformation is used to alter the data. The attraction of this approach is diminished, however, if the formula has to be altered for each different application. For example, one could expect the formula for use in hazard rate estimation to differ from that for straight density estimation, since a hazard rate is a ratio–type functional of a density. This paper shows that, in fact, identical data transformations can be used in each case, regardless of whether the data involve censoring. This dramatically simplifies the application of data sharpening to problems involving hazard rate estimation, and makes data sharpening attractive.
TL;DR: The theory and applications of non-parametric density and regression estimation are described, with emphasis on kernel, nearest neighbor, variable kernel, orthogonal series, smoothing spline, logspline and H-spline methods.
Abstract: Various features of econometric data can be analyzed by a non-parametric approach. This review summarizes some of the most important procedures in curve estimation that have been very useful in the field of econometrics. Specifically, it describes the theory and applications of non-parametric density and regression estimation, with emphasis on kernel, nearest neighbor, variable kernel, orthogonal series, smoothing spline, logspline and H-spline methods.
TL;DR: It is proved that the $L_\infty$ convergence to the true density for both the density estimation and random variate generation techniques occurs at a rate $O((\log\log N/N)^{(1-\epsilon)/2})$, where $N$ is the number of data points and $\epsilon$ can be made arbitrarily small for sufficiently smooth target densities, which is very close to the optimally achievable convergence rate under similar smoothness conditions.
Abstract: In this paper we consider two important topics: density estimation and random variate generation. We will present a framework that is easily implemented using the familiar multilayer neural network. First, we develop two new methods for density estimation, a stochastic method and a related deterministic method. Both methods are based on approximating the distribution function, the density being obtained by differentiation. In the second part of the paper, we develop new random number generation methods. Our methods do not suffer from some of the restrictions of existing methods in that they can be used to generate numbers from any density provided that certain smoothness conditions are satisfied. One of our methods is based on an observed inverse relationship between the density estimation process and random number generation. We present two variants of this method, a stochastic and a deterministic version. We propose a second method that is based on a novel control formulation of the problem, where a "controller network" is trained to shape a given density into the desired density. We justify the use of all the methods that we propose by providing theoretical convergence results. In particular, we prove that the $L_\infty$ convergence to the true density for both the density estimation and random variate generation techniques occurs at a rate $O((\log\log N/N)^{(1-\epsilon)/2})$, where $N$ is the number of data points and $\epsilon$ can be made arbitrarily small for sufficiently smooth target densities. This bound is very close to the optimally achievable convergence rate under similar smoothness conditions. Also, for comparison, the $L_2$ root mean square (rms) convergence rate of a positive kernel density estimator is $O(N^{-2/5})$ when the optimal kernel width is used. We present numerical simulations to illustrate the performance of the proposed density estimation and random variate generation methods. In addition, we present an extended introduction and bibliography that serves as an overview and reference for the practitioner.
TL;DR: In this paper, the results of using two very different density estimation methods, based on kernel smoothing or on the monotone ML method, are compared and shown to be asymptotically equivalent.
TL;DR: An alternative approach that makes no assumptions about the shape, size and volume of the clusters is presented, based on the estimation of the probability density function (pdf).
Abstract: Many techniques have already been suggested for handling and analyzing the large and high-dimensional data sets produced by newly developed gene expression experiments. These techniques include supervised classification and unsupervised agglomerative or hierarchical clustering techniques. Here, we present an alternative approach that makes no assumptions about the shape, size and volume of the clusters. The technique is based on the estimation of the probability density function (pdf). Once the pdf is estimated with the Parzen technique (with the right amount of smoothing), the parameter space is partitioned according to methods inherited from image processing, namely the skeleton by influence zones and the watershed. We show some advantages of this suggested approach.
TL;DR: This paper presents relative stability properties of various nonparametric density estimators (histogram, kernel estimates) and of regression estimators (partitioning, kernel, and nearest neighbor estimates).
Abstract: This paper presents relative stability properties of various nonparametric density estimators (histogram, kernel estimates) and of regression estimators (partitioning, kernel, and nearest neighbor estimates). In density estimation, let $E_n$ denote the $L_1$ error of an estimate calculated from $n$ data, whereas in regression estimation, the $L_2$ error of the estimate is used. Sufficient conditions for $E_n/\mathbb{E}\{E_n\} \to 1$ in probability are provided. If this limit holds, the asymptotic behavior of the random error $E_n$ can be characterized by its expectation $\mathbb{E}\{E_n\}$, and one may apply, for example, the established rate-of-convergence results for $\mathbb{E}\{E_n\}$.
TL;DR: In this paper, parametric and nonparametric approaches to discriminant analysis are compared on real data, with the overall misclassification percentage and computer time as the measures of performance.
Abstract: This article compares the performance of parametric and nonparametric discrimination. After a brief description of the discriminant analysis problem, the parametric and nonparametric approaches are described. Multivariate product Gaussian and polynomial kernels with various data-driven choices of the bandwidth are used as density estimators, and these nonparametric approaches are compared with the classical one on some real data. The overall misclassification percentage and computer time are used as the measures of performance.
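The nonparametric side of this comparison can be sketched as follows: estimate each class density with a product Gaussian kernel and assign a new point to the class with the largest prior-weighted density. This is an illustrative sketch only. The bandwidths are fixed by hand rather than chosen by any of the data-driven rules the abstract refers to, class priors are taken as class proportions, and the names `product_gaussian_kde` and `kernel_classifier` are hypothetical.

```python
import math


def product_gaussian_kde(data, bandwidths):
    """Multivariate KDE whose kernel is a product of 1-D Gaussian kernels,
    with one bandwidth per coordinate."""
    n = len(data)

    def density(x):
        total = 0.0
        for row in data:
            term = 1.0
            for xj, rj, hj in zip(x, row, bandwidths):
                u = (xj - rj) / hj
                term *= math.exp(-0.5 * u * u) / (hj * math.sqrt(2.0 * math.pi))
            total += term
        return total / n

    return density


def kernel_classifier(classes, bandwidths):
    """Build a classifier from {label: list of points}.

    Priors are the class proportions (an assumption of this sketch).
    """
    total = sum(len(rows) for rows in classes.values())
    scored = {
        label: (len(rows) / total, product_gaussian_kde(rows, bandwidths))
        for label, rows in classes.items()
    }

    def classify(x):
        # Pick the class maximizing prior * estimated density at x.
        return max(scored, key=lambda lab: scored[lab][0] * scored[lab][1](x))

    return classify


classes = {
    "a": [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0)],
    "b": [(3.0, 3.1), (2.9, 2.8), (3.2, 3.0)],
}
classify = kernel_classifier(classes, bandwidths=(0.5, 0.5))
label_near_a = classify((0.1, 0.2))
label_near_b = classify((2.8, 3.1))
```

Counting how often `classify` disagrees with the true labels on held-out data gives the overall misclassification percentage used as a performance measure in the abstract.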
TL;DR: In this article, the authors investigate problems in teaching the probability density function by gradually refining a histogram, and introduce the density curve concept as an alternative approach to mitigate students' difficulty in studying probability density functions.
Abstract: Context and intuitive understanding are very important in statistics education. In particular, there is a need to mitigate students' difficulty in studying the probability density function. One method of teaching this concept is to use the relative frequency histogram, but this method has several problems that should be recognized. This study investigates problems in teaching the probability density function by gradually refining a histogram. As an alternative approach, it introduces the density curve concept. Four methods of teaching the concept of the probability density function are applied, and the survey results are analyzed.
TL;DR: In this paper, the idea of estimating population curves, like the density or the regression function, is studied from a nonparametric viewpoint, and some ideas about the important problem of smoothing parameter selection are also presented.
Abstract: Some ideas about how basic aspects of nonparametric curve estimation can be introduced to students at a post-secondary level are discussed here. The idea of estimating population curves, like the density or the regression function, is studied from a nonparametric viewpoint. Starting from well-known estimators such as the histogram or the regressogram, the discussion then moves to some of the smoothing methods developed in the last four decades, mainly focusing on the kernel density and regression estimators. Some ideas about the important problem of smoothing parameter selection are also presented.
TL;DR: In this article, a developed digital algorithm is used to obtain analytical expressions characterizing the error due to the bias and variance in a proposed estimate of a probability distribution density, and the results of experimental investigations demonstrate the efficiency of the proposed algorithm.
Abstract: A developed digital algorithm is used to obtain analytical expressions characterizing the error due to the bias and variance in a proposed estimate of a probability distribution density. The results of experimental investigations are presented which demonstrate the efficiency of the proposed algorithm.
TL;DR: In this article, two improved methods for conditional density estimation are proposed; the first is based on locally fitting a log-linear model and is in the spirit of recent work on locally parametric techniques in estimation.
Abstract: We suggest two improved methods for conditional density estimation. The first is based on locally fitting a log-linear model, and is in the spirit of recent work on locally parametric techniques in...
TL;DR: In this paper, the bandwidth is assigned a prior distribution in the neighborhood around the point at which the density is being estimated, and the mean of the posterior distribution is used to select the local bandwidth.
Abstract: In data driven bandwidth selection procedures for density estimation such as least squares cross validation and biased cross validation, the choice of a single global bandwidth is too restrictive. It is however reasonable to assume that the bandwidth has a distribution of its own and that locally, depending on the data, the bandwidth may differ. In this approach, the bandwidth is assigned a prior distribution in the neighborhood around the point at which the density is being estimated. Assuming that the kernel function is a proper probability distribution, a Bayesian approach is employed to come up with a posterior type distribution of the bandwidth given the data. Finally, the mean of the posterior distribution is used to select the local bandwidth.
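The steps in this abstract can be sketched numerically. The sketch below rests on assumptions that the abstract does not spell out: the posterior-type weight of a bandwidth h at the point x is taken to be proportional to the prior of h times the kernel density estimate at x computed with that h, the kernel is Gaussian, the integral over h is replaced by a sum over an equally spaced grid, and the flat prior in the example is purely illustrative. The function name `local_bandwidth` is hypothetical.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, a kernel that is a proper probability density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def local_bandwidth(x, data, grid, prior):
    """Posterior-mean bandwidth at the point x.

    Assumed form (not confirmed by the abstract): weight(h) is proportional
    to prior(h) * f_h(x), where f_h is the KDE with bandwidth h. `grid` must
    be equally spaced so that sums approximate integrals over h.
    """
    weights = []
    for h in grid:
        kde_at_x = sum(gaussian_kernel((x - d) / h) / h for d in data) / len(data)
        weights.append(prior(h) * kde_at_x)
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no bandwidth on the grid gives positive weight at x")
    return sum(h * w for h, w in zip(grid, weights)) / total


data = [-0.2, -0.1, -0.05, 0.0, 0.05, 0.1, 0.2]
grid = [0.1 * k for k in range(1, 21)]  # h = 0.1, 0.2, ..., 2.0


def flat_prior(h):
    return 1.0  # illustrative choice only


h_center = local_bandwidth(0.0, data, grid, flat_prior)
h_tail = local_bandwidth(5.0, data, grid, flat_prior)
```

With these toy data the selected bandwidth is smaller where the observations are dense and larger in the sparse tail, which is the local adaptivity that motivates replacing a single global bandwidth.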