TL;DR: This study derives the bandwidth using the least squares cross-validation and contrast methods, and compares the two methods using Monte Carlo simulation and a real-life example.
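As a rough illustration of least squares cross-validation (LSCV), the sketch below evaluates the criterion for a Gaussian kernel, for which the integral of the squared estimate has a closed form, and picks the minimizer over a grid. The Gaussian kernel, the grid, the simulated sample, and the function names are assumptions made for this example, not details taken from the study.

```python
import math
import random

def normal_pdf(u, s):
    """Density of N(0, s^2) evaluated at u."""
    return math.exp(-0.5 * (u / s) ** 2) / (s * math.sqrt(2.0 * math.pi))

def lscv_score(data, h):
    """LSCV criterion: integral of f_hat^2 minus twice the mean leave-one-out density."""
    n = len(data)
    integral_sq = 0.0    # double sum giving the integral of the squared estimate
    leave_one_out = 0.0  # double sum over i != j for the leave-one-out term
    for i in range(n):
        for j in range(n):
            d = data[i] - data[j]
            # Two Gaussian kernels of width h convolve to one of width sqrt(2) * h.
            integral_sq += normal_pdf(d, math.sqrt(2.0) * h)
            if i != j:
                leave_one_out += normal_pdf(d, h)
    return integral_sq / n ** 2 - 2.0 * leave_one_out / (n * (n - 1))

def select_bandwidth(data, grid):
    """Return the grid value with the smallest LSCV score."""
    return min(grid, key=lambda h: lscv_score(data, h))

# Hypothetical data: 100 draws from a standard normal distribution.
random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(100)]
grid = [0.05 * k for k in range(1, 31)]
h_lscv = select_bandwidth(sample, grid)
```

A grid search is used only for simplicity; a numerical optimizer could replace it.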
TL;DR: In this paper, a semiparametric density estimator is proposed under a two-sample density ratio model, which is an extension of the kernel density estimate suggested by Jones for length-biased data.
Abstract: A semiparametric density estimator is proposed under a two-sample density ratio model. This model, arising naturally from case-control studies and logistic discriminant analyses, can also be regarded as a biased sampling model. Our proposed density estimate is therefore an extension of the kernel density estimate suggested by Jones for length-biased data. We show that under the model considered the new density estimator not only is consistent but also has the ‘smallest’ asymptotic variance among general nonparametric density estimators. We also show how to use the new estimate to define a procedure for testing the goodness of fit of the density ratio model. Such a test is consistent under very general alternatives. Finally, we present some results from simulations and from the analysis of two real data sets.
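For orientation, the length-biased kernel estimate attributed to Jones can be sketched as a weighted kernel density estimate in which each observation receives weight proportional to 1/X_i. The code below shows only that baseline, not the two-sample density ratio estimator proposed in the paper; the Gaussian kernel and all names are assumptions for illustration.

```python
import math

def normal_pdf(u, s):
    """Density of N(0, s^2) evaluated at u."""
    return math.exp(-0.5 * (u / s) ** 2) / (s * math.sqrt(2.0 * math.pi))

def length_biased_kde(x, data, h):
    """Weighted KDE for length-biased data; every observation must be positive."""
    inv_sum = sum(1.0 / xi for xi in data)  # equals n / mu_hat
    # Down-weight each observation by 1/X_i, then normalise the weights to sum to 1.
    return sum(normal_pdf(x - xi, h) / xi for xi in data) / inv_sum

# Hypothetical length-biased sample of two observations.
sample = [1.0, 2.0]
value = length_biased_kde(1.0, sample, h=0.5)
```

With a Gaussian kernel some mass falls below zero, so a boundary-aware kernel may be preferable in practice.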
TL;DR: A novel approach that synthesizes the T2 and Q statistics for statistical process condition monitoring is introduced that can be more sensitive to detect abnormal process behaviour than the individual statistics and reduces the number of monitoring charts to be observed.
TL;DR: In this article, the density of a sum of independent random variables is estimated by the convolution of kernel estimators for the marginal densities, and the resulting estimator is shown to be n^{1/2}-consistent and to converge in distribution in the spaces C_0(ℝ) and L_1 to a centered Gaussian process.
Abstract: The density of a sum of independent random variables can be estimated by the convolution of kernel estimators for the marginal densities. We show under mild conditions that the resulting estimator is n^{1/2}-consistent and converges in distribution in the spaces C_0(ℝ) and L_1 to a centered Gaussian process.
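A minimal sketch of the convolution idea, assuming Gaussian kernels: the convolution of two Gaussian kernel estimates is again a Gaussian mixture, centered at the pairwise sums X_i + Y_j with variance h_x^2 + h_y^2. The kernel choice, bandwidths, and sample values below are assumptions made for this example.

```python
import math

def normal_pdf(u, s):
    """Density of N(0, s^2) evaluated at u."""
    return math.exp(-0.5 * (u / s) ** 2) / (s * math.sqrt(2.0 * math.pi))

def sum_density(z, xs, ys, hx, hy):
    """Convolution of two Gaussian KDEs, evaluated at z (estimates the density of X + Y)."""
    s = math.sqrt(hx ** 2 + hy ** 2)  # widths add in quadrature under convolution
    total = sum(normal_pdf(z - x - y, s) for x in xs for y in ys)
    return total / (len(xs) * len(ys))

# Hypothetical samples from the two marginal distributions.
xs = [0.1, -0.4, 0.9]
ys = [2.0, 1.5]
estimate_at_2 = sum_density(2.0, xs, ys, hx=0.3, hy=0.4)
```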
TL;DR: An efficient construction algorithm for obtaining sparse kernel density estimates, based on a regression approach that directly optimizes model generalization capability. The algorithm is fully automatic: the user is not required to specify any criterion to terminate the density construction procedure.
Abstract: This paper presents an efficient construction algorithm for obtaining sparse kernel density estimates based on a regression approach that directly optimizes model generalization capability. Computational efficiency of the density construction is ensured using an orthogonal forward regression, and the algorithm incrementally minimizes the leave-one-out test score. A local regularization method is incorporated naturally into the density construction process to further enforce sparsity. An additional advantage of the proposed algorithm is that it is fully automatic and the user is not required to specify any criterion to terminate the density construction procedure. This is in contrast to an existing state-of-art kernel density estimation method using the support vector machine (SVM), where the user is required to specify some critical algorithm parameter. Several examples are included to demonstrate the ability of the proposed algorithm to effectively construct a very sparse kernel density estimate with comparable accuracy to that of the full sample optimized Parzen window density estimate. Our experimental results also demonstrate that the proposed algorithm compares favorably with the SVM method, in terms of both test accuracy and sparsity, for constructing kernel density estimates.
TL;DR: In this article, an orthogonal forward regression (OFR) algorithm is proposed to obtain sparse kernel density estimates based on a regression approach that directly optimizes model generalization capability, and the algorithm incrementally minimizes the leave-one-out test score.
Abstract: This paper presents an efficient construction algorithm for obtaining sparse kernel density estimates based on a regression approach that directly optimizes model generalization capability. Computational efficiency of the density construction is ensured using an orthogonal forward regression, and the algorithm incrementally minimizes the leave-one-out test score. A local regularization method is incorporated naturally into the density construction process to further enforce sparsity. An additional advantage of the proposed algorithm is that it is fully automatic and the user is not required to specify any criterion to terminate the density construction procedure. This is in contrast to an existing state-of-art kernel density estimation method using the support vector machine (SVM), where the user is required to specify some critical algorithm parameter. Several examples are included to demonstrate the ability of the proposed algorithm to effectively construct a very sparse kernel density estimate with comparable accuracy to that of the full sample optimized Parzen window density estimate. Our experimental results also demonstrate that the proposed algorithm compares favorably with the SVM method, in terms of both test accuracy and sparsity, for constructing kernel density estimates.
TL;DR: The kernel density estimation (KDE) method is chosen as the non-parametric algorithm to extract the PDF and confidence intervals of the training data sets and three case studies show that it is a pragmatic method for dealing with real industrial process data.
TL;DR: In this paper, the problem of estimating the kernel error density function in nonparametric regression models is considered and sufficient conditions are given under which the kernel estimation based on non-parametric residuals is uniformly weakly and strongly consistent.
TL;DR: This chapter has shown that the histogram is more than just a convenient tool for giving a graphical representation of an empirical frequency distribution and that the method of kernel density estimation is in many respects preferable to the histogram.
Abstract: Contrary to the treatment of the histogram in statistics textbooks we have shown that the histogram is more than just a convenient tool for giving a graphical representation of an empirical frequency distribution. It is a serious and widely used method for estimating an unknown pdf. Yet, the histogram has some shortcomings and hopefully this chapter will persuade you that the method of kernel density estimation is in many respects preferable to the histogram.
TL;DR: In this paper, a Markov chain Monte Carlo (MCMC) algorithm is proposed for estimating optimal bandwidth matrices for multivariate kernel density estimation, which is based on treating the elements of the bandwidth matrix as parameters whose posterior density can be obtained through the likelihood cross-validation criterion.
Abstract: Kernel density estimation for multivariate data is an important technique that has a wide range of applications in econometrics and finance. However, it has received significantly less attention than its univariate counterpart. The lower level of interest in multivariate kernel density estimation is mainly due to the increased difficulty in deriving an optimal data-driven bandwidth as the dimension of data increases. We provide Markov chain Monte Carlo (MCMC) algorithms for estimating optimal bandwidth matrices for multivariate kernel density estimation. Our approach is based on treating the elements of the bandwidth matrix as parameters whose posterior density can be obtained through the likelihood cross-validation criterion. Numerical studies for bivariate data show that the MCMC algorithm generally performs better than the plug-in algorithm under the Kullback-Leibler information criterion, and is as good as the plug-in algorithm under the mean integrated squared errors (MISE) criterion. Numerical studies for 5 dimensional data show that our algorithm is superior to the normal reference rule. Our MCMC algorithm is the first data-driven bandwidth selector for kernel density estimation with more than two variables, and the sampling algorithm involves no increased difficulty as the dimension of data increases.
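The likelihood cross-validation criterion mentioned above can be sketched as a leave-one-out log-likelihood of the data under the kernel estimate. The example below assumes a product Gaussian kernel with one bandwidth per coordinate (a diagonal bandwidth matrix) and does not implement the MCMC sampler or any prior; the function names and data are hypothetical.

```python
import math

def normal_pdf(u, s):
    """Density of N(0, s^2) evaluated at u."""
    return math.exp(-0.5 * (u / s) ** 2) / (s * math.sqrt(2.0 * math.pi))

def loo_log_likelihood(data, bandwidths):
    """Leave-one-out log-likelihood for a product Gaussian kernel.

    data: list of d-dimensional tuples; bandwidths: d positive values.
    """
    n = len(data)
    total = 0.0
    for i in range(n):
        dens = 0.0
        for j in range(n):
            if j == i:
                continue  # leave observation i out of its own estimate
            prod = 1.0
            for k, h in enumerate(bandwidths):
                prod *= normal_pdf(data[i][k] - data[j][k], h)
            dens += prod
        # Guard against log(0) when the kernel sum underflows.
        total += math.log(max(dens / (n - 1), 1e-300))
    return total

# Hypothetical bivariate sample and two candidate bandwidth vectors.
points = [(0.0, 0.1), (0.3, -0.2), (1.1, 0.9), (0.8, 1.2), (-0.5, 0.0)]
score_wide = loo_log_likelihood(points, [1.0, 1.0])
score_narrow = loo_log_likelihood(points, [0.01, 0.01])
```

In an MCMC scheme of the kind the abstract describes, a quantity like this would play the role of the log-likelihood of the bandwidth parameters.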
TL;DR: In this paper, the authors extended the univariate framework for theoretically analysing kernel density estimators to a general multivariate version and proposed a non-parametric clustering approach to summarize the local properties of the data.
Abstract: Kernel density estimation is an important data smoothing technique. It has been applied most successfully for univariate data whilst for multivariate data its development and implementation have been relatively limited. The performance of kernel density estimators depends crucially on the bandwidth selection. Bandwidth selection in the univariate case involves selecting a scalar parameter which controls the amount of smoothing. In the multivariate case, the bandwidth matrix controls both the degree and direction of smoothing so its selection is more difficult. So far most of the research effort has been expended on automatic, data-driven selectors for univariate data. There is, on the other hand, a relative paucity of multivariate counterparts. Most of these multivariate bandwidth selectors are focused on the restricted case of diagonal matrices. In this thesis practical algorithms are constructed, with supporting theoretical justifications, for unconstrained bandwidth matrices. The two main classes of univariate bandwidth selectors are plug-in and cross validation. These unidimensional selectors are generalised to the multidimensional case. The univariate framework for theoretically analysing kernel density estimators is extended to a general multivariate version. This framework has at its core the quantification of the relative rates of convergence which provide a guide to the asymptotic behaviour of bandwidth selectors. Simulation studies and real data analysis are employed to illustrate their finite sample behaviour. It is found that unconstrained selectors possess good asymptotic and finite sample properties in a wide range of situations. Buoyed by this success, two extensions are embarked upon. The first is variable bandwidth selection, generalising the above case where the bandwidth is fixed throughout the sample space. The variation of the bandwidths is controlled by the local properties of the data. 
The novel contribution is to use non-parametric clustering to summarise these local properties, along with unconstrained bandwidth matrices. The second is in kernel discriminant analysis where unconstrained bandwidth matrices are shown to produce more accurate discrimination.
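To make the role of an unconstrained bandwidth matrix concrete, the sketch below evaluates a bivariate Gaussian kernel density estimate with a full symmetric positive definite matrix H, whose off-diagonal entry sets the direction of smoothing. It illustrates the estimator only, not the plug-in or cross-validation selectors developed in the thesis; the names and data are assumptions.

```python
import math

def kde_full_bandwidth(x, data, H):
    """Bivariate Gaussian KDE at x with a full 2x2 bandwidth matrix H."""
    (a, b), (c, d) = H
    det = a * d - b * c
    if a <= 0.0 or det <= 0.0 or b != c:
        raise ValueError("H must be symmetric positive definite")
    # Inverse of a 2x2 matrix, written out explicitly.
    inv = ((d / det, -b / det), (-c / det, a / det))
    norm = 1.0 / (2.0 * math.pi * math.sqrt(det))
    total = 0.0
    for p in data:
        u0, u1 = x[0] - p[0], x[1] - p[1]
        # Quadratic form (x - X_i)^T H^{-1} (x - X_i).
        q = u0 * (inv[0][0] * u0 + inv[0][1] * u1) + u1 * (inv[1][0] * u0 + inv[1][1] * u1)
        total += norm * math.exp(-0.5 * q)
    return total / len(data)

# Hypothetical correlated sample; the off-diagonal term orients the kernel along it.
points = [(0.0, 0.0), (1.0, 0.9), (2.0, 2.1), (-1.0, -1.2)]
H_full = ((1.0, 0.8), (0.8, 1.0))
H_diag = ((1.0, 0.0), (0.0, 1.0))
on_axis_full = kde_full_bandwidth((1.5, 1.5), points, H_full)
on_axis_diag = kde_full_bandwidth((1.5, 1.5), points, H_diag)
```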
TL;DR: A new multivariate density estimator suitable for pattern classifier design is proposed that captures the non-Gaussian structure of the data while parametric Gaussian conditional density estimation is applied to the rest of the components.
TL;DR: This paper derives an iterative sampling-expectation (SE) algorithm for estimating the color distribution and segmentation in figure-ground discrimination, employing non-parametric kernel estimation for the color distributions of both the figure and background.
Abstract: Figure-ground discrimination is an important problem in computer vision. Previous work usually assumes that the color distribution of the figure can be described by a low dimensional parametric model such as a mixture of Gaussians. However, such an approach has difficulty selecting the number of mixture components and is sensitive to the initialization of the model parameters. In this paper, we employ non-parametric kernel estimation for color distributions of both the figure and background. We derive an iterative sampling-expectation (SE) algorithm for estimating the color distribution and segmentation. There are several advantages of kernel-density estimation. First, it enables automatic selection of weights of different cues based on the bandwidth calculation from the image itself. Second, it does not require model parameter initialization and estimation. The experimental results on images of cluttered scenes demonstrate the effectiveness of the proposed algorithm.
TL;DR: This relation provides theoretical background for the behaviour of One-Class SVM when the Gaussian kernel is used, the only case for which successful results are shown in the literature.
Abstract: One-Class Support Vector Machines (SVM) address the problem of estimating high density regions from univariate or multivariate data samples. To be more precise, sets whose probability is specified in advance are estimated. In this paper the exact relation between One-Class SVM and density estimation is demonstrated. This relation provides theoretical background for the behaviour of One-Class SVM when the Gaussian kernel is used, the only case for which successful results are shown in the literature.
TL;DR: In this article, the authors consider the problem of estimating conditional probability distributions that are multivariate in both the conditioned and conditioning variable sets, and they use the kernel method with the smoothing parameters selected from the cross-validated minimization of a weighted integrated squared error of the kernel estimator.
Abstract: We consider the problem of estimating conditional probability distributions that are multivariate in both the conditioned and conditioning variable sets. This is an extension of Hall, Racine, and Li (forthcoming), who considered the case of a univariate conditioned variable but who also considered the more general case of both irrelevant and relevant conditioning variables. Following Hall et al. (forthcoming), we use the kernel method with the smoothing parameters selected from the cross-validated minimization of a weighted integrated squared error of the kernel estimator. We derive the rate of convergence of the smoothing parameters to some non-stochastic optimal smoothing parameter values, and establish the asymptotic normal distribution of the resulting nonparametric conditional probability (density) estimator. Simulations show that the proposed method performs quite well with a mixture of categorical and continuous variables.
TL;DR: Using a parametric approximation of the true cumulative distribution function (CDF), the transformation-retransformation of the data is explored here as a useful tool for the reliable PDF prediction.
Abstract: Common non-parametric estimators of a probability density function (PDF) perform poorly for heavy-tailed PDFs. Using a parametric approximation of the true cumulative distribution function (CDF), the transformation-retransformation of the data is explored here as a useful tool for the reliable PDF prediction. The PDF estimators are compared by their capacity to solve a classification problem. Simulation results and an application to Web data analysis are also presented.
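One way to read the transformation-retransformation idea is: map the data through a parametric CDF T, estimate the density of the transformed values with a kernel estimate, then map back with the change-of-variables factor T'(x). The sketch below assumes a Lomax (Pareto type II) CDF with a fixed shape parameter and a Gaussian kernel; these choices and all names are hypothetical and are not taken from the paper.

```python
import math

def normal_pdf(u, s):
    """Density of N(0, s^2) evaluated at u."""
    return math.exp(-0.5 * (u / s) ** 2) / (s * math.sqrt(2.0 * math.pi))

def lomax_cdf(x, a):
    """Assumed parametric CDF T(x) = 1 - (1 + x)^(-a) for x >= 0."""
    return 1.0 - (1.0 + x) ** (-a)

def lomax_pdf(x, a):
    """Derivative T'(x) of the assumed CDF."""
    return a * (1.0 + x) ** (-a - 1.0)

def transformed_kde(x, data, h, a):
    """Estimate f(x) as g_hat(T(x)) * T'(x), with g_hat a KDE of the values T(X_i)."""
    t_x = lomax_cdf(x, a)
    g_hat = sum(normal_pdf(t_x - lomax_cdf(xi, a), h) for xi in data) / len(data)
    return g_hat * lomax_pdf(x, a)

# Hypothetical heavy-tailed sample with one extreme observation.
sample = [0.2, 0.5, 1.0, 3.0, 40.0]
tail_estimate = transformed_kde(30.0, sample, h=0.1, a=1.0)
```

The transformed values lie in [0, 1], so a plain Gaussian kernel loses some mass at the boundaries; a boundary correction would be needed for a proper density.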
TL;DR: This paper presents an optimal density estimation scheme that combines the desirable properties of Parzen windowing and Gaussianization, using minimum Kullback-Leibler divergence as the optimality criterion for selecting the kernel size in the Parzen windowing step.
Abstract: Multivariate density estimation is an important problem that is frequently encountered in statistical learning and signal processing. One of the most popular techniques is Parzen windowing, also referred to as kernel density estimation. Gaussianization is a procedure that allows one to estimate multivariate densities efficiently from the marginal densities of the individual random variables. In this paper, we present an optimal density estimation scheme that combines the desirable properties of Parzen windowing and Gaussianization, using minimum Kullback-Leibler divergence as the optimality criterion for selecting the kernel size in the Parzen windowing step. The performance of the estimate is illustrated in a classifier design example.
TL;DR: Asymptotic properties of the local likelihood density estimators for stationary random fields in the usual smoothing context of the bandwidth, h, tending to zero as the sample size tends to infinity are detailed.
TL;DR: In this paper, a wavelet based linear density estimator is developed for the estimation of the probability density function for a sequence of associated random variables with a common one-dimensional probability density function.
Abstract: We develop a wavelet based linear density estimator for the estimation of the probability density function for a sequence of associated random variables with a common one-dimensional probability density function and obtain bounds on L_p-losses for such estimators.
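TL;DR: A semiparametric kernel density estimation method is studied in which an initial parametric guess is multiplied by a kernel estimate of a correction factor and the result is divided by its total mass; the mass-corrected version performs better than the uncorrected estimate in terms of bias and mean square error.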
Abstract: In contrast to the traditional kernel density estimate which is totally nonparametric, if one has a reasonable parametric guess about the density, it can be used to improve upon the traditional method [Hjort, N. L. and Glad, I. K. (1995). Nonparametric density estimation with a parametric start. Ann. Statist., 23 882–904.]. This semiparametric approach should work in a broad nonparametric neighborhood of a given parametric family. The idea is to multiply the initial parametric guess by a kernel estimate of the correction factor. Since the resulting estimate is clearly not a density, it is corrected by dividing it by its total mass. This correction was missed in the above-mentioned work of Hjort and Glad. This mass corrected version performs better than the uncorrected estimate in the sense of the bias and mean square error. Using the concept of ‘kernel contrast’ [Ahmad, I. A. and Ran, I. S. (1998). Kernel contrasts: a data based method of choosing smoothing parameters in nonparametric density estimation. U...
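The construction described in this abstract (a parametric start multiplied by a kernel estimate of the correction factor, then divided by its total mass) can be sketched as follows. The normal parametric start, the Gaussian kernel, and the trapezoidal rule used for the total mass are assumptions made for this example; the function names are hypothetical.

```python
import math

def normal_pdf(u, s):
    """Density of N(0, s^2) evaluated at u."""
    return math.exp(-0.5 * (u / s) ** 2) / (s * math.sqrt(2.0 * math.pi))

def uncorrected(x, data, h, mu, sigma):
    """Parametric start times a kernel estimate of the correction factor."""
    start = normal_pdf(x - mu, sigma)
    correction = sum(
        normal_pdf(x - xi, h) / normal_pdf(xi - mu, sigma) for xi in data
    ) / len(data)
    return start * correction

def total_mass(data, h, mu, sigma, lo, hi, steps=2000):
    """Trapezoidal approximation of the integral of the uncorrected estimate."""
    width = (hi - lo) / steps
    ys = [uncorrected(lo + k * width, data, h, mu, sigma) for k in range(steps + 1)]
    return width * (sum(ys) - 0.5 * (ys[0] + ys[-1]))

def mass_corrected(x, data, h, mu, sigma, lo, hi):
    """Uncorrected estimate divided by its total mass, so that it integrates to one."""
    return uncorrected(x, data, h, mu, sigma) / total_mass(data, h, mu, sigma, lo, hi)

# Hypothetical one-point example: the uncorrected estimate has mass 1/sqrt(2), not 1.
mass = total_mass([0.0], h=1.0, mu=0.0, sigma=1.0, lo=-10.0, hi=10.0)
corrected_at_zero = mass_corrected(0.0, [0.0], 1.0, 0.0, 1.0, -10.0, 10.0)
```

The one-point case shows why the normalization matters: the product of start and correction need not integrate to one.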
TL;DR: In this paper, the problem of selecting the bandwidths in kernel estimation of an unknown d-dimensional density f is investigated. The optimal root n relative convergence rate for bandwidth selection is established, the information bounds in this convergence are given, and a stabilized bandwidth selector (SBS) is proposed.
Abstract: Based on a random sample of size n from an unknown d-dimensional density f, the problem of selecting the bandwidths in kernel estimation of f is investigated. The optimal root n relative convergence rate for bandwidth selection is established and the information bounds in this convergence are given, and a stabilized bandwidth selector (SBS) is proposed. It is known that for all d the bandwidths selected by the least squares cross-validation (LSCV) have large sample variations. The proposed SBS, as an improvement of LSCV, will reduce the variation of LSCV without significantly inflating its bias. The key idea of the SBS is to modify the d-dimensional sample characteristic function beyond some cut-off frequency in estimating the integrated squared bias. It is shown that for all d and sufficiently smooth f and kernel, if the bandwidth in each coordinate direction varies freely, then the multivariate SBS is asymptotically normal with the optimal root n relative convergence rate and achieves the (conjectured) “lower bound” on the covariance matrix.
TL;DR: An object tracking algorithm using a novel simple symmetric similarity function between spatially-smoothed kernel-density estimates of the model and target distributions is proposed and tested and shown to achieve robust and reliable real-time tracking.
Abstract: An object tracking algorithm using a novel simple symmetric similarity function between spatially-smoothed kernel-density estimates of the model and target distributions is proposed and tested. The similarity measure is based on the expectation of the density estimates over the model or target images. The density is estimated using radial-basis kernel functions that measure the affinity between points and provide a better outlier rejection property. The mean-shift algorithm is used to track objects by iteratively maximizing this similarity function. To alleviate the quadratic complexity of the density estimation, we employ Gaussian kernels and the fast Gauss transform to reduce the computations to linear order. This leads to a very efficient and robust nonparametric tracking algorithm. The proposed algorithm is tested with several image sequences and shown to achieve robust and reliable real-time tracking. Several sequences are placed at http://www.cs.umd.edu/users/yangcj/node3.html.
TL;DR: By introducing the concept of kernel contrasts as an error criterion with a global norm, taken here to be the L2 norm usually used in nonparametric density estimation, it is possible to provide a completely data-based choice of the bandwidth that is asymptotically equivalent to the optimal theoretical choice.
Abstract: By introducing the concept of ‘kernel contrasts’ as an error criterion, with a global norm, which is taken here to be the L2 norm that is usually used in nonparametric density estimation, it is possible to provide a completely data-based choice of the bandwidth, which is asymptotically equivalent to the optimal theoretical choice. The density estimate based on this data-based choice of the bandwidth has desirable properties. Monte Carlo studies and studies of real data sets show how much better this new method is over usual other methods such as unbiased cross-validation method. The technique is also extendible in a direct fashion to multivariate setting.
TL;DR: In this article, the authors consider a kernel estimator of a density in a convolution model and give a central limit theorem for its integrated square error (ISE) for any sequence of bandwidths decreasing to 0.
Abstract: In this paper we consider a kernel estimator of a density in a convolution model and give a central limit theorem for its integrated square error (ISE). The kernel estimator is rather classical in minimax theory when the underlying density is recovered from noisy observations. The kernel is fixed and depends heavily on the distribution of the noise, supposed entirely known. The bandwidth is not fixed, the results hold for any sequence of bandwidths decreasing to 0. In particular the central limit theorem holds for the bandwidth minimizing the mean integrated square error (MISE). Rates of convergence are sensibly different in the case of regular noise and of super-regular noise. The smoothness of the underlying unknown density is relevant for the evaluation of the MISE.
TL;DR: The density estimation method proposed in this paper employs piecewise polynomial fits on adaptive dyadic partitions, which allows the data to adaptively determine the smoothness of the underlying basis functions.
Abstract: The density estimation method proposed in this paper employs piecewise polynomial fits on adaptive dyadic partitions. The proposed estimator enjoys the minimax adaptivity associated with wavelet-based density estimators as well as the following additional advantages: estimates are guaranteed to be nonnegative, theoretical bounds provide an indication of performance even for small sample sizes, and the method can be extended to free-degree piecewise polynomial estimation, which allows the data to adaptively determine the smoothness of the underlying basis functions.
TL;DR: In this paper, the authors proposed a class of adjusted Pelletier density estimators on homogeneous spaces, which converge uniformly and almost surely at the same rate as naive kernel density estimation on Euclidean spaces.
Abstract: The landmark data reduction approach in high level image analysis has led to significant progress in scene recognition via statistical shape analysis (Dryden and Mardia, 1998). While a number of families of similarity shape densities have proven useful in data analysis, only a few parametric models have been considered, and only recently, in the context of projective shape (Mardia and Patrangenaru, 2004) or affine shape. Shape spaces of interest have the geometric structure of symmetric spaces: planar similarity shape spaces are complex projective spaces (Kendall, 1984), affine shape spaces are real Grassmann manifolds (Sparr, 1992), and spaces of planar projective shapes of configurations of points in general position are products of real projective spaces (Mardia and Patrangenaru, 2004). Therefore, data driven density estimation of shapes, regarded as points on symmetric spaces and arising from digitizing landmarks in images, is necessary. Recently, Pelletier (2004) considered kernel density estimation on “general” Riemannian manifolds; his results however hold only in homogeneous spaces. This is sufficient for image analysis, since any symmetric space is homogeneous. Pelletier estimators generalize the density estimators on certain homogeneous spaces introduced by Ruymgaart (1989), by H. Hendriks, J. H. M. Janssen and Ruymgaart (1993), and by Lee and Ruymgaart (1998). In this paper, we propose a class of adjusted Pelletier density estimators, on homogeneous spaces, that converge uniformly and almost surely at the same rate as naive kernel density estimators on Euclidean spaces. A concrete example of projective shape density estimation of 6-ads arising from digitized images of the “actor” data set in Wayne et al. (2001) is also given.
TL;DR: The likelihood for patterns of continuous attributes for the naive Bayesian classifier (NBC) may be approximated by kernel density estimation (KDE), letting every pattern influence the shape of the probability density, thus leading to accurate estimation.
Abstract: The likelihood for patterns of continuous attributes for the naive Bayesian classifier (NBC) may be approximated by kernel density estimation (KDE), letting every pattern influence the shape of the probability density, thus leading to accurate estimation. KDE suffers from computational cost, making it unpractical in many real-world applications. We smooth the density using a spline, thus requiring only very few coefficients for the estimation rather than the whole training set, allowing rapid implementation of the NBC without sacrificing classifier accuracy. Experiments conducted over several real-world databases reveal acceleration, sometimes in several orders of magnitude, in favor of the spline approximation, making the application of KDE to the NBC practical.
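As a point of reference for the unsmoothed approach, the sketch below approximates each class-conditional likelihood in a naive Bayesian classifier by a product of univariate Gaussian kernel density estimates, one per attribute. Every prediction sums over the whole training set, which is the cost the paper's spline approximation is meant to avoid; the spline step itself is not shown. The shared bandwidth, the data, and the names are assumptions for illustration.

```python
import math

def normal_pdf(u, s):
    """Density of N(0, s^2) evaluated at u."""
    return math.exp(-0.5 * (u / s) ** 2) / (s * math.sqrt(2.0 * math.pi))

def class_likelihood(x, rows, h):
    """Product over attributes of univariate KDEs built from one class's rows."""
    likelihood = 1.0
    for k in range(len(x)):
        likelihood *= sum(normal_pdf(x[k] - r[k], h) for r in rows) / len(rows)
    return likelihood

def predict(x, training, h):
    """Return the label maximising prior times KDE likelihood.

    training maps each class label to a list of attribute tuples.
    """
    n_total = sum(len(rows) for rows in training.values())
    scores = {
        label: (len(rows) / n_total) * class_likelihood(x, rows, h)
        for label, rows in training.items()
    }
    return max(scores, key=scores.get)

# Hypothetical two-class training set with two continuous attributes.
training = {
    "a": [(0.0, 1.0), (0.2, 0.8), (-0.1, 1.1)],
    "b": [(5.0, 4.0), (4.8, 4.2), (5.1, 3.9)],
}
label_near_a = predict((0.1, 0.9), training, h=0.5)
label_near_b = predict((4.9, 4.1), training, h=0.5)
```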
TL;DR: In this article, moment inequalities for the supremum of empirical processes of U-Statistic structure were derived and applied to kernel type density estimation and estimation of the distribution function for functions of observations.
Abstract: We derive moment inequalities for the supremum of empirical processes of U-Statistic structure and give application to kernel type density estimation and estimation of the distribution function for functions of observations.
TL;DR: The likelihood for patterns of continuous attributes for the naive Bayesian classifier (NBC) may be approximated by kernel density estimation (KDE), letting every pattern influence the shape of the probability density, thus leading to accurate estimation.
Abstract: The likelihood for patterns of continuous attributes for the naive Bayesian classifier (NBC) may be approximated by kernel density estimation (KDE), letting every pattern influence the shape of the probability density, thus leading to accurate estimation. KDE suffers from computational cost, making it unpractical in many real-world applications. We smooth the density using a spline, thus requiring only very few coefficients for the estimation rather than the whole training set, allowing rapid implementation of the NBC without sacrificing classifier accuracy. Experiments conducted over several real-world databases reveal acceleration, sometimes in several orders of magnitude, in favor of the spline approximation, making the application of KDE to the NBC practical.
TL;DR: A non-parametric approach to the ICA problem that is robust to outlier effects is proposed. Experimental results show that it can separate sources in the presence of outliers, whereas existing algorithms such as Jade and Infomax break down under such conditions.
Abstract: Learning using independent component analysis (ICA) has found a wide range of applications in the area of computer vision and pattern analysis, ranging from face recognition to speech separation. This paper presents a non-parametric approach to the ICA problem that is robust towards outlier effects. The algorithm, for the first time in the field of ICA, adopts an intuitive and direct approach, focusing on the very definition of independence itself; i.e. the joint probability density function (pdf) of independent sources is factorial over the marginal distributions. In the proposed algorithm, kernel density estimation is employed to approximate the underlying distributions. There are two major advantages of our algorithm. First, existing algorithms focus on learning the independent components by attempting to fulfill necessary conditions (but not sufficient) for independence. For example, the Jade algorithm attempts to approximate independence by minimizing higher order statistics, which are not robust to outliers. Comparatively, our technique is inherently robust towards outlier effects. Second, since the learning employs kernel density estimation, it is naturally free from the assumptions of source distributions (unlike the Infomax algorithm). Experimental results show that the algorithm is able to perform separation of sources in the presence of outliers, whereas existing algorithms like Jade and Infomax break down under such conditions. The results have also shown that the proposed non-parametric approach is generally source distribution independent. In addition, it is able to separate non-Gaussian zero-kurtotic signals unlike the traditional ICA algorithms like Jade and Infomax.