Top 103 papers on multivariate kernel density estimation published in 2011

Showing papers on "Multivariate kernel density estimation" published in 2011

Book•

Smoothing Techniques : With Implementation in S

9 Nov 2011

TL;DR: The book covers histogram and kernel density estimation, bandwidth selection, nonparametric regression and simultaneous error bars, with algorithms (including WARPing approximations) implemented in S.

Abstract (table of contents):
I. Density Smoothing
1. The Histogram. 1.0 Introduction; 1.1 Definitions of the Histogram (The Histogram as a Frequency Counting Curve; The Histogram as a Maximum Likelihood Estimate; Varying the Binwidth); 1.2 Statistics of the Histogram; 1.3 The Histogram in S; 1.4 Smoothing the Histogram by WARPing (WARPing Algorithm; WARPing in S); Exercises.
2. Kernel Density Estimation. 2.0 Introduction; 2.1 Definition of the Kernel Estimate (Varying the Kernel; Varying the Bandwidth); 2.2 Kernel Density Estimation in S (Direct Algorithm; Implementation in S); 2.3 Statistics of the Kernel Density (Speed of Convergence; Confidence Intervals and Confidence Bands); 2.4 Approximating Kernel Estimates by WARPing; 2.5 Comparison of Computational Costs; 2.6 Comparison of Smoothers Between Laboratories (Keeping the Kernel Bias the Same; Keeping the Support of the Kernel the Same; Canonical Kernels); 2.7 Optimizing the Kernel Density; 2.8 Kernels of Higher Order; 2.9 Multivariate Kernel Density Estimation (Same Bandwidth in Each Component; Nonequal Bandwidths in Each Component; A Matrix of Bandwidths); Exercises.
3. Further Density Estimators. 3.0 Introduction; 3.1 Orthogonal Series Estimators; 3.2 Maximum Penalized Likelihood Estimators; Exercises.
4. Bandwidth Selection in Practice. 4.0 Introduction; 4.1 Kernel Estimation Using Reference Distributions; 4.2 Plug-In Methods; 4.3 Cross-Validation (4.3.1 Maximum Likelihood Cross-Validation: Direct Algorithm; 4.3.2 Least-Squares Cross-Validation: Direct Algorithm; 4.3.3 Biased Cross-Validation: Algorithm); 4.4 Cross-Validation for WARPing Density Estimation (4.4.1 Maximum Likelihood Cross-Validation; 4.4.2 Least-Squares Cross-Validation: Algorithm, Implementation in S; 4.4.3 Biased Cross-Validation: Algorithm, Implementation in S); Exercises.
II. Regression Smoothing
5. Nonparametric Regression. 5.0 Introduction; 5.1 Kernel Regression Smoothing (5.1.1 The Nadaraya-Watson Estimator: Direct Algorithm, Implementation in S; 5.1.2 Statistics of the Nadaraya-Watson Estimator; 5.1.3 Confidence Intervals; 5.1.4 Fixed Design Model; 5.1.5 The WARPing Approximation: Basic Algorithm, Implementation in S); 5.2 k-Nearest Neighbor (k-NN) (5.2.1 Definition of the k-NN Estimate; 5.2.2 Statistics of the k-NN Estimate); 5.3 Spline Smoothing; Exercises.
6. Bandwidth Selection. 6.0 Introduction; 6.1 Estimates of the Averaged Squared Error (6.1.0 Introduction; 6.1.1 Penalizing Functions; 6.1.2 Cross-Validation: Direct Algorithm); 6.2 Bandwidth Selection with WARPing (Penalizing Functions; Cross-Validation; Basic Algorithm; Implementation in S; Applications); Exercises.
7. Simultaneous Error Bars. 7.1 Golden Section Bootstrap (Algorithm for Golden Section Bootstrapping; Implementation in S); 7.2 Construction of Confidence Intervals; Exercises.
Tables; Solutions; List of Used S Commands; Symbols and Notation; References.
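
Section 2.9's "Nonequal Bandwidths in Each Component" corresponds to a product kernel with a diagonal bandwidth matrix. The following is a minimal Python sketch of that idea, not the book's S implementation. It assumes a Gaussian kernel and a hypothetical function name `product_kde`. The "A Matrix of Bandwidths" variant would replace the per-component scaling with a full matrix and is not shown.

```python
import math


def product_kde(x, data, h):
    """Evaluate a d-dimensional kernel density estimate at the point x.

    Uses a product Gaussian kernel with one bandwidth per component,
    i.e. a diagonal bandwidth matrix diag(h[0], ..., h[d-1]).
    data: list of d-dimensional observations; h: list of d bandwidths.
    """
    n, d = len(data), len(x)
    # Normalizing constant: n * (h_1 * ... * h_d) * (2*pi)^(d/2)
    norm = n * math.prod(h) * (2 * math.pi) ** (d / 2)
    total = 0.0
    for point in data:
        # Squared distance after standardizing each component by its bandwidth
        u2 = sum(((x[k] - point[k]) / h[k]) ** 2 for k in range(d))
        total += math.exp(-0.5 * u2)
    return total / norm
```

Setting all entries of `h` equal recovers the "Same Bandwidth in Each Component" case.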

609 citations

Supplemental online material for the paper: "Multivariate Online Kernel Density Estimation with Gaussian Kernels"

Matej Kristan, Danijel Sko

1 Jan 2011

TL;DR: This document includes some detailed supplemental derivations used in the bandwidth estimation for the online Kernel Density Estimator proposed in the paper "Multivariate Online Kernel Density Estimation with Gaussian Kernels."

Abstract: This document includes some detailed supplemental derivations used in the bandwidth estimation for the online Kernel Density Estimator which was proposed in the paper "Multivariate Online Kernel Density Estimation with Gaussian Kernels" by authors Matej Kristan, Ale

144 citations

Journal Article•

Forest Density Estimation

Han Liu¹, Min Xu², Haijie Gu², Anupam Gupta², John Lafferty², Larry Wasserman²•Institutions (2)

Johns Hopkins University¹, Carnegie Mellon University²

01 Feb 2011-Journal of Machine Learning Research

TL;DR: It is proved that finding a maximum weight spanning forest with restricted tree size is NP-hard, and an approximation algorithm is developed for this problem.

Abstract: We study graph estimation and density estimation in high dimensions, using a family of density estimators based on forest structured undirected graphical models. For density estimation, we do not assume the true distribution corresponds to a forest; rather, we form kernel density estimates of the bivariate and univariate marginals, and apply Kruskal's algorithm to estimate the optimal forest on held out data. We prove an oracle inequality on the excess risk of the resulting estimator relative to the risk of the best forest. For graph estimation, we consider the problem of estimating forests with restricted tree sizes. We prove that finding a maximum weight spanning forest with restricted tree size is NP-hard, and develop an approximation algorithm for this problem. Viewing the tree size as a complexity parameter, we then select a forest using data splitting, and prove bounds on excess risk and structure selection consistency of the procedure. Experiments with simulated data and microarray data indicate that the methods are a practical alternative to Gaussian graphical models.
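
The forest-selection step can be sketched with Kruskal's algorithm run greedily on edge weights. In the paper these weights would be mutual-information estimates obtained from the bivariate and univariate kernel density estimates. Here they are simply assumed to be precomputed and passed in as a dictionary; `kruskal_forest` and `max_edges` are hypothetical names. Stopping after `max_edges` edges yields a smaller forest. This sketch does not implement the size-restricted (NP-hard) variant or the paper's approximation algorithm for it.

```python
def kruskal_forest(num_nodes, weights, max_edges=None):
    """Greedy maximum-weight spanning forest (Kruskal's algorithm).

    weights: dict mapping (i, j) node pairs to edge weights, e.g. estimated
    mutual information between variables i and j (assumed precomputed).
    max_edges: optionally stop after this many edges (a smaller forest).
    """
    parent = list(range(num_nodes))

    def find(a):
        # Union-find root lookup with path halving
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    forest = []
    # Visit candidate edges from heaviest to lightest
    for (i, j), _w in sorted(weights.items(), key=lambda item: -item[1]):
        if max_edges is not None and len(forest) >= max_edges:
            break
        ri, rj = find(i), find(j)
        if ri != rj:  # adding (i, j) does not create a cycle
            parent[ri] = rj
            forest.append((i, j))
    return forest
```

Because Kruskal adds edges in decreasing weight order, the forests for increasing `max_edges` are nested. That nesting is what makes choosing the number of edges on held-out data cheap.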

118 citations

Journal Article•10.2139/SSRN.1856982•

Estimation of Parametric and Nonparametric Models for Univariate Claim Severity Distributions: An Approach Using R

David Pitt¹, Montserrat Guillén², Catalina Bolancé²•Institutions (2)

University of Melbourne¹, University of Barcelona²

19 May 2011-Social Science Research Network

TL;DR: The central aim of this paper is to provide educators with material that can be used in the classroom to teach statistical estimation methods, goodness of fit analysis and importantly statistical computing in the context of insurance and risk management.

Abstract: This paper presents an analysis of motor vehicle insurance claims relating to vehicle damage and to associated medical expenses. We use univariate severity distributions estimated with parametric and non-parametric methods. The methods are implemented using the statistical package R. Parametric analysis is limited to estimation of normal and lognormal distributions for each of the two claim types. The nonparametric analysis presented involves kernel density estimation. We illustrate the benefits of applying transformations to data prior to employing kernel based methods. We use a log-transformation and an optimal transformation amongst a class of transformations that produces symmetry in the data. The central aim of this paper is to provide educators with material that can be used in the classroom to teach statistical estimation methods, goodness of fit analysis and importantly statistical computing in the context of insurance and risk management. To this end, we have included in the Appendix of this paper all the R code that has been used in the analysis so that readers, both students and educators, can fully explore the techniques described.
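
The log-transformation case can be sketched as follows; the paper's own code is in R and is not reproduced here. A Gaussian kernel density estimate is computed for Y = log X, then mapped back to the claim scale with the change-of-variables rule f_X(x) = f_Y(log x) / x. The bandwidth `h` on the log scale is assumed to be chosen separately. The function names are hypothetical, and the paper's optimal symmetrizing transformation is not shown.

```python
import math


def gaussian_kde_1d(y, data, h):
    """Univariate Gaussian kernel density estimate at y."""
    c = 1.0 / (len(data) * h * math.sqrt(2 * math.pi))
    return c * sum(math.exp(-0.5 * ((y - d) / h) ** 2) for d in data)


def log_transform_kde(x, claims, h):
    """Density of positive claim sizes via a KDE on the log scale.

    If Y = log X, then f_X(x) = f_Y(log x) / x (change of variables).
    h is the bandwidth on the log scale (assumed chosen separately).
    """
    if x <= 0:
        return 0.0  # claim sizes are positive, so no mass at x <= 0
    logs = [math.log(c) for c in claims]
    return gaussian_kde_1d(math.log(x), logs, h) / x
```

With a single claim of size 1 and h = 1, the estimate is exactly the standard lognormal density. This illustrates why the back-transformed estimate respects the positive support of claim sizes.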

101 citations

Journal Article•10.1016/J.JSPI.2011.01.002•

Kernel density estimation on the torus

Marco Di Marzio¹, Agnese Panzera¹, Charles C. Taylor²•Institutions (2)

University of Chieti-Pescara¹, University of Leeds²

01 Jun 2011-Journal of Statistical Planning and Inference

TL;DR: In this article, the authors introduce a specific class of product kernels whose order is suitably defined so as to obtain L2 risk formulas whose structure can be compared to their Euclidean counterparts.
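
A common product kernel on the torus is the product of von Mises densities, one per angular component. The sketch below uses it as an illustration, assuming this kernel rather than the paper's general class. Each concentration kappa plays the role of an inverse bandwidth. The normalizing Bessel function I_0 is computed from its power series, because Python's `math` module does not provide it. The names `bessel_i0` and `torus_kde` are hypothetical.

```python
import math


def bessel_i0(kappa, terms=100):
    """Modified Bessel function I_0 via its power series sum (kappa/2)^(2k) / (k!)^2."""
    total, term = 0.0, 1.0
    for k in range(terms):
        if k > 0:
            term *= (kappa / 2) ** 2 / (k * k)
        total += term
    return total


def torus_kde(theta, data, kappas):
    """KDE on the d-dimensional torus with a product of von Mises kernels.

    theta: point given as d angles (radians); data: list of angle vectors;
    kappas: per-component concentrations (larger kappa = less smoothing).
    """
    n, d = len(data), len(theta)
    # Each von Mises kernel integrates to one over [0, 2*pi)
    norm = math.prod(2 * math.pi * bessel_i0(k) for k in kappas)
    total = 0.0
    for point in data:
        total += math.exp(sum(kappas[k] * math.cos(theta[k] - point[k]) for k in range(d)))
    return total / (n * norm)
```

The estimate is periodic in every angle by construction. That periodicity is the property a Euclidean kernel applied naively to angles would lack.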

91 citations

Proceedings Article•10.1145/2020408.2020507•

Density estimation trees

Parikshit Ram¹, Alexander G. Gray¹•Institutions (1)

Georgia Institute of Technology¹

21 Aug 2011

TL;DR: DETs empirically exhibit the interpretability, adaptability and feature selection properties of supervised decision trees while incurring slight loss in accuracy over other nonparametric density estimators, suggesting they might be able to avoid the curse of dimensionality if the true density is sparse in dimensions.

Abstract: In this paper we develop density estimation trees (DETs), the natural analog of classification trees and regression trees, for the task of density estimation. We consider the estimation of a joint probability density function of a d-dimensional random vector X and define a piecewise constant estimator structured as a decision tree. The integrated squared error is minimized to learn the tree. We show that the method is nonparametric: under standard conditions of nonparametric density estimation, DETs are shown to be asymptotically consistent. In addition, being decision trees, DETs perform automatic feature selection. They empirically exhibit the interpretability, adaptability and feature selection properties of supervised decision trees while incurring slight loss in accuracy over other nonparametric density estimators. Hence they might be able to avoid the curse of dimensionality if the true density is sparse in dimensions. We believe that density estimation trees provide a new tool for exploratory data analysis with unique capabilities.
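
Minimizing the integrated squared error of a piecewise-constant estimate gives a simple per-leaf error. For a leaf t holding |t| of the N points in volume V_t, the error is -|t|^2 / (N^2 V_t), and the leaf density is |t| / (N V_t). The sketch below applies this to one greedy split of a one-dimensional node, under those stated assumptions. Full tree growing, multivariate splits and pruning are omitted. The function names are hypothetical.

```python
def node_error(count, n_total, volume):
    """ISE-based error of a leaf: -count^2 / (N^2 * volume)."""
    return -(count ** 2) / (n_total ** 2 * volume)


def leaf_density(count, n_total, volume):
    """Piecewise-constant density estimate inside a leaf."""
    return count / (n_total * volume)


def best_split_1d(points, lo, hi, n_total):
    """Find the split of [lo, hi) that most reduces the DET error.

    Candidate split points are midpoints between consecutive distinct samples.
    Returns (split, gain), or (None, 0.0) if no split improves the error.
    """
    xs = sorted(p for p in points if lo <= p < hi)
    parent = node_error(len(xs), n_total, hi - lo)
    best, best_gain = None, 0.0
    for i in range(1, len(xs)):
        if xs[i - 1] == xs[i]:
            continue  # no split strictly between tied samples
        s = 0.5 * (xs[i - 1] + xs[i])
        # Exactly i samples fall in the left child [lo, s)
        children = (node_error(i, n_total, s - lo)
                    + node_error(len(xs) - i, n_total, hi - s))
        gain = parent - children
        if gain > best_gain:
            best, best_gain = s, gain
    return best, best_gain
```

On the points 0.1, 0.12, 0.15, 0.2 and 0.9 in [0, 1), the best split isolates the dense cluster near 0.1 to 0.15. This shows how the criterion favours small, high-count cells.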

89 citations

Journal Article•10.2202/1557-4679.1356•

Super learner based conditional density estimation with application to marginal structural models.

Ivan Diaz Munoz¹, Mark J. van der Laan¹•Institutions (1)

University of California, Berkeley¹

03 Oct 2011-The International Journal of Biostatistics

TL;DR: In this paper, a histogram-like estimator of a conditional density that uses cross-validation to estimate the histogram probabilities, as well as the optimal number and position of the bins is presented.

Abstract: In this paper, we present a histogram-like estimator of a conditional density that uses cross-validation to estimate the histogram probabilities, as well as the optimal number and position of the bins. This estimator is an alternative to kernel density estimators when the dimension of the covariate vector is large. We demonstrate its applicability to estimation of Marginal Structural Model (MSM) parameters in which an initial estimator of the exposure mechanism is needed. MSM estimation based on the proposed density estimator results in less biased estimates, when compared to estimates based on a misspecified parametric model.

53 citations

Change-Point Detection in Time-Series Data by Relative Density-Ratio Estimation

Song Liu, Makoto Yamada, Masashi Sugiyama

2 Nov 2011

46 citations

Journal Article•10.1162/NECO_A_00062•

Least-squares independent component analysis

Taiji Suzuki¹, Masashi Sugiyama²•Institutions (2)

University of Tokyo¹, Tokyo Institute of Technology²

01 Jan 2011-Neural Computation

TL;DR: This letter employs a squared-loss variant of mutual information as an independence measure and gives its estimation method, and develops an ICA algorithm, named least-squares independent component analysis.

Abstract: Accurately evaluating statistical independence among random variables is a key element of independent component analysis (ICA). In this letter, we employ a squared-loss variant of mutual information as an independence measure and give its estimation method. Our basic idea is to estimate the ratio of probability densities directly without going through density estimation, thereby avoiding the difficult task of density estimation. In this density ratio approach, a natural cross-validation procedure is available for hyperparameter selection. Thus, all tuning parameters such as the kernel width or the regularization parameter can be objectively optimized. This is an advantage over recently developed kernel-based independence measures and is a highly useful property in unsupervised learning problems such as ICA. Based on this novel independence measure, we develop an ICA algorithm, named least-squares independent component analysis.
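
The independence measure can be sketched for scalar x and y. The density ratio p(x, y) / (p(x) p(y)) is fitted directly by regularized least squares, and the squared-loss mutual information is read off as 0.5 h^T theta - 0.5. This is a minimal sketch under several assumptions:
- Gaussian basis functions centred at the samples.
- A fixed `sigma` and ridge parameter `lam`, which the paper would tune by cross-validation.
- Hypothetical function names.

The ICA step itself (optimizing a demixing matrix) is not shown.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def lsmi(xs, ys, sigma=0.3, lam=1e-3):
    """Least-squares estimate of squared-loss mutual information (SMI).

    Models the density ratio p(x, y) / (p(x) p(y)) as a linear combination
    of Gaussian basis functions centred at the samples, fits it by
    ridge-regularized least squares, and returns 0.5 * h^T theta - 0.5.
    """
    n = len(xs)
    # gx[i][l] = exp(-(x_i - x_l)^2 / (2 sigma^2)); likewise for y
    gx = [[math.exp(-(xi - xl) ** 2 / (2 * sigma ** 2)) for xl in xs] for xi in xs]
    gy = [[math.exp(-(yi - yl) ** 2 / (2 * sigma ** 2)) for yl in ys] for yi in ys]
    # h_l: average of basis l over the paired samples (x_i, y_i)
    h = [sum(gx[i][l] * gy[i][l] for i in range(n)) / n for l in range(n)]
    # H_{l,m}: average over all pairs (x_i, y_j); the double sum factorizes
    H = [[(sum(gx[i][l] * gx[i][m] for i in range(n)) / n)
          * (sum(gy[j][l] * gy[j][m] for j in range(n)) / n)
          + (lam if l == m else 0.0)
          for m in range(n)] for l in range(n)]
    theta = solve(H, h)
    return 0.5 * sum(hl * tl for hl, tl in zip(h, theta)) - 0.5
```

Because no marginal or joint density is estimated along the way, the same cross-validation objective can tune both `sigma` and `lam`. This is the advantage the abstract points to.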

45 citations

Journal Article•10.2139/SSRN.1134796•

Root-n Uniformly Consistent Density Estimation in Nonparametric Regression Models

Juan Carlos Escanciano¹, David T. Jacho-Chávez²•Institutions (2)

Indiana University¹, Emory University²

28 Sep 2011-Social Science Research Network

TL;DR: In this paper, a root-n consistent estimator of the probability density function of the response variable in a nonparametric regression model is proposed, which has a (uniform) asymptotic normal distribution and is computationally very simple to calculate.

Abstract: The paper introduces a root-n consistent estimator of the probability density function of the response variable in a nonparametric regression model. The proposed estimator is shown to have a (uniform) asymptotic normal distribution, and it is very simple to compute. A Monte Carlo experiment confirms our theoretical results, and an empirical application demonstrates its usefulness. The results derived in the paper adapt general U-processes theory to the inclusion of infinite dimensional nuisance parameters.

44 citations

Proceedings Article•10.1115/ES2011-54507•

Multivariate and Multimodal Wind Distribution Model Based on Kernel Density Estimation

Jie Zhang¹, Souma Chowdhury¹, Achille Messac², Luciano Castillo¹•Institutions (2)

Rensselaer Polytechnic Institute¹, Syracuse University²

1 Jan 2011

TL;DR: In this paper, a multivariate and multimodal wind distribution (MMWD) model based on multivariate kernel density estimation was developed to capture the coupled variation of wind speed, wind direction and air density, in order to quantify the available wind energy at a site and design optimal wind farm configurations.

Abstract: This paper presents a new method to accurately characterize and predict the annual variation of wind conditions. Estimation of the distribution of wind conditions is necessary (i) to quantify the available energy (power density) at a site, and (ii) to design optimal wind farm configurations. We develop a smooth multivariate wind distribution model that captures the coupled variation of wind speed, wind direction, and air density. The wind distribution model developed in this paper also avoids the limiting assumption of unimodality of the distribution. This method, which we call the Multivariate and Multimodal Wind distribution (MMWD) model, is an evolution from existing wind distribution modeling techniques. Multivariate kernel density estimation, a standard non-parametric approach to estimate the probability density function of random variables, is adopted for this purpose. The MMWD technique is successfully applied to model (i) the distribution of wind speed (univariate); (ii) the distribution of wind speed and wind direction (bivariate); and (iii) the distribution of wind speed, wind direction, and air density (multivariate). The last of these is a novel contribution of this paper, while the first two offer opportunities for validation. Ten years of recorded wind data, obtained from the North Dakota Agricultural Weather Network (NDAWN), are used in this paper. We found the coupled distribution to be multimodal. A strong correlation among the wind condition parameters was also observed. Copyright © 2011 by ASME

Journal Article•10.1111/J.1467-9868.2011.00772.X•

Self-consistent method for density estimation

Alberto Bernacchia¹, Simone Pigolotti²•Institutions (2)

Yale University¹, Niels Bohr Institute²

01 Jun 2011-Journal of The Royal Statistical Society Series B-statistical Methodology

TL;DR: The self‐consistent estimate is defined as a prior candidate density that precisely reproduces itself and is applied to artificial data generated from various distributions and reaches the theoretical limit for the scaling of the square error with the size of the data set.

Abstract: The estimation of a density profile from experimental data points is a challenging problem, usually tackled by plotting a histogram. Prior assumptions on the nature of the density, from its smoothness to the specification of its form, allow the design of more accurate estimation procedures, such as Maximum Likelihood. Our aim is to construct a procedure that makes no explicit assumptions while still providing an accurate estimate of the density. We introduce the self-consistent estimate: the power spectrum of a candidate density is given, and an estimation procedure is constructed on the assumption, to be released a posteriori, that the candidate is correct. The self-consistent estimate is defined as a prior candidate density that precisely reproduces itself. Our main result is to derive the exact expression of the self-consistent estimate for any given dataset, and to study its properties. Applications of the method require neither priors on the form of the density nor the subjective choice of parameters. A cutoff frequency, akin to a bin size or a kernel bandwidth, emerges naturally from the derivation. We apply the self-consistent estimate to artificial data generated from various distributions and show that it reaches the theoretical limit for the scaling of the square error with the dataset size.

Journal Article•10.1007/S11203-011-9052-4•

Asymptotic normality of the Parzen-Rosenblatt density estimator for strongly mixing random fields

Mohamed El Machkouri¹•Institutions (1)

University of Rouen¹

01 Mar 2011-Statistical Inference for Stochastic Processes

TL;DR: In this article, the author proves the asymptotic normality of the kernel density estimator in the context of stationary strongly mixing random fields; the proof is based on the Lindeberg method rather than on Bernstein's small-block large-block technique and coupling arguments widely used in previous works on nonparametric estimation for spatial processes.

Abstract: We prove the asymptotic normality of the kernel density estimator (introduced by Rosenblatt, Proc Natl Acad Sci USA 42:43–47, 1956 and Parzen, Ann Math Stat 33:1965–1976, 1962) in the context of stationary strongly mixing random fields. Our approach is based on the Lindeberg’s method rather than on Bernstein’s small-block-large-block technique and coupling arguments widely used in previous works on nonparametric estimation for spatial processes. Our method allows us to consider only minimal conditions on the bandwidth parameter and provides a simple criterion on the strong mixing coefficients which do not depend on the bandwidth.
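
For reference, the basic Parzen-Rosenblatt estimator is written below for real-valued observations X_1, ..., X_n with kernel K and bandwidth h_n. For a random field observed on a finite region Λ_n of the lattice, the sum runs over the sites of Λ_n and n becomes the number of sites |Λ_n|. This lattice notation is assumed here and may not match the paper's exactly.

```latex
\hat{f}_n(x) = \frac{1}{n\,h_n} \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h_n}\right),
\qquad
\hat{f}_{\Lambda_n}(x) = \frac{1}{|\Lambda_n|\,h_n} \sum_{i \in \Lambda_n} K\!\left(\frac{x - X_i}{h_n}\right)
```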

Journal Article•10.1016/J.PATCOG.2010.08.027•

A kernel-based parametric method for conditional density estimation

Gang Fu, Frank Y. Shih¹, Haimin Wang¹•Institutions (1)

New Jersey Institute of Technology¹

01 Feb 2011-Pattern Recognition

TL;DR: Experimental results show that the proposed method outperforms the Nadaraya-Watson estimator in terms of revised mean integrated squared error (RMISE) and is an effective method for estimating the conditional densities.

Journal Article•10.1016/J.JSPI.2011.01.009•

Minimax properties of beta kernel estimators

Karine Bertin¹, Nicolas Klutchnikoff²•Institutions (2)

Valparaiso University¹, University of Strasbourg²

01 Jul 2011-Journal of Statistical Planning and Inference

TL;DR: In this paper, the authors study the problem of estimating density functions with support in [0, 1] from an asymptotic minimax point of view and prove that for very regular density functions or for certain losses, these estimators are not minimax.
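
A minimal sketch of a beta kernel estimator on [0, 1] follows, assuming Chen's form in which the kernel at location x is the Beta(x/b + 1, (1 - x)/b + 1) density and b > 0 is the smoothing parameter. Whether this matches the exact family analysed in the paper is an assumption. The function names are hypothetical.

```python
import math


def beta_pdf(t, a, b):
    """Beta(a, b) density at t in (0, 1), computed via log-gamma."""
    if t <= 0.0 or t >= 1.0:
        return 0.0
    log_c = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_c + (a - 1) * math.log(t) + (b - 1) * math.log(1 - t))


def beta_kernel_kde(x, data, b):
    """Beta kernel density estimate at x in [0, 1] (Chen-type kernel).

    The kernel shape changes with x, so no mass is placed outside [0, 1]
    and there is no boundary spill-over as with a fixed symmetric kernel.
    """
    a1, a2 = x / b + 1, (1 - x) / b + 1
    return sum(beta_pdf(d, a1, a2) for d in data) / len(data)
```

Because the kernel adapts its shape near 0 and 1, the boundary bias of a fixed symmetric kernel is avoided. This location-dependent shape is also what makes the estimator's minimax behaviour depend on regularity, as the paper studies.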

Monograph•10.1142/8124•

Functional Estimation for Density, Regression Models and Processes

Odile Pons

1 Mar 2011

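TL;DR: The monograph covers kernel estimators of a density and of a regression function, limits for varying-bandwidth estimators, nonparametric estimation of quantiles and for stochastic processes, estimation in semi-parametric regression models, diffusion processes, and applications to time series.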

Abstract: Introduction; Kernel Estimator of a Density; Kernel Estimator of a Regression Function; Limits for the Varying Bandwidths Estimators; Nonparametric Estimation of Quantiles; Nonparametric Estimation for Stochastic Processes; Estimation in Semi-Parametric Regression Models; Diffusion Processes; Applications to Time Series

Journal Article•10.1080/10485252.2010.537337•

Fourier series-based direct plug-in bandwidth selectors for kernel density estimation

Carlos Tenreiro¹•Institutions (1)

University of Coimbra¹

12 Jan 2011-Journal of Nonparametric Statistics

TL;DR: In this article, a class of Fourier series-based direct plug-in bandwidth selectors for kernel density estimation is considered; the proposed bandwidth estimators have a relative convergence rate n −1...

Abstract: A class of Fourier series-based direct plug-in bandwidth selectors for kernel density estimation is considered in this paper. The proposed bandwidth estimators have a relative convergence rate n −1...

Journal Article•10.1016/J.STAMET.2010.08.004•

A note on generalized Bernstein polynomial density estimators

Yoshihide Kakizawa¹•Institutions (1)

Hokkaido University¹

01 Mar 2011-Statistical Methodology

TL;DR: In this article, a rescaled generalized Bernstein polynomial was proposed for approximating any continuous function defined on the closed interval [0, Δ], whose coefficients are probabilities of the binomial random variable with parameters (m − 1, x/Δ) depending on the location x ∈ [0, Δ] where the density estimation is made.
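
The classical (non-generalized) Bernstein polynomial density estimator, rescaled to [0, Δ], illustrates the binomial (m − 1, x/Δ) weighting mentioned above. It takes the empirical probabilities of m equal-width bins and smooths them with Binomial(m − 1, x/Δ) probabilities. This is a sketch of that baseline only, not of the paper's generalized estimator. The function name is hypothetical.

```python
import math


def bernstein_density(x, data, m, delta):
    """Bernstein polynomial density estimate on [0, delta].

    Weights are empirical probabilities of the m bins
    [k*delta/m, (k+1)*delta/m), smoothed with Binomial(m - 1, x / delta)
    probabilities; the result integrates to one over [0, delta].
    """
    if not 0.0 <= x <= delta:
        return 0.0
    n, p = len(data), x / delta
    total = 0.0
    for k in range(m):
        lo, hi = k * delta / m, (k + 1) * delta / m
        # Empirical bin probability; the last bin is closed at delta
        if k == m - 1:
            count = sum(1 for d in data if lo <= d <= hi)
        else:
            count = sum(1 for d in data if lo <= d < hi)
        binom = math.comb(m - 1, k) * p ** k * (1 - p) ** (m - 1 - k)
        total += (count / n) * binom
    return m * total / delta
```

Each binomial basis term integrates to Δ/m over [0, Δ], so the factor m/Δ makes the estimate integrate to the total empirical probability, which is one.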

Topics in nonparametric statistics

Christopher Chang

1 Jan 2011

TL;DR: For finite-order moving average and nonlinear autoregressive processes, a kernel density estimator of a bootstrap series is presented that estimates their marginal densities root-$n$ consistently; this rate equals that of the best known convolution estimators and is faster than that of the standard kernel density estimator.

Abstract: This thesis is concerned with nonparametric techniques for inferring properties of time series. First, we consider finite-order moving average and nonlinear autoregressive processes with no parametric assumption on the innovation distribution, and present a kernel density estimator of a bootstrap series that estimates their marginal densities root-$n$ consistently. This rate equals that of the best known convolution estimators and is faster than that of the standard kernel density estimator. We also conduct simulations to check the finite sample properties of our estimator, and the results are generally better than corresponding results for the standard kernel density estimator. Next, given stationary time series data, we study the problem of finding the best linear combination of a set of lag window spectral density estimators with respect to the mean squared risk. We present an aggregation procedure and prove a sharp oracle inequality for its risk. We also provide simulations demonstrating the performance of our aggregation procedure, given Bartlett and other estimators of varying bandwidths as input. This extends work by Rigollet and Tsybakov on aggregation of density estimators. The last part of this thesis introduces a class of robust autocorrelation estimators based on interpreting the sample autocorrelation function as a linear regression. We investigate the efficiency and robustness properties of the estimators that result from plugging in three common robust regression techniques. Construction of robust autocovariance and positive definite autocorrelation estimates is discussed, as well as application of the estimators to AR model fitting. We finish with simulations, which suggest that the estimators are especially well suited for AR model fitting.

Journal Article•10.1016/J.SPL.2011.01.013•

Kernel adjusted density estimation

Ramidha Srihera¹, Winfried Stute²•Institutions (2)

Thammasat University¹, University of Giessen²

01 May 2011-Statistics & Probability Letters

TL;DR: In this article, a kernel estimator of a density in which the kernel is adapted to the data rather than fixed is proposed and studied; this naturally leads to an adaptive choice of the smoothing parameters that avoids asymptotic expansions.

Journal Article•10.1016/J.SPL.2010.10.001•

Assessing log-concavity of multivariate densities

Martin L. Hazelton¹•Institutions (1)

Massey University¹

01 Jan 2011-Statistics & Probability Letters

TL;DR: In this article, the author develops a test for log-concavity of multivariate densities using kernel density estimation, where the test statistic is the smallest bandwidth for which the estimate is log-concave.
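
The test statistic, the smallest bandwidth giving a log-concave kernel estimate, can be illustrated in one dimension. This sketch rests on several simplifying assumptions:
- A univariate Gaussian kernel, although the paper treats multivariate densities.
- Log-concavity checked numerically through second differences of log f on a grid.
- A scan over a user-supplied list of bandwidths.

The calibration of the test (its null distribution) is not shown, and all function names are hypothetical.

```python
import math


def log_kde(x, data, h):
    """Log of a univariate Gaussian KDE, computed stably via log-sum-exp."""
    exps = [-0.5 * ((x - d) / h) ** 2 for d in data]
    top = max(exps)
    s = sum(math.exp(e - top) for e in exps)
    return top + math.log(s) - math.log(len(data) * h * math.sqrt(2 * math.pi))


def is_log_concave(data, h, grid_size=2001, tol=1e-12):
    """Numerically check log-concavity of the KDE via second differences."""
    lo, hi = min(data) - 3 * h, max(data) + 3 * h
    step = (hi - lo) / (grid_size - 1)
    vals = [log_kde(lo + i * step, data, h) for i in range(grid_size)]
    return all(vals[i - 1] - 2 * vals[i] + vals[i + 1] <= tol
               for i in range(1, grid_size - 1))


def critical_log_concave_bandwidth(data, h_grid):
    """Smallest bandwidth in h_grid whose KDE passes the log-concavity check."""
    for h in sorted(h_grid):
        if is_log_concave(data, h):
            return h
    return None
```

For two observations at ±3, a direct calculation shows the Gaussian estimate becomes log-concave exactly at h = 3. The scan recovers this value up to the grid resolution. A large critical bandwidth is evidence against log-concavity.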

Journal Article•10.1016/J.CRMA.2011.10.017•

Nonparametric estimation of the density of regression errors

Rawane Samb¹•Institutions (1)

Université catholique de Louvain¹

01 Dec 2011-Comptes Rendus Mathematique

TL;DR: The asymptotic normality of the error density estimator and its rate-optimality are investigated, and optimal choices are proposed for the first- and second-step bandwidths, used for estimating the regression function and the error density, respectively.
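
The two-step structure can be sketched as follows. First the regression function is estimated with a first-step bandwidth g. Then a kernel density estimate of the residuals is formed with a second-step bandwidth h. Using the Nadaraya-Watson estimator with a Gaussian kernel in the first step is an assumption made for illustration, and the function names are hypothetical.

```python
import math


def gauss(u):
    """Standard normal kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def nadaraya_watson(x, xs, ys, g):
    """First step: kernel regression estimate of m(x) with bandwidth g."""
    w = [gauss((x - xi) / g) for xi in xs]
    return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)


def residuals(xs, ys, g):
    """Estimated errors Y_i - m_hat(X_i) from the first-step fit."""
    return [yi - nadaraya_watson(xi, xs, ys, g) for xi, yi in zip(xs, ys)]


def error_density(e, res, h):
    """Second step: kernel density estimate of the errors at e, bandwidth h."""
    return sum(gauss((e - r) / h) for r in res) / (len(res) * h)
```

Computing the residuals once and reusing them keeps each density evaluation at O(n) cost. The two bandwidths are kept separate, mirroring the distinct first- and second-step choices discussed in the paper.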

Posted Content•

Bayesian multivariate mixed-scale density estimation

Antonio Canale, David B. Dunson

06 Oct 2011-arXiv: Statistics Theory

TL;DR: In this article, the authors considered a general framework to jointly model continuous, count and categorical variables under a nonparametric prior, which is induced through rounding latent variables having an unknown density with respect to Lebesgue measure.

Abstract: Although continuous density estimation has received abundant attention in the Bayesian nonparametrics literature, there is limited theory on multivariate mixed scale density estimation. In this note, we consider a general framework to jointly model continuous, count and categorical variables under a nonparametric prior, which is induced through rounding latent variables having an unknown density with respect to Lebesgue measure. For the proposed class of priors, we provide sufficient conditions for large support, strong consistency and rates of posterior contraction. These conditions allow one to convert sufficient conditions obtained in the setting of multivariate continuous density estimation to the mixed scale case. To illustrate the procedure a rounded multivariate nonparametric mixture of Gaussians is introduced and applied to a crime and communities dataset.

Journal Article•10.1016/J.JMVA.2010.10.006•

Nonparametric estimation of the anisotropic probability density of mixed variables

Sam Efromovich¹•Institutions (1)

University of Texas at Dallas¹

01 Mar 2011-Journal of Multivariate Analysis

TL;DR: A data-driven estimator is developed that adapts to unknown anisotropic smoothness of the joint density and, whenever the density depends on a smaller number of variables, performs a dimension reduction that implies the corresponding optimal rate of the mean integrated squared error (MISE) convergence.

Journal Article•10.1198/JBES.2010.07327•

Infinite Density at the Median and the Typical Shape of Stock Return Distributions

Chirok Han, Jin Seo Cho, Peter C.B. Phillips

01 Apr 2011-Journal of Business & Economic Statistics

TL;DR: In this paper, statistics are developed to test for the presence of an asymptotic discontinuity (or infinite density or peakedness) in a probability density at the median.

Abstract: Statistics are developed to test for the presence of an asymptotic discontinuity (or infinite density or peakedness) in a probability density at the median. The approach makes use of work by Knight (1998) on L1 estimation asymptotics in conjunction with non-parametric kernel density estimation methods. The size and power of the tests are assessed, and conditions under which the tests have good performance are explored in simulations. The new methods are applied to stock returns of leading companies across major U.S. industry groups. The results confirm the presence of infinite density at the median, providing significant new empirical evidence on stock return distributions.

Journal Article•10.1002/ENV.1082•

Variable location kernel method using line transect sampling

Omar Eidous¹•Institutions (1)

Yarmouk University¹

01 May 2011-Environmetrics

TL;DR: In this article, the variable location kernel (VLK) method was used to fit line transect data in order to estimate the density of a biological population, which improved upon the performance of the classical kernel estimator.

Abstract: The variable location kernel (VLK) method provides a nonparametric estimator for a probability density function. This article proposes the VLK method to fit line transect data in order to estimate the density of a biological population. The method produces two promising estimators for the density of objects which improve upon the performance of the classical kernel estimator. Although the two proposed estimators share a common form, they exhibit rather different performances. To compute the bias and variance of the proposed estimators, the bootstrap technique is proposed. For a wide range of possible models for line transect data, a comparison of the two estimators and the classical kernel estimator is carried out by simulation. The results show the practical potential of the proposed estimators over the classical kernel estimator for almost all cases considered. Two previously published data sets are also analyzed and the results confirm the good performances of the proposed estimators. Copyright © 2010 John Wiley & Sons, Ltd.
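
For comparison, the classical kernel estimator that the VLK estimators are measured against can be sketched as follows. Population density is estimated as D = n f(0) / (2L), where L is the total transect length and f(0) is a kernel estimate of the perpendicular-distance density at zero. The sketch assumes a Gaussian kernel reflected at 0, since distances are non-negative. It shows only this baseline, not the VLK method, and the function names are hypothetical. Distances and L must be in the same length unit.

```python
import math


def gauss(u):
    """Standard normal kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def f0_kernel(distances, h):
    """Kernel estimate of the perpendicular-distance density at zero.

    Distances are non-negative, so the Gaussian kernel is reflected at 0,
    which doubles its contribution: f_hat(0) = 2 / (n h) * sum K(x_i / h).
    """
    n = len(distances)
    return 2.0 * sum(gauss(x / h) for x in distances) / (n * h)


def line_transect_density(distances, h, line_length):
    """Population density estimate D_hat = n * f_hat(0) / (2 L)."""
    return len(distances) * f0_kernel(distances, h) / (2.0 * line_length)
```

The whole estimate hinges on the single value f(0) at the boundary. That dependence is why improving on the classical kernel estimator at zero, as the VLK estimators aim to do, matters in practice.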

Posted Content•

Local Component Analysis

Nicolas Le Roux¹, Francis Bach¹•Institutions (1)

French Institute for Research in Computer Science and Automation¹

01 Sep 2011-arXiv: Learning

TL;DR: In this article, the authors propose to learn a full Euclidean metric through an expectation-maximization (EM) procedure, which can be seen as an unsupervised counterpart to neighbourhood component analysis (NCA).

Abstract: Kernel density estimation, a.k.a. Parzen windows, is a popular density estimation method, which can be used for outlier detection or clustering. With multivariate data, its performance is heavily reliant on the metric used within the kernel. Most earlier work has focused on learning only the bandwidth of the kernel (i.e., a scalar multiplicative factor). In this paper, we propose to learn a full Euclidean metric through an expectation-maximization (EM) procedure, which can be seen as an unsupervised counterpart to neighbourhood component analysis (NCA). In order to avoid overfitting with a fully nonparametric density estimator in high dimensions, we also consider a semi-parametric Gaussian-Parzen density model, where some of the variables are modelled through a jointly Gaussian density, while others are modelled through Parzen windows. For these two models, EM leads to simple closed-form updates based on matrix inversions and eigenvalue decompositions. We show empirically that our method leads to density estimators with higher test-likelihoods than natural competing methods, and that the metrics may be used within most unsupervised learning techniques that rely on such metrics, such as spectral clustering or manifold learning methods. Finally, we present a stochastic approximation scheme which allows for the use of this method in a large-scale setting.
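
An EM update for a shared full kernel covariance (the learned metric) can be sketched for a two-dimensional Parzen estimator under the leave-one-out likelihood. The E-step computes, for each point i, the responsibility of every other point j. The M-step sets the covariance to the responsibility-weighted average of the outer products (x_i - x_j)(x_i - x_j)^T. This is a minimal sketch under those assumptions; it omits the paper's semi-parametric Gaussian-Parzen model and stochastic approximation, and the function names are hypothetical.

```python
import math


def gauss2(d, cov):
    """Bivariate normal density N(d | 0, cov) for a 2x2 covariance."""
    (a, b), (_, c) = cov
    det = a * c - b * b
    # Quadratic form d^T cov^{-1} d using the explicit 2x2 inverse
    q = (c * d[0] ** 2 - 2 * b * d[0] * d[1] + a * d[1] ** 2) / det
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det))


def loo_log_likelihood(points, cov):
    """Leave-one-out log-likelihood of a Parzen estimator with covariance cov."""
    n = len(points)
    total = 0.0
    for i, p in enumerate(points):
        s = sum(gauss2((p[0] - q[0], p[1] - q[1]), cov)
                for j, q in enumerate(points) if j != i)
        total += math.log(s / (n - 1))
    return total


def em_metric_step(points, cov):
    """One EM update of the shared kernel covariance (full metric)."""
    n = len(points)
    s_aa = s_ab = s_bb = 0.0
    for i, p in enumerate(points):
        diffs = [(p[0] - q[0], p[1] - q[1]) for j, q in enumerate(points) if j != i]
        dens = [gauss2(d, cov) for d in diffs]
        z = sum(dens)
        for d, w in zip(diffs, dens):
            r = w / z  # E-step: responsibility of point j for point i
            s_aa += r * d[0] * d[0]
            s_ab += r * d[0] * d[1]
            s_bb += r * d[1] * d[1]
    # M-step: responsibilities for each i sum to one, so divide by n
    return [[s_aa / n, s_ab / n], [s_ab / n, s_bb / n]]
```

As with any EM algorithm, each step cannot decrease the leave-one-out log-likelihood. The update is closed-form, matching the abstract's description of simple matrix updates.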

Proceedings Article•

On the Robustness of Kernel Density M-Estimators

JooSeuk Kim¹, Clayton Scott¹

University of Michigan¹

28 Jun 2011

TL;DR: A method for nonparametric density estimation that exhibits robustness to contamination of the training sample is analyzed, achieving robustness by combining a traditional kernel density estimator (KDE) with ideas from classical M-estimation.

Abstract: We analyze a method for nonparametric density estimation that exhibits robustness to contamination of the training sample. This method achieves robustness by combining a traditional kernel density estimator (KDE) with ideas from classical M-estimation. The KDE based on a Gaussian kernel is interpreted as a sample mean in the associated reproducing kernel Hilbert space (RKHS). This mean is estimated robustly through the use of a robust loss, yielding the so-called robust kernel density estimator (RKDE). This robust sample mean can be found via a kernelized iteratively re-weighted least squares (IRWLS) algorithm. Our contributions are summarized as follows. First, we present a representer theorem for the RKDE, which gives insight into the robustness of the RKDE. Second, we provide necessary and sufficient conditions for kernel IRWLS to converge to the global minimizer, in the Gaussian RKHS, of the objective function defining the RKDE. Third, we characterize and provide a method for computing the influence function associated with the RKDE. Fourth, we illustrate the robustness of the RKDE through experiments on several data sets.
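A minimal sketch of a kernelized IRWLS iteration for the RKDE, assuming a Gaussian kernel and a Huber loss with threshold c (the abstract does not fix the loss; Huber is one common robust choice). The estimate is f = Σ_j w_j k(·, x_j). Each step computes the RKHS distances ||Φ(x_i) − f|| from the Gram matrix alone and reweights points by ψ(r)/r, normalized so the weights sum to one. Function names and defaults are hypothetical.

```python
import numpy as np

def gaussian_gram(X, Y, sigma):
    """Normalized Gaussian kernel matrix k(x_i, y_j) for rows of X and Y."""
    d = X.shape[1]
    sq = np.sum(X ** 2, 1)[:, None] + np.sum(Y ** 2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-np.maximum(sq, 0) / (2 * sigma ** 2)) / (2 * np.pi * sigma ** 2) ** (d / 2)

def rkde_weights(X, sigma, c, n_iter=100, tol=1e-8):
    """Kernelized IRWLS with Huber weights; returns w with f = sum_j w_j k(., x_j)."""
    n = X.shape[0]
    K = gaussian_gram(X, X, sigma)
    w = np.full(n, 1.0 / n)                      # start from the ordinary KDE
    for _ in range(n_iter):
        # ||Phi(x_i) - f||^2 = k(x_i, x_i) - 2 (K w)_i + w^T K w
        r2 = np.diag(K) - 2 * K @ w + w @ K @ w
        r = np.sqrt(np.maximum(r2, 1e-300))
        # Huber psi(r)/r: 1 within the threshold, c/r beyond it
        phi = np.where(r <= c, 1.0, c / r)
        w_new = phi / phi.sum()
        if np.max(np.abs(w_new - w)) < tol:
            return w_new
        w = w_new
    return w

def rkde(x_eval, X, w, sigma):
    """Evaluate the weighted kernel density estimate at the rows of x_eval."""
    return gaussian_gram(x_eval, X, sigma) @ w
```

With uniform weights 1/n, rkde reduces to the ordinary KDE, so the two can be compared directly; points far from the bulk of the data in the RKHS receive smaller weights.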

Journal Article•10.1103/PHYSREVE.84.066702•

Nonparametric model reconstruction for stochastic differential equations from discretely observed time-series data

Jun Ohkubo¹

Kyoto University¹

14 Dec 2011-Physical Review E

TL;DR: A scheme is developed for estimating state-dependent drift and diffusion coefficients in a stochastic differential equation from time-series data using a maximum likelihood method combined with a concept based on a kernel density estimation.

Abstract: A scheme is developed for estimating state-dependent drift and diffusion coefficients in a stochastic differential equation from time-series data. The scheme does not require specifying parametric forms for the drift and diffusion coefficients in advance. To perform the nonparametric estimation, a maximum likelihood method is combined with a concept based on kernel density estimation. To deal with discrete observations or sparsity of the time-series data, a local linearization method is employed, which enables fast estimation.
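The abstract does not spell out the estimator, so the following illustrates a simpler, swapped-in technique for the same task: a kernel-weighted (Nadaraya–Watson) estimate of the first two conditional moments of the increments, i.e. a Kramers–Moyal style estimate of the drift a(x) and the squared diffusion b(x)^2 for dX = a(X) dt + b(X) dW. It is not the maximum likelihood scheme with local linearization described above, and it degrades when the sampling interval dt is large, which is the regime that scheme targets. The function and parameter names are hypothetical.

```python
import numpy as np

def kernel_drift_diffusion(x, dt, grid, h):
    """Kernel-weighted moment estimates of drift a(x) and b(x)^2 at grid points.

    x: observed path sampled every dt; h: Gaussian kernel bandwidth.
    """
    x = np.asarray(x, dtype=float)
    x0, dx = x[:-1], np.diff(x)
    # Gaussian kernel weights between grid points and observed states
    W = np.exp(-0.5 * ((grid[:, None] - x0[None, :]) / h) ** 2)
    W_sum = W.sum(axis=1)
    drift = (W @ dx) / (W_sum * dt)            # E[dX | X = x] / dt
    diffusion = (W @ dx ** 2) / (W_sum * dt)   # E[dX^2 | X = x] / dt, about b(x)^2
    return drift, diffusion
```

For an Ornstein–Uhlenbeck process dX = −X dt + dW sampled finely, the drift estimate should be close to −x and the squared diffusion close to 1.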

Journal Article•10.1111/J.1467-9574.2011.00485.X•

Combining kernel estimators in the uniform deconvolution problem

Bert van Es¹

University of Amsterdam¹

01 Aug 2011-Statistica Neerlandica

TL;DR: In this article, a density estimator and an estimator of the distribution function in the uniform deconvolution model were constructed based on inversion formulas and kernel estimators of the density of the observations and its derivative.

Abstract: We construct a density estimator and an estimator of the distribution function in the uniform deconvolution model. The estimators are based on inversion formulas and kernel estimators of the density of the observations and its derivative. Initially the inversions yield two different estimators of the density and two estimators of the distribution function. We construct asymptotically optimal convex combinations of these two estimators. We also derive pointwise asymptotic normality of the resulting estimators, the pointwise asymptotic biases and an expansion of the mean integrated squared error of the density estimator. It turns out that the pointwise limit distribution of the density estimator is the same as the pointwise limit distribution of the density estimator introduced by Groeneboom and Jongbloed (Neerlandica, 57, 2003, 136), a kernel smoothed nonparametric maximum likelihood estimator of the distribution function.
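The two inversion formulas can be illustrated for the distribution function. If Y = X + U with U uniform on [0, 1] and independent of X, the density of Y is g(y) = F(y) − F(y − 1). Iterating downwards gives F(x) = Σ_{k≥0} g(x − k), and iterating upwards gives 1 − F(x) = Σ_{k≥1} g(x + k). The sketch below plugs a Gaussian kernel estimate of g into both series, truncated at K terms, and mixes them with a fixed weight alpha. The paper's asymptotically optimal weights and its density estimator (which needs an estimate of g') are not reproduced, and all names are hypothetical.

```python
import numpy as np

def kde(points, data, h):
    """Gaussian kernel density estimate of g, evaluated at `points`."""
    data = np.asarray(data, dtype=float)
    u = (np.asarray(points, dtype=float)[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u ** 2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))

def deconv_cdf(x, y, h, K=50, alpha=0.5):
    """Two inversion estimates of F(x) from Y = X + U and their convex combination."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    k = np.arange(K)
    # Inversion 1: F(x) = sum_{k >= 0} g(x - k)
    pts1 = (x[:, None] - k[None, :]).ravel()
    F1 = kde(pts1, y, h).reshape(len(x), K).sum(axis=1)
    # Inversion 2: F(x) = 1 - sum_{k >= 1} g(x + k)
    pts2 = (x[:, None] + (k[None, :] + 1)).ravel()
    F2 = 1.0 - kde(pts2, y, h).reshape(len(x), K).sum(axis=1)
    return alpha * F1 + (1 - alpha) * F2, F1, F2
```

Since both F1 and F2 target the same F, any fixed alpha in [0, 1] gives a consistent estimate; choosing alpha to minimize asymptotic error is the refinement the paper studies.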
