Top 85 papers published in the topic of Multivariate kernel density estimation in 2017

Showing papers on "Multivariate kernel density estimation" published in 2017

Journal Article•10.1177/0962280215609948•

Fast clustering using adaptive density peak detection

Xiao-Feng Wang¹, Yifan Xu²•Institutions (2)

Cleveland Clinic Lerner Research Institute¹, Case Western Reserve University²

01 Dec 2017-Statistical Methods in Medical Research

TL;DR: This paper proposes a clustering procedure with adaptive density peak detection, where the local density is estimated through the nonparametric multivariate kernel estimation and develops an automatic cluster centroid selection method through maximizing an average silhouette index.

Abstract: Common limitations of clustering methods include the slow algorithm convergence, the instability of the pre-specification on a number of intrinsic parameters, and the lack of robustness to outliers. A recent clustering approach proposed a fast search algorithm of cluster centers based on their local densities. However, the selection of the key intrinsic parameters in the algorithm was not systematically investigated. It is relatively difficult to estimate the "optimal" parameters since the original definition of the local density in the algorithm is based on a truncated counting measure. In this paper, we propose a clustering procedure with adaptive density peak detection, where the local density is estimated through the nonparametric multivariate kernel estimation. The model parameter is then able to be calculated from the equations with statistical theoretical justification. We also develop an automatic cluster centroid selection method through maximizing an average silhouette index. The advantage and flexibility of the proposed method are demonstrated through simulation studies and the analysis of a few benchmark gene expression data sets. The method only needs to perform in one single step without any iteration and thus is fast and has a great potential to apply on big data analysis. A user-friendly R package ADPclust is developed for public use.

98 citations
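
The density-peak idea in the abstract above can be illustrated with a minimal sketch. This is not the ADPclust implementation: the single bandwidth `h`, the `f * delta` ranking used to pick centers, and the nearest-center assignment in `assign` are simplifying assumptions made here, and the silhouette-based choice of the number of clusters is left out. The function names and the `points` data are hypothetical.

```python
import math

def gaussian_kde_at(x, data, h):
    """Product-Gaussian KDE with one shared bandwidth h, evaluated at x."""
    d = len(x)
    norm = len(data) * (h * math.sqrt(2.0 * math.pi)) ** d
    total = 0.0
    for y in data:
        sq = sum((a - b) ** 2 for a, b in zip(x, y))
        total += math.exp(-0.5 * sq / (h * h))
    return total / norm

def density_peaks(data, h, n_centers):
    """Return indices of the n_centers points with the largest f * delta."""
    f = [gaussian_kde_at(x, data, h) for x in data]
    delta = []
    for i, x in enumerate(data):
        # distance to the nearest point that has a strictly higher density
        higher = [math.dist(x, data[j]) for j in range(len(data)) if f[j] > f[i]]
        if higher:
            delta.append(min(higher))
        else:
            # the densest point has no higher neighbour: use its largest distance
            delta.append(max(math.dist(x, y) for y in data))
    score = [fi * di for fi, di in zip(f, delta)]
    order = sorted(range(len(data)), key=lambda i: score[i], reverse=True)
    return order[:n_centers]

def assign(data, centers):
    """Label each point with the position (in centers) of its nearest center."""
    return [min(range(len(centers)), key=lambda c: math.dist(x, data[centers[c]]))
            for x in data]

# Hypothetical toy data: two tight groups around (0, 0) and (5, 5).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (-0.1, 0.0), (0.0, -0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (4.9, 5.0), (5.0, 4.9)]
centers = density_peaks(points, h=0.3, n_centers=2)
labels = assign(points, centers)
```

Everything is computed in one pass over the pairwise distances, with no iterative refinement, which is the property the abstract highlights.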

Journal Article•10.1016/J.GSF.2017.05.002•

Visualising data distributions with kernel density estimation and reduced chi-squared statistic

Christopher Spencer¹, Chris Yakymchuk², Mahmoudreza Ghaznavi²•Institutions (2)

Curtin University¹, University of Waterloo²

01 Nov 2017-Geoscience frontiers

TL;DR: A Java-based computer application is presented called KD X to facilitate the visualization of data and the utilization of numerical tools used in frequency distribution statistics to data.

Abstract: The application of frequency distribution statistics to data provides objective means to assess the nature of the data distribution and viability of numerical models that are used to visualize and interpret data. Two commonly used tools are the kernel density estimation and reduced chi-squared statistic used in combination with a weighted mean. Due to the wide applicability of these tools, we present a Java-based computer application called KD X to facilitate the visualization of data and the utilization of these numerical tools.

89 citations

Proceedings Article•10.1109/FOCS.2017.99•

Hashing-Based-Estimators for Kernel Density in High Dimensions

Moses Charikar¹, Paris Siminelakis¹•Institutions (1)

Stanford University¹

1 Oct 2017

TL;DR: This work introduces a class of unbiased estimators for kernel density implemented through locality-sensitive hashing, and gives general theorems bounding the variance of such estimators.

Abstract: Given a set of points $P \subset \mathbb{R}^d$ and a kernel $k$, the Kernel Density Estimate at a point $x \in \mathbb{R}^d$ is defined as $\mathrm{KDE}_{P}(x)=\frac{1}{|P|}\sum_{y\in P} k(x,y)$. We study the problem of designing a data structure that, given a data set $P$ and a kernel function, returns approximations to the kernel density of a query point in sublinear time. We introduce a class of unbiased estimators for kernel density implemented through locality-sensitive hashing, and give general theorems bounding the variance of such estimators. These estimators give rise to efficient data structures for estimating the kernel density in high dimensions for a variety of commonly used kernels. Our work is the first to provide data-structures with theoretical guarantees that improve upon simple random sampling in high dimensions.

63 citations
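
To make the definition in the abstract above concrete, the sketch below evaluates the exact sum and the simple random sampling baseline that the paper compares against. It does not implement the hashing-based estimators themselves. The Gaussian kernel, its `sigma` parameter, and all function names are assumptions chosen for illustration.

```python
import math
import random

def gaussian_kernel(x, y, sigma=1.0):
    """k(x, y) = exp(-||x - y||^2 / (2 sigma^2)), with values in (0, 1]."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))

def kde_exact(points, x, kernel=gaussian_kernel):
    """KDE_P(x) = (1 / |P|) * sum over y in P of k(x, y); costs |P| kernel calls."""
    return sum(kernel(x, y) for y in points) / len(points)

def kde_random_sampling(points, x, m, rng, kernel=gaussian_kernel):
    """Unbiased estimate from m points drawn uniformly with replacement."""
    sample = [rng.choice(points) for _ in range(m)]
    return sum(kernel(x, y) for y in sample) / m

# Hypothetical data: 200 uniform points in the unit cube, one query point.
rng = random.Random(0)
data = [(rng.random(), rng.random(), rng.random()) for _ in range(200)]
query = (0.5, 0.5, 0.5)
exact = kde_exact(data, query)
approx = kde_random_sampling(data, query, m=20000, rng=rng)
```

Each sampled kernel value has expectation equal to the exact KDE, so the average is unbiased; its variance shrinks as `1 / m`, which is the baseline the abstract says the hashing-based estimators improve upon in high dimensions.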

Journal Article•10.1007/S40565-015-0172-5•

Wind speed model based on kernel density estimation and its application in reliability assessment of generating systems

Bo Hu¹, Yudun Li², Hejun Yang¹, He Wang¹•Institutions (2)

Chongqing University¹, Electric Power Research Institute²

01 Mar 2017-Journal of Modern Power Systems and Clean Energy

TL;DR: In this paper, a kernel density estimation (KDE) method is proposed to estimate the probability density function (PDF) of wind speed, without making any assumption on the form of the underlying wind speed distribution, and capable of uncovering the statistical information hidden in the historical data.

Abstract: An accurate probability distribution model of wind speed is critical to the assessment of reliability contribution of wind energy to power systems. Most of current models are built using the parametric density estimation (PDE) methods, which usually assume that the wind speed are subordinate to a certain known distribution (e.g. Weibull distribution and Normal distribution) and estimate the parameters of models with the historical data. This paper presents a kernel density estimation (KDE) method which is a nonparametric way to estimate the probability density function (PDF) of wind speed. The method is a kind of data-driven approach without making any assumption on the form of the underlying wind speed distribution, and capable of uncovering the statistical information hidden in the historical data. The proposed method is compared with three parametric models using wind data from six sites. The results indicate that the KDE outperforms the PDE in terms of accuracy and flexibility in describing the long-term wind speed distributions for all sites. A sensitivity analysis with respect to kernel functions is presented and Gauss kernel function is proved to be the best one. Case studies on a standard IEEE reliability test system (IEEE-RTS) have verified the applicability and effectiveness of the proposed model in evaluating the reliability performance of wind farms.

53 citations
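
A univariate Gaussian-kernel density estimate of the kind described in the abstract above can be sketched as follows. The abstract does not state how the bandwidth is chosen, so the normal-reference rule of thumb $h = 1.06\,\hat{\sigma}\,n^{-1/5}$ is an assumption made here, and the wind speed values are invented for illustration.

```python
import math
import statistics

def rule_of_thumb_bandwidth(samples):
    """Normal-reference bandwidth h = 1.06 * sd * n^(-1/5) (an assumed choice)."""
    n = len(samples)
    return 1.06 * statistics.stdev(samples) * n ** (-1.0 / 5.0)

def gaussian_kde_1d(samples, h):
    """Return a function v -> estimated density at v, using a Gaussian kernel."""
    n = len(samples)
    c = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def pdf(v):
        return c * sum(math.exp(-0.5 * ((v - s) / h) ** 2) for s in samples)

    return pdf

# Hypothetical wind speeds in m/s; not data from the paper.
speeds = [3.2, 4.1, 5.0, 5.6, 6.3, 7.1, 7.4, 8.0, 9.2, 11.5]
pdf = gaussian_kde_1d(speeds, rule_of_thumb_bandwidth(speeds))

# Riemann sum over a wide range to check that the estimate integrates to about 1.
step = 0.05
area = sum(pdf(-20.0 + step * i) for i in range(1201)) * step
```

No distributional form such as Weibull is assumed; the shape comes entirely from the samples. One known side effect of a Gaussian kernel is that a small amount of probability mass lands on negative speeds.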

Journal Article•10.1007/S10035-017-0771-0•

3D particle shape modelling and optimization through proper orthogonal decomposition

Noura Ouhbi¹, Noura Ouhbi², Charles Voivret², Guillaume Perrin, Jean-Noël Roux¹•Institutions (2)

University of Paris¹, SNCF²

01 Nov 2017-Granular Matter

TL;DR: In this paper, a new method is presented in order to statistically characterize arbitrary particle shapes using an optimal choice of shape functions identified on a set of 1000 digitized railway ballast particles obtained through 3D Scan.

Abstract: Based on proper orthogonal decomposition (POD), a new method is presented in order to statistically characterize arbitrary particle shapes using an optimal choice of shape functions identified on a set of 1000 digitized railway ballast particles obtained through 3D Scan. The coefficients of the POD expansion enable a description of ballast grains with varying levels of accuracy. On exploiting the knowledge of their statistical distribution we are able, implementing an appropriate multivariate kernel density estimation method, to generate irregular particles with similar morphological features. The description and generation methods are validated by comparing statistical distributions of basic characteristics: surface area, volume, average radius, elongation, flatness, and aspect ratio. Using suitable geometric descriptors defining local curvatures, we identify which surface points might be regarded as forming faces. This shows that the proposed particle generation method is well suited for irregularly shaped granular materials, as a first geometric definition step, before numerical simulations of their collective mechanical properties are carried out by a Discrete Element code dealing with polyhedral shapes. We illustrate this process with the simple case of the assembling of a granular pack from a loose configuration, by one-dimensional compression, using different levels of accuracy in the representation of grain shape.

46 citations

Journal Article•10.4225/03/59389CAE32A30•

Bandwidth Selection for Multivariate Kernel Density Estimation Using MCMC

Xibin Zhang¹, Maxwell L. King¹, Rob J. Hyndman•Institutions (1)

Monash University¹

08 Jun 2017-Research Papers in Economics

TL;DR: This work provides Markov chain Monte Carlo algorithms for computing the bandwidth matrix for multivariate kernel density estimation by optimizing the likelihood cross-validation criterion, and shows that the resulting bandwidths are superior to all existing methods.

Abstract: Paper not available. Full text of working paper suppressed by author. We provide Markov chain Monte Carlo (MCMC) algorithms for computing the bandwidth matrix for multivariate kernel density estimation. Our approach is based on treating the elements of the bandwidth matrix as parameters to be estimated, which we do by optimizing the likelihood cross-validation criterion. Numerical results show that the resulting bandwidths are superior to all existing methods; for dimensions greater than two, our algorithm is the first practical method for estimating the optimal bandwidth matrix. Moreover, the MCMC algorithm for bandwidth selection for multivariate data has no increased difficulty as the dimension of data increases.

46 citations
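
The likelihood cross-validation criterion named in the abstract above can be written down in a few lines. The sketch below is not the authors' algorithm: it restricts the bandwidth matrix to a diagonal (one bandwidth per dimension) and replaces the MCMC sampler with a plain grid search, both of which are simplifications assumed here. Function names and the toy data are hypothetical.

```python
import math
from itertools import product

def loo_log_likelihood(data, h):
    """Leave-one-out log-likelihood of a product-Gaussian KDE.

    h is a sequence with one bandwidth per dimension (diagonal bandwidth matrix).
    """
    n, d = len(data), len(h)
    log_norm = (math.log(n - 1) + sum(math.log(hk) for hk in h)
                + 0.5 * d * math.log(2.0 * math.pi))
    total = 0.0
    for i, x in enumerate(data):
        s = 0.0
        for j, y in enumerate(data):
            if j != i:  # leave the i-th observation out of its own estimate
                s += math.exp(-0.5 * sum(((a - b) / hk) ** 2
                                         for a, b, hk in zip(x, y, h)))
        if s == 0.0:
            return float("-inf")  # bandwidth too small: a held-out point has zero density
        total += math.log(s) - log_norm
    return total

def select_bandwidth(data, grid):
    """Return the tuple of per-dimension bandwidths from grid that maximises the criterion."""
    d = len(data[0])
    return max(product(grid, repeat=d), key=lambda h: loo_log_likelihood(data, h))

# Hypothetical one-dimensional data with unit spacing.
toy = [(0.0,), (1.0,), (2.0,), (3.0,)]
best = select_bandwidth(toy, [1e-3, 1.0, 1000.0])
```

A grid over `d` bandwidths grows exponentially with the dimension, which is the kind of cost the abstract says the sampling-based approach avoids.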

Journal Article•10.1007/S11222-016-9706-6•

The locally Gaussian density estimator for multivariate data

Håkon Otneim¹, Dag Tjøstheim¹•Institutions (1)

University of Bergen¹

01 Nov 2017-Statistics and Computing

TL;DR: This paper presents the Locally Gaussian Density Estimator (LGDE), which introduces a similar idea to the problem of density estimation, and it is shown that the LGDE converges at a speed that does not depend on the dimension.

Abstract: It is well known that the Curse of Dimensionality causes the standard Kernel Density Estimator to break down quickly as the number of variables increases. In non-parametric regression, this effect is relieved in various ways, for example by assuming additivity or some other simplifying structure on the interaction between variables. This paper presents the Locally Gaussian Density Estimator (LGDE), which introduces a similar idea to the problem of density estimation. The LGDE is a new method for the non-parametric estimation of multivariate probability density functions. It is based on preliminary transformations of the marginal observation vectors towards standard normality, and a simplified local likelihood fit of the resulting distribution with standard normal marginals. The LGDE is introduced, and asymptotic theory is derived. In particular, it is shown that the LGDE converges at a speed that does not depend on the dimension. Examples using real and simulated data confirm that the new estimator performs very well on finite sample sizes.

41 citations

Journal Article•10.1111/GEB.12492•

A cautionary note on the use of hypervolume kernel density estimators in ecological niche modelling

Huijie Qiao¹, Luis E. Escobar², Erin E. Saupe³, Liqiang Ji¹, Jorge Soberón⁴•Institutions (4)

Chinese Academy of Sciences¹, University of Minnesota², Yale University³, University of Kansas⁴

01 Sep 2017-Global Ecology and Biogeography

TL;DR: In this paper, a new multivariate kernel density estimation (KDE) method was introduced to infer Hutchinsonian hypervolumes in the modelling of ecological niches, and the authors argued that their method matches or outperforms several methods for estimating hypervolume geometries and for conducting species distribution modelling.

Abstract: Blonder et al. (2014, Global Ecology and Biogeography, 23, 595–609) introduced a new multivariate kernel density estimation (KDE) method to infer Hutchinsonian hypervolumes in the modelling of ecological niches. The authors argued that their KDE method matches or outperforms several methods for estimating hypervolume geometries and for conducting species distribution modelling. Further clarification, however, is appropriate with respect to the assumptions and limitations of KDE as a method for species distribution modelling. Using virtual species and controlled environmental scenarios, we show that KDE both under- and overestimates niche volumes depending on the dimensionality of the dataset and the number of occurrence records considered. We suggest that KDE may be a viable approach when dealing with large sample sizes, limited sampling bias and only a few environmental dimensions.

39 citations

Journal Article•10.1016/J.CSDA.2016.09.001•

FFT-based fast bandwidth selector for multivariate kernel density estimation

Artur Gramacki¹, J. Gramacki¹•Institutions (1)

University of Zielona Góra¹

01 Feb 2017-Computational Statistics & Data Analysis

TL;DR: In this article, a more general solution is presented where the above mentioned limitation is relaxed and the presented solution can be easily adopted also for the task of efficient computation of integrated density derivative functionals involving an arbitrary derivative order.

38 citations

Journal Article•10.1111/RSSA.12179•

Estimating the density of ethnic minorities and aged people in Berlin: multivariate kernel density estimation applied to sensitive georeferenced administrative data protected via measurement error

Marcus Groß¹, Ulrich Rendtel¹, Timo Schmid¹, Sebastian M. Schmon², N. Tzavidis³•Institutions (3)

Free University of Berlin¹, University of Oxford², University of Southampton³

01 Jan 2017-Journal of The Royal Statistical Society Series A-statistics in Society

TL;DR: This work proposes multivariate non-parametric kernel density estimation that reverses the rounding process by using a Bayesian measurement error model, applied to the Berlin register of residents for deriving density estimates of ethnic minorities and aged people.

Abstract: Modern systems of official statistics require the timely estimation of area-specific densities of subpopulations. Ideally estimates should be based on precise geocoded information, which is not available because of confidentiality constraints. One approach for ensuring confidentiality is by rounding the geoco-ordinates. We propose multivariate non-parametric kernel density estimation that reverses the rounding process by using a measurement error model. The methodology is applied to the Berlin register of residents for deriving density estimates of ethnic minorities and aged people. Estimates are used for identifying areas with a need for new advisory centres for migrants and infrastructure for older people.

33 citations

Proceedings Article•10.1145/3035918.3064035•

Scalable Kernel Density Classification via Threshold-Based Pruning

Edward Gan¹, Peter Bailis¹•Institutions (1)

Stanford University¹

9 May 2017

TL;DR: This paper introduces a simple technique for improving the performance of using a KDE to classify points by their density (density classification), and applies threshold-based pruning to spatial index traversal to achieve asymptotic speedups over naïve KDE, while maintaining accuracy guarantees.

Abstract: Density estimation forms a critical component of many analytics tasks including outlier detection, visualization, and statistical testing. These tasks often seek to classify data into high and low-density regions of a probability distribution. Kernel Density Estimation (KDE) is a powerful technique for computing these densities, offering excellent statistical accuracy but quadratic total runtime. In this paper, we introduce a simple technique for improving the performance of using a KDE to classify points by their density (density classification). Our technique, thresholded kernel density classification (tKDC), applies threshold-based pruning to spatial index traversal to achieve asymptotic speedups over naive KDE, while maintaining accuracy guarantees. Instead of exactly computing each point's exact density for use in classification, tKDC iteratively computes density bounds and short-circuits density computation as soon as bounds are either higher or lower than the target classification threshold. On a wide range of dataset sizes and dimensions, tKDC demonstrates empirical speedups of up to 1000x over alternatives.
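
The short-circuiting step described in the abstract above can be sketched without any spatial index. This is a deliberately reduced illustration, not tKDC: it scans the data linearly and bounds every unseen point's kernel value by the trivial range [0, 1], whereas the paper derives bounds from a spatial index traversal. The function name, the Gaussian kernel and the toy data are assumptions.

```python
import math

def classify_density(x, data, h, threshold):
    """Return (is_high_density, kernel_evaluations) with bound-based early exit."""
    n = len(data)
    d = len(x)
    norm = n * (h * math.sqrt(2.0 * math.pi)) ** d
    partial = 0.0
    for k, y in enumerate(data, start=1):
        sq = sum((a - b) ** 2 for a, b in zip(x, y))
        partial += math.exp(-0.5 * sq / (h * h))
        lower = partial / norm              # unseen points add at least 0
        upper = (partial + (n - k)) / norm  # each unseen point adds at most 1
        if lower > threshold:
            return True, k                  # already above the threshold: stop
        if upper < threshold:
            return False, k                 # can no longer reach the threshold: stop
    return partial / norm > threshold, n

# Hypothetical data: 100 points on a short segment near the origin.
cloud = [(0.01 * i, 0.0) for i in range(100)]
near = classify_density((0.0, 0.0), cloud, h=1.0, threshold=0.01)
far = classify_density((100.0, 0.0), cloud, h=1.0, threshold=0.01)
```

With such loose bounds the low-density case still needs most of the kernel evaluations; tighter bounds over groups of points are what would make the pruning effective in both directions.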

Journal Article•10.1080/10618600.2018.1549052•

Fast and stable multivariate kernel density estimation by fast sum updating.

Nicolas Langrené¹, Xavier Warin•Institutions (1)

Commonwealth Scientific and Industrial Research Organisation¹

04 Dec 2017-arXiv: Computation

TL;DR: In this article, the Fast Sum Updating approach is extended to the general multivariate case for general input data and rectilinear evaluation grid, including the triangular, cosine and Silverman kernels, and its combination with a fast approximate k-nearest-neighbors bandwidth for multivariate datasets.

Abstract: Kernel density estimation and kernel regression are powerful but computationally expensive techniques: a direct evaluation of kernel density estimates at $M$ evaluation points given $N$ input sample points requires a quadratic $\mathcal{O}(MN)$ operations, which is prohibitive for large scale problems. For this reason, approximate methods such as binning with Fast Fourier Transform or the Fast Gauss Transform have been proposed to speed up kernel density estimation. Among these fast methods, the Fast Sum Updating approach is an attractive alternative, as it is an exact method and its speed is independent of the input sample and the bandwidth. Unfortunately, this method, based on data sorting, has for the most part been limited to the univariate case. In this paper, we revisit the fast sum updating approach and extend it in several ways. Our main contribution is to extend it to the general multivariate case for general input data and rectilinear evaluation grid. Other contributions include its extension to a wider class of kernels, including the triangular, cosine and Silverman kernels, its combination with parsimonious additive multivariate kernels, and its combination with a fast approximate k-nearest-neighbors bandwidth for multivariate datasets. Our numerical tests of multivariate regression and density estimation confirm the speed, accuracy and stability of the method. We hope this paper will renew interest for the fast sum updating approach and help solve large-scale practical density estimation and regression problems.
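
For the univariate case that the abstract above starts from, the sorting-based idea can be sketched with a polynomial kernel. The version below uses the Epanechnikov kernel $K(u) = \tfrac{3}{4}(1-u^2)$ on $|u| \le 1$, which is an assumed choice for illustration: expanding $(x - x_i)^2$ lets the kernel sum be expressed through running sums of $1$, $x_i$ and $x_i^2$ over a sliding window. It is not the authors' multivariate algorithm, and the function names and data are hypothetical.

```python
def kde_epanechnikov_direct(samples, grid, h):
    """Reference O(M * N) evaluation, one density value per grid point."""
    n = len(samples)
    out = []
    for x in grid:
        s = 0.0
        for xi in samples:
            u = (x - xi) / h
            if abs(u) <= 1.0:
                s += 0.75 * (1.0 - u * u)
        out.append(s / (n * h))
    return out

def kde_epanechnikov_fast(samples, grid, h):
    """Sliding-window evaluation; results follow the sorted order of grid."""
    xs = sorted(samples)
    n = len(xs)
    lo = hi = 0          # window xs[lo:hi] holds the samples with |x - xi| <= h
    s0 = s1 = s2 = 0.0   # running sums of 1, xi and xi**2 over the window
    out = []
    for x in sorted(grid):
        while hi < n and xs[hi] <= x + h:   # samples entering on the right
            s0 += 1.0
            s1 += xs[hi]
            s2 += xs[hi] ** 2
            hi += 1
        while lo < hi and xs[lo] < x - h:   # samples leaving on the left
            s0 -= 1.0
            s1 -= xs[lo]
            s2 -= xs[lo] ** 2
            lo += 1
        # sum over the window of 1 - ((x - xi) / h)^2, expanded in powers of xi
        total = s0 - (x * x * s0 - 2.0 * x * s1 + s2) / (h * h)
        out.append(0.75 * total / (n * h))
    return out

# Hypothetical deterministic samples spread over [-3, 3) and an increasing grid.
samples = [((i * 37) % 101) / 101.0 * 6.0 - 3.0 for i in range(200)]
grid = [-3.0 + 0.1 * i for i in range(61)]
direct = kde_epanechnikov_direct(samples, grid, h=0.5)
fast = kde_epanechnikov_fast(samples, grid, h=0.5)
```

After sorting, each sample enters and leaves the window once, so the loop costs on the order of $M + N$ operations regardless of the bandwidth. Repeated additions and subtractions can accumulate rounding error in the running sums, which relates to the stability the title and abstract refer to.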

Journal Article•10.1007/S00521-015-2164-9•

Multi-kernel learning for multivariate performance measures optimization

Fan Lin¹, Jingbin Wang², Nian Zhang³, Jianbing Xiahou¹, Nancy McDonald⁴•Institutions (4)

Xiamen University¹, Chinese Academy of Sciences², Xiamen University of Technology³, Tulane University⁴

01 Aug 2017-Neural Computing and Applications

TL;DR: This paper investigates the problem of optimizing complex multivariate performance measures to learn classifiers for pattern classification problems and proposes to construct an optimal kernel by weighted linear combination of some candidate kernels.

Abstract: In this paper, we investigate the problem of optimizing complex multivariate performance measures to learn classifiers for pattern classification problems. For the first time, the multi-kernel learning is considered to construct a classifier to optimize a given nonlinear and non-smooth multivariate classifier performance measure. We estimate and optimize the upper bound of the given multivariate performance measure, instead of optimizing it directly. Moreover, to solve the problem of kernel function selection and kernel parameter tuning, we proposed to construct an optimal kernel by weighted linear combination of some candidate kernels. The learning of the classifier parameter and the kernel weight are unified in a single objective function considering minimizing the upper bound of the given multivariate performance measure. The objective function is optimized with regard to classifier parameter and kernel weight alternately in an iterative algorithm. The developed algorithm is evaluated on two different pattern classification methods with regard to various multivariate performance measure optimization problems. The experiment results show the proposed algorithm outperforms the competing methods.

Journal Article•10.1080/03610926.2015.1019144•

Multivariate wavelet density and regression estimators for stationary and ergodic discrete time processes: Asymptotic results

Salim Bouzebda¹, Sultana Didi²•Institutions (2)

University of Technology of Compiègne¹, Pierre-and-Marie-Curie University²

01 Feb 2017-Communications in Statistics-theory and Methods

TL;DR: The asymptotic normality of considered wavelet-based estimators, under easily verifiable conditions, is characterized, by means of the martingale approach.

Abstract: In the present paper, we are mainly concerned with the non parametric estimation of the density as well as the regression function by using orthonormal wavelet bases. We provide the strong uniform consistency properties with rates of these estimators, over compact subsets of , under a general ergodic condition on the underlying processes. We characterize the asymptotic normality of considered wavelet-based estimators, under easily verifiable conditions. The asymptotic properties of these estimators are obtained, by means of the martingale approach.

Journal Article•10.1021/ACS.IECR.6B04068•

Nonparametric Density Estimation of Hierarchical Probabilistic Graph Models for Assumption-Free Monitoring

Jiusun Zeng¹, Shihua Luo², Jinhui Cai¹, Uwe Kruger³, Lei Xie⁴•Institutions (4)

China Jiliang University¹, Jiangxi University of Finance and Economics², Rensselaer Polytechnic Institute³, Zhejiang University⁴

27 Jan 2017-Industrial & Engineering Chemistry Research

TL;DR: This article shows that decomposing the graphical model into a hierarchical structure reduces estimating a multivariate density function to the estimation of low-dimensional/conditional probabilities.

Abstract: Probabilistic graphical models, such as Bayesian networks, have recently gained attention in process monitoring and fault diagnosis. Their application, however, is limited to discrete or continuous Gaussian distributed variables, which results from the difficulty in efficiently estimating multivariate density functions. This article shows that decomposing the graphical model into a hierarchical structure reduces estimating a multivariate density function to the estimation of low-dimensional/conditional probabilities. These conditional density functions can be effectively estimated from data using a nonparametric kernel method and the low-dimensional densities can be estimated using a kernel density estimation (KDE). On the basis of the estimated densities, anomalous process behavior can be detected and diagnosed by examining which probability is lower than its corresponding confidence limit. Applications to simulated examples and an industrial blast furnace iron-making process show that the proposed metho...

Journal Article•10.1109/TCYB.2017.2648261•

An Extreme Learning Machine Approach to Density Estimation Problems

Cristiano Cervellera¹, Danilo Maccio¹•Institutions (1)

National Research Council¹

17 Jan 2017-IEEE Transactions on Systems, Man, and Cybernetics

TL;DR: Simulation tests show how ELMs can be successfully employed in the density estimation framework, as a possible alternative to other standard methods.

Abstract: In this paper, we discuss how the extreme learning machine (ELM) framework can be effectively employed in the unsupervised context of multivariate density estimation. In particular, two algorithms are introduced, one for the estimation of the cumulative distribution function underlying the observed data, and one for the estimation of the probability density function. The algorithms rely on the concept of $F$-discrepancy, which is closely related to the Kolmogorov–Smirnov criterion for goodness of fit. Both methods retain the key feature of the ELM of providing the solution through random assignment of the hidden feature map and a very light computational burden. A theoretical analysis is provided, discussing convergence under proper hypotheses on the chosen activation functions. Simulation tests show how ELMs can be successfully employed in the density estimation framework, as a possible alternative to other standard methods.

Journal Article•10.1016/J.JKSS.2016.09.002•

Inverse gamma kernel density estimation for nonnegative data

Yoshihide Kakizawa¹, Gaku Igarashi²•Institutions (2)

Hokkaido University¹, University of Tsukuba²

01 Jun 2017-Journal of The Korean Statistical Society

TL;DR: In this paper, a varying asymmetric kernel estimation of the density $f$ for nonnegative data is proposed, regardless of $f(0) = 0$ or $f(0) > 0$.

Abstract: This paper considers a varying asymmetric kernel estimation of the density $f$ for nonnegative data. Regardless of $f(0) = 0$ or $f(0) > 0$, it is important to give a good varying shape/scale parameter for the inverse gamma (IGam) kernel, due to the problem of $f(0) = 0$ in some existing literature. After reformulating the IGam kernel density estimator, asymptotic properties like mean integrated squared error, mean integrated absolute error, strong consistency, and asymptotic normality are investigated in detail, under some conditions on the target density $f$. Simulation studies are conducted to compare the proposed IGam kernel density estimators with the existing gamma kernel density estimators.

Journal Article•10.1214/16-AOS1486•

Operational time and in-sample density forecasting

Young K. Lee, Enno Mammen, Jens Perch Nielsen, Byeong U. Park

01 Jun 2017-Annals of Statistics

TL;DR: In this article, a new structural model for in-sample density forecasting is proposed, where the density is a product of one-dimensional functions with one function sitting on the scale of a transformed space of observations.

Abstract: In this paper we consider a new structural model for in-sample density forecasting. In-sample density forecasting is to estimate a structured density on a region where data are observed and then re-use the estimated structured density on some region where data are not observed. Our structural assumption is that the density is a product of one-dimensional functions with one function sitting on the scale of a transformed space of observations. The transformation involves another unknown one-dimensional function, so that our model is formulated via a known smooth function of three underlying unknown one-dimensional functions. We present an innovative way of estimating the one-dimensional functions and show that all the estimators of the three components achieve the optimal one-dimensional rate of convergence. We illustrate how one can use our approach by analyzing a real dataset, and also verify the tractable finite sample performance of the method via a simulation study.

Journal Article•10.1016/J.NEUCOM.2017.06.035•

A kernelized non-parametric classifier based on feature ranking in anisotropic Gaussian kernel

Razieh Sheikhpour¹, Mehdi Agha Sarram¹, Mohammad Ali Zare Chahooki¹, Robab Sheikhpour²•Institutions (2)

Yazd University¹, Shahid Sadoughi University of Medical Sciences and Health Services²

06 Dec 2017-Neurocomputing

TL;DR: A kernelized non-parametric classifier based on feature ranking in anisotropic Gaussian kernel (KNR-AGK), which focuses on the selection of different bandwidths in kernel density estimation and has better performance than Gaussian Kernel density estimation based classifier.

Journal Article•10.1016/J.SPL.2017.08.003•

Higher order kernel density estimation on the circle

Yasuhito Tsuruta¹, Masahiko Sagae¹•Institutions (1)

Kanazawa University¹

01 Dec 2017-Statistics & Probability Letters

TL;DR: A new class of $p$th-order kernels corresponding to new moments on the circle is introduced and two methods for constructing higher-order kernel density estimators are proposed and derived.

Proceedings Article•

Variable kernel density estimation in high-dimensional feature spaces

Christiaan M Van der Walt¹, Etienne Barnard²•Institutions (2)

Council for Scientific and Industrial Research¹, North-West University²

13 Feb 2017

TL;DR: This work derives a variable kernel bandwidth estimator by minimizing the leave-one-out entropy objective function and shows that this estimator is capable of performing estimation in high-dimensional feature spaces with great success.

Abstract: Estimating the joint probability density function of a dataset is a central task in many machine learning applications. In this work we address the fundamental problem of kernel bandwidth estimation for variable kernel density estimation in high-dimensional feature spaces. We derive a variable kernel bandwidth estimator by minimizing the leave-one-out entropy objective function and show that this estimator is capable of performing estimation in high-dimensional feature spaces with great success. We compare the performance of this estimator to state-of-the-art maximum-likelihood estimators on a number of representative high-dimensional machine learning tasks and show that the newly introduced minimum leave-one-out entropy estimator performs optimally on a number of high-dimensional datasets considered.

Proceedings Article•

Convergence rates of a partition based Bayesian multivariate density estimation method.

Linxi Liu, Dangna Li¹, Wing Hung Wong¹•Institutions (1)

Stanford University¹

1 Dec 2017

TL;DR: A class of non-parametric density estimators under Bayesian settings obtained by adaptively partitioning the sample space can adapt to the unknown smoothness of the true density function, thus achieving the optimal convergence rate without artificial conditions on the density.

Abstract: We study a class of non-parametric density estimators under Bayesian settings. The estimators are obtained by adaptively partitioning the sample space. Under a suitable prior, we analyze the concentration rate of the posterior distribution, and demonstrate that the rate does not directly depend on the dimension of the problem in several special cases. Another advantage of this class of Bayesian density estimators is that it can adapt to the unknown smoothness of the true density function, thus achieving the optimal convergence rate without artificial conditions on the density. We also validate the theoretical results on a variety of simulated data sets.

Book Chapter•10.1142/9789814663588_0010•

Nonparametric density estimation

[...]

Jayant V. Deshpande, Uttara Naik-Nimbalkar, Isha Dewan

1 Dec 2017

TL;DR: This chapter describes the background material related to nonparametric density estimation, gives a short overview of the fundamental concepts related to histograms, and then describes a smart extension of certain well-known histograms aimed at avoiding some of their drawbacks.

Abstract: This chapter describes the background material related to nonparametric density estimation. Techniques such as histograms (together with their extension, known as ASH, see Sect. 2.3), Parzen windows and k-nearest neighbors are at the core of the applications of nonparametric density estimation. For that reason, we decided to include a chapter describing these for the sake of completeness and to allow less experienced readers to develop their intuitions about nonparametric estimation. Most of the material is presented taking into account only the univariate case; extending the results to cover more than one variable, however, is often a straightforward task. The chapter is organized as follows: Sect. 2.2 presents a short overview of the fundamental concepts related to histograms. Sect. 2.3 is devoted to a description of a smart extension of certain well-known histograms aimed at avoiding some of their drawbacks. Sect. 2.4 presents basic concepts related to nonparametric density estimation. Sect. 2.5 is devoted to Parzen windows, while Sect. 2.6 covers the k-nearest neighbors approach.
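
The two estimators named in the abstract can be sketched in the univariate case as follows. This is a minimal illustration using the textbook definitions (a rectangular Parzen window and the k-th nearest-neighbour distance), not code taken from the chapter; the function names and the toy data are assumptions.

```python
def parzen_density(x, data, h):
    """Parzen-window estimate with a rectangular window of width h:
    the fraction of sample points within h/2 of x, divided by h."""
    count = sum(1 for y in data if abs(x - y) <= h / 2.0)
    return count / (len(data) * h)


def knn_density(x, data, k):
    """k-nearest-neighbour estimate k / (n * 2 * r_k), where r_k is the
    distance from x to its k-th nearest sample point (must be > 0)."""
    dists = sorted(abs(x - y) for y in data)
    r_k = dists[k - 1]
    return k / (len(data) * 2.0 * r_k)


# Hypothetical toy sample of eight observations.
data = [0.1, 0.4, 0.6, 0.9, 1.3, 1.7, 2.0, 2.2]
p = parzen_density(1.0, data, h=1.0)  # 3 points fall in [0.5, 1.5] -> 0.375
q = knn_density(1.0, data, k=3)       # r_3 = 0.4 -> 3 / (8 * 0.8) = 0.46875
```

The contrast is in what is held fixed: the Parzen window fixes the width and counts points, while the k-nearest-neighbour estimate fixes the count and lets the width adapt to the local density.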

Sparse Estimation of Travel Time Distributions Using Gamma Kernels

[...]

Deepthi Mary Dilip¹, Nikolaos M. Freris, Saif Eddin Jabari•Institutions (1)

Birla Institute of Technology and Science¹

1 Jan 2017

Journal Article•10.1016/J.INSMATHECO.2017.02.007•

Nonparametric estimation of the claim amount in the strong stability analysis of the classical risk model

[...]

A. Touazi¹, Zina Benouaret¹, Djamil Aïssani¹, Smail Adjabi¹•Institutions (1)

University of Béjaïa¹

01 May 2017-Insurance Mathematics & Economics

TL;DR: In this article, an extension of the strong stability analysis in risk models using nonparametric kernel density estimation for the claim amounts is presented. Different kernel estimators are proposed for the density of claim amounts in the real model, and a simulation study numerically compares the approximation errors obtained with the different proposed kernel densities.

Abstract: This paper presents an extension of the strong stability analysis in risk models using nonparametric kernel density estimation for the claim amounts. First, we detail the application of the strong stability method in risk models realized by V. Kalashnikov in 2000. In particular, we investigate the conditions and the approximation error of the real model, in which the probability distribution of the claim amounts is not known, by the classical risk model with exponentially distributed claim sizes. Using the nonparametric approach, we propose different kernel estimators for the density of claim amounts in the real model. A simulation study is performed to numerically compare the approximation errors (stability bounds) obtained using the different proposed kernel densities.

Proceedings Article•10.1109/SMACD.2017.7981609•

An accurate yield estimation approach for multivariate non-normal data in semiconductor quality analysis

[...]

Ingrid Kovacs¹, Marina Topa¹, Andi Buzo², Georg Pelz²•Institutions (2)

Technical University of Cluj-Napoca¹, Infineon Technologies²

1 Jun 2017

TL;DR: A multivariate distribution fitting methodology is introduced which, combined with multivariate random data sampling, provides a global yield estimation approach; its estimation variance is two times smaller than that of the simple failure counts method.

Abstract: The standard multivariate metrics for semiconductor product yield estimation and prediction in production processes usually assume that the parameters contributing to the yield are all normally distributed. However, the data met in production processes is not always multivariate normal. A variety of methods has been developed for multivariate non-normal data, but these usually rely on no statistical information, address only a specific type of multivariate distribution, or become very time consuming in terms of computational cost. Moreover, the sample size of the multivariate data is often insufficient, as only a limited number of measurements are affordable. This results in inaccurate product yield estimation and high variance of the estimates. In this paper, a multivariate distribution fitting methodology is introduced which, combined with multivariate random data sampling, provides a global yield estimation approach. Compared with the simple failure counts method, the estimation variance of the proposed method is two times smaller.

Journal Article•10.1080/03610926.2015.1044671•

Wavelet estimation for derivative of a density in a GARCH-type model

[...]

B.L.S. Prakasa Rao

04 Mar 2017-Communications in Statistics-theory and Methods

TL;DR: In this paper, the authors considered the GARCH-type model S = σ²Z where σ² and Z are independent random variables, and they constructed adaptive and non-adaptive wavelet estimators for the derivative of the density and obtained sharp upper bounds on their mean integrated squared errors.

Abstract: We consider the GARCH-type model S = σ²Z where σ² and Z are independent random variables. We assume that the density of σ² is unknown with support [0, 1] but differentiable, whereas the density f_S of S is bounded. We will also assume that the probability density function of the random variable Z is known and has the same distribution as the ν-fold product of independent random variables uniformly distributed on the interval [0, 1]. We want to estimate the derivative of the density of σ² from n independent and identically distributed observations of S. We will construct adaptive and non-adaptive wavelet estimators for the derivative of the density and obtain sharp upper bounds on their mean integrated squared errors.

Journal Article•10.1007/S11018-017-1228-X•

Analysis of Optimization Methods for Nonparametric Estimation of the Probability Density with Respect to the Blur Factor of Kernel Functions

[...]

A. V. Lapko¹,², V. A. Lapko¹,²•Institutions (2)

Siberian Federal University¹, Russian Academy of Sciences²

01 Sep 2017-Measurement Techniques

TL;DR: In this paper, the results of a comparison of the most common methods for optimizing the Rosenblatt–Parzen nonparametric estimate of the probability density with respect to the blur coefficients of its kernel functions are presented.

Abstract: The results of a comparison of the most common optimization methods for the Rosenblatt–Parzen nonparametric estimate of the probability density are presented. To select the optimal values of the blur coefficients of kernel functions, minimum conditions for the standard deviation of the nonparametric estimate of the probability density and the maximum of the likelihood function are used.

Journal Article•10.1109/TIM.2017.2657398•

Nonparametric Probability Density Estimation via Interpolation Filtering

[...]

Paolo Carbone¹, Dario Petri², Kurt Barbé³•Institutions (3)

University of Perugia¹, University of Trento², Vrije Universiteit Brussel³

01 Apr 2017-IEEE Transactions on Instrumentation and Measurement

TL;DR: By considering histogram data as a numerical sequence, a simple approach for PDF estimation is presented, and it is shown that the proposed approach is as accurate as kernel-based estimators, widely adopted in the statistical literature.

Abstract: In this paper, we discuss nonparametric estimation of the probability density function (PDF) of a univariate random variable. This problem has been the subject of a vast amount of scientific literature in many domains, while statisticians are mainly interested in the analysis of the properties of proposed estimators, and engineers treat the histogram as a ready-to-use tool for a data set analysis. By considering histogram data as a numerical sequence, a simple approach for PDF estimation is presented in this paper. It is based on basic notions related to the reconstruction of a continuous-time signal from a sequence of samples. When estimating continuous PDFs, it is shown that the proposed approach is as accurate as kernel-based estimators, widely adopted in the statistical literature. Conversely, it can provide better accuracy when the PDF to be estimated exhibits a discontinuous behavior. The main statistical properties of the proposed estimators are derived and then verified by simulations related to the common cases of normal and uniform density functions. The obtained results are also used to derive optimal, i.e., minimum integral of the mean square error, estimators.
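
The idea of treating histogram data as a sampled sequence and reconstructing a continuous PDF from it can be sketched as below. Piecewise-linear interpolation between bin centres is used here only as a simple stand-in; the interpolation filters analysed in the paper are not reproduced, and the function names, bin layout, and toy data are assumptions.

```python
def histogram_density(data, lo, hi, bins):
    """Return bin centres and normalized bin heights on [lo, hi).
    Heights times the bin width sum to 1 when all data lie in [lo, hi)."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in data:
        if lo <= x < hi:
            # min() guards against rounding just below the upper edge.
            counts[min(int((x - lo) / width), bins - 1)] += 1
    n = len(data)
    centres = [lo + (i + 0.5) * width for i in range(bins)]
    heights = [c / (n * width) for c in counts]
    return centres, heights


def interpolate_pdf(x, centres, heights):
    """Piecewise-linear reconstruction between bin centres.
    Returns 0 outside the outermost centres (a simplification)."""
    if x < centres[0] or x > centres[-1]:
        return 0.0
    for i in range(len(centres) - 1):
        x0, x1 = centres[i], centres[i + 1]
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)
            return (1.0 - t) * heights[i] + t * heights[i + 1]
    return heights[-1]


# Hypothetical toy sample; four unit-width bins on [0, 4).
data = [0.2, 0.7, 1.1, 1.4, 1.6, 2.5, 2.9, 3.3]
centres, heights = histogram_density(data, 0.0, 4.0, 4)
value = interpolate_pdf(1.0, centres, heights)  # midway between 0.25 and 0.375
```

With the toy sample the bin heights are 0.25, 0.375, 0.25 and 0.125, so the reconstruction at x = 1.0 is their midpoint average 0.3125.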

Posted Content•

Adaptive Clustering Using Kernel Density Estimators

[...]

Ingo Steinwart¹, Bharath K. Sriperumbudur², Philipp Thomann•Institutions (2)

University of Stuttgart¹, Pennsylvania State University²

17 Aug 2017-arXiv: Machine Learning

TL;DR: A generic, recursive algorithm for estimating all splits in a finite cluster tree as well as the corresponding clusters is derived and an adaptive data-driven strategy for choosing the kernel bandwidth is analyzed.

Abstract: We derive and analyze a generic, recursive algorithm for estimating all splits in a finite cluster tree as well as the corresponding clusters. We further investigate statistical properties of this generic clustering algorithm when it receives level set estimates from a kernel density estimator. In particular, we derive finite sample guarantees, consistency, rates of convergence, and an adaptive data-driven strategy for choosing the kernel bandwidth. For these results we do not need continuity assumptions on the density such as Hölder continuity, but only require intuitive geometric assumptions of non-parametric nature.
