Top 39 Computational Statistics papers published in 2000

TL;DR: This paper proposes extensions of principal component analysis (PCA) that represent images of hypercubes in a space of reduced dimensions, pointing out differences and similarities according to their structural features.

Abstract: The present paper deals with the study of continuous interval data by means of suitable Principal Component Analyses (PCA). Statistical units described by interval data can be regarded as special cases of Symbolic Objects (SO) (Diday, 1987). In Symbolic Data Analysis (SDA), these data are represented as hypercubes. In the present paper, we propose some extensions of PCA with the aim of representing, in a space of reduced dimensions, images of such hypercubes, pointing out differences and similarities according to their structural features.
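
To make the hypercube picture concrete, the sketch below implements a vertices-style PCA for two interval-valued variables: every object (a rectangle) is replaced by its corners, the first principal axis of all corners is computed, and each object is mapped back to the interval spanned by its projected corners. The vertices approach is one commonly discussed option for interval data and is an assumption here; the abstract does not say which PCA extensions the paper proposes. The function name `vertices_pca_2d` is hypothetical, and restricting to two variables keeps the eigen-decomposition in closed form.

```python
import math


def vertices_pca_2d(boxes):
    """Vertices-style PCA for two interval variables (illustrative variant).

    Each object is a rectangle ((lo1, hi1), (lo2, hi2)). All corners are pooled, the
    first principal axis is found, and each object is mapped to the interval spanned
    by its projected corners on that axis.
    """
    corners = [[(x, y) for x in box[0] for y in box[1]] for box in boxes]
    pts = [p for cs in corners for p in cs]
    n = len(pts)
    mx = sum(p[0] for p in pts) / n
    my = sum(p[1] for p in pts) / n
    sxx = sum((p[0] - mx) ** 2 for p in pts) / n
    syy = sum((p[1] - my) ** 2 for p in pts) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in pts) / n
    # Largest eigenvalue of the 2x2 covariance matrix, written to avoid negative roots.
    lam1 = (sxx + syy) / 2.0 + math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    if abs(sxy) > 1e-15:
        v = (sxy, lam1 - sxx)  # solves (S - lam1 I) v = 0
    else:
        v = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = math.hypot(*v)
    v = (v[0] / norm, v[1] / norm)
    images = []
    for cs in corners:
        scores = [(x - mx) * v[0] + (y - my) * v[1] for x, y in cs]
        images.append((min(scores), max(scores)))  # the object's image is an interval
    return v, lam1, images


# Demo (assumed data): five unit squares lying along the diagonal.
boxes = [((i, i + 1), (i, i + 1)) for i in range(5)]
v, lam1, images = vertices_pca_2d(boxes)
print(v)  # close to (0.707, 0.707) for this symmetric example
for lo, hi in images:
    print(round(lo, 3), round(hi, 3))
```

Each object's image on the first component is an interval rather than a point, so its position and width reflect the location and spread of the original hypercube.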

127 citations

Journal Article•10.1007/S001800000047•

A Comparison of Regression Spline Smoothing Procedures

[...]

Matt P. Wand¹

Harvard University¹

University of Tennessee at Chattanooga¹, Indiana University – Purdue University Indianapolis²

TL;DR: The authors review strategies for choosing among regression spline models and compare them through a simulation study, restricting attention to the univariate smoothing setting with Gaussian noise and the truncated polynomial regression spline basis.

Abstract: Regression spline smoothing involves modelling a regression function as a piecewise polynomial with a high number of pieces relative to the sample size. Because the number of possible models is so large, efficient strategies for choosing among them are required. In this paper we review approaches to this problem and compare them through a simulation study. For simplicity and conciseness we restrict attention to the univariate smoothing setting with Gaussian noise and the truncated polynomial regression spline basis.
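
As a concrete picture of the basis involved, here is a minimal sketch of least-squares fitting with the truncated polynomial basis 1, x, ..., x^p, (x - k_1)_+^p, ..., (x - k_K)_+^p for a fixed, hand-chosen set of knots. The function names are hypothetical, and the model-selection strategies the paper actually compares (choosing among the many possible knot configurations) are not shown.

```python
def truncated_power_basis(x, knots, degree=1):
    """Row of the truncated polynomial basis: 1, x, ..., x^p, then (x - k)_+^p per knot."""
    row = [x ** d for d in range(degree + 1)]
    row += [max(x - k, 0.0) ** degree for k in knots]
    return row


def solve(a, b):
    """Solve a small dense system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(r) + [v] for r, v in zip(a, b)]  # augmented matrix (inputs are not modified)
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def fit_regression_spline(xs, ys, knots, degree=1):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    rows = [truncated_power_basis(x, knots, degree) for x in xs]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve(xtx, xty)


# Demo: |x - 0.5| = 0.5 - x + 2 (x - 0.5)_+ is exactly representable with one knot at 0.5.
xs = [i / 20 for i in range(21)]
ys = [abs(x - 0.5) for x in xs]
beta = fit_regression_spline(xs, ys, knots=[0.5])
print(beta)  # approximately [0.5, -1.0, 2.0]
```

Solving the normal equations directly is adequate for a handful of knots; with many knots the truncated power basis becomes ill-conditioned, which is one reason B-spline bases are often preferred in practice.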

109 citations

Journal Article•10.1007/PL00022717•

A Multivariate and Asymmetric Generalization of Laplace Distribution

[...]

Tomasz J. Kozubowski¹, Krzysztof Podgórski²

TL;DR: This article shows that the class of limiting distributions of geometric random sums of i.i.d. random vectors, as the number of terms converges to infinity, consists of multivariate asymmetric distributions that are natural generalizations of univariate Laplace laws.

Abstract: Consider a sum of independent and identically distributed random vectors with finite second moments, where the number of terms has a geometric distribution independent of the summands. We show that the class of limiting distributions of such random sums, as the number of terms converges to infinity, consists of multivariate asymmetric distributions that are natural generalizations of univariate Laplace laws. We call these limits multivariate asymmetric Laplace laws. We give an explicit form of their multidimensional densities and show representations that effectively facilitate computer simulation of variates from this class. We also discuss the relation to other formerly considered classes of distributions containing Laplace laws.
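
The limit result and the simulation-friendly representations can be illustrated in one dimension. The sketch below assumes the normal variance-mean mixture Y = mW + σ√W·Z (W standard exponential, Z standard normal), the representation commonly given for the asymmetric Laplace law in the Laplace-distribution literature, and compares it with a scaled geometric random sum √p·Σ_{i=1}^{N}(X_i + m√p), where N is geometric with mean 1/p and X_i ~ N(0, σ²). The scaling constants, the value of p and the function names are illustrative assumptions, not taken from the paper. Both samplers should have mean m; the mixture has variance σ² + m², and the geometric sum has variance σ² + (1 - p)m², which approaches the same limit as p → 0.

```python
import math
import random


def al_mixture(m, sigma, rng):
    """One draw of Y = m*W + sigma*sqrt(W)*Z with W ~ Exp(1) and Z ~ N(0, 1)."""
    w = rng.expovariate(1.0)
    return m * w + sigma * math.sqrt(w) * rng.gauss(0.0, 1.0)


def geometric_sum(m, sigma, p, rng):
    """One draw of sqrt(p) * sum_{i=1}^{N} (X_i + m*sqrt(p)), X_i ~ N(0, sigma^2),
    where P(N = k) = (1 - p)^(k - 1) * p for k = 1, 2, ..."""
    # Inverse-transform draw of N; 1 - rng.random() lies in (0, 1], so log() is finite.
    n = 1 + math.floor(math.log(1.0 - rng.random()) / math.log(1.0 - p))
    shift = m * math.sqrt(p)
    return math.sqrt(p) * sum(rng.gauss(0.0, sigma) + shift for _ in range(n))


def mean_var(xs):
    """Sample mean and unbiased sample variance."""
    mu = sum(xs) / len(xs)
    return mu, sum((x - mu) ** 2 for x in xs) / (len(xs) - 1)


rng = random.Random(1)
mix = [al_mixture(1.0, 1.0, rng) for _ in range(20000)]
geo = [geometric_sum(1.0, 1.0, 0.05, rng) for _ in range(20000)]
print("mixture:       mean %.3f, var %.3f" % mean_var(mix))  # near 1 and 2
print("geometric sum: mean %.3f, var %.3f" % mean_var(geo))  # near 1 and 1.95
```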

70 citations

Journal Article•10.1007/S001800000034•

COMPSTAT 1998 — Proceedings in Computational Statistics

[...]

Martin Theus

61 citations

Journal Article•10.1007/S001800000038•

Posterior predictive checks: Principles and discussion

[...]

Johannes Berkhof¹, Iven Van Mechelen¹, Herbert Hoijtink²

Katholieke Universiteit Leuven¹, Utrecht University²

01 Jan 2000-Computational Statistics

TL;DR: This paper gives a description of posterior predictive checking (introduced by Rubin, 1984) for detecting departures between the data and the posited model and illustrates how the posterior predictive check can be used in practice.

Abstract: In this paper, we give a description of posterior predictive checking (introduced by Rubin, 1984) for detecting departures between the data and the posited model and illustrate how the posterior predictive check can be used in practice. We further discuss interpretability, frequency properties, and prior sensitivity of the posterior predictive p-value.
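
A posterior predictive check can be sketched in a few lines. The example below is a toy of our own, not the paper's illustration: a binary sequence is modelled as i.i.d. Bernoulli with a Beta(1, 1) prior, and the test quantity T(y) is the number of switches between adjacent values, which is sensitive to serial dependence the model ignores. The posterior predictive p-value is estimated as the proportion of replicated data sets, each generated from a fresh posterior draw of the success probability, for which T(y_rep) ≥ T(y). The function names and settings are assumptions.

```python
import random


def n_switches(seq):
    """Test quantity T(y): number of adjacent pairs whose values differ."""
    return sum(a != b for a, b in zip(seq, seq[1:]))


def posterior_predictive_pvalue(y, n_draws=2000, a=1.0, b=1.0, seed=0):
    """Monte Carlo estimate of P(T(y_rep) >= T(y) | y) for an i.i.d. Bernoulli model
    with a Beta(a, b) prior on the success probability theta (assumed model)."""
    rng = random.Random(seed)
    n, k = len(y), sum(y)
    t_obs = n_switches(y)
    exceed = 0
    for _ in range(n_draws):
        theta = rng.betavariate(a + k, b + n - k)              # posterior draw
        y_rep = [int(rng.random() < theta) for _ in range(n)]  # replicated data set
        exceed += n_switches(y_rep) >= t_obs
    return exceed / n_draws


alternating = [0, 1] * 10      # far more switches than an i.i.d. model expects
clumped = [0] * 10 + [1] * 10  # far fewer switches than expected
print(posterior_predictive_pvalue(alternating))  # close to 0
print(posterior_predictive_pvalue(clumped))      # close to 1
```

With this one-sided definition, values near 0 and near 1 both signal misfit with respect to the chosen test quantity; how such p-values should be interpreted, including their frequency properties and prior sensitivity, is what the paper goes on to discuss.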

49 citations

Journal Article•10.1007/S001800000035•

Statistical Analysis of Extreme Values — from Insurance, Finance, Hydrology and Other Fields

[...]

Hans-Peter Bäumer¹

University of Oldenburg¹

Royal Holloway, University of London¹

41 citations

Journal Article•10.1007/S001800050034•

Support vector machine learning algorithm and transduction

[...]

A. Gammerman¹

TL;DR: The paper reviews the Support Vector Machine, a recently developed method that transforms the original input vectors into a high-dimensional space and then, by applying the kernel technique, constructs a linear regression function or hyperplane in that space.

Abstract: The paper first reviews a recently developed method called the Support Vector Machine. The main feature of the method is to transform the original input vectors into a high-dimensional space and then construct a linear regression function or hyperplane in that space. The transformation is usually done by applying the kernel technique. The paper then shows that the same kernel technique can be applied to classical algorithms such as Ridge Regression. Finally, we present a new transductive learning algorithm that also allows us to compute confidence levels.
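
Applying the kernel technique to ridge regression can be sketched through the dual form: with Gram matrix K (K_ij = k(x_i, x_j)) and ridge parameter a, the coefficients are α = (K + aI)⁻¹y and predictions are f(x) = Σ_i α_i k(x_i, x). The sketch below uses a hand-written Gaussian-elimination solver to stay dependency-free; the kernels, parameter values and function names are illustrative assumptions, and the paper's transductive algorithm and confidence levels are not reproduced.

```python
import math


def solve(a, b):
    """Solve the dense linear system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def kernel_ridge_fit(xs, ys, kernel, ridge):
    """Dual coefficients alpha = (K + ridge * I)^(-1) y."""
    n = len(xs)
    gram = [[kernel(xs[i], xs[j]) + (ridge if i == j else 0.0) for j in range(n)]
            for i in range(n)]
    return solve(gram, ys)


def kernel_ridge_predict(xs, alpha, kernel, x_new):
    """Prediction f(x_new) = sum_i alpha_i * k(x_i, x_new)."""
    return sum(a * kernel(xi, x_new) for a, xi in zip(alpha, xs))


def linear_kernel(u, v):
    return u * v


def rbf_kernel(u, v, gamma=1.0):
    """Gaussian (RBF) kernel: an implicit map into an infinite-dimensional feature space."""
    return math.exp(-gamma * (u - v) ** 2)


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x for x in xs]
alpha = kernel_ridge_fit(xs, ys, linear_kernel, ridge=1e-3)
print(kernel_ridge_predict(xs, alpha, linear_kernel, 5.0))  # about 10 (linear ridge, no intercept)

ys_curve = [math.sin(x) for x in xs]
alpha_rbf = kernel_ridge_fit(xs, ys_curve, rbf_kernel, ridge=1e-3)
print(kernel_ridge_predict(xs, alpha_rbf, rbf_kernel, 2.5))  # nonlinear fit between data points
```

Only inner products k(x_i, x) are ever computed, so swapping the linear kernel for the RBF kernel changes the feature space without changing the algorithm.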

38 citations

Journal Article•10.1007/S001800050037•

Regression-based nearest neighbour hot decking

[...]

Seppo Laaksonen¹

Statistics Finland¹

TL;DR: The authors develop an imputation method that combines a multivariate regression model with nearest neighbour hot decking, and apply it successfully to complex cases where the variable being imputed is of ratio-scale type and contains a high number of unknown zero values.

Abstract: The paper develops an imputation method that takes advantage of both a multivariate regression model and a nearest neighbour hot decking method. The method is successfully applied to complex cases where the variable being imputed is of ratio-scale type and contains a high number of unknown zero values. The results obtained by means of the method are compared with those of two other techniques: (i) random hot decking and (ii) a two-step model-based method. The latter first uses logistic regression and then standard regression imputation. Our results do not point to a single conclusion. On average, regression-based nearest neighbour hot decking is the best, but the two-step model-based method also has some advantages. The paper cannot deal with other important questions, but we want to emphasise the importance of variance estimation: imputation introduces an additional variance component, called imputation variance. The paper also discusses a diagnostic test for the quality of imputations; this test checks how many times the same donor is used in imputing missing values.
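
The core idea can be sketched as follows: regression predictions are used only to measure closeness between units, while the imputed value is always a real value observed on a donor. This is a simplified illustration under stated assumptions: a single auxiliary variable and a straight-line fit stand in for the paper's multivariate regression model, ties go to the first donor found, and the donor-usage count mirrors the diagnostic mentioned in the abstract. The function names are hypothetical. Because imputed values are always observed values, the many zeros of a ratio-scale variable are reproduced rather than smoothed away.

```python
from collections import Counter


def ols_line(xs, ys):
    """Least-squares line y = a + b*x (one auxiliary variable, assumed for brevity)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def rbnn_hot_deck(x, y):
    """Impute each missing y (None) with the observed y of the respondent whose regression
    prediction is closest to the nonrespondent's prediction.
    Returns (completed values, Counter of how often each donor index was used)."""
    obs = [i for i, v in enumerate(y) if v is not None]
    a, b = ols_line([x[i] for i in obs], [y[i] for i in obs])
    pred = [a + b * xi for xi in x]
    completed, donors = list(y), Counter()
    for i, v in enumerate(y):
        if v is None:
            donor = min(obs, key=lambda j: abs(pred[j] - pred[i]))
            completed[i] = y[donor]  # a real observed value, so zeros can be imputed too
            donors[donor] += 1
    return completed, donors


x = [1.0, 2.0, 3.4, 4.0, 5.0, 6.2, 7.0, 8.5]
y = [0.0, 2.1, None, 4.2, 0.0, None, 7.1, None]  # None marks item nonresponse
completed, donors = rbnn_hot_deck(x, y)
print(completed)     # [0.0, 2.1, 4.2, 4.2, 0.0, 7.1, 7.1, 7.1]
print(dict(donors))  # donor-usage diagnostic: {3: 1, 6: 2}
```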

29 citations

Journal Article•10.1007/S001800000029•

A Wiener Germ approximation of the noncentral chi square distribution and of its quantiles

[...]

Spiridon Penev¹, Tenko Raykov²

University of New South Wales¹, Fordham University²

University of Washington¹

TL;DR: This work proposes an approximation algorithm with a solid theoretical background that is surprisingly accurate over an extremely large set of argument/parameter values, and that also gives reliable approximations of the quantiles of the distribution for large values of noncentrality and degrees of freedom.

Abstract: The cumulative distribution function (cdf) of the noncentral χ² distribution with ν > 0 degrees of freedom and noncentrality parameter δ² ≥ 0 is usually expressed as an infinite weighted sum of central χ² cdfs. For numerical evaluation, this infinite sum is approximated by a finite sum. For large values of the noncentrality parameter, the sum converges slowly. Alternative approximation algorithms have been proposed in the literature; a comparison of these is given in Johnson & Kotz (1970). Most of the approximation algorithms have advantages for certain values of the arguments/parameters and perform poorly for others. We propose an approximation algorithm that has a very solid theoretical background and is surprisingly accurate for an extremely large set of argument/parameter values. It is also applied to reliably approximate the quantiles of the distribution for large values of noncentrality and degrees of freedom. Although asymptotic in spirit (with respect to the degrees of freedom ν), the algorithm gives quite accurate approximations even down to ν = 1.
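
For reference, the conventional representation the abstract starts from, F(x; ν, δ²) = Σ_{j≥0} e^{-λ} λ^j / j! · P(χ²_{ν+2j} ≤ x) with λ = δ²/2 (assuming δ² is the usual noncentrality Σμ_i²), can be evaluated as below. This sketch implements that baseline series, not the authors' Wiener germ approximation; the truncation rule, tolerances and function names are assumptions. The number of terms it needs grows with δ², which is the slow convergence the abstract refers to.

```python
import math


def reg_lower_gamma(s, x, rel_tol=1e-15, max_iter=100000):
    """Regularized lower incomplete gamma function P(s, x), via its power series."""
    if x <= 0.0:
        return 0.0
    term = total = 1.0 / s
    for n in range(1, max_iter):
        term *= x / (s + n)
        total += term
        if term < total * rel_tol:
            break
    return total * math.exp(-x + s * math.log(x) - math.lgamma(s))


def chi2_cdf(x, nu):
    """Central chi-square cdf: P(nu / 2, x / 2)."""
    return reg_lower_gamma(nu / 2.0, x / 2.0)


def ncx2_cdf(x, nu, delta2, weight_tol=1e-14):
    """Noncentral chi-square cdf as a Poisson(delta2 / 2) mixture of central cdfs,
    truncated once past the Poisson mean with negligible weights."""
    lam = delta2 / 2.0
    if lam == 0.0:
        return chi2_cdf(x, nu)
    total, j = 0.0, 0
    while True:
        w = math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))  # Poisson weight
        total += w * chi2_cdf(x, nu + 2 * j)
        if j > lam and w < weight_tol:
            return total
        j += 1


print(chi2_cdf(3.841458820694124, 1))                # about 0.95
print(ncx2_cdf(5.0, 3, 0.0), ncx2_cdf(5.0, 3, 4.0))  # noncentrality lowers the cdf at x = 5
```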

Journal Article•10.1007/S001800000033•

Book Review: “Applied Smoothing Techniques for Data Analysis: The Kernel Approach with S-Plus Illustrations” by Adrian W. Bowman and Adelchi Azzalini

[...]

A. J. Rossini¹

Case Western Reserve University¹

Journal Article•10.1007/S001800000030•

A nonlinear Gauss-Seidel algorithm for inference about GLMM

[...]

Jiming Jiang¹

TL;DR: A nonlinear Gauss-Seidel type algorithm is proposed for computing the maximum posterior estimates of the random effects in a generalized linear mixed model (GLMM); the algorithm converges in virtually all typical GLMM situations.

Abstract: A nonlinear Gauss-Seidel type algorithm is proposed for computing the maximum posterior estimates of the random effects in a generalized linear mixed model. We show that the algorithm converges in virtually all typical situations of generalized linear mixed models. A numerical example shows the superiority of the proposed algorithm over the standard Newton-Raphson procedure when the number of random effects is large.

Book Chapter•10.1007/978-3-642-72253-0_57•

Symbolic Kernel Discriminant Analysis

[...]

Jean-Paul Rasson, Sandrine Lissoir¹

Université de Namur¹

Australian National University¹

TL;DR: This paper adapts the classical Bayesian discrimination rule to Symbolic Objects by estimating the a priori probabilities and using kernel density estimation.

Abstract: Current technological progress in hardware, databases and object-oriented languages implies the manipulation, storage and representation of objects with increasingly complex data. The notion of Symbolic Objects is introduced on the basis of Diday’s work, and most recent classification methods need to be adapted to this notion. The aim of this paper is to adapt the classical Bayesian discrimination rule to Symbolic Objects. This is done through estimation of the a priori probabilities and kernel density estimation.
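
The classical, single-valued rule that the paper adapts can be sketched as follows: class priors are estimated by sample proportions, each class density by a Gaussian kernel density estimate, and a new point is assigned to the class maximising prior times estimated density. The bandwidth value, toy data and function names are assumptions, and the extension to Symbolic Objects itself is not shown.

```python
import math
from collections import defaultdict


def gaussian_kde(sample, h):
    """Return a one-dimensional Gaussian kernel density estimate with bandwidth h."""
    c = 1.0 / (len(sample) * h * math.sqrt(2.0 * math.pi))
    return lambda x: c * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in sample)


def fit_kernel_discriminant(xs, labels, h=0.5):
    """Estimate class priors as sample proportions and class densities by KDE."""
    groups = defaultdict(list)
    for x, g in zip(xs, labels):
        groups[g].append(x)
    n = len(xs)
    return {g: (len(v) / n, gaussian_kde(v, h)) for g, v in groups.items()}


def classify(model, x):
    """Bayes rule: choose the class maximising prior * estimated density at x."""
    return max(model, key=lambda g: model[g][0] * model[g][1](x))


train_x = [0.1, 0.4, 0.3, -0.2, 2.9, 3.1, 3.3, 2.7, 3.0]
train_labels = ["a"] * 4 + ["b"] * 5
model = fit_kernel_discriminant(train_x, train_labels)
print(classify(model, 0.0), classify(model, 3.2))  # a b
```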

Journal Article•10.1007/S001800000037•

Posterior inference in the random intercept model based on samples obtained with Markov chain Monte Carlo methods

[...]

Herbert Hoijtink¹

Utrecht University¹

01 Sep 2000-Computational Statistics

TL;DR: This article introduces the augmented Gibbs sampler (a special case of Markov Chain Monte Carlo, MCMC), illustrated using the random intercept model, and discusses a nonstandard application of it to illustrate the power of MCMC methods.

Abstract: Many papers (including most of the papers in this issue of Computational Statistics) deal with Markov Chain Monte Carlo (MCMC) methods. This paper will give an introduction to the augmented Gibbs sampler (a special case of MCMC), illustrated using the random intercept model. A ‘nonstandard’ application of the augmented Gibbs sampler will be discussed to illustrate the power of MCMC methods. Furthermore, it will be shown that the posterior sample resulting from an application of MCMC can be used for more than assessing convergence and computing simple estimators such as the a posteriori expectation and standard deviation. Posterior samples give access to many other inferential possibilities. Using a simulation study, the frequency properties of some of these possibilities will be evaluated.
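
A minimal Gibbs sampler for the random intercept model y_ij = μ + b_i + e_ij, with b_i ~ N(0, τ²) and e_ij ~ N(0, σ²), illustrates the data-augmentation idea: the unobserved intercepts b_i are sampled alongside the parameters, which makes every full conditional a standard distribution. The priors (flat on μ, inverse gamma on τ² and σ²), run lengths, simulated data and the function name are assumptions, not the paper's specification. The last lines show one use of the posterior sample beyond means and standard deviations: the posterior mean of the intraclass correlation τ²/(τ² + σ²).

```python
import math
import random


def gibbs_random_intercept(y, n_iter=2000, burn_in=500, a0=0.001, b0=0.001, seed=0):
    """Data-augmented Gibbs sampler for y[i][j] = mu + b_i + e_ij.

    Assumed priors: flat on mu; inverse gamma IG(a0, b0) on tau2 = Var(b_i) and on
    sigma2 = Var(e_ij). The intercepts b_i are the augmented (latent) data.
    Returns a list of (mu, tau2, sigma2) draws after burn-in.
    """
    rng = random.Random(seed)
    n_groups = len(y)
    n_total = sum(len(g) for g in y)
    mu = sum(sum(g) for g in y) / n_total
    b = [0.0] * n_groups
    tau2 = sigma2 = 1.0
    draws = []
    for it in range(n_iter):
        # b_i | rest ~ Normal: group residuals combined with the N(0, tau2) prior.
        for i, g in enumerate(y):
            prec = len(g) / sigma2 + 1.0 / tau2
            mean = sum(v - mu for v in g) / sigma2 / prec
            b[i] = rng.gauss(mean, math.sqrt(1.0 / prec))
        # mu | rest ~ Normal centred on the average of y_ij - b_i (flat prior).
        centre = sum(v - b[i] for i, g in enumerate(y) for v in g) / n_total
        mu = rng.gauss(centre, math.sqrt(sigma2 / n_total))
        # Variances | rest ~ inverse gamma, drawn as 1 / Gamma(shape, scale = 1 / rate).
        sse = sum((v - mu - b[i]) ** 2 for i, g in enumerate(y) for v in g)
        sigma2 = 1.0 / rng.gammavariate(a0 + n_total / 2.0, 1.0 / (b0 + sse / 2.0))
        ssb = sum(bi * bi for bi in b)
        tau2 = 1.0 / rng.gammavariate(a0 + n_groups / 2.0, 1.0 / (b0 + ssb / 2.0))
        if it >= burn_in:
            draws.append((mu, tau2, sigma2))
    return draws


# Simulated data (assumed): 30 groups of 10, mu = 5, tau = 1, sigma = 0.5.
sim = random.Random(42)
data = []
for _ in range(30):
    bi = sim.gauss(0.0, 1.0)
    data.append([5.0 + bi + sim.gauss(0.0, 0.5) for _ in range(10)])

draws = gibbs_random_intercept(data)
post_mu = sum(d[0] for d in draws) / len(draws)
# A derived quantity read straight off the posterior sample: the intraclass correlation.
post_icc = sum(t / (t + s) for _, t, s in draws) / len(draws)
print(round(post_mu, 3), round(post_icc, 3))
```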

Journal Article•10.1007/PL00022715•

Parallel MARS Algorithm Based on B-splines

[...]

Sergey Bakin¹, Markus Hegland¹, Michael R. Osborne¹

Complutense University of Madrid¹

TL;DR: The authors propose using B-splines instead of truncated power basis functions in Friedman's MARS algorithm for flexible modelling of high-dimensional data; the resulting algorithm generates models competitive with those of the original MARS.

Abstract: We investigate one of the possible ways for improving Friedman’s Multivariate Adaptive Regression Splines (MARS) algorithm designed for flexible modelling of high-dimensional data. In our version of MARS called BMARS we use B-splines instead of truncated power basis functions. The fact that B-splines have compact support allows us to introduce the notion of a “scale” of a basis function. The algorithm starts building up models by using large-scale basis functions and switches over to a smaller scale after the fitting ability of the large scale splines has been exhausted. The process is repeated until the prespecified number of basis functions has been produced. In addition, we discuss a parallelisation of BMARS as well as an application of the algorithm to processing of a large commercial data set. The results demonstrate the computational efficiency of our algorithm and its ability to generate models competitive with those of the original MARS.
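
The compact support that lets BMARS attach a "scale" to each basis function is easy to see with the Cox-de Boor recursion. The sketch below evaluates a single B-spline of order k (degree k - 1) on a knot vector; the function is nonzero only on [t_i, t_{i+k}), so wider knot spacing gives a larger-scale basis function of the same shape. The coarse and fine knot sequences are illustrative assumptions, and the MARS-style forward search, the scale-switching rule and the parallelisation are not shown.

```python
def bspline_basis(i, k, t, x):
    """Value at x of the i-th B-spline of order k (degree k - 1) on knot vector t,
    via the Cox-de Boor recursion. It is zero outside [t[i], t[i + k])."""
    if k == 1:
        return 1.0 if t[i] <= x < t[i + 1] else 0.0
    left = right = 0.0
    if t[i + k - 1] > t[i]:  # skip 0/0 terms that arise with repeated knots
        left = (x - t[i]) / (t[i + k - 1] - t[i]) * bspline_basis(i, k - 1, t, x)
    if t[i + k] > t[i + 1]:
        right = (t[i + k] - x) / (t[i + k] - t[i + 1]) * bspline_basis(i + 1, k - 1, t, x)
    return left + right


fine = [float(v) for v in range(11)]    # knots 0, 1, ..., 10: small-scale cubic B-splines
coarse = [2.0 * v for v in range(11)]   # knots 0, 2, ..., 20: same shape, twice the support
print(bspline_basis(0, 4, fine, 2.0))   # 2/3, the peak of a uniform cubic B-spline
print(bspline_basis(0, 4, coarse, 4.0)) # also 2/3, reached over a support twice as wide
print(bspline_basis(0, 4, fine, 5.0))   # 0.0: outside the support [0, 4)
print(sum(bspline_basis(i, 4, fine, 5.5) for i in range(7)))  # partition of unity on [3, 7)
```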

Journal Article•10.1007/S001800050032•

Symbolic object description of strata by segmentation trees

[...]

M. Carmen Bravo¹, José M. García-Santesmases¹

Centers for Disease Control and Prevention¹, University of Georgia²

TL;DR: Based on a generalised recursive tree-building algorithm for populations partitioned into strata, a method is presented for obtaining simple descriptions of strata, including strata that share a common rule; the algorithm is extended to individuals described by probabilistic symbolic objects.

Abstract: Based on a generalised recursive tree-building algorithm for populations partitioned into strata, a method to obtain simple descriptions of strata is presented. Strata sharing a common rule are also obtained. Common predictors and a criterion variable describe the population in all strata or classes of individuals. The algorithm takes the strata structure into account during tree building and, at each step, combines maximisation of an information content measure for the criterion variable in a new binary partition of the population with the selection of decisional nodes, based on the quality of prediction for subsets of strata. Each decisional tree node is composed of a set of strata and a rule for individuals in these strata that jointly explain the criterion variable. The method fits naturally within symbolic data analysis. The input of the algorithm is composed of classes of individuals. The algorithm is extended to individuals described by probabilistic symbolic objects. As output, symbolic objects describe the tree, the decisional nodes and the strata.

Journal Article•10.1007/S001800000031•

The S-U algorithm for missing data problems

[...]

Glen A. Satten¹, Somnath Datta²

TL;DR: The paper presents a Monte Carlo method for finding the solution of an estimating equation that can be expressed as the expected value of a "full data" estimating equation, where the expectation is taken with respect to the distribution of the missing data given the observed data.

Abstract: We present a new Monte-Carlo method for finding the solution of an estimating equation that can be expressed as the expected value of a ‘full data’ estimating equation in which the expected value is with respect to the distribution of the missing data given the observed data. Equations such as these arise whenever the E-M algorithm can be used. The algorithm alternates between two steps: an S-step, in which the missing data are simulated, either from the conditional distribution described above or from a more convenient importance sampling distribution, and a U-step, in which parameters are updated using a closed-form expression that does not require a numerical maximization. We present two numerical examples to illustrate the method. Theoretical results are obtained establishing consistency and asymptotic normality of the approximate solution obtained by our method.
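
The alternation of S- and U-steps can be sketched on a toy problem where everything is in closed form. Assumptions, all ours rather than the paper's: exponential lifetimes with right censoring, the full-data estimating equation Σ_i (1 - λ t_i) = 0, and m simulated copies of the censored times per S-step. Memorylessness makes the conditional distribution of a censored time t given t > c equal to c + Exp(λ), and the U-step solves the averaged full-data equation directly, so no numerical maximisation is needed. The iteration should settle near the usual censored-data maximum likelihood estimate: the number of uncensored observations divided by the total recorded time. The function name is hypothetical.

```python
import random


def su_exponential_censored(times, censored, lam0=1.0, n_iter=100, m=500, seed=0):
    """S-U style iteration for an exponential rate with right-censored data (toy example).

    times[i] is the failure time, or the censoring time when censored[i] is True.
    Full-data estimating equation (assumed): sum_i (1 - lam * t_i) = 0, root n / sum(t_i).
    Returns the sequence of rate estimates.
    """
    rng = random.Random(seed)
    n = len(times)
    n_cens = sum(1 for c in censored if c)
    base = sum(times)  # every full-data time is at least its recorded value
    lam, trace = lam0, []
    for _ in range(n_iter):
        # S-step: m simulated completions; by memorylessness t | t > c is c + Exp(lam),
        # so only the excess beyond each censoring time needs simulating.
        excess = sum(rng.expovariate(lam) for _ in range(m * n_cens)) / m
        # U-step: closed-form root of the full-data equation averaged over the m copies.
        lam = n / (base + excess)
        trace.append(lam)
    return trace


times = [0.5, 1.2, 0.3, 2.0, 0.8, 1.5, 0.7, 3.0, 0.4, 1.1]
censored = [False, False, False, True, False, True, False, True, False, False]
trace = su_exponential_censored(times, censored)
estimate = sum(trace[-50:]) / 50  # average late iterates to damp Monte Carlo noise
mle = (len(times) - sum(censored)) / sum(times)  # 7 / 11.5
print(estimate, mle)
```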

Journal Article•10.1007/S001800050044•

Visions: New techniques and technologies in statistics

[...]

Edward J. Wegman¹

George Mason University¹

TL;DR: A futurist vision for data analysts is painted and some of the tools and techniques likely to be used, with respect to data mining, visualization and quantization methods, are discussed.

Abstract: This paper attempts to paint a futurist vision for data analysts and, in doing so, discusses some of the tools and techniques likely to be used. A major premise of this vision is that mathematical statistics, like classical mechanics, is essentially a completed discipline. A further premise is that changes in the nature, modes of collection, and scale of data make new tools and techniques inevitable. The complexity of algorithms and data structures implies an increased focus on algorithmic efficiency and, to some extent, more automated procedures. Suggestions for advancement in theory are made with respect to data mining, visualization and quantization methods. Suggestions are also made on likely architectures for digital text and data libraries, on modes of accessing distributed databases, and on the implications for collaboration.

Journal Article•10.1007/S001800000039•

Bayesian probabilistic extensions of a deterministic classification model

[...]

Iwin Leenen¹, Iven Van Mechelen¹, Andrew Gelman²

Katholieke Universiteit Leuven¹, Columbia University²

01 Jan 2000-Computational Statistics

TL;DR: This paper extends deterministic models for Boolean regression within a Bayesian framework to include a proper account of the uncertainty in the model estimates and various possibilities for model checking (using posterior predictive checks).

Abstract: This paper extends deterministic models for Boolean regression within a Bayesian framework. For a given binary criterion variable Y and a set of k binary predictor variables X1,…, Xk, a Boolean regression model is a conjunctive (or disjunctive) logical combination consisting of a subset S of the X variables, which predicts Y. Formally, Boolean regression models include a specification of a k-dimensional binary indicator vector (θ1,…,θk) with θj = 1 iff Xj ∈ S. In a probabilistic extension, a parameter π is added which represents the probability that the predicted value ŷi and the observed value yi differ (for any observation i). Within a Bayesian framework, the posterior distribution of the parameters (θ1,…, θk, π) is sought. The advantages of such a Bayesian approach include a proper account of the uncertainty in the model estimates and various possibilities for model checking (using posterior predictive checks). We illustrate this method with an example using real data.
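
For small k, the posterior over the predictor subset S can be computed exactly by enumeration, which makes the model's structure easy to inspect. The sketch below assumes a uniform prior over the 2^k subsets and a Uniform(0, 1) prior on π, so π integrates out analytically: a subset whose rule makes e mismatches on n observations has marginal likelihood B(e + 1, n - e + 1). Only the conjunctive variant is shown; the enumeration, the priors and the function names are illustrative assumptions rather than the paper's computational approach, and enumeration becomes infeasible as k grows.

```python
import math
from itertools import combinations, product


def predict_conjunctive(row, subset):
    """Conjunctive Boolean rule: predict 1 iff every predictor in the subset equals 1."""
    return int(all(row[j] == 1 for j in subset))


def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def boolean_regression_posterior(xs, ys):
    """Posterior over subsets S (uniform prior). Each prediction is wrong independently
    with probability pi ~ Uniform(0, 1), integrated out: B(e + 1, n - e + 1)."""
    k, n = len(xs[0]), len(ys)
    log_post = {}
    for size in range(k + 1):
        for subset in combinations(range(k), size):
            e = sum(predict_conjunctive(r, subset) != y for r, y in zip(xs, ys))
            log_post[subset] = log_beta(e + 1, n - e + 1)
    top = max(log_post.values())
    z = sum(math.exp(v - top) for v in log_post.values())
    return {s: math.exp(v - top) / z for s, v in log_post.items()}


xs = list(product([0, 1], repeat=3))
ys = [r[0] & r[1] for r in xs]  # true rule: Y = X1 and X2 (0-based predictors 0 and 1)
post = boolean_regression_posterior(xs, ys)
best = max(post, key=post.get)
print(best, round(post[best], 3))
```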

Journal Article•10.1007/S001800050031•

Outliers — finding and classifying which genuine and which spurious

[...]

Anna Bartkowiak¹, Adam Szustalewicz¹

University of Wrocław¹

TL;DR: The paper recalls the grand tour method, implemented in a dynamic graphics environment and endowed with dynamically changing concentration ellipses and count plots, and then classifies the outliers found by performing cluster analysis based on angular similarities of the suspected outliers.

Abstract: The paper presents our experience with identifying and verifying outlying data points. First, we recall the grand tour method implemented in a dynamic graphics environment and endowed with dynamically changing concentration ellipses and count plots, as proposed by Bartkowiak & Szustalewicz (1997). The method allows some data points to be selected and identified as suspected outliers. Next, we propose to classify the outliers found by performing cluster analysis based on angular similarities of the suspected outliers. The procedure returns bundles of data vectors that are similar with respect to their outlyingness. The considerations are illustrated with the Milk container data, analyzed previously by, among others, Atkinson (1994) and Muruzabal and Munoz (1997).

Journal Article•10.1007/S001800000028•

Exchangeable stable random vectors and their simulations

[...]

Adel Mohammadpour¹, A. Reza Soltani¹

Shiraz University¹

Erik Bergkvist, Per Johansson

TL;DR: A characterization of exchangeability of a stable random vector in terms of its spectral measure is given, and FORTRAN subroutines are written to simulate a desired exchangeable stable random vector and to create an exchangeable partition.

Abstract: This work concerns the simulation of an exchangeable stable random vector. A characterization of exchangeability of a stable random vector, in terms of its spectral measure, is given. Modarres and Nolan’s simulation method for stable random vectors is modified for the exchangeable case. FORTRAN subroutines to simulate a desired exchangeable stable random vector and to create an exchangeable partition are written.

Journal Article•10.1007/PL00022716•

Weighted Derivative Estimation of Quantal Response Models: Simulations and Applications to Choice of Truck Freight Carrier

[...]

National and Kapodistrian University of Athens¹

TL;DR: In this paper, the weighted average density derivative (WAD) estimator is used to estimate regression parameters up to scale under the assumption of a single-index model and the small sample performance of ratio estimators is studied.

Abstract: Under the assumption of a single-index model, the weighted average density derivative (WAD) estimator estimates regression parameters up to scale. The small-sample performance of ratio estimators is studied. For spherical errors in a latent variable specification, the WAD estimator performs similarly to the logit maximum likelihood estimator in terms of bias and mean square error (MSE). Under heteroskedastic errors, the WAD estimator performs better. In an empirical application concerning choices of freight transport, we find that the WAD estimator shows improved performance compared with standard parametric models in one of the two sectors studied.

Journal Article•10.1007/S001800050039•

Recent advances on metadata

[...]

H. Papageorgiou¹, Maria Vardaki¹, Fragkiskos Pentaris¹

TL;DR: Some of the latest research results in the area of metadata are summarized; the modelling of metainformation using templates and object-oriented models within a metadata database is first compared to the traditional use of simple verbal footnotes.

Abstract: This paper aims to summarise some of the latest results of research in the area of metadata. The modelling of metainformation using templates and object-oriented models within the context of a metadata database is first compared to the traditional way of using simple verbal footnotes. The possibility of further automating the procedures inside National Statistical Offices using metadata-guided statistical processing is discussed, and certain related aspects of man-machine interfaces are considered. Finally, selected topics concerning the quality of metadata and their integration inside large statistical information systems are examined.

Journal Article•10.1007/S001800000027•

Wavelet-based random densities

[...]

David Ríos Insua¹, Brani Vidakovic²

King Juan Carlos University¹, Duke University²

TL;DR: The theoretical properties of wavelet-based random densities are described, and random densities subject to standard constraints (smoothness, symmetry, unimodality, and skewness) are exhibited.

Abstract: In this paper we describe the theoretical properties of wavelet-based random densities and present algorithms for their generation. We exhibit random densities subject to some standard constraints: smoothness, symmetry, unimodality, and skewness. We also provide three relevant applications of wavelet-based random densities.

Journal Article•10.1007/S001800050035•

Metadata usage in the statistical production process

[...]

Wilfried Grossmann

TL;DR: A transformation model is sketched which allows the use of metadata in production activities and some conceptual issues for the architecture of metadata driven statistical processing systems are discussed.

Abstract: Based on an analysis of the statistical production process inside and outside national statistical offices, requirements for metadata structures accompanying statistical data are outlined. Furthermore, a transformation model is sketched that allows metadata to be used in production activities, and some conceptual issues for the architecture of metadata-driven statistical processing systems are discussed.

Journal Article•10.1007/S001800000045•

Are Regression Series Estimators Efficient in Practice? A Computational Comparison Study

[...]

Michel Delecroix, Camelia Protopopescu¹

Charité¹

TL;DR: In this paper, the authors compare the performance of series-type estimators with the results obtained by two of the most popular nonparametric regression estimation methods: kernel estimation and least-squares cubic splines.

Abstract: This paper is concerned with the practical performance of series-type estimators of a regression function. For different choices of orthonormal bases (Legendre polynomials, trigonometric functions, wavelets), we compare by simulation the performance of series-type estimators with the results obtained by two of the most popular nonparametric regression estimation methods: kernel estimation and least-squares cubic splines. It will be shown that orthonormal series estimators are competitive with these established nonparametric procedures. No agreement has emerged on the best method, the results being highly dependent on the nature of the estimated regression function.
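
A trigonometric (cosine) version of the series-type estimators compared in the paper can be sketched as follows. Assumptions: the design is equispaced on [0, 1] at cell midpoints, coefficients are estimated by simple averages θ_k = (1/n) Σ_i y_i φ_k(x_i) (which coincides with least squares for this design, because the sampled cosines are exactly orthogonal), and the number of terms is fixed by hand rather than chosen from the data. The test function, noise level and function names are illustrative.

```python
import math
import random


def cosine_basis(k, x):
    """Orthonormal cosine basis on [0, 1]: phi_0 = 1, phi_k = sqrt(2) cos(k pi x)."""
    return 1.0 if k == 0 else math.sqrt(2.0) * math.cos(k * math.pi * x)


def series_fit(xs, ys, n_terms):
    """Projection estimates theta_k = (1/n) sum_i y_i phi_k(x_i) for an equispaced design."""
    n = len(xs)
    return [sum(y * cosine_basis(k, x) for x, y in zip(xs, ys)) / n for k in range(n_terms)]


def series_predict(theta, x):
    """Truncated series estimate f_hat(x) = sum_k theta_k phi_k(x)."""
    return sum(t * cosine_basis(k, x) for k, t in enumerate(theta))


rng = random.Random(0)
n = 200
xs = [(i + 0.5) / n for i in range(n)]
ys = [math.sin(2 * math.pi * x) + rng.gauss(0.0, 0.3) for x in xs]
for n_terms in (3, 8, 40):
    theta = series_fit(xs, ys, n_terms)
    err = max(abs(series_predict(theta, x) - math.sin(2 * math.pi * x)) for x in xs)
    print(n_terms, round(err, 3))  # too few terms: bias; too many: variance
```

The truncation point plays the role of the bandwidth in kernel estimation, which is one reason comparisons between the methods depend so strongly on the regression function being estimated.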

Journal Article•10.1007/S001800050036•

The impact of EDI on statistical data processing

[...]

Wouter J. Keller¹, Jelke Bethlehem¹

Statistics Netherlands¹

TL;DR: In this paper, the authors argue that the change from paper questionnaires and paper publications to e-questionnaires and electronic publications makes a Business Process Redesign of National Statistical Institutes (NSI) inevitable.

Abstract: In a world facing so many and such rapid technological developments, the statistical processes of National Statistical Institutes (NSIs) cannot remain unchanged. This paper argues that the change from paper questionnaires and paper publications to electronic questionnaires and electronic publications makes a Business Process Redesign of NSIs inevitable. The traditional stove-pipe approach, where each individual survey had its own questionnaire and publications, must be replaced by a new approach focused on external sources (for data collection) and external customers (for dissemination). As a result, the internal processes will be integrated, and the focus will be on corporate databases instead of separate departmental databases. Some aspects of the new approach will be illustrated by examples of developments at Statistics Netherlands (SN).

Journal Article•10.1007/S001800000048•

Estimating the Inverse Autocorrelation Function from Outlier Contaminated Data

[...]

Richard H. Glendinning¹

Defence Research Agency¹

TL;DR: This paper shows how a number of commonly used estimates of the inverse autocorrelation function can be modified to deal with outlier-contaminated data; the robust analogues of the orthogonal and interpolation-based techniques appear to be new and provide an alternative to the robust autoregressive approach.

Abstract: We show how a number of commonly used estimates of the inverse autocorrelation function can be modified to deal with outlier contaminated data. The robust analogues of the orthogonal and interpolation based techniques appear to be new, and provide an alternative to the robust autoregressive approach. We examine the performance of these techniques in a large scale numerical experiment. This shows significant improvements in performance in outlier contaminated data when robust techniques are used. While there was no uniformly best robust technique, our experiments support the use of the autoregressive approach to avoid catastrophic reductions in performance, and robust interpolation for short series corrupted by few outliers.

Journal Article•10.1007/S001800000026•

Exact mean and mean squared error of the smoothed bootstrap mean integrated squared error estimator

[...]

Dominic S. Lee¹, Carey E. Priebe²

DSO National Laboratories¹, Johns Hopkins University²