Top 34 Statistical Methods and Applications papers published in 2013

Showing papers in "Statistical Methods and Applications in 2013"

Journal Article•10.1007/S10260-012-0220-5•

Detection of biomass change in a Norwegian mountain forest area using small footprint airborne laser scanner data

Ole Martin Bollandsås¹, Timothy G. Gregoire², Erik Næsset¹, Bernt-Håvard Øyen•Institutions (2)

Norwegian University of Life Sciences¹, Yale University²

01 Mar 2013-Statistical Methods and Applications

TL;DR: Evaluated approaches for estimation of change in biomass between two points in time by means of airborne laser scanner data indicated that the two direct approaches were better than relying on modeling biomass at both occasions and taking change as the difference between biomass estimates.

Abstract: Different approaches for estimation of change in biomass between two points in time by means of airborne laser scanner data were tested. Both field and laser data were collected at two occasions on 52 sample plots in a mountain forest in southeastern Norway. In the first approach, biomass change was estimated as the difference between predicted biomass for the two measurement occasions. Joint models for the biomass at both occasions were fitted using different height and density variables from laser data as explanatory variables. The second approach modelled the observed change directly using the change in different variables extracted from the laser data as explanatory variables. In the third approach we modelled the relative change in biomass. The explanatory variables were also expressed as relative change between measurement occasions. In all approaches we allowed spline terms to be entered. We also investigated the aptness of models for which the residual variance was modeled by allowing it to be proportional to the area of the plot on which biomass was assessed. All alternative models were initially assessed by AIC. All models were also evaluated by estimating biomass change on the model development data. This evaluation indicated that the two direct approaches (approach 2 and 3) were better than relying on modeling biomass at both occasions and taking change as the difference between biomass estimates. Approach 2 seemed to be slightly better than approach 3 based on assessments of bias in the evaluation.

78 citations

Journal Article•10.1007/S10260-012-0221-4•

Improved inference on capture recapture models with behavioural effects

Danilo Alunni Fegatelli¹, Luca Tardella¹•Institutions (1)

Sapienza University of Rome¹

01 Mar 2013-Statistical Methods and Applications

TL;DR: It is pointed out that a fully Bayesian analysis overcomes the likelihood failure phenomenon and the overall improved performance of alternative Bayesian estimators is investigated under different non-informative prior distributions verifying their comparative merits.

Abstract: In the context of capture-recapture modeling for estimating the unknown size of a finite population, a flexible framework is often required for dealing with a behavioural response to trapping. Many alternative settings have been proposed in the literature to account for the variation of capture probability at each occasion depending on the previous capture history. Inference is typically carried out relying on the so-called conditional likelihood approach. We highlight that such an approach may, with positive probability, lead to inferential pathologies such as unbounded estimates for the finite size of the population. The occurrence of such likelihood failures is characterized within a very general class of behavioural effect models. It is also pointed out that a fully Bayesian analysis overcomes the likelihood failure phenomenon. The overall improved performance of alternative Bayesian estimators is investigated under different non-informative prior distributions verifying their comparative merits with both simulated and real data.

20 citations

Journal Article•10.1007/S10260-012-0219-Y•

On the parameters of Zenga distribution

Alberto Arcagni¹, Francesco Porro¹•Institutions (1)

University of Milan¹

01 Aug 2013-Statistical Methods and Applications

TL;DR: The main properties of the Zenga distribution are summarized, its parameters are interpreted in terms of inequality, and an analytical solution of the method of moments is obtained.

Abstract: In 2010 Zenga introduced a new three-parameter model for distributions by size that can be used to represent income, wealth, financial and actuarial variables. This paper proposes a summary of its main properties, followed by a focus on the interpretation of the parameters in terms of inequality. The scale parameter μ is equal to the expectation, and it does not affect the inequality, while the two shape parameters α and θ are inverse and direct inequality indicators respectively. This result is obtained through stochastic orders based on inequality curves. A procedure to generate a random sample from the Zenga distribution is also proposed. The second part of this article looks at the parameter estimation. An analytical solution of the method of moments is obtained. This result is used as a starting point of numerical procedures to obtain maximum likelihood estimates both on ungrouped and grouped data. In the application, three empirical income distributions are considered and the aforementioned estimates are evaluated. A comparison with other well-known models is provided, by the evaluation of three goodness-of-fit indexes.

17 citations

Journal Article•10.1007/S10260-013-0240-9•

Discussion of “Model-based clustering with non-normal mixture distributions” by S. X. Lee and G. J. McLachlan

Christian Hennig¹•Institutions (1)

University College London¹

20 Sep 2013-Statistical Methods and Applications

TL;DR: This discussion welcomes Lee and McLachlan's overview as an encouragement to use mixtures of non-normal distributions in model-based clustering, and focuses its critical comments on what may be or should have been added.

Abstract: I’d first like to thank Lee and McLachlan for this very useful overview. Being a user of model-based clustering myself, the limitations of the normal distribution are all too obvious to me, so it is very welcome that the authors with their paper will inspire the increasing use of mixtures of non-normal distributions. The presented material looks very good to me, so critical comments will focus on what may be or should have been added.

12 citations

Journal Article•10.1007/S10260-012-0215-2•

Multilevel dimensionality-reduction methods

Pietro Giorgio Lovaglio, Giorgio Vittadini

01 Jun 2013-Statistical Methods and Applications

TL;DR: This paper proposes the multilevel version of the multivariate regression model and dimensionality-reduction methods (used to predict responses with fewer linear composites of explanatory variables) and a case study of an application focusing on the relationships between mental health severity and the intensity of care in the Lombardy region mental health system.

Abstract: When data sets are multilevel (group nesting or repeated measures), different sources of variations must be identified. In the framework of unsupervised analyses, multilevel simultaneous component analysis (MSCA) has recently been proposed as the most satisfactory option for analyzing multilevel data. MSCA estimates submodels for the different levels in data and thereby separates the “within”-subject and “between”-subject variations in the variables. Following the principles of MSCA and the strategy of decomposing the available data matrix into orthogonal blocks, and taking into account the between- and the within data structures, we generalize, in a multilevel perspective, multivariate models in which a matrix of response variables can be used to guide the projections (formed by responses predicted by explanatory variables or by a limited number of their combinations/composites) into choices of meaningful directions. To this end, the current paper proposes the multilevel version of the multivariate regression model and dimensionality-reduction methods (used to predict responses with fewer linear composites of explanatory variables). The principal finding of the study is that the minimization of the loss functions related to multivariate regression, principal-component regression, reduced-rank regression, and canonical-correlation regression is equivalent to the separate minimization of the sum of two separate loss functions corresponding to the between and within structures, under some constraints. The paper closes with a case study of an application focusing on the relationships between mental health severity and the intensity of care in the Lombardy region mental health system.

11 citations

Journal Article•10.1007/S10260-012-0211-6•

Variance predictors for isotropic geometric sampling, with applications in forestry

Luis M. Cruz-Orive¹•Institutions (1)

University of Cantabria¹

01 Mar 2013-Statistical Methods and Applications

TL;DR: A coherent set of explicit approximations is presented for the variance of planar area and volume estimators obtained under systematic geometric sampling, based on G. Matheron's transitive theory.

Abstract: A coherent set of explicit approximations is presented for the variance of planar area and volume estimators obtained under systematic geometric sampling. For planar objects (e.g. a land plot, or a tissue section), sampling is considered with test systems of points, lines, segments, stripes, or quadrats. For three dimensional objects analogous probes are considered. For the formulae to apply the design has to be uniform random (which suffices to estimate planar area or volume only) and also isotropic. The formulae are based on G. Matheron’s transitive theory. A synthetic example on the estimation of canopy cover is explained in detail.

10 citations

Journal Article•10.1007/S10260-012-0222-3•

Adaptive web sampling in ecology

Steven K. Thompson¹•Institutions (1)

Simon Fraser University¹

01 Mar 2013-Statistical Methods and Applications

TL;DR: Adaptive sampling strategies for ecological and environmental studies are described and design-based and model-based approaches to inference with adaptive sampling strategies are summarized.

Abstract: Adaptive sampling strategies for ecological and environmental studies are described in this paper. The motivations for adaptive sampling are discussed. Developments in this area over recent decades are reviewed. Adaptive cluster sampling and a number of its variations are described. The newer class of adaptive web sampling designs and their spatial sampling uses are discussed. Case studies in the use of adaptive sampling strategies with ecological populations are cited. The nature of optimal sampling strategies is described. Design-based and model-based approaches to inference with adaptive sampling strategies are summarized.

9 citations

Journal Article•10.1007/S10260-012-0224-1•

Estimating common standard deviation of two normal populations with ordered means

Manas Ranjan Tripathy¹, Somesh Kumar², Nabendu Pal³•Institutions (3)

National Institute of Technology, Rourkela¹, Indian Institutes of Technology², University of Louisiana at Lafayette³

01 Aug 2013-Statistical Methods and Applications

TL;DR: A general minimaxity result is proved, a class of minimax estimators is derived, and an admissibility result is proved in this class.

Abstract: Independent random samples are taken from two normal populations with means $\mu _1$ and $\mu _2$ and a common unknown variance $\sigma ^2.$ It is known that $\mu _1\le \mu _2.$ In this paper, estimation of the common standard deviation $\sigma $ is considered with respect to a scale invariant loss function. A general minimaxity result is proved and a class of minimax estimators is derived. An admissibility result is proved in this class. Further a class of equivariant estimators with respect to a subgroup of affine group is considered and dominating estimators in this class are obtained. The risk performance of some of these estimators is compared numerically.

8 citations

Journal Article•10.1007/S10260-012-0213-4•

Quantile based stop-loss transform and its applications

N. Unnikrishnan Nair¹, Paduthol Godan Sankaran¹, S. M. Sunoj¹•Institutions (1)

Cochin University of Science and Technology¹

01 Jun 2013-Statistical Methods and Applications

TL;DR: Relationships of the scaled stop-loss transform curve with the Lorenz, Gini, Bonferroni and Leimkuhler curves are developed and distributional and geometric properties of the first and second order partial moments defined in terms of quantile function are discussed.

Abstract: Partial moments are extensively used in actuarial science for the analysis of risks. Since the first order partial moment provides the expected loss in a stop-loss treaty with infinite cover as a function of priority, it is referred to as the stop-loss transform. In the present work, we discuss distributional and geometric properties of the first and second order partial moments defined in terms of the quantile function. Relationships of the scaled stop-loss transform curve with the Lorenz, Gini, Bonferroni and Leimkuhler curves are developed.
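
As a rough illustration of the first order partial moment mentioned in this abstract, the sketch below evaluates the stop-loss transform in quantile form, P(u) = ∫ from u to 1 of (Q(p) − Q(u)) dp, which follows from E[(X − t)+] by substituting x = Q(p) and t = Q(u). The unit exponential quantile function, the midpoint rule and all function names are assumptions chosen for illustration; none of them is taken from the paper.

```python
import math

def exp_quantile(p):
    """Quantile function of the unit exponential distribution (illustrative choice)."""
    return -math.log(1.0 - p)

def stop_loss_quantile(Q, u, n=20_000):
    """Approximate P(u) = integral over (u, 1) of (Q(p) - Q(u)) dp with a midpoint rule."""
    h = (1.0 - u) / n
    q_u = Q(u)
    total = 0.0
    for i in range(n):
        p = u + (i + 0.5) * h  # midpoints stay strictly below p = 1
        total += Q(p) - q_u
    return total * h

# For the unit exponential, E[(X - t)+] = exp(-t), so P(u) should be close to 1 - u.
value = stop_loss_quantile(exp_quantile, 0.3)
```

The exponential case is convenient as a check because its stop-loss transform has a closed form; any other quantile function with a finite mean could be passed in its place.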

7 citations

Journal Article•10.1007/S10260-012-0210-7•

Nonparametric Phase-II monitoring for detecting monotone trend based on inverse sampling

Amitava Mukherjee¹•Institutions (1)

Indian Institute of Technology Madras¹

01 Jun 2013-Statistical Methods and Applications

TL;DR: Two nonparametric tests for the identity of some unknown univariate continuous distribution functions against monotone or unidirectional trend in location are developed.

Abstract: Recently, Mukherjee and Bandyopadhyay (J Stat Plan Inference, 2011, doi: 10.1016/j.jspi.2011.02.017) introduced some partially sequential tests for detecting linear trend among the incoming series of observations when a training sample is available a priori. Their work is very useful in econometric or environmental monitoring under certain situations. The present work is intended to generalize their tests to any monotone trend. We develop two nonparametric tests for the identity of some unknown univariate continuous distribution functions against monotone or unidirectional trend in location. One of these two tests is based on usual ranks and the other is based on sequential ranks. These are typical nonparametric tests for monitoring structural changes. The performance of the two tests is compared using asymptotic studies as well as numerical results based on Monte Carlo simulations. An illustration is offered using real data on the monthly production of a certain beverage.
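
The distinction between usual ranks and sequential ranks can be sketched as follows. This is only an illustration of the two ranking schemes under the usual definitions (rank within the full sample versus rank among the observations seen so far); the paper's test statistics are not reproduced, and the function names and the toy series are assumptions.

```python
def usual_ranks(xs):
    """Rank of each observation within the full sample (1 = smallest; ties not handled)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0] * len(xs)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks

def sequential_ranks(xs):
    """Rank of each observation among itself and all earlier observations."""
    return [1 + sum(1 for prev in xs[:i] if prev < x) for i, x in enumerate(xs)]

series = [2.1, 3.4, 1.7, 4.0, 3.9]  # hypothetical incoming observations
print(usual_ranks(series))       # [2, 3, 1, 5, 4]
print(sequential_ranks(series))  # [1, 2, 1, 4, 4]
```

Sequential ranks can be updated as each observation arrives without re-ranking the earlier ones, which is one reason they are natural in a monitoring setting.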

7 citations

Journal Article•10.1007/S10260-013-0249-0•

Rejoinder to the discussion of “Model-based clustering and classification with non-normal mixture distributions”

Sharon X. Lee¹, Geoffrey J. McLachlan¹•Institutions (1)

University of Queensland¹

22 Oct 2013-Statistical Methods and Applications

TL;DR: Responding in the sequel to the four contributions of the discussants (in alphabetical order), the authors shall refer to Christian Hennig as CH, Luis Angel Garcia-Escudero, Alfonso Gordaliza, and Augustin Mayo-Iscar as EGM, Giuliano Galimberti and Angela Montanari as GM, and Paul McNicholas, Ryan Browne, and Paula Murray as MBM.

Abstract: We thank the discussants for their thoughtful comments on our paper and the raising of issues for further examination. As it is not possible to respond to all comments here, our intent is to cover at least the main scientific points made. In responding in the sequel to the four contributions of the discussants (in alphabetical order), we shall refer to Christian Hennig as CH, Luis Angel Garcia-Escudero, Alfonso Gordaliza, and Augustin Mayo-Iscar as EGM, Giuliano Galimberti and Angela Montanari as GM, and Paul McNicholas, Ryan Browne, and Paula Murray as MBM. Our response to the latter contribution is somewhat longer than to each of the others since a number of questions are asked of us. We are keen to answer them as it gives us the opportunity to clarify further the underlying assumptions and hence the consequent limitations of the various proposals. Starting with the discussion of CH, a number of pertinent points are made about clustering. The notion of a cluster is a vague concept in a general context and so we can thus understand why CH made the statement that “reconstructing the given clusters in a data set for supervised classification is not of real scientific interest, and one may wonder whether such a setup can be seen as representative for a real unsupervised clustering task.” But in the context of the model-based approach as adopted in our paper, the clustering is undertaken within a probabilistic (mixture) framework, where each data point in the sample to be clustered is assumed to come from a mixture of a stipulated number of subpopulations with distributions specified by the component distributions of the mixture model. In the case where this framework can be viewed as being a realistic approximation to how the data were generated, it seems to us it is of scientific interest to see if the data points can be assigned to their population of origin

Journal Article•10.1007/S10260-012-0226-Z•

Nonparametric estimation of nonlinear dynamics by metric-based local linear approximation

Isao Shoji¹•Institutions (1)

University of Tsukuba¹

01 Jan 2013-Statistical Methods and Applications

TL;DR: Simulation studies and application to ECG signals show the proposed method is easy to manipulate and has performance comparable to or better than the first order local polynomial modeling.

Abstract: This paper discusses nonparametric estimation of nonlinear dynamical system models by a method of metric-based local linear approximation. We assume no functional form of a given model but estimate it from experimental data by approximating the curve implied by the function by the tangent plane around the neighborhood of a tangent point. To specify an appropriate neighborhood, we prepare a metric defined over the Euclidean space in which the curve exists and then evaluate the closeness to the tangent point according to the distances. The proposed method differs from the first order polynomial modeling in discerning the metric and the weighting function, but the first order polynomial modeling with Gaussian kernels is shown to be a special version of the proposed method. Simulation studies and application to ECG signals show the proposed method is easy to manipulate and has performance comparable to or better than the first order local polynomial modeling.

Journal Article•10.1007/S10260-012-0214-3•

Latent class models for financial data analysis: some statistical developments

Luca De Angelis¹•Institutions (1)

University of Bologna¹

01 Jun 2013-Statistical Methods and Applications

TL;DR: This work exploits the potential of latent class models for proposing an innovative framework for financial data analysis by stressing the latent nature of the most important financial variables, expected return and risk, and proposes a financial data classification consistent with the latent risk-return profile.

Abstract: I exploit the potential of latent class models for proposing an innovative framework for financial data analysis. By stressing the latent nature of the most important financial variables, expected return and risk, I am able to introduce a new methodological dimension in the analysis of financial phenomena. In my proposal, (i) I provide innovative measures of expected return and risk, (ii) I suggest a financial data classification consistent with the latent risk-return profile, and (iii) I propose a set of statistical methods for detecting and testing the number of groups of the new data classification. The results lead to an improvement in both risk measurement theory and practice and, if compared to traditional methods, allow for new insights into the analysis of financial data. Finally, I illustrate the potentiality of my proposal by investigating the European stock market and detailing the steps for the appropriate choice of a financial portfolio.

Journal Article•10.1007/S10260-013-0234-7•

Threshold selection for extremes under a semiparametric model

Juan Domingo Gonzalez¹, Daniela Rodriguez¹,², Mariela Sued¹,²•Institutions (2)

University of Buenos Aires¹, National Scientific and Technical Research Council²

07 Jul 2013-Statistical Methods and Applications

TL;DR: This work proposes a semiparametric likelihood procedure for the threshold selection for extreme values, which assumes there is a threshold above which the excess distribution belongs to the generalized Pareto family.

Abstract: In this work we propose a semiparametric likelihood procedure for the threshold selection for extreme values. This is achieved under a semiparametric model, which assumes there is a threshold above which the excess distribution belongs to the generalized Pareto family. The motivation for our proposal lies in a particular characterization of the threshold under the aforementioned model. A simulation study is performed to show empirically the properties of the proposal and we also compare it with other estimators.

Journal Article•10.1007/S10260-012-0208-1•

Local stationarity in small area estimation models

Roberto Benedetti¹, Monica Pratesi², Nicola Salvati²•Institutions (2)

University of Chieti-Pescara¹, University of Pisa²

01 Mar 2013-Statistical Methods and Applications

TL;DR: The results are promising and show that introducing local stationarity in a small area model may lead to useful improvements in the performance of the estimators.

Abstract: Small area estimators are often based on linear mixed models under the assumption that relationships among variables are stationary across the area of interest (Fay–Herriot models). This hypothesis is patently violated when the population is divided into heterogeneous latent subgroups. In this paper we propose a local Fay–Herriot model assisted by a Simulated Annealing algorithm to identify the latent subgroups of small areas. The value minimized through the Simulated Annealing algorithm is the sum of the estimated mean squared error (MSE) of the small area estimates. The technique is employed for small area estimates of erosion on agricultural land within the Rathbun Lake Watershed (IA, USA). The results are promising and show that introducing local stationarity in a small area model may lead to useful improvements in the performance of the estimators.

Journal Article•10.1007/S10260-013-0233-8•

Estimating health expectancy in presence of missing data: an application using HID survey

Cristina Giudici¹, Maria Felice Arezzo¹, N. Brouard²•Institutions (2)

Sapienza University of Rome¹, Institut national d'études démographiques²

09 Jul 2013-Statistical Methods and Applications

TL;DR: In this article, the authors estimate health transition probabilities using longitudinal data collected in France for the survey on handicaps, disabilities and dependencies from 1998 to 2001, using a Markov-based multi-state life table approach with two nonabsorbing states: able to perform all activities of daily living (ADLs) and unable or in need of help to perform one or more ADLs, and the absorbing state of death.

Abstract: In this article we estimate health transition probabilities using longitudinal data collected in France for the survey on handicaps, disabilities and dependencies from 1998 to 2001. Life expectancies with and without disabilities are estimated using a Markov-based multi-state life table approach with two non-absorbing states: able to perform all activities of daily living (ADLs) and unable or in need of help to perform one or more ADLs, and the absorbing state of death. The loss of follow-up between the two waves induces biases in the probabilities estimates: mortality estimates were biased upwards; also the incidence of recovery and the onset of disability seemed to be biased. Since individuals were not missing completely at random, we correct this bias by estimating health status for drop-outs using a non parametric model. After imputation, we found that at the age of 70 disability-free life expectancy decreases by 0.5 years, whereas the total life expectancy increases by 1 year. The slope of the stable prevalence increases, but it remains lower than the slope of the cross sectional prevalence. The gender differences on life expectancy did not change significantly after imputation. Globally, there is no evidence of a general reduction in ADL disability, as defined in our study. The added value of the study is the reduction of the bias induced by sample attrition.
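
To make the multi-state idea concrete, the sketch below computes expected time spent in two non-absorbing states (able, disabled) before absorption in death, using the fundamental matrix N = (I − Q)^(-1) of an absorbing Markov chain. It assumes constant, age-independent yearly transition probabilities with made-up values; the study itself estimates transition probabilities from survey data, so this is a simplified stand-in rather than the authors' procedure, and the function name is hypothetical.

```python
def fundamental_matrix(p_aa, p_ad, p_da, p_dd):
    """Return N = (I - Q)^(-1) for the transient states (able, disabled).

    N[i][j] is the expected number of periods spent in state j when starting
    in state i, counting the starting period.
    """
    a, b = 1.0 - p_aa, -p_ad
    c, d = -p_da, 1.0 - p_dd
    det = a * d - b * c
    if det <= 0.0:
        raise ValueError("death must be reachable from both transient states")
    # Closed-form inverse of the 2x2 matrix [[a, b], [c, d]].
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical yearly probabilities: able->able, able->disabled,
# disabled->able, disabled->disabled (the remainder in each row is death).
n = fundamental_matrix(0.85, 0.10, 0.10, 0.70)
disability_free_years = n[0][0]    # expected years able, starting able
total_years = n[0][0] + n[0][1]    # expected years alive, starting able
```

With age-dependent probabilities the same quantities are obtained by multiplying year-specific transition matrices instead of inverting a single one.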

Journal Article•10.1007/S10260-012-0212-5•

Sample size determination for the confidence interval of mean comparison adjusted by multiple covariates

Xiaofeng Steven Liu¹•Institutions (1)

University of South Carolina¹

01 Jun 2013-Statistical Methods and Applications

TL;DR: The current method of determining sample size for confidence intervals does not accommodate multiple covariate adjustment; this paper shows how sample size can be calculated to obtain a desired probability of achieving a predetermined width in the confidence interval of the mean comparison with multiple covariate adjustment.

Abstract: The current method of determining sample size for confidence intervals does not accommodate multiple covariate adjustment. Under the normality assumption, the effect of multiple covariate adjustment on the standard error of the mean comparison is related to a Hotelling $T^2$ statistic. Sample size can be calculated to obtain a desired probability of achieving a predetermined width in the confidence interval of the mean comparison with multiple covariate adjustment, given that the confidence interval includes the population parameter.

Journal Article•10.1007/S10260-013-0231-X•

A genealogy of Florence Nightingale, Charles Darwin, Francis Galton and Francis Ysidro Edgeworth with special reference to their Italian connections and an annexe on Beatrice Webb and Charles Booth

Richard William Farebrother

04 Jul 2013-Statistical Methods and Applications

TL;DR: It is shown that several leading natural scientists, statisticians and social scientists born between 1730 and 1930 are closely related by marriage, thereby forming what Annan has named an Intellectual Aristocracy.

Abstract: In this article we show that several leading natural scientists, statisticians and social scientists born between 1730 and 1930 are closely related by marriage, thereby forming what Annan (Studies in social history: a tribute to C. M. Trevelyan, Longmans, Green, London, pp 241–287, 1955) has named an Intellectual Aristocracy. We also establish that the first three individuals mentioned in our title had family connections with Italy.

Journal Article•10.1007/S10260-012-0225-0•

Predictive control of posterior robustness for sample size choice in a Bernoulli model

Fulvio De Santis¹, Maria Clara Fasciolo¹, Stefania Gubbiotti¹•Institutions (1)

Sapienza University of Rome¹

01 Aug 2013-Statistical Methods and Applications

TL;DR: The sample size determination problem in the context of robust Bayesian parameter estimation of the Bernoulli model is considered and criteria based on predictive distributions of lower bound, upper bound and range of the posterior quantity of interest are considered.

Abstract: In this article we consider the sample size determination problem in the context of robust Bayesian parameter estimation of the Bernoulli model. Following a robust approach, we consider classes of conjugate Beta prior distributions for the unknown parameter. We assume that inference is robust if posterior quantities of interest (such as point estimates and limits of credible intervals) do not change too much as the prior varies in the selected classes of priors. For the sample size problem, we consider criteria based on predictive distributions of lower bound, upper bound and range of the posterior quantity of interest. The sample size is selected so that, before observing the data, one is confident to observe a small value for the posterior range and, depending on design goals, a large (small) value of the lower (upper) bound of the quantity of interest. We also discuss relationships with and comparison to non robust and non informative Bayesian methods.
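
The notion of posterior robustness over a class of conjugate Beta priors can be illustrated with a small sketch. With a Beta(a, b) prior and s successes in n Bernoulli trials, the posterior is Beta(a + s, b + n − s), so the posterior mean is (a + s) / (a + b + n). The code below measures the range of that mean over a finite list of priors and searches for the smallest n whose worst-case range is below a tolerance. The finite prior class, the worst-case rule (used here in place of the paper's predictive criteria) and all names are assumptions made for illustration.

```python
def posterior_mean(a, b, s, n):
    """Posterior mean of the success probability under a Beta(a, b) prior."""
    return (a + s) / (a + b + n)

def posterior_range(priors, s, n):
    """Spread of the posterior mean as the prior varies over the class."""
    means = [posterior_mean(a, b, s, n) for a, b in priors]
    return max(means) - min(means)

def worst_case_range(priors, n):
    """Largest posterior range over every possible number of successes."""
    return max(posterior_range(priors, s, n) for s in range(n + 1))

def smallest_n(priors, tol, n_max=5_000):
    """Smallest sample size whose worst-case range is at most tol, or None."""
    for n in range(1, n_max + 1):
        if worst_case_range(priors, n) <= tol:
            return n
    return None

priors = [(1.0, 1.0), (2.0, 2.0), (4.0, 2.0)]  # hypothetical class of Beta priors
n_star = smallest_n(priors, tol=0.05)
```

A worst-case rule is more conservative than a predictive one, since it guards against every possible outcome rather than weighting outcomes by how likely they are before the data are observed.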

Journal Article•10.1007/S10260-013-0245-4•

Comments on: model-based clustering and classification with non-normal mixture distributions

Luis Angel García-Escudero¹, Alfonso Gordaliza¹, Agustín Mayo-Iscar¹•Institutions (1)

University of Valladolid¹

19 Oct 2013-Statistical Methods and Applications

Journal Article•10.1007/S10260-013-0247-2•

Discussion of “Model-based clustering and classification with non-normal mixture distributions” by S.X. Lee and G.J. McLachlan

Giuliano Galimberti¹, Angela Montanari¹•Institutions (1)

University of Bologna¹

20 Oct 2013-Statistical Methods and Applications

TL;DR: This paper focuses on the selection of the shape of the mixture components by comparing restricted skew t finite mixtures with unrestricted ones (which in the examples of the paper seem to give the best results), and wonders if such a strategy can effectively lead to the recovery of the “true” group structure.

Abstract: It is a great pleasure to have the chance of reading and commenting on this very interesting paper that provides a unified view on non-Gaussian mixture models. It is a very hot topic that has recently been receiving increasing attention in the literature. This paper is especially welcome as it offers to the reader an up-to-date review, with interesting stimuli for reflection and further insight. We would like to comment on the clustering side of the work. The many examples discussed in the paper show how the choice of the distributional shape for the mixture components can affect the clustering performance of the corresponding mixture model. The clustering results are assessed by comparison with a priori known information about group membership through the Adjusted Rand Index (ARI) or the misclassification rate. But in real applications the group structure is unknown and the researcher is faced with the need to derive the “best” clustering with no a priori information on group membership. In the literature on model-based clustering, this problem is often viewed as a model selection problem and likelihood-based criteria, such as BIC or ICL, are usually suggested. We wonder if such a strategy can effectively lead to the recovery of the “true” group structure and we try to give an answer through a simple simulation study (which has been performed using the R packages EMMIX-skew and EMMIX-uskew described in the paper). In particular we focus on the selection of the shape of the mixture components by comparing restricted skew t finite mixtures with unrestricted ones (which in the examples of the paper seem to give the best results). Starting from the Australian Institute of Sport data (Section 5.2 in Lee and McLachlan’s paper), a two-component mixture of unrestricted skew t distributions is fitted and the corresponding parameter estimates are used to simulate 500 datasets (with sample size equal to 202, i.e. the same size as the original dataset).
For each unit of these datasets, the generating mixture component is recorded and assumed to be the “true” class. Afterwards, on each sample, a two-component mixture model is fitted with both restricted and unrestricted skew t components, thus leading to two partitions of the units into two groups. The agreement between each clustering result and the corresponding “true” classification is evaluated through the ARI.
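The ARI comparison described in this abstract can be illustrated with a small sketch. The function name `adjusted_rand_index` and the toy labels are assumptions made for illustration; the abstract does not say how the ARI was computed in the simulation study, and the code below simply follows the standard pair-counting definition.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two partitions of the same units.

    Uses the pair-counting form: (index - expected) / (max - expected).
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("both label sequences must have the same length")
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to disagree on

    # Contingency table cells and its row/column margins.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both partitions
    return (sum_cells - expected) / (max_index - expected)


# Toy usage: the "true" generating component versus a fitted partition.
true_class = [0, 0, 0, 1, 1, 1]
fitted = [1, 1, 1, 0, 0, 0]  # same grouping, labels swapped
print(adjusted_rand_index(true_class, fitted))
```

Because the ARI depends only on which units are grouped together, swapping the component labels does not change it, which is why it suits comparisons between fitted mixture components and recorded generating components.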

Journal Article•10.1007/S10260-013-0235-6•

A two-sample test when data are contaminated

Denys Pommeret¹•Institutions (1)

Aix-Marseille University¹

09 Jul 2013-Statistical Methods and Applications

TL;DR: This paper considers the problem of testing whether two samples of contaminated data arise from the same distribution and proposes a test based on the polynomial moments of the difference between observations and noises.

Abstract: In this paper we consider the problem of testing whether two samples of contaminated data arise from the same distribution. It is assumed that the contaminations are additive noises with known or estimated moments. This situation can also be viewed as two signals observed before and after perturbations. The problem is then to test the equality of both perturbations. The test statistic is based on the polynomial moments of the difference between observations and noises. The test is very simple and allows one to compare two independent as well as two paired contaminated samples. A data-driven selection is proposed to choose automatically the number of polynomials involved. We present a simulation study in order to investigate the power of the proposed test in discrete and continuous cases. Real-data examples are presented to illustrate the method.

Journal Article•10.1007/S10260-013-0248-1•

Discussion of ‘Model-based clustering and classification with non-normal mixture distributions’ by Lee and McLachlan

Paul D. McNicholas¹, Ryan P. Browne¹, Paula M. Murray¹•Institutions (1)

University of Guelph¹

23 Oct 2013-Statistical Methods and Applications

Journal Article•10.1007/S10260-013-0244-5•

The association between multidose vaccinations and death: comparing case series methods when the first exposure changes the general risk of an event

Ronny Kuhnert¹, Stefania Spila-Alegiani², Gianpaolo Scalia Tomba³, Giuseppe Traversa², Mechtild Vennemann⁴, Hartmut Hecker⁵•Institutions (5)

Robert Koch Institute¹, National Institutes of Health², University of Rome Tor Vergata³, University of Münster⁴, Hochschule Hannover⁵

27 Sep 2013-Statistical Methods and Applications

TL;DR: This paper applied the SCCS method and Cox regression to data from the German study on sudden infant death (GeSID) and to a case series study from Italy (the HERA study) examining sudden unexpected deaths and vaccinations during the first 2 years of life.

Abstract: Many case-control studies have shown a protective effect of vaccinations on the risk of sudden unexplained death (SUD). In this paper we compare the properties of different statistical methods in this situation, when the first vaccination appears to reduce the overall risk of an event (SUD). The first method is the self controlled case series (SCCS) method, which considers only subjects with an event during the observation time. This method yields unbiased estimates in the situation of non-censoring events. We show by simulation studies that the second method considered, the adjusted SCCS method, underestimates the parameter of interest, the effect of the first dose, when the general risk of SUD is lower in control periods after the first vaccination than in the period before vaccination. This type of bias could be eliminated by considering only cases who had received at least one vaccination. Additionally, we compare the adjusted SCCS method with the Cox model as a third method. Cox regression can take into account the time before the first vaccination, and this method yields unbiased estimates at modest effect sizes and short risk periods. We applied the SCCS method and Cox regression to data from the German study on sudden infant death (GeSID) and to a case series study from Italy (the HERA study) examining sudden unexpected deaths and vaccinations during the first 2 years of life. We show that the adjusted SCCS analysis with all cases underestimates the vaccination effect, as expected from the simulation analyses. Using Cox regression, we examined the general risk reduction after vaccination, as was the focus of the above mentioned studies. With a relative incidence of 0.8, our results were less pronounced than in the case-control analysis of the GeSID study (adjusted odds ratio: 0.51). SCCS analyses of both the GeSID and HERA studies yielded very similar estimates for the first and second vaccine doses.

Journal Article•10.1007/S10260-013-0229-4•

Consistency of the estimator of binary response models based on AUC maximization

Igor Fedotenkov¹•Institutions (1)

University of Verona¹

02 Feb 2013-Statistical Methods and Applications

TL;DR: Compared to parametric methods, such as logit and probit, AUC maximization relaxes assumptions about error distribution, but imposes some restrictions on the distribution of explanatory variables, which can be easily checked, since this information is observable.

Abstract: This paper examines the asymptotic properties of a binary response model estimator based on maximization of the Area Under the receiver operating characteristic Curve (AUC). Given certain assumptions, AUC maximization is a consistent method of binary response model estimation up to normalizations. As the AUC is equivalent to the Mann-Whitney U statistic and the Wilcoxon rank test statistic, maximization of the area under the ROC curve is equivalent to maximization of the corresponding statistics. Compared to parametric methods, such as logit and probit, AUC maximization relaxes assumptions about the error distribution, but imposes some restrictions on the distribution of explanatory variables, which can be easily checked, since this information is observable.
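The equivalence between the AUC and the Mann-Whitney U statistic mentioned in this abstract can be checked numerically. The sketch below is only an illustration: the function names `auc_pairwise` and `auc_from_ranks` and the toy data are assumptions, and the code does not implement the estimator studied in the paper. It computes the AUC once by counting concordant pairs and once from the rank sum of the positive class.

```python
def auc_pairwise(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    u = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                u += 1.0
            elif p == q:
                u += 0.5
    return u / (len(pos) * len(neg))


def auc_from_ranks(scores, labels):
    """AUC from the Mann-Whitney U statistic, U = R1 - n1 (n1 + 1) / 2,

    where R1 is the sum of midranks of the positive observations.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        midrank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = midrank
        i = j + 1
    n1 = sum(1 for y in labels if y == 1)
    n0 = n - n1
    if n1 == 0 or n0 == 0:
        raise ValueError("need at least one positive and one negative label")
    r1 = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = r1 - n1 * (n1 + 1) / 2
    return u / (n1 * n0)


scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
print(auc_pairwise(scores, labels), auc_from_ranks(scores, labels))
```

Both functions return the same value because U divided by the number of positive-negative pairs is exactly the proportion of correctly ordered pairs.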

Journal Article•10.1007/S10260-012-0216-1•

On the use of MCMC computerized adaptive testing with empirical prior information to improve efficiency

Mariagiulia Matteucci¹, Bernard P. Veldkamp²•Institutions (2)

University of Bologna¹, University of Twente²

01 Oct 2013-Statistical Methods and Applications

TL;DR: By using both simulated and real data, it is proved that the introduction of empirical prior information in the estimation of candidate's ability within computerized adaptive testing produces more accurate ability estimates, especially for short tests and when reproducing boundary abilities.

Abstract: The paper deals with the introduction of empirical prior information in the estimation of a candidate’s ability within computerized adaptive testing (CAT). CAT is generally applied to improve the efficiency of test administration. In this paper, it is shown how the inclusion of background variables in both the initialization and the ability estimation improves the accuracy of ability estimates. In particular, a Gibbs sampler scheme is proposed in the phases of interim and final ability estimation. By using both simulated and real data, it is shown that the method produces more accurate ability estimates, especially for short tests and when reproducing boundary abilities. This implies that operational problems of CAT related to weak measurement precision under particular conditions can be reduced as well. In the empirical examples, the methods were applied to CAT for intelligence testing in the area of personnel selection and to educational measurement. Other promising applications would be in the medical world, where testing efficiency is of paramount importance as well.

Journal Article•10.1007/S10260-013-0236-5•

A test for bivariate normality with applications in microeconometric models

Riccardo Lucchetti, Claudia Pigini¹•Institutions (1)

University of Perugia¹

07 Aug 2013-Statistical Methods and Applications

TL;DR: A test for bivariate normality in imperfectly observed models, based on the information matrix test for censored models with bootstrap critical values, is proposed, and it is found that, while asymptotic critical values can be seriously misleading, the use of bootstrap critical values results in a test that has excellent size and power properties even in small samples.

Abstract: In this paper, we propose a test for bivariate normality in imperfectly observed models, based on the information matrix test for censored models with bootstrap critical values. In order to evaluate its properties, we run a comprehensive Monte Carlo experiment, in which we use the bivariate probit model and Heckman sample selection model as examples. We find that, while asymptotic critical values can be seriously misleading, the use of bootstrap critical values results in a test that has excellent size and power properties even in small samples. Since this procedure is relatively inexpensive from a computational viewpoint and is easy to generalise to models with arbitrary censoring schemes, we recommend it as an important and valuable testing tool.
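The general idea of replacing asymptotic critical values with bootstrap ones can be sketched as follows. This is not the information matrix test of the paper: the statistic used here (a scaled squared sample skewness for a univariate normal null) is a stand-in chosen only to keep the example short, and the names `skewness_statistic` and `bootstrap_critical_value` are hypothetical.

```python
import math
import random
import statistics


def skewness_statistic(x):
    """Stand-in test statistic: n * (sample skewness)^2 / 6.

    Asymptotically chi-square(1) under normality; used here only as a toy.
    """
    n = len(x)
    mean = statistics.fmean(x)
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    if m2 == 0:
        raise ValueError("sample has zero variance")
    return n * (m3 / m2 ** 1.5) ** 2 / 6.0


def bootstrap_critical_value(x, statistic, level=0.05, n_boot=999, seed=0):
    """Parametric bootstrap critical value under a fitted normal null.

    Simulates n_boot samples of the same size from N(mean, sd) estimated
    on x, recomputes the statistic on each, and returns the order
    statistic corresponding to the (1 - level) quantile.
    """
    rng = random.Random(seed)
    n = len(x)
    mu = statistics.fmean(x)
    sigma = statistics.pstdev(x)
    if sigma == 0:
        raise ValueError("sample has zero variance")
    draws = sorted(
        statistic([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(n_boot)
    )
    k = math.ceil((1 - level) * (n_boot + 1)) - 1  # 0-based index
    return draws[min(k, n_boot - 1)]


# Toy usage: reject normality if the observed statistic exceeds the
# bootstrap critical value rather than the asymptotic chi-square one.
data_rng = random.Random(42)
sample = [data_rng.gauss(0.0, 1.0) for _ in range(50)]
observed = skewness_statistic(sample)
critical = bootstrap_critical_value(sample, skewness_statistic)
print(observed > critical)
```

The design point is that the reference distribution is simulated at the actual sample size, so the critical value reflects small-sample behaviour instead of relying on the asymptotic approximation.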

Journal Article•10.1007/S10260-013-0237-4•

Model-based clustering and classification with non-normal mixture distributions

Sharon X. Lee¹, Geoffrey J. McLachlan¹•Institutions (1)

University of Queensland¹

31 Jul 2013-Statistical Methods and Applications

TL;DR: This paper considers some of these existing proposals of multivariate non-normal mixture models and compares the relative performance of restricted and unrestricted skew mixture models in clustering, discriminant analysis, and density estimation on six real datasets from flow cytometry, finance, and image analysis.

Abstract: Non-normal mixture distributions have received increasing attention in recent years. Finite mixtures of multivariate skew-symmetric distributions, in particular, the skew normal and skew $t$-mixture models, are emerging as promising extensions to the traditional normal and $t$-mixture models. Most of these parametric families of skew distributions are closely related, and can be classified into four forms under a recently proposed scheme, namely, the restricted, unrestricted, extended, and generalised forms. In this paper, we consider some of these existing proposals of multivariate non-normal mixture models and illustrate their practical use in several real applications. We first discuss the characterizations along with a brief account of some distributions belonging to the above classification scheme, then references for software implementation of EM-type algorithms for the estimation of the model parameters are given. We then compare the relative performance of restricted and unrestricted skew mixture models in clustering, discriminant analysis, and density estimation on six real datasets from flow cytometry, finance, and image analysis. We also compare the performance of mixtures of skew normal and $t$-component distributions with other non-normal component distributions, including mixtures with multivariate normal-inverse-Gaussian distributions, shifted asymmetric Laplace distributions and generalized hyperbolic distributions.

Journal Article•10.1007/S10260-012-0218-Z•

An empirical likelihood ratio based goodness-of-fit test for skew normality

Wei Ning¹, Grace Ngunkeng¹•Institutions (1)

Bowling Green State University¹

01 Jun 2013-Statistical Methods and Applications

TL;DR: An empirical likelihood ratio based goodness-of-fit test for skew normality is proposed and the asymptotic results of the test statistic under the null hypothesis and the alternative hypothesis are derived.

Abstract: In this paper, an empirical likelihood ratio based goodness-of-fit test for skew normality is proposed. The asymptotic results of the test statistic under the null hypothesis and the alternative hypothesis are derived. Simulations indicate that the Type I error of the proposed test can be well controlled for a given nominal level. The power comparison with other available tests shows that the proposed test is competitive. The test is applied to an IQ scores data set and the Australian Institute of Sport data set to illustrate the testing procedure.

Journal Article•10.1007/S10260-013-0232-9•

A new mobility index for transition matrices

Camilla Ferretti¹, Piero Ganugi²•Institutions (2)

Catholic University of the Sacred Heart¹, University of Parma²

30 May 2013-Statistical Methods and Applications

TL;DR: In this paper, the authors propose a mobility index able to grasp the prevailing direction in the evolution of a given set of statistical units, and define a whole family of directional indices as functions of the transition matrix, so that their absolute value measures the intensity of mobility, and their sign (+/-) represents the prevailing trend towards improvement/worsening in the dynamics under study.

Abstract: In this work we construct a mobility index able to grasp the prevailing direction in the evolution of a given set of statistical units. We consider the case of dynamics ruled by a transition matrix, whose states are based on an ordered economic variable (firm size or income, among others) such that the future position of an individual can be better or worse than the current one. The existing indices measure only the absolute value of mobility, without providing information about the main direction in the dynamics. We propose here a whole family of directional indices defined as functions of the transition matrix, so that their absolute value measures the intensity of mobility, and their sign ($+/-$) represents the prevailing direction towards improvement/worsening in the dynamics under study.
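The abstract does not give the formula of the proposed indices, so the sketch below is purely hypothetical and should not be read as the authors' definition. It only illustrates what a signed index computed from a transition matrix over ordered states could look like: each transition from state i to state j contributes its probability times the signed jump (j - i), rows are weighted by assumed weights (uniform by default), and the result is scaled to lie in [-1, 1].

```python
def directional_index(P, weights=None):
    """Hypothetical signed mobility index for a k x k transition matrix P.

    States are assumed ordered from worst (0) to best (k - 1).
    Positive values indicate prevailing upward moves, negative values
    prevailing downward moves; 0 means no net direction.
    """
    k = len(P)
    if k < 2 or any(len(row) != k for row in P):
        raise ValueError("P must be a square matrix with at least 2 states")
    if any(abs(sum(row) - 1.0) > 1e-9 for row in P):
        raise ValueError("each row of P must sum to 1")
    if weights is None:
        weights = [1.0 / k] * k  # assumption: equal weight for each state
    net_jump = sum(
        weights[i] * P[i][j] * (j - i)
        for i in range(k)
        for j in range(k)
    )
    return net_jump / (k - 1)  # largest possible jump is k - 1


# Toy matrices: mostly upward moves versus mostly downward moves.
P_up = [[0.5, 0.5, 0.0],
        [0.0, 0.5, 0.5],
        [0.0, 0.0, 1.0]]
P_down = [[1.0, 0.0, 0.0],
          [0.5, 0.5, 0.0],
          [0.0, 0.5, 0.5]]
print(directional_index(P_up), directional_index(P_down))
```

One limitation of this toy version is that upward and downward moves of equal weight cancel out, so a value near zero can hide substantial undirected mobility; it is meant only to make the role of the sign concrete.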
