TL;DR: In this article, it was shown that the largest eigenvalue of R_n almost surely converges to a constant provided n/p goes to a positive constant, and that, under two conditions on the ratio n/p, the empirical distribution of eigenvalues of R_n converges weakly to the Marcenko-Pastur law and the semi-circular law, respectively.
Abstract: Let X_n = (x_ij) be an n by p data matrix, where the n rows form a random sample of size n from a certain p-dimensional population distribution. Let R_n = (ρ_ij) be the p × p sample correlation coefficient matrix of X_n. Assuming that the x_ij's are independent and identically distributed (the x_ij's are required to be only independent when they are normals), we show that the largest eigenvalue of R_n almost surely converges to a constant provided n/p goes to a positive constant. Under two conditions on the ratio n/p, we show that the empirical distribution of eigenvalues of R_n converges weakly to the Marcenko-Pastur law and the semi-circular law, respectively. This work is motivated by testing the hypothesis, assuming population distribution N_p(μ, Σ), that the p variates are uncorrelated.
TL;DR: In this paper, it is shown that the estimators obtained based on the improved method of moments (IMM) may also be highly inefficient as compared to the estimators obtained by a proposed quasi-likelihood (QL) approach.
Abstract: It is well-known that the penalized quasi-likelihood (PQL) approach may not yield consistent estimators for the parameters of the generalized linear mixed model (GLMM). Jiang (1998) introduced a method of moments (MM) to estimate the parameters of the GLMM. The moment estimators may however be highly inefficient. To overcome this inefficiency problem, recently Jiang and Zhang (2001) suggest an improvement over the method of moments. It is however demonstrated in this paper that the estimators obtained based on the improved method of moments (IMM) may also be highly inefficient as compared to the estimators obtained based on a proposed quasi-likelihood (QL) approach. The QL estimators are consistent and highly efficient, the exact maximum likelihood estimators being fully efficient (i.e., optimal) which are however known to be difficult to compute.
TL;DR: In this article, a general result about identifiability of finite mixtures of a family of distributions is obtained via tail conditions on the corresponding characteristic functions, applied to location-scale families on the real line and to circular distributions.
Abstract: A general result about identifiability and strong identifiability of finite mixtures of a family of distributions is obtained via tail conditions on the corresponding characteristic functions. This is applied to location-scale families on the real line and to circular distributions. Particular cases include circular wrapped distributions of location-scale families, stable distributions and the d-dimensional wrapped normal distribution. Finally, counter examples are given which highlight differences between identifiability on the real line and on the circle.
TL;DR: In this article, goodness-of-fit tests for the logistic distribution are proposed that are based on weighted integrals involving empirical transforms, and the consistency of the test based on the empirical characteristic function as well as its asymptotic distribution under the null hypothesis are investigated.
Abstract: In this paper goodness-of-fit tests for the logistic distribution are proposed that are based on weighted integrals involving empirical transforms. The consistency of the test based on the empirical characteristic function as well as its asymptotic distribution under the null hypothesis are investigated. In a particular case, as the decay of the weight function tends to infinity the test statistic approaches a limit value. The resulting limit statistic is related to the first nonzero component of Neyman's smooth test for this distribution. The new tests are compared with other omnibus tests for the logistic distribution.
TL;DR: In this article, the estimations of regression coefficients in a partitioned weakly singular linear model are considered and questions concerning the Watson efficiency of the ordinary least squares estimator of a subset of the parameters with respect to the best linear unbiased estimator are investigated.
Abstract: We consider the estimations of regression coefficients in a partitioned weakly singular linear model and focus on questions concerning the Watson efficiency of the ordinary least squares estimator of a subset of the parameters with respect to the best linear unbiased estimator. Certain submodels are also considered. The conditions under which the Watson efficiency in the full model splits into a function of some other Watson efficiencies is given special attention. In particular, a new decomposition of the Watson efficiency into a product of three particular factors appears to be very useful.
TL;DR: In this article, a batch arrival queue with a Bernoulli vacation schedule is proposed, where after completion of a service the server either goes for a vacation of random length with probability θ (0 < θ < 1) or may continue to serve the next unit, if any, with probability (1 - θ), under a restricted admissibility policy of arriving batches.
Abstract: We consider a batch arrival queue with a Bernoulli vacation schedule, where after completion of a service the server either goes for a vacation of random length with probability θ (0 < θ < 1) or may continue to serve the next unit, if any, with probability (1 - θ), under a restricted admissibility policy of arriving batches. Unlike the usual batch arrival queueing system, the restricted admissibility policy differs during a busy period and a vacation period, and hence not all arriving batches are allowed to join the system at all times. We derive the steady state queue size distribution at a random point of time as well as at a departure epoch. We also obtain some important performance measures of this model. Moreover, this paper attempts to unify several classes of related batch arrival queueing systems.
TL;DR: In this paper, the authors considered a single server retrial queue with batch arrivals under the so-called linear retrial discipline, where each individual customer is subject to a control admission policy upon arrival.
Abstract: We consider a single server retrial queue with batch arrivals which operates under the so-called linear retrial discipline. In addition, each individual customer is subject to a control admission policy upon arrival. This model generalizes the classical M/G/1 retrial policy with arrivals in batches. We carry out an extensive analysis of the system, including existence of the stationary regime, embedded Markov chain, stochastic decomposition and calculation of the first moments.
TL;DR: The hierarchical Bayesian gene selection model for survival data is considered and the use of the methodology is demonstrated to diffuse large B-cell lymphoma (DLBCL) complementary DNA (cDNA) data and Breast Carcinomas data.
Abstract: Selection of significant genes via expression patterns is important in a microarray problem. Owing to small sample size and large number of variables (genes), the selection process can be unstable. This paper considers a hierarchical Bayesian gene selection model for survival data. In survival analysis the popular models are usually well suited for data with few covariates and many observations (subjects). In contrast, for a typical setting of gene expression data from DNA microarray, we need to consider the case where the number of covariates p exceeds the number of samples n. For a given vector of response values which are times to event (death or censored times) and p gene expressions (covariates), we address the issue of how to reduce the dimension by selecting the significant genes. This approach enables us to estimate the survival curve when n ≪ p. In our approach, rather than fixing the number of selected genes, we assign a prior distribution to this number. That way it creates additional flexibility by allowing the imposition of constraints, such as bounding the dimension via a prior, which in effect works as a penalty. To implement our methodology, we use a Markov Chain Monte Carlo (MCMC) method. We demonstrate the use of the methodology to diffuse large B-cell lymphoma (DLBCL) complementary DNA (cDNA) data and Breast Carcinomas data.
TL;DR: In this article, the authors analyse the M^X/G/1 queue with server vacations and an additional feature, reflecting various real-life situations, in which the server, upon finding an empty system at the end of a vacation, activates a timer of duration T and waits dormant.
Abstract: We analyse the M^X/G/1 queue with server vacations and an additional feature, reflecting various real-life situations, in which the server, upon finding an empty system at the end of a vacation, activates a timer of duration T and waits dormant. If a batch arrives during the dormant period, a new busy period starts, but if no arrivals occur, the server waits no more and takes another vacation. The M^X/G/1 queues with multiple or with single vacations become limiting cases of the above model when T → 0 or T → ∞, respectively.
TL;DR: Several assignment rules are suggested for designing good experiments to deal with the trade-off between run-size reduction and the possibly negligible effects in two-level factorial experiments.
Abstract: We need extra runs to design two-level factorial experiments in blocks of size two to estimate all the available effects, as is possible in experiments without blocking. The number of runs suggested is (n - p)2^(n - p) for 2^(n - p) fractional factorial experiments. In designing such an experiment, two issues need to be considered. First, the precision of estimates is usually different because different numbers of observations are used for estimation in the analysis of the resulting data. It is important to have more precise estimates of the effects with which we are most concerned. Second, the trade-off between run-size reduction and the possibly negligible effects is of significance, especially when the number of factors is large. To deal with these two issues, several assignment rules are suggested for designing good experiments.
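As a small arithmetic illustration of the run count quoted in this abstract, the Python sketch below evaluates (n - p)2^(n - p) for a 2^(n - p) fractional factorial design. The function name `runs_in_blocks_of_two` is a hypothetical helper introduced here for illustration; it is not taken from the paper.

```python
def runs_in_blocks_of_two(n: int, p: int) -> int:
    """Return (n - p) * 2**(n - p), the run count quoted in the abstract.

    n is the number of two-level factors and p the degree of fractionation,
    so the unblocked design has 2**(n - p) runs.
    """
    if p < 0 or n <= p:
        raise ValueError("need 0 <= p < n")
    k = n - p
    return k * 2 ** k


if __name__ == "__main__":
    # A 2^(5-2) design has 8 runs unblocked; the quoted formula gives 3 * 8.
    print(runs_in_blocks_of_two(5, 2))
```

The count grows quickly with n - p, which is consistent with the abstract's concern about the trade-off between run size and negligible effects.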
TL;DR: In this article, the authors present some approximation results for nonhomogeneous Markov chains that extend Hajnal's results from finite to general state space, and compare these extensions to Isaacson and Madsen's strong ergodicity conditions, which require the existence of stationary distributions.
Abstract: In this note we present some approximation results for non-homogeneous Markov chains that extend Hajnal's results from finite to general state space. These extensions will be compared to Isaacson and Madsen's strong ergodicity conditions that require the existence of stationary distributions.
TL;DR: In this article, Bayes estimates and predictors for a normal distribution are derived for α-absolute error losses, the LINEX losses and the entropy loss as a special case of the α-relative error losses.
Abstract: Comparisons of estimates between Bayes and frequentist methods are interesting and challenging topics in statistics. In this paper, Bayes estimates and predictors are derived for a normal distribution. The commonly used frequentist predictor such as the maximum likelihood estimate (MLE) is a "plug-in" procedure by substituting the MLE of μ into the predictive distribution. We examine Bayes prediction under the α-absolute error losses, the LINEX losses and the entropy loss as special case of the α-absolute error losses. If the variance is unknown, the joint conjugate prior is used to estimate the unknown mean for the α-absolute error losses and an ad hoc method by replacing the unknown variance by the sample variance for the LINEX losses. Bayes estimates are also extended to the linear combinations of regression coefficients. Under certain assumptions for a design matrix, the asymptotic expected losses are derived. Under suitable priors, Bayes estimate and predictor perform better than the MLE. Under the LINEX loss, the Bayes estimate under the Jeffreys prior is superior to the MLE. However, for prediction, it is not clear whether Bayes prediction or MLE performs better. Under some circumstances, even when one loss is the "true" loss function, Bayes estimate under another loss performs better than the Bayes estimate under the "true" loss. This serves as a warning to naive Bayesians who assume that Bayes methods always perform well regardless of circumstances.
TL;DR: In this paper, the authors compared the performance of restricted and unrestricted maximum likelihood estimators of two normal populations under LINEX (linear-exponential) loss functions, and showed that the restricted estimator is better than the unrestricted estimator under the Pitman nearness criterion.
Abstract: We compare the performance of restricted and unrestricted maximum likelihood estimators of the means $\mu_1$ and $\mu_2$, and the common variance $\sigma^2$, of two normal populations under LINEX (linear-exponential) loss functions, when it is known a priori that $\mu_1 \leq \mu_2$. If $\delta$ is any estimator of the real parameter $g(\boldsymbol{\theta})$, then the LINEX loss function is defined by $L(\boldsymbol{\theta}, \delta) = e^{a(\delta - g(\boldsymbol{\theta}))} - a(\delta - g(\boldsymbol{\theta})) - 1$, $a \neq 0$. We show that the restricted maximum likelihood estimator (MLE) $\hat{\mu}_1$ of $\mu_1$ is better than the unrestricted MLE $\bar{X}_1$ for $a \in [a_1, 0) \cup (0, \infty)$, where $a_1$ [...] 0, depending on the sample sizes, the restricted MLE $\hat{\mu}_2$ of $\mu_2$ is shown to be superior to the unrestricted MLE $\bar{X}_2$ for $a \in (-\infty, 0) \cup (0, a^*_1]$, and the two estimators are shown to be not comparable for $a > a^*_1$. Similar results are obtained for the simultaneous estimation of $(\mu_1, \mu_2)$ under the sum of LINEX loss functions. For the estimation of $\sigma^2$, we show that the restricted MLE $\hat{\sigma}^2$ is superior to the unrestricted MLE $S^2$ for $a \in (-\infty, 0) \cup (0, a_2]$, and the two estimators are shown to be not comparable for $a \in (a_2, a_3)$, where $0 < a_2 < a_3$ are constants depending on the sample sizes. Interestingly, for $a \in [a_3, (n_1 + n_2)/2)$, it turns out that the unrestricted MLE $S^2$ is better than the restricted MLE $\hat{\sigma}^2$. We also prove a conjecture of Gupta and Singh (1992) concerning the dominance of the restricted MLE $\hat{\sigma}^2$ over the unrestricted MLE $S^2$ under the Pitman nearness criterion. Finally, we generalize some of these results to the case of $k$ ($\geq 2$) normal populations.
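The LINEX loss defined in this abstract can be written directly as a function. The sketch below is only an illustration of the formula; the name `linex_loss` and its argument names are assumptions introduced here.

```python
import math


def linex_loss(delta: float, g: float, a: float) -> float:
    """LINEX loss exp(a*(delta - g)) - a*(delta - g) - 1 for shape a != 0.

    delta is the estimate and g the true value of the estimand g(theta).
    """
    if a == 0:
        raise ValueError("the LINEX loss is defined only for a != 0")
    d = delta - g
    return math.exp(a * d) - a * d - 1.0


if __name__ == "__main__":
    # With a > 0, overestimating by one unit costs more than underestimating
    # by one unit; the ordering reverses when a < 0.
    print(linex_loss(1.0, 0.0, 1.0), linex_loss(-1.0, 0.0, 1.0))
```

The loss is zero when the estimate equals the estimand and is asymmetric around that point, which is why results such as those in the abstract depend on the sign and size of a.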
TL;DR: In this article, the authors derive a necessary and sufficient condition under which two distributions have equal Fisher information in any order statistics, which can be used to define an equivalence relation on parametric distributions.
Abstract: Any collection of order statistics from two different probability distributions may contain equal Fisher information about a scalar parameter. We derive a necessary and sufficient condition under which two distributions have equal Fisher information in any order statistics. Hence this condition can be used to define an equivalence relation on parametric distributions. Within the location (scale) family of distributions, we show that this equivalence relation uniquely determines the parametric family by the values of the Fisher information about the location (scale) parameter in any order statistics. The results are used to derive some location-scale distribution and obtain a simple characterization in terms of the Fisher information in the sequence of the minimum order statistics.
TL;DR: In this paper, it was shown that gamma distributions provide models for departures from randomness since every neighbourhood of an exponential distribution contains a neighbourhood of gamma distributions, using an information theoretic metric topology.
Abstract: We show that gamma distributions provide models for departures from randomness since every neighbourhood of an exponential distribution contains a neighbourhood of gamma distributions, using an information theoretic metric topology. Moreover, every neighbourhood of the uniform distribution contains a neighbourhood of log-gamma distributions. We derive also the information geometry of the 3-manifold of McKay bivariate gamma distributions, which can provide a metrization of departures from randomness and departures from independence for bivariate processes. The curvature objects are derived, including those on three submanifolds. As in the case of bivariate normal manifolds, we have negative scalar curvature but here it is not constant and we show how it depends on correlation. These results have applications, for example, in the characterization of stochastic materials.
TL;DR: In this article, it was shown that the distribution of (X, Y) is a Lancaster probability if and only if the natural exponential families generated by the random variables U, V, W are simple quadratic.
Abstract: Let X and Y be two random variables on R^d and let (P_n)_{n ∈ N^d} and (Q_k)_{k ∈ N^d} be two bases of orthonormal polynomials with respect to the distributions of X and Y, respectively. The joint distribution of (X, Y) is called a Lancaster probability if the expectation E(P_n(X)Q_k(Y)) vanishes for n ≠ k. This paper concerns the characterization of Lancaster probabilities for the particular case X = U + V and Y = V + W, where U, V, W are independent random variables on R^d. It is shown that the distribution of (X, Y) is a Lancaster probability if and only if the natural exponential families generated by the random variables U, V, W are simple quadratic (the variance is a specific quadratic function of the mean). This result is an extension of Lancaster (1975). We also generalize the definition of Lancaster probabilities that we relate to the more general class of quadratic natural exponential families on R^d. Finally, two tests for independence and a goodness of fit test for these multivariate joint distributions are outlined.
TL;DR: In this article, an additive and a multiplicative random effect Poisson model is proposed to account for the unobserved heterogeneity of count data, which can be used for analysing data which shows overdispersion.
Abstract: It is well known that count data show overdispersion compared to the Poisson distribution, which is extensively used for the analysis of discrete data. In order to account for the unobserved heterogeneity, in this paper we introduce an additive and a multiplicative random effect Poisson model. The random effect is modelled by the gamma distribution and the inverse Gaussian distribution, and both univariate and multivariate models are developed. Expressions for the various conditional and marginal distributions are obtained, and the correlation introduced by sharing a common random effect is studied. Some computational aspects of the models developed are presented to illustrate the results. Thus the purpose of this paper is to provide some alternative models that can be used for analysing data which show overdispersion.
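To make the overdispersion idea concrete, here is a minimal simulation sketch of one multiplicative gamma random effect Poisson model. The parameterization is an assumption made for this example, not necessarily the paper's: Z follows a gamma distribution with mean 1 and variance 1/shape, and Y given Z is Poisson with mean mu * Z, so that Var(Y) = mu + mu^2/shape exceeds E(Y) = mu. All function names are hypothetical.

```python
import math
import random


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) variate by Knuth's multiplication method.

    Adequate for the small-to-moderate means used in this sketch.
    """
    threshold = math.exp(-lam)
    count, prod = 0, rng.random()
    while prod > threshold:
        count += 1
        prod *= rng.random()
    return count


def simulate_counts(mu: float, shape: float, n: int, seed: int = 0) -> list:
    """Simulate n counts with a multiplicative gamma random effect (assumed form)."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n):
        # gammavariate(alpha, beta) has mean alpha * beta, so this effect has mean 1.
        z = rng.gammavariate(shape, 1.0 / shape)
        counts.append(sample_poisson(mu * z, rng))
    return counts


def mean_and_variance(xs: list) -> tuple:
    """Sample mean and unbiased sample variance."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, v


if __name__ == "__main__":
    m, v = mean_and_variance(simulate_counts(mu=3.0, shape=2.0, n=20000, seed=1))
    print(f"mean={m:.3f} variance={v:.3f}")
```

Under a plain Poisson model the sample mean and variance would be close to each other; under this mixture the variance is visibly larger.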
TL;DR: In this article, an explicit expression for evaluating the expectation of the mean search time of a demanded item in equilibrium is provided for the 7-stable case and Kingman's results are recovered in the limit.
Abstract: The model for the so-called "heaps" problem as set in Kingman (1975) is considered and an explicit expression for evaluating the expectation of the mean search time of a demanded item in equilibrium is provided. Particular attention is devoted to the 7-stable case and Kingman's results are recovered in the limit.
TL;DR: The notion of similarity by means of a system of axioms provides a class of indices of similarity which includes most of those proposed in the literature, since the fit of a model of an empirical f.d. can be understood as "asymmetric" similarity.
Abstract: Perhaps the most important task of Descriptive Statistics is the comparison of frequency distributions (f.d.s.), i.e. the evaluation of their similarity or dissimilarity. Many indices of similarity (or dissimilarity) have been proposed in the literature. In this paper, we describe the notion of similarity by means of a system of axioms. The numerical representation of this system provides a class of indices of similarity which includes most of those proposed in the literature. Finally, since the fit of a model of an empirical f.d. can be understood as "asymmetric" similarity, the class of indices described here also includes indices of fit.
TL;DR: An alternative sampling-based method to fit a two-stage hierarchical model in which there is conjugacy conditional on the parameters in the second stage using the sampling importance resampling (SIR) algorithm.
Abstract: Although it is common practice to fit a complex Bayesian model using Markov chain Monte Carlo (MCMC) methods, we provide an alternative sampling-based method to fit a two-stage hierarchical model in which there is conjugacy conditional on the parameters in the second stage. Using the sampling importance resampling (SIR) algorithm, our method subsamples independent samples from an approximate joint posterior density. This is an alternative to a Metropolis-Hastings (MH) algorithm normally used to draw samples from the joint posterior density. We also provide comparison with a Metropolis (MET) algorithm. We illustrate our method using a Poisson regression model which has much interest for the analysis of rare events from small areas. We also illustrate our method using a relatively new logistic regression model. We use four examples, three on Poisson regression and one on logistic regression, and a simulation study on the Poisson regression model to assess the performance of our method relative to the MH and the MET algorithms.
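The generic sampling importance resampling (SIR) step mentioned in this abstract can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's hierarchical-model algorithm: the target is an unnormalized standard normal density, the proposal is a normal with standard deviation 2, and resampling is done with replacement. The name `sir_sample` and its arguments are hypothetical.

```python
import math
import random


def sir_sample(log_target, draw_proposal, log_proposal, n_draws, n_keep, rng):
    """Sampling importance resampling.

    1. Draw n_draws points from the proposal.
    2. Weight each point by target / proposal (densities may be unnormalized).
    3. Resample n_keep points with probability proportional to the weights.
    """
    draws = [draw_proposal(rng) for _ in range(n_draws)]
    log_w = [log_target(x) - log_proposal(x) for x in draws]
    max_lw = max(log_w)  # subtract the maximum for numerical stability
    weights = [math.exp(lw - max_lw) for lw in log_w]
    return rng.choices(draws, weights=weights, k=n_keep)


if __name__ == "__main__":
    rng = random.Random(0)
    sample = sir_sample(
        log_target=lambda x: -0.5 * x * x,        # N(0, 1), up to a constant
        draw_proposal=lambda r: r.gauss(0.0, 2.0),
        log_proposal=lambda x: -x * x / 8.0,      # N(0, 4), up to a constant
        n_draws=20000,
        n_keep=5000,
        rng=rng,
    )
    print(sum(sample) / len(sample))
```

Unlike a Metropolis-Hastings chain, the proposal draws here are independent of one another, and the quality of the result depends on how well the proposal covers the target.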
TL;DR: In this article, the authors investigated the optimal prediction of the linear predictable variable in the multivariate linear model with arbitrary rank, and the necessary and sufficient conditions for a linear predictor of such a variable to be admissible in the class of homogeneous linear predictors and the class of nonhomogeneous linear predictors are established, respectively.
Abstract: In this paper we investigate optimal prediction of the linear predictable variable in the multivariate linear model with arbitrary rank. The definition of the admissible linear predictor is given, and the necessary and sufficient conditions for a linear predictor of the linear predictable variable to be admissible in the class of homogeneous linear predictors and the class of nonhomogeneous linear predictors are established respectively.
TL;DR: In this article, a balanced fractional 2^m factorial design is given, derived from a simple array such that the general mean and all the main effects are estimable, where the four-factor and higher-order interactions are assumed to be negligible.
Abstract: Using the algebraic structure of the triangular multidimensional partially balanced association scheme and a matrix equation, we give a balanced fractional 2^m factorial design derived from a simple array such that the general mean and all the main effects are estimable, where the four-factor and higher-order interactions are assumed to be negligible. We also give optimal designs with respect to the generalized A-optimality criterion for 6 < m ≤ 8 when the number of assemblies is less than the number of non-negligible factorial effects.
TL;DR: In this article, a symmetric version of the Neyman-Pearson test is developed for discriminating between sets of hypotheses and is extended to encompass a new formulation of the problem of parameter estimation based on finite data sets.
Abstract: A symmetric version of the Neyman-Pearson test is developed for discriminating between sets of hypotheses and is extended to encompass a new formulation of the problem of parameter estimation based on finite data sets. Such problems can arise in distributed sensing and localization problems in sensor networks, where sensor data must be compressed to account for communication constraints. In this setting it is natural to focus on methods that balance coarse resolution of the estimates for achieving higher reliability.
TL;DR: In this paper, a distribution-free test for the two-sample scale problem in ranked set samples is proposed, assuming that the distribution functions of the populations have a common quantile.
Abstract: This paper proposes a distribution-free test for the two-sample scale problem in ranked set samples. The proposed test assumes that the distribution functions of the populations have a common quantile. It is shown that the new test is uniformly more efficient than its simple random sample analog. The paper also proposes a method to provide a consistent variance estimate of the test statistic under the null hypothesis and imperfect judgment ranking, so that the test is asymptotically distribution-free even under imperfect judgment ranking.
TL;DR: In this paper, a nonparametric Bayesian approach to the estimation of the adjustment coefficient for the distribution of the maximum of a random walk is performed, and the consistency and asymptotic normality of its posterior law are also proved under mild conditions.
Abstract: In this paper, a nonparametric Bayesian approach to the estimation of the adjustment coefficient for the distribution of the maximum of a random walk is performed. Approximations of the posterior distribution of the adjustment coefficient are studied. The consistency and asymptotic normality of its posterior law are also proved under mild conditions. Finally, an application to real data is provided.
TL;DR: In this paper, the effect of imperfect inspection on screening acceptance-sampling procedure with retesting in a low prevalence population was developed, based on pooling samples and testing the pool as a whole for presence or absence of infected samples.
Abstract: This article develops the effect of imperfect inspection on screening acceptance-sampling procedure with retesting in a low prevalence population. The procedure is based on pooling samples and testing the pool as a whole for presence or absence of infected samples, then retesting the pools that test positive on the first test, and proceeding to individual testing only if such presence is also indicated on retest. If sampling is perfect, the procedure can reduce the expected cost, and accuracy is attained especially if the proportion of infection is low. Modifications on the procedure to hierarchical screening, which are aimed at further reduction in expected number of tests, are discussed.
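The pooling-with-retest procedure described in this abstract can be illustrated by computing the expected number of tests per person. The sketch below is not the paper's model. It assumes independent infection status across individuals, a pool-level sensitivity and specificity that do not depend on how many samples in the pool are infected, and conditionally independent outcomes for the test and the retest. The function name and parameters are hypothetical.

```python
def expected_tests_per_person(prevalence, pool_size, sensitivity=1.0, specificity=1.0):
    """Expected tests per person for pool test -> pool retest -> individual tests.

    Every pool is tested once; pools that test positive are retested; the
    pool_size individual tests are run only if the retest is also positive.
    """
    # Probability that the pool contains at least one infected sample.
    pool_infected = 1.0 - (1.0 - prevalence) ** pool_size
    # Probability that the first pool test is positive (true or false positive).
    p_first = pool_infected * sensitivity + (1.0 - pool_infected) * (1.0 - specificity)
    # Probability that both pool tests are positive, assuming they are
    # conditionally independent given the pool's true status.
    p_both = (pool_infected * sensitivity ** 2
              + (1.0 - pool_infected) * (1.0 - specificity) ** 2)
    expected_per_pool = 1.0 + p_first + pool_size * p_both
    return expected_per_pool / pool_size


if __name__ == "__main__":
    # Low prevalence: far fewer than one test per person on average.
    print(expected_tests_per_person(0.01, 10))
    # Same setting with imperfect tests (illustrative values only).
    print(expected_tests_per_person(0.01, 10, sensitivity=0.95, specificity=0.98))
```

With perfect tests the retest never changes the outcome, so it only adds cost. Its potential benefit appears when specificity is below 1, because a pool must produce two false positives before individual testing is triggered.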
TL;DR: In this paper, the authors investigated the asymptotic property and the performance of the repeated half sampling (RHS) criterion in the context of variable selection under a linear regression model.
Abstract: In this paper, the asymptotic property and the performance of the repeated half sampling (RHS) criterion are investigated. In the context of variable selection under a linear regression model, we show that RHS is asymptotically equivalent to the multifold cross-validated (MCV) criterion. In the case where the candidate family of models doesn't include the true model, we establish that RHS and also MCV are asymptotically equivalent to a criterion similar to the Takeuchi information criterion (TIC). The performance of the RHS criterion is compared with the CV, Akaike, corrected Akaike and BIC criteria. The results of a simulation study show that RHS improves upon the performance of some criteria in two important areas of application: multiple linear regression and multivariate regression.
TL;DR: In this article, it is shown that randomly stopped sums of independent non-negative random variables preserve the stochastic Laplace and the integral harmonic mean residual life orders, and exponential distributions are characterized within the class of random sums with geometric stopping times.
Abstract: Some new order preservation properties of stopped sums of independent non-negative random variables, when the stopping variable is independent of the summands, are investigated. We show that such randomly stopped sums preserve the stochastic Laplace as well as the integral harmonic mean residual life orders. For the case of Laplace orders, there is a suitable converse for each of the order preservation results. Exponential distributions are characterized within the class of random sums with geometric stopping times, via simple moment conditions on the summand obeying a suitably weak aging hypothesis.
TL;DR: In this paper, an integral test is presented to determine the limiting behaviour of forward delayed sums, Abel and Cesaro sums of independent identically distributed random variables with stable distribution, and deduce Chover type law of the iterated logarithm for them.
Abstract: We present an integral test to determine the limiting behaviour of forward delayed sums, Abel and Cesaro sums of independent identically distributed random variables with stable distribution, and deduce Chover type law of the iterated logarithm (LIL) for them.
TL;DR: In this article, a practical problem of estimating the total area under cultivation in Indian districts is addressed by two-stage sampling with unequal selection probabilities; to assess the accuracy of estimation, a bootstrap technique is employed in constructing confidence intervals, and simulation-based performance criteria are evaluated from live data for competing procedures.
Abstract: A practical problem of estimating the total area under cultivation in Indian districts is addressed by two-stage sampling with unequal selection probabilities. To assess the accuracy of estimation, a bootstrap technique is employed in constructing confidence intervals, and simulation-based performance criteria are evaluated from live data and shown for competing procedures. Rao-Hartley-Cochran's (RHC, 1962) scheme is employed in both stages of sampling. Sitter's mirror-match bootstrap procedure is employed, suitably modified to cover the two stages.