TL;DR: In this paper, a test of the null hypothesis that an observable series is stationary around a deterministic trend is proposed, where the series is expressed as the sum of deterministic trends, random walks, and stationary error.
TL;DR: In this article, the authors proposed a global test statistic for multivariate data with missing values, that is, whether the missing data are missing completely at random (MCAR), that is whether missingness depends on the variables in the data set.
Abstract: A common concern when faced with multivariate data with missing values is whether the missing data are missing completely at random (MCAR); that is, whether missingness depends on the variables in the data set. One way of assessing this is to compare the means of recorded values of each variable between groups defined by whether other variables in the data set are missing or not. Although informative, this procedure yields potentially many correlated statistics for testing MCAR, resulting in multiple-comparison problems. This article proposes a single global test statistic for MCAR that uses all of the available data. The asymptotic null distribution is given, and the small-sample null distribution is derived for multivariate normal data with a monotone pattern of missing data. The test reduces to a standard t test when the data are bivariate with missing data confined to a single variable. A limited simulation study of empirical sizes for the test applied to normal and nonnormal data suggests th...
TL;DR: In this paper, the authors proposed a method called the "gap statistic" for estimating the number of clusters (groups) in a set of data, which uses the output of any clustering algorithm (e.g. K-means or hierarchical), comparing the change in within-cluster dispersion with that expected under an appropriate reference null distribution.
Abstract: We propose a method (the ‘gap statistic’) for estimating the number of clusters (groups) in a set of data. The technique uses the output of any clustering algorithm (e.g. K-means or hierarchical), comparing the change in within-cluster dispersion with that expected under an appropriate reference null distribution. Some theory is developed for the proposal and a simulation study shows that the gap statistic usually outperforms other methods that have been proposed in the literature.
TL;DR: In this paper, the null distribution of residual-based cointegration tests depends on the asymptotics of the least-squares dummy variable (LSDV) estimator and other conventional statistics.
TL;DR: An error model that uses the negative binomial distribution, with variance and mean linked by local regression, to model the null distribution of the count data is proposed and provides good detection power.
Abstract: Motivation: High throughput nucleotide sequencing provides quantitative readouts in assays for RNA expression (RNA-Seq), protein-DNA binding (ChIP-Seq), cell counting. Statistical inference of differential signal in these data needs to take into account their natural variability throughout the dynamic range. When the number of replicates is small, error modeling is needed to achieve statistical power. Results: We propose an error model that uses the negative binomial distribution, with variance and mean linked by local regression, to model the null distribution of the count data. The method controls type-I error and provides good detection power.Availability: A free open-source R/Biondonductor software package, called "DESeq", is available from "http://www-huber.embl.de/users/anders/DESeq":http://www-huber.embl.de/users/anders/DESeq