TL;DR: The purpose of this article was to survey the use of the Bonferroni correction in research articles published in three optometric journals and to provide advice to authors contemplating multiple testing.
TL;DR: An overview of the current state of the art in multiple testing in genomics data from a user's perspective is presented, including methods for familywise error control, false discovery rate control and false discovery proportion estimation and confidence.
Abstract: This paper presents an overview of the current state of the art in multiple testing in genomics data from a user's perspective. We describe methods for familywise error control, false discovery rate control and false discovery proportion estimation and confidence, both conceptually and practically, and explain when to use which type of error rate. We elaborate on the assumptions underlying the methods and discuss pitfalls in the interpretation of results. In our discussion, we take into account the exploratory nature of genomics experiments, looking at selection of genes before or after testing, and at the role of validation experiments.
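To make the difference between familywise error control and false discovery rate control concrete, here is a minimal sketch comparing the Bonferroni rule with the Benjamini-Hochberg step-up procedure. The function names and the toy p-values are illustrative assumptions and do not come from the paper.

```python
def bonferroni_reject(pvalues, alpha=0.05):
    """Reject H_i when p_i <= alpha / m (controls the familywise error rate)."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg_reject(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k such that the k-th smallest p-value is <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


# Hypothetical p-values, chosen only for illustration.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(sum(bonferroni_reject(pvals)), "rejections under Bonferroni")
print(sum(benjamini_hochberg_reject(pvals)), "rejections under Benjamini-Hochberg")
```

On this toy input the FDR procedure rejects more hypotheses than Bonferroni, which is the usual trade-off: a weaker error criterion in exchange for more discoveries.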
TL;DR: In this article, the authors identified all multi-arm clinical trials published in 2012 by four major medical journals and extracted several aspects of the trial design, including whether the trial was exploratory or confirmatory, whether a multiple-testing correction was applied and, if one was used, what type it was.
Abstract: Background: Multi-arm trials enable the evaluation of multiple treatments within a single trial. They provide a way of substantially increasing the efficiency of the clinical development process. However, since multi-arm trials test multiple hypotheses, some regulators require that a statistical correction be made to control the chance of making a type-1 error (false-positive). Several conflicting viewpoints are expressed in the literature regarding the circumstances in which a multiple-testing correction should be used. In this article we discuss these conflicting viewpoints and review the frequency with which correction methods are currently used in practice. Methods: We identified all multi-arm clinical trials published in 2012 by four major medical journals. Summary data on several aspects of the trial design were extracted, including whether the trial was exploratory or confirmatory, whether a multiple-testing correction was applied and, if one was used, what type it was. Results: We found that almost half (49%) of published multi-arm trials report using a multiple-testing correction. The percentage that corrected was higher for trials in which the experimental arms included multiple doses or regimens of the same treatments (67%). The percentage that corrected was higher in exploratory than confirmatory trials, although this is explained by a greater proportion of exploratory trials testing multiple doses and regimens of the same treatment. Conclusions: A sizeable proportion of published multi-arm trials do not correct for multiple-testing. Clearer guidance about whether multiple-testing correction is needed for multi-arm trials that test separate treatments against a common control group is required.
TL;DR: An empirical approach for estimating genome‐wide significance thresholds for data arising from WGS studies is proposed and it is demonstrated that the empirical threshold can be efficiently estimated by extrapolating from calculations performed on a small genomic region.
Abstract: Although a standard genome-wide significance level has been accepted for the testing of association between common genetic variants and disease, the era of whole-genome sequencing (WGS) requires a new threshold. The allele frequency spectrum of sequence-identified variants is very different from common variants, and the identified rare genetic variation is usually jointly analyzed in a series of genomic windows or regions. In nearby or overlapping windows, these test statistics will be correlated, and the degree of correlation is likely to depend on the choice of window size, overlap, and the test statistic. Furthermore, multiple analyses may be performed using different windows or test statistics. Here we propose an empirical approach for estimating genome-wide significance thresholds for data arising from WGS studies, and we demonstrate that the empirical threshold can be efficiently estimated by extrapolating from calculations performed on a small genomic region. Because analysis of WGS may need to be repeated with different choices of test statistics or windows, this prediction approach makes it computationally feasible to estimate genome-wide significance thresholds for different analysis choices. Based on UK10K whole-genome sequence data, we derive genome-wide significance thresholds ranging between 2.5 × 10^-8 and 8 × 10^-8 for our analytic choices in window-based testing, and thresholds of 0.6 × 10^-8 to 1.5 × 10^-8 for a combined analytic strategy of testing common variants using single-SNP tests together with rare variants analyzed with our sliding-window test strategy.
TL;DR: The book is a summary of multiple testing in pharmaceutical clinical trials, covering a broad range of topics from multiple testing in dose-finding studies, adaptive designed studies and microarray experiments to discussing the regulatory issues and modern multiplicity procedures such as gatekeeping.
Abstract: The book is a summary of multiple testing in pharmaceutical clinical trials, covering a broad range of topics from multiple testing in dose-finding studies, adaptive designed studies and microarray experiments to discussing the regulatory issues and modern multiplicity procedures such as gatekeeping. It is aimed predominantly at biostatisticians working in preclinical and clinical trials. The book consists of seven chapters. Each chapter starts with an introduction to the multiple testing issues, followed by a theoretical description of the available statistical techniques and examples. However, the book does not cover all areas of multiplicity faced by clinical trial biostatisticians, such as pharmacokinetic/pharmacodynamic modelling, and Bayesian theory is not considered. Chapter 1 begins the book with a summary of multiplicity problems from a regulatory perspective. The chapter is written by two influential regulatory statisticians from the European and FDA regulatory environments and provides a broad overview of the different areas where multiplicity issues may arise. The chapter covers the regulatory issues with regard to multiple endpoints, multiple dose comparisons, subgroups and multiplicity concerns in special situations. It also provides some methods for reducing the multiplicity within clinical trials, such as using hierarchical ordering and composite endpoints, and mentions situations where adjustment is not considered necessary. Chapter 2 provides the theoretical foundation for the rest of the book. The authors begin by introducing the different error rates, including the definition and importance of the family-wise error rate. The authors go on to explain popular multiple testing principles (union-intersection and intersection-union testing, and the closure and partitioning principles), followed by clear explanations (with examples) of commonly used approaches to multiple testing, such as procedures based on univariate p-values (e.g. Bonferroni, Fallback and Hochberg), parametric testing procedures (e.g. Dunnett procedures) and resampling-based procedures.
Chapter 3 gives an overview of multiple testing problems in dose-finding studies and how trend tests are used for the detection of dose–response signals. The authors also provide definitions for the minimum effective and maximum safe doses and how they are estimated. Power and sample size calculations are also included in the chapter for the maximum safe dose estimation. Finally, the authors explain model-based methods that can be used to estimate an adequate dose to achieve a desired response. Chapter 4 continues the principles and procedures introduced in Chapter 2, covering different methods for adjusting for multiple endpoints. There is a lot of repetition between the ‘at-least-one’ procedures section and the contents of Chapter 2, which is noticeable when reading the book in order, but does allow for the two chapters to be stand-alone. As well as at-least-one procedures, the chapter includes global procedures for assessing the overall efficacy of a treatment, ‘all-or-none’ procedures and the superiority-noninferiority approach. Chapter 5 focuses on gatekeeping procedures, which offer a more flexible hierarchical structure than the fixed-sequence procedure (introduced in Chapter 2). The authors walk the reader through serial, parallel and tree gatekeeping procedures, using worked (easy to understand) examples along the way. The chapter could be improved by discussing the graphical approach published by Bretz et al. [1] when explaining how to implement gatekeeping procedures. Chapter 6 starts with a review of the design and analysis of adaptive trials and the multiplicity issues that can arise above those already present in fixed design trials. 
Repeated hypothesis testing at interim analyses and sample size adjustment (both in a blinded and unblinded setting) are reviewed, including discussion on stopping boundaries and formulae for updating the sample size. The authors discuss applications of the closure procedure to adaptive designs based on combination tests and conditional error rates, which allow trial design modifications based on unblinded interim data. The chapter finishes with a description of two case studies based on adaptive treatment selection and subgroup selection at an interim analysis. The final chapter covers the design and analysis of microarray experiments for pharmacogenomics. The chapter starts with a clear overview of microarrays and introduces the two stages of pharmacogenetic development. The introduction is easy to understand, even for readers lacking experience in the field. The multiplicity concerns around individual biomarkers and subgroups are then discussed, along with the control of multiple error rates (not just the family-wise error rate). The design of pharmacogenomic studies is demonstrated with the use of a case study. In conclusion, the book is well written and covers a wide range of clinical trial settings in which multiple testing issues arise. Descriptions of the procedures are supported with clinical trial-related examples, with many of the chapters providing guidance into the implementation in commonly used software applications (SAS and R), making the book a useful tool for biostatisticians dealing with multiple testing problems in clinical trials. Each chapter begins with an overview of the multiple testing issues faced within different clinical trial settings, which may be of interest to clinical trial practitioners. However, with the
TL;DR: The 𝖱 package structSSI provides an accessible implementation of two recently developed simultaneous and selective inference techniques: the group Benjamini-Hochberg and hierarchical false discovery rate procedures, and explains their applicability to general data sets.
Abstract: The 𝖱 package structSSI provides an accessible implementation of two recently developed simultaneous and selective inference techniques: the group Benjamini-Hochberg and hierarchical false discovery rate procedures. Unlike many multiple testing schemes, these methods specifically incorporate existing information about the grouped or hierarchical dependence between hypotheses under consideration while controlling the false discovery rate. Doing so increases statistical power and interpretability. Furthermore, these procedures provide novel approaches to the central problem of encoding complex dependency between hypotheses. We briefly describe the group Benjamini-Hochberg and hierarchical false discovery rate procedures and then illustrate them using two examples, one a measure of ecological microbial abundances and the other a global temperature time series. For both procedures, we detail the steps associated with the analysis of these particular data sets, including establishing the dependence structures, performing the test, and interpreting the results. These steps are encapsulated by 𝖱 functions, and we explain their applicability to general data sets.
TL;DR: Modifications to the Lancaster procedure are proposed by taking the correlation structure among p-values into account, and a novel association between B cell pathways and allograft tolerance is identified.
Abstract: Rapid developments in molecular technology have yielded a large amount of high throughput genetic data to understand the mechanism for complex traits. The increase of genetic variants requires hundreds and thousands of statistical tests to be performed simultaneously in analysis, which poses a challenge to control the overall Type I error rate. Combining p-values from multiple hypothesis testing has shown promise for aggregating effects in high-dimensional genetic data analysis. Several p-value combining methods have been developed and applied to genetic data; see [Dai, et al. 2012b] for a comprehensive review. However, there is a lack of investigations conducted for dependent genetic data, especially for weighted p-value combining methods. Single nucleotide polymorphisms (SNPs) are often correlated due to linkage disequilibrium. Other genetic data, including variants from next generation sequencing, gene expression levels measured by microarray, protein and DNA methylation data, etc. also contain complex correlation structures. Ignoring correlation structures among genetic variants may lead to severe inflation of Type I error rates for omnibus testing of p-values. In this work, we propose modifications to the Lancaster procedure by taking the correlation structure among p-values into account. The weight function in the Lancaster procedure allows meaningful biological information to be incorporated into the statistical analysis, which can increase the power of the statistical testing and/or remove the bias in the process. Extensive empirical assessments demonstrate that the modified Lancaster procedure largely reduces the Type I error rates due to correlation among p-values, and retains considerable power to detect signals among p-values. We applied our method to reassess published renal transplant data, and identified a novel association between B cell pathways and allograft tolerance.
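For orientation, the classical Lancaster procedure that the paper modifies can be sketched as follows. This is the textbook version that assumes independent p-values, not the correlation-adjusted modification proposed in the abstract. Each p-value is mapped to a chi-square quantile whose degrees of freedom act as its weight, and the sum is referred to a chi-square distribution with the summed degrees of freedom. To stay within the Python standard library, the sketch restricts weights to even positive integers, for which the chi-square survival function has a closed form; the function names are illustrative.

```python
import math


def chi2_sf_even(x, df):
    """Survival function of a chi-square variable with even df (closed form)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer in this sketch")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j  # term is now (x/2)^j / j!
        total += term
    return math.exp(-half) * total


def chi2_isf_even(p, df, tol=1e-12):
    """Inverse survival function by bisection: the x with chi2_sf_even(x, df) == p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    lo, hi = 0.0, 1.0
    while chi2_sf_even(hi, df) > p:  # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = (lo + hi) / 2.0
        if chi2_sf_even(mid, df) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def lancaster_combined_pvalue(pvalues, weights):
    """Combine independent p-values; weights are chi-square degrees of freedom."""
    statistic = sum(chi2_isf_even(p, w) for p, w in zip(pvalues, weights))
    return chi2_sf_even(statistic, sum(weights))


# With every weight equal to 2 this reduces to Fisher's method.
print(lancaster_combined_pvalue([0.01, 0.2, 0.5], [2, 2, 2]))
# Up-weighting the first p-value (hypothetical weights).
print(lancaster_combined_pvalue([0.01, 0.2, 0.5], [4, 2, 2]))
```

As the abstract stresses, this reference distribution is only valid when the p-values are independent; applying it unchanged to correlated p-values can inflate the Type I error rate.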
TL;DR: A multiple testing problem is considered in which each null hypothesis in the family of null hypotheses specifies whether the program has an effect on the outcome of interest for a particular sub‐population, in the context of PROGRESA, a large‐scale poverty‐reduction program in Mexico.
TL;DR: It is demonstrated that applying correction procedures developed in the statistics literature can fully address the issue of testing for heterogeneous treatment effects, and the implications of multiple testing adjustments for power calculations and experimental design are discussed.
Abstract: We review the statistical models applied to test for heterogeneous treatment effects in the recent empirical literature, with a particular focus on data from randomised field experiments. We show that testing for heterogeneous treatment effects is highly common, and likely to result in a large number of false discoveries when conventional decision rules are applied. We demonstrate that applying correction procedures developed in the statistics literature can fully address this issue, and discuss the implications of multiple testing adjustments for power calculations and experimental design.
TL;DR: It is demonstrated that, unlike when testing direct associations, replacing the Bonferroni correction with a permutation approach that focuses on the maximum of the test statistics can significantly improve the power to detect mediators even when all biomarkers are independent.
Abstract: Motivation: Modern biomedical and epidemiological studies often measure hundreds or thousands of biomarkers, such as gene expression or metabolite levels. Although there is an extensive statistical literature on adjusting for ‘multiple comparisons’ when testing whether these biomarkers are directly associated with a disease, testing whether they are biological mediators between a known risk factor and a disease requires a more complex null hypothesis, thus offering additional methodological challenges. Results: We propose a permutation approach that tests multiple putative mediators and controls the family-wise error rate. We demonstrate that, unlike when testing direct associations, replacing the Bonferroni correction with a permutation approach that focuses on the maximum of the test statistics can significantly improve the power to detect mediators even when all biomarkers are independent. Through simulations, we show the power of our method is 2–5× larger than the power achieved by Bonferroni correction. Finally, we apply our permutation test to a case-control study of dietary risk factors and colorectal adenoma to show that, of 149 test metabolites, docosahexaenoate is a possible mediator between fish consumption and decreased colorectal adenoma risk. Availability and implementation: R-package included in online Supplementary Material.
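The general idea of a permutation test built on the maximum of the test statistics can be sketched for the simpler direct-association setting. This is a generic single-step max-statistic adjustment, not the authors' mediation test, whose null hypothesis is more complex. The statistic (absolute difference in group means), the function names, and the toy data are all assumptions made for illustration.

```python
import random
from statistics import mean


def mean_difference(values, labels):
    """Absolute difference in means between groups labelled 1 and 0."""
    group1 = [v for v, g in zip(values, labels) if g == 1]
    group0 = [v for v, g in zip(values, labels) if g == 0]
    return abs(mean(group1) - mean(group0))


def max_stat_adjusted_pvalues(features, labels, n_perm=2000, seed=0):
    """Single-step max-statistic permutation adjustment.

    features: list of biomarkers, each a list with one value per subject.
    labels:   list of 0/1 group labels, one per subject.
    Returns one familywise-adjusted p-value per biomarker.
    """
    rng = random.Random(seed)
    observed = [mean_difference(f, labels) for f in features]
    exceed = [0] * len(features)
    shuffled = list(labels)
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Largest statistic over all biomarkers under this relabelling.
        max_null = max(mean_difference(f, shuffled) for f in features)
        for j, obs in enumerate(observed):
            if max_null >= obs:
                exceed[j] += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return [(e + 1) / (n_perm + 1) for e in exceed]


# Hypothetical data: biomarker_a is shifted in group 1, biomarker_b is not.
labels = [0] * 6 + [1] * 6
biomarker_a = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 3.0, 3.2, 2.9, 3.1, 3.0, 2.8]
biomarker_b = [0.5, 1.5, 1.0, 0.7, 1.3, 1.0, 1.1, 0.6, 1.4, 0.9, 1.2, 0.8]
adjusted = max_stat_adjusted_pvalues([biomarker_a, biomarker_b], labels)
print(adjusted)
```

Because every biomarker is compared against the permutation distribution of the single largest statistic, the adjustment accounts for the number of tests and for their dependence without a Bonferroni-style division of the significance level.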
TL;DR: In this article, the authors explain the multiple comparison problem in multiway analysis of variance, demonstrate that researchers almost never correct for it, and describe one of several correction procedures (i.e., sequential Bonferroni), showing that its application alters at least one of the substantive conclusions in 45 of 60 articles considered.
Abstract: Many empirical researchers do not realize that the common multiway analysis of variance (ANOVA) harbors a multiple comparison problem. In the case of two factors, three separate null hypotheses are subject to test (i.e., two main effects and one interaction). Consequently, the probability of at least one Type I error (if all null hypotheses are true) is 14% rather than 5% if the three tests are independent. We explain the multiple comparison problem and demonstrate that researchers almost never correct for it. We describe one of several correction procedures (i.e., sequential Bonferroni), and show that its application alters at least one of the substantive conclusions in 45 out of 60 articles considered. An additional method to mitigate the multiplicity in multiway ANOVA is preregistration of hypotheses.
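The 14% figure quoted above follows from the formula 1 - (1 - alpha)^m for m independent tests that are each run at level alpha. A short sketch, with an illustrative function name:

```python
def familywise_error_rate(m, alpha=0.05):
    """Probability of at least one Type I error among m independent true nulls."""
    return 1.0 - (1.0 - alpha) ** m


# m = 3 corresponds to a two-factor ANOVA (two main effects, one interaction);
# m = 7 corresponds to a three-factor ANOVA (three main effects, three two-way
# interactions, one three-way interaction).
for m in (1, 3, 7):
    print(m, round(familywise_error_rate(m), 3))
```

The rate grows quickly with the number of factors, which is why the multiplicity in multiway ANOVA is easy to overlook and costly to ignore.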
TL;DR: FDR smoothing automatically finds spatially localized regions of significant test statistics, and then relaxes the threshold of statistical significance within these regions, and tightens it elsewhere, in a manner that controls the overall false discovery rate at a given level.
Abstract: We present false discovery rate smoothing, an empirical-Bayes method for exploiting spatial structure in large multiple-testing problems. FDR smoothing automatically finds spatially localized regions of significant test statistics. It then relaxes the threshold of statistical significance within these regions, and tightens it elsewhere, in a manner that controls the overall false-discovery rate at a given level. This results in increased power and cleaner spatial separation of signals from noise. The approach requires solving a non-standard high-dimensional optimization problem, for which an efficient augmented-Lagrangian algorithm is presented. In simulation studies, FDR smoothing exhibits state-of-the-art performance at modest computational cost. In particular, it is shown to be far more robust than existing methods for spatially dependent multiple testing. We also apply the method to a data set from an fMRI experiment on spatial working memory, where it detects patterns that are much more biologically plausible than those detected by standard FDR-controlling methods. All code for FDR smoothing is publicly available in Python and R.
TL;DR: A novel group sequential design is developed that incorporates adaptive choice of the patient subgroup among several possibilities which include the entire patient population as a choice and shows how asymptotically optimal tests can be constructed by using generalized likelihood ratio statistics for parametric problems and analogous standardized or Studentized statistics for nonparametric tests.
TL;DR: This work takes a genome region as a basic unit of interaction analysis and uses high-dimensional data reduction and functional data analysis techniques to develop a novel functional regression model to collectively test interactions between all possible pairs of single nucleotide polymorphisms within two genome regions.
Abstract: The critical barrier in interaction analysis for rare variants is that most traditional statistical methods for testing interactions were originally designed for testing the interaction between common variants and are difficult to apply to rare variants because of their prohibitive computational time and poor ability. The great challenges for successful detection of interactions with next-generation sequencing (NGS) data are (1) lack of methods for interaction analysis with rare variants, (2) severe multiple testing, and (3) time-consuming computations. To meet these challenges, we shift the paradigm of interaction analysis between two loci to interaction analysis between two sets of loci or genomic regions and collectively test interactions between all possible pairs of SNPs within two genomic regions. In other words, we take a genome region as a basic unit of interaction analysis and use high-dimensional data reduction and functional data analysis techniques to develop a novel functional regression model to collectively test interactions between all possible pairs of single nucleotide polymorphisms (SNPs) within two genome regions. By intensive simulations, we demonstrate that the functional regression models for interaction analysis of the quantitative trait have the correct type 1 error rates and a much better ability to detect interactions than the current pairwise interaction analysis. The proposed method was applied to exome sequence data from the NHLBI's Exome Sequencing Project (ESP) and CHARGE-S study. We discovered 27 pairs of genes showing significant interactions after applying the Bonferroni correction (P-values < 4.58 × 10) in the ESP, and 11 were replicated in the CHARGE-S study.
TL;DR: In this paper, a new estimator of the proportion of true null hypotheses was proposed, which is less upwardly biased than Storey's estimator and two other estimators.
Abstract: We consider multiple testing with false discovery rate (FDR) control when p-values have discrete and heterogeneous null distributions. We propose a new estimator of the proportion of true null hypotheses and demonstrate that it is less upwardly biased than Storey's estimator and two other estimators. The new estimator induces two adaptive procedures, i.e., an adaptive Benjamini-Hochberg (BH) procedure and an adaptive Benjamini-Hochberg-Heyse (BHH) procedure. We prove that the adaptive BH procedure is conservative non-asymptotically. Through simulation studies, we show that these procedures are usually more powerful than their non-adaptive counterparts and that the adaptive BHH procedure is usually more powerful than the adaptive BH procedure and a procedure based on randomized p-value. The adaptive procedures are applied to a study of HIV vaccine efficacy, where they identify more differentially polymorphic positions than the BH procedure at the same FDR level.
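As background for the comparison above, the baseline being improved on can be sketched: Storey's estimator of the proportion of true nulls, plugged into the BH procedure. This is the standard version (with the common +1 finite-sample correction) and not the new estimator proposed in the paper for discrete p-values. The tuning parameter `lam`, the function names, and the toy p-values are assumptions.

```python
def storey_pi0(pvalues, lam=0.5):
    """Storey-type estimate of the proportion of true null hypotheses.

    Under the null, p-values above lam occur with probability 1 - lam, so the
    count above lam, rescaled by m * (1 - lam), estimates the null proportion.
    """
    m = len(pvalues)
    n_above = sum(1 for p in pvalues if p > lam)
    return min(1.0, (n_above + 1) / (m * (1.0 - lam)))


def adaptive_bh_reject(pvalues, alpha=0.05, lam=0.5):
    """BH step-up procedure run at the inflated level alpha / pi0_hat."""
    m = len(pvalues)
    level = alpha / storey_pi0(pvalues, lam)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * level / m:
            k_max = rank
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


# Hypothetical p-values, chosen only for illustration.
pvals = [0.001, 0.004, 0.012, 0.021, 0.03, 0.04, 0.1, 0.3, 0.6, 0.9]
print("estimated null proportion:", storey_pi0(pvals))
print("adaptive BH rejections:", sum(adaptive_bh_reject(pvals)))
```

When the estimate of the null proportion is biased upward, the adaptive level moves back toward alpha and the power gain shrinks, which is why a less upwardly biased estimator matters.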
TL;DR: This study systematically investigated the impact of three different permutation test approaches for over-representation analysis to detect false positive pathway candidates and evaluated them on genome-wide association data of Dilated Cardiomyopathy and Ulcerative Colitis to provide evidence that the gold standard - permuting the case–control status – effectively improves specificity of GWAS pathway analysis.
Abstract: Genome-wide association studies (GWAS) are applied to identify genetic loci that are associated with complex traits and human diseases. Analogous to the evolution of gene expression analyses, pathway analyses have emerged as important tools to uncover functional networks of genome-wide association data. Usually, pathway analyses combine statistical methods with a priori available biological knowledge. To determine significance thresholds for associated pathways, correction for multiple testing and over-representation permutation testing are applied. We systematically investigated the impact of three different permutation test approaches for over-representation analysis to detect false positive pathway candidates and evaluated them on genome-wide association data of Dilated Cardiomyopathy (DCM) and Ulcerative Colitis (UC). Our results provide evidence that the gold standard, permuting the case–control status, effectively improves specificity of GWAS pathway analysis. Although permutation of SNPs does not maintain linkage disequilibrium (LD), these permutations represent an alternative for GWAS data when case–control permutations are not possible. Gene permutations, however, did not add significantly to the specificity. Finally, we provide estimates on the required number of permutations for the investigated approaches. To discover potential false positive functional pathway candidates and to support the results from standard statistical tests such as the Hypergeometric test, permutation tests of case–control data should be carried out. The most reasonable alternative was case–control permutation; if this is not possible, SNP permutations may be carried out. Our study also demonstrates that significance values converge rapidly with an increasing number of permutations. 
By applying the described statistical framework we were able to discover axon guidance, focal adhesion and calcium signaling as important DCM-related pathways and Intestinal immune network for IgA production as most significant UC pathway.
TL;DR: A new multiple testing approach is proposed, constructed by combining an Intersection Union Test (IUT) with the Holm correction, which strongly controls the family-wise error rate (FWER) without any additional assumptions on the joint distribution of the test statistics or dependence structure of the markers.
Abstract: Linkage Disequilibrium (LD) is a powerful approach for the identification and characterization of morphological shape, which usually involves multiple genetic markers. However, multiple testing corrections substantially reduce the power of the associated tests. In addition, the principal component analysis (PCA), used to quantify the shape variations into several principal phenotypes, further increases the number of tests. As a result, a powerful multiple testing correction for simultaneous large-scale gene-shape association tests is an essential part of determining statistical significance. Bonferroni adjustments and permutation tests are the most popular approaches to correcting for multiple tests within LD based Quantitative Trait Loci (QTL) models. However, permutations are extremely computationally expensive and may mislead in the presence of family structure. The Bonferroni correction, though simple and fast, is conservative and has low power for large-scale testing. We propose a new multiple testing approach, constructed by combining an Intersection Union Test (IUT) with the Holm correction, which strongly controls the family-wise error rate (FWER) without any additional assumptions on the joint distribution of the test statistics or dependence structure of the markers. The power improvement for the Holm correction, as compared to the standard Bonferroni correction, is examined through a simulation study. A consistent and moderate increase in power is found under the majority of simulated circumstances, including various sample sizes, heritabilities, and numbers of markers. The power gains are further demonstrated on real leaf shape data from a natural population of poplar, Populus szechuanica var tietica, where more significant QTL associated with morphological shape are detected than under the previously applied Bonferroni adjustment. 
The Holm correction is a valid and powerful method for assessing gene-shape association involving multiple markers, which not only controls the FWER in the strong sense but also improves statistical power.
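The reason Holm can only gain power over Bonferroni is visible in the adjusted p-values. The sketch below implements the plain Holm step-down adjustment, without the Intersection Union Test component of the proposed approach; function names and the toy p-values are illustrative assumptions.

```python
def bonferroni_adjusted_pvalues(pvalues):
    """Multiply every p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, m * p) for p in pvalues]


def holm_adjusted_pvalues(pvalues):
    """Holm step-down adjusted p-values (strong FWER control, any dependence)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * pvalues[idx])
        # Enforce monotonicity so that a larger p-value never gets a smaller adjustment.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical p-values, chosen only for illustration.
pvals = [0.01, 0.04, 0.03, 0.015]
alpha = 0.05
holm = holm_adjusted_pvalues(pvals)
bonf = bonferroni_adjusted_pvalues(pvals)
print("Holm rejections:", sum(p <= alpha for p in holm))
print("Bonferroni rejections:", sum(p <= alpha for p in bonf))
```

Each Holm-adjusted p-value is at most the corresponding Bonferroni-adjusted value, so Holm rejects everything Bonferroni rejects and sometimes more, at the same FWER level.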
TL;DR: In this article, a single-index modulated (SIM) multiple testing procedure was proposed, which maintains control of the false discovery rate while incorporating prior information, by assuming the availability of a bivariate $p$-value, $(p_1,p_2)$, for each hypothesis, where $p_1$ is a preliminary $p$-value from prior information and $p_2$ is the primary $p$-value for the ultimate analysis.
Abstract: In the context of large-scale multiple testing, hypotheses are often accompanied by certain prior information. In this paper, we present a single-index modulated (SIM) multiple testing procedure, which maintains control of the false discovery rate while incorporating prior information, by assuming the availability of a bivariate $p$-value, $(p_1,p_2)$, for each hypothesis, where $p_1$ is a preliminary $p$-value from prior information and $p_2$ is the primary $p$-value for the ultimate analysis. To find the optimal rejection region for the bivariate $p$-value, we propose a criterion based on the ratio of probability density functions of $(p_1,p_2)$ under the true null and nonnull. This criterion in the bivariate normal setting further motivates us to project the bivariate $p$-value to a single-index, $p(\theta)$, for a wide range of directions $\theta$. The true null distribution of $p(\theta)$ is estimated via parametric and nonparametric approaches, leading to two procedures for estimating and controlling the false discovery rate. To derive the optimal projection direction $\theta$, we propose a new approach based on power comparison, which is further shown to be consistent under some mild conditions. Simulation evaluations indicate that the SIM multiple testing procedure improves the detection power significantly while controlling the false discovery rate. Analysis of a real dataset will be illustrated.
TL;DR: A new approach is introduced to overcome the limitations of standard methods, in which active voxels are detected according to a consensus on several random parcellations of the brain images, while a permutation test controls the false positive risk.
TL;DR: Descriptive metrics for both parametric and nonparametric data distributions are described where normality and homogeneity of variances are determined to understand the basic properties of data distributions.
Abstract: The purpose of this chapter is to present to the experimenter some of the basic and special statistical tools for data reporting, identification of aberrant values, and basic methods for hypothesis testing using univariate and multivariate tools. Descriptive metrics for both parametric and nonparametric data distributions are described where normality and homogeneity of variances are determined to understand the basic properties of data distributions. Multiple comparison tests and analysis of variance are presented for hypothesis testing and seeking differences in biomarkers between various treatment groups. Trend analysis of biomarker data is also examined using correlations and factorial analyses. The concept of residuals in simple and multiple linear regression is also explained, which is often used in biomarker research in ecotoxicology to identify interactions between biomarker variables and treatment groups or sampling sites or times. Data mining analysis using factorial and discriminant function for site or treatment classification is also proposed to reduce the complexity of large datasets and identify the principal or major biomarker responses. An introduction to artificial neural networks is also given to identify inter-relationships between biomarker responses in complex and large datasets to predict effects at different levels of biological organization. These networks have the capacity to “learn” from real-life data (artificial intelligence) and find (nonlinear) trends and relationships in complex datasets.
TL;DR: In this article, a decision-theoretic interpretation of the confidence distribution was used to obtain a lower bound on the mixing proportion of true null hypotheses, which can yield reliable hypothesis tests and confidence intervals given as few as one comparison.
Abstract: Empirical Bayes methods of estimating the local false discovery rate (LFDR) by maximum likelihood estimation (MLE), originally developed for large numbers of comparisons, are applied to a single comparison. Specifically, when assuming a lower bound on the mixing proportion of true null hypotheses, the LFDR MLE can yield reliable hypothesis tests and confidence intervals given as few as one comparison. Simulations indicate that constrained LFDR MLEs perform markedly better than conventional methods, both in testing and in confidence intervals, for high values of the mixing proportion, but not for low values. (A decision-theoretic interpretation of the confidence distribution made those comparisons possible.) In conclusion, the constrained LFDR estimators and the resulting effect-size interval estimates are not only effective multiple comparison procedures but might also replace p-values and confidence intervals more generally. The new methodology is illustrated with the analysis of proteomics data.
TL;DR: The principal aim of this package is to implement SGoF-type multiple testing methods, known to be more powerful than the classical false discovery rate (FDR) and family-wise error rate (FWER) based methods in certain situations, particularly when the number of tests is large.
Abstract: In this paper we present a new R package called sgof for multiple hypothesis testing. The principal aim of this package is to implement SGoF-type multiple testing methods, known to be more powerful than the classical false discovery rate (FDR) and family-wise error rate (FWER) based methods in certain situations, particularly when the number of tests is large. This package includes Binomial and Conservative SGoF and the Bayesian and Beta-Binomial SGoF multiple testing procedures, which are adaptations of the original SGoF method to the Bayesian setting and to possibly correlated tests, respectively. The sgof package also implements the Benjamini-Hochberg and Benjamini-Yekutieli FDR controlling procedures. For each method the package provides (among other things) the number of rejected null hypotheses, estimation of the corresponding FDR, and the set of adjusted p values. Some automatic plots of interest are implemented too. Two real data examples are used to illustrate how sgof works.
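The abstract above lists adjusted p values from the Benjamini-Hochberg procedure among the package outputs. As a rough illustration of what such adjusted values are, here is a minimal pure-Python sketch of the standard Benjamini-Hochberg step-up adjustment. It is not the sgof package's code or API; the function names `bh_adjust` and `bh_reject` are hypothetical and chosen only for this example.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Illustrative sketch only; function name and interface are assumptions.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    # and capping the adjusted values at 1.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def bh_reject(pvalues, alpha=0.05):
    """Flag hypotheses whose adjusted p-value is at most alpha."""
    return [q <= alpha for q in bh_adjust(pvalues)]
```

Calling `bh_reject(pvalues, alpha)` rejects exactly the hypotheses whose adjusted value does not exceed the chosen FDR level. The procedure's FDR guarantee holds under independence or certain forms of positive dependence between tests.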
TL;DR: Under certain conditions, the new UPT procedure achieves the fastest convergence rate of marginal false non-discovery rates, while controlling the marginal false discovery rate at any designated level $\alpha$ asymptotically.
Abstract: Multiple testing and variable selection have gained much attention in statistical theory and methodology research. They deal with the same problem of identifying the important variables among many (Jin, 2012). However, there is little overlap in the literature. Research on variable selection has focused on selection consistency, i.e., both type I and type II errors converging to zero. This is only possible when the signals are sufficiently strong, contrary to many modern applications. For the regime where the signals are both rare and weak, it is inevitable that a certain amount of false discoveries will be allowed, as long as some error rate can be controlled. In this paper, motivated by the research by Ji and Jin (2012) and Jin (2012) in the rare/weak regime, we extend their UPS procedure for variable selection to multiple testing. Under certain conditions, the new UPT procedure achieves the fastest convergence rate of marginal false non-discovery rates, while controlling the marginal false discovery rate at any designated level $\alpha$ asymptotically. Numerical results are provided to demonstrate the advantage of the proposed method.
TL;DR: The statistics based on the Dempster trace criterion are given, and the approximate upper percentiles are derived using Bonferroni’s inequality for pairwise multiple comparisons and multiple comparisons with a control among mean vectors under multivariate normality.
Abstract: We consider pairwise multiple comparisons and multiple comparisons with a control among mean vectors for high-dimensional data under multivariate normality. For such cases, the statistics based on the Dempster trace criterion are given, and their approximate upper percentiles are derived using Bonferroni’s inequality. Finally, the accuracy of their approximate values is evaluated by Monte Carlo simulation.
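Bonferroni's inequality bounds the probability of a union of events by the sum of their probabilities, so testing each of m comparisons at level alpha/m keeps the familywise error rate at or below alpha. The sketch below only counts the comparisons for the two designs named in the abstract and returns the resulting per-comparison level. It does not reproduce the paper's Dempster trace statistics or their percentiles, and the function names and the `design` argument are hypothetical.

```python
def num_comparisons(k, design="pairwise"):
    """Number of comparisons among k groups (illustrative helper).

    design="pairwise": all pairs of groups, k * (k - 1) / 2.
    design="control":  each of the k - 1 treatments against one control.
    """
    if k < 2:
        raise ValueError("need at least two groups")
    if design == "pairwise":
        return k * (k - 1) // 2
    if design == "control":
        return k - 1
    raise ValueError("design must be 'pairwise' or 'control'")


def bonferroni_level(alpha, k, design="pairwise"):
    """Per-comparison level that bounds the familywise error rate by alpha."""
    return alpha / num_comparisons(k, design)
```

With four groups, for instance, the pairwise design involves six comparisons and the control design three, so the per-comparison level is stricter in the pairwise case.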
TL;DR: A rank-based statistical meta-analysis framework is proposed that establishes global connections between transcriptomics studies without breaking down studies into sets of phenotype comparisons, and shows that it is possible to perform a meta-analysis of transcriptomics studies with arbitrary experimental designs by deriving global expression features rather than decomposing studies into multiple phenotype comparisons.
Abstract: Transcriptomics meta-analysis aims at re-using existing data to derive novel biological hypotheses, and is motivated by the public availability of a large number of independent studies. Current methods are based on breaking down studies into multiple comparisons between phenotypes (e.g. disease vs. healthy), based on the studies' experimental designs, followed by computing the overlap between the resulting differential expression signatures. While useful, in this methodology each study yields multiple independent phenotype comparisons, and connections are established not between studies, but rather between subsets of the studies corresponding to phenotype comparisons. We propose a rank-based statistical meta-analysis framework that establishes global connections between transcriptomics studies without breaking down studies into sets of phenotype comparisons. By using a rank product method, our framework extracts global features from each study, corresponding to genes that are consistently among the most expressed or differentially expressed genes in that study. Those features are then statistically modelled via a term-frequency inverse-document frequency (TF-IDF) model, which is then used for connecting studies. Our framework is fast and parameter-free; when applied to large collections of Homo sapiens and Streptococcus pneumoniae transcriptomics studies, it performs better than similarity-based approaches in retrieving related studies, using a Medical Subject Headings gold standard. Finally, we highlight via case studies how the framework can be used to derive novel biological hypotheses regarding related studies and the genes that drive those connections. Our proposed statistical framework shows that it is possible to perform a meta-analysis of transcriptomics studies with arbitrary experimental designs by deriving global expression features rather than decomposing studies into multiple phenotype comparisons.
TL;DR: In this article, the authors propose general and flexible stepup and stepdown procedures for testing multiple hypotheses about sequential data that simultaneously control both the type I and II versions of FDP and FWER.
Abstract: The $\gamma$-FDP and $k$-FWER multiple testing error metrics, which are tail probabilities of the respective error statistics, have become popular recently as less-stringent alternatives to the FDR and FWER. We propose general and flexible stepup and stepdown procedures for testing multiple hypotheses about sequential (or streaming) data that simultaneously control both the type I and II versions of $\gamma$-FDP, or $k$-FWER. The error control holds regardless of the dependence between data streams, which may be of arbitrary size and shape. All that is needed is a test statistic for each data stream that controls the conventional type I and II error probabilities, and no information or assumptions are required about the joint distribution of the statistics or data streams. The procedures can be used with sequential, group sequential, truncated, or other sampling schemes. We give recommendations for the procedures' implementation including closed-form expressions for the needed critical values in some commonly-encountered testing situations. The proposed sequential procedures are compared with each other and with comparable fixed sample size procedures in the context of strongly positively correlated Gaussian data streams. For this setting we conclude that both the stepup and stepdown sequential procedures provide substantial savings over the fixed sample procedures in terms of expected sample size, and the stepup procedure performs slightly but consistently better than the stepdown for $\gamma$-FDP control, with the relationship reversed for $k$-FWER control.
TL;DR: The findings indicate that the proposed method should be used to maximize power and control type 1 errors when analyzing genetic data using additive, dominant, and recessive models.
Abstract: Several methods have been proposed to account for multiple comparisons in genetic association studies. However, investigators typically test each of the SNPs using multiple genetic models. Association testing using the Cochran-Armitage test for trend, assuming an additive, dominant, or recessive genetic model, is commonly performed. Thus, each SNP is tested three times. Some investigators report the smallest p-value obtained from the three tests corresponding to the three genetic models, but such an approach inherently leads to inflated type 1 errors. Because of the small number of tests (three) and high correlation (functional dependence) among these tests, the procedures available for accounting for multiple tests are either too conservative or fail to meet the underlying assumptions (e.g., asymptotic multivariate normality or independence among the tests). We propose a method to calculate the exact p-value for each SNP using different genetic models. We performed simulations, which demonstrated the control of type 1 error and power gains using the proposed approach. We applied the proposed method to compute the p-value for the polymorphism eNOS -786T>C, which was shown to be associated with breast cancer risk. Our findings indicate that the proposed method should be used to maximize power and control type 1 errors when analyzing genetic data using additive, dominant, and recessive models.
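The inflation described above can be checked with a small Monte Carlo sketch. The code simulates case and control genotypes under the null hypothesis, applies the Cochran-Armitage trend test with additive, dominant, and recessive scores, and compares the rejection rate of each single test with that of the smallest of the three p-values. It does not implement the authors' exact p-value method. The sample size, minor allele frequency, seed, normal approximation, and all function names are assumptions made for illustration.

```python
import math
import random

# Genotype scores (0, 1, 2 copies of the risk allele) under each genetic model.
WEIGHTS = {
    "additive": (0, 1, 2),
    "dominant": (0, 1, 1),
    "recessive": (0, 0, 1),
}


def trend_test_pvalue(cases, controls, weights):
    """Two-sided Cochran-Armitage trend test p-value (normal approximation)."""
    n_cases, n_controls = sum(cases), sum(controls)
    total = n_cases + n_controls
    cols = [r + s for r, s in zip(cases, controls)]
    t = sum(w * (r * n_controls - s * n_cases)
            for w, r, s in zip(weights, cases, controls))
    var = sum(w * w * c * (total - c) for w, c in zip(weights, cols))
    var -= 2 * sum(weights[i] * weights[j] * cols[i] * cols[j]
                   for i in range(3) for j in range(i + 1, 3))
    var *= n_cases * n_controls / total
    if var <= 0:
        return 1.0  # degenerate table: treat as no evidence against the null
    z = t / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_null(n_sim=2000, n_per_group=200, maf=0.3, alpha=0.05, seed=1):
    """Estimate null rejection rates for each model and for the minimum p-value."""
    rng = random.Random(seed)
    hits = {name: 0 for name in WEIGHTS}
    hits["min_p"] = 0
    for _ in range(n_sim):
        tables = []
        for _group in range(2):  # cases and controls share one null distribution
            counts = [0, 0, 0]
            for _ in range(n_per_group):
                genotype = (rng.random() < maf) + (rng.random() < maf)
                counts[genotype] += 1
            tables.append(counts)
        pvals = {name: trend_test_pvalue(tables[0], tables[1], w)
                 for name, w in WEIGHTS.items()}
        for name, p in pvals.items():
            hits[name] += p <= alpha
        hits["min_p"] += min(pvals.values()) <= alpha
    return {name: count / n_sim for name, count in hits.items()}
```

Running `simulate_null()` returns a dictionary of estimated rejection rates. By construction, the `min_p` rate is at least as large as each single-model rate and no larger than their sum, which is the range in which a correction for three correlated tests has to operate.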
TL;DR: In this paper, the authors derived an optimal empirical Bayes testing procedure to detect variants for next-generation sequencing (NGS) study and proved that their testing procedure is valid and optimal in the sense of rejecting the maximum number of nonnulls while the Bayesian false discovery rate is controlled at a given nominal level.
Abstract: Because of the decreasing cost and high digital resolution, next-generation sequencing (NGS) is expected to replace the traditional hybridization-based microarray technology. For genetics study, the first-step analysis of NGS data is often to identify genomic variants among sequenced samples. Several statistical models and tests have been developed for variant calling in NGS study. The existing approaches, however, are based on either conventional Bayesian or frequentist methods, which are unable to address the multiplicity and testing efficiency issues simultaneously. In this paper, we derive an optimal empirical Bayes testing procedure to detect variants for NGS study. We utilize the empirical Bayes technique to exploit the across-site information among many testing sites in NGS data. We prove that our testing procedure is valid and optimal in the sense of rejecting the maximum number of nonnulls while the Bayesian false discovery rate is controlled at a given nominal level. We show by both simulation studies and real data analysis that our testing efficiency can be greatly enhanced over the existing frequentist approaches that fail to pool and utilize information across the multiple testing sites.
TL;DR: In this article, the Chen-Stein theorem is used to show that family-wise error rate can be controlled for cluster-dependent microRNAs under weak assumptions, and the theory is illustrated with an analysis of real data, a microRNA expression data set on Finnish (aggressive and non-aggressive) prostate cancer patients and their controls.
Abstract: New statistical procedures are introduced to analyse typical microRNA expression data sets. For each separate microRNA expression, the null hypothesis to be tested is that there is no difference between the distributions of the expression in different groups. The test statistics are then constructed with certain types of alternatives in mind. To avoid strong (parametric) distributional assumptions, the alternatives are formulated using probabilities of different orders of pairs or triples of observations coming from different groups, and the test statistics are then constructed using the corresponding several-sample U-statistics, which are natural estimates of these probabilities. Classical several-sample rank test statistics, such as the Kruskal–Wallis and Jonckheere–Terpstra tests, are special cases in our approach. Also, as the number of variables (microRNAs) is huge, we confront a serious simultaneous testing problem. Different approaches to control the family-wise error rate or the false discovery rate are briefly discussed, and it is shown how the Chen–Stein theorem can be used to show that family-wise error rate can be controlled for cluster-dependent microRNAs under weak assumptions. The theory is illustrated with an analysis of real data, a microRNA expression data set on Finnish (aggressive and non-aggressive) prostate cancer patients and their controls.
TL;DR: The proposed hierarchical models yield more efficient estimates of parameters than the traditional methods that analyze genetic variants separately, and also coherently address the multiple comparisons problem due to largely reducing the effective number of genetic effects and the number of statistically “significant” effects.
Abstract: Multiple comparisons or multiple testing has been viewed as a thorny issue in genetic association studies aiming to detect disease-associated genetic variants from a large number of genotyped variants. We alleviate the problem of multiple comparisons by proposing a hierarchical modeling approach that is fundamentally different from the existing methods. The proposed hierarchical models simultaneously fit as many variables as possible and shrink unimportant effects towards zero. Thus, the hierarchical models yield more efficient estimates of parameters than the traditional methods that analyze genetic variants separately, and also coherently address the multiple comparisons problem due to largely reducing the effective number of genetic effects and the number of statistically "significant" effects. We develop a method for computing the effective number of genetic effects in hierarchical generalized linear models, and propose a new adjustment for multiple comparisons, the hierarchical Bonferroni correction, based on the effective number of genetic effects. Our approach not only increases the power to detect disease-associated variants but also controls the Type I error. We illustrate and evaluate our method with real and simulated data sets from genetic association studies. The method has been implemented in our freely available R package BhGLM (http://www.ssg.uab.edu/bhglm/).