TL;DR: Keeping the proportion of type I errors low among all significant results is a sensible, powerful, and easy-to-interpret way of addressing the multiple testing issue.
Abstract: Popular procedures to control the chance of making type I errors when multiple statistical tests are performed come at a high cost: a reduction in power. As the number of tests increases, power for an individual test may become unacceptably low. This is a consequence of minimizing the chance of making even a single type I error, which is the aim of, for instance, the Bonferroni and sequential Bonferroni procedures. An alternative approach, control of the false discovery rate (FDR), has recently been advocated for ecological studies. This approach aims at controlling the proportion of significant results that are in fact type I errors. Keeping the proportion of type I errors low among all significant results is a sensible, powerful, and easy-to-interpret way of addressing the multiple testing issue. To encourage practical use of the approach, in this note we illustrate how the proposed procedure works, we compare it to more traditional methods that control the familywise error rate, and we discuss some recent useful developments in FDR control.
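As a rough illustration of the contrast drawn in this abstract, the sketch below applies a Bonferroni correction and the Benjamini-Hochberg step-up procedure to the same set of p-values. The p-values and the function names are invented for illustration and are not taken from the note.

```python
def bonferroni_reject(pvalues, alpha=0.05):
    """Reject H_i when p_i <= alpha / m (controls the familywise error rate)."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg_reject(pvalues, q=0.05):
    """Benjamini-Hochberg step-up procedure (controls the FDR at level q)."""
    m = len(pvalues)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k such that p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


# Hypothetical p-values from ten tests.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(sum(bonferroni_reject(pvals)), "rejection(s) with Bonferroni")
print(sum(benjamini_hochberg_reject(pvals)), "rejection(s) with Benjamini-Hochberg")
```

With these made-up values the Bonferroni threshold (0.005) admits only the smallest p-value, while the step-up rule also admits the second smallest, which shows the power gain the abstract describes.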
TL;DR: A novel framework for small-sample inference of graphical models from gene expression data that focuses on the so-called graphical Gaussian models (GGMs) that are now frequently used to describe gene association networks and to detect conditionally dependent genes is introduced.
Abstract: Motivation: Genetic networks are often described statistically using graphical models (e.g. Bayesian networks). However, inferring the network structure offers a serious challenge in microarray analysis where the sample size is small compared to the number of considered genes. This renders many standard algorithms for graphical models inapplicable, and inferring genetic networks an 'ill-posed' inverse problem.
Methods: We introduce a novel framework for small-sample inference of graphical models from gene expression data. Specifically, we focus on the so-called graphical Gaussian models (GGMs) that are now frequently used to describe gene association networks and to detect conditionally dependent genes. Our new approach is based on (1) improved (regularized) small-sample point estimates of partial correlation, (2) an exact test of edge inclusion with adaptive estimation of the degree of freedom and (3) a heuristic network search based on false discovery rate multiple testing. Steps (2) and (3) correspond to an empirical Bayes estimate of the network topology.
Results: Using computer simulations, we investigate the sensitivity (power) and specificity (true negative rate) of the proposed framework to estimate GGMs from microarray data. This shows that it is possible to recover the true network topology with high accuracy even for small-sample datasets. Subsequently, we analyze gene expression data from a breast cancer tumor study and illustrate our approach by inferring a corresponding large-scale gene association network for 3883 genes.
Availability: The authors have implemented the approach in the R package 'GeneTS' that is freely available from http://www.stat.uni-muenchen.de/~strimmer/genets/, from the R archive (CRAN) and from the Bioconductor website.
Contact: korbinian.strimmer@lmu.de
TL;DR: A technique for making statistical comparisons of ALE meta‐analyses is proposed and its efficacy on different groups of foci divided by task or response type and random groups of similarly obtained foci is investigated.
Abstract: Activation likelihood estimation (ALE) has greatly advanced voxel-based meta-analysis research in the field of functional neuroimaging. We present two improvements to the ALE method. First, we evaluate the feasibility of two techniques for correcting for multiple comparisons: the single threshold test and a procedure that controls the false discovery rate (FDR). To test these techniques, foci from four different topics within the literature were analyzed: overt speech in stuttering subjects, the color-word Stroop task, picture-naming tasks, and painful stimulation. In addition, the performance of each thresholding method was tested on randomly generated foci. We found that the FDR method more effectively controls the rate of false positives in meta-analyses of small or large numbers of foci. Second, we propose a technique for making statistical comparisons of ALE meta-analyses and investigate its efficacy on different groups of foci divided by task or response type and random groups of similarly obtained foci. We then give an example of how comparisons of this sort may lead to advanced designs in future meta-analytic research.
TL;DR: The results in this note show that, when combining P‐values from multiple tests of the same hypothesis, the weighted Z‐method should be preferred.
Abstract: The most commonly used method in evolutionary biology for combining information across multiple tests of the same null hypothesis is Fisher's combined probability test. This note shows that an alternative method called the weighted Z-test has more power and more precision than does Fisher's test. Furthermore, in contrast to some statements in the literature, the weighted Z-method is superior to the unweighted Z-transform approach. The results in this note show that, when combining P-values from multiple tests of the same hypothesis, the weighted Z-method should be preferred.
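The two combination rules compared in this abstract can be sketched as follows, assuming one-sided p-values strictly between 0 and 1 from independent tests. The choice of weights is not specified in the abstract, so the weights below are placeholders, and the function names are illustrative.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def fisher_combined(pvalues):
    """Fisher's method: X = -2 * sum(ln p_i) is chi-square with 2k df under H0."""
    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # this is X / 2
    # Closed-form chi-square survival function for an even number of df (2k).
    return math.exp(-half_stat) * sum(
        half_stat ** j / math.factorial(j) for j in range(k)
    )


def weighted_z(pvalues, weights):
    """Weighted Z-method: Z_w = sum(w_i * Z_i) / sqrt(sum(w_i ** 2))."""
    z_scores = [_STD_NORMAL.inv_cdf(1.0 - p) for p in pvalues]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return 1.0 - _STD_NORMAL.cdf(numerator / denominator)


# Hypothetical p-values from three studies, with placeholder weights.
p = [0.04, 0.20, 0.10]
w = [3.0, 1.0, 2.0]
print("Fisher:", fisher_combined(p))
print("Weighted Z:", weighted_z(p, w))
```

Setting all weights equal reduces `weighted_z` to the unweighted Z-transform approach mentioned in the abstract.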
TL;DR: In this article, the authors propose the false coverage-statement rate (FCR) as a measure of interval coverage following selection, together with a general procedure that constructs a marginal CI for each selected parameter; instead of the confidence level 1 − q being used marginally, q is divided by the number of parameters considered and multiplied by the number selected.
Abstract: Often in applied research, confidence intervals (CIs) are constructed or reported only for parameters selected after viewing the data. We show that such selected intervals fail to provide the assumed coverage probability. By generalizing the false discovery rate (FDR) approach from multiple testing to selected multiple CIs, we suggest the false coverage-statement rate (FCR) as a measure of interval coverage following selection. A general procedure is then introduced, offering FCR control at level q under any selection rule. The procedure constructs a marginal CI for each selected parameter, but instead of the confidence level 1 − q being used marginally, q is divided by the number of parameters considered and multiplied by the number selected. If we further use the FDR controlling testing procedure of Benjamini and Hochberg for selecting the parameters, the newly suggested procedure offers CIs that are dual to the testing procedure and are shown to be optimal in the independent case. Under the positive re...
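The adjustment described in this abstract can be sketched as follows: with m parameters considered and R selected, each selected parameter gets a marginal CI at level 1 − R·q/m. The sketch assumes normally distributed estimators with known standard errors, which is an illustrative assumption and not something the abstract states; the function and argument names are hypothetical.

```python
from statistics import NormalDist


def fcr_adjusted_intervals(estimates, std_errors, selected, q=0.05):
    """Marginal CIs at level 1 - R*q/m for the selected parameters.

    estimates, std_errors: values for all m parameters considered.
    selected: indices of the R parameters chosen by any selection rule.
    Assumes normal estimators (an assumption made for this sketch).
    """
    m = len(estimates)
    r = len(selected)
    if r == 0:
        return {}
    adjusted_q = r * q / m  # q divided by m and multiplied by R
    z = NormalDist().inv_cdf(1.0 - adjusted_q / 2.0)
    return {
        i: (estimates[i] - z * std_errors[i], estimates[i] + z * std_errors[i])
        for i in selected
    }


# Hypothetical example: four parameters, only the first one selected.
est = [2.8, 0.3, -0.1, 1.1]
se = [1.0, 1.0, 1.0, 1.0]
print(fcr_adjusted_intervals(est, se, selected=[0]))
```

When every parameter is selected (R = m) the intervals reduce to ordinary 1 − q intervals; the fewer parameters are selected, the wider each reported interval becomes.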
TL;DR: The multtest package as discussed by the authors implements widely applicable resampling-based single-step and stepwise multiple testing procedures (MTPs) for controlling a broad class of Type I error rates.
Abstract: The Bioconductor R package multtest implements widely applicable resampling-based single-step and stepwise multiple testing procedures (MTP) for controlling a broad class of Type I error rates. The current version of multtest provides MTPs for tests concerning means, differences in means, and regression parameters in linear and Cox proportional hazards models. Typical testing scenarios are illustrated by applying various MTPs implemented in multtest to the Acute Lymphoblastic Leukemia (ALL) data set of Chiaretti et al. (2004), with the aim of identifying genes whose expression measures are associated with (possibly censored) biological and clinical outcomes.
TL;DR: In this paper, a statistical framework is proposed to detect statistical significance of a biological process within a group of interesting genes when compared to a reference group, based on functional annotation provided by the Gene Ontology.
Abstract: Increasingly used high-throughput experimental techniques, such as DNA or protein microarrays, yield groups of interesting (e.g. differentially regulated) genes that require further biological interpretation. With the systematic functional annotation provided by the Gene Ontology, the information required to automate the interpretation task is now accessible. However, the determination of statistical significance of a biological process within these groups is still an open question. In answering this question, multiple testing issues must be taken into account to avoid misleading results. Here we present a statistical framework that tests whether functions, processes or locations described in the Gene Ontology are significantly enriched within a group of interesting genes when compared to a reference group. First we define an exact analytical expression for the expected number of false positives that allows us to calculate adjusted p-values to control the false discovery rate. Next, we demonstrate and discuss the capabilities of our approach using publicly available microarray data on cell-cycle regulated genes. Further, we analyze the robustness of our framework with respect to the exact gene group composition and compare the performance with earlier approaches. The software package GOSSIP implements our method and is made freely available at http://gossip.gene-groups.net/.
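A common way to test a single Gene Ontology term for enrichment is a one-sided hypergeometric (Fisher-type) tail probability. The abstract does not name the exact test used, so the sketch below is an assumption about the general approach, not a description of GOSSIP; all names and counts are hypothetical.

```python
from math import comb


def enrichment_pvalue(n_total, n_annotated, n_selected, n_selected_annotated):
    """Upper-tail hypergeometric probability P(X >= n_selected_annotated).

    n_total: genes in the reference group plus the interesting group.
    n_annotated: genes among n_total annotated with the term.
    n_selected: size of the group of interesting genes.
    n_selected_annotated: interesting genes annotated with the term.
    """
    upper = min(n_annotated, n_selected)
    total_ways = comb(n_total, n_selected)
    tail_ways = sum(
        comb(n_annotated, k) * comb(n_total - n_annotated, n_selected - k)
        for k in range(n_selected_annotated, upper + 1)
    )
    return tail_ways / total_ways


# Hypothetical counts: 40 of 6000 genes carry the term, 12 of 200 selected genes do.
print(enrichment_pvalue(6000, 40, 200, 12))
```

Because one such p-value is computed per term, the resulting collection still needs a multiple testing adjustment, which is the issue the abstract addresses.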
TL;DR: Multiple comparisons, a methodology not previously used to evaluate economic forecasts, is introduced; the analysis concludes that the accuracy of the various methods differs significantly and that some methods are significantly better than others.
TL;DR: In this paper, the authors consider two generalizations of the FWER, i.e., the k-FWER and the false discovery proportion (FDP), defined as the number of false rejections divided by the total number of rejections and defined to be 0 if there are no rejections.
Abstract: Consider the multiple testing problem of testing null hypotheses H_1, ..., H_s. A classical approach to dealing with the multiplicity problem is to restrict attention to procedures that control the familywise error rate (FWER), the probability of even one false rejection. But if s is large, control of the FWER is so stringent that the ability of a procedure that controls the FWER to detect false null hypotheses is limited. It is therefore desirable to consider other measures of error control. This article considers two generalizations of the FWER. The first is the k-FWER, in which one is willing to tolerate k or more false rejections for some fixed k ≥ 1. The second is based on the false discovery proportion (FDP), defined to be the number of false rejections divided by the total number of rejections (and defined to be 0 if there are no rejections). Benjamini and Hochberg [J. Roy. Statist. Soc. Ser. B 57 (1995) 289-300] proposed control of the false discovery rate (FDR), by which they meant that, for fixed α, E(FDP) ≤ α. Here, we consider control of the FDP in the sense that, for fixed γ and α, P{FDP > γ} ≤ α. Beginning with any nondecreasing sequence of constants and p-values for the individual tests, we derive stepup procedures that control each of these two measures of error control without imposing any assumptions on the dependence structure of the p-values. We use our results to point out a few interesting connections with some closely related stepdown procedures. We then compare and contrast two FDP-controlling procedures obtained using our results with the stepup procedure for control of the FDR of Benjamini and Yekutieli [Ann. Statist. 29 (2001) 1165-1188].
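To make the three error measures in this abstract concrete, the sketch below computes them empirically from a list of replicate outcomes, each recorded as (number of false rejections, total number of rejections). The k-FWER is taken here as the probability of k or more false rejections, which is the usual definition and an assumption about the paper's convention. The outcomes are invented, and this does not implement the stepup procedures themselves.

```python
def fdp(num_false_rejections, num_rejections):
    """False discovery proportion, defined to be 0 when nothing is rejected."""
    if num_rejections == 0:
        return 0.0
    return num_false_rejections / num_rejections


def error_summaries(outcomes, k=2, gamma=0.1):
    """Empirical FDR, P(FDP > gamma) and k-FWER over replicate outcomes."""
    n = len(outcomes)
    fdps = [fdp(v, r) for v, r in outcomes]
    return {
        "FDR": sum(fdps) / n,  # estimate of E(FDP)
        "P(FDP > gamma)": sum(f > gamma for f in fdps) / n,
        "k-FWER": sum(v >= k for v, _ in outcomes) / n,  # P(V >= k)
    }


# Hypothetical outcomes from four replicates: (false rejections, rejections).
replicates = [(0, 0), (1, 10), (2, 4), (0, 5)]
print(error_summaries(replicates, k=2, gamma=0.1))
```

The example shows why the criteria differ: a procedure can have a small average FDP while still producing a large FDP in some replicates, which is what the tail criterion P{FDP > γ} is meant to limit.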
TL;DR: Power depends on the correlation structure of SNPs within a gene, the density of tag SNPs, and the placement of the liability allele, and it is found that power of T(P) is generally superior to that for the other procedures, including T(R).
Abstract: A goal of association analysis is to determine whether variation in a particular candidate region or gene is associated with liability to complex disease. To evaluate such candidates, ubiquitous Single Nucleotide Polymorphisms (SNPs) are useful. It is critical, however, to select a set of SNPs that are in substantial linkage disequilibrium (LD) with all other polymorphisms in the region. Whether there is an ideal statistical framework to test such a set of 'tag SNPs' for association is unknown. Compared to tests for association based on frequencies of haplotypes, recent evidence suggests tests for association based on linear combinations of the tag SNPs (Hotelling T(2) test) are more powerful. Following this logical progression, we asked whether single-locus tests would prove generally more powerful than the regression-based tests. We answer this question by investigating four inferential procedures: the maximum of a series of test statistics corrected for multiple testing by the Bonferroni procedure, T(B), or by permutation of case-control status, T(P); a procedure that tests the maximum of a smoothed curve fitted to the series of test statistics, T(S); and the Hotelling T(2) procedure, which we call T(R). These procedures are evaluated by simulating data like that from human populations, including realistic levels of LD and realistic effects of alleles conferring liability to disease. We find that power depends on the correlation structure of SNPs within a gene, the density of tag SNPs, and the placement of the liability allele. The clearest pattern emerges between power and the number of SNPs selected. When a large fraction of the SNPs within a gene are tested, and multiple SNPs are highly correlated with the liability allele, T(S) has better power.
Using a SNP selection scheme that optimizes power but also requires a substantial number of SNPs to be genotyped (roughly 10-20 SNPs per gene), power of T(P) is generally superior to that for the other procedures, including T(R). Finally, when a SNP selection procedure that targets a minimal number of SNPs per gene is applied, the average performances of T(P) and T(R) are indistinguishable.
TL;DR: A modified FDR procedure for discrete data is developed and applied to the human immunodeficiency virus data, detecting 15 positions with significantly different mutation rates compared with 11 detected by the original FDR method.
Abstract: to determine statistical significance. When the test statistics have discrete distributions, the FDR procedure can be made more powerful by a simple modification. The paper develops a modified FDR procedure for discrete data and applies it to the human immunodeficiency virus data. The new procedure detects 15 positions with significantly different mutation rates compared with 11 that are detected by the original FDR method. Simulations delineate conditions under which the modified FDR procedure confers large gains in power over the original technique. In general FDR adjustment methods can be improved for discrete data by incorporating the modification proposed.
TL;DR: Empirical and theoretical considerations show that Nyholt’s approach may be useful as an exploratory tool, but it is not an adequate substitute for permutation tests.
Abstract: Objective: A simple method for accounting efficiently for multiple testing of many SNPs in an association study was recently proposed by Nyholt, but its performance was not extensively evaluated. The method involves estimating an ‘effective number’ of independent tests and then adjusting the smallest observed p value using Sidak’s formula based on this number of tests. We sought to carry out an empirical and theoretical evaluation of Nyholt’s method. Methods: Nyholt’s method was applied to a sample of 31 genes typed at a total of 291 SNPs and permutation used to determine the type-I error rate for each gene. Based on our empirical results, we algebraically investigated the effective number of independent tests for a simple model of haplotype block structure. Results: The nominal 5% type I error rate varied from under 3% to over 7%, and was dependent on linkage disequilibrium. Theoretical considerations show further that the method can be very conservative in the presence of haplotype block structure. Conclusion: Although Nyholt’s approach may be useful as an exploratory tool, it is not an adequate substitute for permutation tests.
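The adjustment step described in this abstract can be sketched as follows: the smallest observed p-value is corrected with Šidák's formula, using an effective number of independent tests in place of the raw number of SNPs. How the effective number is estimated is not given in the abstract, so it is passed in as an input here; the numeric values are hypothetical.

```python
def sidak_adjust(p_min, m_eff):
    """Sidak-adjusted p-value: 1 - (1 - p_min) ** m_eff.

    p_min: smallest observed p-value across the SNPs of a gene.
    m_eff: effective number of independent tests (may be non-integer);
           its estimation is outside the scope of this sketch.
    """
    return 1.0 - (1.0 - p_min) ** m_eff


# Hypothetical gene with 10 typed SNPs and an effective number of 6.3 tests.
p_smallest = 0.004
print("Using all 10 SNPs:   ", sidak_adjust(p_smallest, 10))
print("Using m_eff = 6.3:   ", sidak_adjust(p_smallest, 6.3))
```

A smaller effective number gives a less severe correction, so any error in estimating it feeds directly into the type I error rate, which is the concern the evaluation raises.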
TL;DR: It is concluded that the better strategy to deal with phase ambiguities is to assign to each individual its list of weighted haplotype explanations, rather than to assign to each individual its most likely haplotype explanation.
Abstract: Summary
We have lately presented a testing procedure for family data which accounts for the multiple testing problem that is induced by the enormous number of different marker combinations that can be analyzed in a set of tightly linked markers. Most methods of haplotype based association analysis already require simulations to obtain an uncorrected P value for a specific marker combination. As shown before, it is nevertheless not necessary to carry out nested simulations to obtain a global P value that properly corrects for the multiple testing of different marker combinations without neglecting the dependency of the tests. We have now implemented this approach for case-control data in our program FAMHAP, as this data structure currently plays a dominant role in the field. We consider different ways to deal with phase ambiguities and two different statistical tests for the underlying single marker combinations to obtain uncorrected P values. One test statistic is chi-square based, the other is a haplotype trend regression. The performance of these different tests in the multiple testing situation is investigated in a large simulation study. We obtain a considerable gain in power with our global P values as opposed to Bonferroni corrected P values for all suggested test statistics. Good power was obtained both with the haplotype trend regression approach as well as with the simpler chi-square based test. Furthermore, we conclude that the better strategy to deal with phase ambiguities is to assign to each individual its list of weighted haplotype explanations, rather than to assign to each individual its most likely haplotype explanation. Finally, we demonstrate the usefulness of our approach by a real data example.
TL;DR: The proposed estimator is shown to deliver a tight probabilistic lower bound for the number of false null hypotheses in a multiple testing situation even under strong dependence between test statistics.
Abstract: We propose probabilistic lower bounds for the number of false null hypotheses when testing multiple hypotheses of association simultaneously. The bounds are valid under general and unknown dependence structures between the test statistics. The power of the proposed estimator to detect the full proportion of false null hypotheses is discussed and compared to other estimators. The proposed estimator is shown to deliver a tight probabilistic lower bound for the number of false null hypotheses in a multiple testing situation even under strong dependence between test statistics.
TL;DR: A new re-sampling based multiple testing procedure asymptotically controlling the probability that the proportion of false positives among the set of rejections exceeds q at level alpha, where q and alpha are user supplied numbers.
Abstract: Simultaneously testing a collection of null hypotheses about a data generating distribution based on a sample of independent and identically distributed observations is a fundamental and important statistical problem involving many applications. In this article we propose a new re-sampling based multiple testing procedure asymptotically controlling the probability that the proportion of false positives among the set of rejections exceeds q at level alpha, where q and alpha are user supplied numbers. The procedure involves 1) specifying a conditional distribution for a guessed set of true null hypotheses, given the data, which asymptotically is degenerate at the true set of null hypotheses, and 2) specifying a generally valid null distribution for the vector of test-statistics proposed in Pollard & van der Laan (2003), and generalized in our subsequent article Dudoit, van der Laan, & Pollard (2004), van der Laan, Dudoit, & Pollard (2004), and van der Laan, Dudoit, & Pollard (2004b). Ingredient 1) is established by fitting the empirical Bayes two component mixture model (Efron (2001b)) to the data to obtain an upper bound for marginal posterior probabilities of the null being true, given the data. We establish the finite sample rationale behind our proposal, and prove that this new multiple testing procedure asymptotically controls the desired tail probability for the proportion of false positives under general data generating distributions. In addition, we provide simulation studies establishing that this method is generally more powerful in finite samples than our previously proposed augmentation multiple testing procedure (van der Laan, Dudoit, & Pollard (2004b)) and competing procedures from the literature. Finally, we illustrate our methodology with a data analysis.
TL;DR: Although it is believed that the FDR or one of its variants will be applied more often in the future, long-term experience with microarray technology is missing and thus the validity of appropriate multiple test procedures cannot yet be assessed for microarray data analysis.
Abstract: Objectives: Discussion of different error concepts relevant to microarray experiments. Review of some commonly used multiple testing procedures. Comparison of different approaches as applied to gene expression data. Methods: This article focuses on familywise error rate (FWER) and false discovery rate (FDR) controlling procedures. Methods under investigation include: Bonferroni-type methods and their improvements (including resampling approaches), modified Bonferroni methods, data-driven approaches, as well as the linear step-up method and its modifications. Particular emphasis lies on the description of the assumptions, advantages and limitations for the investigated methods. Results: FWER controlling procedures are often too conservative in high dimensional screening studies. A better balance between the raw P-values and the stringent FWER-adjusted P-values may be required in many situations, as provided by FDR controlling and related procedures. Conclusions: It remains an open question which error concept to apply and which multiple testing procedure to use. Although we believe that the FDR or one of its variants will be applied more often in the future, long-term experience with microarray technology is missing and thus the validity of appropriate multiple test procedures cannot yet be assessed for microarray data analysis.
TL;DR: This work presents a method to analyze case‐control studies with multiple SNP data without phase information that considers gene‐gene interaction effects while correcting appropriately for multiple testing, and allows for interactions of haplotypes that belong to different unlinked regions, as haplotype analysis often proves to be more powerful than single marker analysis.
TL;DR: In this article, Bayesian estimation of a vector known to have a large number of zero components is proposed for comparing the expression of thousands of genes in two different cell lines, with prior knowledge on expression changes described using mixture priors that incorporate a mass at zero.
Abstract: Gene microarray technology is often used to compare the expression of thousands of genes in two different cell lines. Typically, one does not expect measurable changes in transcription amounts for a large number of genes; furthermore, the noise level of array experiments is rather high in relation to the available number of replicates. For the purpose of statistical analysis, inference on the "population" difference in expression for genes across the two cell lines is often cast in the framework of hypothesis testing, with the null hypothesis being no change in expression. Given that thousands of genes are investigated at the same time, this requires some multiple comparison correction procedure to be in place. We argue that hypothesis testing, with its emphasis on type I error and family analogues, may not address the exploratory nature of most microarray experiments. We instead propose viewing the problem as one of estimation of a vector known to have a large number of zero components. In a Bayesian framework, we describe the prior knowledge on expression changes using mixture priors that incorporate a mass at zero, and we choose a loss function that favors the selection of sparse solutions. We consider two different models applicable to the microarray problem, depending on the nature of replicates available, and show how to explore the posterior distributions of the parameters using MCMC. Simulations show an interesting connection between this Bayesian estimation framework and false discovery rate (FDR) control. Finally, two empirical examples illustrate the practical advantages of this Bayesian estimation paradigm.
TL;DR: This work applies the different multiple tests to EEG coherence data and shows that subsequently recalled nouns elicited significantly higher coherence than not recalled ones.
TL;DR: In this paper, the authors treat the multiple hypothesis testing problem as a 2^k finite action problem and study single-step, step-down and step-up procedures from the point of view of admissibility, Bayes and limit-of-Bayes properties.
Abstract: A resurgence of interest in multiple hypothesis testing has occurred in the last decade. Motivated by studies in genomics, microarrays, DNA sequencing, drug screening, clinical trials, bioassays, education and psychology, statisticians have been devoting considerable research energy in an effort to properly analyze multiple endpoint data. In response to new applications, new criteria and new methodology, many ad hoc procedures have emerged. The classical requirement has been to use procedures which control the strong familywise error rate (FWE) at some predetermined level α. That is, the probability of any false rejection of a true null hypothesis should be less than or equal to α. Finding desirable and powerful multiple test procedures is difficult under this requirement. One of the more recent ideas is concerned with controlling the false discovery rate (FDR), that is, the expected proportion of rejected hypotheses which are, in fact, true. Many multiple test procedures do control the FDR. A much earlier approach to multiple testing was formulated by Lehmann [Ann. Math. Statist. 23 (1952) 541-552 and 28 (1957) 1-25]. Lehmann's approach is decision theoretic and he treats the multiple endpoints problem as a 2^k finite action problem when there are k endpoints. This approach is appealing since unlike the FWE and FDR criteria, the finite action approach pays attention to false acceptances as well as false rejections. In this paper we view the multiple endpoints problem as a 2^k finite action problem. We study the popular procedures single-step, step-down and step-up from the point of view of admissibility, Bayes and limit of Bayes properties. For our model, which is a prototypical one, and our loss function, we are able to demonstrate the following results under some fairly general conditions to be specified: (i) The single-step procedure is admissible.
(ii) A sequence of prior distributions is given for which the step-down procedure is a limit of a sequence of Bayes procedures. (iii) For a vector risk function, where each component is the risk for an individual testing problem, various admissibility and inadmissibility results are obtained. In a companion paper [Cohen and Sackrowitz, Ann. Statist. 33 (2005) 145-158], we are able to give a characterization of Bayes procedures and their limits. The characterization yields a complete class and the additional useful result that the step-up procedure is inadmissible. The inadmissibility of step-up is demonstrated there for a more stringent loss function. Additional decision theoretic type results are also obtained in this paper.
TL;DR: Three-dimensional statistical maps for electroencephalogram (EEG) source localization are presented, which allow for the systematic exploration of the solution space for dipolar sources and make it possible to test whether the data support a given solution.
Abstract: We present a method that estimates three-dimensional statistical maps for electroencephalogram (EEG) source localization. The maps assess the likelihood that a point in the brain contains a dipolar source, under the hypothesis of one, two or three activated sources. This is achieved by examining all combinations of one to three dipoles on a coarse grid and attributing to each combination a score based on an F statistic. The probability density function of the statistic under the null hypothesis is estimated nonparametrically, using bootstrap resampling. A theoretical F distribution is then fitted to the empirical distribution in order to allow correction for multiple comparisons. The maps allow for the systematic exploration of the solution space for dipolar sources. They make it possible to test whether the data support a given solution. They do not rely on the assumption of uncorrelated source time courses. They can be compared to other statistical parametric maps such as those used in functional magnetic resonance imaging (fMRI). Results are presented for both simulated and real data. The maps were compared with LORETA and MUSIC results. For the real data consisting of an average of epileptic spikes, we observed good agreement between the EEG statistical maps, intracranial EEG recordings, and fMRI activations.
TL;DR: This paper will address issues by utilizing a general correlation measure, a non-parametric test statistic, and control of the family-wise error rate by employing permutation resampling in microarray studies on patients with lung cancer.
Abstract: In many microarray studies the primary objective is to identify, from a large panel of genes, those which are prognostic markers of a censored survival endpoint such as time to disease recurrence or death. Often, these genes are considered prognostic in that their respective expressions are associated with the survival endpoint of interest. To assess this association requires specifying an appropriate measure of association, a suitable test statistic and, as the number of genes is large, proper handling of multiplicity issues. In this paper, we will address these issues by utilizing a general correlation measure, a non-parametric test statistic, and control of the family-wise error rate by employing permutation resampling. Comprehensive simulation studies are conducted to investigate the statistical properties of the proposed procedure. The proposed method is applied to a recently published data set on patients with lung cancer.
TL;DR: It is argued that hypothesis testing, with its emphasis on type I error and family analogues, may not address the exploratory nature of most microarray experiments, and is proposed viewing the problem as one of estimation of a vector known to have a large number of zero components.
Abstract: Gene microarray technology is often used to compare the expression of thousands of genes in two different cell lines. Typically, one does not expect measurable changes in transcription amounts for a large number of genes; furthermore, the noise level of array experiments is rather high in relation to the available number of replicates. For the purpose of statistical analysis, inference on the “population” difference in expression for genes across the two cell lines is often cast in the framework of hypothesis testing, with the null hypothesis being no change in expression. Given that thousands of genes are investigated at the same time, this requires some multiple comparison correction procedure to be in place. We argue that hypothesis testing, with its emphasis on type I error and family analogues, may not address the exploratory nature of most microarray experiments. We instead propose viewing the problem as one of estimation of a vector known to have a large number of zero components. In a Bayesian framework, we describe the prior knowledge on expression changes using mixture priors that incorporate a mass at zero and we choose a loss function that favors the selection of sparse solutions. We consider two different models applicable to the microarray problem, depending on the nature of replicates available, and show how to explore the posterior distributions of the parameters using MCMC. Simulations show an interesting connection between this Bayesian estimation framework and both false discovery rate (FDR) control and misclassification minimizing procedures. Finally, two empirical examples illustrate the practical advantages of this Bayesian estimation paradigm.
TL;DR: This work proposes a procedure that uses a tree‐based recursive partitioning algorithm to group haplotypes into a small number of clusters, and conducts the association test based on groups of haplotypes instead of individual haplotypes.
Abstract: Motivated by the increasing availability of high-density single nucleotide polymorphism (SNP) markers across the genome, various haplotype-based methods have been developed for candidate gene association studies, and even for genome-wide association studies. Although haplotype approaches dramatically reduce the multiple comparisons problem (as compared to single SNP analysis), even the number of existing haplotypes is relatively large, which increases the degrees of freedom and decreases the power for the corresponding test statistic. Grouping haplotypes is a way to reduce the degrees of freedom. We propose a procedure that uses a tree-based recursive partitioning algorithm to group haplotypes into a small number of clusters, and conducts the association test based on groups of haplotypes instead of individual haplotypes. The method can be used for both population-based and family-based association studies, with known or ambiguous phase information. Simulation studies suggest that the proposed method has the right type I error rate, and is more powerful than some existing haplotype-based tests.
TL;DR: This commentary on a large study of the association between base excision repair gene polymorphisms and lung cancer risk discusses problems typical of investigations of gene–environment interactions and examines a simple Bayesian approach based on the estimation of a prior probability and the calculation of posterior probability.
Abstract: In this issue of the Journal, Hung et al. (1) describe the results of a large study on the association between base excision repair gene polymorphisms and lung cancer risk. Their study demonstrates problems typical of investigations of gene–environment interactions, in particular, the fact that among the generally negative results, some seemingly noteworthy associations are identified in subgroups of subjects who are defined on the basis of their tumor histology or smoking habits. The unique aspect of this study is that the authors have estimated the probability that the associations found are attributable to chance (i.e., false positives). The idea of evaluating new observations in the light of existing evidence is not novel and belongs to at least two traditions: Bayesian statistics and clinical epidemiology. In clinical epidemiology, formal treatment of “proof” from prior evidence is updated into posterior probability according to Bayes’ theorem. This approach has proved extremely useful because (a) it has revealed that diagnosis (and, by extension, clinical and scientific reasoning) develops in a sequence of steps that incorporate explicitly or implicitly previous knowledge (2) and (b) it has trained physicians to assign probabilities to the different components of such reasoning, namely, prior knowledge and “likelihood,” or the probability, of a positive test result in the presence of disease. Wacholder et al. (3) have proposed these same approaches to assess the validity of the increasing number of associations that are being identified among genetic variants, environmental exposures, and disease (i.e., gene–environment and gene–gene interactions). A large number of such associations have been reported and even more are expected to emerge in the future.
Tens of thousands of single nucleotide polymorphisms (SNPs) are or will be investigated for their association with cancer, and many of the observed results will be false positives. The challenge is to distinguish the false-positive associations from the true positives. Wacholder et al. (3) proposed a simple Bayesian approach that is based on the estimation of a prior probability and the calculation of posterior probability. The prior probability can be estimated from results of previous studies, from biochemical or molecular information (e.g., gene expression data) that supports the function of a SNP, or from other types of evidence such as sequence homology (4). The general idea is to weight a new observation with the available prior evidence to derive a posterior probability. Recently, in an extension of the model proposed by Wacholder et al. (3), Ioannidis (5) has incorporated selective reporting and other biases and the fact that specific hypotheses, including gene–disease associations and gene–environment interactions, are usually tested by many teams worldwide. By taking these parameters into account when scrutinizing all the evidence, one can show that very few identified gene–disease associations and gene–environment interactions are probably real. The approach proposed by Wacholder et al. is ingenious and more convincing than other approaches, such as those based on Bonferroni’s correction for multiple comparisons or similar statistical methods (6, 7). Although the study by Hung et al. is a clever application of the proposal by Wacholder et al., execution of that proposal is not free from problems. The main problem is how best to calculate the prior probabilities. First, there is often no information available to use. Second, any available information may not be easy to evaluate, either because it is indirect (for example, from sequence homology) or because it is contradictory.
The latter is very often the case with the epidemiologic evidence of gene–environment interactions (see below). Third, methods to quantify prior probabilities are not available, even when we have good evidence. The attempt made by Hung et al. is a good step forward, but it is still imperfect. The authors have considered and applied five categories of prior probability (from 50% to 0.1%); however, the basis for these numbers and categories is unclear. A further step forward might consist of using results of well-done meta-analyses or pooled analyses of the available studies. In this case, we have good examples from the clinical literature: a good meta-analysis can allow clinicians to judge how much a new study changes the estimate of efficacy of a drug. For example, if a meta-analysis provides an odds ratio of 0.9 and a new study reports an odds ratio of 0.2, then the real contribution of the new study to the advancement of knowledge will be judged by its ability to change the odds ratio provided by the meta-analysis. This ability will depend on both qualitative considerations (such as potential biases and the response rate) and the size of the study. Meta-analyses can be extremely useful for obtaining prior estimates against which the new evidence can be challenged because they include (a) an overall odds ratio from virtually all the available studies, (b) an evaluation of the quality of the studies (a quality score is usually assigned to each study included in a meta-analysis), (c) a weight that depends on the study size, and (d) a measure of heterogeneity. In a meta-analysis, a large and well-conducted positive study can thus outweigh several small negative studies. Figure 1 shows an example of a meta-analysis of some of the genes included in the Hung et al. article that we performed using data on gene–environment interactions, with a specific focus on DNA repair polymorphisms (summarized at http://perseus.isi.it/huge [last accessed: March 24, 2005]).
A network of investigative groups involved in human genome epidemiology research has been developed to combine data from many studies to overcome the deficiencies of currently available, relatively small datasets (Ioannidis JPA, Altman R, Boffetta P, Danesh J, Hartge P, Little J, et al.: unpublished data). Overall, the results shown in the figure support the choices made by Hung et al. in assigning prior
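The commentary above describes weighting a new observation by a prior probability but does not spell out the arithmetic. A minimal sketch, assuming the false positive report probability (FPRP) form usually attributed to Wacholder et al.: the probability that a significant finding is a false positive, given the significance level, the power, and the prior probability that the association is real. The function name and the values for `alpha` and `power` are illustrative assumptions. Only the two prior values (50% and 0.1%) come from the text.

```python
def false_positive_report_probability(alpha, power, prior):
    """P(no true association | significant result) via Bayes' theorem.

    alpha: probability of a significant result when there is no association.
    power: probability of a significant result when the association is real.
    prior: prior probability that the association is real (strictly between 0 and 1).
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    false_positives = alpha * (1.0 - prior)   # significant, but null is true
    true_positives = power * prior            # significant, and association is real
    return false_positives / (false_positives + true_positives)


if __name__ == "__main__":
    # alpha=0.05 and power=0.8 are assumed for illustration only.
    for prior in (0.5, 0.001):  # the two endpoint priors mentioned in the text
        fprp = false_positive_report_probability(alpha=0.05, power=0.8, prior=prior)
        print(f"prior={prior}: FPRP={fprp:.3f}")
```

The sketch makes the commentary's concern concrete. With everything else fixed, moving the prior from 50% to 0.1% changes the finding from probably real to almost certainly a false positive, so the conclusion depends heavily on how the prior is justified.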
TL;DR: This work shows, by applying the model to an empirical data set, that the method based on the well-known logistic regression model is a useful tool for haplotype association analysis of human disease traits.
Abstract: Haplotype inference has become an important part of human genetic data analysis due to its functional and statistical advantages over the single-locus approach in linkage disequilibrium mapping. Different statistical methods have been proposed for detecting haplotype–disease associations using unphased multi-locus genotype data, ranging from the early approach by the simple gene-counting method to the recent work using the generalized linear model. However, these methods are either confined to the case–control design or unable to yield unbiased point and interval estimates of haplotype effects. Based on the popular logistic regression model, we present a new approach for haplotype association analysis of human disease traits. Using haplotype-based parameterization, our model infers the effects of specific haplotypes (point estimation) and constructs confidence intervals for the risks of haplotypes (interval estimation). Based on the estimated parameters, the model calculates haplotype frequency conditional on the trait value for both discrete and continuous traits. Moreover, our model provides an overall significance level for the association between the disease trait and a group or all of the haplotypes. Because it maximizes the likelihood directly in haplotype estimation, our method also facilitates a computer simulation approach for correcting the significance level of individual haplotypes to adjust for multiple testing. We show, by applying the model to an empirical data set, that our method based on the well-known logistic regression model is a useful tool for haplotype association analysis of human disease traits.
TL;DR: The exact approach is described as a useful alternative to the asymptotic test, and it is shown that the exact tests for biallelic data may be most useful for the recessive disease model.
TL;DR: A simple method to calculate sample size and power for a simulation-based multiple testing procedure which gives a sharper critical value than the standard Bonferroni method is presented.
Abstract: In this article, we present a simple method to calculate sample size and power for a simulation-based multiple testing procedure which gives a sharper critical value than the standard Bonferroni method. The method is especially useful when several highly correlated test statistics are involved in a multiple-testing procedure. The formula for sample size calculation will be useful in designing clinical trials with multiple endpoints or correlated outcomes. We illustrate our method with a quality-of-life study for patients with early stage prostate cancer. Our method can also be used for comparing multiple independent groups.
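The abstract above does not give the simulation procedure. To illustrate why simulation can yield a sharper critical value than Bonferroni when test statistics are highly correlated, here is a minimal sketch under stated assumptions: the m test statistics are jointly standard normal with a common pairwise correlation `rho`, and the tests are two-sided. The equicorrelation model and both function names are hypothetical choices, not the paper's method.

```python
import math
import random
from statistics import NormalDist


def bonferroni_critical_value(m, alpha=0.05):
    """Two-sided Bonferroni critical value for m standard-normal test statistics."""
    return NormalDist().inv_cdf(1.0 - alpha / (2.0 * m))


def simulated_critical_value(m, rho, alpha=0.05, n_sim=20000, seed=1):
    """Empirical (1 - alpha) quantile of max|Z_i| under an assumed equicorrelated null."""
    rng = random.Random(seed)
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    maxima = []
    for _ in range(n_sim):
        shared = rng.gauss(0, 1)  # common factor inducing correlation rho
        stats = (a * shared + b * rng.gauss(0, 1) for _ in range(m))
        maxima.append(max(abs(z) for z in stats))
    maxima.sort()
    index = min(n_sim - 1, math.ceil((1.0 - alpha) * n_sim) - 1)
    return maxima[index]


if __name__ == "__main__":
    m, rho = 10, 0.9
    print("Bonferroni:", bonferroni_critical_value(m))
    print("Simulated :", simulated_critical_value(m, rho))
```

Under strong positive correlation the maximum statistic behaves more like a single test than like m independent ones, so the simulated threshold falls below the Bonferroni one. A lower threshold means more power or a smaller required sample size.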
TL;DR: In this article, a new resampling based multiple testing procedure is proposed to control the probability that the proportion of false positives among the set of rejections exceeds q at level alpha, where q and alpha are user supplied numbers.
Abstract: Simultaneously testing a collection of null hypotheses about a data generating distribution based on a sample of independent and identically distributed observations is a fundamental and important statistical problem involving many applications. In this article we propose a new resampling-based multiple testing procedure asymptotically controlling the probability that the proportion of false positives among the set of rejections exceeds q at level alpha, where q and alpha are user-supplied numbers. The procedure involves 1) specifying a conditional distribution for a guessed set of true null hypotheses, given the data, which asymptotically is degenerate at the true set of null hypotheses, and 2) specifying a generally valid null distribution for the vector of test statistics proposed in Pollard and van der Laan (2003), and generalized in our subsequent articles Dudoit et al. (2004), van der Laan et al. (2004a) and van der Laan et al. (2004b). We establish the finite sample rationale behind our proposal, and prove that this new multiple testing procedure asymptotically controls the desired tail probability for the proportion of false positives under general data generating distributions. In addition, we provide simulation studies establishing that this method is generally more powerful in finite samples than our previously proposed augmentation multiple testing procedure (van der Laan et al. (2004b)) and competing procedures from the literature. Finally, we illustrate our methodology with a data analysis.
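The error criterion described in words above can be written compactly. The symbols below are notation assumed for illustration and do not appear in the abstract: $V_n$ is the number of false positives, $R_n$ the number of rejections at sample size $n$, and the ratio is taken to be 0 when nothing is rejected.

```latex
% Tail-probability control of the proportion of false positives
% (notation assumed: V_n = false positives, R_n = rejections)
\[
  \limsup_{n \to \infty}
  \Pr\!\left( \frac{V_n}{\max(R_n, 1)} > q \right) \le \alpha ,
  \qquad 0 \le q < 1,\; 0 < \alpha < 1 .
\]
% Setting q = 0 recovers family-wise error rate control, Pr(V_n > 0) <= alpha,
% whereas FDR control bounds the expectation E[V_n / max(R_n, 1)] instead.
```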
TL;DR: Several test statistics that can be utilized in testing disease association are examined and several multiple testing procedures that can properly control the family-wise error rates when the univariate approach is applied to multiple markers are reviewed.
Abstract: We studied several methods for selecting single-nucleotide polymorphisms (SNPs) in a disease association study. Two major categories of analytical strategy are the univariate and the set selection approaches. The univariate approach evaluates each SNP marker one at a time, while the set selection approach tests disease association of a set of SNP markers simultaneously. We examined various test statistics that can be utilized in testing disease association and also reviewed several multiple testing procedures that can properly control the family-wise error rates when the univariate approach is applied to multiple markers. The set association methods were then briefly reviewed. Finally, we applied these methods to the data from the Collaborative Study on the Genetics of Alcoholism (COGA).