Top 92 papers published in the topic of Multiple comparisons problem in 2005

Showing papers on "Multiple comparisons problem" published in 2005

Journal Article•10.1111/J.0030-1299.2005.13727.X•

Implementing false discovery rate control: increasing your power

Koen J. F. Verhoeven¹, Katy L. Simonsen, Lauren M. McIntyre¹•Institutions (1)

01 Mar 2005-Oikos

TL;DR: Keeping the proportion of type I errors low among all significant results is a sensible, powerful, and easy-to-interpret way of addressing the multiple testing issue.

Abstract: Popular procedures to control the chance of making type I errors when multiple statistical tests are performed come at a high cost: a reduction in power. As the number of tests increases, power for an individual test may become unacceptably low. This is a consequence of minimizing the chance of making even a single type I error, which is the aim of, for instance, the Bonferroni and sequential Bonferroni procedures. An alternative approach, control of the false discovery rate (FDR), has recently been advocated for ecological studies. This approach aims at controlling the proportion of significant results that are in fact type I errors. Keeping the proportion of type I errors low among all significant results is a sensible, powerful, and easy-to-interpret way of addressing the multiple testing issue. To encourage practical use of the approach, in this note we illustrate how the proposed procedure works, we compare it to more traditional methods that control the familywise error rate, and we discuss some recent useful developments in FDR control.
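
The contrast drawn in this abstract can be sketched in a few lines. The block below is a minimal illustration, assuming the FDR procedure meant is the standard Benjamini-Hochberg step-up rule; the function names and the example p-values are hypothetical and are not taken from the paper.

```python
def bonferroni(pvals, alpha=0.05):
    """Familywise error control: reject p_i only if p_i <= alpha / m."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg(pvals, q=0.05):
    """Step-up FDR control (assumed here to be the Benjamini-Hochberg rule).

    Sort the p-values, find the largest rank k with p_(k) <= k * q / m,
    and reject the k smallest p-values.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank * q / m:
            k = rank  # keep the largest rank that satisfies the bound
    reject = [False] * m
    for idx in order[:k]:
        reject[idx] = True
    return reject


# Hypothetical p-values from ten tests (illustrative only).
example_pvals = [0.001, 0.008, 0.039, 0.041, 0.042,
                 0.060, 0.074, 0.205, 0.212, 0.216]
n_bonferroni = sum(bonferroni(example_pvals))
n_bh = sum(benjamini_hochberg(example_pvals))
```

On these made-up values the Bonferroni rule rejects one hypothesis and the step-up rule rejects two, which is the kind of power difference the abstract refers to.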

981 citations

Journal Article•10.1093/BIOINFORMATICS/BTI062•

An empirical Bayes approach to inferring large-scale gene association networks

Juliane Schäfer¹, Korbinian Strimmer¹•Institutions (1)

Ludwig Maximilian University of Munich¹

15 Mar 2005-Bioinformatics

TL;DR: Introduces a novel framework for small-sample inference of graphical models from gene expression data, focusing on graphical Gaussian models (GGMs), which are now frequently used to describe gene association networks and to detect conditionally dependent genes.

Abstract: Motivation: Genetic networks are often described statistically using graphical models (e.g. Bayesian networks). However, inferring the network structure offers a serious challenge in microarray analysis where the sample size is small compared to the number of considered genes. This renders many standard algorithms for graphical models inapplicable, and inferring genetic networks an 'ill-posed' inverse problem. Methods: We introduce a novel framework for small-sample inference of graphical models from gene expression data. Specifically, we focus on the so-called graphical Gaussian models (GGMs) that are now frequently used to describe gene association networks and to detect conditionally dependent genes. Our new approach is based on (1) improved (regularized) small-sample point estimates of partial correlation, (2) an exact test of edge inclusion with adaptive estimation of the degree of freedom and (3) a heuristic network search based on false discovery rate multiple testing. Steps (2) and (3) correspond to an empirical Bayes estimate of the network topology. Results: Using computer simulations, we investigate the sensitivity (power) and specificity (true negative rate) of the proposed framework to estimate GGMs from microarray data. This shows that it is possible to recover the true network topology with high accuracy even for small-sample datasets. Subsequently, we analyze gene expression data from a breast cancer tumor study and illustrate our approach by inferring a corresponding large-scale gene association network for 3883 genes. Availability: The authors have implemented the approach in the R package 'GeneTS' that is freely available from http://www.stat.uni-muenchen.de/~strimmer/genets/, from the R archive (CRAN) and from the Bioconductor website. Contact: korbinian.strimmer@lmu.de

968 citations

Journal Article•10.1002/HBM.20136•

ALE meta-analysis: Controlling the false discovery rate and performing statistical contrasts

Angela R. Laird¹, P. Mickle Fox¹, Cathy J. Price, David C. Glahn¹, Angela M. Uecker¹, Jack L. Lancaster, Peter E. Turkeltaub², Peter Kochunov¹, Peter T. Fox•Institutions (2)

University of Texas Health Science Center at San Antonio¹, Georgetown University²

01 May 2005-Human Brain Mapping

TL;DR: A technique for making statistical comparisons of ALE meta‐analyses is proposed and its efficacy on different groups of foci divided by task or response type and random groups of similarly obtained foci is investigated.

Abstract: Activation likelihood estimation (ALE) has greatly advanced voxel-based meta-analysis research in the field of functional neuroimaging. We present two improvements to the ALE method. First, we evaluate the feasibility of two techniques for correcting for multiple comparisons: the single threshold test and a procedure that controls the false discovery rate (FDR). To test these techniques, foci from four different topics within the literature were analyzed: overt speech in stuttering subjects, the color-word Stroop task, picture-naming tasks, and painful stimulation. In addition, the performance of each thresholding method was tested on randomly generated foci. We found that the FDR method more effectively controls the rate of false positives in meta-analyses of small or large numbers of foci. Second, we propose a technique for making statistical comparisons of ALE meta-analyses and investigate its efficacy on different groups of foci divided by task or response type and random groups of similarly obtained foci. We then give an example of how comparisons of this sort may lead to advanced designs in future meta-analytic research.

910 citations

Journal Article•10.1111/J.1420-9101.2005.00917.X•

Combining probability from independent tests: the weighted Z-method is superior to Fisher's approach

Michael C. Whitlock¹•Institutions (1)

University of British Columbia¹

01 Sep 2005-Journal of Evolutionary Biology

TL;DR: The results in this note show that, when combining P‐values from multiple tests of the same hypothesis, the weighted Z‐method should be preferred.

Abstract: The most commonly used method in evolutionary biology for combining information across multiple tests of the same null hypothesis is Fisher's combined probability test. This note shows that an alternative method called the weighted Z-test has more power and more precision than does Fisher's test. Furthermore, in contrast to some statements in the literature, the weighted Z-method is superior to the unweighted Z-transform approach. The results in this note show that, when combining P-values from multiple tests of the same hypothesis, the weighted Z-method should be preferred.
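
The two combination rules compared in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the p-values are independent and one-sided, the weights are supplied by the user (the abstract does not say how they should be chosen), and the function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def fisher_combined(pvals):
    """Fisher's combined probability test.

    X = -2 * sum(ln p_i) is chi-square with 2k degrees of freedom under
    the null; for even degrees of freedom the survival function has the
    closed form exp(-x/2) * sum_{j<k} (x/2)^j / j!.
    """
    k = len(pvals)
    half_x = -sum(math.log(p) for p in pvals)  # this is X / 2
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half_x / j
        total += term
    return math.exp(-half_x) * total


def weighted_z(pvals, weights):
    """Weighted Z-method: Z = sum(w_i * z_i) / sqrt(sum(w_i^2))."""
    zs = [_STD_NORMAL.inv_cdf(1.0 - p) for p in pvals]
    numerator = sum(w * z for w, z in zip(weights, zs))
    denominator = math.sqrt(sum(w * w for w in weights))
    return 1.0 - _STD_NORMAL.cdf(numerator / denominator)
```

With a single p-value both functions return that p-value unchanged, which is a quick sanity check. With several tests, the weights let a more informative study dominate the combined result, whereas Fisher's statistic treats every p-value alike.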

824 citations

Journal Article•10.1198/016214504000001907•

False Discovery Rate–Adjusted Multiple Confidence Intervals for Selected Parameters

Yoav Benjamini¹, Daniel Yekutieli¹, Don Edwards², Juliet Popper Shaffer³, Ajit C. Tamhane⁴, Peter H. Westfall⁵, Burt Holland⁶•Institutions (6)

Tel Aviv University¹, University of South Carolina², University of California, Berkeley³, Northwestern University⁴, Texas Tech University⁵, Temple University⁶

01 Mar 2005-Journal of the American Statistical Association

TL;DR: In this article, the authors propose the false coverage-statement rate (FCR) as a measure of interval coverage following selection, together with a general procedure that constructs a marginal CI for each selected parameter; instead of the confidence level 1 − q being used marginally, q is divided by the number of parameters considered and multiplied by the number selected.

Abstract: Often in applied research, confidence intervals (CIs) are constructed or reported only for parameters selected after viewing the data. We show that such selected intervals fail to provide the assumed coverage probability. By generalizing the false discovery rate (FDR) approach from multiple testing to selected multiple CIs, we suggest the false coverage-statement rate (FCR) as a measure of interval coverage following selection. A general procedure is then introduced, offering FCR control at level q under any selection rule. The procedure constructs a marginal CI for each selected parameter, but instead of the confidence level 1 − q being used marginally, q is divided by the number of parameters considered and multiplied by the number selected. If we further use the FDR controlling testing procedure of Benjamini and Hochberg for selecting the parameters, the newly suggested procedure offers CIs that are dual to the testing procedure and are shown to be optimal in the independent case. Under the positive re...
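
The level adjustment described in this abstract can be sketched as below. This is an illustration only, assuming normally distributed estimators with known standard errors and two-sided intervals; the function name, the inputs, and the simple threshold used to select parameters are hypothetical choices, not the paper's.

```python
from statistics import NormalDist


def fcr_adjusted_intervals(estimates, std_errors, selected, q=0.05):
    """Intervals for the selected parameters at level 1 - R * q / m.

    m is the number of parameters considered and R the number selected,
    so q is divided by m and multiplied by R, as the abstract describes.
    Assumes normal estimators (an assumption of this sketch).
    """
    m = len(estimates)
    r = len(selected)
    if r == 0:
        return {}
    adjusted_q = r * q / m
    z = NormalDist().inv_cdf(1.0 - adjusted_q / 2.0)
    return {
        i: (estimates[i] - z * std_errors[i], estimates[i] + z * std_errors[i])
        for i in selected
    }


# Hypothetical data: four parameters, select those with |estimate / se| > 1.96.
est = [3.0, 0.2, -2.5, 0.1]
se = [1.0, 1.0, 1.0, 1.0]
chosen = [i for i in range(len(est)) if abs(est[i] / se[i]) > 1.96]
intervals = fcr_adjusted_intervals(est, se, chosen, q=0.05)
```

Here two of four parameters are selected, so each interval is built at level 1 − 0.025 rather than 1 − 0.05: wider than the unadjusted marginal interval, narrower than a Bonferroni interval over all four parameters.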

753 citations

Book Chapter•10.1007/0-387-29362-0_15•

Multiple Testing Procedures: the multtest Package and Applications to Genomics

Katherine S. Pollard¹, Sandrine Dudoit², M. J. van der Laan²•Institutions (2)

University of California, San Francisco¹, University of California, Berkeley²

1 Jan 2005

TL;DR: The Bioconductor R package multtest implements widely applicable resampling-based single-step and stepwise multiple testing procedures (MTPs) for controlling a broad class of Type I error rates.

Abstract: The Bioconductor R package multtest implements widely applicable resampling-based single-step and stepwise multiple testing procedures (MTP) for controlling a broad class of Type I error rates. The current version of multtest provides MTPs for tests concerning means, differences in means, and regression parameters in linear and Cox proportional hazards models. Typical testing scenarios are illustrated by applying various MTPs implemented in multtest to the Acute Lymphoblastic Leukemia (ALL) data set of Chiaretti et al. (2004), with the aim of identifying genes whose expression measures are associated with (possibly censored) biological and clinical outcomes.

303 citations

Journal Article•10.11234/GI1990.16.106•

Biological profiling of gene groups utilizing Gene Ontology.

Nils Blüthgen¹, Karsten Brand², Branka Čajavec¹, Maciej Swat¹, Hanspeter Herzel¹, Dieter Beule•Institutions (2)

Humboldt University of Berlin¹, Heidelberg University²

01 Jan 2005-Genome Informatics

TL;DR: In this paper, a statistical framework is proposed to detect the statistical significance of a biological process within a group of interesting genes when compared to a reference group, based on functional annotation provided by the Gene Ontology.

Abstract: Increasingly used high-throughput experimental techniques, like DNA or protein microarrays, yield groups of interesting (e.g. differentially regulated) genes, which require further biological interpretation. With the systematic functional annotation provided by the Gene Ontology, the information required to automate the interpretation task is now accessible. However, the determination of statistical significance of a biological process within these groups is still an open question. In answering this question, multiple testing issues must be taken into account to avoid misleading results. Here we present a statistical framework that tests whether functions, processes or locations described in the Gene Ontology are significantly enriched within a group of interesting genes when compared to a reference group. First we define an exact analytical expression for the expected number of false positives that allows us to calculate adjusted p-values to control the false discovery rate. Next, we demonstrate and discuss the capabilities of our approach using publicly available microarray data on cell-cycle regulated genes. Further, we analyze the robustness of our framework with respect to the exact gene group composition and compare the performance with earlier approaches. The software package GOSSIP implements our method and is made freely available at http://gossip.gene-groups.net/.

228 citations

Journal Article•10.1016/J.IJFORECAST.2004.10.003•

The M3 competition: Statistical tests of the results

Alex Koning¹, Philip Hans Franses¹, Michèle Hibon², Herman O. Stekler³•Institutions (3)

Erasmus University Rotterdam¹, INSEAD², George Washington University³

01 Jul 2005-International Journal of Forecasting

TL;DR: Introduces multiple comparisons, a methodology not previously used to evaluate economic forecasts, and concludes that the accuracy of the various methods differs significantly and that some methods are significantly better than others.

201 citations

Book•

Stepup procedures for control of generalizations of the familywise error rate

Joseph P. Romano, Azeem M. Shaikh

1 Jan 2005

TL;DR: In this paper, the authors consider two generalizations of the FWER: the k-FWER and control of the false discovery proportion (FDP), where the FDP is the number of false rejections divided by the total number of rejections (and 0 if there are no rejections).

Abstract: Consider the multiple testing problem of testing null hypotheses H_1, ..., H_s. A classical approach to dealing with the multiplicity problem is to restrict attention to procedures that control the familywise error rate (FWER), the probability of even one false rejection. But if s is large, control of the FWER is so stringent that the ability of a procedure that controls the FWER to detect false null hypotheses is limited. It is therefore desirable to consider other measures of error control. This article considers two generalizations of the FWER. The first is the k-FWER, in which one is willing to tolerate k or more false rejections for some fixed k ≥ 1. The second is based on the false discovery proportion (FDP), defined to be the number of false rejections divided by the total number of rejections (and defined to be 0 if there are no rejections). Benjamini and Hochberg [J. Roy. Statist. Soc. Ser. B 57 (1995) 289-300] proposed control of the false discovery rate (FDR), by which they meant that, for fixed α, E(FDP) ≤ α. Here, we consider control of the FDP in the sense that, for fixed γ and α, P{FDP > γ} ≤ α. Beginning with any nondecreasing sequence of constants and p-values for the individual tests, we derive stepup procedures that control each of these two measures of error control without imposing any assumptions on the dependence structure of the p-values. We use our results to point out a few interesting connections with some closely related stepdown procedures. We then compare and contrast two FDP-controlling procedures obtained using our results with the stepup procedure for control of the FDR of Benjamini and Yekutieli [Ann. Statist. 29 (2001) 1165-1188].
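
The error measures named in this abstract can be made concrete with a small simulation-style helper. The sketch below only evaluates the FDP and two summaries of it for given rejection sets; it does not implement the stepup procedures themselves, and all names and example values are hypothetical.

```python
def false_discovery_proportion(rejected, true_null):
    """FDP = false rejections / total rejections, and 0 with no rejections."""
    total = sum(1 for r in rejected if r)
    if total == 0:
        return 0.0
    false = sum(1 for r, null in zip(rejected, true_null) if r and null)
    return false / total


def summarize_fdp(fdps, gamma):
    """Empirical analogues of E(FDP) and P{FDP > gamma} over replicates."""
    n = len(fdps)
    mean_fdp = sum(fdps) / n                      # estimates the FDR
    tail_prob = sum(1 for f in fdps if f > gamma) / n
    return mean_fdp, tail_prob


# One hypothetical replicate: three rejections, one of them a true null.
fdp_one = false_discovery_proportion(
    rejected=[True, True, False, True],
    true_null=[False, True, True, False],
)
# Four hypothetical replicates summarized at gamma = 0.1.
mean_fdp, tail_prob = summarize_fdp([0.0, 0.5, 0.0, 0.25], gamma=0.1)
```

The example shows why the two criteria differ: the average FDP can be modest while a sizeable fraction of replicates still exceed γ.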

129 citations

Journal Article•10.1002/GEPI.20050•

Analysis of single‐locus tests to detect gene/disease associations

Kathryn Roeder¹, Silviu-Alin Bacanu², Vibhor A. Sonpar², Xiaohua Zhang³, Bernie Devlin²•Institutions (3)

Carnegie Mellon University¹, University of Pittsburgh², United States Military Academy³

01 Apr 2005-Genetic Epidemiology

TL;DR: Power depends on the correlation structure of SNPs within a gene, the density of tag SNPs, and the placement of the liability allele; with a SNP selection scheme that optimizes power, the power of T(P) is generally superior to that of the other procedures, including T(R).

Abstract: A goal of association analysis is to determine whether variation in a particular candidate region or gene is associated with liability to complex disease. To evaluate such candidates, ubiquitous Single Nucleotide Polymorphisms (SNPs) are useful. It is critical, however, to select a set of SNPs that are in substantial linkage disequilibrium (LD) with all other polymorphisms in the region. Whether there is an ideal statistical framework to test such a set of 'tag SNPs' for association is unknown. Compared to tests for association based on frequencies of haplotypes, recent evidence suggests tests for association based on linear combinations of the tag SNPs (Hotelling T(2) test) are more powerful. Following this logical progression, we wondered whether single-locus tests would prove generally more powerful than the regression-based tests. We answer this question by investigating four inferential procedures: the maximum of a series of test statistics corrected for multiple testing by the Bonferroni procedure, T(B), or by permutation of case-control status, T(P); a procedure that tests the maximum of a smoothed curve fitted to the series of test statistics, T(S); and the Hotelling T(2) procedure, which we call T(R). These procedures are evaluated by simulating data like that from human populations, including realistic levels of LD and realistic effects of alleles conferring liability to disease. We find that power depends on the correlation structure of SNPs within a gene, the density of tag SNPs, and the placement of the liability allele. The clearest pattern emerges between power and the number of SNPs selected. When a large fraction of the SNPs within a gene are tested, and multiple SNPs are highly correlated with the liability allele, T(S) has better power. Using a SNP selection scheme that optimizes power but also requires a substantial number of SNPs to be genotyped (roughly 10-20 SNPs per gene), power of T(P) is generally superior to that for the other procedures, including T(R). Finally, when a SNP selection procedure that targets a minimal number of SNPs per gene is applied, the average performances of T(P) and T(R) are indistinguishable.

99 citations

Journal Article•10.1111/J.1467-9876.2005.00475.X•

A modified false discovery rate multiple‐comparisons procedure for discrete data, applied to human immunodeficiency virus genetics

Peter B. Gilbert¹•Institutions (1)

Fred Hutchinson Cancer Research Center¹

01 Jan 2005-Journal of The Royal Statistical Society Series C-applied Statistics

TL;DR: Develops a modified FDR procedure for discrete data and applies it to the human immunodeficiency virus data, detecting 15 positions with significantly different mutation rates compared with 11 detected by the original FDR method.

Abstract: to determine statistical significance. When the test statistics have discrete distributions, the FDR procedure can be made more powerful by a simple modification. The paper develops a modified FDR procedure for discrete data and applies it to the human immunodeficiency virus data. The new procedure detects 15 positions with significantly different mutation rates compared with 11 that are detected by the original FDR method. Simulations delineate conditions under which the modified FDR procedure confers large gains in power over the original technique. In general FDR adjustment methods can be improved for discrete data by incorporating the modification proposed.

Journal Article•10.1159/000087540•

Evaluation of Nyholt’s Procedure for Multiple Testing Correction

Daria Salyakina¹, S. R. Seaman¹, Brian L Browning, Frank Dudbridge, Bertram Müller-Myhsok¹•Institutions (1)

Max Planck Society¹

01 Jan 2005-Human Heredity

TL;DR: Empirical and theoretical considerations show that Nyholt’s approach may be useful as an exploratory tool, but it is not an adequate substitute for permutation tests.

Abstract: Objective: A simple method for accounting efficiently for multiple testing of many SNPs in an association study was recently proposed by Nyholt, but its performance was not extensively evaluated. The method involves estimating an ‘effective number’ of independent tests and then adjusting the smallest observed p value using Sidak’s formula based on this number of tests. We sought to carry out an empirical and theoretical evaluation of Nyholt’s method. Methods: Nyholt’s method was applied to a sample of 31 genes typed at a total of 291 SNPs and permutation used to determine the type-I error rate for each gene. Based on our empirical results, we algebraically investigated the effective number of independent tests for a simple model of haplotype block structure. Results: The nominal 5% type I error rate varied from under 3% to over 7%, and was dependent on linkage disequilibrium. Theoretical considerations show further that the method can be very conservative in the presence of haplotype block structure. Conclusion: Although Nyholt’s approach may be useful as an exploratory tool, it is not an adequate substitute for permutation tests.
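
The adjustment step described in this abstract (Sidak's formula applied to the smallest p-value with an effective number of tests) can be sketched as below. The effective number is taken as an input because the abstract does not say how it is estimated; the function name and the example values are hypothetical.

```python
def sidak_adjust(p_min, m_eff):
    """Sidak-adjusted p-value: 1 - (1 - p_min) ** m_eff.

    m_eff is the 'effective number' of independent tests, supplied by the
    caller; it may be non-integer. How it is estimated is outside this sketch.
    """
    return 1.0 - (1.0 - p_min) ** m_eff


# Hypothetical gene: smallest p-value 0.01 across its SNPs.
adjusted_if_independent = sidak_adjust(0.01, 10)   # all ten SNPs independent
adjusted_if_correlated = sidak_adjust(0.01, 4.5)   # strong LD, fewer effective tests
```

A smaller effective number gives a less severe correction, so an estimate that is too small or too large shifts the type I error rate away from its nominal level, which is the behaviour the evaluation reports.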

Journal Article•10.1111/J.1529-8817.2005.00198.X•

Multiple testing in the context of haplotype analysis revisited: application to case-control data.

Tim Becker¹, Sven Cichon¹, E. Jönson², Michael Knapp¹•Institutions (2)

University of Bonn¹, Karolinska Institutet²

01 Nov 2005-Annals of Human Genetics

TL;DR: It is concluded that the better strategy to deal with phase ambiguities is to assign to each individual its list of weighted haplotype explanations, rather than to assign to each individual its most likely haplotype explanation.

Abstract: Summary We have lately presented a testing procedure for family data which accounts for the multiple testing problem that is induced by the enormous number of different marker combinations that can be analyzed in a set of tightly linked markers. Most methods of haplotype based association analysis already require simulations to obtain an uncorrected P value for a specific marker combination. As shown before, it is nevertheless not necessary to carry out nested simulations to obtain a global P value that properly corrects for the multiple testing of different marker combinations without neglecting the dependency of the tests. We have now implemented this approach for case-control data in our program FAMHAP, as this data structure currently plays a dominant role in the field. We consider different ways to deal with phase ambiguities and two different statistical tests for the underlying single marker combinations to obtain uncorrected P values. One test statistic is chi-square based, the other is a haplotype trend regression. The performance of these different tests in the multiple testing situation is investigated in a large simulation study. We obtain a considerable gain in power with our global P values as opposed to Bonferroni corrected P values for all suggested test statistics. Good power was obtained both with the haplotype trend regression approach as well as with the simpler chi-square based test. Furthermore, we conclude that the better strategy to deal with phase ambiguities is to assign to each individual its list of weighted haplotype explanations, rather than to assign to each individual its most likely haplotype explanation. Finally, we demonstrate the usefulness of our approach by a real data example.

Journal Article•10.1093/BIOMET/92.4.893•

Lower bounds for the number of false null hypotheses for multiple testing of associations under general dependence structures

Nicolai Meinshausen¹, Peter Bühlmann¹•Institutions (1)

ETH Zurich¹

01 Dec 2005-Biometrika

TL;DR: The proposed estimator is shown to deliver a tight probabilistic lower bound for the number of false null hypotheses in a multiple testing situation even under strong dependence between test statistics.

Abstract: We propose probabilistic lower bounds for the number of false null hypotheses when testing multiple hypotheses of association simultaneously. The bounds are valid under general and unknown dependence structures between the test statistics. The power of the proposed estimator to detect the full proportion of false null hypotheses is discussed and compared to other estimators. The proposed estimator is shown to deliver a tight probabilistic lower bound for the number of false null hypotheses in a multiple testing situation even under strong dependence between test statistics.

Journal Article•10.2202/1544-6115.1143•

Empirical Bayes and resampling based multiple testing procedure controlling tail probability of the proportion of false positives.

Mark J. van der Laan¹, Merrill D. Birkner², Alan Hubbard¹•Institutions (2)

University of California, Berkeley¹, Genentech²

07 Oct 2005-Statistical Applications in Genetics and Molecular Biology

TL;DR: Proposes a new re-sampling based multiple testing procedure that asymptotically controls the probability that the proportion of false positives among the set of rejections exceeds q at level alpha, where q and alpha are user supplied numbers.

Abstract: Simultaneously testing a collection of null hypotheses about a data generating distribution based on a sample of independent and identically distributed observations is a fundamental and important statistical problem involving many applications. In this article we propose a new re-sampling based multiple testing procedure asymptotically controlling the probability that the proportion of false positives among the set of rejections exceeds q at level alpha, where q and alpha are user supplied numbers. The procedure involves 1) specifying a conditional distribution for a guessed set of true null hypotheses, given the data, which asymptotically is degenerate at the true set of null hypotheses, and 2) specifying a generally valid null distribution for the vector of test-statistics proposed in Pollard & van der Laan (2003), and generalized in our subsequent articles Dudoit, van der Laan, & Pollard (2004), van der Laan, Dudoit, & Pollard (2004), and van der Laan, Dudoit, & Pollard (2004b). Ingredient 1) is established by fitting the empirical Bayes two component mixture model (Efron (2001b)) to the data to obtain an upper bound for marginal posterior probabilities of the null being true, given the data. We establish the finite sample rationale behind our proposal, and prove that this new multiple testing procedure asymptotically controls the desired tail probability for the proportion of false positives under general data generating distributions. In addition, we provide simulation studies establishing that this method is generally more powerful in finite samples than our previously proposed augmentation multiple testing procedure (van der Laan, Dudoit, & Pollard (2004b)) and competing procedures from the literature. Finally, we illustrate our methodology with a data analysis.

Journal Article•10.1055/S-0038-1633989•

Multiplicity issues in microarray experiments.

Frank Bretz¹, Jobst Landgrebe, Edgar Brunner•Institutions (1)

Novartis¹

01 Jan 2005-Methods of Information in Medicine

TL;DR: Although it is believed that the FDR or one of its variants will be applied more often in the future, longterm experience with microarray technology is missing and thus the validity of appropriate multiple test procedures cannot yet be assessed for microarray data analysis.

Abstract: Objectives: Discussion of different error concepts relevant to microarray experiments. Review of some commonly used multiple testing procedures. Comparison of different approaches as applied to gene expression data. Methods: This article focuses on familywise error rate (FWER) and false discovery rate (FDR) controlling procedures. Methods under investigation include: Bonferroni-type methods and their improvements (including resampling approaches), modified Bonferroni methods, data-driven approaches, as well as the linear step-up method and its modifications. Particular emphasis lies on the description of the assumptions, advantages and limitations for the investigated methods. Results: FWER controlling procedures are often too conservative in high dimensional screening studies. A better balance between the raw P-values and the stringent FWER-adjusted P-values may be required in many situations, as provided by FDR controlling and related procedures. Conclusions: The questions remain open, which error concept to apply and which multiple testing procedure to use. Although we believe that the FDR or one of its variants will be applied more often in the future, longterm experience with microarray technology is missing and thus the validity of appropriate multiple test procedures cannot yet be assessed for microarray data analysis.

Journal Article•10.1002/GEPI.20096•

Haplotype interaction analysis of unlinked regions.

Tim Becker¹, Johannes Schumacher¹, Sven Cichon¹, Max P. Baur¹, Michael Knapp¹•Institutions (1)

University of Bonn¹

01 Dec 2005-Genetic Epidemiology

TL;DR: This work presents a method to analyze case‐control studies with multiple SNP data without phase information that considers gene‐gene interaction effects while correcting appropriately for multiple testing, and allows for interactions of haplotypes that belong to different unlinked regions, as haplotype analysis often proves to be more powerful than single marker analysis.

Abstract: Genetically complex diseases are caused by interacting environmental factors and genes. As a consequence, statistical methods that consider multiple unlinked genomic regions simultaneously are desirable. Such consideration, however, may lead to a vast number of different high-dimensional tests whose appropriate analysis poses a problem. Here, we present a method to analyze case-control studies with multiple SNP data without phase information that considers gene-gene interaction effects while correcting appropriately for multiple testing. In particular, we allow for interactions of haplotypes that belong to different unlinked regions, as haplotype analysis often proves to be more powerful than single marker analysis. In addition, we consider different marker combinations at each unlinked region. The multiple testing issue is settled via the minP approach; the P value of the “best” marker/region configuration is corrected via Monte-Carlo simulations. Thus, we do not explicitly test for a specific pre-defined interaction model, but test for the global hypothesis that none of the considered haplotype interactions shows association with the disease. We carry out a simulation study for case-control data that confirms the validity of our approach. When simulating two-locus disease models, our test proves to be more powerful than association methods that analyze each linked region separately. In addition, when one of the tested regions is not involved in the etiology of the disease, only a small amount of power is lost with interaction analysis as compared to analysis without interaction. We successfully applied our method to a real case-control data set with markers from two genes controlling a common pathway. While classical analysis failed to reach significance, we obtained a significant result even after correction for multiple testing with our proposed haplotype interaction analysis. The method described here has been implemented in FAMHAP.

Journal Article•10.2202/1544-6115.1132•

Empirical Bayes Estimation of a Sparse Vector of Gene Expression Changes

Stephen W. Erickson¹, Chiara Sabatti¹•Institutions (1)

University of California, Los Angeles¹

01 Jan 2005-Statistical Applications in Genetics and Molecular Biology

TL;DR: In this article, the comparison of the expression of thousands of genes in two different cell lines is treated as Bayesian estimation of a vector known to have a large number of zero components, with prior knowledge on expression changes described by mixture priors that incorporate a mass at zero.

Abstract: Gene microarray technology is often used to compare the expression of thousands of genes in two different cell lines. Typically, one does not expect measurable changes in transcription amounts for a large number of genes; furthermore, the noise level of array experiments is rather high in relation to the available number of replicates. For the purpose of statistical analysis, inference on the "population" difference in expression for genes across the two cell lines is often cast in the framework of hypothesis testing, with the null hypothesis being no change in expression. Given that thousands of genes are investigated at the same time, this requires some multiple comparison correction procedure to be in place. We argue that hypothesis testing, with its emphasis on type I error and family analogues, may not address the exploratory nature of most microarray experiments. We instead propose viewing the problem as one of estimation of a vector known to have a large number of zero components. In a Bayesian framework, we describe the prior knowledge on expression changes using mixture priors that incorporate a mass at zero, and we choose a loss function that favors the selection of sparse solutions. We consider two different models applicable to the microarray problem, depending on the nature of replicates available, and show how to explore the posterior distributions of the parameters using MCMC. Simulations show an interesting connection between this Bayesian estimation framework and false discovery rate (FDR) control. Finally, two empirical examples illustrate the practical advantages of this Bayesian estimation paradigm.

Journal Article•10.1016/J.JNEUMETH.2004.08.008•

New concepts of multiple tests and their use for evaluating high-dimensional EEG data.

Claudia Hemmelmann¹, Manfred Horn¹, Thomas Süsse¹, Rüdiger Vollandt¹, Sabine Weiss²•Institutions (2)

University of Jena¹, Medical University of Vienna²

30 Mar 2005-Journal of Neuroscience Methods

TL;DR: This work applies the different multiple tests to EEG coherence data and shows that subsequently recalled nouns elicited significantly higher coherence than not recalled ones.

Journal Article•10.1214/009053604000000968•

Decision theory results for one-sided multiple comparison procedures

Arthur Cohen¹, Harold B. Sackrowitz•Institutions (1)

Rutgers University¹

01 Feb 2005-Annals of Statistics

TL;DR: This paper treats the multiple hypothesis testing problem as a 2^k finite action problem and studies the single-step, step-down and step-up procedures from the point of view of admissibility, Bayes and limit of Bayes properties.

Abstract: A resurgence of interest in multiple hypothesis testing has occurred in the last decade. Motivated by studies in genomics, microarrays, DNA sequencing, drug screening, clinical trials, bioassays, education and psychology, statisticians have been devoting considerable research energy in an effort to properly analyze multiple endpoint data. In response to new applications, new criteria and new methodology, many ad hoc procedures have emerged. The classical requirement has been to use procedures which control the strong familywise error rate (FWE) at some predetermined level α. That is, the probability of any false rejection of a true null hypothesis should be less than or equal to α. Finding desirable and powerful multiple test procedures is difficult under this requirement. One of the more recent ideas is concerned with controlling the false discovery rate (FDR), that is, the expected proportion of rejected hypotheses which are, in fact, true. Many multiple test procedures do control the FDR. A much earlier approach to multiple testing was formulated by Lehmann [Ann. Math. Statist. 23 (1952) 541-552 and 28 (1957) 1-25]. Lehmann's approach is decision theoretic and he treats the multiple endpoints problem as a 2^k finite action problem when there are k endpoints. This approach is appealing since unlike the FWE and FDR criteria, the finite action approach pays attention to false acceptances as well as false rejections. In this paper we view the multiple endpoints problem as a 2^k finite action problem. We study the popular procedures single-step, step-down and step-up from the point of view of admissibility, Bayes and limit of Bayes properties. For our model, which is a prototypical one, and our loss function, we are able to demonstrate the following results under some fairly general conditions to be specified: (i) The single-step procedure is admissible. (ii) A sequence of prior distributions is given for which the step-down procedure is a limit of a sequence of Bayes procedures. (iii) For a vector risk function, where each component is the risk for an individual testing problem, various admissibility and inadmissibility results are obtained. In a companion paper [Cohen and Sackrowitz, Ann. Statist. 33 (2005) 145-158], we are able to give a characterization of Bayes procedures and their limits. The characterization yields a complete class and the additional useful result that the step-up procedure is inadmissible. The inadmissibility of step-up is demonstrated there for a more stringent loss function. Additional decision theoretic type results are also obtained in this paper.

Journal Article•10.1109/TBME.2004.841263•

Statistical maps for EEG dipolar source localization

Christian Bénar¹, Roger N. Gunn¹, Christophe Grova¹, Benoit Champagne², Jean Gotman¹•Institutions (2)

Montreal Neurological Institute and Hospital¹, McGill University²

22 Feb 2005-IEEE Transactions on Biomedical Engineering

TL;DR: Three-dimensional statistical maps for electroencephalogram (EEG) source localization are presented, which allow for the systematic exploration of the solution space for dipolar sources and make it possible to test whether the data support a given solution.

Abstract: We present a method that estimates three-dimensional statistical maps for electroencephalogram (EEG) source localization. The maps assess the likelihood that a point in the brain contains a dipolar source, under the hypothesis of one, two or three activated sources. This is achieved by examining all combinations of one to three dipoles on a coarse grid and attributing to each combination a score based on an F statistic. The probability density function of the statistic under the null hypothesis is estimated nonparametrically, using bootstrap resampling. A theoretical F distribution is then fitted to the empirical distribution in order to allow correction for multiple comparisons. The maps allow for the systematic exploration of the solution space for dipolar sources. They make it possible to test whether the data support a given solution. They do not rely on the assumption of uncorrelated source time courses. They can be compared to other statistical parametric maps such as those used in functional magnetic resonance imaging (fMRI). Results are presented for both simulated and real data. The maps were compared with LORETA and MUSIC results. For the real data consisting of an average of epileptic spikes, we observed good agreement between the EEG statistical maps, intracranial EEG recordings, and fMRI activations.

Journal Article•10.1002/SIM.2179•

A multiple testing procedure to associate gene expression levels with survival

Sin-Ho Jung¹, Kouros Owzar¹, Stephen L. George¹•Institutions (1)

Duke University¹

30 Oct 2005-Statistics in Medicine

TL;DR: This paper identifies genes associated with a censored survival endpoint by using a general correlation measure, a non-parametric test statistic, and permutation resampling to control the family-wise error rate, and applies the method to a data set on patients with lung cancer.

Abstract: In many microarray studies the primary objective is to identify, from a large panel of genes, those which are prognostic markers of a censored survival endpoint such as time to disease recurrence or death. Often, these genes are considered prognostic in that their respective expressions are associated with the survival endpoint of interest. To assess this association requires specifying an appropriate measure of association, a suitable test statistic and, as the number of genes is large, proper handling of multiplicity issues. In this paper, we will address these issues by utilizing a general correlation measure, a non-parametric test statistic, and control of the family-wise error rate by employing permutation resampling. Comprehensive simulation studies are conducted to investigate the statistical properties of the proposed procedure. The proposed method is applied to a recently published data set on patients with lung cancer.
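
As a rough illustration of family-wise error control by permutation resampling, the sketch below computes single-step max-statistic adjusted p-values. It is a simplification and not the procedure of the paper above: the outcome is assumed to be an uncensored numeric value, the association measure is the absolute Pearson correlation, and the function name `maxt_adjusted_pvalues` and its parameters are hypothetical.

```python
import numpy as np

def maxt_adjusted_pvalues(expr, outcome, n_perm=1000, seed=0):
    """Single-step max-statistic adjusted p-values via permutation.

    expr:    (n_samples, n_genes) expression matrix, no constant columns
    outcome: (n_samples,) numeric outcome (assumed uncensored here)
    """
    rng = np.random.default_rng(seed)
    # Center and scale each gene once so a correlation is a dot product.
    expr_c = expr - expr.mean(axis=0)
    expr_c /= np.linalg.norm(expr_c, axis=0)

    def abs_corr(y):
        yc = y - y.mean()
        return np.abs(expr_c.T @ (yc / np.linalg.norm(yc)))

    observed = abs_corr(outcome)
    exceed = np.zeros_like(observed)
    for _ in range(n_perm):
        # Permuting the outcome keeps the gene-gene correlation intact.
        max_stat = abs_corr(rng.permutation(outcome)).max()
        exceed += max_stat >= observed
    # Adjusted p-value: how often the permutation maximum reaches each
    # observed statistic (+1 counts the observed data itself).
    return (exceed + 1) / (n_perm + 1)
```

Because every gene is compared with the maximum over all genes, rejecting those with an adjusted p-value below a chosen level keeps the chance of any false rejection at about that level, and the resampling accounts for dependence among genes without assuming a parametric form for it.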

Empirical Bayes estimation of a sparse vector of gene expression changes

Stephen W. Erickson¹, Chiara Sabatti¹•Institutions (1)

University of California, Los Angeles¹

1 Feb 2005

TL;DR: It is argued that hypothesis testing, with its emphasis on type I error and family analogues, may not address the exploratory nature of most microarray experiments, and it is proposed instead to view the problem as one of estimation of a vector known to have a large number of zero components.

Abstract: Gene microarray technology is often used to compare the expression of thousands of genes in two different cell lines. Typically, one does not expect measurable changes in transcription amounts for a large number of genes; furthermore, the noise level of array experiments is rather high in relation to the available number of replicates. For the purpose of statistical analysis, inference on the “population” difference in expression for genes across the two cell lines is often cast in the framework of hypothesis testing, with the null hypothesis being no change in expression. Given that thousands of genes are investigated at the same time, this requires some multiple comparison correction procedure to be in place. We argue that hypothesis testing, with its emphasis on type I error and family analogues, may not address the exploratory nature of most microarray experiments. We instead propose viewing the problem as one of estimation of a vector known to have a large number of zero components. In a Bayesian framework, we describe the prior knowledge on expression changes using mixture priors that incorporate a mass at zero and we choose a loss function that favors the selection of sparse solutions. We consider two different models applicable to the microarray problem, depending on the nature of replicates available, and show how to explore the posterior distributions of the parameters using MCMC. Simulations show an interesting connection between this Bayesian estimation framework and both false discovery rate (FDR) control and misclassification minimizing procedures. Finally, two empirical examples illustrate the practical advantages of this Bayesian estimation paradigm.

Journal Article•10.1111/J.1529-8817.2005.00193.X•

Using Tree-Based Recursive Partitioning Methods to Group Haplotypes for Increased Power in Association Studies

Kai Yu¹, Jun Xu², D. C. Rao¹, Michael A. Province¹•Institutions (2)

Washington University in St. Louis¹, Procter & Gamble²

01 Sep 2005-Annals of Human Genetics

TL;DR: This work proposes a procedure that uses a tree‐based recursive partitioning algorithm to group haplotypes into a small number of clusters, and conducts the association test based on groups of haplotypes instead of individual haplotypes.

Abstract: Motivated by the increasing availability of high-density single nucleotide polymorphism (SNP) markers across the genome, various haplotype-based methods have been developed for candidate gene association studies, and even for genome-wide association studies. Although haplotype approaches dramatically reduce the multiple comparisons problem (as compared to single SNP analysis), even the number of existing haplotypes is relatively large, which increases the degrees of freedom and decreases the power for the corresponding test statistic. Grouping haplotypes is a way to reduce the degrees of freedom. We propose a procedure that uses a tree-based recursive partitioning algorithm to group haplotypes into a small number of clusters, and conducts the association test based on groups of haplotypes instead of individual haplotypes. The method can be used for both population-based and family-based association studies, with known or ambiguous phase information. Simulation studies suggest that the proposed method has the right type I error rate, and is more powerful than some existing haplotype-based tests.

Journal Article•10.1093/JNCI/DJI122•

Gene–Environment Interactions: How Many False Positives?

Giuseppe Matullo, Marianne Berwick, Paolo Vineis

20 Apr 2005-Journal of the National Cancer Institute

TL;DR: This commentary discusses a large study on the association between base excision repair gene polymorphisms and lung cancer risk, which demonstrates problems typical of investigations of gene–environment interactions, and reviews a simple Bayesian approach based on the estimation of a prior probability and the calculation of a posterior probability.

Abstract: In this issue of the Journal, Hung et al. (1) describe the results of a large study on the association between base excision repair gene polymorphisms and lung cancer risk. Their study demonstrates problems typical of investigations of gene–environment interactions, in particular, the fact that among the generally negative results, some seemingly noteworthy associations are identified in subgroups of subjects who are defined on the basis of their tumor histology or smoking habits. The unique aspect of this study is that the authors have estimated the probability that the associations found are attributable to chance (i.e., false positives). The idea of evaluating new observations in the light of existing evidence is not novel and belongs to at least two traditions: Bayesian statistics and clinical epidemiology. In clinical epidemiology, formal treatment of “proof” from prior evidence is updated into posterior probability according to Bayes’ theorem. This approach has proved extremely useful because (a) it has revealed that diagnosis (and, by extension, clinical and scientific reasoning) develops in a sequence of steps that incorporate explicitly or implicitly previous knowledge (2) and (b) it has trained physicians to assign probabilities to the different components of such reasoning, namely, prior knowledge and “likelihood,” or the probability, of a positive test result in the presence of disease. Wacholder et al. (3) have proposed these same approaches to assess the validity of the increasing number of associations that are being identified among genetic variants, environmental exposures, and disease (i.e., gene–environment and gene–gene interactions). A large number of such associations have been reported and even more are expected to emerge in the future. Tens of thousands of single nucleotide polymorphisms (SNPs) are or will be investigated for their association with cancer, and many of the observed results will be false positives. The challenge is to distinguish the false-positive associations from the true positives. Wacholder et al. (3) proposed a simple Bayesian approach that is based on the estimation of a prior probability and the calculation of posterior probability. The prior probability can be estimated from results of previous studies, from biochemical or molecular information (e.g., gene expression data) that supports the function of a SNP, or from other types of evidence such as sequence homology (4). The general idea is to weight a new observation with the available prior evidence to derive a posterior probability. Recently, in an extension of the model proposed by Wacholder et al. (3), Ioannidis (5) has incorporated selective reporting and other biases and the fact that specific hypotheses, including gene–disease associations and gene–environment interactions, are usually tested by many teams worldwide. By taking these parameters into account when scrutinizing all the evidence, one can show that very few identified gene–disease associations and gene–environment interactions are probably real. The approach proposed by Wacholder et al. is ingenious and more convincing than other approaches, such as those based on Bonferroni’s correction for multiple comparisons or similar statistical methods (6, 7). Although the study by Hung et al. is a clever application of the proposal by Wacholder et al., execution of that proposal is not free from problems. The main problem is how best to calculate the prior probabilities. First, there is often no information available to use. Second, any available information may not be easy to evaluate, either because it is indirect (for example, from sequence homology) or because it is contradictory. The latter is very often the case with the epidemiologic evidence of gene–environment interactions (see below). Third, methods to quantify prior probabilities are not available, even when we have good evidence. The attempt made by Hung et al. is a good step forward, but it is still imperfect. The authors have considered and applied five categories of prior probability (from 50% to 0.1%); however, the basis for these numbers and categories is unclear. A further step forward might consist of using results of well-done meta-analyses or pooled analyses of the available studies. In this case, we have good examples from the clinical literature: a good meta-analysis can allow clinicians to judge how much a new study changes the estimate of efficacy of a drug. For example, if a meta-analysis provides an odds ratio of 0.9 and a new study reports an odds ratio of 0.2, then the real contribution of the new study to the advancement of knowledge will be judged by its ability to change the odds ratio provided by the meta-analysis. This ability will depend on both qualitative considerations (such as potential biases and the response rate) and the size of the study. Meta-analyses can be extremely useful for obtaining prior estimates against which the new evidence can be challenged because they include (a) an overall odds ratio from virtually all the available studies, (b) an evaluation of the quality of the studies (a quality score is usually assigned to each study included in a meta-analysis), (c) a weight that depends on the study size, and (d) a measure of heterogeneity. In a meta-analysis, a large and well-conducted positive study can thus outweigh several small negative studies. Figure 1 shows an example of a meta-analysis of some of the genes included in the Hung et al. article that we performed using data on gene–environment interactions, with a specific focus on DNA repair polymorphisms (summarized at http://perseus.isi.it/huge [last accessed: March 24, 2005]). A network of investigative groups involved in human genome epidemiology research has been developed to combine data from many studies to overcome the deficiencies of currently available, relatively small datasets (Ioannidis JPA, Altman R, Boffetta P, Danesh J, Hartge P, Little J, et al.: unpublished data). Overall, the results shown in the figure support the choices made by Hung et al. in assigning prior

Journal Article•10.1017/S0016672305007792•

Haplotype association analysis of human disease traits using genotype data of unrelated individuals.

Qihua Tan¹, Lene Christiansen¹, Kaare Christensen¹, Lise Bathum¹, Shuxia Li¹, Jing Hua Zhao², Torben A Kruse³•Institutions (3)

University of Southern Denmark¹, University College London², Odense University Hospital³

01 Dec 2005-Genetics Research

TL;DR: This work shows, by applying the model to an empirical data set, that the method based on the well-known logistic regression model is a useful tool for haplotype association analysis of human disease traits.

Abstract: Haplotype inference has become an important part of human genetic data analysis due to its functional and statistical advantages over the single-locus approach in linkage disequilibrium mapping. Different statistical methods have been proposed for detecting haplotype-disease associations using unphased multi-locus genotype data, ranging from the early approach by the simple gene-counting method to the recent work using the generalized linear model. However, these methods are either confined to case-control design or unable to yield unbiased point and interval estimates of haplotype effects. Based on the popular logistic regression model, we present a new approach for haplotype association analysis of human disease traits. Using haplotype-based parameterization, our model infers the effects of specific haplotypes (point estimation) and constructs confidence intervals for the risks of haplotypes (interval estimation). Based on the estimated parameters, the model calculates haplotype frequency conditional on the trait value for both discrete and continuous traits. Moreover, our model provides an overall significance level for the association between the disease trait and a group or all of the haplotypes. Featured by the direct maximization in haplotype estimation, our method also facilitates a computer simulation approach for correcting the significance level of individual haplotypes to adjust for multiple testing. We show, by applying the model to an empirical data set, that our method based on the well-known logistic regression model is a useful tool for haplotype association analysis of human disease traits.

Journal Article•10.1002/GEPI.20088•

Exact family-based association tests for biallelic data

Kady Schneiter¹, Nan M. Laird¹, Chris Corcoran²•Institutions (2)

Harvard University¹, Utah State University²

01 Nov 2005-Genetic Epidemiology

TL;DR: The exact approach is described as a useful alternative to the asymptotic test and it is shown that the exact tests for biallelic data may be most useful for the recessive disease model.

Abstract: Family-based study designs have an important role in the search for association between disease phenotypes and genetic markers. Unlike traditional case-control methods, family-based tests use within-family data to avoid identification of spurious associations that may result from population admixture. Many family-based association tests have been proposed to accommodate a variety of ascertainment schemes and patterns of missing data. In this report, we describe exact family-based association tests for biallelic data. Specifically, we discuss tests of the null hypotheses “no linkage and no association” and “linkage, but no association”. These tests, which are valid under various models for inheritance and patterns of missingness, utilize the procedure proposed by Rabinowitz and Laird [2000: Hum Hered 50:211–223] that provides a unified framework for family based association testing (FBAT). The conditioning approach implemented in FBAT makes an exact test conceptually straightforward, but computationally difficult since the minimum sufficient statistics upon which we condition do not have a conventional form. An exact test may be especially critical when accurate computation of the extreme area of the FBAT statistic is needed, such as when the study design necessitates multiple comparisons adjustments. We describe the exact approach as a useful alternative to the asymptotic test and show that the exact tests for biallelic data may be most useful for the recessive disease model. Genet. Epidemiol. 2005 © 2005 Wiley-Liss, Inc.

Journal Article•10.1080/10543400500265710•

Sample Size Calculation for Simulation-Based Multiple-Testing Procedures

Heejung Bang¹, Sin-Ho Jung², Stephen L. George²•Institutions (2)

Cornell University¹, Duke University²

01 Jan 2005-Journal of Biopharmaceutical Statistics

TL;DR: A simple method to calculate sample size and power for a simulation-based multiple testing procedure which gives a sharper critical value than the standard Bonferroni method is presented.

Abstract: In this article, we present a simple method to calculate sample size and power for a simulation-based multiple testing procedure which gives a sharper critical value than the standard Bonferroni method. The method is especially useful when several highly correlated test statistics are involved in a multiple-testing procedure. The formula for sample size calculation will be useful in designing clinical trials with multiple endpoints or correlated outcomes. We illustrate our method with a quality-of-life study for patients with early stage prostate cancer. Our method can also be used for comparing multiple independent groups.

Resampling Based Multiple Testing Procedure Controlling Tail Probability of the Proportion of False Positives

Mark J. van der Laan¹, Merrill D. Birkner¹, Alan Hubbard¹•Institutions (1)

University of California, Berkeley¹

1 Jan 2005

TL;DR: In this article, a new resampling based multiple testing procedure is proposed to control the probability that the proportion of false positives among the set of rejections exceeds q at level alpha, where q and alpha are user supplied numbers.

Abstract: Simultaneously testing a collection of null hypotheses about a data generating distribution based on a sample of independent and identically distributed observations is a fundamental and important statistical problem involving many applications. In this article we propose a new resampling based multiple testing procedure asymptotically controlling the probability that the proportion of false positives among the set of rejections exceeds q at level alpha, where q and alpha are user supplied numbers. The procedure involves 1) specifying a conditional distribution for a guessed set of true null hypotheses, given the data, which asymptotically is degenerate at the true set of null hypotheses, and 2) specifying a generally valid null distribution for the vector of test-statistics proposed in Pollard and van der Laan (2003), and generalized in our subsequent articles Dudoit et al. (2004), van der Laan et al. (2004a) and van der Laan et al. (2004b). We establish the finite sample rationale behind our proposal, and prove that this new multiple testing procedure asymptotically controls the desired tail probability for the proportion of false positives under general data generating distributions. In addition, we provide simulation studies establishing that this method is generally more powerful in finite samples than our previously proposed augmentation multiple testing procedure (van der Laan et al. (2004b)) and competing procedures from the literature. Finally, we illustrate our methodology with a data analysis.

Journal Article•10.1186/1471-2156-6-S1-S93•

Selection of single-nucleotide polymorphisms in disease association data

Jungnam Joo¹, Xin Tian¹, Gang Zheng¹, Jing-Ping Lin¹, Nancy L. Geller¹•Institutions (1)

National Institutes of Health¹

30 Dec 2005-BMC Genetics

TL;DR: Several test statistics that can be utilized in testing disease association are examined and several multiple testing procedures that can properly control the family-wise error rates when the univariate approach is applied to multiple markers are reviewed.

Abstract: We studied several methods for selecting single-nucleotide polymorphisms (SNPs) in a disease association study. Two major categories for analytical strategy are the univariate and the set selection approaches. The univariate approach evaluates each SNP marker one at a time, while the set selection approach tests disease association of a set of SNP markers simultaneously. We examined various test statistics that can be utilized in testing disease association and also reviewed several multiple testing procedures that can properly control the family-wise error rates when the univariate approach is applied to multiple markers. The set association methods were then briefly reviewed. Finally, we applied these methods to the data from the Collaborative Study on the Genetics of Alcoholism (COGA).
