About: Multifactor dimensionality reduction is a research topic. Over its lifetime, 519 publications have been published on this topic, receiving 16846 citations.
TL;DR: In this article, the authors introduced multifactor dimensionality reduction (MDR) as a method for reducing the dimensionality of multilocus information, to improve the identification of polymorphism combinations associated with disease risk.
Abstract: One of the greatest challenges facing human geneticists is the identification and characterization of susceptibility genes for common complex multifactorial human diseases. This challenge is partly due to the limitations of parametric-statistical methods for detection of gene effects that are dependent solely or partially on interactions with other genes and with environmental exposures. We introduce multifactor-dimensionality reduction (MDR) as a method for reducing the dimensionality of multilocus information, to improve the identification of polymorphism combinations associated with disease risk. The MDR method is nonparametric (i.e., no hypothesis about the value of a statistical parameter is made), is model-free (i.e., it assumes no particular inheritance model), and is directly applicable to case-control and discordant-sib-pair studies. Using simulated case-control data, we demonstrate that MDR has reasonable power to identify interactions among two or more loci in relatively small samples. When it was applied to a sporadic breast cancer case-control data set, in the absence of any statistically significant independent main effects, MDR identified a statistically significant high-order interaction among four polymorphisms from three different estrogen-metabolism genes. To our knowledge, this is the first report of a four-locus interaction associated with a common complex multifactorial disease.
TL;DR: The authors develop a multifactor dimensionality reduction (MDR) method that collapses high-dimensional genetic data into a single dimension, permitting interactions to be detected in relatively small samples.
Abstract: Motivation: Polymorphisms in human genes are being described in remarkable numbers. Determining which polymorphisms and which environmental factors are associated with common, complex diseases has become a daunting task. This is partly because the effect of any single genetic variation will likely be dependent on other genetic variations (gene–gene interaction or epistasis) and environmental factors (gene–environment interaction). Detecting and characterizing interactions among multiple factors is both a statistical and a computational challenge. To address this problem, we have developed a multifactor dimensionality reduction (MDR) method for collapsing high-dimensional genetic data into a single dimension, thus permitting interactions to be detected in relatively small sample sizes. In this paper, we describe the MDR approach and an MDR software package. Results: We developed a program that integrates MDR with a cross-validation strategy for estimating the classification and prediction error of multifactor models. The software can be used to analyze interactions among 2–15 genetic and/or environmental factors. The dataset may contain up to 500 total variables and a maximum of 4000 study subjects.
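The collapsing step and the cross-validation loop described above can be pictured with a minimal sketch. This is an illustration under stated assumptions, not the authors' software: the function names (`fit_cells`, `error_rate`, `best_model`, `cross_validate`) are hypothetical, the high-risk threshold is assumed to be the overall case:control ratio in the training data, folds are assigned by row index rather than at random, and the toy data are synthetic.

```python
from collections import Counter
from itertools import combinations, product


def fit_cells(rows, labels, loci):
    """Return the set of multilocus genotype cells labelled high-risk."""
    n_cases = sum(labels)
    n_controls = len(labels) - n_cases
    cases, controls = Counter(), Counter()
    for row, y in zip(rows, labels):
        cell = tuple(row[i] for i in loci)
        (cases if y == 1 else controls)[cell] += 1
    # Assumption: a cell is high-risk when its case:control ratio exceeds the
    # overall ratio. Cross-multiplying avoids dividing by an empty count.
    return {
        cell
        for cell in set(cases) | set(controls)
        if cases[cell] * n_controls > controls[cell] * n_cases
    }


def error_rate(rows, labels, loci, high_risk):
    """Misclassification rate of the one-dimensional high/low-risk variable."""
    wrong = sum(
        (tuple(row[i] for i in loci) in high_risk) != bool(y)
        for row, y in zip(rows, labels)
    )
    return wrong / len(labels)


def best_model(rows, labels, k):
    """Exhaustively search all k-factor combinations for the lowest error."""
    best = None
    for loci in combinations(range(len(rows[0])), k):
        high_risk = fit_cells(rows, labels, loci)
        err = error_rate(rows, labels, loci, high_risk)
        if best is None or err < best[0]:
            best = (err, loci, high_risk)
    return best


def cross_validate(rows, labels, k, n_folds):
    """Mean prediction error: select the model on training folds, score on the held-out fold."""
    errors = []
    for fold in range(n_folds):
        train = [i for i in range(len(rows)) if i % n_folds != fold]
        test = [i for i in range(len(rows)) if i % n_folds == fold]
        _, loci, high_risk = best_model(
            [rows[i] for i in train], [labels[i] for i in train], k
        )
        errors.append(
            error_rate(
                [rows[i] for i in test], [labels[i] for i in test], loci, high_risk
            )
        )
    return sum(errors) / len(errors)


# Synthetic toy data: three binary factors; disease status depends only on the
# interaction (XOR) of factors 0 and 1, with no main effect of either one.
rows = [list(g) for g in product((0, 1), repeat=3)]
labels = [a ^ b for a, b, _ in rows]

train_err, loci, high_risk = best_model(rows, labels, k=2)
cv_err = cross_validate(rows, labels, k=2, n_folds=2)
print(loci, sorted(high_risk), train_err, cv_err)
```

On this toy data the search picks factors 0 and 1 even though neither has a main effect. That is the situation the abstract describes as hard for single-factor analyses.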
TL;DR: A working hypothesis is formed that epistasis is a ubiquitous component of the genetic architecture of common human diseases and that complex interactions are more important than the independent main effects of any one susceptibility gene.
Abstract: There is increasing awareness that epistasis or gene-gene interaction plays a role in susceptibility to common human diseases. In this paper, we formulate a working hypothesis that epistasis is a ubiquitous component of the genetic architecture of common human diseases and that complex interactions are more important than the independent main effects of any one susceptibility gene. This working hypothesis is based on several bodies of evidence. First, the idea that epistasis is important is not new. In fact, the recognition that deviations from Mendelian ratios are due to interactions between genes has been around for nearly 100 years. Second, the ubiquity of biomolecular interactions in gene regulation and biochemical and metabolic systems suggests that the relationship between DNA sequence variations and clinical endpoints is likely to involve gene-gene interactions. Third, positive results from studies of single polymorphisms typically do not replicate across independent samples. This is true for both linkage and association studies. Fourth, gene-gene interactions are commonly found when properly investigated. We review each of these points and then review an analytical strategy called multifactor dimensionality reduction for detecting epistasis. We end with ideas of how hypotheses about biological epistasis can be generated from statistical evidence using biochemical systems models. If this working hypothesis is true, it suggests that we need a research strategy for identifying common disease susceptibility genes that embraces, rather than ignores, the complexity of the genotype to phenotype relationship.
TL;DR: A generalized MDR (GMDR) method is reported that permits adjustment for discrete and quantitative covariates and is applicable to both dichotomous and continuous phenotypes in various population-based study designs and serves the purpose of identifying contributors to population variation better than do the other existing methods.
Abstract: The determination of gene-by-gene and gene-by-environment interactions has long been one of the greatest challenges in genetics. The traditional methods are typically inadequate because of the problem referred to as the "curse of dimensionality." Recent combinatorial approaches, such as the multifactor dimensionality reduction (MDR) method, the combinatorial partitioning method, and the restricted partition method, have a straightforward correspondence to the concept of the phenotypic landscape that unifies biological, statistical genetics, and evolutionary theories. However, the existing approaches have several limitations, such as not allowing for covariates, that restrict their practical use. In this study, we report a generalized MDR (GMDR) method that permits adjustment for discrete and quantitative covariates and is applicable to both dichotomous and continuous phenotypes in various population-based study designs. Computer simulations indicated that the GMDR method has superior performance in its ability to identify epistatic loci, compared with current methods in the literature. We applied our proposed method to a genetics study of four genes that were reported to be associated with nicotine dependence and found significant joint action between CHRNB4 and NTRK2. Moreover, our example illustrates that the newly proposed GMDR approach can increase prediction ability, suggesting that its use is justified in practice. In summary, GMDR serves the purpose of identifying contributors to population variation better than do the other existing methods.
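One way to picture covariate adjustment for a continuous phenotype is to replace case/control counts with per-subject residual scores and label a genotype cell high-risk when its scores sum to a positive value. The sketch below assumes that reading. The abstract does not spell out the statistic, so the least-squares residual used here is an illustrative stand-in and not necessarily the authors' score. The function names and the toy data are hypothetical.

```python
from collections import defaultdict


def ols_residuals(covariate, phenotype):
    """Residuals of the phenotype after a one-covariate least-squares fit."""
    n = len(phenotype)
    mean_x = sum(covariate) / n
    mean_y = sum(phenotype) / n
    sxx = sum((x - mean_x) ** 2 for x in covariate)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(covariate, phenotype))
    slope = sxy / sxx if sxx else 0.0
    return [y - (mean_y + slope * (x - mean_x)) for x, y in zip(covariate, phenotype)]


def score_based_cells(genotypes, scores, loci):
    """Label a multilocus cell high-risk when its summed score is positive."""
    totals = defaultdict(float)
    for row, s in zip(genotypes, scores):
        totals[tuple(row[i] for i in loci)] += s
    return {cell for cell, total in totals.items() if total > 0}


# Synthetic toy data: the phenotype rises with age (the covariate) and gets an
# extra +2 when exactly one of the two loci carries the variant.
genotypes = [(0, 0), (0, 1), (1, 0), (1, 1)] * 2
age = [20.0] * 4 + [40.0] * 4
phenotype = [0.1 * a + 2.0 * (g1 ^ g2) for (g1, g2), a in zip(genotypes, age)]

scores = ols_residuals(age, phenotype)
high_risk_cells = score_based_cells(genotypes, scores, loci=(0, 1))
print(sorted(high_risk_cells))
```

Because the age trend is removed before cells are labelled, the high-risk cells reflect the interaction and not the covariate.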
TL;DR: This work proposes using a variant of logistic regression with L2-regularization to fit gene-gene and gene-environment interaction models and demonstrates that this method outperforms other methods in the identification of the interaction structures as well as prediction accuracy.
Abstract: We propose using a variant of logistic regression (LR) with L2-regularization to fit gene-gene and gene-environment interaction models. Studies have shown that many common diseases are influenced by interaction of certain genes. LR models with quadratic penalization not only correctly characterize the influential genes along with their interaction structures but also yield additional benefits in handling high-dimensional, discrete factors with a binary response. We illustrate the advantages of using an L2-regularization scheme and compare its performance with that of "multifactor dimensionality reduction" and "FlexTree," 2 recent tools for identifying gene-gene interactions. Through simulated and real data sets, we demonstrate that our method outperforms other methods in the identification of the interaction structures as well as prediction accuracy. In addition, we validate the significance of the factors selected through bootstrap analyses.
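The core idea of a quadratically penalized logistic model with an explicit interaction term can be sketched as follows. This is a minimal illustration, not the authors' procedure: the abstract does not describe how terms are selected or how genotypes are coded, so the sketch assumes binary carrier indicators, a single hand-built product term, plain gradient descent, and hypothetical names and hyperparameters (`fit_l2_logistic`, `lam`, `lr`, `n_iter`).

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def with_interaction(x1, x2):
    """Two main-effect columns plus their product as the interaction column."""
    return [x1, x2, x1 * x2]


def fit_l2_logistic(X, y, lam=0.01, lr=0.5, n_iter=5000):
    """Minimize mean log-loss + (lam / 2) * ||w||^2 by gradient descent.

    The intercept b is left unpenalized (an assumption of this sketch).
    """
    n, p = len(X), len(X[0])
    b, w = 0.0, [0.0] * p
    for _ in range(n_iter):
        grad_b, grad_w = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
        b -= lr * grad_b / n
        w = [wj - lr * (gj / n + lam * wj) for wj, gj in zip(w, grad_w)]
    return b, w


def predict(b, w, x):
    """Predict 1 when the fitted probability exceeds 0.5."""
    return int(sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) > 0.5)


# Synthetic toy data: risk is elevated when exactly one of two variants is carried.
pairs = [(0, 0), (0, 1), (1, 0), (1, 1)]
X = [with_interaction(a, c) for a, c in pairs]
y = [0, 1, 1, 0]

b, w = fit_l2_logistic(X, y)
preds = [predict(b, w, x) for x in X]
print(preds, [round(v, 2) for v in w])
```

The penalty keeps the coefficients finite even though the toy data are perfectly separable. The negative sign of the fitted product term is what encodes the interaction here.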