Mixed linear model approach adapted for genome-wide association studies.
Zhiwu Zhang, Elhan S. Ersoz, Chao-Qiang Lai, Rory J. Todhunter, Hemant K. Tiwari, Michael A. Gore, Peter J. Bradbury, Jianming Yu, Donna K. Arnett, Jose M. Ordovas, Edward S. Buckler
TL;DR: A compression approach called 'compressed MLM' is reported, which decreases the effective sample size of large association datasets by clustering individuals into groups, together with a complementary approach, 'population parameters previously determined' (P3D), that eliminates the need to re-compute variance components for each marker tested.
Abstract: Mixed linear model (MLM) methods have proven useful in controlling for population structure and relatedness within genome-wide association studies. However, MLM-based methods can be computationally challenging for large datasets. We report a compression approach, called ‘compressed MLM’, that decreases the effective sample size of such datasets by clustering individuals into groups. We also present a complementary approach, ‘population parameters previously determined’ (P3D), that eliminates the need to re-compute variance components. We applied these two methods, both independently and in combination, to selected genetic association datasets from human, dog and maize. The joint implementation of these two methods markedly reduced computing time and either maintained or improved statistical power. We used simulations to demonstrate their usefulness in controlling for substructure in genetic association datasets for a range of species and genetic architectures. We have made these methods available within an implementation of the software program TASSEL.
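The two ideas above can be illustrated with a minimal pure-Python sketch. This is not TASSEL's implementation; it assumes (1) individuals have already been assigned to groups (the clustering step is not shown), (2) group kinship is the average kinship over all pairs of members, which is one of several possible summaries, and (3) the variance ratio `lam` = σ_e²/σ_g² was estimated once from a model without markers. The names `group_kinship`, `invert` and `p3d_scan` are hypothetical. Because V is built and inverted only once, each marker test reduces to a 2×2 generalized least squares solve, which is the reuse that P3D relies on.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def matvec(A: Matrix, v: Sequence[float]) -> List[float]:
    return [dot(row, v) for row in A]


def invert(A: Matrix) -> Matrix:
    """Gauss-Jordan inverse with partial pivoting (adequate for toy sizes only)."""
    n = len(A)
    M = [list(map(float, row)) + [float(i == j) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0.0:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]


def group_kinship(K: Matrix, groups: Sequence[int]) -> Matrix:
    """Compress individual kinship to group kinship (mean over member pairs; assumed)."""
    ids = sorted(set(groups))
    members = {g: [i for i, gi in enumerate(groups) if gi == g] for g in ids}
    return [[sum(K[i][j] for i in members[a] for j in members[b])
             / (len(members[a]) * len(members[b])) for b in ids] for a in ids]


def p3d_scan(y: Sequence[float], markers: Sequence[Sequence[float]],
             K: Matrix, groups: Sequence[int],
             lam: float) -> List[Tuple[float, float]]:
    """GLS marker scan with V = Z Kg Z' + lam * I held fixed for every marker.

    lam (hypothetical name) = sigma_e^2 / sigma_g^2, assumed estimated once beforehand.
    Returns (marker effect, Wald F with 1 numerator df) for each marker.
    """
    n = len(y)
    ids = sorted(set(groups))
    col = {g: k for k, g in enumerate(ids)}
    Kg = group_kinship(K, groups)
    # (Z Kg Z')[i][j] is the kinship between the groups of individuals i and j.
    V = [[Kg[col[groups[i]]][col[groups[j]]] + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    Vi = invert(V)  # computed once, reused for every marker
    one = [1.0] * n
    a = matvec(Vi, one)
    s11, t1 = dot(one, a), dot(a, y)
    out = []
    for m in markers:
        x = [float(v) for v in m]
        b = matvec(Vi, x)
        s12, s22, t2 = dot(one, b), dot(x, b), dot(b, y)
        det = s11 * s22 - s12 * s12  # determinant of X' V^-1 X for X = [1, x]
        beta0 = (s22 * t1 - s12 * t2) / det
        beta1 = (s11 * t2 - s12 * t1) / det
        r = [yi - beta0 - beta1 * xi for yi, xi in zip(y, x)]
        sigma2 = dot(r, matvec(Vi, r)) / (n - 2)  # residual scale under fixed V
        out.append((beta1, beta1 ** 2 / (sigma2 * s11 / det)))
    return out
```

With every individual in its own group and K equal to the identity, V is a scaled identity and each marker test reduces to ordinary least squares, which makes a convenient sanity check; merging individuals into fewer groups is what shrinks the kinship matrix in the compressed case.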
Citations
Identifying genes for resistant starch, slowly digestible starch, and rapidly digestible starch in rice using genome-wide association studies
TL;DR: This study identifies new genes for starch digestibility in rice and uses GWAS to interpret the genetic mechanisms underlying resistant starch (RS) and slowly digestible starch (SDS), showing that GWAS is a valid strategy for genetically dissecting starch digestion properties in rice.
A Genome-Wide Association Study of Coleoptile Length in Different Chinese Wheat Landraces.
Jun Ma, Yu Lin, Si Tang, Shuonan Duan, Qing Wang, Fangkun Wu, Caixia Li, Xiaojun Jiang, Kunyu Zhou, Yaxi Liu
TL;DR: A genome-wide association study on a set of 707 Chinese wheat landraces provided important insights into the genetic mechanisms underlying coleoptile growth and could be applied to marker-assisted wheat selection.
Essential formulae for restricted maximum likelihood and its derivatives associated with the linear mixed models (Posted Content)
Shengxin Zhu, Andrew J. Wathen
TL;DR: This article introduces the restricted maximum likelihood (REML) method for variance component analysis on large-scale unbalanced data and provides a self-contained derivation of formulae used in practical algorithms.
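For orientation, the REML objective and its gradient, from which derivations of this kind typically start, can be written in standard notation (added for context, not quoted from the listed article). For y = Xβ + ε with Var(y) = V(θ) and X of full column rank:

```latex
\ell_R(\theta) = -\tfrac{1}{2}\left[\log\lvert V\rvert + \log\lvert X^\top V^{-1} X\rvert + y^\top P y\right] + \text{const},
\qquad P = V^{-1} - V^{-1}X\left(X^\top V^{-1}X\right)^{-1}X^\top V^{-1}

\frac{\partial \ell_R}{\partial \theta_i} = -\tfrac{1}{2}\left[\operatorname{tr}\!\left(P\,\frac{\partial V}{\partial \theta_i}\right) - y^\top P\,\frac{\partial V}{\partial \theta_i}\,P\,y\right]
```

Setting this gradient to zero has no closed-form solution in general, so variance components are found iteratively; repeating that iteration for every marker is the cost that P3D avoids.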
Statistical Association Mapping of Population-Structured Genetic Data
TL;DR: A statistical framework is introduced that guards against spurious detection of associated sites caused by population inhomogeneities, by equipping current association-mapping methods with a state-of-the-art clustering algorithm widely used in population genetics.
GWAS-Flow: A GPU accelerated framework for efficient permutation based genome-wide association studies
TL;DR: To enable efficient analyses of large datasets and make permutation-based significance thresholds computationally feasible, the authors use the machine learning framework TensorFlow to implement a linear mixed model (GWAS-Flow) that runs on available CPU or GPU hardware, reducing analysis time, especially for large datasets.
References
Inference of population structure using multilocus genotype data
TL;DR: Pritchard et al. proposed a model-based clustering method that uses multilocus genotype data to infer population structure and assign individuals to populations; it can be applied to most commonly used genetic markers, provided they are not closely linked.
Data clustering: a review
TL;DR: An overview of pattern clustering methods from a statistical pattern recognition perspective is presented, with a goal of providing useful advice and references to fundamental concepts accessible to the broad community of clustering practitioners.
TASSEL: software for association mapping of complex traits in diverse samples
Peter J. Bradbury, Zhiwu Zhang, Dallas E. Kroon, Terry M. Casstevens, Yogesh Ramdoss, Edward S. Buckler
TL;DR: TASSEL (Trait Analysis by aSSociation, Evolution and Linkage) implements general linear model and mixed linear model approaches for controlling population and family structure and allows for linkage disequilibrium statistics to be calculated and visualized graphically.
A unified mixed-model method for association mapping that accounts for multiple levels of relatedness
Jianming Yu, Gaël Pressoir, William H. Briggs, Irie Vroh Bi, Masanori Yamasaki, John Doebley, Michael D. McMullen, Brandon S. Gaut, Dahlia M. Nielsen, James B. Holland, Stephen Kresovich, Edward S. Buckler
TL;DR: A unified mixed-model approach to account for multiple levels of relatedness simultaneously as detected by random genetic markers is developed and provides a powerful complement to currently available methods for association mapping.
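The unified mixed model of this reference, often called the Q + K model and the model that compressed MLM and P3D build on, is commonly written as follows (notation follows common usage and may differ in detail from the original paper). Here y holds phenotypes, β fixed effects other than marker and population, α the marker effect, v population effects with Q from a structure analysis, u the random polygenic background with kinship matrix K, Z an incidence matrix, and R a diagonal residual matrix:

```latex
y = X\beta + S\alpha + Qv + Zu + e,
\qquad \operatorname{Var}(u) = 2K V_G,
\qquad \operatorname{Var}(e) = R V_R

V = \operatorname{Var}(y) = Z\,(2K)\,Z^\top V_G + R V_R
```

In compressed MLM, K is replaced by a smaller group-level kinship matrix and Z maps individuals to groups; under P3D, V_G and V_R are estimated once and held fixed while α is tested marker by marker.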