1. What is FDA's role in GWAS?
Functional data analysis (FDA) contributes to genome-wide association studies (GWAS) by incorporating correlation, linkage, and linkage disequilibrium information among genetic variants into association tests. It captures complex dependency structures and higher-order linkage disequilibrium that other methods often miss, and it offers a computationally efficient way to test associations between multiple variants and a phenotype. Because FDA treats genetic variant data as a collection of random variables forming a stochastic process, it is well suited to large-scale genomic data. Its effectiveness in GWAS has been demonstrated by Fan et al. and subsequent studies. The paper proposes a robust FDA-based representation of genetic variants that addresses the violation of the normality assumption in least-squares smoothing. It introduces a two-step method that integrates curve smoothing with functional generalized linear models to identify haplotypes associated with the phenotype. Section 2 details the approach, Section 3 compares its performance with single-SNP tests and with haplotypes constructed from genetic information, Section 4 discusses its strengths and limitations, and Section 5 outlines future directions.
2. How are SNPs coded in GWAS?
In GWAS, SNPs are often coded as the number of minor alleles observed at a given locus. Because each locus carries two alleles, one on each chromosome of a homologous pair, a SNP can be coded as 0, 1, or 2. Suppose the nucleotide A is the more common allele (the major or reference allele) and T is the less common allele (the minor or non-reference allele). Counting minor alleles, a genotype of AA is then coded as 0, AT or TA as 1, and TT as 2. This additive coding turns genotypes into numeric values that can be tested for association with a disease or trait.
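The minor-allele count coding can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: the function name `code_genotypes` and the toy genotypes are hypothetical, genotypes are assumed to be two-letter strings with no missing calls, and the minor allele is assumed to be whichever allele is rarer in the sample (ties are broken arbitrarily).

```python
from collections import Counter

def code_genotypes(genotypes):
    """Code each two-letter genotype as its count of minor alleles (0, 1, or 2)."""
    # Count every allele across all samples to find the less common one.
    allele_counts = Counter(allele for g in genotypes for allele in g)
    if len(allele_counts) < 2:
        # Monomorphic locus: no minor allele is observed in this sample.
        return [0] * len(genotypes)
    minor_allele = min(allele_counts, key=allele_counts.get)
    return [g.count(minor_allele) for g in genotypes]

# Toy data (hypothetical): A occurs 8 times, T occurs 4 times, so T is minor.
genotypes = ["AA", "AT", "TA", "TT", "AA", "AA"]
codes = code_genotypes(genotypes)
print(codes)
```

In practice the minor allele is usually fixed from a reference panel or the full cohort rather than recomputed per subset, so that codes stay comparable across analyses.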
3. How to determine statistically significant SNPs in functional regression models?
In functional regression models, statistically significant SNPs can be identified from the confidence band of the coefficient function $\beta(t)$, which connects the point-wise confidence intervals of $\beta(t)$ at each position $t$. The band is $\hat{b}^{T}\phi(t) \pm z_{\alpha/2}\sqrt{\sum_{k=1}^{q_\beta}\sum_{l=1}^{q_\beta}\phi_k(t)\,\phi_l(t)\,\mathrm{Cov}(\hat{b}_k,\hat{b}_l)}$. Here $\phi(t) = (\phi_1(t),\dots,\phi_{q_\beta}(t))^{T}$ is the set of basis functions for $\beta(t)$, and $\hat{b}$ is the vector of estimated coefficients for those basis functions. A SNP at position $t$ is flagged as significantly associated with the phenotype or disease when the band at $t$ excludes zero.
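The band computation can be illustrated with a short Python sketch. It assumes the coefficient estimates and their covariance matrix are already available from a fitted model. The function name `confidence_band`, the two-function basis (1, t), and all numbers are hypothetical and chosen only so the arithmetic is easy to follow. The standard error is taken as the square root of the double sum over basis functions.

```python
import math

def confidence_band(phi_t, b_hat, cov_b, z=1.96):
    """Point-wise confidence interval for the coefficient function at one position.

    phi_t : basis functions evaluated at position t, length q
    b_hat : estimated basis coefficients, length q
    cov_b : q x q covariance matrix of the coefficient estimates
    z     : normal quantile z_{alpha/2} (1.96 for a 95% interval)
    """
    q = len(b_hat)
    estimate = sum(b_hat[k] * phi_t[k] for k in range(q))
    # Variance of b^T phi(t): double sum over pairs of basis functions.
    variance = sum(
        phi_t[k] * phi_t[l] * cov_b[k][l] for k in range(q) for l in range(q)
    )
    half_width = z * math.sqrt(variance)
    return estimate - half_width, estimate + half_width

# Hypothetical toy fit with basis (1, t).
b_hat = [0.0, 1.0]
cov_b = [[0.01, 0.0], [0.0, 0.04]]
positions = [0.0, 0.5, 1.0]

bands = [confidence_band([1.0, t], b_hat, cov_b) for t in positions]
# A position is flagged when its interval excludes zero.
significant = [lower > 0 or upper < 0 for lower, upper in bands]
print(significant)
```

Because the intervals are point-wise, they do not control the error rate across all positions simultaneously. A multiple-testing adjustment of the quantile would be a separate design decision.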
4. How are genetic-based haplotypes constructed?
Genetic-based haplotypes are constructed using two common approaches: (1) blocks containing a fixed number of SNPs and (2) genomic windows of fixed size. Blocks of 5, 10, 20, and 50 SNPs and windows of 10 kb, 20 kb, 50 kb, and 100 kb are used, giving 1392 blocks for Approach (1) and 2394 blocks for Approach (2). Missing values in genotype data are handled by excluding samples with missing values for any SNP in a haplotype and imputing missing values using the mode. Significant blocks are identified with a GLM at a p-value threshold of 10^-3. Approach (1) found 9 significant blocks and Approach (2) found 11, located primarily around 22 Mb with a few at 24 Mb. These findings agree with the single SNP tests, which identified significant SNPs near 22 Mb. The haplotypes at 24 Mb have larger p-values (>10^-4) and are considered noise rather than true associations.
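The two block definitions can be sketched as follows. This is an illustration under stated assumptions rather than the paper's procedure: blocks and windows are assumed to be non-overlapping, windows are anchored at the first SNP position, positions are assumed sorted in base pairs, and windows containing no SNP are skipped. The function names and the toy positions are hypothetical.

```python
def blocks_by_snp_count(positions, n_snps):
    """Approach (1): consecutive blocks of n_snps SNPs (the last may be shorter)."""
    indices = list(range(len(positions)))
    return [indices[i:i + n_snps] for i in range(0, len(indices), n_snps)]

def blocks_by_window(positions, window_bp):
    """Approach (2): group SNP indices into fixed-size windows of window_bp base pairs."""
    windows = {}
    start = positions[0]  # assumption: windows start at the first SNP
    for i, pos in enumerate(positions):
        windows.setdefault((pos - start) // window_bp, []).append(i)
    # Only windows that contain at least one SNP appear in the result.
    return [windows[k] for k in sorted(windows)]

# Hypothetical sorted SNP positions in base pairs.
positions = [1000, 4000, 9000, 12000, 25000, 26000, 31000]

count_blocks = blocks_by_snp_count(positions, 5)
window_blocks = blocks_by_window(positions, 10_000)
print(count_blocks)
print(window_blocks)
```

Each resulting list of SNP indices defines one candidate haplotype block, whose genotypes would then be passed to the association model.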