TL;DR: This work describes a method that enables explicit detection and correction of population stratification on a genome-wide scale and uses principal components analysis to explicitly model ancestry differences between cases and controls.
Abstract: Population stratification—allele frequency differences between cases and controls due to systematic ancestry differences—can cause spurious associations in disease studies. We describe a method that enables explicit detection and correction of population stratification on a genome-wide scale. Our method uses principal components analysis to explicitly model ancestry differences between cases and controls. The resulting correction is specific to a candidate marker’s variation in frequency across ancestral populations, minimizing spurious associations while maximizing power to detect true associations. Our simple, efficient approach can easily be applied to disease studies with hundreds of thousands of markers. Population stratification—allele frequency differences between cases and controls due to systematic ancestry differences—can cause spurious associations in disease studies 1‐8 . Because the effects of stratification vary in proportion to the number of samples 9 , stratification will be an increasing problem in the large-scale association studies of the future, which will analyze thousands of samples in an effort to detect common genetic variants of weak effect. The two prevailing methods for dealing with stratification are genomic control and structured association 9‐14 . Although genomic control and structured association have proven useful in a variety of contexts, they have limitations. Genomic control corrects for stratification by adjusting association statistics at each marker by a uniform overall inflation factor. However, some markers differ in their allele frequencies across ancestral populations more than others. Thus, the uniform adjustment applied by genomic control may be insufficient at markers having unusually strong differentiation across ancestral populations and may be superfluous at markers devoid of such differentiation, leading to a loss in power. Structured association uses a program such as STRUCTURE 15 to assign the samples to discrete subpopulation clusters and then aggregates evidence of association within each cluster. If fractional membership in more than one cluster is allowed, the method cannot currently be applied to genome-wide association studies because of its intensive computational cost on large data sets. Furthermore, assignments of individuals to clusters are highly sensitive to the number of clusters, which is not well defined 14,16 .
TL;DR: Associations at DRD2 and several genes involved in glutamatergic neurotransmission highlight molecules of known and potential therapeutic relevance to schizophrenia, and are consistent with leading pathophysiological hypotheses.
Abstract: Schizophrenia is a highly heritable disorder. Genetic risk is conferred by a large number of alleles, including common alleles of small effect that might be detected by genome-wide association studies. Here we report a multi-stage schizophrenia genome-wide association study of up to 36,989 cases and 113,075 controls. We identify 128 independent associations spanning 108 conservatively defined loci that meet genome-wide significance, 83 of which have not been previously reported. Associations were enriched among genes expressed in brain, providing biological plausibility for the findings. Many findings have the potential to provide entirely new insights into aetiology, but associations at DRD2 and several genes involved in glutamatergic neurotransmission highlight molecules of known and potential therapeutic relevance to schizophrenia, and are consistent with leading pathophysiological hypotheses. Independent of genes expressed in brain, associations were enriched among genes expressed in tissues that have important roles in immunity, providing support for the speculated link between the immune system and schizophrenia.
TL;DR: Deep phenotype and genome-wide genetic data from 500,000 individuals from the UK Biobank is described, describing population structure and relatedness in the cohort, and imputation to increase the number of testable variants to 96 million.
Abstract: The UK Biobank project is a prospective cohort study with deep genetic and phenotypic data collected on approximately 500,000 individuals from across the United Kingdom, aged between 40 and 69 at recruitment. The open resource is unique in its size and scope. A rich variety of phenotypic and health-related information is available on each participant, including biological measurements, lifestyle indicators, biomarkers in blood and urine, and imaging of the body and brain. Follow-up information is provided by linking health and medical records. Genome-wide genotype data have been collected on all participants, providing many opportunities for the discovery of new genetic associations and the genetic bases of complex traits. Here we describe the centralized analysis of the genetic data, including genotype quality, properties of population structure and relatedness of the genetic data, and efficient phasing and genotype imputation that increases the number of testable variants to around 96 million. Classical allelic variation at 11 human leukocyte antigen genes was imputed, resulting in the recovery of signals with known associations between human leukocyte antigen alleles and many diseases.
TL;DR: A public database of common variation in the human genome: more than one million single nucleotide polymorphisms for which accurate and complete genotypes have been obtained in 269 DNA samples from four populations, including ten 500-kilobase regions in which essentially all information about common DNA variation has been extracted.
Abstract: Inherited genetic variation has a critical but as yet largely uncharacterized role in human disease. Here we report a public database of common variation in the human genome: more than one million single nucleotide polymorphisms (SNPs) for which accurate and complete genotypes have been obtained in 269 DNA samples from four populations, including ten 500-kilobase regions in which essentially all information about common DNA variation has been extracted. These data document the generality of recombination hotspots, a block-like structure of linkage disequilibrium and low haplotype diversity, leading to substantial correlations of SNPs with many of their neighbours. We show how the HapMap resource can guide the design and analysis of genetic association studies, shed light on structural variation and recombination, and identify loci that may have been subject to natural selection during human evolution.
TL;DR: The short variant of the polymorphism reduces the transcriptional efficiency of the 5-HTT gene promoter, resulting in decreased 5HTT expression and 5HT uptake in lymphoblasts as discussed by the authors, which is the site of action of widely used uptake-inhibiting antidepressant and antianxiety drugs.
Abstract: Transporter-facilitated uptake of serotonin (5-hydroxytryptamine or 5-HT) has been implicated in anxiety in humans and animal models and is the site of action of widely used uptake-inhibiting antidepressant and antianxiety drugs. Human 5-HT transporter (5-HTT) gene transcription is modulated by a common polymorphism in its upstream regulatory region. The short variant of the polymorphism reduces the transcriptional efficiency of the 5-HTT gene promoter, resulting in decreased 5-HTT expression and 5-HT uptake in lymphoblasts. Association studies in two independent samples totaling 505 individuals revealed that the 5-HTT polymorphism accounts for 3 to 4 percent of total variation and 7 to 9 percent of inherited variance in anxiety-related personality traits in individuals as well as sibships.