TL;DR: Analysis of advanced cancer patients treated with immune-checkpoint inhibitors shows that tumor mutational burden, as assessed by targeted next-generation sequencing, predicts survival after immunotherapy across multiple cancer types.
Abstract: Immune checkpoint inhibitor (ICI) treatments benefit some patients with metastatic cancers, but predictive biomarkers are needed. Findings in selected cancer types suggest that tumor mutational burden (TMB) may predict clinical response to ICI. To examine this association more broadly, we analyzed the clinical and genomic data of 1,662 advanced cancer patients treated with ICI, and 5,371 non-ICI-treated patients, whose tumors underwent targeted next-generation sequencing (MSK-IMPACT). Among all patients, higher somatic TMB (highest 20% in each histology) was associated with better overall survival. For most cancer histologies, an association between higher TMB and improved survival was observed. The TMB cutpoints associated with improved survival varied markedly between cancer types. These data indicate that TMB is associated with improved survival in patients receiving ICI across a wide variety of cancer types, but that there may not be one universal definition of high TMB.
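The abstract's definition of high TMB (the highest 20% within each histology) implies a separate cutpoint per cancer type rather than one global threshold. A minimal sketch of that idea follows, assuming a hypothetical input of (patient_id, histology, tmb) tuples; the function name, the tie handling, and the rule of flagging at least one patient per histology are illustrative assumptions, not the authors' procedure.

```python
from collections import defaultdict


def high_tmb_flags(patients, top_fraction=0.20):
    """Flag patients in the top `top_fraction` of TMB within their own histology.

    patients: iterable of (patient_id, histology, tmb) tuples (hypothetical layout).
    Returns (flags, cutpoints): {patient_id: bool} and {histology: cutpoint TMB}.
    """
    by_histology = defaultdict(list)
    for pid, histology, tmb in patients:
        by_histology[histology].append((tmb, pid))

    flags, cutpoints = {}, {}
    for histology, rows in by_histology.items():
        rows.sort(reverse=True)  # highest TMB first
        # Assumption: always flag at least one patient per histology.
        n_high = max(1, round(len(rows) * top_fraction))
        cutoff = rows[n_high - 1][0]
        cutpoints[histology] = cutoff
        for tmb, pid in rows:
            # Ties at the cutoff are all flagged, so the group can exceed n_high.
            flags[pid] = tmb >= cutoff
    return flags, cutpoints


# Made-up values, for illustration only.
example = [
    ("m1", "melanoma", 1), ("m2", "melanoma", 2), ("m3", "melanoma", 3),
    ("m4", "melanoma", 4), ("m5", "melanoma", 50),
    ("g1", "glioma", 0.5), ("g2", "glioma", 1), ("g3", "glioma", 1.5),
    ("g4", "glioma", 2), ("g5", "glioma", 3),
]
flags, cutpoints = high_tmb_flags(example)
```

In this toy data, patient m4 has a higher TMB than the glioma cutpoint yet is not flagged, because the comparison is made only against other melanoma patients. This mirrors the abstract's point that cutpoints vary markedly between cancer types.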
TL;DR: Pathway analysis implicates immunity, lipid metabolism, tau binding proteins, and amyloid precursor protein (APP) metabolism, showing that genetic variants affecting APP and Aβ processing are associated not only with early-onset autosomal dominant Alzheimer’s disease but also with LOAD.
Abstract: Risk for late-onset Alzheimer’s disease (LOAD), the most prevalent dementia, is partially driven by genetics. To identify LOAD risk loci, we performed a large genome-wide association meta-analysis of clinically diagnosed LOAD (94,437 individuals). We confirm 20 previous LOAD risk loci and identify five new genome-wide loci (IQCK, ACE, ADAM10, ADAMTS1, and WWOX), two of which (ADAM10, ACE) were identified in a recent genome-wide association (GWAS)-by-familial-proxy of Alzheimer’s or dementia. Fine-mapping of the human leukocyte antigen (HLA) region confirms the neurological and immune-mediated disease haplotype HLA-DR15 as a risk factor for LOAD. Pathway analysis implicates immunity, lipid metabolism, tau binding proteins, and amyloid precursor protein (APP) metabolism, showing that genetic variants affecting APP and Aβ processing are associated not only with early-onset autosomal dominant Alzheimer’s disease but also with LOAD. Analyses of risk genes and pathways show enrichment for rare variants (P = 1.32 × 10⁻⁷), indicating that additional rare variants remain to be identified. We also identify important genetic correlations between LOAD and traits such as family history of dementia and education.
TL;DR: To realize the full and equitable potential of polygenic risk scores, greater diversity must be prioritized in genetic studies, and summary statistics must be publicly disseminated to ensure that health disparities are not increased for those individuals already most underserved.
Abstract: Polygenic risk scores (PRS) are poised to improve biomedical outcomes via precision medicine. However, the major ethical and scientific challenge surrounding clinical implementation of PRS is that those available today are several times more accurate in individuals of European ancestry than other ancestries. This disparity is an inescapable consequence of Eurocentric biases in genome-wide association studies, thus highlighting that, unlike clinical biomarkers and prescription drugs (which may individually work better in some populations but do not ubiquitously perform far better in European populations), clinical uses of PRS today would systematically afford greater improvement for European-descent populations. Early diversifying efforts show promise in leveling this vast imbalance, even when non-European sample sizes are considerably smaller than the largest studies to date. To realize the full and equitable potential of PRS, greater diversity must be prioritized in genetic studies, and summary statistics must be publicly disseminated to ensure that health disparities are not increased for those individuals already most underserved.
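For context, a PRS is commonly computed as a weighted sum of an individual's effect-allele counts, with per-variant weights taken from GWAS summary statistics. The sketch below shows only that arithmetic; the variant identifiers, weights, and function name are hypothetical, and real pipelines add steps (variant selection, linkage-disequilibrium handling, ancestry-aware calibration) that are not shown here.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of effect-allele dosages.

    dosages: {variant_id: effect-allele count, 0 to 2} for one individual.
    weights: {variant_id: per-allele effect size} from GWAS summary statistics.
    Variants missing from either input are skipped.
    """
    shared = dosages.keys() & weights.keys()
    return sum(weights[v] * dosages[v] for v in shared)


# Hypothetical variant IDs and effect sizes, for illustration only.
weights = {"rs_a": 0.2, "rs_b": -0.1, "rs_c": 0.05}
dosages = {"rs_a": 2, "rs_b": 1, "rs_d": 2}
score = polygenic_risk_score(dosages, weights)
```

Because the weights come from the discovery GWAS, a score built from predominantly European-ancestry summary statistics inherits that study's biases. That inheritance is the disparity the abstract describes.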
TL;DR: A large genome-wide association study of clinically diagnosed AD and AD-by-proxy identifies new loci and functional pathways that contribute to AD risk and adds novel insights into the neurobiology of AD.
Abstract: Alzheimer's disease (AD) is highly heritable and recent studies have identified over 20 disease-associated genomic loci. Yet these only explain a small proportion of the genetic variance, indicating that undiscovered loci remain. Here, we performed a large genome-wide association study of clinically diagnosed AD and AD-by-proxy (71,880 cases, 383,378 controls). AD-by-proxy, based on parental diagnoses, showed strong genetic correlation with AD (rg = 0.81). Meta-analysis identified 29 risk loci, implicating 215 potential causative genes. Associated genes are strongly expressed in immune-related tissues and cell types (spleen, liver, and microglia). Gene-set analyses indicate biological mechanisms involved in lipid-related processes and degradation of amyloid precursor proteins. We show strong genetic correlations with multiple health-related outcomes, and Mendelian randomization results suggest a protective effect of cognitive ability on AD risk. These results are a step forward in identifying the genetic factors that contribute to AD risk and add novel insights into the neurobiology of AD.
TL;DR: A genome-wide association meta-analysis of 18,381 autism spectrum disorder cases and 27,969 controls identifies five risk loci and finds quantitative and qualitative polygenic heterogeneity across ASD subtypes.
Abstract: Autism spectrum disorder (ASD) is a highly heritable and heterogeneous group of neurodevelopmental phenotypes diagnosed in more than 1% of children. Common genetic variants contribute substantially to ASD susceptibility, but to date no individual variants have been robustly associated with ASD. With a marked sample-size increase from a unique Danish population resource, we report a genome-wide association meta-analysis of 18,381 individuals with ASD and 27,969 controls that identified five genome-wide-significant loci. Leveraging GWAS results from three phenotypes with significantly overlapping genetic architectures (schizophrenia, major depression, and educational attainment), we identified seven additional loci shared with other traits at equally strict significance levels. Dissecting the polygenic architecture, we found both quantitative and qualitative polygenic heterogeneity across ASD subtypes. These results highlight biological insights, particularly relating to neuronal function and corticogenesis, and establish that GWAS performed at scale will be much more productive in the near term in ASD.
TL;DR: A genome-wide association meta-analysis of 20,183 individuals diagnosed with ADHD and 35,191 controls identifies variants surpassing genome-wide significance in 12 independent loci and implicates neurodevelopmental pathways and conserved regions of the genome as being involved in underlying ADHD biology.
Abstract: Attention deficit/hyperactivity disorder (ADHD) is a highly heritable childhood behavioral disorder affecting 5% of children and 2.5% of adults. Common genetic variants contribute substantially to ADHD susceptibility, but no variants have been robustly associated with ADHD. We report a genome-wide association meta-analysis of 20,183 individuals diagnosed with ADHD and 35,191 controls that identifies variants surpassing genome-wide significance in 12 independent loci, finding important new information about the underlying biology of ADHD. Associations are enriched in evolutionarily constrained genomic regions and loss-of-function intolerant genes and around brain-expressed regulatory marks. Analyses of three replication studies: a cohort of individuals diagnosed with ADHD, a self-reported ADHD sample and a meta-analysis of quantitative measures of ADHD symptoms in the population, support these findings while highlighting study-specific differences on genetic overlap with educational attainment. Strong concordance with GWAS of quantitative population measures of ADHD symptoms supports that clinical diagnosis of ADHD is an extreme expression of continuous heritable traits.
TL;DR: Evidence is reported for the involvement of many systems in tobacco and alcohol use, including genes involved in nicotinic, dopaminergic, and glutamatergic neurotransmission, which provide a solid starting point to evaluate the effects of these loci in model organisms and more precise substance use measures.
Abstract: Tobacco and alcohol use are leading causes of mortality that influence risk for many complex diseases and disorders [1]. They are heritable [2,3] and etiologically related [4,5] behaviors that have been resistant to gene discovery efforts [6–11]. In sample sizes up to 1.2 million individuals, we discovered 566 genetic variants in 406 loci associated with multiple stages of tobacco use (initiation, cessation, and heaviness) as well as alcohol use, with 150 loci evidencing pleiotropic association. Smoking phenotypes were positively genetically correlated with many health conditions, whereas alcohol use was negatively correlated with these conditions, such that increased genetic risk for alcohol use is associated with lower disease risk. We report evidence for the involvement of many systems in tobacco and alcohol use, including genes involved in nicotinic, dopaminergic, and glutamatergic neurotransmission. The results provide a solid starting point to evaluate the effects of these loci in model organisms and more precise substance use measures.
TL;DR: Genome-wide analysis identifies 30 loci associated with bipolar disorder, allowing for comparisons of shared genes and pathways with other psychiatric disorders, including schizophrenia and depression.
Abstract: Bipolar disorder is a highly heritable psychiatric disorder. We performed a genome-wide association study (GWAS) including 20,352 cases and 31,358 controls of European descent, with follow-up analysis of 822 variants with P < 1 × 10⁻⁴ in an additional 9,412 cases and 137,760 controls. Eight of the 19 variants that were genome-wide significant (P < 5 × 10⁻⁸) in the discovery GWAS were not genome-wide significant in the combined analysis, consistent with small effect sizes and limited power but also with genetic heterogeneity. In the combined analysis, 30 loci were genome-wide significant, including 20 newly identified loci. The significant loci contain genes encoding ion channels, neurotransmitter transporters and synaptic components. Pathway analysis revealed nine significantly enriched gene sets, including regulation of insulin secretion and endocannabinoid signaling. Bipolar I disorder is strongly genetically correlated with schizophrenia, driven by psychosis, whereas bipolar II disorder is more strongly correlated with major depressive disorder. These findings address key clinical questions and provide potential biological mechanisms for bipolar disorder.
TL;DR: Evidence of a causal effect of the gut microbiome on metabolic traits is shown and the use of MR is supported as a means to elucidate causal relationships from microbiome-wide association findings.
Abstract: Microbiome-wide association studies on large population cohorts have highlighted associations between the gut microbiome and complex traits, including type 2 diabetes (T2D) and obesity [1]. However, the causal relationships remain largely unresolved. We leveraged information from 952 normoglycemic individuals for whom genome-wide genotyping, gut metagenomic sequence and fecal short-chain fatty acid (SCFA) levels were available [2], then combined this information with genome-wide-association summary statistics for 17 metabolic and anthropometric traits. Using bidirectional Mendelian randomization (MR) analyses to assess causality [3], we found that the host-genetic-driven increase in gut production of the SCFA butyrate was associated with improved insulin response after an oral glucose-tolerance test (P = 9.8 × 10⁻⁵), whereas abnormalities in the production or absorption of another SCFA, propionate, were causally related to an increased risk of T2D (P = 0.004). These data provide evidence of a causal effect of the gut microbiome on metabolic traits and support the use of MR as a means to elucidate causal relationships from microbiome-wide association findings.
TL;DR: It is shown that trait-associated loci cover more than half of the genome, and 90% of these overlap with loci from multiple traits, which provides insights into how genetic variation contributes to trait variation.
Abstract: After a decade of genome-wide association studies (GWASs), fundamental questions in human genetics, such as the extent of pleiotropy across the genome and variation in genetic architecture across traits, are still unanswered. The current availability of hundreds of GWASs provides a unique opportunity to address these questions. We systematically analyzed 4,155 publicly available GWASs. For a subset of well-powered GWASs on 558 traits, we provide an extensive overview of pleiotropy and genetic architecture. We show that trait-associated loci cover more than half of the genome, and 90% of these overlap with loci from multiple traits. We find that potential causal variants are enriched in coding and flanking regions, as well as in regulatory elements, and show variation in polygenicity and discoverability of traits. Our results provide insights into how genetic variation contributes to trait variation. All GWAS results can be queried and visualized at the GWAS ATLAS resource (https://atlas.ctglab.nl).
TL;DR: A simple activity-by-contact model substantially outperformed previous methods at predicting the complex connections in the CRISPR dataset and allows systematic mapping of enhancer–gene connections in a given cell type, on the basis of chromatin-state measurements.
Abstract: Enhancer elements in the human genome control how genes are expressed in specific cell types and harbor thousands of genetic variants that influence risk for common diseases [1–4]. Yet, we still do not know how enhancers regulate specific genes, and we lack general rules to predict enhancer-gene connections across cell types [5,6]. We developed an experimental approach, CRISPRi-FlowFISH, to perturb enhancers in the genome, and we applied it to test >3,500 potential enhancer-gene connections for 30 genes. We found that a simple activity-by-contact model substantially outperformed previous methods at predicting the complex connections in our CRISPR dataset. This activity-by-contact model allows us to construct genome-wide maps of enhancer-gene connections in a given cell type, on the basis of chromatin state measurements. Together, CRISPRi-FlowFISH and the activity-by-contact model provide a systematic approach to map and predict which enhancers regulate which genes, and will help to interpret the functions of the thousands of disease risk variants in the noncoding genome.
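The abstract does not spell out the activity-by-contact model. In its general form, as commonly described, an element's score for a gene is its activity multiplied by its contact frequency with the gene's promoter, divided by the sum of the same product over all candidate elements near that gene. The sketch below illustrates only that normalization. The element names and numbers are made up, and how activity and contact are measured from chromatin-state data is deliberately left out.

```python
def abc_scores(elements):
    """Relative activity-by-contact contribution of each element to one gene.

    elements: {element_name: (activity, contact)} for the candidate elements
    near a single gene (hypothetical inputs; units are arbitrary).
    Returns {element_name: score}; scores sum to 1 when any product is nonzero.
    """
    products = {name: activity * contact
                for name, (activity, contact) in elements.items()}
    total = sum(products.values())
    if total == 0:
        # No active, contacting element: every score is zero.
        return {name: 0.0 for name in products}
    return {name: product / total for name, product in products.items()}


# Made-up activity and contact values, for illustration only.
candidates = {
    "distal_enhancer": (10.0, 0.5),
    "weak_enhancer": (5.0, 0.2),
    "promoter_proximal": (4.0, 1.0),
}
scores = abc_scores(candidates)
```

The design choice worth noting is the denominator: a strong, frequently contacting element scores lower for a gene that has many other strong elements nearby, so each score reflects a share of the total regulatory input rather than an absolute effect.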
TL;DR: The genetic architecture of anorexia nervosa mirrors its clinical presentation, showing significant genetic correlations with psychiatric disorders, physical activity, and metabolic (including glycemic), lipid and anthropometric traits, independent of the effects of common variants associated with body-mass index.
Abstract: Characterized primarily by a low body-mass index, anorexia nervosa is a complex and serious illness [1], affecting 0.9-4% of women and 0.3% of men [2–4], with twin-based heritability estimates of 50-60% [5]. Mortality rates are higher than those in other psychiatric disorders [6], and outcomes are unacceptably poor [7]. Here we combine data from the Anorexia Nervosa Genetics Initiative (ANGI) [8,9] and the Eating Disorders Working Group of the Psychiatric Genomics Consortium (PGC-ED) and conduct a genome-wide association study of 16,992 cases of anorexia nervosa and 55,525 controls, identifying eight significant loci. The genetic architecture of anorexia nervosa mirrors its clinical presentation, showing significant genetic correlations with psychiatric disorders, physical activity, and metabolic (including glycemic), lipid and anthropometric traits, independent of the effects of common variants associated with body-mass index. These results further encourage a reconceptualization of anorexia nervosa as a metabo-psychiatric disorder. Elucidating the metabolic component is a critical direction for future research, and paying attention to both psychiatric and metabolic components may be key to improving outcomes.
TL;DR: High-quality de novo–assembled genomes of two cultivated allotetraploid cotton species and whole-genome comparative analyses provide insights into the evolution of cotton genomes and improvement of fiber quality and resilience to stress.
Abstract: Allotetraploid cotton is an economically important natural-fiber-producing crop worldwide. After polyploidization, Gossypium hirsutum L. evolved to produce a higher fiber yield and to better survive harsh environments than Gossypium barbadense, which produces superior-quality fibers. The global genetic and molecular bases for these interspecies divergences were unknown. Here we report high-quality de novo-assembled genomes for these two cultivated allotetraploid species with pronounced improvement in repetitive-DNA-enriched centromeric regions. Whole-genome comparative analyses revealed that species-specific alterations in gene expression, structural variations and expanded gene families were responsible for speciation and the evolutionary history of these species. These findings help to elucidate the evolution of cotton genomes and their domestication history. The information generated not only should enable breeders to improve fiber quality and resilience to ever-changing environmental conditions but also can be translated to other crops for better understanding of their domestication history and use in improvement.
TL;DR: Properties of TWAS are explored as a potential approach to prioritize causal genes at GWAS loci, by using simulations and case studies of literature-curated candidate causal genes for schizophrenia, low-density-lipoprotein cholesterol and Crohn’s disease.
Abstract: Transcriptome-wide association studies (TWAS) integrate genome-wide association studies (GWAS) and gene expression datasets to identify gene-trait associations. In this Perspective, we explore properties of TWAS as a potential approach to prioritize causal genes at GWAS loci, by using simulations and case studies of literature-curated candidate causal genes for schizophrenia, low-density-lipoprotein cholesterol and Crohn's disease. We explore risk loci where TWAS accurately prioritizes the likely causal gene as well as loci where TWAS prioritizes multiple genes, some likely to be non-causal, owing to sharing of expression quantitative trait loci (eQTL). TWAS is especially prone to spurious prioritization with expression data from non-trait-related tissues or cell types, owing to substantial cross-cell-type variation in expression levels and eQTL strengths. Nonetheless, TWAS prioritizes candidate causal genes more accurately than simple baselines. We suggest best practices for causal-gene prioritization with TWAS and discuss future opportunities for improvement. Our results showcase the strengths and limitations of using eQTL datasets to determine causal genes at GWAS loci.
TL;DR: A perspective and primer on deep learning applications for genome analysis is provided, covering successful applications in regulatory genomics, variant calling and pathogenicity scores.
Abstract: Deep learning methods are a class of machine learning techniques capable of identifying highly complex patterns in large datasets. Here, we provide a perspective and primer on deep learning applications for genome analysis. We discuss successful applications in the fields of regulatory genomics, variant calling and pathogenicity scores. We include general guidance for how to effectively use deep learning methods as well as a practical guide to tools and resources. This primer is accompanied by an interactive online tutorial.
TL;DR: A large genetic association sample is used to detect novel loci and gain insight into the pathways, tissue and cell types involved in insomnia complaints, identifying 202 loci implicating 956 genes through positional, expression quantitative trait loci, and chromatin mapping.
Abstract: Insomnia is the second most prevalent mental disorder, with no sufficient treatment available. Despite substantial heritability, insight into the associated genes and neurobiological pathways remains limited. Here, we use a large genetic association sample (n = 1,331,010) to detect novel loci and gain insight into the pathways, tissue and cell types involved in insomnia complaints. We identify 202 loci implicating 956 genes through positional, expression quantitative trait loci, and chromatin mapping. The meta-analysis explained 2.6% of the variance. We show gene set enrichments for the axonal part of neurons, cortical and subcortical tissues, and specific cell types, including striatal, hypothalamic, and claustrum neurons. We found considerable genetic correlations with psychiatric traits and sleep duration, and modest correlations with other sleep-related traits. Mendelian randomization identified the causal effects of insomnia on depression, diabetes, and cardiovascular disease, and the protective effects of educational attainment and intracranial volume. Our findings highlight key brain areas and cell types implicated in insomnia, and provide new treatment targets.
TL;DR: Pathway and enrichment analyses, including mouse models with renal phenotypes, support the kidney as the main target organ and provide a comprehensive priority list of molecular targets for translational research.
Abstract: Chronic kidney disease (CKD) is responsible for a public health burden with multi-systemic complications. Through trans-ancestry meta-analysis of genome-wide association studies of estimated glomerular filtration rate (eGFR) and independent replication (n = 1,046,070), we identified 264 associated loci (166 new). Of these, 147 were likely to be relevant for kidney function on the basis of associations with the alternative kidney function marker blood urea nitrogen (n = 416,178). Pathway and enrichment analyses, including mouse models with renal phenotypes, support the kidney as the main target organ. A genetic risk score for lower eGFR was associated with clinically diagnosed CKD in 452,264 independent individuals. Colocalization analyses of associations with eGFR among 783,978 European-ancestry individuals and gene expression across 46 human tissues, including tubulo-interstitial and glomerular kidney compartments, identified 17 genes differentially expressed in kidney. Fine-mapping highlighted missense driver variants in 11 genes and kidney-specific regulatory variants. These results provide a comprehensive priority list of molecular targets for translational research.
TL;DR: This genetic atlas provides evidence linking associated SNPs to causal genes, offers new insight into osteoporosis pathophysiology, and highlights opportunities for drug development.
Abstract: Osteoporosis is a common aging-related disease diagnosed primarily using bone mineral density (BMD). We assessed genetic determinants of BMD as estimated by heel quantitative ultrasound in 426,824 individuals, identifying 518 genome-wide significant loci (301 novel), explaining 20% of its variance. We identified 13 bone fracture loci, all associated with estimated BMD (eBMD), in ~1.2 million individuals. We then identified target genes enriched for genes known to influence bone density and strength (maximum odds ratio (OR) = 58, P = 1 × 10⁻⁷⁵) from cell-specific features, including chromatin conformation and accessible chromatin sites. We next performed rapid-throughput skeletal phenotyping of 126 knockout mice with disruptions in predicted target genes and found an increased abnormal skeletal phenotype frequency compared to 526 unselected lines (P < 0.0001). In-depth analysis of one gene, DAAM2, showed a disproportionate decrease in bone strength relative to mineralization. This genetic atlas provides evidence linking associated SNPs to causal genes, offers new insight into osteoporosis pathophysiology, and highlights opportunities for drug development.
TL;DR: The assembly of the genome of durum wheat cultivar Svevo enables genome-wide genetic diversity analyses highlighting modifications imposed by thousands of years of empirical selection and breeding.
Abstract: The domestication of wild emmer wheat led to the selection of modern durum wheat, grown mainly for pasta production. We describe the 10.45 gigabase (Gb) assembly of the genome of durum wheat cultivar Svevo. The assembly enabled genome-wide genetic diversity analyses revealing the changes imposed by thousands of years of empirical selection and breeding. Regions exhibiting strong signatures of genetic divergence associated with domestication and breeding were widespread in the genome with several major diversity losses in the pericentromeric regions. A locus on chromosome 5B carries a gene encoding a metal transporter (TdHMA3-B1) with a non-functional variant causing high accumulation of cadmium in grain. The high-cadmium allele, widespread among durum cultivars but undetected in wild emmer accessions, increased in frequency from domesticated emmer to modern durum wheat. The rapid cloning of TdHMA3-B1 rescues a wild beneficial allele and demonstrates the practical use of the Svevo genome for wheat improvement.
TL;DR: The genome sequence of segmental allotetraploid peanut is reported and suggests that diversity generated by genetic deletions and homeologous recombination helped to favor the domestication of Arachis hypogaea over its diploid relatives.
Abstract: Like many other crops, the cultivated peanut (Arachis hypogaea L.) is of hybrid origin and has a polyploid genome that contains essentially complete sets of chromosomes from two ancestral species. Here we report the genome sequence of peanut and show that after its polyploid origin, the genome has evolved through mobile-element activity, deletions and by the flow of genetic information between corresponding ancestral chromosomes (that is, homeologous recombination). Uniformity of patterns of homeologous recombination at the ends of chromosomes favors a single origin for cultivated peanut and its wild counterpart A. monticola. However, through much of the genome, homeologous recombination has created diversity. Using new polyploid hybrids made from the ancestral species, we show how this can generate phenotypic changes such as spontaneous changes in the color of the flowers. We suggest that diversity generated by these genetic mechanisms helped to favor the domestication of the polyploid A. hypogaea over other diploid Arachis species cultivated by humans.
TL;DR: High-quality genome sequence of cultivated peanut provides insights into genome evolution and the genetic mechanisms underlying seed size and leaf resistance in peanut, providing a cornerstone for functional genomics and peanut improvement.
Abstract: High oil and protein content make tetraploid peanut a leading oil and food legume. Here we report a high-quality peanut genome sequence, comprising 2.54 Gb with 20 pseudomolecules and 83,709 protein-coding gene models. We characterize gene functional groups implicated in seed size evolution, seed oil content, disease resistance and symbiotic nitrogen fixation. The peanut B subgenome has more genes and general expression dominance, temporally associated with long-terminal-repeat expansion in the A subgenome that also raises questions about the A-genome progenitor. The polyploid genome provided insights into the evolution of Arachis hypogaea and other legume chromosomes. Resequencing of 52 accessions suggests that independent domestications formed peanut ecotypes. Whereas 0.42–0.47 million years ago (Ma) polyploidy constrained genetic variation, the peanut genome sequence aids mapping and candidate-gene discovery for traits such as seed size and color, foliar disease resistance and others, also providing a cornerstone for functional genomics and peanut improvement.
TL;DR: It is established that tumor hypoxia may drive aggressive molecular features across cancers and shape the clinical trajectory of individual tumors.
Abstract: Many primary-tumor subregions have low levels of molecular oxygen, termed hypoxia. Hypoxic tumors are at elevated risk for local failure and distant metastasis, but the molecular hallmarks of tumor hypoxia remain poorly defined. To fill this gap, we quantified hypoxia in 8,006 tumors across 19 tumor types. In ten tumor types, hypoxia was associated with elevated genomic instability. In all 19 tumor types, hypoxic tumors exhibited characteristic driver-mutation signatures. We observed widespread hypoxia-associated dysregulation of microRNAs (miRNAs) across cancers and functionally validated miR-133a-3p as a hypoxia-modulated miRNA. In localized prostate cancer, hypoxia was associated with elevated rates of chromothripsis, allelic loss of PTEN and shorter telomeres. These associations are particularly enriched in polyclonal tumors, representing a constellation of features resembling tumor nimbosus, an aggressive cellular phenotype. Overall, this work establishes that tumor hypoxia may drive aggressive molecular features across cancers and shape the clinical trajectory of individual tumors.
TL;DR: A near-complete chromosome-scale assembly for cultivated octoploid strawberry (Fragaria × ananassa) is reported and the origin and evolutionary processes that shaped this complex allopolyploid are uncovered, providing a useful resource for genome-wide analyses and molecular breeding.
Abstract: Cultivated strawberry emerged from the hybridization of two wild octoploid species, both descendants from the merger of four diploid progenitor species into a single nucleus more than 1 million years ago. Here we report a near-complete chromosome-scale assembly for cultivated octoploid strawberry (Fragaria × ananassa) and uncover the origin and evolutionary processes that shaped this complex allopolyploid. We identified the extant relatives of each diploid progenitor species and provide support for the North American origin of octoploid strawberry. We examined the dynamics among the four subgenomes in octoploid strawberry and uncovered the presence of a single dominant subgenome with significantly greater gene content, gene expression abundance, and biased exchanges between homoeologous chromosomes, as compared with the other subgenomes. Pathway analysis showed that certain metabolomic and disease-resistance traits are largely controlled by the dominant subgenome. These findings and the reference genome should serve as a powerful platform for future evolutionary studies and enable molecular breeding in strawberry.
TL;DR: The largest study to date of East Asian participants is reported, identifying 21 genome-wide-significant associations in 19 genetic loci associated with schizophrenia and highlighting the importance of including sufficient samples of major ancestral groups to ensure their generalizability across populations.
Abstract: Schizophrenia is a debilitating psychiatric disorder with approximately 1% lifetime risk globally. Large-scale schizophrenia genetic studies have reported primarily on European ancestry samples, potentially missing important biological insights. Here, we report the largest study to date of East Asian participants (22,778 schizophrenia cases and 35,362 controls), identifying 21 genome-wide-significant associations in 19 genetic loci. Common genetic variants that confer risk for schizophrenia have highly similar effects between East Asian and European ancestries (genetic correlation = 0.98 ± 0.03), indicating that the genetic basis of schizophrenia and its biology are broadly shared across populations. A fixed-effect meta-analysis including individuals from East Asian and European ancestries identified 208 significant associations in 176 genetic loci (53 novel). Trans-ancestry fine-mapping reduced the sets of candidate causal variants in 44 loci. Polygenic risk scores had reduced performance when transferred across ancestries, highlighting the importance of including sufficient samples of major ancestral groups to ensure their generalizability across populations.
TL;DR: A tomato pan-genome constructed using genome sequences of 725 phylogenetically and geographically representative accessions captures 4,873 genes absent from the reference genome and identifies a rare allele of TomLoxC regulating fruit flavor.
Abstract: Modern tomatoes have narrow genetic diversity limiting their improvement potential. We present a tomato pan-genome constructed using genome sequences of 725 phylogenetically and geographically representative accessions, revealing 4,873 genes absent from the reference genome. Presence/absence variation analyses reveal substantial gene loss and intense negative selection of genes and promoters during tomato domestication and improvement. Lost or negatively selected genes are enriched for important traits, especially disease resistance. We identify a rare allele in the TomLoxC promoter selected against during domestication. Quantitative trait locus mapping and analysis of transgenic plants reveal a role for TomLoxC in apocarotenoid production, which contributes to desirable tomato flavor. In orange-stage fruit, accessions harboring both the rare and common TomLoxC alleles (heterozygotes) have higher TomLoxC expression than those homozygous for either and are resurgent in modern tomatoes. The tomato pan-genome adds depth and completeness to the reference genome, and is useful for future biological discovery and breeding.
TL;DR: Genome-wide association analyses based on whole-genome sequencing and imputation identify 40 new risk variants for colorectal cancer, including a strongly protective low-frequency variant at CHD1 and loci implicating signaling and immune function in disease etiology.
Abstract: To further dissect the genetic architecture of colorectal cancer (CRC), we performed whole-genome sequencing of 1,439 cases and 720 controls, imputed discovered sequence variants and Haplotype Reference Consortium panel variants into genome-wide association study data, and tested for association in 34,869 cases and 29,051 controls. Findings were followed up in an additional 23,262 cases and 38,296 controls. We discovered a strongly protective 0.3% frequency variant signal at CHD1. In a combined meta-analysis of 125,478 individuals, we identified 40 new independent signals at P < 5 × 10⁻⁸, bringing the number of known independent signals for CRC to ~100. New signals implicate lower-frequency variants, Kruppel-like factors, Hedgehog signaling, Hippo-YAP signaling, long noncoding RNAs and somatic drivers, and support a role for immune function. Heritability analyses suggest that CRC risk is highly polygenic, and larger, more comprehensive studies enabling rare variant analysis will improve understanding of biology underlying this risk and influence personalized screening strategies and drug development.
TL;DR: An expanded GWAS of birth weight and subsequent analysis using structural equation modeling and Mendelian randomization decomposes maternal and fetal genetic contributions and causal links between birth weight, blood pressure and glycemic traits.
Abstract: Birth weight variation is influenced by fetal and maternal genetic and non-genetic factors, and has been reproducibly associated with future cardio-metabolic health outcomes. In expanded genome-wide association analyses of own birth weight (n = 321,223) and offspring birth weight (n = 230,069 mothers), we identified 190 independent association signals (129 of which are novel). We used structural equation modeling to decompose the contributions of direct fetal and indirect maternal genetic effects, then applied Mendelian randomization to illuminate causal pathways. For example, both indirect maternal and direct fetal genetic effects drive the observational relationship between lower birth weight and higher later blood pressure: maternal blood pressure-raising alleles reduce offspring birth weight, but only direct fetal effects of these alleles, once inherited, increase later offspring blood pressure. Using maternal birth weight-lowering genotypes to proxy for an adverse intrauterine environment provided no evidence that it causally raises offspring blood pressure, indicating that the inverse birth weight-blood pressure association is attributable to genetic effects, and not to intrauterine programming.
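To make the Mendelian randomization step above more concrete, here is a minimal sketch of a generic inverse-variance-weighted (IVW) estimate from summary statistics. It does not reproduce the study's analysis, which additionally separates maternal from fetal effects; the function name `ivw_mr` and all numbers are hypothetical, and the sketch assumes independent variants and first-order weights that ignore uncertainty in the exposure estimates.

```python
import math


def ivw_mr(beta_exposure, beta_outcome, se_outcome):
    """Inverse-variance-weighted Mendelian randomization estimate.

    beta_exposure: per-variant effects on the exposure (hypothetical inputs).
    beta_outcome:  per-variant effects on the outcome.
    se_outcome:    standard errors of the outcome effects.
    Returns (causal estimate, standard error).
    """
    n = len(beta_exposure)
    if n == 0 or len(beta_outcome) != n or len(se_outcome) != n:
        raise ValueError("inputs must be non-empty and of equal length")
    # Weighted regression of outcome effects on exposure effects through the origin.
    numerator = sum(
        bx * by / (sy ** 2)
        for bx, by, sy in zip(beta_exposure, beta_outcome, se_outcome)
    )
    denominator = sum(
        (bx ** 2) / (sy ** 2) for bx, sy in zip(beta_exposure, se_outcome)
    )
    if denominator == 0:
        raise ValueError("at least one variant must have a non-zero exposure effect")
    estimate = numerator / denominator
    std_error = math.sqrt(1.0 / denominator)
    return estimate, std_error


if __name__ == "__main__":
    # Made-up summary statistics for two variants, for illustration only.
    est, se = ivw_mr([0.1, 0.2], [0.05, 0.10], [0.01, 0.01])
    print(f"causal estimate={est:.3f} se={se:.4f}")
```

With a single variant this reduces to the Wald ratio, the outcome effect divided by the exposure effect.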
TL;DR: Analysis of 1,988 cases of B-cell acute lymphoblastic leukemia characterizes 23 subtypes defined by genomic features and shows that two of the subtypes have frequent PAX5 alterations, demonstrating the utility of transcriptome sequencing to classify B-ALL.
Abstract: Recent genomic studies have identified chromosomal rearrangements defining new subtypes of B-progenitor acute lymphoblastic leukemia (B-ALL); however, many cases lack a known initiating genetic alteration. Using integrated genomic analysis of 1,988 childhood and adult cases, we describe a revised taxonomy of B-ALL incorporating 23 subtypes defined by chromosomal rearrangements, sequence mutations or heterogeneous genomic alterations, many of which show marked variation in prevalence according to age. Two subtypes have frequent alterations of the B lymphoid transcription-factor gene PAX5. One, PAX5alt (7.4%), has diverse PAX5 alterations (rearrangements, intragenic amplifications or mutations); a second subtype is defined by PAX5 p.Pro80Arg and biallelic PAX5 alterations. We show that p.Pro80Arg impairs B lymphoid development and promotes the development of B-ALL with biallelic Pax5 alteration in vivo. These results demonstrate the utility of transcriptome sequencing to classify B-ALL and reinforce the central role of PAX5 as a checkpoint in B lymphoid maturation and leukemogenesis.
TL;DR: A broad comparative study of 81 genomes of parasitic and non-parasitic worms identifies gene family births and hundreds of expanded gene families at key nodes in the phylogeny that are relevant to parasitism and proteins historically targeted for drug development.
Abstract: Parasitic nematodes (roundworms) and platyhelminths (flatworms) cause debilitating chronic infections of humans and animals, decimate crop production and are a major impediment to socioeconomic development. Here we report the broadest comparative study to date of the genomes of parasitic and non-parasitic worms, involving 81 genomes. We have identified gene family births and hundreds of expanded gene families at key nodes in the phylogeny that are relevant to parasitism. Examples include gene families that modulate host immune responses, enable parasite migration through host tissues or allow the parasite to feed. We reveal extensive lineage-specific differences in core metabolism and protein families historically targeted for drug development. From an in silico screen, we have identified and prioritised new potential drug targets and compounds for testing. This comparative genomics resource provides a much-needed boost for the research community to understand and combat parasitic worms.
TL;DR: An MLM-based tool (fastGWA) is developed that controls for population stratification by principal components and for relatedness by a sparse genetic relationship matrix for GWA analyses of biobank-scale data.
Abstract: The genome-wide association study (GWAS) has been widely used as an experimental design to detect associations between genetic variants and a phenotype. Two major confounding factors, population stratification and relatedness, could potentially lead to inflated GWAS test statistics and hence to spurious associations. Mixed linear model (MLM)-based approaches can be used to account for sample structure. However, genome-wide association (GWA) analyses in biobank samples such as the UK Biobank (UKB) often exceed the capability of most existing MLM-based tools especially if the number of traits is large. Here, we develop an MLM-based tool (fastGWA) that controls for population stratification by principal components and for relatedness by a sparse genetic relationship matrix for GWA analyses of biobank-scale data. We demonstrate by extensive simulations that fastGWA is reliable, robust and highly resource-efficient. We then apply fastGWA to 2,173 traits on array-genotyped and imputed samples from 456,422 individuals and to 2,048 traits on whole-exome-sequenced samples from 46,191 individuals in the UKB.