TL;DR: High-density single nucleotide polymorphism genotyping microarrays are used to demonstrate the ability to accurately and robustly determine whether individuals are in a complex genomic DNA mixture, and suggest future research efforts into assessing the viability of previously sub-optimal DNA sources due to sample contamination.
Abstract: We use high-density single nucleotide polymorphism (SNP) genotyping microarrays to demonstrate the ability to accurately and robustly determine whether individuals are in a complex genomic DNA mixture. We first develop a theoretical framework for detecting an individual's presence within a mixture, then show, through simulations, the limits associated with our method, and finally demonstrate experimentally the identification of the presence of genomic DNA of specific individuals within a series of highly complex genomic mixtures, including mixtures where an individual contributes less than 0.1% of the total genomic DNA. These findings shift the perceived utility of SNPs for identifying individual trace contributors within a forensics mixture, and suggest future research efforts into assessing the viability of previously sub-optimal DNA sources due to sample contamination. These findings also suggest that composite statistics across cohorts, such as allele frequency or genotype counts, do not mask identity within genome-wide association studies. The implications of these findings are discussed.
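The abstract does not spell out the test statistic, so the following is only a minimal sketch of one distance-based way to ask whether a person is in a mixture. It assumes that, per SNP, a contributor's allele frequency (0, 0.5 or 1) sits slightly closer to the mixture's pooled frequency than to a reference population's, and it sums that signal over many SNPs with a one-sample t statistic. The function names, the simulated cohort sizes and the uniform allele frequencies are all illustrative assumptions, not the authors' implementation.

```python
import math
import random


def presence_t(person, mixture, reference):
    """One-sample t statistic over per-SNP distance differences.

    For each SNP, compute |y - reference| - |y - mixture|; a positive mean
    suggests the person is closer to the mixture than to the reference.
    """
    diffs = [abs(y - r) - abs(y - m)
             for y, m, r in zip(person, mixture, reference)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(var / n)


def draw_person(freqs, rng):
    """Simulate diploid genotypes as allele fractions 0, 0.5 or 1 (toy model)."""
    return [((rng.random() < p) + (rng.random() < p)) / 2 for p in freqs]


def pooled_frequency(people):
    """Per-SNP mean allele fraction across a group of people."""
    n = len(people)
    return [sum(column) / n for column in zip(*people)]


def simulate(n_snps=10000, n_people=25, seed=1):
    """Return (t for a mixture member, t for an outsider) on simulated data."""
    rng = random.Random(seed)
    freqs = [rng.uniform(0.1, 0.9) for _ in range(n_snps)]  # assumed frequencies
    mixture_people = [draw_person(freqs, rng) for _ in range(n_people)]
    reference_people = [draw_person(freqs, rng) for _ in range(n_people)]
    outsider = draw_person(freqs, rng)
    mixture = pooled_frequency(mixture_people)
    reference = pooled_frequency(reference_people)
    t_in = presence_t(mixture_people[0], mixture, reference)
    t_out = presence_t(outsider, mixture, reference)
    return t_in, t_out
```

Calling `simulate()` returns one statistic for a simulated contributor and one for a non-contributor. In this toy setting the contributor's value should be clearly larger, because a contributor's own genotype pulls the pooled frequency toward itself at every SNP.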
TL;DR: It is proposed that large CpG islands depleted of activating motifs confer epigenetic memory by recruiting the full repertoire of Polycomb complexes in pluripotent cells.
Abstract: In embryonic stem (ES) cells, bivalent chromatin domains with overlapping repressive (H3 lysine 27 tri-methylation) and activating (H3 lysine 4 tri-methylation) histone modifications mark the promoters of more than 2,000 genes. To gain insight into the structure and function of bivalent domains, we mapped key histone modifications and subunits of Polycomb-repressive complexes 1 and 2 (PRC1 and PRC2) genomewide in human and mouse ES cells by chromatin immunoprecipitation, followed by ultra high-throughput sequencing. We find that bivalent domains can be segregated into two classes—the first occupied by both PRC2 and PRC1 (PRC1-positive) and the second specifically bound by PRC2 (PRC2-only). PRC1-positive bivalent domains appear functionally distinct as they more efficiently retain lysine 27 tri-methylation upon differentiation, show stringent conservation of chromatin state, and associate with an overwhelming number of developmental regulator gene promoters. We also used computational genomics to search for sequence determinants of Polycomb binding. This analysis revealed that the genomewide locations of PRC2 and PRC1 can be largely predicted from the locations, sizes, and underlying motif contents of CpG islands. We propose that large CpG islands depleted of activating motifs confer epigenetic memory by recruiting the full repertoire of Polycomb complexes in pluripotent cells.
TL;DR: This technique allows the cost-effective exploration of changes in microbial community structure, including the rare biosphere, over space and time and can be applied immediately to initiatives, such as the Human Microbiome Project.
Abstract: Massively parallel pyrosequencing of hypervariable regions from small subunit ribosomal RNA (SSU rRNA) genes can sample a microbial community two or three orders of magnitude more deeply per dollar and per hour than capillary sequencing of full-length SSU rRNA. As with full-length rRNA surveys, each sequence read is a tag surrogate for a single microbe. However, rather than assigning taxonomy by creating gene trees de novo that include all experimental sequences and certain reference taxa, we compare the hypervariable region tags to an extensive database of rRNA sequences and assign taxonomy based on the best match in a Global Alignment for Sequence Taxonomy (GAST) process. The resulting taxonomic census provides information on both composition and diversity of the microbial community. To determine the effectiveness of using only hypervariable region tags for assessing microbial community membership, we compared the taxonomy assigned to the V3 and V6 hypervariable regions with the taxonomy assigned to full-length SSU rRNA sequences isolated from both the human gut and a deep-sea hydrothermal vent. The hypervariable region tags and full-length rRNA sequences provided equivalent taxonomy and measures of relative abundance of microbial communities, even for tags up to 15% divergent from their nearest reference match. The greater sampling depth per dollar afforded by massively parallel pyrosequencing reveals many more members of the “rare biosphere” than does capillary sequencing of the full-length gene. In addition, tag sequencing eliminates cloning bias and the sequences are short enough to be completely sequenced in a single read, maximizing the number of organisms sampled in a run while minimizing chimera formation. This technique allows the cost-effective exploration of changes in microbial community structure, including the rare biosphere, over space and time and can be applied immediately to initiatives, such as the Human Microbiome Project.
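The best-match idea can be illustrated with a small sketch. It assumes that plain edit distance, normalised by sequence length, is an acceptable stand-in for the global alignment distance; the actual GAST procedure is not detailed in the abstract and is likely more involved. The reference names and sequences below are toy values invented for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance via dynamic programming (global alignment cost)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # match or substitution
        prev = cur
    return prev[-1]


def assign_taxonomy(tag, references):
    """Return (name, distance) of the reference closest to the tag."""
    best_name, best_dist = None, float("inf")
    for name, seq in references.items():
        dist = edit_distance(tag, seq) / max(len(tag), len(seq))
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name, best_dist


# Hypothetical toy reference database; real hypervariable tags are longer.
TOY_REFERENCES = {
    "Bacteroides (toy)": "ACGTTGCAAGGCTA",
    "Escherichia (toy)": "ACGATGCTTGGCAA",
}
```

For instance, `assign_taxonomy("ACGTTGCAAGGCTT", TOY_REFERENCES)` picks the first toy reference, which differs from the tag at a single position. A distance cutoff could be added to leave very divergent tags unassigned.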
TL;DR: The expected dN/dS ratio for samples drawn from a single population under selection is studied, and it is found that in this context, dN/dS is relatively insensitive to the selection coefficient.
Abstract: Evolutionary pressures on proteins are often quantified by the ratio of substitution rates at non-synonymous and synonymous sites. The dN/dS ratio was originally developed for application to distantly diverged sequences, the differences among which represent substitutions that have fixed along independent lineages. Nevertheless, the dN/dS measure is often applied to sequences sampled from a single population, the differences among which represent segregating polymorphisms. Here, we study the expected dN/dS ratio for samples drawn from a single population under selection, and we find that in this context, dN/dS is relatively insensitive to the selection coefficient. Moreover, the hallmark signature of positive selection over divergent lineages, dN/dS>1, is violated within a population. For population samples, the relationship between selection and dN/dS does not follow a monotonic function, and so it may be impossible to infer selection pressures from dN/dS. These results have significant implications for the interpretation of dN/dS measurements among population-genetic samples.
TL;DR: It is suggested that alt-NHEJ is a mechanistically distinct pathway of DSB repair, and thus may play a unique role in mutagenesis during cancer development and therapy.
Abstract: Characterizing the functional overlap and mutagenic potential of different pathways of chromosomal double-strand break (DSB) repair is important to understand how mutations arise during cancer development and treatment. To this end, we have compared the role of individual factors in three different pathways of mammalian DSB repair: alternative-nonhomologous end joining (alt-NHEJ), single-strand annealing (SSA), and homology directed repair (HDR/GC). Considering early steps of repair, we found that the DSB end-processing factors KU and CtIP affect all three pathways similarly, in that repair is suppressed by KU and promoted by CtIP. In contrast, both KU and CtIP appear dispensable for the absolute level of total-NHEJ between two tandem I-SceI–induced DSBs. During later steps of repair, we find that while the annealing and processing factors RAD52 and ERCC1 are important to promote SSA, both HDR/GC and alt-NHEJ are significantly less dependent upon these factors. As well, while disruption of RAD51 causes a decrease in HDR/GC and an increase in SSA, inhibition of this factor did not affect alt-NHEJ. These results suggest that the regulation of DSB end-processing via KU/CtIP is a common step during alt-NHEJ, SSA, and HDR/GC. However, at later steps of repair, alt-NHEJ is a mechanistically distinct pathway of DSB repair, and thus may play a unique role in mutagenesis during cancer development and therapy.
TL;DR: The results suggest that common genetic polymorphisms induce major differentiations in the metabolic make-up of the human population, which may lead to a novel approach to personalized health care based on a combination of genotyping and metabolic characterization.
Abstract: The rapidly evolving field of metabolomics aims at a comprehensive measurement of ideally all endogenous metabolites in a cell or body fluid. It thereby provides a functional readout of the physiological state of the human body. Genetic variants that associate with changes in the homeostasis of key lipids, carbohydrates, or amino acids are not only expected to display much larger effect sizes due to their direct involvement in metabolite conversion modification, but should also provide access to the biochemical context of such variations, in particular when enzyme coding genes are concerned. To test this hypothesis, we conducted what is, to the best of our knowledge, the first GWA study with metabolomics based on the quantitative measurement of 363 metabolites in serum of 284 male participants of the KORA study. We found associations of frequent single nucleotide polymorphisms (SNPs) with considerable differences in the metabolic homeostasis of the human body, explaining up to 12% of the observed variance. Using ratios of certain metabolite concentrations as a proxy for enzymatic activity, up to 28% of the variance can be explained (p-values 10(-16) to 10(-21)). We identified four genetic variants in genes coding for enzymes (FADS1, LIPC, SCAD, MCAD) where the corresponding metabolic phenotype (metabotype) clearly matches the biochemical pathways in which these enzymes are active. Our results suggest that common genetic polymorphisms induce major differentiations in the metabolic make-up of the human population. This may lead to a novel approach to personalized health care based on a combination of genotyping and metabolic characterization. These genetically determined metabotypes may subscribe the risk for a certain medical phenotype, the response to a given drug treatment, or the reaction to a nutritional intervention or environmental challenge.
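As a rough illustration of "variance explained" by a SNP for a metabolite ratio, the sketch below regresses the log of a product/substrate concentration ratio on genotype coded as 0, 1 or 2 copies of an allele and reports R². The additive coding, the log transform and the function names are assumptions made for the example; the abstract does not specify the study's statistical model.

```python
import math


def variance_explained(genotypes, values):
    """R^2 of a simple linear regression of values on genotype dosage."""
    n = len(values)
    mean_g = sum(genotypes) / n
    mean_v = sum(values) / n
    sxy = sum((g - mean_g) * (v - mean_v) for g, v in zip(genotypes, values))
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    syy = sum((v - mean_v) ** 2 for v in values)
    return sxy ** 2 / (sxx * syy)


def ratio_variance_explained(genotypes, product, substrate):
    """R^2 for log(product / substrate), used as a proxy for enzyme activity."""
    log_ratio = [math.log(p / s) for p, s in zip(product, substrate)]
    return variance_explained(genotypes, log_ratio)
```

With made-up data in which the ratio doubles with every allele copy, for example genotypes `[0, 0, 1, 1, 2, 2]`, product `[1, 1, 2, 2, 4, 4]` and a constant substrate of `1`, the log ratio is exactly linear in genotype and R² comes out as 1. Real data would of course give much smaller values.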
TL;DR: The findings suggest that autophagy is not sufficient for lifespan extension because although it provides raw material for new macromolecular synthesis, DAF-16/FOXO must program the cells to recycle this raw material into cell-protective longevity proteins.
Abstract: In many organisms, dietary restriction appears to extend lifespan, at least in part, by down-regulating the nutrient-sensor TOR (Target Of Rapamycin). TOR inhibition elicits autophagy, the large-scale recycling of cytoplasmic macromolecules and organelles. In this study, we asked whether autophagy might contribute to the lifespan extension induced by dietary restriction in C. elegans. We find that dietary restriction and TOR inhibition produce an autophagic phenotype and that inhibiting genes required for autophagy prevents dietary restriction and TOR inhibition from extending lifespan. The longevity response to dietary restriction in C. elegans requires the PHA-4 transcription factor. We find that the autophagic response to dietary restriction also requires PHA-4 activity, indicating that autophagy is a transcriptionally regulated response to food limitation. In spite of the rejuvenating effect that autophagy is predicted to have on cells, our findings suggest that autophagy is not sufficient to extend lifespan. Long-lived daf-2 insulin/IGF-1 receptor mutants require both autophagy and the transcription factor DAF-16/FOXO for their longevity, but we find that autophagy takes place in the absence of DAF-16. Perhaps autophagy is not sufficient for lifespan extension because although it provides raw material for new macromolecular synthesis, DAF-16/FOXO must program the cells to recycle this raw material into cell-protective longevity proteins.
TL;DR: The analysis predicts that many of the alleles identified via whole-genome association mapping may be selectively neutral or (formerly) positively selected, implying that deleterious genetic variation affecting disease phenotype may be missed by this widely used approach for mapping genes underlying complex traits.
Abstract: Quantifying the distribution of fitness effects among newly arising mutations in the human genome is key to resolving important debates in medical and evolutionary genetics. Here, we present a method for inferring this distribution using Single Nucleotide Polymorphism (SNP) data from a population with non-stationary demographic history (such as that of modern humans). Application of our method to 47,576 coding SNPs found by direct resequencing of 11,404 protein-coding genes in 35 individuals (20 European Americans and 15 African Americans) allows us to assess the relative contribution of demographic and selective effects to patterning amino acid variation in the human genome. We find evidence of an ancient population expansion in the sample with African ancestry and a relatively recent bottleneck in the sample with European ancestry. After accounting for these demographic effects, we find strong evidence for great variability in the selective effects of new amino acid replacing mutations. In both populations, the patterns of variation are consistent with a leptokurtic distribution of selection coefficients (e.g., gamma or log-normal) peaked near neutrality. Specifically, we predict 27–29% of amino acid changing (nonsynonymous) mutations are neutral or nearly neutral (|s| 1%). Our results are consistent with 10–20% of amino acid differences between humans and chimpanzees having been fixed by positive selection with the remainder of differences being neutral or nearly neutral. Our analysis also predicts that many of the alleles identified via whole-genome association mapping may be selectively neutral or (formerly) positively selected, implying that deleterious genetic variation affecting disease phenotype may be missed by this widely used approach for mapping genes underlying complex traits.
TL;DR: A genome-wide association study was performed to identify genetic factors involved in susceptibility to psoriasis and psoriatic arthritis, inflammatory diseases of the skin and joints in humans, and identified a novel PSA (and potentially PS) locus on chromosome 4q27.
Abstract: A genome-wide association study was performed to identify genetic factors involved in susceptibility to psoriasis (PS) and psoriatic arthritis (PSA), inflammatory diseases of the skin and joints in humans. 223 PS cases (including 91 with PSA) were genotyped with 311,398 single nucleotide polymorphisms (SNPs), and results were compared with those from 519 Northern European controls. Replications were performed with an independent cohort of 577 PS cases and 737 controls from the U.S., and 576 PSA patients and 480 controls from the U.K. Strongest associations were with the class I region of the major histocompatibility complex (MHC). The most highly associated SNP was rs10484554, which lies 34.7 kb upstream from HLA-C (P = 7.8x10(-11), GWA scan; P = 1.8x10(-30), replication; P = 1.8x10(-39), combined; U.K. PSA: P = 6.9x10(-11)). However, rs2395029 encoding the G2V polymorphism within the class I gene HCP5 (combined P = 2.13x10(-26) in U.S. cases) yielded the highest ORs with both PS and PSA (4.1 and 3.2 respectively). This variant is associated with low viral set point following HIV infection and its effect is independent of rs10484554. We replicated the previously reported association with interleukin 23 receptor and interleukin 12B (IL12B) polymorphisms in PS and PSA cohorts (IL23R: rs11209026, U.S. PS, P = 1.4x10(-4); U.K. PSA: P = 8.0x10(-4); IL12B:rs6887695, U.S. PS, P = 5x10(-5) and U.K. PSA, P = 1.3x10(-3)) and detected an independent association in the IL23R region with a SNP 4 kb upstream from IL12RB2 (P = 0.001). Novel associations replicated in the U.S. PS cohort included the region harboring lipoma HMGIC fusion partner (LHFP) and conserved oligomeric golgi complex component 6 (COG6) genes on chromosome 13q13 (combined P = 2x10(-6) for rs7993214; OR = 0.71), the late cornified envelope gene cluster (LCE) from the Epidermal Differentiation Complex (PSORS4) (combined P = 6.2x10(-5) for rs6701216; OR = 1.45) and a region of LD at 15q21 (combined P = 2.9x10(-5) for rs3803369; OR = 1.43). This region is of interest because it harbors ubiquitin-specific protease-8 whose processed pseudogene lies upstream from HLA-C. This region of 15q21 also harbors the gene for SPPL2A (signal peptide peptidase like 2a) which activates tumor necrosis factor alpha by cleavage, triggering the expression of IL12 in human dendritic cells. We also identified a novel PSA (and potentially PS) locus on chromosome 4q27. This region harbors the interleukin 2 (IL2) and interleukin 21 (IL21) genes and was recently shown to be associated with four autoimmune diseases (Celiac disease, Type 1 diabetes, Graves' disease and Rheumatoid Arthritis).
TL;DR: A number of interesting commonalities and differences across diseases that implicate both general and disease-specific pathogenetic mechanisms in autoimmunity are found.
Abstract: The major histocompatibility complex (MHC) is one of the most extensively studied regions in the human genome because of the association of variants at this locus with autoimmune, infectious, and inflammatory diseases. However, identification of causal variants within the MHC for the majority of these diseases has remained difficult due to the great variability and extensive linkage disequilibrium (LD) that exists among alleles throughout this locus, coupled with inadequate study design whereby only a limited subset of about 20 from a total of approximately 250 genes have been studied in small cohorts of predominantly European origin. We have performed a review and pooled analysis of the past 30 years of research on the role of the MHC in six genetically complex disease traits - multiple sclerosis (MS), type 1 diabetes (T1D), systemic lupus erythematosus (SLE), ulcerative colitis (UC), Crohn's disease (CD), and rheumatoid arthritis (RA) - in order to consolidate and evaluate the current literature regarding MHC genetics in these common autoimmune and inflammatory diseases. We corroborate established MHC disease associations and identify predisposing variants that previously have not been appreciated. Furthermore, we find a number of interesting commonalities and differences across diseases that implicate both general and disease-specific pathogenetic mechanisms in autoimmunity.
TL;DR: This paper introduces a novel algorithm that orders markers on a genetic linkage map using a simple yet fundamental mathematical property, proved under rather general assumptions, and shows that it consistently outperforms the best available methods in the literature.
Abstract: Genetic linkage maps are cornerstones of a wide spectrum of biotechnology applications, including map-assisted breeding, association genetics, and map-assisted gene cloning. During the past several years, the adoption of high-throughput genotyping technologies has been paralleled by a substantial increase in the density and diversity of genetic markers. New genetic mapping algorithms are needed in order to efficiently process these large datasets and accurately construct high-density genetic maps. In this paper, we introduce a novel algorithm to order markers on a genetic linkage map. Our method is based on a simple yet fundamental mathematical property that we prove under rather general assumptions. The validity of this property allows one to determine efficiently the correct order of markers by computing the minimum spanning tree of an associated graph. Our empirical studies obtained on genotyping data for three mapping populations of barley (Hordeum vulgare), as well as extensive simulations on synthetic data, show that our algorithm consistently outperforms the best available methods in the literature, particularly when the input data are noisy or incomplete. The software implementing our algorithm is available in the public domain as a web tool under the name MSTmap.
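The spanning-tree idea can be sketched in a few lines. The example assumes that the Hamming distance between two markers' genotype vectors (one character per individual) is a usable stand-in for recombination distance, builds a minimum spanning tree with Prim's algorithm, and reads the marker order off the tree when it forms a simple path. This is an illustration of the general principle only, not the MSTmap implementation; the function names and marker data are hypothetical.

```python
def hamming(a, b):
    """Number of individuals at which two markers' genotypes differ."""
    return sum(x != y for x, y in zip(a, b))


def order_markers(genotypes):
    """Order markers by walking a minimum spanning tree that forms a path.

    genotypes maps marker name -> string of per-individual genotype calls.
    Raises ValueError if the tree branches (order not determined this way).
    """
    names = sorted(genotypes)
    in_tree = {names[0]}
    adjacency = {name: [] for name in names}
    # Prim's algorithm: repeatedly attach the closest marker outside the tree.
    while len(in_tree) < len(names):
        _, u, v = min(
            (hamming(genotypes[u], genotypes[v]), u, v)
            for u in in_tree for v in names if v not in in_tree
        )
        adjacency[u].append(v)
        adjacency[v].append(u)
        in_tree.add(v)
    endpoints = sorted(n for n in names if len(adjacency[n]) == 1)
    if len(endpoints) != 2 or any(len(a) > 2 for a in adjacency.values()):
        raise ValueError("minimum spanning tree is not a simple path")
    # Walk from one end of the path to the other.
    order, previous, current = [], None, endpoints[0]
    while current is not None:
        order.append(current)
        following = [n for n in adjacency[current] if n != previous]
        previous, current = current, (following[0] if following else None)
    return order


# Hypothetical toy data: four markers scored in eight individuals.
TOY_MARKERS = {
    "m3": "BBBAAAAA",
    "m1": "AAAAAAAA",
    "m4": "BBBBAAAA",
    "m2": "BAAAAAAA",
}
```

On the toy data, `order_markers(TOY_MARKERS)` recovers `m1, m2, m3, m4` even though the input is shuffled. With noisy or missing genotype calls the tree may branch, which is why the sketch raises an error instead of guessing.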
TL;DR: A hypothesis-driven network discovery pipeline that identifies biologically relevant patterns in genome-scale data is described; applied to Arabidopsis time courses, it identifies at least three distinct transcription modules controlling phase-specific expression, including a new midnight-specific module, PBX/TBX/SBX.
Abstract: Correct daily phasing of transcription confers an adaptive advantage to almost all organisms, including higher plants. In this study, we describe a hypothesis-driven network discovery pipeline that identifies biologically relevant patterns in genome-scale data. To demonstrate its utility, we analyzed a comprehensive matrix of time courses interrogating the nuclear transcriptome of Arabidopsis thaliana plants grown under different thermocycles, photocycles, and circadian conditions. We show that 89% of Arabidopsis transcripts cycle in at least one condition and that most genes have peak expression at a particular time of day, which shifts depending on the environment. Thermocycles alone can drive at least half of all transcripts critical for synchronizing internal processes such as cell cycle and protein synthesis. We identified at least three distinct transcription modules controlling phase-specific expression, including a new midnight specific module, PBX/TBX/SBX. We validated the network discovery pipeline, as well as the midnight specific module, by demonstrating that the PBX element was sufficient to drive diurnal and circadian condition-dependent expression. Moreover, we show that the three transcription modules are conserved across Arabidopsis, poplar, and rice. These results confirm the complex interplay between thermocycles, photocycles, and the circadian clock on the daily transcription program, and provide a comprehensive view of the conserved genomic targets for a transcriptional network key to successful adaptation.
TL;DR: The genome sequences of a new clinical isolate of the important human pathogen, Aspergillus fumigatus, A1163, and two closely related but rarely pathogenic species, Neosartorya fischeri NRRL181 and Aspergillus clavatus NRRL1 are presented.
Abstract: We present the genome sequences of a new clinical isolate of the important human pathogen, Aspergillus fumigatus, A1163, and two closely related but rarely pathogenic species, Neosartorya fischeri NRRL181 and Aspergillus clavatus NRRL1. Comparative genomic analysis of A1163 with the recently sequenced A. fumigatus isolate Af293 has identified core, variable and up to 2% unique genes in each genome. While the core genes are 99.8% identical at the nucleotide level, identity for variable genes can be as low as 40%. The most divergent loci appear to contain heterokaryon incompatibility (het) genes associated with fungal programmed cell death such as developmental regulator rosA. Cross-species comparison has revealed that 8.5%, 13.5% and 12.6%, respectively, of A. fumigatus, N. fischeri and A. clavatus genes are species-specific. These genes are significantly smaller in size than core genes, contain fewer exons and exhibit a subtelomeric bias. Most of them cluster together in 13 chromosomal islands, which are enriched for pseudogenes, transposons and other repetitive elements. At least 20% of A. fumigatus-specific genes appear to be functional and involved in carbohydrate and chitin catabolism, transport, detoxification, secondary metabolism and other functions that may facilitate the adaptation to heterogeneous environments such as soil or a mammalian host. Contrary to what was suggested previously, their origin cannot be attributed to horizontal gene transfer (HGT), but instead is likely to involve duplication, diversification and differential gene loss (DDL). The role of duplication in the origin of lineage-specific genes is further underlined by the discovery of genomic islands that seem to function as designated “gene dumps” and, perhaps, simultaneously, as “gene factories”.
TL;DR: It is shown that common genetic variation influences levels of clinically relevant proteins in human serum and plasma and the identification of protein quantitative trait loci (pQTLs) may be a powerful complementary method of improving the understanding of disease pathways.
Abstract: There is considerable evidence that human genetic variation influences gene expression. Genome-wide studies have revealed that mRNA levels are associated with genetic variation in or close to the gene coding for those mRNA transcripts – cis effects, and elsewhere in the genome – trans effects. The role of genetic variation in determining protein levels has not been systematically assessed. Using a genome-wide association approach we show that common genetic variation influences levels of clinically relevant proteins in human serum and plasma. We evaluated the role of 496,032 polymorphisms on levels of 42 proteins measured in 1200 fasting individuals from the population based InCHIANTI study. Proteins included insulin, several interleukins, adipokines, chemokines, and liver function markers that are implicated in many common diseases including metabolic, inflammatory, and infectious conditions. We identified eight cis effects, including variants in or near the IL6R (p = 1.8×10(-57)), CCL4L1 (p = 3.9×10(-21)), IL18 (p = 6.8×10(-13)), LPA (p = 4.4×10(-10)), GGT1 (p = 1.5×10(-7)), SHBG (p = 3.1×10(-7)), CRP (p = 6.4×10(-6)) and IL1RN (p = 7.3×10(-6)) genes, all associated with their respective protein products with effect sizes ranging from 0.19 to 0.69 standard deviations per allele. Mechanisms implicated include altered rates of cleavage of bound to unbound soluble receptor (IL6R), altered secretion rates of different sized proteins (LPA), variation in gene copy number (CCL4L1) and altered transcription (GGT1). We identified one novel trans effect that was an association between ABO blood group and tumour necrosis factor alpha (TNF-alpha) levels (p = 6.8×10(-40)), but this finding was not present when TNF-alpha was measured using a different assay, or in a second study, suggesting an assay-specific association. Our results show that protein levels share some of the features of the genetics of gene expression. These include the presence of strong genetic effects in cis locations. The identification of protein quantitative trait loci (pQTLs) may be a powerful complementary method of improving our understanding of disease pathways.
TL;DR: It is demonstrated that the serine/threonine kinase Rim15 is required for yeast chronological life span extension caused by deficiencies in Ras2, Tor1, and Sch9, and by calorie restriction, while the anti-aging effect caused by the inactivation of both pathways is much more potent than that caused by CR.
Abstract: Calorie restriction (CR), the only non-genetic intervention known to slow aging and extend life span in organisms ranging from yeast to mice, has been linked to the down-regulation of Tor, Akt, and Ras signaling. In this study, we demonstrate that the serine/threonine kinase Rim15 is required for yeast chronological life span extension caused by deficiencies in Ras2, Tor1, and Sch9, and by calorie restriction. Deletion of stress resistance transcription factors Gis1 and Msn2/4, which are positively regulated by Rim15, also caused a major although not complete reversion of the effect of calorie restriction on life span. The deletion of both RAS2 and the Akt and S6 kinase homolog SCH9 in combination with calorie restriction caused a remarkable 10-fold life span extension, which, surprisingly, was only partially reversed by the lack of Rim15. These results indicate that the Ras/cAMP/PKA/Rim15/Msn2/4 and the Tor/Sch9/Rim15/Gis1 pathways are major mediators of the calorie restriction-dependent stress resistance and life span extension, although additional mediators are involved. Notably, the anti-aging effect caused by the inactivation of both pathways is much more potent than that caused by CR.
TL;DR: The identification and functional analyses of two novel and one known mutation in TARDBP that are identified as a result of extensive mutation analyses in a cohort of 296 patients with variable neurodegenerative diseases associated with TDP-43 histopathology support TARDBP mutations as a cause of ALS.
Abstract: The TAR DNA-binding protein 43 (TDP-43) has been identified as the major disease protein in amyotrophic lateral sclerosis (ALS) and frontotemporal lobar degeneration with ubiquitin inclusions (FTLD-U), defining a novel class of neurodegenerative conditions: the TDP-43 proteinopathies. The first pathogenic mutations in the gene encoding TDP-43 (TARDBP) were recently reported in familial and sporadic ALS patients, supporting a direct role for TDP-43 in neurodegeneration. In this study, we report the identification and functional analyses of two novel and one known mutation in TARDBP that we identified as a result of extensive mutation analyses in a cohort of 296 patients with variable neurodegenerative diseases associated with TDP-43 histopathology. Three different heterozygous missense mutations in exon 6 of TARDBP (p.M337V, p.N345K, and p.I383V) were identified in the analysis of 92 familial ALS patients (3.3%), while no mutations were detected in 24 patients with sporadic ALS or 180 patients with other TDP-43-positive neurodegenerative diseases. The presence of p.M337V, p.N345K, and p.I383V was excluded in 825 controls and 652 additional sporadic ALS patients. All three mutations affect highly conserved amino acid residues in the C-terminal part of TDP-43 known to be involved in protein-protein interactions. Biochemical analysis of TDP-43 in ALS patient cell lines revealed a substantial increase in caspase cleaved fragments, including the approximately 25 kDa fragment, compared to control cell lines. Our findings support TARDBP mutations as a cause of ALS. Based on the specific C-terminal location of the mutations and the accumulation of a smaller C-terminal fragment, we speculate that TARDBP mutations may cause a toxic gain of function through novel protein interactions or intracellular accumulation of TDP-43 fragments leading to apoptosis.
TL;DR: The novel population genetics approach reveals that the vast majority (97%) of sporadic disease can be attributed to animals farmed for meat and poultry, whereas wild animal and environmental sources are responsible for just 3% of disease.
Abstract: Campylobacter jejuni is the leading cause of bacterial gastro-enteritis in the developed world. It is thought to infect 2–3 million people a year in the US alone, at a cost to the economy in excess of US $4 billion. C. jejuni is a widespread zoonotic pathogen that is carried by animals farmed for meat and poultry. A connection with contaminated food is recognized, but C. jejuni is also commonly found in wild animals and water sources. Phylogenetic studies have suggested that genotypes pathogenic to humans bear greatest resemblance to non-livestock isolates. Moreover, seasonal variation in campylobacteriosis bears the hallmarks of water-borne disease, and certain outbreaks have been attributed to contamination of drinking water. As a result, the relative importance of these reservoirs to human disease is controversial. We use multilocus sequence typing to genotype 1,231 cases of C. jejuni isolated from patients in Lancashire, England. By modeling the DNA sequence evolution and zoonotic transmission of C. jejuni between host species and the environment, we assign human cases probabilistically to source populations. Our novel population genetics approach reveals that the vast majority (97%) of sporadic disease can be attributed to animals farmed for meat and poultry. Chicken and cattle are the principal sources of C. jejuni pathogenic to humans, whereas wild animal and environmental sources are responsible for just 3% of disease. Our results imply that the primary transmission route is through the food chain, and suggest that incidence could be dramatically reduced by enhanced on-farm biosecurity or preventing food-borne transmission.
TL;DR: The processes and genes identified here present a framework for further study of the disease mechanism and provide candidate susceptibility genes and drug targets for Parkinson's disease and other α-synuclein related disorders.
Abstract: Inclusions in the brain containing alpha-synuclein are the pathological hallmark of Parkinson's disease, but how these inclusions are formed and how this links to disease is poorly understood. We have developed a C. elegans model that makes it possible to monitor, in living animals, the formation of alpha-synuclein inclusions. In worms of old age, inclusions contain aggregated alpha-synuclein, resembling a critical pathological feature. We used genome-wide RNA interference to identify processes involved in inclusion formation, and identified 80 genes that, when knocked down, resulted in a premature increase in the number of inclusions. Quality control and vesicle-trafficking genes expressed in the ER/Golgi complex and vesicular compartments were overrepresented, indicating a specific role for these processes in alpha-synuclein inclusion formation. Suppressors include aging-associated genes, such as sir-2.1/SIRT1 and lagr-1/LASS2. Altogether, our data suggest a link between alpha-synuclein inclusion formation and cellular aging, likely through an endomembrane-related mechanism. The processes and genes identified here present a framework for further study of the disease mechanism and provide candidate susceptibility genes and drug targets for Parkinson's disease and other alpha-synuclein related disorders.
TL;DR: The findings show that common genetic variants influence the pathological subtype of breast cancer and provide further support for the hypothesis that ER-positive and ER-negative disease are biologically distinct.
Abstract: A three-stage genome-wide association study recently identified single nucleotide polymorphisms (SNPs) in five loci (fibroblast growth factor receptor 2 (FGFR2), trinucleotide repeat containing 9 (TNRC9), mitogen-activated protein kinase kinase kinase 1 (MAP3K1), 8q24, and lymphocyte-specific protein 1 (LSP1)) associated with breast cancer risk. We investigated whether the associations between these SNPs and breast cancer risk varied by clinically important tumor characteristics in up to 23,039 invasive breast cancer cases and 26,273 controls from 20 studies. We also evaluated their influence on overall survival in 13,527 cases from 13 studies. All participants were of European or Asian origin. rs2981582 in FGFR2 was more strongly related to ER-positive (per-allele OR (95% CI) = 1.31 (1.27-1.36)) than ER-negative (1.08 (1.03-1.14)) disease (P for heterogeneity = 10^-13). This SNP was also more strongly related to PR-positive, low-grade, and node-positive tumors (P = 10^-5, 10^-8, and 0.013, respectively). The association for rs13281615 in 8q24 was stronger for ER-positive, PR-positive, and low-grade tumors (P = 0.001, 0.011, and 10^-4, respectively). The differences in the associations between SNPs in FGFR2 and 8q24 and risk by ER and grade remained significant after permutation adjustment for multiple comparisons and after adjustment for other tumor characteristics. Three SNPs (rs2981582, rs3803662, and rs889312) showed weak but significant associations with ER-negative disease, the strongest association being for rs3803662 in TNRC9 (1.14 (1.09-1.21)). rs13281615 in 8q24 was associated with an improvement in survival after diagnosis (per-allele HR = 0.90 (0.83-0.97)). The association was attenuated and non-significant after adjusting for known prognostic factors. Our findings show that common genetic variants influence the pathological subtype of breast cancer and provide further support for the hypothesis that ER-positive and ER-negative disease are biologically distinct. Understanding the etiologic heterogeneity of breast cancer may ultimately result in improvements in prevention, early detection, and treatment.
TL;DR: This work shows that simultaneous analysis of the entire set of SNPs from a genome-wide study to identify the subset that best predicts disease outcome is now feasible, thanks to developments in stochastic search methods, and derives an explicit approximation for type-I error that avoids the need to use permutation procedures.
Abstract: Testing one SNP at a time does not fully realise the potential of genome-wide association studies to identify multiple causal variants, which is a plausible scenario for many complex diseases. We show that simultaneous analysis of the entire set of SNPs from a genome-wide study to identify the subset that best predicts disease outcome is now feasible, thanks to developments in stochastic search methods. We used a Bayesian-inspired penalised maximum likelihood approach in which every SNP can be considered for additive, dominant, and recessive contributions to disease risk. Posterior mode estimates were obtained for regression coefficients that were each assigned a prior with a sharp mode at zero. A non-zero coefficient estimate was interpreted as corresponding to a significant SNP. We investigated two prior distributions and show that the normal-exponential-gamma prior leads to improved SNP selection in comparison with single-SNP tests. We also derived an explicit approximation for type-I error that avoids the need to use permutation procedures. As well as genome-wide analyses, our method is well-suited to fine mapping with very dense SNP sets obtained from re-sequencing and/or imputation. It can accommodate quantitative as well as case-control phenotypes, covariate adjustment, and can be extended to search for interactions. Here, we demonstrate the power and empirical type-I error of our approach using simulated case-control data sets of up to 500 K SNPs, a real genome-wide data set of 300 K SNPs, and a sequence-based dataset, each of which can be analysed in a few hours on a desktop workstation.
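To see why a prior with a sharp mode at zero yields exactly-zero estimates for most SNPs, consider a minimal sketch under strong simplifying assumptions that are not taken from the abstract: each SNP has an independent, standardised effect estimate z with unit-variance Gaussian error, and the prior is a Laplace (double-exponential) density with rate lam, used here as a simple stand-in rather than the normal-exponential-gamma prior. In that setting the posterior mode is the soft-thresholded estimate. All names and values are hypothetical.

```python
def soft_threshold(z, lam):
    """Posterior mode of a coefficient under a Laplace prior with rate lam,
    given a unit-variance Gaussian estimate z (simplifying assumption)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0  # estimates within [-lam, lam] are shrunk exactly to zero


def select_snps(estimates, lam):
    """Keep only SNPs whose posterior-mode coefficient is non-zero."""
    modes = {snp: soft_threshold(z, lam) for snp, z in estimates.items()}
    return {snp: beta for snp, beta in modes.items() if beta != 0.0}


# Hypothetical standardised effect estimates for three SNPs.
selected = select_snps({"snp1": 3.0, "snp2": 0.4, "snp3": -2.5}, lam=1.0)
```

Here snp2 is dropped because its estimate falls inside the threshold, mirroring the rule that only non-zero coefficient estimates are read as significant. A larger lam corresponds to a sharper prior and a more stringent selection.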
TL;DR: Results indicate that Dot1L and H3K79 methylation play important roles in heterochromatin formation and in embryonic development.
Abstract: Dot1 is an evolutionarily conserved histone methyltransferase specific for lysine 79 of histone H3 (H3K79). In Saccharomyces cerevisiae, Dot1-mediated H3K79 methylation is associated with telomere silencing, meiotic checkpoint control, and DNA damage response. The biological function of H3K79 methylation in mammals, however, remains poorly understood. Using gene targeting, we generated mice deficient for Dot1L, the murine Dot1 homologue. Dot1L-deficient embryos show multiple developmental abnormalities, including growth impairment, angiogenesis defects in the yolk sac, and cardiac dilation, and die between 9.5 and 10.5 days post coitum. To gain insights into the cellular function of Dot1L, we derived embryonic stem (ES) cells from Dot1L mutant blastocysts. Dot1L-deficient ES cells show global loss of H3K79 methylation as well as reduced levels of heterochromatic marks (H3K9 di-methylation and H4K20 tri-methylation) at centromeres and telomeres. These changes are accompanied by aneuploidy, telomere elongation, and proliferation defects. Taken together, these results indicate that Dot1L and H3K79 methylation play important roles in heterochromatin formation and in embryonic development.
TL;DR: The data suggest that normal repair of a DNA break can occasionally cause heritable silencing of a CpG island–containing promoter by recruitment of proteins involved in silencing.
Abstract: Chronic exposure to inducers of DNA base oxidation and single and double strand breaks contributes to tumorigenesis. In addition to the genetic changes caused by this DNA damage, such tumors often contain epigenetically silenced genes with aberrant promoter region CpG island DNA hypermethylation. We herein explore the relationships between such DNA damage and epigenetic gene silencing using an experimental model in which we induce a defined double strand break in an exogenous promoter construct of the E-cadherin CpG island, which is frequently aberrantly DNA hypermethylated in epithelial cancers. Following the onset of repair of the break, we observe recruitment to the site of damage of key proteins involved in establishing and maintaining transcriptional repression, namely SIRT1, EZH2, DNMT1, and DNMT3B, and the appearance of the silencing histone modifications, hypoacetyl H4K16, H3K9me2 and me3, and H3K27me3. Although in most cells selected after the break, DNA repair occurs faithfully with preservation of activity of the promoter, a small percentage of the plated cells demonstrate induction of heritable silencing. The chromatin around the break site in such a silent clone is enriched for most of the above silent chromatin proteins and histone marks, and the region harbors the appearance of increasing DNA methylation in the CpG island of the promoter. During the acute break, SIRT1 appears to be required for the transient recruitment of DNMT3B and subsequent methylation of the promoter in the silent clones. Taken together, our data suggest that normal repair of a DNA break can occasionally cause heritable silencing of a CpG island–containing promoter by recruitment of proteins involved in silencing. Furthermore, with contribution of the stress-related protein SIRT1, the break can lead to the onset of aberrant CpG island DNA methylation, which is frequently associated with tight gene silencing in cancer.
TL;DR: This is the first study to demonstrate the capacity of a P450 identified in wild A. gambiae to metabolise insecticides, and the findings add to the understanding of the genetic basis of insecticide resistance in wild mosquito populations.
Abstract: Insects exposed to pesticides undergo strong natural selection and have developed various adaptive mechanisms to survive. Resistance to pyrethroid insecticides in the malaria vector Anopheles gambiae is receiving increasing attention because it threatens the sustainability of malaria vector control programs in sub-Saharan Africa. An understanding of the molecular mechanisms conferring pyrethroid resistance gives insight into the processes of evolution of adaptive traits and facilitates the development of simple monitoring tools and novel strategies to restore the efficacy of insecticides. For this purpose, it is essential to understand which mechanisms are important in wild mosquitoes. Here, our aim was to identify enzymes that may be important in metabolic resistance to pyrethroids by measuring gene expression for over 250 genes potentially involved in metabolic resistance in phenotyped individuals from a highly resistant, wild A. gambiae population from Ghana. A cytochrome P450, CYP6P3, was significantly overexpressed in the survivors, and we show that the translated enzyme metabolises both alpha-cyano and non–alpha-cyano pyrethroids. This is the first study to demonstrate the capacity of a P450 identified in wild A. gambiae to metabolise insecticides. The findings add to the understanding of the genetic basis of insecticide resistance in wild mosquito populations.
TL;DR: The data presented here suggest that in human cells, bidirectional transcription is an endogenous gene regulatory mechanism whereby an antisense RNA directs epigenetic regulatory complexes to a sense promoter, resulting in RNA-directed epigenetic gene regulation.
Abstract: Small RNAs targeted to gene promoters in human cells have been shown to modulate both transcriptional gene suppression and activation. However, the mechanism involved in transcriptional activation has remained poorly defined, and an endogenous RNA trigger for transcriptional gene silencing has yet to be identified. Described here is an explanation for siRNA-directed transcriptional gene activation, as well as a role for non-coding antisense RNAs as effector molecules driving transcriptional gene silencing. Transcriptional activation of p21 gene expression was determined to be the result of Argonaute 2–dependent, post-transcriptional silencing of a p21-specific antisense transcript, which functions in Argonaute 1–mediated transcriptional control of p21 mRNA expression. The data presented here suggest that in human cells, bidirectional transcription is an endogenous gene regulatory mechanism whereby an antisense RNA directs epigenetic regulatory complexes to a sense promoter, resulting in RNA-directed epigenetic gene regulation. The observations presented here support the notion that epigenetic silencing of tumor suppressor genes, such as p21, may be the result of an imbalance in bidirectional transcription levels. This imbalance allows the unchecked antisense RNA to direct silent state epigenetic marks to the sense promoter, resulting in stable transcriptional gene silencing.
TL;DR: The analysis indicates the ancestors of Polynesians moved through Melanesia relatively rapidly and only intermixed to a very modest degree with the indigenous populations there, contributing to a resolution to the debates over Polynesian origins and their past interactions with Melanesians.
Abstract: Human genetic diversity in the Pacific has not been adequately sampled, particularly in Melanesia. As a result, population relationships there have been open to debate. A genome scan of autosomal markers (687 microsatellites and 203 insertions/deletions) on 952 individuals from 41 Pacific populations now provides the basis for understanding the remarkable nature of Melanesian variation, and for a more accurate comparison of these Pacific populations with previously studied groups from other regions. It also shows how textured human population variation can be in particular circumstances. Genetic diversity within individual Pacific populations is shown to be very low, while differentiation among Melanesian groups is high. Melanesian differentiation varies not only between islands, but also by island size and topographical complexity. The greatest distinctions are among the isolated groups in large island interiors, which are also the most internally homogeneous. The pattern loosely tracks language distinctions. Papuan-speaking groups are the most differentiated, and Austronesian or Oceanic-speaking groups, which tend to live along the coastlines, are more intermixed. A small “Austronesian” genetic signature (always <20%) was detected in less than half the Melanesian groups that speak Austronesian languages, and is entirely lacking in Papuan-speaking groups. Although the Polynesians are also distinctive, they tend to cluster with Micronesians, Taiwan Aborigines, and East Asians, and not Melanesians. These findings contribute to a resolution to the debates over Polynesian origins and their past interactions with Melanesians. With regard to genetics, the earlier studies had heavily relied on the evidence from single locus mitochondrial DNA or Y chromosome variation. Neither of these provided an unequivocal signal of phylogenetic relations or population intermixture proportions in the Pacific. Our analysis indicates the ancestors of Polynesians moved through Melanesia relatively rapidly and only intermixed to a very modest degree with the indigenous populations there.
TL;DR: This is the first glimpse of an individual's exome and a snapshot of the current state of personalized genomics, and presents an approach to analyze the coding variation in humans by proposing multiple bioinformatic methods to home in on possible functional variation.
Abstract: There is much interest in characterizing the variation in a human individual, because this may elucidate what contributes significantly to a person's phenotype, thereby enabling personalized genomics. We focus here on the variants in a person's ‘exome,’ which is the set of exons in a genome, because the exome is believed to harbor much of the functional variation. We provide an analysis of the ∼12,500 variants that affect the protein coding portion of an individual's genome. We identified ∼10,400 nonsynonymous single nucleotide polymorphisms (nsSNPs) in this individual, of which ∼15–20% are rare in the human population. We predict ∼1,500 nsSNPs affect protein function and these tend to be heterozygous, rare, or novel. Of the ∼700 coding indels, approximately half tend to have lengths that are a multiple of three, which causes insertions/deletions of amino acids in the corresponding protein, rather than introducing frameshifts. Coding indels also occur frequently at the termini of genes, so even if an indel causes a frameshift, an alternative start or stop site in the gene can still be used to make a functional protein. In summary, we reduced the set of ∼12,500 nonsilent coding variants by ∼8-fold to a set of variants that are most likely to have major effects on their proteins' functions. This is our first glimpse of an individual's exome and a snapshot of the current state of personalized genomics. The majority of coding variants in this individual are common and appear to be functionally neutral. Our results also indicate that some variants can be used to improve the current NCBI human reference genome. As more genomes are sequenced, many rare variants and non-SNP variants will be discovered. We present an approach to analyze the coding variation in humans by proposing multiple bioinformatic methods to home in on possible functional variation.
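The distinction between indels whose length is a multiple of three and those that shift the reading frame can be shown with a short sketch. It assumes a VCF-style representation with reference and alternate allele strings. The function name and the example alleles are hypothetical and are not drawn from the study's data.

```python
def classify_indel(ref, alt):
    """Classify a coding indel by its net length change in nucleotides.

    ref, alt: reference and alternate allele strings (assumed VCF-style).
    """
    delta = len(alt) - len(ref)
    if delta == 0:
        raise ValueError("not an indel: alleles have equal length")
    # A net change of whole codons adds or removes amino acids but keeps
    # the downstream reading frame; any other change shifts the frame.
    return "in-frame" if delta % 3 == 0 else "frameshift"


insertion = classify_indel("A", "ACGT")  # +3 nucleotides, one codon
deletion = classify_indel("ACG", "A")    # -2 nucleotides
```

This check covers only the length criterion. It does not model the second effect the abstract mentions, where an alternative start or stop site near a gene terminus rescues a frameshifted protein.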
TL;DR: The sequencing and analysis of the genome of the nitrogen-fixing endophyte, Klebsiella pneumoniae 342, will drive new research into this less-understood, but important category of bacterial-plant host relationships, which could ultimately enhance growth and nutrition of important agricultural crops and development of plant-derived products and biofuels.
Abstract: We report here the sequencing and analysis of the genome of the nitrogen-fixing endophyte, Klebsiella pneumoniae 342. Although K. pneumoniae 342 is a member of the enteric bacteria, it serves as a model for studies of endophytic, plant-bacterial associations due to its efficient colonization of plant tissues (including maize and wheat, two of the most important crops in the world), while maintaining a mutualistic relationship that encompasses supplying organic nitrogen to the host plant. Genomic analysis examined K. pneumoniae 342 for the presence of previously identified genes from other bacteria involved in colonization of, or growth in, plants. From this set, approximately one-third were identified in K. pneumoniae 342, suggesting additional factors most likely contribute to its endophytic lifestyle. Comparative genome analyses were used to provide new insights into this question. Results included the identification of metabolic pathways and other features devoted to processing plant-derived cellulosic and aromatic compounds, and a robust complement of transport genes (15.4%), one of the highest percentages in bacterial genomes sequenced. Although virulence and antibiotic resistance genes were predicted, experiments conducted using mouse models showed pathogenicity to be attenuated in this strain. Comparative genomic analyses with the presumed human pathogen K. pneumoniae MGH78578 revealed that MGH78578 apparently cannot fix nitrogen, and the distribution of genes essential to surface attachment, secretion, transport, and regulation and signaling varied between each genome, which may indicate critical divergences between the strains that influence their preferred host ranges and lifestyles (endophytic plant associations for K. pneumoniae 342 and presumably human pathogenesis for MGH78578). Little genome information is available concerning endophytic bacteria. The K. pneumoniae 342 genome will drive new research into this less-understood, but important category of bacterial-plant host relationships, which could ultimately enhance growth and nutrition of important agricultural crops and development of plant-derived products and biofuels.
TL;DR: The data provide the first evidence that ERβ-deficiency protects against diet-induced IR and glucose intolerance which involves an augmented PPARγ signaling in adipose tissue and suggest that the coactivators SRC1 and TIF2 are involved in this interaction.
Abstract: Estrogen receptors (ER) are important regulators of metabolic diseases such as obesity and insulin resistance (IR). While ERα seems to have a protective role in such diseases, the function of ERβ is not clear. To characterize the metabolic function of ERβ, we investigated its molecular interaction with a master regulator of insulin signaling/glucose metabolism, the PPARγ, in vitro and in high-fat diet (HFD)-fed ERβ -/- (βERKO) mice. Our in vitro experiments showed that ERβ inhibits ligand-mediated PPARγ-transcriptional activity. That resulted in a blockade of PPARγ-induced adipocytic gene expression and in decreased adipogenesis. Overexpression of nuclear coactivators such as SRC1 and TIF2 prevented the ERβ-mediated inhibition of PPARγ activity. Consistent with the in vitro data, we observed increased PPARγ activity in gonadal fat from HFD-fed βERKO mice. In consonance with enhanced PPARγ activation, HFD-fed βERKO mice showed increased body weight gain and fat mass in the presence of improved insulin sensitivity. To directly demonstrate the role of PPARγ in HFD-fed βERKO mice, PPARγ signaling was disrupted by PPARγ antisense oligonucleotide (ASO). Blockade of adipose PPARγ by ASO reversed the phenotype of βERKO mice with an impairment of insulin sensitization and glucose tolerance. Finally, binding of SRC1 and TIF2 to the PPARγ-regulated adiponectin promoter was enhanced in gonadal fat from βERKO mice, indicating that the absence of ERβ in adipose tissue results in exaggerated coactivator binding to a PPARγ target promoter. Collectively, our data provide the first evidence that ERβ-deficiency protects against diet-induced IR and glucose intolerance, which involves an augmented PPARγ signaling in adipose tissue. Moreover, our data suggest that the coactivators SRC1 and TIF2 are involved in this interaction. Impairment of insulin and glucose metabolism by ERβ may have significant implications for our understanding of hormone receptor-dependent pathophysiology of metabolic diseases, and may be essential for the development of new ERβ-selective agonists.
TL;DR: The phenotypic and molecular characterization of a set of mutants showing loss of adult structures of the dermal skeleton of the zebrafish, such as the rays of the fins and the scales, as well as the pharyngeal teeth, is described.
Abstract: The genetic basis of the development and variation of adult form of vertebrates is not well understood. To address this problem, we performed a mutant screen to identify genes essential for the formation of adult skeletal structures of the zebrafish. Here, we describe the phenotypic and molecular characterization of a set of mutants showing loss of adult structures of the dermal skeleton, such as the rays of the fins and the scales, as well as the pharyngeal teeth. The mutations represent adult-viable, loss of function alleles in the ectodysplasin (eda) and ectodysplasin receptor (edar) genes. These genes are frequently mutated in the human hereditary disease hypohidrotic ectodermal dysplasia (HED; OMIM 224900, 305100) that affects the development of integumentary appendages such as hair and teeth. We find mutations in zebrafish edar that affect similar residues as mutated in human cases of HED and show similar phenotypic consequences. eda and edar are not required for early zebrafish development, but are rather specific for the development of adult skeletal and dental structures. We find that the defects of the fins and scales are due to the role of Eda signaling in organizing epidermal cells into discrete signaling centers of the scale epidermal placode and fin fold. Our genetic analysis demonstrates dose-sensitive and organ-specific response to alteration in levels of Eda signaling. In addition, we show substantial buffering of the effect of loss of edar function in different genetic backgrounds, suggesting canalization of this developmental system. We uncover a previously unknown role of Eda signaling in teleosts and show conservation of the developmental mechanisms involved in the formation and variation of both integumentary appendages and limbs. Lastly, our findings point to the utility of adult genetic screens in the zebrafish in identifying essential developmental processes involved in human disease and in morphological evolution.
TL;DR: It is found that strains collected from oak exudates are phenotypically more similar than expected based on their genetic diversity, while sake and vineyard isolates display more diverse phenotypes than expected under a neutral model.
Abstract: Interactions between an organism and its environment can significantly influence phenotypic evolution. A first step toward understanding this process is to characterize phenotypic diversity within and between populations. We explored the phenotypic variation in stress sensitivity and genomic expression in a large panel of Saccharomyces strains collected from diverse environments. We measured the sensitivity of 52 strains to 14 environmental conditions, compared genomic expression in 18 strains, and identified gene copy-number variations in six of these isolates. Our results demonstrate a large degree of phenotypic variation in stress sensitivity and gene expression. Analysis of these datasets reveals relationships between strains from similar niches, suggests common and unique features of yeast habitats, and implicates genes whose variable expression is linked to stress resistance. Using a simple metric to suggest cases of selection, we found that strains collected from oak exudates are phenotypically more similar than expected based on their genetic diversity, while sake and vineyard isolates display more diverse phenotypes than expected under a neutral model. We also show that the laboratory strain S288c is phenotypically distinct from all of the other strains studied here, in terms of stress sensitivity, gene expression, Ty copy number, mitochondrial content, and gene-dosage control. These results highlight the value of understanding the genetic basis of phenotypic variation and raise caution about using laboratory strains for comparative genomics.