TL;DR: Analysis of 612 pairs of sibling paralogs from seven representative gene families and 300 pairs of one-to-one orthologs from different species found that structural divergences have been very prevalent in duplicate genes and, in many cases, have led to the generation of functionally distinct paralogs.
Abstract: Gene duplication plays key roles in organismal evolution. Duplicate genes, if they survive, tend to diverge in regulatory and coding regions. Divergences in coding regions, especially those that can change the function of the gene, can be caused by amino acid-altering substitutions and/or alterations in exon–intron structure. Much has been learned about the mode, tempo, and consequences of nucleotide substitutions, yet relatively little is known about structural divergences. In this study, by analyzing 612 pairs of sibling paralogs from seven representative gene families and 300 pairs of one-to-one orthologs from different species, we investigated the occurrence and relative importance of structural divergences during the evolution of duplicate and nonduplicate genes. We found that structural divergences have been very prevalent in duplicate genes and, in many cases, have led to the generation of functionally distinct paralogs. Comparisons of the genomic sequences of these genes further indicated that the differences in exon–intron structure were actually accomplished by three main types of mechanisms (exon/intron gain/loss, exonization/pseudoexonization, and insertion/deletion), each of which contributed differently to structural divergence. Like nucleotide substitutions, insertion/deletion and exonization/pseudoexonization occurred more or less randomly, with the number of observable mutational events per gene pair being largely proportional to evolutionary time. Notably, however, compared with paralogs with similar evolutionary times, orthologs have accumulated significantly fewer structural changes, whereas the amounts of amino acid replacements accumulated did not show clear differences. This finding suggests that structural divergences have played a more important role during the evolution of duplicate than nonduplicate genes.
TL;DR: Somatic copy number aberrations (SCNAs) in 1,087 unique medulloblastomas are reported, including recurrent events targeting TGF-β signalling in Group 3 and NF-κB signalling in Group 4, which suggest future avenues for rational, targeted therapy.
Abstract: Medulloblastoma, the most common malignant paediatric brain tumour, is currently treated with nonspecific cytotoxic therapies including surgery, whole-brain radiation, and aggressive chemotherapy. As medulloblastoma exhibits marked intertumoural heterogeneity, with at least four distinct molecular variants, previous attempts to identify targets for therapy have been underpowered because of small sample sizes. Here we report somatic copy number aberrations (SCNAs) in 1,087 unique medulloblastomas. SCNAs are common in medulloblastoma, and are predominantly subgroup-enriched. The most common region of focal copy number gain is a tandem duplication of SNCAIP, a gene associated with Parkinson's disease, which is exquisitely restricted to Group 4α. Recurrent translocations of PVT1, including PVT1-MYC and PVT1-NDRG1, that arise through chromothripsis are restricted to Group 3. Numerous targetable SCNAs, including recurrent events targeting TGF-β signalling in Group 3, and NF-κB signalling in Group 4, suggest future avenues for rational, targeted therapy.
TL;DR: The identification of copy-number variation in ecological field studies of species adapting to stressful or novel environmental conditions may improve the understanding of gene duplication as a mechanism of adaptation and its relevance to the long-term persistence of gene duplications.
Abstract: A subject of extensive study in evolutionary theory has been the issue of how neutral, redundant copies can be maintained in the genome for long periods of time. Concurrently, examples of adaptive gene duplications to various environmental conditions in different species have been described. At this point, it is too early to tell whether or not a substantial fraction of gene copies have initially achieved fixation by positive selection for increased dosage. Nevertheless, enough examples have accumulated in the literature that such a possibility should be considered. Here, I review the recent examples of adaptive gene duplications and make an attempt to draw generalizations on what types of genes may be particularly prone to be selected for under certain environmental conditions. The identification of copy-number variation in ecological field studies of species adapting to stressful or novel environmental conditions may improve our understanding of gene duplications as a mechanism of adaptation and its relevance to the long-term persistence of gene duplications.
TL;DR: In this article, the authors formalize an evolutionary model according to which functional genes evolve de novo through transitory proto-genes generated by widespread translational activity in non-genic sequences.
Abstract: Novel protein-coding genes can arise either through re-organization of pre-existing genes or de novo. Processes involving re-organization of pre-existing genes, notably after gene duplication, have been extensively described. In contrast, de novo gene birth remains poorly understood, mainly because translation of sequences devoid of genes, or 'non-genic' sequences, is expected to produce insignificant polypeptides rather than proteins with specific biological functions. Here we formalize an evolutionary model according to which functional genes evolve de novo through transitory proto-genes generated by widespread translational activity in non-genic sequences. Testing this model at the genome scale in Saccharomyces cerevisiae, we detect translation of hundreds of short species-specific open reading frames (ORFs) located in non-genic sequences. These translation events seem to provide adaptive potential, as suggested by their differential regulation upon stress and by signatures of retention by natural selection. In line with our model, we establish that S. cerevisiae ORFs can be placed within an evolutionary continuum ranging from non-genic sequences to genes. We identify ~1,900 candidate proto-genes among S. cerevisiae ORFs and find that de novo gene birth from such a reservoir may be more prevalent than sporadic gene duplication. Our work illustrates that evolution exploits seemingly dispensable sequences to generate adaptive functional innovation.
TL;DR: The evidence that genomic balance influences gene expression, quantitative traits, dosage compensation, aneuploid syndromes, population dynamics of copy number variants and differential evolutionary fate of genes after partial or whole-genome duplication is summarized.
Abstract: We summarize, in this review, the evidence that genomic balance influences gene expression, quantitative traits, dosage compensation, aneuploid syndromes, population dynamics of copy number variants and differential evolutionary fate of genes after partial or whole-genome duplication. Gene balance effects are hypothesized to result from stoichiometric differences among members of macromolecular complexes, the interactome, and signaling pathways. The implications of gene balance are discussed.
TL;DR: It is shown that a male-specific, duplicated copy of the anti-Müllerian hormone (amh) is implicated in testicular development of the teleost fish Patagonian pejerrey (Odontesthes hatcheri), suggesting that amhy may be the master sex-determining gene in this species.
Abstract: Gonadal sex determination in vertebrates generally follows a sequence of genetically programmed events. In what is seemingly becoming a pattern, all confirmed or current candidate “master” sex-determining genes reported in this group, e.g., SRY in eutherian mammals, DMY/dmrt1bY in medaka, DM-W in the African clawed frog, and DMRT1 in chicken, encode transcription factors. In contrast, here we show that a male-specific, duplicated copy of the anti-Müllerian hormone (amh) is implicated in testicular development of the teleost fish Patagonian pejerrey (Odontesthes hatcheri). The gene, termed amhy because it is found in a single metacentric/submetacentric chromosome of XY individuals, is expressed much earlier than the autosomal amh (6 d after fertilization vs. 12 wk after fertilization) and is localized to presumptive Sertoli cells of XY males during testicular differentiation. Moreover, amhy knockdown in XY embryos resulted in the up-regulation of foxl2 and cyp19a1a mRNAs and the development of ovaries. These results are evidence of a functional amh duplication in vertebrates and suggest that amhy may be the master sex-determining gene in this species. If confirmed, this would be a unique instance of a hormone-related gene, a member of the TGF-β superfamily, in such a role.
TL;DR: The first genome-wide analysis of the whole MYB superfamily in a legume species, soybean, including the gene structures, phylogeny, chromosome locations, conserved motifs, and expression patterns, as well as a comparative genomic analysis with Arabidopsis reveals that MYB genes play conserved and various roles in plants, which is indicative of a divergence in function.
Abstract: The MYB superfamily constitutes one of the most abundant groups of transcription factors described in plants. Nevertheless, their functions appear to be highly diverse and remain rather unclear. To date, no genome-wide characterization of this gene family has been conducted in a legume species. Here we report the first genome-wide analysis of the whole MYB superfamily in a legume species, soybean (Glycine max), including the gene structures, phylogeny, chromosome locations, conserved motifs, and expression patterns, as well as a comparative genomic analysis with Arabidopsis. A total of 244 R2R3-MYB genes were identified and further classified into 48 subfamilies based on a phylogenetic comparative analysis with their putative orthologs, which showed both gene loss and duplication events. The phylogenetic analysis showed that most characterized MYB genes with similar functions are clustered in the same subfamily; together with the identification of orthologs by synteny analysis, this strongly indicated functional conservation among subgroups of MYB genes. The phylogenetic relationships of each subgroup of MYB genes were well supported by the highly conserved intron/exon structures and motifs outside the MYB domain. Analysis of the ratio of nonsynonymous to synonymous nucleotide substitutions (dN/dS) showed that the soybean MYB DNA-binding domain is under strong negative selection. The chromosome distribution pattern strongly indicated that genome-wide segmental and tandem duplication contribute to the expansion of soybean MYB genes. In addition, we found that ~4% of soybean R2R3-MYB genes had undergone alternative splicing events, producing a variety of transcripts from a single gene, which illustrated the extremely high complexity of transcriptome regulation. Comparative expression profile analysis of R2R3-MYB genes in soybean and Arabidopsis revealed that MYB genes play conserved and various roles in plants, which is indicative of a divergence in function.
In this study we identified the largest MYB gene family in plants known to date. Our findings indicate that members of this large gene family may be involved in different plant biological processes, some of which may be potentially involved in legume-specific nodulation. Our comparative genomics analysis provides a solid foundation for future functional dissection of this gene family.
TL;DR: The data suggest a mechanism where incomplete duplication created a novel gene function (antagonizing parental SRGAP2 function) immediately "at birth" 2-3 mya, a time corresponding to the transition from Australopithecus to Homo and the beginning of neocortex expansion.
TL;DR: It is shown that chromosomal duplications are first acquired as a crude solution to stress, yet only as transient solutions that are eliminated and replaced by more efficient solutions obtained at the individual gene level, which indicates that aneuploidy is a useful yet short-lived intermediate that facilitates further adaptation.
Abstract: Aneuploidy, an abnormal number of chromosomes, is a widespread phenomenon found in unicellular organisms such as yeast, as well as in plants and in mammals, especially in cancer. Aneuploidy is a genome-scale aberration that imposes a severe burden on the cell, yet under stressful conditions specific aneuploidies confer a selective advantage. This dual nature of aneuploidy raises the question of whether it can serve as a stable and sustainable evolutionary adaptation. To clarify this, we conducted a set of laboratory evolution experiments in yeast and followed the long-term dynamics of aneuploidy under diverse conditions. Here we show that chromosomal duplications are first acquired as a crude solution to stress, yet only as transient solutions that are eliminated and replaced by more efficient solutions obtained at the individual gene level. These transient dynamics of aneuploidy were repeatedly observed in our laboratory evolution experiments; chromosomal duplications gained under stress were eliminated not only when the stress was relieved, but even if it persisted. Furthermore, when stress was applied gradually rather than abruptly, alternative solutions appear to have emerged, but not aneuploidy. Our findings indicate that chromosomal duplication is a first evolutionary line of defense, one that retains survivability under strong and abrupt selective pressures, yet it merely serves as a “quick fix,” whereas more refined and sustainable solutions take over. Thus, from the perspective of genome evolution trajectory, aneuploidy is a useful yet short-lived intermediate that facilitates further adaptation.
TL;DR: The genetic variability of RSV-A circulating in Ontario during the 2010–2011 winter season is investigated by sequencing and phylogenetic analysis of the G glycoprotein gene, identifying a novel genotype, ON1, with a 72-nucleotide duplication in the G gene.
Abstract: Human respiratory syncytial virus (HRSV) is the main cause of acute lower respiratory infections in children under 2 years of age and causes repeated infections throughout life. We investigated the genetic variability of RSV-A circulating in Ontario during the 2010–2011 winter season by sequencing and phylogenetic analysis of the G glycoprotein gene.
Among the 201 consecutive RSV isolates studied, RSV-A (55.7%) was more commonly observed than RSV-B (42.3%). Of the RSV-A infections, 59.8% and 90.1% were among children ≤12 months and ≤5 years old, respectively. On phylogenetic analysis of the second hypervariable region of the 112 RSV-A strains, 110 (98.2%) clustered within or adjacent to the NA1 genotype; two isolates were GA5 genotype. Eleven (10%) NA1-related isolates clustered together phylogenetically as a novel RSV-A genotype, named ON1, containing a 72-nucleotide duplication in the C-terminal region of the attachment (G) glycoprotein. The predicted polypeptide is lengthened by 24 amino acids and includes a 23-amino-acid duplication. Using RNA secondary structural software, a possible mechanism of duplication occurrence was derived. The 23-amino-acid ON1 G gene duplication results in a repeat of 7 potential O-glycosylation sites including three O-linked sugar acceptors at residues 270, 275, and 283. Using Phylogenetic Analysis by Maximum Likelihood analysis, a total of 19 positively selected sites were observed among Ontario NA1 isolates; six were found to be codons which reverted to the previous state observed in the prototype RSV-A2 strain. The tendency of codon regression in the G-ectodomain may imply a decreased avidity of antibody to the current circulating strains. Further work is needed to document and further understand the emergence, virulence, pathogenicity and transmissibility of this novel RSV-A genotype with a 72-nucleotide G gene duplication.
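The link between the 72-nucleotide duplication and the 24 extra amino acids reported above is simple codon arithmetic, which can be sketched as follows. This is only an illustration: the function names and the toy sequence are hypothetical and are not taken from the study, and the sequence is not the real G gene.

```python
def tandem_duplicate(seq, start, length):
    """Return seq with the segment seq[start:start+length] repeated in tandem."""
    segment = seq[start:start + length]
    return seq[:start + length] + segment + seq[start + length:]


def added_codons(length_nt):
    """Whole codons added by an insertion, or None if it would shift the frame."""
    if length_nt % 3 != 0:
        return None  # not a multiple of three: a frameshift, not a clean extension
    return length_nt // 3


# Toy coding sequence of 30 codons (hypothetical, for illustration only).
toy_gene = "ATG" * 30
duplicated = tandem_duplicate(toy_gene, start=9, length=72)

# A 72-nt in-frame duplication adds 72 / 3 = 24 codons.
extra = added_codons(len(duplicated) - len(toy_gene))
```

Because 72 is a multiple of three, the duplication preserves the reading frame and lengthens the predicted polypeptide instead of truncating it.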
TL;DR: This book discusses the evolution of genome and karyotype restructuring in Nicotiana tabacum L, and the early stages of polyploidy: rapid and repeated evolution in Tragopogon.
Abstract: Evolutionary Significance of Whole-Genome Duplication.- Genetic Consequences of Polyploidy in Plants.- Meiosis in polyploid plants.- Origins of Novel Phenotypic Variation in Polyploids.- Identifying the Phylogenetic Context of Whole-Genome Duplications in Plants.- Ancient and Recent Polyploidy in Monocots.- Genomic Plasticity in Polyploid Wheat.- Maize (Zea mays) as a model for studying the impact of gene and regulatory sequence loss following whole genome duplication.- Polyploidy in legumes.- Jeans, genes, and genomes: cotton as a model for studying polyploidy.- Evolutionary implications of genome and karyotype restructuring in Nicotiana tabacum L.- Polyploid evolution in Spartina: Dealing with highly redundant hybrid genomes.- Allopolyploid speciation in action: the origins and evolution of Senecio cambrensis.- The early stages of polyploidy: rapid and repeated evolution in Tragopogon.- Yeast as a window into changes in genome complexity due to polyploidization.- Two Rounds of Whole Genome Duplication: Evidence and Impact on the Evolution of Vertebrate Innovations.- Polyploidy in fish and the teleost genome duplication.- Polyploidization and sex chromosome evolution in amphibians.
TL;DR: The differences in gene expression patterns and on-going gene death among the three subgenomes suggest that “two-step” genome triplication and differential subgenome methylation played important roles in the genome evolution of B. rapa.
Abstract: Polyploidization, both ancient and recent, is frequent among plants. A “two-step theory” was proposed to explain the meso-triplication of the Brassica “A” genome: Brassica rapa. By accurately partitioning this genome, we observed that genes in the less fractioned subgenome (LF) were dominantly expressed over the genes in more fractioned subgenomes (MFs: MF1 and MF2), while the genes in MF1 were slightly dominantly expressed over the genes in MF2. The results indicated that the dominantly expressed genes tended to be resistant against gene fractionation. By re-sequencing two B. rapa accessions, a vegetable turnip (VT117) and a Rapid Cycling line (L144), we found that genes in LF had fewer non-synonymous or frameshift mutations than genes in MFs; however, mutation rates were not significantly different between MF1 and MF2. The differences in gene expression patterns and on-going gene death among the three subgenomes suggest that “two-step” genome triplication and differential subgenome methylation played important roles in the genome evolution of B. rapa.
TL;DR: Following various mechanisms of gene duplication, genes are often retained or lost in a biased manner, which has suggested recent models for gene family evolution, such as functional buffering and the gene balance hypothesis in addition to now‐classical models, including neofunctionalization and subfunctionalization.
Abstract: With many plant genomes sequenced, it is now clear that one distinguishing feature of angiosperm (flowering plant) genomes is their high frequency of whole-genome duplication. Single-gene duplication is also widespread in angiosperm genomes. Following various mechanisms of gene duplication, genes are often retained or lost in a biased manner, which has suggested recent models for gene family evolution, such as functional buffering and the gene balance hypothesis in addition to now-classical models, including neofunctionalization and subfunctionalization. Evolutionary consequences of gene duplication, often studied through analyzing gene expression divergence, have enhanced understanding of the biological significance of different mechanisms of gene duplication.
TL;DR: It is shown that HMPS is caused by a duplication spanning the 3′ end of the SCG5 gene and a region upstream of the GREM1 locus; the resulting increase in GREM1 expression is predicted to cause reduced bone morphogenetic protein (BMP) pathway activity, a mechanism that also underlies tumorigenesis in juvenile polyposis of the large bowel.
Abstract: Hereditary mixed polyposis syndrome (HMPS) is characterized by apparent autosomal dominant inheritance of multiple types of colorectal polyp, with colorectal carcinoma occurring in a high proportion of affected individuals. Here, we use genetic mapping, copy-number analysis, exclusion of mutations by high-throughput sequencing, gene expression analysis and functional assays to show that HMPS is caused by a duplication spanning the 3' end of the SCG5 gene and a region upstream of the GREM1 locus. This unusual mutation is associated with increased allele-specific GREM1 expression. Whereas GREM1 is expressed in intestinal subepithelial myofibroblasts in controls, GREM1 is predominantly expressed in the epithelium of the large bowel in individuals with HMPS. The HMPS duplication contains predicted enhancer elements; some of these interact with the GREM1 promoter and can drive gene expression in vitro. Increased GREM1 expression is predicted to cause reduced bone morphogenetic protein (BMP) pathway activity, a mechanism that also underlies tumorigenesis in juvenile polyposis of the large bowel.
TL;DR: The discovery that nearly half of the gene families in the ancestor of flowering plants have been lost in at least one species examined indicates that the repertoires of miRNA genes have changed more dynamically than previously thought during plant evolution.
Abstract: MicroRNAs (miRNAs) are among the most important regulatory elements of gene expression in animals and plants. However, their origin and evolutionary dynamics have not been studied systematically. In this paper, we identified putative miRNA genes in 11 plant species using bioinformatic techniques and examined their evolutionary changes. Our homology search indicated that no miRNA gene is currently shared between green algae and land plants. The number of miRNA genes has increased substantially in the land plant lineage, but after the divergence of eudicots and monocots, the number has changed in a lineage-specific manner. We found that miRNA genes have originated mainly by duplication of preexisting miRNA genes or protein-coding genes. Transposable elements also seem to have contributed to the generation of species-specific miRNA genes. The relative importance of these mechanisms in plants is quite different from that in Drosophila species, where the formation of hairpin structures in the genomes seems to be a major source of miRNA genes. This difference in the origin of miRNA genes between plants and Drosophila may be explained by the difference in the binding to target mRNAs between plants and animals. We also found that young miRNA genes are less conserved than old genes in plants as well as in Drosophila species. Yet, nearly half of the gene families in the ancestor of flowering plants have been lost in at least one species examined. This indicates that the repertoires of miRNA genes have changed more dynamically than previously thought during plant evolution.
TL;DR: The ecological, structural and functional consequences of polyploidy in fungi are reviewed and compared with the knowledge acquired with conventional plant and animal models; the genus Saccharomyces emerges as a relevant model for polyploid studies.
Abstract: Polyploidy is a major evolutionary process in eukaryotes, particularly in plants and, to a lesser extent, in animals, wherein several past and recent whole-genome duplication events have been described. Surprisingly, the incidence of polyploidy in other eukaryote kingdoms, particularly within fungi, remained largely disregarded by the scientific community working on the evolutionary consequences of polyploidy. Recent studies have significantly increased our knowledge of the occurrence and evolutionary significance of fungal polyploidy. The ecological, structural and functional consequences of polyploidy in fungi are reviewed here and compared with the knowledge acquired with conventional plant and animal models. In particular, the genus Saccharomyces emerges as a relevant model for polyploid studies, in addition to plant and animal models.
TL;DR: This study identified 104 poplar WRKY genes and WRKY gene hot spots on chromosome 14; semi-quantitative RT-PCR showed variable stress responses among subgroup III genes, providing a starting point for cloning genes with specific functions.
Abstract: WRKY transcription factors participate in diverse physiological and developmental processes in plants. They have highly conserved WRKYGQK amino acid sequences in their N-termini, followed by the novel zinc-finger-like motifs, Cys2His2 or Cys2HisCys. To date, numerous WRKY genes have been identified and characterized in a number of herbaceous species. Survey and characterization of WRKY genes in a ligneous species would facilitate a better understanding of the evolutionary processes and functions of this gene family. In this study, 104 poplar WRKY genes (PtWRKY) were identified in the latest poplar genome sequence. According to their structural features, the predicted members were divided into the previously defined groups I–III, as described in rice. In addition, chromosomal localization of the genes demonstrated that there might be WRKY gene hot spots in 2.3 Mb regions on chromosome 14. Furthermore, approximately 83% (86 out of 104) of the WRKY genes participated in gene duplication events, including 69% (29 out of 42) of the gene pairs, which exhibited segmental duplication. Using semi-quantitative RT-PCR, the expression patterns of subgroup III genes were investigated under different stresses [cold, drought, salinity and salicylic acid (SA)]. The data revealed that these genes presented different expression levels in response to various stress conditions. Expression analysis showed that the PtWRKY76 gene was markedly induced by 0.1 mM SA or 25% PEG-6000 treatment. The results presented here provide a fundamental clue for cloning genes with specific functions in further studies and applications. Key message: This study identified 104 poplar WRKY genes and demonstrated WRKY gene hot spots on chromosome 14. Furthermore, semi-quantitative RT-PCR showed variable stress responses in subgroup III.
TL;DR: The combination of the genome-wide identification and the expression and diurnal analysis of the OsBBX gene family should facilitate additional functional studies of the OsBBX genes.
Abstract: Background
The B-box (BBX)-containing proteins are a class of zinc finger proteins that contain one or two B-box domains and play important roles in plant growth and development. The Arabidopsis BBX gene family has recently been re-identified and renamed. However, there has not been a genome-wide survey of the rice BBX (OsBBX) gene family until now.
Methodology/Principal Findings
In this study, we identified 30 rice BBX genes through a comprehensive bioinformatics analysis. Each gene was assigned a uniform nomenclature. We described the chromosome localizations, gene structures, protein domains, phylogenetic relationship, whole life-cycle expression profile and diurnal expression patterns of the OsBBX family members. Based on the phylogeny and domain constitution, the OsBBX gene family was classified into five subfamilies. The gene duplication analysis revealed that only chromosomal segmental duplication contributed to the expansion of the OsBBX gene family. The expression profile of the OsBBX genes was analyzed by Affymetrix GeneChip microarrays throughout the entire life-cycle of rice cultivar Zhenshan 97 (ZS97). In addition, microarray analysis was performed to obtain the expression patterns of these genes under light/dark conditions and after three phytohormone treatments. This analysis revealed that the expression patterns of the OsBBX genes could be classified into eight groups. Eight genes were regulated under the light/dark treatments, and eleven genes showed differential expression under at least one phytohormone treatment. Moreover, we verified the diurnal expression of the OsBBX genes using the data obtained from the Diurnal Project and qPCR analysis, and the results indicated that many of these genes had a diurnal expression pattern.
Conclusions/Significance
The combination of the genome-wide identification and the expression and diurnal analysis of the OsBBX gene family should facilitate additional functional studies of the OsBBX genes.
TL;DR: Coevolution in the primate lineage of killer immunoglobulin-like receptors and human leukocyte antigens can be linked to the deep invasion of the uterus by trophoblast that is a characteristic feature of human placentation.
Abstract: Placenta has a wide range of functions. Some are supported by novel genes that have evolved following gene duplication events while others require acquisition of gene expression by the trophoblast. Although not expressed in the placenta, high-affinity fetal hemoglobins play a key role in placental gas exchange. They evolved following duplications within the beta-globin gene family with convergent evolution occurring in ruminants and primates. In primates there was also an interesting rearrangement of a cassette of genes in relation to an upstream locus control region. Substrate transfer from mother to fetus is maintained by expression of classic sugar and amino acid transporters at the trophoblast microvillous and basal membranes. In contrast, placental peptide hormones have arisen largely by gene duplication, yielding for example chorionic gonadotropins from the luteinizing hormone gene and placental lactogens from the growth hormone and prolactin genes. There has been a remarkable degree of convergent evolution with placental lactogens emerging separately in the ruminant, rodent, and primate lineages and chorionic gonadotropins evolving separately in equids and higher primates. Finally, coevolution in the primate lineage of killer immunoglobulin-like receptors and human leukocyte antigens can be linked to the deep invasion of the uterus by trophoblast that is a characteristic feature of human placentation.
TL;DR: The integrative approach revealed at least two separate networks of genomic alterations linked to the molecular diversity seen in UC, and that these circuits may reflect distinct pathways of tumor development.
Abstract: Similar to other malignancies, urothelial carcinoma (UC) is characterized by specific recurrent chromosomal aberrations and gene mutations. However, the interconnection between specific genomic alterations, and how patterns of chromosomal alterations adhere to different molecular subgroups of UC, is less clear. We applied tiling resolution array CGH to 146 cases of UC and identified a number of regions harboring recurrent focal genomic amplifications and deletions. Several potential oncogenes were included in the amplified regions, including known oncogenes like E2F3, CCND1, and CCNE1, as well as new candidate genes, such as SETDB1 (1q21), and BCL2L1 (20q11). We next combined genome profiling with global gene expression, gene mutation, and protein expression data and identified two major genomic circuits operating in urothelial carcinoma. The first circuit was characterized by FGFR3 alterations, overexpression of CCND1, and 9q and CDKN2A deletions. The second circuit was defined by E2F3 amplifications and RB1 deletions, as well as gains of 5p, deletions at PTEN and 2q36, 16q, 20q, and elevated CDKN2A levels. TP53/MDM2 alterations were common for advanced tumors within the two circuits. Our data also suggest a possible RAS/RAF circuit. The tumors with worst prognosis showed a gene expression profile that indicated a keratinized phenotype. Taken together, our integrative approach revealed at least two separate networks of genomic alterations linked to the molecular diversity seen in UC, and that these circuits may reflect distinct pathways of tumor development.
TL;DR: A new probabilistic model is presented, DLCoal, that defines gene duplication and loss in a population setting, such that coalescence and ILS can be directly addressed, and the first general reconciliation method that accurately infers gene duplications and losses in the presence of ILS is developed.
Abstract: Gene phylogenies provide a rich source of information about the way evolution shapes genomes, populations, and phenotypes. In addition to substitutions, evolutionary events such as gene duplication and loss (as well as horizontal transfer) play a major role in gene evolution, and many phylogenetic models have been developed in order to reconstruct and study these events. However, these models typically make the simplifying assumption that population-related effects such as incomplete lineage sorting (ILS) are negligible. While this assumption may have been reasonable in some settings, it has become increasingly problematic as increased genome sequencing has led to denser phylogenies, where effects such as ILS are more prominent. To address this challenge, we present a new probabilistic model, DLCoal, that defines gene duplication and loss in a population setting, such that coalescence and ILS can be directly addressed. Interestingly, this model implies that in addition to the usual gene tree and species tree, there exists a third tree, the locus tree, which will likely have many applications. Using this model, we develop the first general reconciliation method that accurately infers gene duplications and losses in the presence of ILS, and we show its improved inference of orthologs, paralogs, duplications, and losses for a variety of clades, including flies, fungi, and primates. Also, our simulations show that gene duplications increase the frequency of ILS, further illustrating the importance of a joint model. Going forward, we believe that this unified model can offer insights to questions in both phylogenetics and population genetics.
TL;DR: It is found that zinc finger nucleases designed to target two different sites in a human chromosome could introduce two concurrent double-strand breaks, whose repair via non-homologous end-joining (NHEJ) gives rise to targeted duplications and inversions of the genomic segments of up to a mega base pair in length between the two sites.
Abstract: Despite the recent discoveries of and interest in numerous structural variations (SVs)—which include duplications and inversions—in the human and other higher eukaryotic genomes, little is known about the etiology and biology of these SVs, partly due to the lack of molecular tools with which to create individual SVs in cultured cells and model organisms. Here, we present a novel method of inducing duplications and inversions in a targeted manner without pre-manipulation of the genome. We found that zinc finger nucleases (ZFNs) designed to target two different sites in a human chromosome could introduce two concurrent double-strand breaks, whose repair via non-homologous end-joining (NHEJ) gives rise to targeted duplications and inversions of the genomic segments of up to a mega base pair (bp) in length between the two sites. Furthermore, we demonstrated that a ZFN pair could induce the inversion of a 140-kbp chromosomal segment that contains a portion of the blood coagulation factor VIII gene to mimic the inversion genotype that is associated with some cases of severe hemophilia A. This same ZFN pair could be used, in theory, to revert the inverted region to restore genomic integrity in these hemophilia A patients. We propose that ZFNs can be employed as molecular tools to study mechanisms of chromosomal rearrangements and to create SVs in a predetermined manner so as to study their biological roles. In addition, our method raises the possibility of correcting genetic defects caused by chromosomal rearrangements and holds new promise in gene and cell therapy.
TL;DR: It is hypothesised that defence of the genome against endogenous retroelements has been an additional evolutionary driver for PYHIN proteins.
Abstract: Proteins of the mammalian PYHIN (IFI200/HIN-200) family are involved in defence against infection through recognition of foreign DNA. The family member absent in melanoma 2 (AIM2) binds cytosolic DNA via its HIN domain and initiates inflammasome formation via its pyrin domain. AIM2 lies within a cluster of related genes, many of which are uncharacterised in mouse. To better understand the evolution, orthology and function of these genes, we have documented the range of PYHIN genes present in representative mammalian species, and undertaken phylogenetic and expression analyses. No PYHIN genes are evident in non-mammals or monotremes, with a single member found in each of three marsupial genomes. Placental mammals show variable family expansions, from one gene in cow to four in human and 14 in mouse. A single HIN domain appears to have evolved in the common ancestor of marsupials and placental mammals, and duplicated to give rise to three distinct forms (HIN-A, -B and -C) in the placental mammal ancestor. Phylogenetic analyses showed that AIM2 HIN-C and pyrin domains clearly diverge from the rest of the family, and it is the only PYHIN protein with orthology across many species. Interestingly, although AIM2 is important in defence against some bacteria and viruses in mice, AIM2 is a pseudogene in cow, sheep, llama, dolphin, dog and elephant. The other 13 mouse genes have arisen by duplication and rearrangement within the lineage, which has allowed some diversification in expression patterns. The role of AIM2 in forming the inflammasome is relatively well understood, but molecular interactions of other PYHIN proteins involved in defence against foreign DNA remain to be defined. The non-AIM2 PYHIN protein sequences are very distinct from AIM2, suggesting they vary in effector mechanism in response to foreign DNA, and may bind different DNA structures. 
The PYHIN family has highly varied gene composition between mammalian species due to lineage-specific duplication and loss, which probably indicates different adaptations for fighting infectious disease. Non-genomic DNA can indicate infection, or a mutagenic threat. We hypothesise that defence of the genome against endogenous retroelements has been an additional evolutionary driver for PYHIN proteins.
TL;DR: Several genes identified that are implicated in sister chromatid cohesion and segregation are homologous to genes identified in a yeast mutant screen as necessary for survival of polyploid cells, and also implicated in genome instability in human diseases including cancer.
Abstract: Genome duplication, which results in polyploidy, is disruptive to fundamental biological processes. Genome duplications occur spontaneously in a range of taxa and problems such as sterility, aneuploidy, and gene expression aberrations are common in newly formed polyploids. In mammals, genome duplication is associated with cancer and spontaneous abortion of embryos. Nevertheless, stable polyploid species occur in both plants and animals. Understanding how natural selection enabled these species to overcome early challenges can provide important insights into the mechanisms by which core cellular functions can adapt to perturbations of the genomic environment. Arabidopsis arenosa includes stable tetraploid populations and is related to well-characterized diploids A. lyrata and A. thaliana. It thus provides a rare opportunity to leverage genomic tools to investigate the genetic basis of polyploid stabilization. We sequenced the genomes of twelve A. arenosa individuals and found signatures suggestive of recent and ongoing selective sweeps throughout the genome. Many of these are at genes implicated in genome maintenance functions, including chromosome cohesion and segregation, DNA repair, homologous recombination, transcriptional regulation, and chromatin structure. Numerous encoded proteins are predicted to interact with one another. For a critical meiosis gene, ASYNAPSIS1, we identified a non-synonymous mutation that is highly differentiated by cytotype, but present as a rare variant in diploid A. arenosa, indicating selection may have acted on standing variation already present in the diploid. Several genes we identified that are implicated in sister chromatid cohesion and segregation are homologous to genes identified in a yeast mutant screen as necessary for survival of polyploid cells, and also implicated in genome instability in human diseases including cancer. 
This points to commonalities across kingdoms and supports the hypothesis that selection has acted on genes controlling genome integrity in A. arenosa as an adaptive response to genome doubling.
TL;DR: A strategy to analyze both the PKD1 and PKD2 genes using next-generation sequencing by pooling long-range PCR amplicons and multiplexing bar-coded libraries is developed and validated and is a model for future genetic characterization of large ADPKD populations.
Abstract: Mutations in two large multi-exon genes, PKD1 and PKD2, cause autosomal dominant polycystic kidney disease (ADPKD). The duplication of PKD1 exons 1-32 as six pseudogenes on chromosome 16, the high level of allelic heterogeneity, and the cost of Sanger sequencing complicate mutation analysis, which can aid diagnostics of ADPKD. We developed and validated a strategy to analyze both the PKD1 and PKD2 genes using next-generation sequencing by pooling long-range PCR amplicons and multiplexing bar-coded libraries. We used this approach to characterize a cohort of 230 patients with ADPKD. This process detected definitely and likely pathogenic variants in 115 (63%) of 183 patients with typical ADPKD. In addition, we identified atypical mutations, a gene conversion, and one missed mutation resulting from allele dropout, and we characterized the pattern of deep intronic variation for both genes. In summary, this strategy involving next-generation sequencing is a model for future genetic characterization of large ADPKD populations.
TL;DR: A population genetics approach is developed that identifies nine segregating structural forms of human chromosome 17q21.31, and it is shown that complex genome structures can be analyzed by imputation from SNPs.
Abstract: Structurally complex genomic regions are not yet well understood. One such locus, human chromosome 17q21.31, contains a megabase-long inversion polymorphism, many uncharacterized copy-number variations (CNVs) and markers that associate with female fertility, female meiotic recombination and neurological disease. Additionally, the inverted H2 form of 17q21.31 seems to be positively selected in Europeans. We developed a population genetics approach to analyze complex genome structures and identified nine segregating structural forms of 17q21.31. Both the H1 and H2 forms of the 17q21.31 inversion polymorphism contain independently derived, partial duplications of the KANSL1 gene; these duplications, which produce novel KANSL1 transcripts, have both recently risen to high allele frequencies (26% and 19%) in Europeans. An older H2 form lacking such a duplication is present at low frequency in European and central African hunter-gatherer populations. We further show that complex genome structures can be analyzed by imputation from SNPs.
TL;DR: This study has separated the two subgenomes present in modern grasses by combining identification of syntenic gene blocks with measures of gene pair divergence and different frequencies of ancient gene loss, suggesting that post-WGD gene loss may not be the cause of the grass radiation.
Abstract: The grasses, Poaceae, are one of the largest and most successful angiosperm families. Like many radiations of flowering plants, the divergence of the major grass lineages was preceded by a whole-genome duplication (WGD), although these events are not rare for flowering plants. By combining identification of syntenic gene blocks with measures of gene pair divergence and different frequencies of ancient gene loss, we have separated the two subgenomes present in modern grasses. Reciprocal loss of duplicated genes or genomic regions has been hypothesized to reproductively isolate populations and, thus, to drive speciation. However, in contrast to previous studies in yeast and teleost fishes, we found very little evidence of reciprocal loss of homeologous genes between the grasses, suggesting that post-WGD gene loss may not be the cause of the grass radiation. The sets of homeologous and orthologous genes and predicted locations of deleted genes identified in this study, as well as links to the CoGe comparative genomics web platform for analyzing pan-grass syntenic regions, are provided along with this paper as a resource for the grass genetics community.
TL;DR: How the use of 'omics' data in network analysis can provide novel insights on network redundancy and rewiring is discussed, and some directions for future research are outlined.
TL;DR: Examples taken from the MADS-box gene family are used to review what is known about the factors that influence the loss and retention of genes duplicated in different ways and examine the varied fates of the retained genes and their associated biological outcomes.
TL;DR: The results provide a time framework for identifying ancestral and derived genomic arrangements in the APOBEC loci, and to date the expansion of this gene family for different lineages through time, as a response to changes in viral/retroviral/retrotransposon pressure.
Abstract: The APOBEC3 (A3) genes play a key role in innate antiviral defense in mammals by introducing directed mutations in the DNA. The human genome encodes seven A3 genes, with multiple splice alternatives. Different A3 proteins display different substrate specificity, but the very basic question of how they discern self from non-self remains unresolved. Further, the expression of A3 activity/ies shapes the way both viral and host genomes evolve. We present here a detailed temporal analysis of the origin and expansion of the A3 repertoire in mammals. Our data support an evolutionary scenario where the genome of the mammalian ancestor encoded at least one ancestral A3 gene, and where the genome of the ancestor of placental mammals (and possibly of the ancestor of all mammals) already encoded an A3Z1-A3Z2-A3Z3 arrangement. Duplication events of the A3 genes have occurred independently in different lineages: humans, cats and horses. In all of them, gene duplication has resulted in changes in enzyme activity and/or substrate specificity, in a paradigmatic example of convergent adaptive evolution at the genomic level. Finally, our results show that evolutionary rates for the three A3Z1, A3Z2 and A3Z3 motifs have significantly decreased in the last 100 Mya. The analysis constitutes a textbook example of the evolution of a gene locus by duplication and sub/neofunctionalization in the context of virus-host arms race. Our results provide a time framework for identifying ancestral and derived genomic arrangements in the APOBEC loci, and to date the expansion of this gene family for different lineages through time, as a response to changes in viral/retroviral/retrotransposon pressure.