TL;DR: Phylogenetic analysis is used to determine the evolutionary origin of 675 Tetraodon duplicated genes assigned to chromosomes; the results show that similar mechanisms are at work in fishes as in yeast or plants and provide a framework for future investigation of the consequences of duplication in fishes and other animals.
Abstract: Teleost fishes provide the first unambiguous support for ancient whole-genome duplication in an animal lineage. Studies in yeast or plants have shown that the effects of such duplications can be mediated by a complex pattern of gene retention and changes in evolutionary pressure. To explore such patterns in fishes, we have determined by phylogenetic analysis the evolutionary origin of 675 Tetraodon duplicated genes assigned to chromosomes, using additional data from other species of actinopterygian fishes. The subset of genes that was retained in duplicate after the genome duplication is enriched in the development, signaling, behavior, and regulation functional categories. The evolutionary rate of duplicate fish genes appears to be determined by three forces: 1) fish proteins evolve faster than mammalian orthologs; 2) the genes kept in duplicate after genome duplication represent the subset under strongest purifying selection; and 3) following duplication, there is an asymmetric acceleration of evolutionary rate in one of the paralogs. These results show that similar mechanisms are at work in fishes as in yeast or plants and provide a framework for future investigation of the consequences of duplication in fishes and other animals.
TL;DR: To better understand genome function and evolution in Mycobacterium tuberculosis, the genomes of 100 epidemiologically well characterized clinical isolates were interrogated by DNA microarrays and sequencing and 224 genes were found to be partially or completely deleted.
Abstract: To better understand genome function and evolution in Mycobacterium tuberculosis, the genomes of 100 epidemiologically well characterized clinical isolates were interrogated by DNA microarrays and sequencing. We identified 68 different large-sequence polymorphisms (comprising 186,137 bp, or 4.2% of the genome) that are present in H37Rv, but absent from one or more clinical isolates. A total of 224 genes (5.5%), including genes in all major functional categories, were found to be partially or completely deleted. Deletions are not distributed randomly throughout the genome but instead tend to be aggregated. The distinct deletions in some aggregations appear in closely related isolates, suggesting a genomically disruptive process specific to an individual mycobacterial lineage. Other genomic aggregations include distinct deletions that appear in phylogenetically unrelated isolates, suggesting that a genomic region is vulnerable throughout the species. Although the deletions identified here are evidently inessential to the causation of disease (they are found in active clinical cases), their frequency spectrum suggests that most are weakly deleterious to the pathogen. For some deletions, short-term evolutionary pressure due to the host immune system or antibiotics may favor the elimination of genes, whereas longer-term physiological requirements maintain the genes in the population.
TL;DR: This work presents a statistical approach for quantifying the contribution of residues and their interactions to protein function, using a statistical energy, the evolutionary Hamiltonian, and finds that these probability models predict the experimental effects of mutations with reasonable accuracy for a number of proteins.
Abstract: Modern biomedicine is challenged to predict the effects of genetic variation. Systematic functional assays of point mutants of proteins have provided valuable empirical information, but vast regions of sequence space remain unexplored. Fortunately, the mutation-selection process of natural evolution has recorded rich information in the diversity of natural protein sequences. Here, building on probabilistic models for correlated amino-acid substitutions that have been successfully applied to determine the three-dimensional structures of proteins, we present a statistical approach for quantifying the contribution of residues and their interactions to protein function, using a statistical energy, the evolutionary Hamiltonian. We find that these probability models predict the experimental effects of mutations with reasonable accuracy for a number of proteins, especially where the experimental selective pressure, such as antibiotics, is similar to the evolutionary pressure on the protein.
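To make the idea of a statistical energy concrete, here is a minimal sketch, assuming a pairwise (Potts-like) form with per-site terms `h` and pair terms `J`. The abstract does not state the exact functional form, the sign convention, or how parameters are fitted, so the function names, the random toy parameters, and the convention that a mutation is scored as mutant energy minus wild-type energy are all assumptions for illustration.

```python
import numpy as np

def statistical_energy(seq, h, J):
    """Sum of site terms h[i, a_i] plus pair terms J[i, j, a_i, a_j] for i < j.

    seq is a list of integer-encoded residues; h has shape (L, q) and
    J has shape (L, L, q, q). Both are hypothetical model parameters.
    """
    L = len(seq)
    energy = sum(h[i, seq[i]] for i in range(L))
    energy += sum(J[i, j, seq[i], seq[j]]
                  for i in range(L) for j in range(i + 1, L))
    return float(energy)

def mutation_effect(seq, pos, new_state, h, J):
    """Energy change when the residue at `pos` is replaced by `new_state`."""
    mutant = list(seq)
    mutant[pos] = new_state
    return statistical_energy(mutant, h, J) - statistical_energy(seq, h, J)

# Toy parameters: 4 sites, 3 residue states, random values (not fitted to data).
rng = np.random.default_rng(0)
L, q = 4, 3
h = rng.normal(size=(L, q))
J = rng.normal(size=(L, L, q, q))

wild_type = [0, 1, 2, 0]
delta = mutation_effect(wild_type, pos=2, new_state=1, h=h, J=J)
print(f"energy change for mutating site 2 to state 1: {delta:.3f}")
```

In a real application the parameters would be inferred from an alignment of natural sequences; here they are random, so the printed number carries no biological meaning.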
TL;DR: Using a maximum-likelihood formalism, a method is developed for reconstructing the sequences of ancestral proteins; it allows calculation not only of the most probable ancestral sequence but also of the probability of any amino acid at any given node in the evolutionary tree.
Abstract: Using a maximum-likelihood formalism, we have developed a method with which to reconstruct the sequences of ancestral proteins. Our approach allows the calculation of not only the most probable ancestral sequence but also of the probability of any amino acid at any given node in the evolutionary tree. Because we consider evolution on the amino acid level, we are better able to include effects of evolutionary pressure and take advantage of structural information about the protein through the use of mutation matrices that depend on secondary structure and surface accessibility. The computational complexity of this method scales linearly with the number of homologous proteins used to reconstruct the ancestral sequence.
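The following is a minimal sketch of how per-residue probabilities at an ancestral node can be obtained, using the standard pruning recursion over a tree for a single site. It is not the authors' implementation: the four-letter toy alphabet, the equal-rates substitution model, the uniform prior at the root, and the example tree are all assumptions, and the structure-dependent mutation matrices mentioned in the abstract are replaced here by that single simple model.

```python
import math

ALPHABET = "ACDE"  # toy alphabet; a real analysis would use all 20 amino acids

def transition_prob(a, b, t, q=len(ALPHABET)):
    """Equal-rates model: probability that state a becomes b over branch length t."""
    decay = math.exp(-q * t / (q - 1))
    return 1 / q + (1 - 1 / q) * decay if a == b else (1 - decay) / q

def conditional_likelihoods(node):
    """Likelihood of the data below `node` for each possible state at `node`.

    A leaf is a one-letter string; an internal node is a list of
    (child, branch_length) pairs.
    """
    if isinstance(node, str):
        return {a: 1.0 if a == node else 0.0 for a in ALPHABET}
    like = {a: 1.0 for a in ALPHABET}
    for child, t in node:
        child_like = conditional_likelihoods(child)
        for a in ALPHABET:
            like[a] *= sum(transition_prob(a, b, t) * child_like[b]
                           for b in ALPHABET)
    return like

def root_posterior(tree):
    """Posterior probability of each state at the root (uniform prior assumed)."""
    like = conditional_likelihoods(tree)
    total = sum(like.values())
    return {a: like[a] / total for a in ALPHABET}

# Hypothetical three-leaf tree for one alignment column: ((A, A), C).
tree = [([("A", 0.1), ("A", 0.1)], 0.05), ("C", 0.3)]
posterior = root_posterior(tree)
best = max(posterior, key=posterior.get)
print(best, {a: round(p, 3) for a, p in posterior.items()})
```

Each branch is visited once per site, so the cost of this sketch grows linearly with the number of leaves. It reports probabilities only at the root; with a reversible model such as this one, probabilities at another internal node can be obtained by rerooting the tree at that node.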
TL;DR: The genome tree reveals several significant and notable differences from the gene trees, and these differences invoke new discussions about alternative narratives for the evolution of some of the currently accepted fungal groups.
Abstract: Fungi belong to one of the largest and most diverse kingdoms of living organisms. The evolutionary kinship within a fungal population has so far been inferred mostly from gene-information–based trees (“gene trees”), commonly constructed by a multiple sequence alignment (MSA) method from the degree of difference among the protein or DNA sequences of a small number of highly conserved genes shared across the population. Since each gene evolves under a different evolutionary pressure and time scale, it is known that one gene tree for a population may differ from other gene trees for the same population, depending on the subjective selection of the genes. Within the last decade, a large number of whole-genome sequences of fungi have become publicly available, which represent, at present, the most fundamental and complete information about each fungal organism. This presents an opportunity to infer kinship among fungi using a whole-genome information-based tree (“genome tree”). The method we used allows comparison of whole-genome information without MSA, and is a variation of a computational algorithm developed to find semantic similarities or plagiarism in two books, where we represent the whole-genome information of an organism as a book of words without spaces. The genome tree reveals several significant and notable differences from the gene trees, and these differences invoke new discussions about alternative narratives for the evolution of some of the currently accepted fungal groups.
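One way to picture an alignment-free comparison of "books of words without spaces" is to slide a fixed-length window over each sequence, count the resulting words, and compare the two word-frequency profiles. The sketch below does this with the Jensen-Shannon divergence as the distance. The abstract does not name the word length, the distance measure, or the tree-building step, so the choice of `k`, the divergence, the function names, and the toy sequences are all assumptions.

```python
from collections import Counter
import math

def word_profile(text, k):
    """Relative frequencies of all overlapping length-k words in `text`."""
    counts = Counter(text[i:i + k] for i in range(len(text) - k + 1))
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

def jensen_shannon(p, q):
    """Jensen-Shannon divergence (base 2) between two word-frequency profiles."""
    divergence = 0.0
    for word in set(p) | set(q):
        pw, qw = p.get(word, 0.0), q.get(word, 0.0)
        mean = 0.5 * (pw + qw)
        if pw > 0:
            divergence += 0.5 * pw * math.log2(pw / mean)
        if qw > 0:
            divergence += 0.5 * qw * math.log2(qw / mean)
    return divergence

# Toy "genomes" (hypothetical strings, far shorter than any real genome).
genome_a = "ACGTACGTAC"
genome_b = "ACGTACGTAA"
genome_c = "TTTTGGGGCC"

k = 3
profiles = {name: word_profile(seq, k)
            for name, seq in [("a", genome_a), ("b", genome_b), ("c", genome_c)]}
d_ab = jensen_shannon(profiles["a"], profiles["b"])
d_ac = jensen_shannon(profiles["a"], profiles["c"])
print(f"d(a, b) = {d_ab:.3f}, d(a, c) = {d_ac:.3f}")
```

A matrix of such pairwise distances could then be passed to any standard distance-based tree-building method. No sequence alignment is involved at any step, which is the property the abstract emphasizes.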