TL;DR: VirSorter is a tool designed to detect viral signal in fragmented and large-scale microbial sequence data, such as single-cell amplified genomes and metagenomic contigs, in both a reference-dependent and reference-independent manner, leveraging probabilistic models and extensive virome data to maximize detection of novel viruses.
Abstract: Viruses of microbes impact all ecosystems where microbes drive key energy and substrate transformations including the oceans, humans and industrial fermenters. However, despite this recognized importance, our understanding of viral diversity and impacts remains limited by too few model systems and reference genomes. One way to fill these gaps in our knowledge of viral diversity is through the detection of viral signal in microbial genomic data. While multiple approaches have been developed and applied for the detection of prophages (viral genomes integrated in a microbial genome), new types of microbial genomic data are emerging that are more fragmented and larger scale, such as Single-cell Amplified Genomes (SAGs) of uncultivated organisms or genomic fragments assembled from metagenomic sequencing. Here, we present VirSorter, a tool designed to detect viral signal in these different types of microbial sequence data in both a reference-dependent and reference-independent manner, leveraging probabilistic models and extensive virome data to maximize detection of novel viruses. Performance testing shows that VirSorter's prophage prediction capability compares to that of available prophage predictors for complete genomes, but is superior in predicting viral sequences outside of a host genome (i.e., from extrachromosomal prophages, lytic infections, or partially assembled prophages). Furthermore, VirSorter outperforms existing tools for fragmented genomic and metagenomic datasets, and can identify viral signal in assembled sequence (contigs) as short as 3kb, while providing near-perfect identification (>95% Recall and 100% Precision) on contigs of at least 10kb. Because VirSorter scales to large datasets, it can also be used in "reverse" to more confidently identify viral sequence in viral metagenomes by sorting away cellular DNA whether derived from gene transfer agents, generalized transduction or contamination. 
Finally, VirSorter is made available through the iPlant Cyberinfrastructure that provides a web-based user interface interconnected with the required computing resources. VirSorter thus complements existing prophage prediction software to better leverage fragmented, SAG and metagenomic datasets in a way that will scale to modern sequencing. Given these features, VirSorter should enable the discovery of new viruses in microbial datasets, and further our understanding of uncultivated viral communities across diverse ecosystems.
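The recall and precision figures quoted in the abstract above follow the standard definitions for a binary classifier. The following is a minimal sketch of that arithmetic only; the function name and the contig counts are hypothetical illustrations and are not taken from the VirSorter benchmark.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Return (precision, recall) from raw prediction counts.

    precision = TP / (TP + FP): fraction of predicted viral contigs that are viral.
    recall    = TP / (TP + FN): fraction of truly viral contigs that were detected.
    """
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


# Hypothetical counts (NOT from the paper): 100 truly viral contigs,
# 96 detected, 4 missed, and no cellular contig mislabeled as viral.
precision, recall = precision_recall(true_pos=96, false_pos=0, false_neg=4)
print(f"precision={precision:.2%} recall={recall:.2%}")
```

With these made-up counts, a tool would meet the ">95% Recall and 100% Precision" threshold the abstract reports for contigs of at least 10kb.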
TL;DR: The first metagenomic analyses of an uncultured viral community from human feces, using partial shotgun sequencing, show that the recognizable viruses were mostly siphophages, and the community contained an estimated 1,200 viral genotypes.
Abstract: Here we present the first metagenomic analyses of an uncultured viral community from human feces, using partial shotgun sequencing. Most of the sequences were unrelated to anything previously reported. The recognizable viruses were mostly siphophages, and the community contained an estimated 1,200 viral genotypes.
TL;DR: The most abundant fecal virus in this study was pepper mild mottle virus (PMMV), which was found in high concentrations, up to 10^9 virions per gram of dry weight fecal matter, indicating that this plant virus is prevalent in the human population.
Abstract: The human gut is known to be a reservoir of a wide variety of microbes, including viruses. Many RNA viruses are known to be associated with gastroenteritis; however, the enteric RNA viral community present in healthy humans has not been described. Here, we present a comparative metagenomic analysis of the RNA viruses found in three fecal samples from two healthy human individuals. For this study, uncultured viruses were concentrated by tangential flow filtration, and viral RNA was extracted and cloned into shotgun viral cDNA libraries for sequencing analysis. The vast majority of the 36,769 viral sequences obtained were similar to plant pathogenic RNA viruses. The most abundant fecal virus in this study was pepper mild mottle virus (PMMV), which was found in high concentrations, up to 10^9 virions per gram of dry weight fecal matter. PMMV was also detected in 12 (66.7%) of 18 fecal samples collected from healthy individuals on two continents, indicating that this plant virus is prevalent in the human population. A number of pepper-based foods tested positive for PMMV, suggesting dietary origins for this virus. Intriguingly, the fecal PMMV was infectious to host plants, suggesting that humans might act as a vehicle for the dissemination of certain plant viruses.
TL;DR: The discovery of a previously unidentified bacteriophage, referred to as crAssphage, that is present in the majority of published human faecal metagenomes; a Bacteroides host is predicted for this phage, consistent with Bacteroides-related protein homologues and a unique carbohydrate-binding domain encoded in the phage genome.
Abstract: Metagenomics, or sequencing of the genetic material from a complete microbial community, is a promising tool to discover novel microbes and viruses. Viral metagenomes typically contain many unknown sequences. Here we describe the discovery of a previously unidentified bacteriophage present in the majority of published human faecal metagenomes, which we refer to as crAssphage. Its ~97 kbp genome is six times more abundant in publicly available metagenomes than all other known phages together; it comprises up to 90% and 22% of all reads in virus-like particle (VLP)-derived metagenomes and total community metagenomes, respectively; and it totals 1.68% of all human faecal metagenomic sequencing reads in the public databases. The majority of crAssphage-encoded proteins match no known sequences in the database, which is why it was not detected before. Using a new co-occurrence profiling approach, we predict a Bacteroides host for this phage, consistent with Bacteroides-related protein homologues and a unique carbohydrate-binding domain encoded in the phage genome.
TL;DR: It is inferred that the extreme interpersonal diversity of human gut viruses derives from two sources, persistence of a small portion of the global virome within the gut of each individual and rapid evolution of some long-term virome members.
Abstract: Humans are colonized by immense populations of viruses, which metagenomic analysis shows are mostly unique to each individual. To investigate the origin and evolution of the human gut virome, we analyzed the viral community of one adult individual over 2.5 y by extremely deep metagenomic sequencing (56 billion bases of purified viral sequence from 24 longitudinal fecal samples). After assembly, 478 well-determined contigs could be identified, which are inferred to correspond mostly to previously unstudied bacteriophage genomes. Fully 80% of these types persisted throughout the duration of the 2.5-y study, indicating long-term global stability. Mechanisms of base substitution, rates of accumulation, and the amount of variation varied among viral types. Temperate phages showed relatively lower mutation rates, consistent with replication by accurate bacterial DNA polymerases in the integrated prophage state. In contrast, Microviridae, which are lytic bacteriophages with single-stranded circular DNA genomes, showed high substitution rates (>10^-5 per nucleotide each day), so that sequence divergence over the 2.5-y period studied approached values sufficient to distinguish new viral species. Longitudinal changes also were associated with diversity-generating retroelements and virus-encoded Clustered Regularly Interspaced Short Palindromic Repeats arrays. We infer that the extreme interpersonal diversity of human gut viruses derives from two sources, persistence of a small portion of the global virome within the gut of each individual and rapid evolution of some long-term virome members.
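To give a sense of scale for the Microviridae substitution rate quoted above, the sketch below multiplies a per-day rate by the length of the study. It assumes a constant rate, 365 days per year, and a simple linear accumulation that ignores repeated substitutions at the same site; the function name is hypothetical, and the abstract gives only a lower bound on the rate, so the result is a lower bound as well.

```python
def expected_divergence(rate_per_nt_per_day, years, days_per_year=365.0):
    """Expected fraction of substituted sites under a constant rate.

    Linear approximation: divergence = rate * elapsed_days.
    Reasonable only while the result stays small (few multiple hits per site).
    """
    return rate_per_nt_per_day * years * days_per_year


# Lower-bound rate from the abstract (>10^-5 per nucleotide per day)
# applied to the 2.5-year sampling period.
divergence = expected_divergence(1e-5, 2.5)
print(f"at least {divergence:.2%} of sites substituted over 2.5 years")
```

Under these assumptions the lower-bound rate corresponds to roughly 0.9% of sites substituted over the study period; rates above that bound scale the figure proportionally.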