TL;DR: M mothur is used as a case study to trim, screen, and align sequences; calculate distances; assign sequences to operational taxonomic units; and describe the α and β diversity of eight marine samples previously characterized by pyrosequencing of 16S rRNA gene fragments.
Abstract: mothur aims to be a comprehensive software package that allows users to use a single piece of software to analyze community sequence data. It builds upon previous tools to provide a flexible and powerful software package for analyzing sequencing data. As a case study, we used mothur to trim, screen, and align sequences; calculate distances; assign sequences to operational taxonomic units; and describe the alpha and beta diversity of eight marine samples previously characterized by pyrosequencing of 16S rRNA gene fragments. This analysis of more than 222,000 sequences was completed in less than 2 h with a laptop computer.
TL;DR: In this article, the authors compared six bioinformatic pipelines for the analysis of amplicon sequence data: three OTU-level flows (QIIME-uclust, MOTHUR, and USEARCH-UPARSE) and three ASV-level (DADA2, Qiime2-Deblur, and Uplabel-UNOISE3).
Abstract: Microbial amplicon sequencing studies are an important tool in biological and biomedical research. Widespread 16S rRNA gene microbial surveys have shed light on the structure of many ecosystems inhabited by bacteria, including the human body. However, specialized software and algorithms are needed to convert raw sequencing data into biologically meaningful information (i.e. tables of bacterial counts). While different bioinformatic pipelines are available in a rapidly changing and improving field, users are often unaware of limitations and biases associated with individual pipelines and there is a lack of agreement regarding best practices. Here, we compared six bioinformatic pipelines for the analysis of amplicon sequence data: three OTU-level flows (QIIME-uclust, MOTHUR, and USEARCH-UPARSE) and three ASV-level (DADA2, Qiime2-Deblur, and USEARCH-UNOISE3). We tested workflows with different quality control options, clustering algorithms, and cutoff parameters on a mock community as well as on a large (N = 2170) recently published fecal sample dataset from the multi-ethnic HELIUS study. We assessed the sensitivity, specificity, and degree of consensus of the different outputs. DADA2 offered the best sensitivity, at the expense of decreased specificity compared to USEARCH-UNOISE3 and Qiime2-Deblur. USEARCH-UNOISE3 showed the best balance between resolution and specificity. OTU-level USEARCH-UPARSE and MOTHUR performed well, but with lower specificity than ASV-level pipelines. QIIME-uclust produced large number of spurious OTUs as well as inflated alpha-diversity measures and should be avoided in future studies. This study provides guidance for researchers using amplicon sequencing to gain biological insights.
TL;DR: The caeca microbial communities were more diverse in comparison to ilea and caeca, and the main functional differences between the two sites were found to be related to nutrient absorption and bacterial colonization.
Abstract: Chicken gut microbiota has paramount roles in host performance, health and immunity. Understanding the topological difference in gut microbial community composition is crucial to provide knowledge on the functions of each members of microbiota to the physiological maintenance of the host. The gut microbiota profiling of the chicken was commonly performed previously using culture-dependent and early culture-independent methods which had limited coverage and accuracy. Advances in technology based on next-generation sequencing (NGS), offers unparalleled coverage and depth in determining microbial gut dynamics. Thus, the aim of this study was to investigate the ileal and caecal microbiota development as chicken aged, which is important for future effective gut modulation. Ileal and caecal contents of broiler chicken were extracted from 7, 14, 21 and 42-day old chicken. Genomic DNA was then extracted and amplified based on V3 hyper-variable region of 16S rRNA. Bioinformatics, ecological and statistical analyses such as Principal Coordinate Analysis (PCoA) was performed in mothur software and plotted using PRIMER 6. Additional analyses for predicted metagenomes were performed through PICRUSt and STAMP software package based on Greengenes databases. A distinctive difference in bacterial communities was observed between ilea and caeca as the chicken aged (P < 0.001). The microbial communities in the caeca were more diverse in comparison to the ilea communities. The potentially pathogenic bacteria such as Clostridium were elevated as the chicken aged and the population of beneficial microbe such as Lactobacillus was low at all intervals. On the other hand, based on predicted metagenomes analysed, clear distinction in functions and roles of gut microbiota such as gene pathways related to nutrient absorption (e.g. sugar and amino acid metabolism), and bacterial proliferation and colonization (e.g. bacterial motility proteins, two-component system and bacterial secretion system) were observed between ilea and caeca, respectively (P < 0.05). The caeca microbial communities were more diverse in comparison to ilea. The main functional differences between the two sites were found to be related to nutrient absorption and bacterial colonization. Based on the composition of the microbial community, future gut modulation with beneficial bacteria such as probiotics may benefit the host.
TL;DR: The mothur software package was designed to create a comprehensive package that allowed users to analyze amplicon sequence data using the most robust methods available and strives to expand mothur's capabilities in a data-driven manner to incorporate new ideas and accommodate changes in data and desires of the research community.
Abstract: More than 10 years ago, we published the paper describing the mothur software package in Applied and Environmental Microbiology. Our goal was to create a comprehensive package that allowed users to analyze amplicon sequence data using the most robust methods available. mothur has helped lead the community through the ongoing sequencing revolution and continues to provide this service to the microbial ecology community. Beyond its success and impact on the field, mothur’s development exposed a series of observations that are generally translatable across science. Perhaps the observation that stands out the most is that all science is done in the context of prevailing ideas and available technologies. Although it is easy to criticize choices that were made 10 years ago through a modern lens, if we were to wait for all of the possible limitations to be solved before proceeding, science would stall. Even preceding the development of mothur, it was necessary to address the most important problems and work backwards to other problems that limited access to robust sequence analysis tools. At the same time, we strive to expand mothur’s capabilities in a data-driven manner to incorporate new ideas and accommodate changes in data and desires of the research community. It has been edifying to see the benefit that a simple set of tools can bring to so many other researchers.
TL;DR: This dataset represents a comprehensive resource of sponge-associated microbial communities based on 16S rRNA gene sequences that can be used to address overarching hypotheses regarding host-associated prokaryotes, including host specificity, convergent evolution, environmental drivers of microbiome structure, and the sponge- associated rare biosphere.
Abstract: Marine sponges (phylum Porifera) are a diverse, phylogenetically deep-branching clade known for forming intimate partnerships with complex communities of microorganisms. To date, 16S rRNA gene sequencing studies have largely utilised different extraction and amplification methodologies to target the microbial communities of a limited number of sponge species, severely limiting comparative analyses of sponge microbial diversity and structure. Here, we provide an extensive and standardised dataset that will facilitate sponge microbiome comparisons across large spatial, temporal, and environmental scales. Samples from marine sponges (n = 3569 specimens), seawater (n = 370), marine sediments (n = 65) and other environments (n = 29) were collected from different locations across the globe. This dataset incorporates at least 268 different sponge species, including several yet unidentified taxa. The V4 region of the 16S rRNA gene was amplified and sequenced from extracted DNA using standardised procedures. Raw sequences (total of 1.1 billion sequences) were processed and clustered with (i) a standard protocol using QIIME closed-reference picking resulting in 39 543 operational taxonomic units (OTU) at 97% sequence identity, (ii) a de novo clustering using Mothur resulting in 518 246 OTUs, and (iii) a new high-resolution Deblur protocol resulting in 83 908 unique bacterial sequences. Abundance tables, representative sequences, taxonomic classifications, and metadata are provided. This dataset represents a comprehensive resource of sponge-associated microbial communities based on 16S rRNA gene sequences that can be used to address overarching hypotheses regarding host-associated prokaryotes, including host specificity, convergent evolution, environmental drivers of microbiome structure, and the sponge-associated rare biosphere.