Proceedings Article, DOI: 10.1109/BIBM55620.2022.9995554
A New Phylogeny-Driven Random Forest-Based Classification Approach for Functional Metagenomics
Jyotsna Talreja Wassan, Haiying Wang, Huiru Zheng +2 more
06 Dec 2022
pp. 32-37
1
TL;DR: This paper develops Phylogeny-RF, a new RF-based classification method guided by the evolutionary ancestry of microbial phylogeny, in order to capture the effects of phylogenetic relatedness within the ML classifier itself.
Abstract: Classifying microbial genes by function is an important task in metagenomic studies, and the research community is developing Machine Learning (ML) methods to achieve good classification performance. Random Forest (RF) has been proposed as one of the most favorable methods for such supervised analysis when applied to the abundance profiles of microbial genes, mapping them to functional phenotypes. To further optimize the existing RF model using the biological relationships between microbial features, this paper develops a new RF-based classification method guided by the evolutionary ancestry of microbial phylogeny, i.e. Phylogeny-RF. The method captures the effects of phylogenetic relatedness within the ML classifier itself. Microbes that are closely related by phylogeny are highly correlated and tend to have similar genetic and phenotypic traits. Because such microbes behave similarly, they tend to be selected together, or one of them can be dropped from the analysis to improve the ML process. The proposed Phylogeny-RF algorithm was compared with state-of-the-art classification methods, including RF and the phylogeny-aware method MetaPhyl, on two real-world 16S rRNA metagenomic data sets. The proposed method performed better than the other phylogeny-driven benchmarks. For example, Phylogeny-RF attained a high AUC of 0.949 on soil microbiomes in comparison to other benchmarks.
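The idea that one of two closely related microbes "could be dropped from the analysis" can be illustrated with a minimal sketch. This is not the authors' Phylogeny-RF algorithm, whose details the abstract does not give. It assumes a hypothetical table of pairwise phylogenetic distances between taxa, hypothetical per-taxon importance scores (for example, taken from a fitted RF), and an arbitrary distance threshold. All names and values below are illustrative.

```python
from typing import Dict, List, Tuple


def prune_related_features(
    importance: Dict[str, float],
    distance: Dict[Tuple[str, str], float],
    threshold: float,
) -> List[str]:
    """Greedily keep the most important taxa, skipping close relatives of kept ones."""

    def dist(a: str, b: str) -> float:
        # Each unordered pair is stored once, so try both key orders.
        if (a, b) in distance:
            return distance[(a, b)]
        return distance[(b, a)]

    kept: List[str] = []
    # Visit taxa from most to least important; ties are broken by name.
    for taxon in sorted(importance, key=lambda t: (-importance[t], t)):
        # Keep a taxon only if it is farther than `threshold` from every kept taxon.
        if all(dist(taxon, k) > threshold for k in kept):
            kept.append(taxon)
    return kept


# Hypothetical toy data: A and B are close relatives, as are C and D.
importance = {"OTU_A": 0.40, "OTU_B": 0.35, "OTU_C": 0.15, "OTU_D": 0.10}
distance = {
    ("OTU_A", "OTU_B"): 0.02,
    ("OTU_A", "OTU_C"): 0.60,
    ("OTU_A", "OTU_D"): 0.70,
    ("OTU_B", "OTU_C"): 0.58,
    ("OTU_B", "OTU_D"): 0.69,
    ("OTU_C", "OTU_D"): 0.05,
}

selected = prune_related_features(importance, distance, threshold=0.10)
print(selected)  # ['OTU_A', 'OTU_C']
```

In this toy case one representative survives from each pair of close relatives. The reduced feature set could then be passed to any RF implementation; the threshold would need tuning on real data.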
Citations
Developing a new Phylogeny-driven Random Forest Model for Functional Metagenomics.
TL;DR: This article proposes a Phylogeny-RF model for functional classification of metagenomes that captures the effects of phylogenetic relatedness within the ML classifier itself, rather than just applying a supervised classifier to the raw abundance of microbial genes.
4
References
Applied Logistic Regression.
TL;DR: Applied Logistic Regression, Third Edition provides an easily accessible introduction to the logistic regression model and highlights the power of this model by examining the relationship between a dichotomous outcome and a set of covariables.
40.1K
APE: Analyses of Phylogenetics and Evolution in R language
TL;DR: Analysis of Phylogenetics and Evolution (APE) is a package written in the R language for use in molecular evolution and phylogenetics that provides utility functions for reading and writing data and for manipulating phylogenetic trees.
12.5K
Predictive functional profiling of microbial communities using 16S rRNA marker gene sequences
Morgan G. I. Langille, Jesse R. Zaneveld, J. Gregory Caporaso, Daniel McDonald, Dan Knights, Joshua A Reyes, Jose C. Clemente, Deron E. Burkepile, Rebecca Vega Thurber, Rob Knight, Robert G. Beiko, Curtis Huttenhower +14 more
TL;DR: The results demonstrate that phylogeny and function are sufficiently linked that this 'predictive metagenomic' approach should provide useful insights into the thousands of uncultivated microbial communities for which only marker gene surveys are currently available.
Understanding interobserver agreement: the kappa statistic.
TL;DR: Items such as physical exam findings, radiographic interpretations, or other diagnostic tests often rely on some degree of subjective interpretation by observers, and studies that measure the agreement between two or more observers should include a statistic that takes into account the fact that observers will sometimes agree or disagree simply by chance.
7.6K
The use of the area under the ROC curve in the evaluation of machine learning algorithms
TL;DR: AUC exhibits a number of desirable properties when compared to overall accuracy: increased sensitivity in Analysis of Variance (ANOVA) tests; a standard error that decreased as both AUC and the number of test samples increased; independence from the decision threshold; and invariance to a priori class probabilities.
7K
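The last two properties in the entry above (independence from the decision threshold and invariance to class priors) can be checked with a small sketch. It computes AUC as the fraction of positive/negative pairs in which the positive sample scores higher, counting ties as half, which is a standard rank-based definition. The scores and labels are made-up toy values, not data from any of the listed papers.

```python
from typing import List, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random positive outscores a random negative."""
    pos: List[float] = [s for s, y in zip(scores, labels) if y == 1]
    neg: List[float] = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy scores and labels (hypothetical).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]
base = auc(scores, labels)

# Threshold independence: a monotone rescaling of the scores keeps the ranking,
# so the AUC is unchanged.
rescaled = auc([s ** 2 for s in scores], labels)

# Class-prior invariance: duplicating every negative sample changes the class
# ratio but not the AUC.
neg_scores = [s for s, y in zip(scores, labels) if y == 0]
reweighted = auc(scores + neg_scores, labels + [0] * len(neg_scores))

print(base, rescaled, reweighted)
```

All three printed values agree (8 of 9 pairs are ranked correctly). Overall accuracy, by contrast, would change with both the chosen threshold and the class ratio.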