Gene function classification using Bayesian models with hierarchy-based priors.
Babak Shahbaba, Radford M. Neal
TL;DR: Together, these results show that gene function can be predicted with higher accuracy than previously achieved, using Bayesian models that incorporate suitable prior information.
Abstract: We investigate whether annotation of gene function can be improved using a classification scheme that is aware that functional classes are organized in a hierarchy. The classifiers look at phylogenic descriptors, sequence-based attributes, and predicted secondary structure. We discuss three Bayesian models and compare their performance in terms of predictive accuracy. These models are the ordinary multinomial logit (MNL) model, a hierarchical model based on a set of nested MNL models, and an MNL model with a prior that introduces correlations between the parameters for classes that are nearby in the hierarchy. We also provide a new scheme for combining different sources of information. We use these models to predict the functional class of Open Reading Frames (ORFs) from the E. coli genome. The results from all three models show substantial improvement over previous methods, which were based on the C5 decision tree algorithm. The MNL model using a prior based on the hierarchy outperforms both the non-hierarchical MNL model and the nested MNL model. In contrast to previous attempts at combining the three sources of information in this dataset, our new approach to combining data sources produces a higher accuracy rate than applying our models to each data source alone. Together, these results show that gene function can be predicted with higher accuracy than previously achieved, using Bayesian models that incorporate suitable prior information.
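The idea of an MNL prior that correlates the parameters of classes that are nearby in the hierarchy can be illustrated with a minimal sketch. It assumes one common construction, in which each class's coefficient vector is the sum of independent Gaussian vectors attached to the branches on the path from the root to that class. The abstract does not spell out the construction, so this is an illustrative assumption and not a statement of the authors' implementation. The toy hierarchy `PATHS`, the function names, and the `scale` parameter are all hypothetical.

```python
import numpy as np

# Hypothetical toy hierarchy: each leaf class is identified by the branches
# on the path from the root to that class (names are illustrative only).
PATHS = {
    "class_1": ["A", "A1"],
    "class_2": ["A", "A2"],
    "class_3": ["B", "B1"],
}


def sample_class_coefficients(paths, n_features, rng, scale=1.0):
    """Draw one prior sample of per-class coefficient vectors.

    Assumption: every branch gets an independent N(0, scale^2) vector, and a
    class's coefficients are the sum of the vectors along its path.
    """
    branches = sorted({b for path in paths.values() for b in path})
    phi = {b: rng.normal(0.0, scale, size=n_features) for b in branches}
    return {c: sum(phi[b] for b in path) for c, path in paths.items()}


def prior_covariance(paths, scale=1.0):
    """Prior covariance (per feature) between class coefficients.

    Under the assumption above, it equals scale^2 times the number of
    branches the two classes share.
    """
    classes = list(paths)
    cov = np.zeros((len(classes), len(classes)))
    for i, ci in enumerate(classes):
        for j, cj in enumerate(classes):
            shared = len(set(paths[ci]) & set(paths[cj]))
            cov[i, j] = shared * scale**2
    return classes, cov


def mnl_probabilities(x, betas):
    """Multinomial logit (softmax) class probabilities for one feature vector."""
    classes = list(betas)
    scores = np.array([x @ betas[c] for c in classes])
    scores -= scores.max()  # subtract the max for numerical stability
    p = np.exp(scores)
    return dict(zip(classes, p / p.sum()))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    classes, cov = prior_covariance(PATHS)
    print(classes)
    print(cov)
    betas = sample_class_coefficients(PATHS, n_features=4, rng=rng)
    print(mnl_probabilities(np.ones(4), betas))
```

In this toy setup, `class_1` and `class_2` share the branch `A`, so their coefficients are positively correlated a priori, while `class_3` shares no branch with either and is uncorrelated with both. An ordinary MNL model would correspond to giving every class its own independent coefficient vector.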
Citations
Incorporating functional inter-relationships into protein function prediction algorithms.
TL;DR: A method is proposed to enhance the performance of classification-based protein function prediction algorithms by incorporating the interrelationships between the functional classes that constitute functional classification schemes; it also helps uncover novel biology in the form of previously unknown functional annotations.
Using PPI network autocorrelation in hierarchical multi-label classification trees for gene function prediction
TL;DR: A tree-based algorithm that considers network autocorrelation in the setting of Hierarchical Multi-label Classification (HMC) is presented; taking network information into account in the learning phase improves the predictive performance of the learned models for predicting gene function.
Integration of relational and hierarchical network information for protein function prediction
TL;DR: The results show that a hierarchical approach to network-based protein function prediction, that exploits the ontological structure of protein annotation databases in a principled manner, can offer substantial advantages over the successive application of 'flat' network-based methods.
Hierarchical Ensemble Methods for Protein Function Prediction
TL;DR: This paper provides a comprehensive review of hierarchical methods for protein function prediction based on ensembles of learning machines, and discusses the main hierarchical ensemble methods proposed in the literature in the context of existing computational methods.
•Posted Content
Using the Gene Ontology Hierarchy when Predicting Gene Function
Sara Mostafavi, Quaid Morris
TL;DR: Two simple methods for incorporating information about the hierarchical nature of the categorization scheme are proposed and results show that using the hierarchy information directly, compared to using reconciliation methods, improves gene function prediction.