Illuminating Dark Proteins using Reactome Pathways
Lisa Matthews,Robin Haw,Timothy Brunson,Nasim Sanati,Solomon Shorser,Deidre Beavers,P. Conley,Lincoln Stein,Peter D'Eustachio +8 more
TL;DR: A random forest trained on 106 protein/gene pairwise features collected from multiple resources predicts functional interactions between dark proteins and proteins annotated in Reactome, and three scores based on enrichment analysis and fuzzy logic simulations measure the interactions between dark proteins and Reactome pathways.
Abstract: Limited knowledge about a substantial portion of protein coding genes, known as “dark” proteins, hinders our understanding of their functions and potential therapeutic applications. To address this, we leveraged Reactome, the most comprehensive, open source, open-access pathway knowledgebase, to contextualize dark proteins within biological pathways. By integrating multiple resources and employing a random forest classifier trained on 106 protein/gene pairwise features, we predicted functional interactions between dark proteins and Reactome-annotated proteins. We then developed three scores to measure the interactions between dark proteins and Reactome pathways, utilizing enrichment analysis and fuzzy logic simulations. Correlation analysis of these scores with an independent single-cell RNA sequencing dataset provided supporting evidence for this approach. Furthermore, systematic natural language processing (NLP) analysis of over 22 million PubMed abstracts and manual checking of the literature associated with 20 randomly selected dark proteins reinforced the predicted interactions between proteins and pathways. To enhance the visualization and exploration of dark proteins within Reactome pathways, we developed the Reactome IDG portal, deployed at https://idg.reactome.org, a web application featuring tissue-specific protein and gene expression overlay, as well as drug interactions. Our integrated computational approach, together with the user-friendly web platform, offers a valuable resource for uncovering potential biological functions and therapeutic implications of dark proteins.
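The abstract mentions enrichment analysis as one ingredient of the pathway scores but does not give the formulas. As a rough illustration only, the sketch below shows a generic over-representation test: given the predicted interaction partners of one dark protein, it computes a hypergeometric p-value per pathway and applies a Benjamini-Hochberg correction. This is not the authors' scoring method; the function names, the toy gene identifiers, and the universe size are all assumptions made for the example.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) when drawing N items from M, of which n are 'successes'."""
    upper = min(n, N)
    total = comb(M, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def pathway_enrichment(partners, pathways, universe_size):
    """Map each pathway name to (p-value, adjusted p-value).

    partners: predicted interaction partners of one dark protein (hypothetical input)
    pathways: dict of pathway name -> iterable of member genes
    universe_size: number of annotated genes considered as background (assumed)
    """
    partners = set(partners)
    names = sorted(pathways)
    pvals = []
    for name in names:
        members = set(pathways[name])
        overlap = len(partners & members)
        pvals.append(hypergeom_sf(overlap, universe_size, len(members), len(partners)))
    fdr = benjamini_hochberg(pvals)
    return {name: (p, q) for name, p, q in zip(names, pvals, fdr)}


# Toy data: gene and pathway names are placeholders, not Reactome content.
toy_pathways = {
    "pathway_A": ["g1", "g2", "g3", "g4", "g5"],
    "pathway_B": ["g10", "g11", "g12", "g13", "g14"],
}
toy_partners = ["g1", "g2", "g3", "g6"]
result = pathway_enrichment(toy_partners, toy_pathways, universe_size=20)
for name, (p, q) in result.items():
    print(f"{name}: p={p:.4f}, adjusted p={q:.4f}")
```

In this toy case three of the four partners fall in pathway_A, so it receives the smaller p-value, while pathway_B has no overlap and gets p = 1. A real analysis would use the full set of Reactome-annotated genes as the background.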