Knowledge-guided data mining on the standardized architecture of NRPS: subtypes, novel motifs, and sequence entanglements
Vanessa Opassinis
- 17 Mar 2022
2
TL;DR: This article proposes a motif-and-intermotif standardization of NRPS sequences to address the lack of a consistent standard for annotating NRPS domains and modules, which has made data-driven discoveries challenging.
Abstract: Non-ribosomal peptide synthetase (NRPS) is a diverse family of biosynthetic enzymes for the assembly of bioactive peptides. Despite advances in microbial sequencing, the lack of a consistent standard for annotating NRPS domains and modules has made data-driven discoveries challenging. To address this, we introduced a standardized architecture for NRPS by using known conserved motifs to partition typical domains. This motif-and-intermotif standardization allowed for systematic evaluations of sequence properties from a large number of NRPS pathways, resulting in the most comprehensive cross-kingdom C domain subtype classifications to date, as well as the discovery and experimental validation of novel conserved motifs with functional significance. Furthermore, our coevolution analysis revealed important barriers associated with reengineering NRPSs and uncovered the entanglement between phylogeny and substrate specificity in NRPS sequences. Our findings provide a comprehensive and statistically insightful analysis of NRPS sequences, opening avenues for future data-driven discoveries.
Author Summary: NRPS, a gigantic enzyme that produces diverse microbial secondary metabolites, provides a rich source of important medical products, including antibiotics. Despite the extensive knowledge gained about its structure and the large amount of sequencing data available, the frequent failure of NRPS reengineering in synthetic biology highlights the fact that much is still unknown. In this work, we applied existing knowledge to data mining of NRPS sequences, using well-known conserved motifs to partition NRPS sequences into motif-intermotif architectures. This standardization allows for integrating large amounts of sequences from different sources, providing a comprehensive overview of NRPSs across different kingdoms. Our findings included new C domain subtypes, novel conserved motifs with implications for structural flexibility, and insights into why NRPSs are so difficult to reengineer. To assist researchers in related fields, we constructed an online platform, “NRPS Motif Finder”, for parsing the motif-and-intermotif architecture and classifying C domain subtypes (http://www.bdainformatics.org/page?type=NRPSMotifFinder). We believe that this knowledge-guided approach not only advances our understanding of NRPSs but also provides a useful methodology for data mining in large-scale biological sequences.
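The motif-and-intermotif partitioning described in the abstract can be illustrated with a minimal sketch. It assumes that each conserved motif can be written as a regular expression and that motifs occur in a fixed N-to-C order. The motif names, patterns, function name, and the example sequence are all hypothetical placeholders, not the motif definitions or the implementation used by the authors or by NRPS Motif Finder.

```python
import re
from typing import List, Tuple

# Hypothetical toy motifs in N-to-C order, written as regular expressions.
# These are illustrative placeholders, not the motif set used in the paper.
TOY_MOTIFS: List[Tuple[str, str]] = [
    ("M1", r"HH...DG"),
    ("M2", r"GG.S"),
]


def partition_by_motifs(
    sequence: str, motifs: List[Tuple[str, str]] = TOY_MOTIFS
) -> List[Tuple[str, str]]:
    """Split a sequence into alternating intermotif and motif segments.

    Motifs are searched left to right; each search starts where the
    previous motif ended, so the segments never overlap.
    """
    segments: List[Tuple[str, str]] = []
    cursor = 0
    for name, pattern in motifs:
        match = re.compile(pattern).search(sequence, cursor)
        if match is None:
            continue  # motif absent: skip it and keep looking for later motifs
        # Everything between the previous motif and this one is an intermotif.
        segments.append((f"before_{name}", sequence[cursor:match.start()]))
        segments.append((name, match.group()))
        cursor = match.end()
    # Whatever follows the last matched motif is the trailing intermotif.
    segments.append(("tail", sequence[cursor:]))
    return segments


example = "MKAAHHLIVDGWSTTGGHSLLE"  # made-up sequence for illustration
result = partition_by_motifs(example)
for label, segment in result:
    print(label, segment)
```

Because every residue lands in exactly one segment, concatenating the segments reproduces the input sequence, so sequences from different sources can be compared segment by segment. A real pipeline would more likely locate motifs with profile-based alignment than with exact regular expressions.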
Citations
Biosynthetic diversification of peptaibol mediates fungus-mycohost interactions
Jie Fan, Jinwei Ren, Ruolin He, Penglin Wei, Yuanyuan Li, Wei Liu, Dawei Chen, Irina S. Druzhinina, Zhiyuan Li, Wen-Bing Yin
TL;DR: This study elucidates fungus-mycohost specific interactions mediated by a family of polypeptides, the peptaibols, and provides insights into the role of the metabolic diversity of biosynthetic pathways in interfungal interactions.
4
Forging the Iron-Net: Towards a Quantitative Understanding of Microbial Communities via Siderophore-Mediated Interactions
Shaohua Gu, Ruolin He, Gengyan Xiong, Zhaoliang Qu, Yuanzhe Shao, Li Yu, Di Zhang, Fanhao Wang, Ruichen Xu, Pei Guo, Niu Xi, Yinxiang Li, Yongning Wu, Wei Wang, Zhiyuan Li
- 11 Jul 2024
TL;DR: Researchers propose constructing an "iron-net" to understand microbial communities' iron interactions, leveraging siderophores and machine learning to manipulate microbiota, with potential applications in medicine, agriculture, and ecology.
1