ColabFold: making protein folding accessible to all
TL;DR: ColabFold combines the fast homology search of MMseqs2 with AlphaFold2 or RoseTTAFold for protein structure prediction, achieving a 40-60-fold faster search and optimized model utilization.
Abstract: ColabFold offers accelerated prediction of protein structures and complexes by combining the fast homology search of MMseqs2 with AlphaFold2 or RoseTTAFold. ColabFold's 40-60-fold faster search and optimized model utilization enable prediction of close to 1,000 structures per day on a server with one graphics processing unit. Coupled with Google Colaboratory, ColabFold becomes a free and accessible platform for protein folding. ColabFold is open-source software available at https://github.com/sokrypton/ColabFold and its novel environmental databases are available at https://colabfold.mmseqs.com.
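To put the abstract's throughput figures in more concrete terms, the following is a minimal back-of-the-envelope sketch. It only restates the numbers quoted above (close to 1,000 structures per day, a 40-60-fold faster search). The function names and the 600-second baseline search time are hypothetical values chosen for illustration and do not come from the paper.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def seconds_per_structure(structures_per_day: float) -> float:
    """Average wall-clock budget per structure at a given daily throughput."""
    if structures_per_day <= 0:
        raise ValueError("structures_per_day must be positive")
    return SECONDS_PER_DAY / structures_per_day


def accelerated_search_time(baseline_seconds: float, speedup: float) -> float:
    """Search time after applying a fold speedup to a baseline search time."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_seconds / speedup


if __name__ == "__main__":
    # "Close to 1,000 structures per day" on one GPU (figure from the abstract).
    budget = seconds_per_structure(1000)
    print(f"Average budget per structure: {budget:.1f} s")

    # Hypothetical baseline: assume a homology search that takes 600 s.
    # This number is an illustrative assumption, not a measurement.
    baseline = 600.0
    for fold in (40, 60):  # speedup range quoted in the abstract
        t = accelerated_search_time(baseline, fold)
        print(f"{fold}-fold faster search: {t:.1f} s")
```

Under these assumptions, a throughput of 1,000 structures per day leaves an average of roughly a minute and a half per structure for search and model inference combined.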
Citations
Evolutionary-scale prediction of atomic level protein structure with a language model
Zeming Lin,Halil Akin,Roshan Ara Rao,Brian Hie,Zhong-li Zhu,Wenting Lu,Nikita Smetanin,Robert Verkuil,Ori Kabeli,Yaniv Shmueli,Allan dos Santos Costa,Maryam Fazel-Zarandi,Tom Sercu,Salvatore Candido,Alexander Rives +14 more
TL;DR: The ESM Metagenomic Atlas is the first large-scale structural characterization of metagenomic proteins, with more than 617 million structures, including more than 225 million high-confidence predictions.
2.2K
UCSF ChimeraX: Tools for Structure Building and Analysis.
Elaine C Meng,Thomas D. Goddard,E. Pettersen,Gregory S. Couch,Zach J Pearson,John H. Morris,T. Ferrin +6 more
TL;DR: New methods in the UCSF ChimeraX molecular modeling package are described that take advantage of machine‐learning structure predictions, provide likelihood‐based fitting in maps, and compute per‐residue scores to identify modeling errors.
1.1K
Harnessing protein folding neural networks for peptide–protein docking
TL;DR: AlphaFold2 generates peptide-protein complex models without requiring multiple sequence alignment information for the peptide partner, and can handle binding-induced conformational changes of the receptor.
Scientific discovery in the age of artificial intelligence
Hanchen Wang,Tianfan Fu,Yuanqi Du,Wenhao Gao,Kexin Huang,Ziming Liu,Payal Chandak,Shengchao Liu,Peter Van Katwyk,A Deac,Animashree Anandkumar,Karianne J. Bergen,Carla Gomes,Shirley Ho,Pushmeet Kohli,L. Lasenby,Jure Leskovec,Tie-Yan Liu,Arjun K. Manrai,Debora Marks,Bharath Ramsundar,Le Song,Jimeng Sun,Jian Tang,Petar Veličković,Max Welling,Linfeng Zhang,Connor W. Coley,Yoshua Bengio,Marinka Zitnik +29 more
TL;DR: This work examines breakthroughs over the past decade that include self-supervised learning, which allows models to be trained on vast amounts of unlabelled data, and geometric deep learning, which leverages knowledge about the structure of scientific data to enhance model accuracy and efficiency.
696
Large language models generate functional protein sequences across diverse families
Ali Madani,Ben Krause,Eric R. Greene,Subu Subramanian,Benjamin P. Mohr,James M. Holton,Jose L. Olmos,Caiming Xiong,Zachary Z Sun,Richard Socher,James S. Fraser,Nikhil Naik +11 more
TL;DR: ProGen is described, a language model that can generate protein sequences with a predictable function across large protein families, akin to generating grammatically and semantically correct natural language sentences on diverse topics.
600
References
Matplotlib: A 2D Graphics Environment
TL;DR: Matplotlib is a 2D graphics package used for Python for application development, interactive scripting, and publication-quality image generation across user interfaces and operating systems.
34.7K
Highly accurate protein structure prediction with AlphaFold
John M. Jumper,Richard O. Evans,Alexander Pritzel,Tim Green,Michael Figurnov,Olaf Ronneberger,Kathryn Tunyasuvunakool,Russell Bates,Augustin Žídek,Anna Potapenko,Alex Bridgland,Clemens Meyer,Simon A. A. Kohl,Andrew J. Ballard,Andrew Cowie,Bernardino Romera-Paredes,Stanislav Nikolov,R. D. Jain,Jonas Adler,Trevor Back,Stig Petersen,David Reiman,Ellen Clancy,Michal Zielinski,Martin Steinegger,Michalina Pacholska,Tamas Berghammer,Sebastian Bodenstein,David L. Silver,Oriol Vinyals,Andrew W. Senior,Koray Kavukcuoglu,Pushmeet Kohli,Demis Hassabis +33 more
TL;DR: AlphaFold predicts protein structures with an accuracy competitive with experimental structures in the majority of cases using a novel deep learning architecture, including cases in which no homologous structure is available.
Accelerated Profile HMM Searches
Sean R. Eddy
- 01 May 2015
TL;DR: This work describes an acceleration heuristic for profile HMMs, the “multiple segment Viterbi” (MSV) algorithm, which computes an optimal sum of multiple ungapped local alignment segments using a striped vector-parallel approach previously described for fast Smith/Waterman alignment.
Pfam: The protein families database in 2021.
Jaina Mistry,Sara Chuguransky,Lowri Williams,Matloob Qureshi,Gustavo A. Salazar,Erik L. L. Sonnhammer,Silvio C. E. Tosatto,Lisanna Paladin,Shriya Raj,Lorna Richardson,Robert D. Finn,Alex Bateman +11 more
TL;DR: The Pfam database is a widely used resource for classifying protein sequences into families and domains. This release reintroduces Pfam-B, which provides an automatically generated supplement to Pfam and contains 136 730 novel clusters of sequences that are not yet matched by a Pfam family.
5.6K