Open Access, Proceedings Article
Therapeutics Data Commons: Machine Learning Datasets and Tasks for Drug Discovery and Development
Kexin Huang, Tianfan Fu, Wenhao Gao, Yue Zhao, Yusuf H. Roohani, Jure Leskovec, Connor W. Coley, Cao Xiao, Jimeng Sun, Marinka Zitnik +9 more
- 18 Feb 2021
TL;DR: The Therapeutics Data Commons (TDC) is an open-science platform for systematically accessing and evaluating machine learning across the entire range of therapeutics; it includes 66 AI-ready datasets spread across 22 learning tasks.
Abstract: Therapeutics machine learning is an emerging field with incredible opportunities for innovation and impact. However, advancement in this field requires formulation of meaningful learning tasks and careful curation of datasets. Here, we introduce Therapeutics Data Commons (TDC), the first unifying platform to systematically access and evaluate machine learning across the entire range of therapeutics. To date, TDC includes 66 AI-ready datasets spread across 22 learning tasks and spanning the discovery and development of safe and effective medicines. TDC also provides an ecosystem of tools and community resources, including 33 data functions and types of meaningful data splits, 23 strategies for systematic model evaluation, 17 molecule generation oracles, and 29 public leaderboards. All resources are integrated and accessible via an open Python library. We carry out extensive experiments on selected datasets, demonstrating that even the strongest algorithms fall short of solving key therapeutics challenges, including real dataset distributional shifts, multi-scale modeling of heterogeneous data, and robust generalization to novel data points. We envision that TDC can facilitate algorithmic and scientific advances and considerably accelerate machine-learning model development, validation and transition into biomedical and clinical implementation. TDC is an open-science initiative available at this https URL.
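The abstract singles out meaningful data splits and generalization to novel data points as central concerns. One way to picture such a split is a group hold-out, where every entity in the test set (for example, a drug) is absent from training. The sketch below is a plain-Python illustration only: it is not TDC's API, the function name `group_holdout_split` is hypothetical, and the toy records are invented for the example.

```python
import random
from collections import defaultdict


def group_holdout_split(records, group_key, test_frac=0.2, seed=0):
    """Split records so that no group appears in both train and test.

    Hypothetical helper for illustration; `test_frac` is the fraction of
    groups (not of records) that is held out.
    """
    # Bucket the records by their group identifier (e.g. drug ID).
    groups = defaultdict(list)
    for rec in records:
        groups[group_key(rec)].append(rec)

    # Shuffle the group IDs reproducibly, then hold out the first n_test.
    group_ids = sorted(groups)
    random.Random(seed).shuffle(group_ids)
    n_test = max(1, round(len(group_ids) * test_frac))

    test = [r for g in group_ids[:n_test] for r in groups[g]]
    train = [r for g in group_ids[n_test:] for r in groups[g]]
    return train, test


# Toy (drug_id, target_id, label) records, invented for this example.
records = [
    (f"drug{d}", f"target{t}", (d + t) % 2)
    for d in range(10)
    for t in range(3)
]

train, test = group_holdout_split(records, group_key=lambda r: r[0])
train_drugs = {r[0] for r in train}
test_drugs = {r[0] for r in test}
```

Because whole groups are held out, a model scored on `test` is evaluated only on drugs it never saw during training, which is a stricter setting than a random per-record split.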
Citations
Integrating structure-based approaches in generative molecular design.
TL;DR: In this paper, the authors focus on recent approaches that incorporate protein structure into de novo molecule optimization in an attempt to maximize the predicted on-target binding affinity of generated molecules.
43
A Systematic Survey of Chemical Pre-trained Models
Jun Xia, Yanqiao Zhu, Yuanqi Du, Stan Z. Li +3 more
- 01 Aug 2023
TL;DR: A systematic survey of chemical pre-trained models summarizing the current progress and challenges in the field.
BiComp-DTA: Drug-target binding affinity prediction through complementary biological-related and compression-based featurization approach
Mahmood Kalemati, Somayyeh Koohi +1 more
TL;DR: BiComp-DTA employs Normalized Compression Distance and Smith-Waterman measures to capture complementary information from the algorithmic information theory and biological domains, and uses the proposed measure to encode the input proteins fed to a new deep neural network-based method for drug-target binding affinity prediction.
Quantum Machine Learning Predicting ADME-Tox Properties in Drug Discovery.
TL;DR: A quantum machine learning framework that pairs a classical support vector classifier algorithm with a kernel-based quantum classifier; the quantum model achieved the best performance compared with its classical counterparts in terms of the area under the receiver operating characteristic curve.
15
COATI: Multimodal Contrastive Pretraining for Representing and Traversing Chemical Space.
Benjamin Kaufman, Edward C Williams, Carl Underkoffler, Ryan Pederson, Narbe Mardirossian, Ian Watson, John Parkhill +6 more
TL;DR: This work presents contrastive optimization for accelerated therapeutic inference (COATI), a pretrained, multimodal encoder-decoder model of druglike chemical space that possesses many of the desired properties of universal molecular embedding.
11
Related Papers (5)
Adriana Tomic, Ivan Tomic, Levi Waldron, Ludwig Geistlinger, Max Kuhn, Rachel L. Spreng, Lindsay C. Dahora, Kelly E. Seaton, Georgia D. Tomaras, Jennifer Hill, Niharika A. Duggal, Ross D. Pollock, Norman R. Lazarus, Stephen D. R. Harridge, Janet M. Lord, Purvesh Khatri, Andrew J. Pollard, Mark M. Davis +20 more
- 08 Jan 2021