Chemoinformatics and Machine Learning Approaches for Identifying Antiviral Compounds.
Lijo John, Yarasi Soujanya, Hridoy Jyoti Mahanta, G. Narahari Sastry
TL;DR: A set of 2358 antiviral compounds, whose activity was reported as IC50 values, was compiled from the CAS COVID-19 antiviral SAR dataset, and the most highly correlated descriptors were selected using tree-based, correlation-based and mutual information-based feature selection methods.
Abstract: Current pandemics have propelled research efforts in an unprecedented fashion, primarily triggering computational efforts towards new vaccine and drug development as well as drug repurposing. There is an urgent need to design novel drugs with targeted biological activity and minimal adverse reactions that may be useful in managing viral outbreaks. Hence, an attempt has been made to develop machine learning based predictive models that can be used to assess whether or not a compound has the potency to be antiviral. To this end, a set of 2358 antiviral compounds, whose activity was reported as IC50 values, was compiled from the CAS COVID-19 antiviral SAR dataset. A total of 1157 two-dimensional molecular descriptors were computed, among which the most highly correlated descriptors were selected using tree-based, correlation-based and mutual information-based feature selection methods. Seven machine learning algorithms, i.e., Random Forest, XGBoost, Support Vector Machine, KNN, Decision Tree, MLP Classifier and Logistic Regression, were benchmarked. The best performance was achieved by the models developed using the Random Forest and XGBoost algorithms across all the feature selection methods. The maximum predictive accuracy of both these models was 88% with internal validation, whereas with an external dataset a maximum accuracy of 93.10% for the XGBoost-based model and 100% for the Random Forest-based model was achieved. Furthermore, the study demonstrated scaffold analysis of the molecules as a pragmatic approach to explore the importance of structurally diverse compounds in data-driven studies.
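The abstract names three feature selection strategies but does not give their settings. As a minimal sketch of the two filter-style strategies only (correlation-based and mutual information-based), the pure-Python example below scores each descriptor against a binary activity label and keeps the top-ranked ones. The descriptor names, toy values, median-split discretization and the `top_k` helper are illustrative assumptions, not details from the paper; tree-based selection and the classifier benchmark are not shown.

```python
import math
from collections import Counter
from statistics import median


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y])) for (x, y), c in pxy.items()
    )


def median_split(values):
    """Discretize a continuous descriptor into 0/1 bins (an assumed, simple choice)."""
    m = median(values)
    return [1 if v > m else 0 for v in values]


def top_k(scores, k):
    """Names of the k highest-scoring descriptors."""
    return sorted(scores, key=lambda name: -scores[name])[:k]


# Hypothetical toy data: 1 = active, 0 = inactive (labels assumed to be
# derived from an IC50 cutoff that the abstract does not specify).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
descriptors = {
    "desc_a": [5.0, 6.0, 5.5, 6.5, 1.0, 2.0, 1.5, 2.5],  # tracks the label
    "desc_b": [1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0],  # unrelated to the label
    "desc_c": [3.0] * 8,                                 # constant descriptor
}

# Correlation-based score: absolute Pearson r between descriptor and label.
corr_scores = {name: abs(pearson(v, labels)) for name, v in descriptors.items()}

# Mutual information-based score: MI between the binned descriptor and label.
mi_scores = {
    name: mutual_information(median_split(v), labels)
    for name, v in descriptors.items()
}

print("by correlation:", top_k(corr_scores, 1))
print("by mutual information:", top_k(mi_scores, 1))
```

On this toy data both scores rank `desc_a` first, while the unrelated and constant descriptors score zero. With real descriptor tables, a library implementation with continuous-variable mutual information estimators would usually be preferable to the median split used here.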
Citations
Artificial intelligence in virtual screening: models versus experiments.
TL;DR: This review covers machine learning and deep learning-based scoring functions for solving classification and ranking problems in drug discovery, highlighting studies in which ML and DL models were successfully deployed to identify lead compounds for which experimental validation is available from bioassay studies.
Machine learning based dynamic consensus model for predicting blood-brain barrier permeability
TL;DR: Machine learning and deep learning-based predictive models were built using XGBoost, Random Forest, Extra-tree classifiers and a deep neural network for predicting BBB permeability.
OSADHI - An online structural and analytics based database for herbs of India
K. Kiewhuo, Dipshikha Gogoi, Hridoy Jyoti Mahanta, Ravindra K. Rawal, Debabrata Das, S. Vaikundamani, Esther Jamir, G. Narahari Sastry
TL;DR: A pan-India database of medicinal plants, along with their phytochemicals and geographical availability, has been developed; it consists of 6959 unique medicinal plants belonging to 348 families, which are available across 28 states and 8 union territories of India.
Towards systematic exploration of chemical space: building the fragment library module in molecular property diagnostic suite
Anamika Singh Gaur, Lijo John, Nandan Kumar, M Ram Vivek, Selvaraman Nagamani, Hridoy Jyoti Mahanta, G. Narahari Sastry
TL;DR: The fragment-based drug discovery (FBDD) approach has traditionally been of utmost significance in drug design studies; it allows the exploration of large chemical space to find novel scaffolds and chemotypes that can be improved into selective inhibitors with good affinity.
North East India medicinal plants database (NEI-MPDB)
K. Kiewhuo, Dipshikha Gogoi, Hridoy Jyoti Mahanta, Ravindra K. Rawal, Debabrata Das, G. Narahari Sastry
TL;DR: This article presents a comprehensive resource on medicinal plants, with a quantitative analysis of their phytochemicals, that can enhance knowledge of therapeutic indications and contribute to drug discovery and development.