Journal Article, DOI: 10.1021/CI100253R
Applicability domains for classification problems: Benchmarking of distance to models for Ames mutagenicity set.
Iurii Sushko, Sergii Novotarskyi, Robert Körner, Anil Kumar Pandey, Artem Cherkasov, Jiazhong Li, Paola Gramatica, Katja Hansen, Timon Schroeter, Klaus-Robert Müller, Lili Xi, Huanxiang Liu, Xiaojun Yao, Tomas Öberg, Farhad Hormozdiari, Phuong Dao, Cenk Sahinalp, Roberto Todeschini, Pavel G. Polishchuk, A. Artemenko, Victor E. Kuz’min, Todd M. Martin, Douglas M. Young, Denis Fourches, Eugene N. Muratov, Alexander Tropsha, Igor I. Baskin, Dragos Horvath, Gilles Marcou, Christophe Muller, A. Varnek, Volodymyr V. Prokopenko, Igor V. Tetko
240
TL;DR: This work demonstrates that the DMs based on an ensemble (consensus) model provide systematically better performance than other DMs and can be used to halve the cost of experimental measurements by providing a similar prediction accuracy.
Abstract: The estimation of accuracy and applicability of QSAR and QSPR models for biological and physicochemical properties represents a critical problem. The developed parameter of “distance to model” (DM) is defined as a metric of similarity between the training and test set compounds that have been subjected to QSAR/QSPR modeling. In our previous work, we demonstrated the utility and optimal performance of DM metrics that have been based on the standard deviation within an ensemble of QSAR models. The current study applies such analysis to 30 QSAR models for the Ames mutagenicity data set that were previously reported within the 2009 QSAR challenge. We demonstrate that the DMs based on an ensemble (consensus) model provide systematically better performance than other DMs. The presented approach identifies 30−60% of compounds having an accuracy of prediction similar to the interlaboratory accuracy of the Ames test, which is estimated to be 90%. Thus, the in silico predictions can be used to halve the cost of exp...
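The ensemble-based DM described in the abstract can be illustrated with a minimal sketch. It assumes that each model in the ensemble outputs a probability of the mutagenic class, that the consensus prediction is the mean probability thresholded at 0.5, and that compounds are ranked by ascending standard deviation across models. The function names `ensemble_std_dm` and `accuracy_by_coverage` and the 0.5 threshold are hypothetical choices made for illustration, not the authors' published procedure.

```python
from statistics import mean, pstdev


def ensemble_std_dm(probabilities):
    """Distance to model as the standard deviation across ensemble members.

    probabilities[m][i] is model m's predicted probability that compound i
    is mutagenic (assumed input layout). Returns one DM value per compound.
    """
    return [pstdev(column) for column in zip(*probabilities)]


def accuracy_by_coverage(probabilities, labels, coverage):
    """Consensus accuracy on the `coverage` fraction of compounds with lowest DM."""
    dm = ensemble_std_dm(probabilities)
    # Consensus class: mean probability thresholded at 0.5 (assumed rule).
    consensus = [int(mean(column) >= 0.5) for column in zip(*probabilities)]
    n_keep = max(1, round(coverage * len(labels)))
    # Rank compounds from lowest to highest ensemble disagreement.
    ranked = sorted(range(len(labels)), key=lambda i: dm[i])
    kept = ranked[:n_keep]
    hits = sum(consensus[i] == labels[i] for i in kept)
    return hits / len(kept)
```

Sweeping `coverage` from small to large values shows how much of a data set can be predicted at a target accuracy, which is the kind of trade-off behind the 30−60% figure quoted in the abstract.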
Citations
Extended Functional Groups (EFG): An Efficient Set for Chemical Characterization and Structure-Activity Relationship Studies of Chemical Compounds
TL;DR: An extension of a set previously used by the CheckMol software that covers in addition heterocyclic compound classes and periodic table groups is described, which demonstrates that EFG can be efficiently used to develop and interpret structure-activity relationship models.
1K
Deep Learning in Drug Discovery.
TL;DR: An overview of this emerging field of molecular informatics, the basic concepts of prominent deep learning methods are presented, and motivation to explore these techniques for their usefulness in computer‐assisted drug discovery and design is offered.
691
Time-Split Cross-Validation as a Method for Estimating the Goodness of Prospective Prediction.
TL;DR: Time-split selection should be used in addition to random selection as a standard for cross-validation in QSAR model building; it gives an R² that is closer to that of true prospective prediction than the R² from random selection or from the analog of leave-class-out selection.
283
ToxAlerts: a Web server of structural alerts for toxic chemicals and compounds with potential adverse reactions.
TL;DR: A Web-based platform for collecting and storing toxicological structural alerts from literature and for virtual screening of chemical libraries to flag potentially toxic chemicals and compounds that can cause adverse side effects is presented.
241
Computational Approaches in Preclinical Studies on Drug Discovery and Development.
Fengxu Wu, Yuquan Zhou, Langhui Li, Xianhuan Shen, Ganying Chen, Xiaoqing Wang, Xianyang Liang, Mengyuan Tan, Zunnan Huang
TL;DR: A systematic classification and description of the databases and software commonly used for ADMET prediction and some applications that are related to the prediction categories and web tools are listed.
References
Random Forests
Leo Breiman
- 01 Oct 2001
TL;DR: Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the forest, and are also applicable to regression.
•Book
Elements of information theory
Thomas M. Cover, Joy A. Thomas
- 01 Jan 1991
TL;DR: The author examines the role of entropy, inequality, and randomness in the design of codes and the construction of codes in the rapidly changing environment.
LIBSVM: A library for support vector machines
Chih-Chung Chang, Chih-Jen Lin
TL;DR: Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
Statistical learning theory
Vladimir Vapnik
- 01 Jan 1998
TL;DR: Presenting a method for determining the necessary and sufficient conditions for consistency of learning process, the author covers function estimates from small data pools, applying these estimations to real-life problems, and much more.
30.4K
•Book
Data Mining: Practical Machine Learning Tools and Techniques
Ian H. Witten, Eibe Frank, Mark Hall
- 25 Oct 1999
TL;DR: This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining.
25.4K