Abstract: In ligand-based drug design, quantitative structure–activity
relationship (QSAR) models play an important role in activity prediction.
One of the major end points of QSAR models is the half-maximal inhibitory
concentration (IC<sub>50</sub>). Experimental IC<sub>50</sub> data
from various research groups have been accumulated in publicly accessible
databases, providing an opportunity for us to use such data in predictive
QSAR models. In this study, we focused on using a ranking-oriented
QSAR model as a predictive model because the relative potency of compounds
within the same assay is reliable information that does not depend on any
mechanistic assumptions. We conducted rigorous validation using the
ChEMBL database and previously reported data sets. Ranking support
vector machine (ranking-SVM) models trained on compounds from similar
assays were as good as support vector regression (SVR) with the Tanimoto
kernel trained on compounds from all the assays. For effective data
integration, ranking-SVM should incorporate compounds only from assays
whose compound sets are similar, whereas SVR with the Tanimoto kernel
can incorporate all compounds from different assays.
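The two ingredients named above can be illustrated with a minimal sketch. It assumes fingerprints are stored as sets of on-bit indices and that each record is an `(assay_id, compound_id, IC50)` tuple; the function names and toy data are hypothetical and are not taken from the study. The sketch shows only the Tanimoto similarity and how within-assay preference pairs could be built for a ranking model, not the trained ranking-SVM or SVR themselves.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    a, b = set(a), set(b)
    union = len(a | b)
    # Assumed convention: two empty fingerprints count as identical.
    return 1.0 if union == 0 else len(a & b) / union


def within_assay_pairs(records):
    """Return (more_potent_id, less_potent_id) pairs.

    Only compounds measured in the same assay are compared, so no
    cross-assay calibration is needed. A lower IC50 means higher potency.
    """
    by_assay = {}
    for assay_id, compound_id, ic50 in records:
        by_assay.setdefault(assay_id, []).append((compound_id, ic50))

    pairs = []
    for compounds in by_assay.values():
        for (c1, v1), (c2, v2) in combinations(compounds, 2):
            if v1 == v2:
                continue  # ties carry no ranking information
            pairs.append((c1, c2) if v1 < v2 else (c2, c1))
    return pairs


# Hypothetical toy data: (assay_id, compound_id, IC50 in nM)
records = [
    ("assay_A", "cpd1", 10.0),
    ("assay_A", "cpd2", 250.0),
    ("assay_B", "cpd3", 5.0),
    ("assay_B", "cpd1", 40.0),
]

print(within_assay_pairs(records))     # [('cpd1', 'cpd2'), ('cpd3', 'cpd1')]
print(tanimoto({1, 2, 3}, {2, 3, 4}))  # 0.5
```

Note that `cpd1` is never compared with `cpd3` across assays using raw IC<sub>50</sub> values; the only information used is the ordering inside each assay.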
Abstract: Organic small molecules are proven to be capable of passivating
the bulk/interfacial defects in inorganic perovskite solar cells.
Because screening functional small molecules is burdensome, we employ
a modified machine learning (ML) strategy to guide the screening of
small molecules suitable for efficient solar cells. Three modified ML
algorithms are used to construct the prediction model: (i) random
forest (RF), (ii) support vector regression (SVR), and (iii) XGBoost.
Among them, the XGBoost algorithm displays the best overall predictive
performance, with an <i>R</i><sup>2</sup> of 0.939. Accordingly, eight small molecules
are selected to modify the interface of perovskite films, and both
the theoretical and experimental results confirm that difluorobenzylamine,
with its additional fluorine atoms, has a better interface modification
effect than the other small molecules containing functional groups such as
the benzene ring and amino group. The high accuracy of the modified
machine learning model enables us to simplify the small-molecule screening
process and represents an important step for ongoing developments in perovskite
solar cells and other optoelectronic devices.
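For readers unfamiliar with the <i>R</i><sup>2</sup> index quoted above, the following is a minimal sketch of the standard coefficient of determination. The function name and the toy numbers are hypothetical; they are not the study's data, and the abstract does not state which implementation was used.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all targets are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical measured vs. predicted values (arbitrary units)
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
print(round(r_squared(measured, predicted), 3))  # 0.8
```

A value of 1 means perfect prediction, and a model that always predicts the mean of the targets scores 0.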
Abstract: The combination of surface-enhanced Raman spectroscopy (SERS)
and machine learning (ML) can differentiate among a wide range of
bacterial strains. However, achieving rapid and robust bacterial
detection remains a challenge. This study describes a bacterial
classification and identification method based on the amoxicillin
response, which categorizes bacteria using SERS and ML in less than
20 min. The bacterial
specimens are subjected to pharmacological stimulation, which induces the
release of purine molecules that are integral to metabolic processes.
Because these molecules enter SERS hot spots more readily than the
bacteria themselves, stable SERS signals can be acquired
consistently. Experimental evidence demonstrates
that the interaction of S. aureus, E. coli, S. epidermidis, C. albicans, and K. pneumoniae with amoxicillin contributes to an
enhancement in the stability and signal intensity of bacterial SERS.
Applying a random forest (RF) model to pure bacterial samples yields
a classification accuracy of 99%. Furthermore, applying three models,
support vector machine (SVM), RF, and CNN-LSTM-Attention (CLA), to
clinical samples gives final classification accuracies of 92%, 87%,
and 96%, respectively. This approach establishes
a rapid, straightforward, and stable classification methodology for
SERS-based bacterial detection, demonstrating significant potential
for clinical diagnostic applications.
Abstract: Understanding the phase stability of gas hydrates under confinement
is fundamental to the evolution of the geological stability of gas hydrate
systems on Earth. Herein, the phase stability of CH<sub>4</sub> and
CO<sub>2</sub> hydrates under confinement is predicted by machine
learning. Three machine learning models, namely support vector
machine, random forest, and gradient boosting decision tree, are constructed
for this purpose. Our results show that the support vector machine model
has the highest prediction accuracy and the random forest model the
lowest in determining the phase stability of confined gas hydrates.
Based on their performance in predicting the
phase stability of confined gas hydrates, the support vector machine
model with a training set fraction of 0.7 is finally chosen to deal
with the unknown phase stability of confined gas hydrates. Importantly,
the average accuracy of the support vector machine model can reach
more than 90% in predicting the unknown phase stability of both CH<sub>4</sub> and CO<sub>2</sub> hydrates. The trained machine learning
models can help us to quickly and accurately determine the phase stability
of CH<sub>4</sub> and CO<sub>2</sub> hydrates under confinement in
future applications.
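A training set fraction of 0.7 and a classification accuracy can be made concrete with a small sketch. The abstract does not say how the split was drawn, so a seeded random shuffle is assumed here; the function names and the "stable"/"unstable" labels are hypothetical.

```python
import random


def split_indices(n_samples, train_fraction=0.7, seed=0):
    """Shuffle sample indices and split them into training and test parts.

    The seeded shuffle is an assumption made for reproducibility.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(round(train_fraction * n_samples))
    return indices[:n_train], indices[n_train:]


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


train_idx, test_idx = split_indices(10, train_fraction=0.7)
print(len(train_idx), len(test_idx))  # 7 3

# Hypothetical phase-stability labels for four held-out conditions
true_labels = ["stable", "unstable", "stable", "stable"]
pred_labels = ["stable", "unstable", "unstable", "stable"]
print(accuracy(true_labels, pred_labels))  # 0.75
```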
TL;DR: This study proposes machine learning models for rapid screening of new psychoactive substances (NPS) using mass spectrometric data, achieving F1 scores of 0.35–0.97 in cross validation, and applies them to six seizures, where the models raised warning signals for NPS.
Abstract: Over the past few years, new psychoactive substances (NPS) have
become a global health and social problem because of their wide variety,
constant structural renewal, vague legal definitions, and rapid adaptation
to legal restrictions. The rapid structural modifications of NPS have
posed significant challenges for the screening and identification
of these new substances using traditional mass spectrometric techniques
based on reference substances or a mass spectral database. Here, we
propose supervised machine learning (ML) classification models such
as k-nearest neighbors, support vector machine, random forest, and
multigrained cascade forest for the rapid screening of NPS using mass
spectrometric data. This approach utilizes ML methods to learn the
statistical probability distributions of mass spectral data for NPS
and non-NPS. Four classification ML models were generated and evaluated
using a data set comprising 567 LC-MS and 732 GC-MS spectra. In
cross validation, the models achieved F1 scores ranging from 0.35 to
0.97. These algorithms were applied in conjunction with mass spectrometry
techniques to six seizures, including electronic cigarette oil
and suspected powdered substances seized in drug trafficking cases.
The models provided warning signals for synthetic cannabinoids, synthetic
cathinones, and fentanyl. Thus, an early warning system was successfully
established, providing a useful method for the reliable and effective
identification of unknown NPS.
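The F1 score reported above is the harmonic mean of precision and recall for the positive class. A minimal sketch follows, assuming binary labels "NPS" and "non-NPS"; the function name and toy labels are hypothetical and do not come from the study's spectra.

```python
def f1_score(y_true, y_pred, positive="NPS"):
    """F1 = 2*TP / (2*TP + FP + FN) for the chosen positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have equal length")
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # assumed convention when no true positives are found
    return 2 * tp / (2 * tp + fp + fn)


# Hypothetical screening result for five spectra
truth = ["NPS", "NPS", "NPS", "non-NPS", "non-NPS"]
calls = ["NPS", "NPS", "non-NPS", "NPS", "non-NPS"]
print(round(f1_score(truth, calls), 3))  # 0.667
```

Here two NPS spectra are flagged correctly, one is missed, and one non-NPS spectrum is flagged in error, so precision and recall are both 2/3.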