TL;DR: A novel approach combining traditional statistics and AI/ML techniques identifies 18 transcriptomic biomarkers for accurate CVD prediction with up to 96% accuracy.
Abstract: Personalized interventions are deemed vital given the intricate characteristics, advancement, inherent genetic composition, and diversity of cardiovascular diseases (CVDs). The appropriate utilization of artificial intelligence (AI) and machine learning (ML) methodologies can yield novel understandings of CVDs, enabling improved personalized treatments through predictive analysis and deep phenotyping. In this study, we proposed and employed a novel approach combining traditional statistics and a nexus of cutting-edge AI/ML techniques to identify significant biomarkers for our predictive engine by analyzing the complete transcriptome of CVD patients. After robust gene expression data pre-processing, we utilized three statistical tests (Pearson correlation, Chi-square test, and ANOVA) to assess the differences in transcriptomic expression and clinical characteristics between healthy individuals and CVD patients. Next, the recursive feature elimination classifier assigned rankings to transcriptomic features based on their relation to the case–control variable. The top ten percent of commonly observed significant biomarkers were evaluated using four unique ML classifiers (Random Forest, Support Vector Machine, Xtreme Gradient Boosting Decision Trees, and k-Nearest Neighbors). After optimizing hyperparameters, the ensembled models, which were implemented using a soft voting classifier, accurately differentiated between patients and healthy individuals. We have uncovered 18 transcriptomic biomarkers that are highly significant in the CVD population and were used to predict disease with up to 96% accuracy. Additionally, we cross-validated our results with clinical records collected from patients in our cohort. The identified biomarkers served as potential indicators for early detection of CVDs.
With its successful implementation, our newly developed predictive engine provides a valuable framework for identifying patients with CVDs based on their biomarker profiles.
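The soft voting step mentioned in the abstract can be illustrated with a minimal sketch. It assumes each classifier already outputs class probabilities for a sample; the function name `soft_vote` and the probability values are hypothetical and are not taken from the study.

```python
def soft_vote(probabilities, weights=None):
    """Average per-class probabilities from several classifiers, then pick the argmax.

    probabilities: one list of class probabilities per classifier.
    weights: optional per-classifier weights (equal weights if omitted).
    """
    if weights is None:
        weights = [1.0] * len(probabilities)
    total = sum(weights)
    n_classes = len(probabilities[0])
    # Weighted mean of each class probability across classifiers.
    averaged = [
        sum(w * p[c] for w, p in zip(weights, probabilities)) / total
        for c in range(n_classes)
    ]
    predicted = max(range(n_classes), key=lambda c: averaged[c])
    return predicted, averaged


# Hypothetical outputs of four classifiers for one sample: [P(healthy), P(CVD)].
model_probs = [[0.40, 0.60], [0.55, 0.45], [0.30, 0.70], [0.45, 0.55]]
label, avg = soft_vote(model_probs)
```

Unlike hard (majority) voting, soft voting lets a confident classifier outweigh several marginal ones, which is one plausible reason to prefer it when the base models produce calibrated probabilities.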
TL;DR: The fusion of finite element and machine learning methods accurately predicts rock shear strength parameters. The model utilizes finite element analysis software and machine learning algorithms to predict cohesion and friction angle. The results are in good agreement with laboratory data and offer a valuable reference for accurate parameter prediction.
Abstract: The trial-and-error method for calibrating rock mechanics parameters is complex and time-consuming, and its accuracy is difficult to ensure. This study harnesses the repeatability and scalability intrinsic to numerical simulation and combines them with the data-driven attributes of machine learning methods. The study utilised the finite element analysis software RS2 to establish 252 sets of sandstone sample data. The Recursive Feature Elimination with Cross-Validation (RFECV) method was employed for feature selection. The shear strength parameters of sandstone were predicted using machine learning models optimised by the Particle Swarm Optimization (PSO) algorithm, including BP neural network (BP), Bayesian Ridge Regression (BRR), Support Vector Regression (SVR), and Light Gradient Boosting Machine (LightGBM). The predicted value of cohesion is proposed as an input feature to predict the friction angle. The results indicate that the optimal input characteristics for predicting cohesion are elastic modulus, Poisson's ratio, peak stress, and peak strain, while the optimal input characteristics for predicting friction angle are peak stress and cohesion. The PSO-SVR model demonstrates the best performance. The maximum errors between the predicted values of cohesion and friction angle and the calculated results of the RSData program are 3.5% and 4.31%, respectively. The finite element calculation is in good agreement with the stress-strain curve obtained in the laboratory. The sensitivity analysis indicates that SVR's prediction performance for cohesion and friction angle tends to be stable when the sample size is greater than 25. These results offer a valuable reference for accurately predicting rock mechanics parameters.
TL;DR: The proposed work aims to automate water quality estimation using AI and explain the significant parameters contributing to water potability. The work uses various ML models and XAI techniques to classify water quality and explain the reasoning behind the classification.
Abstract: The consumption of water underpins the physical health of most living species, and hence management of its purity and quality is essential, as contaminated water has the potential to create adverse health and environmental consequences. This creates a dire necessity to measure, control and monitor the quality of water. The primary contaminant present in water is Total Dissolved Solids (TDS), which is hard to filter out. Apart from mere solids, water contains various other substances such as potassium, sodium, chlorides, lead, nitrate, cadmium, arsenic and other pollutants. The proposed work aims to automate water quality estimation through Artificial Intelligence and uses Explainable Artificial Intelligence (XAI) to explain the most significant parameters contributing towards the potability of water and the estimation of the impurities. XAI provides the transparency and justifiability of a white-box model, whereas the Machine Learning (ML) model on its own is a black box that cannot describe the reasoning behind its classification. The proposed work uses various ML models such as Logistic Regression, Support Vector Machine (SVM), Gaussian Naive Bayes, Decision Tree (DT) and Random Forest (RF) to classify whether the water is drinkable. The various representations of XAI such as force plot, test patch, summary plot, dependency plot and decision plot generated in SHAPELY explainer explain the significant features, prediction score, feature importance and justification behind the water quality estimation. The RF classifier is selected for the explanation and yields optimum Accuracy and F1-Score of 0.9999, with Precision and Recall of 0.9997 and 0.998 respectively. Thus, the work is an exploratory analysis of the estimation and management of water quality with indicators associated with their significance. This is an emerging line of research, with a vision of addressing water quality for the future as well.
TL;DR: This study investigates the impact of varying train-test split ratios on machine learning model performance using the BraTS 2013 dataset, revealing significant variations in accuracies and emphasizing the need to strike a balance to avoid overfitting or underfitting.
Abstract: Artificial intelligence (AI) and machine learning (ML) aim to mimic human intelligence and enhance decision making processes across various fields. A key performance determinant in a ML model is the ratio between the training and testing dataset. This research investigates the impact of varying train-test split ratios on machine learning model performance and generalization capabilities using the BraTS 2013 dataset. Logistic regression, random forest, k nearest neighbors, and support vector machines were trained with split ratios ranging from 60:40 to 95:05. Findings reveal significant variations in accuracies across these ratios, emphasizing the critical need to strike a balance to avoid overfitting or underfitting. The study underscores the importance of selecting an optimal train-test split ratio that considers tradeoffs such as model performance metrics, statistical measures, and resource constraints. Ultimately, these insights contribute to a deeper understanding of how ratio selection impacts the effectiveness and reliability of machine learning applications across diverse fields.
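A minimal sketch of how the split ratios in such an experiment could be generated is shown below. The helper `split_by_ratio`, the placeholder data, and the intermediate ratios between 60:40 and 95:05 are assumptions for illustration; the study's actual procedure is not described in the abstract.

```python
import random


def split_by_ratio(samples, train_percent, seed=0):
    """Shuffle a copy of the samples and cut it into train/test parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = len(shuffled) * train_percent // 100
    return shuffled[:cut], shuffled[cut:]


data = list(range(200))               # placeholder for 200 labelled samples
train_percents = [60, 70, 80, 90, 95]  # 60:40 up to 95:05
sizes = {}
for pct in train_percents:
    train, test = split_by_ratio(data, pct)
    sizes[pct] = (len(train), len(test))
```

With a fixed dataset, a larger training share leaves fewer test samples, so the accuracy estimate itself becomes noisier; this is part of the trade-off the abstract refers to.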
TL;DR: Adaptive SV-Borderline SMOTE-SVM algorithm effectively addresses imbalanced data classification by generating reliable and diverse new samples based on the SV+ neighbors and their class distribution.
Abstract: In recent years, imbalanced data classification has emerged as a challenging task. To address this issue, we propose an adaptive SV-Borderline SMOTE-SVM (Synthetic Minority Oversampling Technique-Support Vector Machine) algorithm, specifically designed to overcome the challenges associated with imbalanced data classification. The algorithm begins by mapping the dataset into the kernel space using SVM to identify the class boundary samples, known as support vectors (SVs). Subsequently, the neighbors of positive sample’s support vector (SV+) are calculated based on the kernel distance. Based on the class distribution of these neighbors, the SV+ samples are labeled as either “concave” or “convex”. Based on these labels, new samples are adaptively generated using two distinct calculation approaches for different labeled SV+ samples. To construct the SVM decision function without requiring the explicit expression of new samples in the kernel space, a Gram matrix is designed. Notably, all the processes ensure the credibility and reliability of the new samples. Additionally, the adaptive interpolation approach helps to ensure the security and diversity of new samples. Extensive experiments were conducted on a set of 50 KEEL datasets to evaluate the performance of our proposed method for imbalanced data classification. In experiments, our method achieved the highest G-mean score in 33 datasets and the highest F-values in 32 datasets. These results highlight the effectiveness and superiority of our proposed method compared to other approaches in addressing the challenges of imbalanced data classification.
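For orientation, the basic SMOTE interpolation that this family of methods builds on can be sketched as follows. This is the standard input-space rule, not the adaptive kernel-space procedure proposed in the paper; the function name `smote_sample` and the example points are hypothetical.

```python
import random


def smote_sample(x, neighbor, rng):
    """Create a synthetic point on the segment between x and a minority-class neighbor."""
    gap = rng.random()  # uniform in [0, 1)
    return [xi + gap * (ni - xi) for xi, ni in zip(x, neighbor)]


rng = random.Random(42)
minority_point = [1.0, 2.0]
minority_neighbor = [3.0, 6.0]
synthetic = smote_sample(minority_point, minority_neighbor, rng)
```

Because the new sample always lies between two existing minority samples, its reliability depends on where those samples sit relative to the class boundary, which is the issue the "concave"/"convex" labelling in the abstract is meant to address.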
TL;DR: Quantum support vector machines are affected by statistical uncertainty due to the probabilistic nature of quantum mechanics, which has a major impact on the complexity of their training algorithms. The training complexity is analyzed in terms of the number of quantum circuit evaluations for both the dual and the kernelized primal formulations.
Abstract: Quantum support vector machines employ quantum circuits to define the kernel function. It has been shown that this approach offers a provable exponential speedup compared to any known classical algorithm for certain data sets. The training of such models corresponds to solving a convex optimization problem either via its primal or dual formulation. Due to the probabilistic nature of quantum mechanics, the training algorithms are affected by statistical uncertainty, which has a major impact on their complexity. We show that the dual problem can be solved in O(M^4.67/ε^2) quantum circuit evaluations, where M denotes the size of the data set and ε the solution accuracy compared to the ideal result from exact expectation values, which is only obtainable in theory. We prove under an empirically motivated assumption that the kernelized primal problem can alternatively be solved in O(min{M^2/ε^6, 1/ε^10}) evaluations by employing a generalization of a known classical algorithm called Pegasos. Accompanying empirical results demonstrate these analytical complexities to be essentially tight. In addition, we investigate a variational approximation to quantum support vector machines and show that their heuristic training achieves considerably better scaling in our experiments.
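The classical kernelized Pegasos algorithm referred to above can be sketched as follows. This is a minimal illustration under stated assumptions: a classical RBF kernel with exact values stands in for the quantum kernel (which would be estimated from a finite number of circuit evaluations), and the toy data, the regularization value `lam`, and all function names are hypothetical. It is not the generalization analyzed in the paper.

```python
import math
import random


def rbf_kernel(a, b, gamma=1.0):
    """Classical RBF kernel, used here as a stand-in for a quantum kernel."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def kernel_pegasos(X, y, lam=0.1, n_iters=500, kernel=rbf_kernel, seed=0):
    """Kernelized Pegasos: stochastic sub-gradient descent on the primal SVM objective."""
    rng = random.Random(seed)
    m = len(X)
    gram = [[kernel(X[i], X[j]) for j in range(m)] for i in range(m)]
    alpha = [0] * m  # alpha[j] counts how often sample j violated the margin
    for t in range(1, n_iters + 1):
        i = rng.randrange(m)
        margin = y[i] * sum(alpha[j] * y[j] * gram[i][j] for j in range(m)) / (lam * t)
        if margin < 1:
            alpha[i] += 1
    return alpha


def predict(alpha, X, y, x_new, kernel=rbf_kernel):
    """Sign of the kernel expansion; the positive scale 1/(lam * T) does not affect it."""
    score = sum(a * yj * kernel(xj, x_new) for a, xj, yj in zip(alpha, X, y))
    return 1 if score >= 0 else -1


# Toy, well-separated two-class data (labels in {-1, +1}).
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [3.0, 3.0], [3.0, 4.0], [4.0, 3.0]]
y = [-1, -1, -1, 1, 1, 1]
alpha = kernel_pegasos(X, y)
```

Each iteration touches one row of the Gram matrix, so in a quantum setting the number of kernel evaluations, and the precision each needs, drives the overall cost; that is the quantity the abstract's bounds count.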
TL;DR: A single-step multiclass SVM based on quantum annealing for remote sensing data classification achieves comparable accuracy to standard SVM methods while scaling much more efficiently with the number of training examples.
Abstract: In recent years, the development of quantum annealers has enabled experimental demonstrations and has increased research interest in applications of quantum annealing, such as in quantum machine learning and in particular for the popular quantum Support Vector Machine (SVM). Several versions of the quantum SVM have been proposed, and quantum annealing has been shown to be effective in them. Extensions to multiclass problems have also been made, which consist of an ensemble of multiple binary classifiers. This work proposes a novel quantum SVM formulation for direct multiclass classification based on quantum annealing, called Quantum Multiclass SVM (QMSVM). The multiclass classification problem is formulated as a single quadratic unconstrained binary optimization problem solved with quantum annealing. The main objective of this work is to evaluate the feasibility, accuracy, and time performance of this approach. Experiments have been performed on the D-Wave Advantage quantum annealer for a classification problem on remote sensing data. Results indicate that, despite the memory demands of the quantum annealer, QMSVM can achieve an accuracy that is comparable to standard SVM methods, such as the one-versus-one (OVO), depending on the dataset (compared to OVO: 0.8663 vs 0.8598 on Toulouse, 0.8123 vs 0.8521 on Potsdam). More importantly, it scales much more efficiently with the number of training examples, resulting in nearly constant time (compared to OVO: 85.72s vs 248.02s on Toulouse, 58.89s vs 580.17s on Potsdam). This work shows an approach for bringing together classical and quantum computation, solving practical problems in remote sensing with current hardware.
TL;DR: Machine learning models effectively predict bed sediment load based on hydraulic, hydrological, and sedimentary factors. The models exhibit superior performance compared to other methods, with low RMSE and high R2 values.
Abstract: The intricate calculation of bed sediment load (BSL), which is influenced by hydraulic, hydrological, and sedimentary factors, is vital for informed decision-making in water resource management. Machine learning models, which are gaining popularity due to their accessibility and ability to reveal complex relationships, play a significant role in tackling these challenges. The efficacy of gene expression programming (GEP) models, support vector machines (SVMs), multi-layer perceptron (MLP), and multivariate adaptive regression splines (MARS) has been assessed using 540 measured data points obtained from six rivers, namely Oak Creek, Nahal Yatir, Sagehen Creek, Elbow River, Jacoby River, and Goodwin Creek, from 1954 to 1992. The assessment of model performance has been conducted utilizing root mean square error (RMSE), R2, Nash–Sutcliffe coefficient (NSE), and developed discrepancy ratio (DDR) as indices. Following data normalization within the range of 0–1, the models underwent training and testing with a partition ratio of 80% for training and 20% for testing. Four dimensionless parameters, denoted as Fr = U/√gy, U/U*, Se, and ω = τU/γs√gyDs3, were employed as inputs in the models. The outcomes indicate that they exhibit superior performance compared to other methods, as evidenced by the following metrics in predicting BSL during the test stage: RMSE = 1.4088, NSE = 0.73054, R2 = 0.8729, and maximum QDDR(max) = 1.9564.
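The normalization and two of the evaluation indices named above follow standard definitions, sketched here in plain Python. The observed and predicted values are made-up numbers for illustration only. R2 is omitted because the abstract does not say which definition was used, and DDR is omitted because its formula is not given.

```python
import math


def min_max_normalize(values):
    """Rescale values linearly into the range 0-1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def rmse(observed, predicted):
    """Root mean square error."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 minus residual variance over observed variance."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance


observed = [2.0, 4.0, 6.0, 8.0]   # hypothetical measurements
predicted = [2.5, 3.5, 6.5, 7.5]  # hypothetical model outputs
scaled = min_max_normalize(observed)
error = rmse(observed, predicted)
efficiency = nse(observed, predicted)
```

An NSE of 1 indicates a perfect match, while a value of 0 means the model is no better than predicting the observed mean.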
TL;DR: This study explores the use of machine learning for early diagnosis of Autism Spectrum Disorder (ASD) by evaluating various classification and clustering models. The study finds that diverse models achieve high accuracy in ASD detection and clustering, paving the way for a faster and more cost-effective diagnosis.
Abstract: Autism Spectrum Disorder (ASD) is a neurological disease characterized by difficulties with social interaction, communication, and repetitive activities. While its primary origin lies in genetics, early detection is crucial, and leveraging machine learning offers a promising avenue for a faster and more cost-effective diagnosis. This study employs diverse machine learning methods to identify crucial ASD traits, aiming to enhance and automate the diagnostic process. We study eight state-of-the-art classification models to determine their effectiveness in ASD detection. We evaluate the models using accuracy, precision, recall, specificity, F1-score, area under the curve (AUC), kappa, and log loss metrics to find the best classifier for these binary datasets. Among all the classification models, for the children dataset, the SVM and LR models achieve the highest accuracy of 100% and for the adult dataset, the LR model produces the highest accuracy of 97.14%. Our proposed ANN model provides the highest accuracy of 94.24% for the new combined dataset when hyperparameters are precisely tuned for each model. Because almost all classification models, which utilize true labels, achieve high accuracy, we also examine five popular clustering algorithms to understand model behavior in scenarios without true labels. We calculate Normalized Mutual Information (NMI), Adjusted Rand Index (ARI), and Silhouette Coefficient (SC) metrics to select the best clustering models. Our evaluation finds that spectral clustering outperforms all other benchmarking clustering models in terms of NMI and ARI metrics while demonstrating comparability to the optimal SC achieved by k-means. The implemented code is available at GitHub.
TL;DR: This study introduces PSSM-Sumo, a deep learning model that accurately predicts sumoylation sites using optimized features and a multi-layer DNN, achieving an exceptional average prediction accuracy of 98.71% through tenfold cross-validation.
Abstract: Post-translational modifications (PTMs) are fundamental to essential biological processes, exerting significant influence over gene expression, protein localization, stability, and genome replication. Sumoylation, a PTM involving the covalent addition of a chemical group to a specific protein sequence, profoundly impacts the functional diversity of proteins. Notably, identifying sumoylation sites has garnered significant attention due to their crucial roles in proteomic functions and their implications in various diseases, including Parkinson's and Alzheimer's. Although several computational models have been proposed for identifying sumoylation sites, their effectiveness is constrained by the limitations of conventional learning methodologies. In this study, we introduce PSSM-Sumo, a robust computational model designed for accurately predicting sumoylation sites using pseudo-position-specific scoring matrix (PsePSSM) features, an optimized deep learning algorithm, and efficient feature extraction techniques. Moreover, to streamline computational processes and eliminate irrelevant and noisy features, sequential forward selection using a support vector machine (SFS-SVM) is implemented to identify optimal features. A multi-layer Deep Neural Network (DNN) serves as a robust classifier, facilitating precise sumoylation site prediction. We meticulously assess the performance of PSSM-Sumo through a tenfold cross-validation approach, employing various statistical metrics such as the Matthews Correlation Coefficient (MCC), accuracy, sensitivity, specificity, and the Area under the ROC Curve (AUC). Comparative analyses reveal that PSSM-Sumo achieves an exceptional average prediction accuracy of 98.71%, surpassing existing models. The robustness and accuracy of the proposed model position it as a promising tool for advancing drug discovery and the diagnosis of diverse diseases linked to sumoylation sites.
TL;DR: SangerBox 2, an updated clinical bioinformatics platform, enhances functionalities with novel analytical tools, improved performance, and optimized graphics, facilitating efficient data analysis and research through random forests, support vector machines, and interactive plotting functions.
Abstract: In recent years, the development of high‐throughput sequencing technologies has been accompanied by an increasing application of statistics, pattern recognition, and machine learning in bioinformatics analyses. SangerBox is a widely used tool among many researchers, which encourages us to continuously improve the platform to meet different scientific demands. The new version, SangerBox 2 ( http://vip.sangerbox.com ), extends and optimizes the functions of interactive graphics and analysis of clinical bioinformatics data. We introduced novel analytical tools such as random forests and support vector machines, as well as corresponding plotting functions. At the same time, we also optimized the performance of the platform and fixed known problems to allow users to perform data analyses more quickly and efficiently. SangerBox 2 improved the speed of analysis, reduced the computing resources required, and provided more analysis methods, greatly promoting research efficiency.
TL;DR: This study demonstrated that the combination of UAV-derived multispectral VIs and the SVM regression algorithm could be successfully applied to map the LCC of winter wheat for different species, growth stages, and nitrogen stress conditions, offering valuable insights for precision nitrogen fertilization management.
Abstract: The rapid and accurate estimation of leaf chlorophyll content (LCC), an important indicator of crop photosynthetic capacity and nutritional status, is of great significance for precise nitrogen fertilization management. To explore the existence of a versatile regression model that can be successfully used to estimate the LCC for different varieties under different growth stages and nitrogen stress conditions, a study was conducted in 2023 across the growing season for winter wheat with five species and five nitrogen application levels. Two machine learning regression algorithms, support vector machine (SVM) and random forest (RF), were used to establish the bridge between UAV-derived multispectral vegetation indices and ground truth LCC (relative chlorophyll content, SPAD), taking the multivariate linear regression (MLR) algorithm as a reference. The results show that the visible atmospherically resistant index, vegetative index, and normalized difference vegetation index had the highest correlation with ground truth LCC, with a Pearson’s correlation coefficient of 0.95. All three regression algorithms (MLR, RF, and SVM) performed well on the training dataset (R2: 0.932–0.944, RMSE: 3.96 to 4.37), but performed differently on validation datasets with different growth stages, species, and nitrogen application levels. Compared to winter wheat species and nitrogen application levels, the growth stages had the greatest influence on the generalization ability of LCC estimation models, especially for the dough stage. At the dough stage, compared to MLR and RF, SVM performed best, with R2 increasing by 0.27 and 0.10, respectively, and RMSE decreasing by 1.13 and 0.46, respectively. Overall, this study demonstrated that the combination of UAV-derived multispectral VIs and the SVM regression algorithm could be successfully applied to map the LCC of winter wheat for different species, growth stages, and nitrogen stress conditions. 
Ultimately, this research is significant as it shows the successful application of UAV data for mapping the LCC of winter wheat across diverse conditions, offering valuable insights for precision nitrogen fertilization management.
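Two of the vegetation indices named in the abstract have widely used band-ratio definitions, sketched below. The reflectance values are hypothetical, and the abstract does not state which exact formulations the study used, so this is an illustration of the common forms only.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index: (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red)


def vari(green, red, blue):
    """Visible atmospherically resistant index: (Green - Red) / (Green + Red - Blue)."""
    return (green - red) / (green + red - blue)


# Hypothetical band reflectances for one pixel.
nir, red, green, blue = 0.50, 0.10, 0.20, 0.05
pixel_ndvi = ndvi(nir, red)
pixel_vari = vari(green, red, blue)
```

Per-pixel index values like these would then serve as the input features of the regression models (MLR, RF, SVM) that map spectral data to SPAD readings.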
TL;DR: This study uses multi-sourced geospatial datasets to develop an advanced machine learning framework for flood hazard assessment in the Arambag region of West Bengal, India and finds that elevation, precipitation, and distance to rivers play the most crucial roles in the decision-making process for flood hazard assessment.
Abstract: Flooding is a major natural hazard worldwide, causing catastrophic damage to communities and infrastructure. Because climate change is exacerbating extreme weather events, robust flood hazard modeling is crucial to support disaster resilience and adaptation. This study uses multi-sourced geospatial datasets to develop an advanced machine learning framework for flood hazard assessment in the Arambag region of West Bengal, India. The flood inventory was constructed through Sentinel-1 SAR analysis and global flood databases. Fifteen flood conditioning factors related to topography, land cover, soil, rainfall, proximity, and demographics were incorporated. Rigorous training and testing of diverse machine learning models, including RF, AdaBoost, rFerns, XGB, DeepBoost, GBM, SDA, BAM, monmlp, and MARS algorithms, were undertaken for categorical flood hazard mapping. Model optimization was achieved through statistical feature selection techniques. Accuracy metrics and advanced model interpretability methods like SHAP and Boruta were implemented to evaluate predictive performance. According to the area under the receiver operating characteristic curve (AUC), the prediction accuracy of the models was above about 80%. RF achieves an AUC of 0.847 at resampling factor 5, indicating strong discriminative performance. AdaBoost also consistently exhibits good discriminative ability, with an AUC value of 0.839 at resampling factor 10. Boruta and SHAP analysis indicated precipitation and elevation as the factors contributing most significantly to flood hazard assessment in the study area. Most of the machine learning models pointed out southern portions of the study area as highly susceptible areas. On average, from 17.2 to 18.6% of the study area is highly susceptible to flood hazards.
In the feature selection analysis, various nature-inspired algorithms identified the selected input parameters for flood hazard assessment, i.e., elevation, precipitation, distance to rivers, TWI, geomorphology, lithology, TRI, slope, soil type, curvature, NDVI, distance to roads, and gMIS. As per the Boruta and SHAP analyses, it was found that elevation, precipitation, and distance to rivers play the most crucial roles in the decision-making process for flood hazard assessment. The results indicated that 15.27% of the building footprints are at high and very high risk, with the remainder at very low risk (43.80%), low risk (24.30%), and moderate risk (16.63%). Similarly, the cropland area affected by flooding in this region is categorized into five risk classes: very high (16.85%), high (17.28%), moderate (16.07%), low (16.51%), and very low (33.29%). Overall, this interdisciplinary study contributes significantly towards hydraulic and hydrological modeling for flood hazard management.
TL;DR: A novel SVM-SSA model for predicting suspended sediment load in rivers with high accuracy and efficiency.
Abstract: Prediction of suspended sediment load (SSL) in streams is significant in hydrological modeling and water resources engineering. A consistent and accurate sediment prediction model is highly necessary yet difficult to develop in practice, because sediment transport is highly non-linear and is governed by several variables such as rainfall, strength of flow, and sediment supply. Artificial intelligence (AI) approaches have become prevalent in water resource engineering to solve multifaceted problems like sediment load modelling. The present work proposes a robust model incorporating support vector machine with a novel sparrow search algorithm (SVM-SSA) to compute SSL at the Tilga, Jenapur, Jaraikela and Gomlai stations in the Brahmani river basin, Odisha State, India. Five different scenarios are considered for model development. Performance of the developed model is assessed on the basis of mean absolute error (MAE), root mean squared error (RMSE), determination coefficient (R2), and Nash–Sutcliffe efficiency (ENS). The outcomes of the SVM-SSA model are compared with three hybrid models, namely SVM-BOA (Butterfly optimization algorithm), SVM-GOA (Grasshopper optimization algorithm), and SVM-BA (Bat algorithm), and with a benchmark SVM model. The findings revealed that the SVM-SSA model estimates SSL more accurately than the alternatives for scenario V, with sediment (3-month lag) and discharge (current time-step and 3-month lag) as inputs, giving RMSE = 15.5287, MAE = 15.3926, and ENS = 0.96481. The conventional SVM model performed the worst in SSL prediction. The findings support the suitability of the employed approach for modelling SSL in rivers precisely and reliably. The prediction model guarantees the precision of the forecasted outcomes while significantly decreasing the computing time expenditure, and the precision satisfies the demands of realistic engineering applications.
TL;DR: The proposed algorithm accurately predicts hotspot temperature changes in 3D chips with high accuracy and low computational cost. It can predict the variations in hotspot temperature of 3D chips up to 28 layers while changing modeling and cooling parameters.
Abstract: Parameter changes in the complex internal structure of multi-layer 3D stacked chips will greatly reduce the efficiency of modeling and thermal analysis. In this work, by combining thermal simulation analysis with machine learning algorithms, we can skillfully predict the hotspot temperature changes of 3D chips up to 28 layers while changing some key parameters of 3D chips with less computational effort. K-fold Cross Validation (K-CV) algorithm and support vector regression (SVR) algorithm were developed to predict the variations in hotspot temperature of the 3D chip while changing modeling and cooling parameters. Based on the training matrix, the support vector regression (SVR) model can accurately predict the random power distribution case, the random processor core location distribution case, and the random Through Silicon Via (TSV) distribution case. The validation results show that the prediction accuracy deviation is close to 0.6 K, and the correlation coefficient R2 is close to 1. Meanwhile, the variable parameter and variable layer prediction method based on the aforementioned training model can more accurately predict the hotspot temperature of the 3D chips with higher layers (28 layers) from the modeling analysis data of 3D chips with lower layers (4–5 layers). Its prediction deviation is less than 0.2%. The predicted data match the simulated numerical data quite well, indicating that the predictive algorithm can accurately feed the sample data set.
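The K-fold cross-validation mentioned above partitions the samples so that each one is used for validation exactly once. A minimal sketch of the index bookkeeping follows; the function name `k_fold_indices` and the sample count are hypothetical, and in practice the indices would usually be shuffled first and each training fold passed to the regression model (SVR in the paper).

```python
def k_fold_indices(n_samples, k):
    """Return a list of (train_indices, validation_indices) pairs for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        folds.append((train, validation))
        start = stop
    return folds


folds = k_fold_indices(10, 3)
validation_sizes = [len(val) for _, val in folds]
```

Averaging the validation error over all folds gives a less split-dependent estimate of prediction deviation than a single hold-out set.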
TL;DR: Machine learning is revolutionizing healthcare by enabling accurate disease prediction and diagnostics through the analysis of large datasets.
Abstract: Machine learning plays an emerging role in health care, both in research and in industry. In machine learning, there are various forms of learning, including supervised, unsupervised, and reinforcement learning. These strategies are necessary to discover previously unknown relationships in data that are beneficial to society. In predictive modeling, historical data are used to predict a result variable. The uses of machine learning in medical care are becoming a benefit for disease identification and diagnostics. The healthcare industry can benefit from machine learning's capacity to assist in the intelligent analysis of huge amounts of data. Different machine learning methods for health care, spanning supervised, unsupervised, semi-supervised, and reinforcement learning and including SVM, KNN, K-Means clustering, neural networks, and decision trees, provide varying levels of accuracy, precision, and sensitivity. The area of machine learning (ML) is on the rise. The purpose of machine learning is to automatically discover patterns and reason with data. ML offers tailored therapy, dubbed precision medicine. Health care has benefited from the application of machine learning approaches. Within a few years, machine learning will alter the healthcare industry.
TL;DR: By accurately decoding muscle activity, the developed systems can facilitate more intuitive and responsive robotic arm movements, contributing to the advancement of innovative solutions for individuals requiring prosthetic devices or undergoing rehabilitation, hence improving the quality of life for users.
Abstract: Signals play a fundamental role in science, technology, and communication by conveying information through varying patterns, amplitudes, and frequencies. This paper introduces innovative methodologies for processing electromyographic (EMG) signals to develop artificial intelligence systems capable of decoding muscle activity for controlling arm movements. The study investigates advanced signal processing techniques and machine learning classification algorithms using the GRABMyo dataset, aiming to enhance prosthetic control systems and rehabilitation technologies. A comprehensive analysis was conducted on signal processing techniques, including signal filtering and discrete wavelet transform (DWT), alongside a composite feature set comprising Mean Absolute Value (MAV), Waveform Length (WL), Zero Crossing (ZC), Slope Sign Changes (SSC), Root Mean Square (RMS), Enhanced Waveform Length (EWL), and Enhanced Mean Absolute Value (EMAV). These features, refined through Linear Discriminant Analysis (LDA) for dimensionality reduction, were classified using Support Vector Machine (SVM) algorithms. Signal filtering and DWT improved signal quality, facilitating better feature extraction, while the diverse feature set enhanced classification accuracy. LDA further improved accuracy by isolating the most informative features, and the SVM achieved optimal performance in decoding complex EMG patterns. Machine learning models, including K-Nearest Neighbor (KNN), Naïve Bayes (NB), and the SVM, were evaluated, with the SVM outperforming the others. The significance of these results lies in their potential applications in prosthetic control systems and rehabilitation technologies. 
By accurately decoding muscle activity, the developed systems can facilitate more intuitive and responsive robotic arm movements, contributing to the advancement of innovative solutions for individuals requiring prosthetic devices or undergoing rehabilitation, hence improving the quality of life for users. This research marks a significant step forward in the integration of advanced signal processing and machine learning in the field of EMG analysis.
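Several of the time-domain features named in this abstract have simple textbook definitions. The sketch below computes MAV, RMS, WL, ZC, and SSC for one window of samples. The function names, the threshold handling, and the toy window are assumptions for illustration; the enhanced variants (EWL, EMAV) and the paper's exact windowing are not reproduced here.

```python
import math

def mean_absolute_value(x):
    """MAV: average of the absolute sample values."""
    return sum(abs(v) for v in x) / len(x)

def root_mean_square(x):
    """RMS: square root of the mean squared sample value."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def waveform_length(x):
    """WL: cumulative absolute difference between consecutive samples."""
    return sum(abs(x[i + 1] - x[i]) for i in range(len(x) - 1))

def zero_crossings(x, threshold=0.0):
    """ZC: sign changes between neighbours whose jump exceeds the threshold."""
    return sum(
        1
        for i in range(len(x) - 1)
        if x[i] * x[i + 1] < 0 and abs(x[i] - x[i + 1]) > threshold
    )

def slope_sign_changes(x, threshold=0.0):
    """SSC: interior samples that are local peaks or troughs."""
    return sum(
        1
        for i in range(1, len(x) - 1)
        if (x[i] - x[i - 1]) * (x[i] - x[i + 1]) > threshold
    )

# Toy window (made-up values, not taken from the GRABMyo dataset).
window = [1.0, -1.0, 2.0, -2.0]
features = [
    mean_absolute_value(window),
    root_mean_square(window),
    waveform_length(window),
    zero_crossings(window),
    slope_sign_changes(window),
]
```

In a pipeline like the one described, one such feature vector per channel and window would be concatenated before dimensionality reduction and classification.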
TL;DR: Correlation-filter-based channel and feature selection framework for hybrid EEG-fNIRS BCI applications significantly enhances classification accuracy. ReliefF-based filter outperforms other filters with high accuracy of 94.77 ± 4.26%.
Abstract: The proposed study is based on a feature and channel selection strategy that uses correlation filters for brain-computer interface (BCI) applications using electroencephalography (EEG)-functional near-infrared spectroscopy (fNIRS) brain imaging modalities. The proposed approach fuses the complementary information of the two modalities to train the classifier. The channels most closely correlated with brain activity are extracted using a correlation-based connectivity matrix for fNIRS and EEG separately. Furthermore, the training vector is formed through the identification and fusion of the statistical features of both modalities (i.e., slope, skewness, maximum, mean, and kurtosis). The constructed fused feature vector is passed through various filters (including ReliefF, minimum redundancy maximum relevance, chi-square test, analysis of variance, and Kruskal-Wallis filters) to remove redundant information before training. Traditional classifiers such as neural networks, support-vector machines, linear discriminant analysis, and ensembles were used for training and testing. A publicly available dataset with motor imagery information was used to validate the proposed approach. Our findings indicate that the proposed correlation-filter-based channel and feature selection framework significantly enhances the classification accuracy of hybrid EEG-fNIRS. The ReliefF-based filter outperformed the other filters with the ensemble classifier, reaching a high accuracy of 94.77 ± 4.26%. The statistical analysis also validated the significance (p < 0.01) of the results. A comparison of the proposed framework with prior findings was also presented. Our results show that the proposed approach can be used in future EEG-fNIRS-based hybrid BCI applications.
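The statistical features listed in this abstract can be computed per channel window as in the sketch below. The function name, the choice of a least-squares slope against the sample index, and the use of population (non-excess) moments for skewness and kurtosis are assumptions; the paper may define them differently.

```python
def statistical_features(x):
    """Mean, maximum, slope, skewness, and kurtosis of one signal window."""
    n = len(x)
    mean = sum(x) / n
    # Least-squares slope of the signal against its sample index 0..n-1.
    t_mean = (n - 1) / 2
    slope = sum((t - t_mean) * (v - mean) for t, v in enumerate(x)) / sum(
        (t - t_mean) ** 2 for t in range(n)
    )
    # Central moments (population form, dividing by n).
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return {
        "mean": mean,
        "max": max(x),
        "slope": slope,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,  # non-excess: a Gaussian gives 3
    }

# Toy window (made-up values): a linear ramp.
feats = statistical_features([1.0, 2.0, 3.0, 4.0, 5.0])
```

Feature dictionaries from EEG and fNIRS channels could then be concatenated into one fused vector before any filter-based selection step.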
TL;DR: Three-way imbalanced learning based on fuzzy twin SVM effectively solves imbalanced classification problems by combining three-way decision with SVM and proposing a new three-way fuzzy membership function and a new fuzzy twin support vector machine with three-way membership (TWFTSVM).
Abstract: Three-way decision (3WD) is a powerful tool for granular computing to deal with uncertain data, commonly used in information systems, decision-making, and medical care. Three-way decision has been studied extensively in traditional rough set models. However, it is rarely combined with the currently popular field of machine learning to broaden its research scope. In this paper, three-way decision is connected with SVM, a standard binary classification model in machine learning, to address imbalanced classification problems, an area where SVM needs improvement. A new three-way fuzzy membership function and a new fuzzy twin support vector machine with three-way membership (TWFTSVM) are proposed. The new three-way fuzzy membership function is defined to increase the certainty of uncertain data in both input space and feature space, and it assigns higher fuzzy membership to minority samples than to majority samples. To evaluate the effectiveness of the proposed model, comparative experiments are designed for forty-seven different datasets with varying imbalance ratios. In addition, datasets with different imbalance ratios are derived from the same dataset to further assess the proposed model's performance. The results show that the proposed model significantly outperforms other traditional SVM-based methods.
TL;DR: This paper develops an anti-noise Quantum Support Vector Machine (QSVM) algorithm to classify noisy data in quantum computers, using a weight factor in hinge loss and alternative iterative optimization, achieving efficient and stable accuracy in noisy data.
Abstract: Noisy data is ubiquitous in quantum computers, greatly affecting the performance of various algorithms. However, existing quantum support vector machine models are not equipped with anti-noise ability, and often deliver low performance when learning accurate hyperplane normal vectors from noisy data. To address this issue, an anti-noise quantum support vector machine algorithm is developed in this paper. Specifically, a weight factor is first embedded into the hinge loss to construct the objective function of the anti-noise support vector machine. Then, an alternating iterative optimization strategy and a quantum circuit are designed for solving the objective function, aiming to obtain the normal vector and intercept of the hyperplane that finally divides the data. Finally, the classification and anti-noise effect of the algorithm are verified on an artificial dataset and a public dataset. Experimental results show that the proposed algorithm is efficient and maintains stable accuracy on noisy data.
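The idea of embedding a weight factor into the hinge loss can be illustrated classically. The sketch below assumes the common form 0.5·||w||² + C·Σ sᵢ·max(0, 1 − yᵢ(w·xᵢ + b)), where a small weight sᵢ reduces the influence of a suspected noisy sample. This form, the function name, and the toy data are assumptions; the abstract does not give the paper's exact objective, and the quantum circuit is not modeled.

```python
def weighted_hinge_objective(w, b, samples, labels, weights, c=1.0):
    """Regularised SVM objective with a per-sample weight on each hinge term."""
    reg = 0.5 * sum(wi * wi for wi in w)
    loss = 0.0
    for x, y, s in zip(samples, labels, weights):
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        loss += s * max(0.0, 1.0 - margin)  # weighted hinge term
    return reg + c * loss

# Toy data (made up): the third point is labelled +1 but lies deep on the
# negative side, so it behaves like a noisy label.
samples = [[2.0, 0.0], [-2.0, 0.0], [-3.0, 0.0]]
labels = [1, -1, 1]
w, b = [1.0, 0.0], 0.0

uniform = weighted_hinge_objective(w, b, samples, labels, [1.0, 1.0, 1.0])
downweighted = weighted_hinge_objective(w, b, samples, labels, [1.0, 1.0, 0.1])
```

With uniform weights the mislabelled point dominates the objective; down-weighting it leaves the hyperplane largely determined by the clean samples. An alternating scheme would update the weights and the hyperplane in turn.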
Abstract: The ability to accurately classify land use/cover (LULC) is critical for environmental monitoring and land use planning. This study compares three machine learning algorithms: Artificial Neural Network (ANN), Support Vector Machine (SVM), and Random Forest (RF) for LULC classification using Google Earth images from the years 2006, 2014, and 2022. The objective of this study is to evaluate and identify the best classifier for LULC classification and change detection. Four LULC categories (Built-up, Open area, Farmland, and Agroforestry) were identified. The evaluation criteria included overall accuracy, kappa coefficient, producer's accuracy, user's accuracy, computing time, algorithm stability, and visual quality. The results showed that the RF algorithm outperformed both SVM and ANN algorithms with an average overall accuracy of 0.97, kappa coefficient of 0.98, producer's accuracy of 0.99, and user's accuracy of 0.97, surpassing the accuracies achieved by SVM (0.96, 0.97, 0.98, and 0.97) and ANN (0.89, 0.81, 0.94, and 0.88), with corresponding computing times of 6.33, 15, and 30 s. All classifiers performed stably with different training sizes. Visual quality assessment revealed that RF had the highest precision. The built-up change detection results show that the net built-up area increased between 2006 and 2022 by 0.74 km² for ANN, 1.74 km² for SVM, and 1.66 km² for RF. The comparison reveals that the RF algorithm detects change with high precision, consistent with the data (an increase of 1.65 km²) obtained from the Dilla town land administration office. To validate the results, the study considered field surveys, reference images, local experts, and previous studies. Based on the findings, the study concludes that using the RF classifier with an object-based approach is an effective way to map LULC and detect changes in the study area over time.
Future researchers are recommended to utilize this effective algorithm for addressing LULC related problems in the study area.
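The accuracy measures named in this abstract are all derived from a confusion matrix. The sketch below shows the standard formulas, assuming rows hold reference (ground-truth) classes and columns hold predicted classes. The function name and the two-class matrix are made up for illustration and are not the study's data.

```python
def accuracy_metrics(cm):
    """Overall accuracy, Cohen's kappa, producer's and user's accuracy.

    cm[i][j] counts samples of reference class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    row_sums = [sum(row) for row in cm]                      # reference totals
    col_sums = [sum(cm[i][j] for i in range(n)) for j in range(n)]  # predicted totals
    diag = [cm[i][i] for i in range(n)]

    overall = sum(diag) / total
    # Agreement expected by chance from the marginal totals.
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    kappa = (overall - expected) / (1.0 - expected)

    producers = [d / r for d, r in zip(diag, row_sums)]  # 1 - omission error
    users = [d / c for d, c in zip(diag, col_sums)]      # 1 - commission error
    return overall, kappa, producers, users

# Hypothetical 2-class confusion matrix.
overall, kappa, producers, users = accuracy_metrics([[45, 5], [10, 40]])
```

Kappa discounts the agreement expected by chance, which is why it is usually reported alongside overall accuracy rather than in place of it.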
TL;DR: AdaDL-SVDD method utilizes sparse representations and SVDD to achieve superior anomaly detection performance with uncertain data.
Abstract: Anomaly detection aims to identify unusual behavior or discriminate abnormal samples by referring to the normal samples of data. Most existing anomaly detection approaches train the model using only the normal data due to the scarcity of anomalies. However, negative data or anomalies do occur in many practical applications. In this paper, we propose a novel anomaly detection method called AdaDL-SVDD for addressing the uncertain data problem. In this method, both normal and anomalous samples are utilized to generate sparse representations with dictionary learning in the training phase. Meanwhile, we incorporate Support Vector Data Description (SVDD) into the framework to construct a minimum hypersphere for anomaly detection over the test data. Additionally, the AdaBoost method is used to construct a strong classifier by combining the weak classifiers. Finally, the experimental results demonstrate that the proposed AdaDL-SVDD method achieves superior performance on UCI datasets with uncertainty and noise.