Efficient Multiclass Classification Using Feature Selection in High-Dimensional Datasets
TL;DR: In this article, the authors propose a novel feature selection approach that combines filter and wrapper techniques, selecting optimal features using Mutual Information with the Sequential Forward Method and 10-fold cross-validation.
Abstract: Feature selection has become essential in classification problems with numerous features. It removes redundant, noisy, and harmful features from the dataset to enhance the classifier’s performance. Some features are less useful than others or do not correlate with the system’s evaluation, and removing them does not affect the system’s performance. In most cases, removing features with a monotonically decreasing impact on the system’s performance increases accuracy. Therefore, this research proposes a dimensionality reduction method based on feature selection to enhance accuracy. The novel feature-selection approach combines filter and wrapper techniques to select optimal features using Mutual Information with the Sequential Forward Method and 10-fold cross-validation. Results show that the proposed algorithm can reduce the number of features by more than 75% in datasets with many features and achieve a maximum accuracy of 97%. The algorithm outperforms or performs similarly to existing ones, and it could be a better option for classification problems that call for a minimal feature set.
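The abstract names the ingredients (a Mutual Information filter, a Sequential Forward wrapper, and 10-fold cross-validation) but not how they are wired together. The following is a minimal sketch of one plausible combination, not the authors' algorithm. It assumes that features are discrete, that candidates are tried in descending MI order, and that a feature is kept only if it raises cross-validated accuracy. The nearest-centroid classifier, the toy data, and all function names are illustrative assumptions.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())


def cv_accuracy(X, y, cols, k=10, seed=0):
    """k-fold cross-validated accuracy of a nearest-centroid classifier
    (a stand-in model, assumed here) using only the columns in `cols`."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for f in range(k):
        test = idx[f::k]                      # every k-th shuffled index
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        # Per-class feature means computed on the training folds only.
        sums, counts = {}, Counter()
        for i in train:
            counts[y[i]] += 1
            acc = sums.setdefault(y[i], [0.0] * len(cols))
            for pos, c in enumerate(cols):
                acc[pos] += X[i][c]
        centroids = {lab: [v / counts[lab] for v in s]
                     for lab, s in sums.items()}
        for i in test:
            pred = min(centroids, key=lambda lab: sum(
                (X[i][c] - m) ** 2 for c, m in zip(cols, centroids[lab])))
            correct += pred == y[i]
    return correct / len(X)


def select_features(X, y, k=10):
    """Filter step: rank features by MI with the label.
    Wrapper step: forward pass that keeps a feature only if it
    improves k-fold cross-validated accuracy."""
    ranked = sorted(range(len(X[0])),
                    key=lambda j: mutual_information([row[j] for row in X], y),
                    reverse=True)
    selected, best = [], 0.0
    for j in ranked:
        score = cv_accuracy(X, y, selected + [j], k)
        if score > best:
            selected, best = selected + [j], score
    return selected, best


# Toy data (hypothetical): column 0 equals the class label, columns 1-2 are noise.
rng = random.Random(0)
y = [i % 3 for i in range(60)]
X = [[label, rng.randint(0, 3), rng.randint(0, 3)] for label in y]

selected, best = select_features(X, y)
print(selected, best)
```

On this toy data only the informative column survives, because the noise columns cannot improve on the accuracy already reached. Continuous features would need binning or a different MI estimator before the filter step applies.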
Citations
Feature engineering impact on position falsification attacks detection in vehicular ad-hoc network
Eslam Abdelkreem,Sherif Hussein,Ashraf Tammam +2 more
TL;DR: This study investigates the impact of feature engineering on detecting position falsification attacks in vehicular ad-hoc networks, finding that two models employing feature engineering outperform existing studies, with accuracy improvements of 6.31-47% using the VeReMi dataset.
2
Efficient Storage Approach for Big Data Analytics: An Iterative-Probabilistic Method for Dynamic Resource Allocation of Big Satellite Images
TL;DR: This paper addresses the issue of transmitting a considerable amount of satellite images across the network to various storage supports and shows that the proposed heuristics outperform those developed in the literature.
2
Deep Error-Correcting Output Codes
Guoqiang Zhong,Yuchen Zheng,Peng Zhang,Mengqi Li,Junyu Dong +4 more
- 24 Apr 2017
TL;DR: This paper combines the ideas of ensemble learning and deep learning, and presents a novel deep learning framework called deep error-correcting output codes (DeepECOC), which performs not only better than traditional ECOC and feature learning algorithms, but also state-of-the-art deep learning models in most cases.
Cross-Project Defect Prediction Based on Domain Adaptation and LSTM Optimization
Khadija Javed,Shengbing Ren,Muhammad Asim,Mudasir Ahmad Wani +3 more
TL;DR: This research proposes Smote Correlation and Attention Gated recurrent unit based Long Short-Term Memory optimization (SCAG-LSTM), which first employs a novel hybrid technique that extends the synthetic minority over-sampling technique (SMOTE) with edited nearest neighbors (ENN) to rebalance class distributions and mitigate the issues caused by noisy and irrelevant instances in both source and target domains.
2
Assessing the Efficiency of Foreign Investment in a Certification Procedure Using an Ensemble Machine Learning Model
Aleksandar Kemives,Lidija Barjaktarović,Milan Ranđelović,Milan Cabarkapa,Dragan Ranđelović +4 more
TL;DR: The proposed solution simultaneously analyzes the impact of different factors on foreign investments in order to determine the most important factors and thus enable each local government to ensure the best possible efficiency in this process.
1
References
Random Forests
Leo Breiman
- 01 Oct 2001
TL;DR: Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the forest, and are also applicable to regression.
An introduction to variable and feature selection
Isabelle Guyon,André Elisseeff +1 more
TL;DR: The contributions of this special issue cover a wide range of aspects of variable selection: providing a better definition of the objective function, feature construction, feature ranking, multivariate feature selection, efficient search methods, and feature validity assessment methods.
An introduction to variable and feature selection
Isabelle Guyon,André Elisseeff +1 more
TL;DR: In this paper, variable and feature selection have become the focus of much research in areas of application for which datasets with tens or hundreds of thousands of variables are available, such as t...
7K
Gradient boosting machines, a tutorial.
Alexey Natekin,Alois Knoll +1 more
TL;DR: This article gives a tutorial introduction into the methodology of gradient boosting methods with a strong focus on machine learning aspects of modeling.
Using mutual information for selecting features in supervised neural net learning
TL;DR: This paper investigates the application of the mutual information criterion to evaluate a set of candidate features and to select an informative subset to be used as input data for a neural network classifier.