A kernel-based multivariate feature selection method for microarray data classification.

doi:10.1371/JOURNAL.PONE.0102541

Open Access Journal Article

Shiquan Sun, +2 more

- 21 Jul 2014

- PLOS ONE

- Vol. 9, Iss: 7

64

TL;DR: A kernel method is used to discover inherent nonlinear correlations among features as well as between features and the target. The method outperforms competing multivariate feature selectors, especially on three hard-to-classify datasets: Wang's Breast Cancer, Gordon's Lung Adenocarcinoma and Pomeroy's Medulloblastoma.

Abstract: High dimensionality and small sample sizes, with their inherent risk of overfitting, pose great challenges for constructing efficient classifiers in microarray data classification. A feature selection step should therefore precede classification to improve prediction performance. In general, filter methods can serve as the principal or an auxiliary selection mechanism because of their simplicity, scalability, and low computational complexity. However, simple examples show that filter methods yield less accurate classifiers because they ignore dependencies among features. Although a few publications have used multivariate methods to reveal relationships among features, these methods describe those relationships only linearly, and simple linear combinations restrict the improvement in performance. In this paper, we used a kernel method to discover inherent nonlinear correlations among features as well as between features and the target. Moreover, the number of orthogonal components was determined by kernel Fisher's linear discriminant analysis (FLDA) in a self-adaptive manner rather than by manual parameter settings. To assess the effectiveness of our method, we performed several experiments and compared its results with those of other competitive multivariate feature selectors. In our comparison, we used two classifiers (support vector machine and k-nearest neighbor) on two groups of datasets, namely two-class and multi-class datasets. Experimental results demonstrate that our method performs better than the others, especially on three hard-to-classify datasets, namely Wang's Breast Cancer, Gordon's Lung Adenocarcinoma and Pomeroy's Medulloblastoma.
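
The abstract's point that univariate filters miss feature dependencies, while a kernel can expose nonlinear ones, can be illustrated with a toy XOR example. The sketch below is not the paper's algorithm: it swaps in centered kernel alignment with an RBF kernel as a stand-in dependence score, and the function names (`rbf_kernel`, `centered_alignment`), the value `gamma=1.0`, and the four-point dataset are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:                      # a single feature: make it a column
        X = X[:, None]
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def centered_alignment(K, L):
    """Centered kernel alignment between two Gram matrices (0 if degenerate)."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    Kc, Lc = H @ K @ H, H @ L @ H
    denom = np.linalg.norm(Kc) * np.linalg.norm(Lc)  # Frobenius norms
    return float(np.sum(Kc * Lc) / denom) if denom > 0 else 0.0

# Toy XOR data (illustrative only): the label depends on both features jointly.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, -1], dtype=float)
L = np.outer(y, y)                       # ideal target kernel y y^T

# Univariate filter view: Pearson correlation of each feature with the label.
pearson = [float(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]

# Kernel view: one feature at a time versus both features together.
single = [centered_alignment(rbf_kernel(X[:, j]), L) for j in range(X.shape[1])]
joint = centered_alignment(rbf_kernel(X), L)

print("Pearson per feature:      ", pearson)
print("Alignment, single feature:", single)
print("Alignment, both features: ", joint)
```

On this toy data each feature taken alone shows zero correlation and zero alignment with the label, whereas the two features scored jointly through the RBF kernel give a clearly positive alignment (roughly 0.31 for the assumed `gamma`). A feature-by-feature filter would therefore discard both features, while a kernel-based multivariate score would keep them.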
