Proceedings Article (DOI: 10.1109/IJCNN.2007.4370965)
DNA Microarray Data Analysis: Effective Feature Selection for Accurate Cancer Classification
Jagdish C. Patra, G.P. Lim, Pramod Kumar Meher, Ee Luang Ang
- 29 Oct 2007
- pp 260-265
6
TL;DR: Experimental results show that classifiers that have learned from different feature sets that are negatively correlated with each other produce the best recognition rates on the two benchmark datasets.
Abstract: Accurate classification of DNA microarray data is vital for cancer diagnosis and treatment. For greater accuracy, a preferable strategy is to base the decision on the combined results of multiple classifiers, each trained on a different aspect of the data space. Building an optimal classifier for DNA analysis is difficult because the data contain only a few samples but a large number of features. Usually, different feature sets are provided for the classifiers to learn from. If the feature sets provide similar information, the classifiers trained on them cannot improve performance, because they make the same errors and there is no possibility of compensation. In this paper, we adopt correlation analysis of feature selection methods as a guideline for selecting the features that the classifiers learn from. We use a negative correlation method to generate feature sets that are mutually exclusive, and each classifier learns from a different feature set chosen by this correlation analysis in order to classify cancer precisely. We evaluated the performance of this approach on two benchmark datasets. Experimental results show that classifiers that have learned from different feature sets that are negatively correlated with each other produce the best recognition rates on both datasets.
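The abstract does not name the correlation measure or the classifier, so the following is only a minimal Python sketch under stated assumptions. It assumes one common reading of "negatively correlated feature sets" for two-class data: genes are ranked by Pearson correlation with an ideal class vector (1 for class 1, 0 for class 0) and with its complement, which yields two mutually exclusive gene sets. A nearest-centroid classifier is a hypothetical stand-in for the paper's classifiers, and the function names (`negatively_correlated_sets`, `fit_centroids`, `ensemble_predict`) and the toy data are illustrative only.

```python
# Hypothetical sketch: ideal-vector correlation ranking + nearest-centroid
# classifiers. This is NOT the paper's exact method; names are illustrative.
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (sx * sy)


def negatively_correlated_sets(X, y, k):
    """Return two disjoint top-k gene sets from a samples x genes matrix X and 0/1 labels y.

    Set A: genes most correlated with the ideal vector that is 1 for class 1.
    Set B: genes most correlated with the opposite ideal vector (1 for class 0).
    """
    ideal_a = [float(label) for label in y]
    ideal_b = [1.0 - v for v in ideal_a]
    n_genes = len(X[0])
    columns = [[row[j] for row in X] for j in range(n_genes)]
    score_a = [pearson(col, ideal_a) for col in columns]
    score_b = [pearson(col, ideal_b) for col in columns]
    set_a = sorted(range(n_genes), key=lambda j: score_a[j], reverse=True)[:k]
    # Exclude set A's genes so the two feature sets are mutually exclusive.
    remaining = [j for j in range(n_genes) if j not in set_a]
    set_b = sorted(remaining, key=lambda j: score_b[j], reverse=True)[:k]
    return set_a, set_b


def fit_centroids(X, y, genes):
    """Per-class mean expression over the selected genes."""
    centroids = {}
    for label in (0, 1):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(row[j] for row in rows) for j in genes]
    return centroids


def margin(sample, centroids, genes):
    """Positive when the sample is closer to the class-1 centroid than to the class-0 one."""
    x = [sample[j] for j in genes]
    d0 = sum((a - b) ** 2 for a, b in zip(x, centroids[0]))
    d1 = sum((a - b) ** 2 for a, b in zip(x, centroids[1]))
    return d0 - d1


def ensemble_predict(sample, models):
    """Combine the per-feature-set classifiers by summing their margins."""
    total = sum(margin(sample, cents, genes) for genes, cents in models)
    return 1 if total > 0 else 0


# Toy data: 6 samples x 5 genes. Genes 0-1 are high in class 1,
# genes 2-3 are high in class 0, and gene 4 is noise.
X = [
    [5.0, 4.8, 1.0, 1.2, 3.0],
    [5.2, 5.1, 0.9, 1.0, 2.0],
    [4.9, 5.0, 1.1, 0.8, 2.5],
    [1.0, 1.1, 5.0, 4.9, 2.8],
    [0.8, 1.2, 5.1, 5.2, 2.1],
    [1.1, 0.9, 4.8, 5.0, 2.6],
]
y = [1, 1, 1, 0, 0, 0]
set_a, set_b = negatively_correlated_sets(X, y, k=2)
models = [(genes, fit_centroids(X, y, genes)) for genes in (set_a, set_b)]
print(sorted(set_a), sorted(set_b))  # → [0, 1] [2, 3]
print(ensemble_predict([4.7, 5.0, 1.0, 1.1, 2.4], models))  # → 1
```

Summing the two classifiers' margins, rather than taking a majority vote, avoids ties when only two classifiers are combined. Because each classifier sees a disjoint gene set, a mistake on one set can be outweighed by a confident, correct decision on the other, which is the kind of compensation the abstract describes.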
Citations
The Self-Organizing Map
Teuvo Kohonen
- 01 Jan 1990
TL;DR: This article presents an overview of the self-organizing map algorithm, on which the papers in this issue are based.
2.9K
A Novel Hybrid Method of Gene Selection and Its Application on Tumor Classification
Zhu-Hong You, Shu-Lin Wang, Jie Gui, Shanwen Zhang
- 15 Sep 2008
TL;DR: A novel hybrid gene selection method is proposed to find a feature gene subset that keeps the genes related to a given cancer while leaving out redundant genes; experimental results show that this algorithm leads to better classification performance than other methods.
14
Hybrid Correlation based Gene Selection for Accurate Cancer Classification of Gene Expression Data
TL;DR: This paper proposes a hybrid negative correlated method, which combines the features from various correlation-based feature selection techniques to generate mutually exclusive, informative feature sets, and tests the effectiveness of the approach using a neural-network-based classifier on two benchmark gene expression datasets: the colon dataset and the leukemia dataset.
Rough-Mutual Feature Selection Based on Min-Uncertainty and Max-Certainty
TL;DR: The maximal lower approximation (Max-Certainty) and minimal boundary region (Min-Uncertainty) criterion focuses on feature selection methods based on rough sets and mutual information, which assign different values to the lower-approximation information and to the information contained in the boundary region.
1
Classifying Gene Expression Data of Cancer Using Multistage Ensemble of Neural Networks
TL;DR: In this paper, a multistage ensemble combination scheme is proposed to classify cancer gene expression data, in which the classification results of the first-stage neural networks on the training samples are used as input features for a second-stage neural network.
1
References
Molecular classification of cancer: class discovery and class prediction by gene expression monitoring
Todd R. Golub, Donna K. Slonim, Pablo Tamayo, Christine Huard, Michelle Gaasenbeek, Jill P. Mesirov, Hilary A. Coller, Mignon L. Loh, James R. Downing, Michael A. Caligiuri, Clara D. Bloomfield, Eric S. Lander
TL;DR: A generic approach to cancer classification based on gene expression monitoring by DNA microarrays is described and applied to human acute leukemias as a test case; the results suggest a general strategy for discovering and predicting cancer classes for other types of cancer, independent of previous biological knowledge.
Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling
Ash A. Alizadeh, Michael B. Eisen, R. Eric Davis, Izidore S. Lossos, Andreas Rosenwald, Jennifer C. Boldrick, Hajeer Sabet, Truc Tran, Xin Yu, John Powell, Liming Yang, Gerald E. Marti, Troy Moore, James I. Hudson, Li-Sheng Lu, David B. Lewis, Robert Tibshirani, Gavin Sherlock, Wing C. Chan, Timothy C. Greiner, Dennis D. Weisenburger, James O. Armitage, Roger A. Warnke, Ronald Levy, Wyndham H. Wilson, M. R. Grever, John C. Byrd, David Botstein, Patrick O. Brown, Louis M. Staudt
TL;DR: It is shown that there is diversity in gene expression among the tumours of DLBCL patients, apparently reflecting the variation in tumour proliferation rate, host response and differentiation state of the tumour.
Gene Selection for Cancer Classification using Support Vector Machines
TL;DR: In this article, a Support Vector Machine (SVM) method based on recursive feature elimination (RFE) is proposed to select a small subset of genes from broad patterns of gene expression data recorded on DNA microarrays.
Missing value estimation methods for DNA microarrays
Olga G. Troyanskaya, Michael N. Cantor, Gavin Sherlock, Patrick O. Brown, Trevor Hastie, Robert Tibshirani, David Botstein, Russ B. Altman
TL;DR: It is shown that KNNimpute appears to provide a more robust and sensitive method for missing value estimation than SVDimpute, and that both SVDimpute and KNNimpute surpass the commonly used row average method (as well as filling missing values with zeros).
Gene selection and classification of microarray data using random forest
TL;DR: It is shown that random forest has comparable performance to other classification methods, including DLDA, KNN, and SVM, and that the new gene selection procedure yields very small sets of genes (often smaller than alternative methods) while preserving predictive accuracy.