Case-Based Retrieval Framework for Gene Expression Data
TL;DR: This article proposes a case-based retrieval framework that uses a k-nearest-neighbor classifier with a weighted-feature-based similarity to retrieve previously treated patients from their gene expression profiles, with the aim of improving diagnosis and treatment of childhood leukemia.
Abstract: Background: Retrieving similar cases in a case-based reasoning system is a major challenge for gene expression data sets. Microarray technology generates a very large number of gene expression values per sample, which produces complex, high-dimensional data sets on which similarity measures become unreliable. Measuring gene expression similarity therefore requires machine-learning and data-mining techniques, such as feature selection and dimensionality reduction, to be incorporated into the retrieval process.
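The difficulty with similarity measures in high dimensions can be illustrated with a small experiment. The sketch below is not taken from the article: it uses uniformly random points as a stand-in for expression profiles, and the function name `relative_contrast` is a hypothetical helper. It compares how much farther the farthest point is than the nearest one as the number of dimensions grows.

```python
import math
import random


def relative_contrast(n_points, n_dims, seed=0):
    """Return (d_max - d_min) / d_min for one random query point.

    Assumption: coordinates are drawn uniformly from [0, 1). Real
    expression data is not uniform; this only illustrates the trend.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(n_dims)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(n_dims)]
        distances.append(math.dist(query, point))  # Euclidean distance
    return (max(distances) - min(distances)) / min(distances)


# Few features: nearest and farthest neighbors are clearly separated.
low_dim = relative_contrast(n_points=200, n_dims=2)
# Thousands of features (microarray scale): all distances look alike.
high_dim = relative_contrast(n_points=200, n_dims=2000)

print(f"relative contrast in 2 dimensions:    {low_dim:.3f}")
print(f"relative contrast in 2000 dimensions: {high_dim:.3f}")
```

When the contrast shrinks toward zero, "nearest" carries little information, which is one motivation for reducing or weighting the features before retrieval.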
Methods: This article proposes a case-based retrieval framework that uses a k-nearest-neighbor classifier with a weighted-feature-based similarity to retrieve previously treated patients based on their gene expression profiles.
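As a minimal sketch of what such a retrieval step could look like, the code below ranks stored cases by a feature-weighted Euclidean distance and returns the k closest. The abstract does not specify the distance function or how the weights are learned, so the weighted Euclidean form, the fixed weights, the toy profiles, and the names `weighted_distance`, `retrieve_similar`, and `majority_label` are all assumptions made for illustration.

```python
import math
from collections import Counter


def weighted_distance(a, b, weights):
    """Feature-weighted Euclidean distance (an assumed similarity form)."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def retrieve_similar(query, cases, weights, k=3):
    """Return the k stored cases closest to the query.

    cases: list of (case_id, expression_profile, label) tuples.
    Result: list of (distance, case_id, label), nearest first.
    """
    scored = [
        (weighted_distance(query, profile, weights), case_id, label)
        for case_id, profile, label in cases
    ]
    scored.sort(key=lambda item: item[0])
    return scored[:k]


def majority_label(neighbors):
    """Most common label among the retrieved cases."""
    return Counter(label for _, _, label in neighbors).most_common(1)[0][0]


# Toy case base: three made-up "gene" values per patient, labels A and B.
case_base = [
    ("p1", [2.0, 1.0, 9.0], "A"),
    ("p2", [2.1, 1.2, 0.0], "A"),
    ("p3", [5.0, 4.0, 9.1], "B"),
    ("p4", [5.2, 3.8, 0.2], "B"),
    ("p5", [2.2, 0.9, 5.0], "A"),
]
# Hypothetical weights: the third gene is treated as uninformative.
gene_weights = [1.0, 0.5, 0.0]

new_patient = [2.0, 1.1, 9.0]
neighbors = retrieve_similar(new_patient, case_base, gene_weights, k=3)
for distance, case_id, label in neighbors:
    print(f"{case_id} (label {label}): distance {distance:.3f}")
print("suggested label:", majority_label(neighbors))
```

Setting a weight to zero removes that gene from the comparison entirely, so in this form feature weighting also acts as a soft kind of feature selection.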
Results: The proposed methodology is validated on four data sets: a childhood leukemia data set collected from The Children's Hospital at Westmead, and the Colon cancer, National Cancer Institute (NCI), and Prostate cancer data sets. In retrieving stored patients who are similar to new patients, the framework achieves 96% accuracy on the childhood leukemia data set, 95% on the NCI data set, 93% on the Colon cancer data set, and 98% on the Prostate cancer data set.
Conclusion: The case-based retrieval framework is an appropriate choice for retrieving previous patients who are similar to a new patient on the basis of their gene expression data, in support of better diagnosis and treatment of childhood leukemia. Moreover, the framework can be applied to other gene expression data sets using some or all of its steps.