Open Access
Using Rank-One Biclusters to Classify Microarray Data
Nasimeh Asgarian, Russell Greiner
- 01 Jan 2007
TL;DR: This paper proposes a novel algorithm for learning a microarray classifier by reducing the dimensionality of the data matrix using biclusters, where each bicluster is a subset of genes and a subset of samples whose expression values have similar patterns.
Abstract: Motivation: A DNA microarray measures the expression levels of tens of thousands of genes for a particular sample, corresponding to some specific experimental condition. Our goal is to learn a microarray classifier that can distinguish different classes, e.g., to predict which patient will respond well to a treatment, based on the data from his/her microarray. Unfortunately, the large number of genes and the small number of samples make building such classifiers very challenging. Results: This paper proposes a method for learning a microarray classifier by first reducing the dimensionality of the data matrix using biclusters, where each bicluster is a subset of genes and a subset of samples whose expression values have similar patterns. We propose a novel algorithm for finding biclusters from the microarray data, based on the best rank-1 matrix approximation, then show how to use these biclusters to classify novel samples. We demonstrate that our method works effectively by comparing its prediction accuracy with that of other classifiers, including one based on another bicluster algorithm, over a number of publicly available microarray datasets, both diagnostic and prognostic. Availability: http://www.cs.ualberta.ca/~greiner/R/RoBiC
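The abstract names the best rank-1 matrix approximation as the basis for finding biclusters but gives no further detail. The following is a minimal sketch of that general idea only, not the authors' RoBiC algorithm. It assumes a genes-by-samples matrix, computes the leading singular triple by power iteration, and then applies a hypothetical selection rule (keep the rows and columns whose singular-vector entries are large relative to the maximum). The function names, the `threshold` parameter, and the toy matrix are all illustrative assumptions.

```python
import math


def rank_one_approximation(matrix, iterations=200):
    """Return (sigma, u, v), the leading singular triple of `matrix`.

    sigma * outer(u, v) is the best rank-1 approximation in the
    least-squares sense. Plain power iteration, started from a uniform
    vector (it returns sigma = 0 if that start is orthogonal to the data).
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    u = [0.0] * n_rows
    sigma = 0.0
    for _ in range(iterations):
        # u <- A v, normalised to unit length
        u = [sum(row[j] * v[j] for j in range(n_cols)) for row in matrix]
        norm_u = math.sqrt(sum(x * x for x in u))
        if norm_u == 0.0:
            return 0.0, u, v
        u = [x / norm_u for x in u]
        # v <- A^T u; its length converges to the top singular value
        v = [sum(matrix[i][j] * u[i] for i in range(n_rows))
             for j in range(n_cols)]
        sigma = math.sqrt(sum(x * x for x in v))
        if sigma == 0.0:
            return 0.0, u, v
        v = [x / sigma for x in v]
    return sigma, u, v


def extract_bicluster(matrix, threshold=0.5):
    """Hypothetical rule (an assumption, not taken from the paper):
    keep rows/columns whose singular-vector entry is at least
    `threshold` times the largest entry in absolute value."""
    _, u, v = rank_one_approximation(matrix)
    max_u = max(abs(x) for x in u)
    max_v = max(abs(x) for x in v)
    genes = [i for i, x in enumerate(u) if abs(x) >= threshold * max_u]
    samples = [j for j, x in enumerate(v) if abs(x) >= threshold * max_v]
    return genes, samples


if __name__ == "__main__":
    # Toy data: rows are genes, columns are samples. Genes 1-3 are
    # strongly expressed in samples 0-1; everything else is near zero.
    toy = [
        [0.1, 0.1, 0.1, 0.1],
        [5.0, 5.0, 0.1, 0.1],
        [5.0, 5.0, 0.1, 0.1],
        [5.0, 5.0, 0.1, 0.1],
        [0.1, 0.1, 0.1, 0.1],
    ]
    print(extract_bicluster(toy))
```

In practice a library SVD routine would replace the hand-written power iteration, and the selected gene subsets would then serve as reduced features for a downstream classifier. How the paper chooses its cut-offs and builds the classifier is not stated in the abstract.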
Citations
Singular Value Decomposition for Genome-Wide Expression Data Processing and Modeling
Orly Alter, Patrick O. Brown, David Botstein
- 01 Mar 2001
TL;DR: Using singular value decomposition in transforming genome-wide expression data from genes x arrays space to reduced diagonalized "eigengenes" x "eigenarrays" space gives a global picture of the dynamics of gene expression, in which individual genes and arrays appear to be classified into groups of similar regulation and function, or similar cellular state and biological phenotype.
On the Complexity of Nonnegative Matrix Factorization
TL;DR: An exact version of nonnegative matrix factorization is defined and it is established that it is equivalent to a problem in polyhedral combinatorics; it is NP-hard; and that a polynomial-time local search heuristic exists.
Biclustering via Sparse Singular Value Decomposition
TL;DR: Sparse singular value decomposition (SSVD) is proposed as a new exploratory analysis tool for biclustering or identifying interpretable row–column associations within high‐dimensional data matrices.
Finding large average submatrices in high dimensional data
TL;DR: A statistically motivated biclustering procedure that finds large average submatrices within a given real-valued data matrix and is driven by a Bonferroni-based significance score that effectively trades off between submatrix size and average value is proposed.
References
Data Mining: Practical Machine Learning Tools and Techniques
Ian H. Witten, Eibe Frank, Mark Hall
- 25 Oct 1999
TL;DR: This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining.
Molecular classification of cancer: class discovery and class prediction by gene expression monitoring.
Todd R. Golub, Donna K. Slonim, Pablo Tamayo, Christine Huard, Michelle Gaasenbeek, Jill P. Mesirov, Hilary A. Coller, Mignon L. Loh, James R. Downing, Michael A. Caligiuri, Clara D. Bloomfield, Eric S. Lander
TL;DR: A generic approach to cancer classification based on gene expression monitoring by DNA microarrays is described and applied to human acute leukemias as a test case and suggests a general strategy for discovering and predicting cancer classes for other types of cancer, independent of previous biological knowledge.
Gene expression profiling predicts clinical outcome of breast cancer
Laura J. van't Veer, Hongyue Dai, Marc J. van de Vijver, Yudong D. He, Augustinus A. M. Hart, Mao Mao, Hans Peterse, Karin van der Kooy, Matthew J. Marton, Anke T. Witteveen, George J. Schreiber, Ron M. Kerkhoven, Christopher J. Roberts, Peter S. Linsley, René Bernards, Stephen H. Friend
TL;DR: DNA microarray analysis on primary breast tumours of 117 young patients is used and supervised classification is applied to identify a gene expression signature strongly predictive of a short interval to distant metastases (‘poor prognosis’ signature) in patients without tumour cells in local lymph nodes at diagnosis, providing a strategy to select patients who would benefit from adjuvant therapy.
Stacked generalization
TL;DR: The conclusion is that for almost any real-world generalization problem one should use some version of stacked generalization to minimize the generalization error rate.
Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays.
Uri Alon, Naama Barkai, Daniel A. Notterman, Kurt C. Gish, S. Ybarra, David H. Mack, A. J. Levine
TL;DR: In this paper, a two-way clustering algorithm was applied to both the genes and the tissues, revealing broad coherent patterns that suggest a high degree of organization underlying gene expression in these tissues.