Integrating different data types by regularized unsupervised multiple kernel learning with application to cancer subtype discovery.
Nora K. Speicher, Nico Pfeifer +1 more
TL;DR: Current approaches to multiple kernel learning for dimensionality reduction are applied and extended, and it is shown that one can use several kernels per data type, which relieves the user of having to choose the best kernel functions and kernel parameters for each data type beforehand.
Abstract: Motivation: Despite ongoing cancer research, available therapies are still limited in number and effectiveness, and making treatment decisions for individual patients remains a hard problem. Established subtypes, which help guide these decisions, are mainly based on individual data types. However, the analysis of multidimensional patient data involving measurements of various molecular features could reveal intrinsic characteristics of the tumor. Large-scale projects accumulate this kind of data for various cancer types, but we still lack computational methods that reliably integrate this information in a meaningful manner. Therefore, we apply and extend current approaches to multiple kernel learning for dimensionality reduction. On the one hand, we add a regularization term to avoid overfitting during the optimization procedure. On the other hand, we show that one can use several kernels per data type, which relieves the user of having to choose the best kernel functions and kernel parameters for each data type beforehand.
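The idea of using several kernels per data type can be illustrated with a minimal sketch. It assumes Gaussian (RBF) kernels, a convex combination of kernel matrices, and a kernel-PCA-style projection; none of these choices is confirmed by the abstract. The function names, the toy data, and the uniform weights `beta` are hypothetical, and the regularized optimization of the weights described above is not implemented here.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """Gaussian kernel matrix for the rows of X (assumed kernel choice)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))

def combine_kernels(kernels, beta):
    """Convex combination of kernel matrices with weights beta."""
    beta = np.asarray(beta, dtype=float)
    if np.any(beta < 0) or not np.isclose(beta.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(b * K for b, K in zip(beta, kernels))

def embed(K, dim):
    """Project samples onto the top eigenvectors of the centered kernel."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    vals, vecs = np.linalg.eigh(H @ K @ H)
    top = np.argsort(vals)[::-1][:dim]
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))

# Toy stand-ins for two data types measured on the same 20 patients.
rng = np.random.default_rng(0)
expression = rng.normal(size=(20, 50))
methylation = rng.normal(size=(20, 30))

# Several kernels per data type: one per candidate bandwidth.
kernels = [
    rbf_kernel(X, scale / X.shape[1])
    for X in (expression, methylation)
    for scale in (0.5, 1.0, 2.0)
]

# Placeholder: uniform weights. A learned weighting would replace this.
beta = np.full(len(kernels), 1.0 / len(kernels))
K = combine_kernels(kernels, beta)
Z = embed(K, dim=2)   # low-dimensional representation for clustering
```

With learned rather than uniform weights, an uninformative kernel would receive a small entry in `beta` and contribute little to `K`, so the user would not need to pick a single bandwidth in advance.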
Results: We have identified biologically meaningful subgroups for five different cancer types. Survival analysis has revealed significant differences between the survival times of the identified subtypes, with P values comparable to or even better than those of state-of-the-art methods. Moreover, our resulting subtypes reflect combined patterns from the different data sources, and we demonstrate that input kernel matrices carrying little information have less impact on the integrated kernel matrix. Our subtypes show different responses to specific therapies, which could eventually assist in treatment decision making.
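The abstract does not name the test behind the reported P values. A common choice for comparing survival between groups is the log-rank test, sketched below for two groups under that assumption. The function name and the toy follow-up times are hypothetical and are not results from the paper.

```python
import math
import numpy as np

def logrank_two_groups(time, event, group):
    """Two-group log-rank test.

    time  : follow-up times
    event : True if the event (death) was observed, False if censored
    group : True for group 1, False for group 2
    Returns the chi-square statistic (1 degree of freedom) and its P value.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    group = np.asarray(group, dtype=bool)

    obs_minus_exp = 0.0
    variance = 0.0
    for t in np.unique(time[event]):
        at_risk = time >= t
        n = at_risk.sum()                    # patients at risk at time t
        n1 = (at_risk & group).sum()         # of those, in group 1
        died = (time == t) & event
        d = died.sum()                       # events at time t
        d1 = (died & group).sum()            # of those, in group 1
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)

    if variance == 0.0:
        return 0.0, 1.0
    stat = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Made-up example: one subtype with early events, one with late events.
time = list(range(1, 9)) + list(range(11, 19))
event = [True] * 16
group = [True] * 8 + [False] * 8
stat, p_value = logrank_two_groups(time, event, group)
```

For more than two subtypes, the multi-group form of the test or an established survival analysis package would be the more practical route.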
Availability and implementation: An executable is available upon request.
Contact: nora@mpi-inf.mpg.de or npfeifer@mpi-inf.mpg.de
Citations
Multi-omics Data Integration, Interpretation, and Its Application.
TL;DR: This review collected the tools and methods that adopt integrative approach to analyze multiple omics data and summarized their ability to address applications such as disease subtyping, biomarker prediction, and deriving insights into the data.
1K
More Is Better: Recent Progress in Multi-Omics Data Integration Methods.
TL;DR: This review outlines the progress done in the field of multi-omics integration and comprehensive tools developed so far in this field and discusses the integration methods to predict patient survival.
Using machine learning approaches for multi-omics data analysis: A review
TL;DR: In this article, the authors explore different integrative machine learning methods which have been used to provide an in-depth understanding of biological systems during normal physiological functioning and in the presence of a disease.
494
Multi-omic and multi-view clustering algorithms: review and cancer benchmark
Nimrod Rappoport,Ron Shamir +1 more
TL;DR: This review covers methods developed specifically for omic data as well as generic multi-view methods developed in the machine learning community for joint clustering of multiple data types, providing the first systematic comparison of leading multi-omics and multi-View clustering algorithms.
407
Integration strategies of multi-omics data for machine learning analysis.
TL;DR: In this article, the authors focus on challenges and existing multi-omics integration strategies by paying special attention to machine learning applications and summarize the most recent data integration methods/ frameworks into five different integration strategies: early, mixed, intermediate, late and hierarchical.
341