PIMKL: Pathway Induced Multiple Kernel Learning
TL;DR: Pathway Induced Multiple Kernel Learning (PIMKL) exploits prior knowledge in the form of a molecular interaction network and annotated gene sets by optimizing a mixture of pathway-induced kernels using a multiple kernel learning (MKL) algorithm.
Abstract: Reliable identification of molecular biomarkers is essential for accurate patient stratification. While state-of-the-art machine learning approaches for sample classification continue to push boundaries in terms of performance, most of these methods are not able to integrate different data types and lack generalization power, limiting their application in a clinical setting. Furthermore, many methods behave as black boxes, and we have very little understanding about the mechanisms that lead to the prediction. While opaqueness concerning machine behaviour might not be a problem in deterministic domains, in health care, providing explanations about the molecular factors and phenotypes that are driving the classification is crucial to build trust in the performance of the predictive system. We propose Pathway Induced Multiple Kernel Learning (PIMKL), a novel methodology to reliably classify samples that can also help gain insights into the molecular mechanisms that underlie the classification. PIMKL exploits prior knowledge in the form of a molecular interaction network and annotated gene sets, by optimizing a mixture of pathway-induced kernels using a Multiple Kernel Learning (MKL) algorithm, an approach that has demonstrated excellent performance in different machine learning applications. After optimizing the combination of kernels for prediction of a specific phenotype, the model provides a stable molecular signature that can be interpreted in the light of the ingested prior knowledge and that can be used in transfer learning tasks.
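The idea of mixing pathway-induced kernels can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each pathway kernel is the bilinear form induced by the normalized Laplacian of the pathway's subnetwork, and it replaces the MKL optimizer with a simple kernel-target alignment heuristic for the mixture weights. The function names, the toy network, and the gene sets `pathway_A` and `pathway_B` are all hypothetical.

```python
import numpy as np


def normalized_laplacian(adj):
    """Symmetric normalized Laplacian I - D^{-1/2} A D^{-1/2} of an undirected graph."""
    deg = adj.sum(axis=1)
    inv_sqrt = np.zeros_like(deg, dtype=float)
    connected = deg > 0
    inv_sqrt[connected] = 1.0 / np.sqrt(deg[connected])
    return np.eye(adj.shape[0]) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]


def pathway_kernel(X, adj, gene_idx):
    """Kernel K[a, b] = x_a^T L x_b restricted to the genes of one pathway.

    Assumption: the pathway "induces" the kernel through the Laplacian L of
    its subnetwork in the interaction graph.
    """
    sub_adj = adj[np.ix_(gene_idx, gene_idx)]
    lap = normalized_laplacian(sub_adj)
    Xp = X[:, gene_idx]
    return Xp @ lap @ Xp.T


def alignment_weights(kernels, y, eps=1e-12):
    """Stand-in for an MKL solver: weights proportional to kernel-target alignment."""
    target = np.outer(y, y)
    scores = []
    for K in kernels:
        align = np.sum(K * target) / (np.linalg.norm(K) * np.linalg.norm(target) + eps)
        scores.append(max(align, 0.0))  # discard negatively aligned kernels
    scores = np.array(scores)
    total = scores.sum()
    if total <= 0:
        return np.full(len(kernels), 1.0 / len(kernels))
    return scores / total


# Toy data (hypothetical): 20 samples, 6 genes, labels driven by genes 0 and 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 6))
y = np.where(X[:, 0] - X[:, 1] > 0, 1.0, -1.0)

# Toy interaction network: two disjoint chains of three genes each.
adj = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (3, 4), (4, 5)]:
    adj[i, j] = adj[j, i] = 1.0

# Annotated gene sets (hypothetical), given as column indices into X.
pathways = {"pathway_A": [0, 1, 2], "pathway_B": [3, 4, 5]}

kernels = [pathway_kernel(X, adj, idx) for idx in pathways.values()]
weights = alignment_weights(kernels, y)
K_mix = sum(w * K for w, K in zip(weights, kernels))

for name, w in zip(pathways, weights):
    print(f"{name}: weight {w:.3f}")
```

The weights sum to one, so they can be read as the relative contribution of each pathway, which is the sense in which such a mixture is interpretable. The mixed kernel `K_mix` is positive semidefinite and could be handed to any kernel classifier that accepts a precomputed Gram matrix.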
Citations
Integration strategies of multi-omics data for machine learning analysis.
TL;DR: This review focuses on the challenges of multi-omics integration and on existing integration strategies, with special attention to machine learning applications, and groups recent data integration methods and frameworks into five strategies: early, mixed, intermediate, late and hierarchical.
341
Incorporating biological structure into machine learning models in biomedicine.
Jake Crawford, Casey S. Greene
TL;DR: For machine learning in biomedicine, where sample size is limited and model interpretability is crucial, incorporating prior knowledge in the form of structured data can be particularly useful.
35
Informative top-k class associative rule for cancer biomarker discovery on microarray data
TL;DR: An enhanced associative classification algorithm that integrates microarray data with biological information from gene ontology, KEGG pathways, and protein-protein interactions to generate informative class associative rules is introduced.
15
Multi-omics data integration approaches for precision oncology.
TL;DR: The present review addresses the impact of current multi-omics data integration approaches, and their synergy with machine learning approaches, on the precision oncology field.
14
PRECISE+ predicts drug response in patients by non-linear subspace-based transfer from cell lines and PDX models
Soufiane Mourragui, Marco Loog, Daniel J. Vis, Kat Moore, Anna G. Manjón, Mark A. van de Wiel, Marcel J. T. Reinders, Lodewyk F. A. Wessels
TL;DR: The interpretability of PRECISE+ is used to validate the approach by identifying known biomarkers to targeted therapies and to propose novel putative biomarkers of resistance to Paclitaxel and Gemcitabine.
11