PIMKL: Pathway Induced Multiple Kernel Learning

doi:10.1038/S41540-019-0086-3

Open Access, Journal Article

Matteo Manica, +5 more

- 29 Mar 2018

- arXiv: Molecular Networks

Citations: 24

TL;DR: Pathway Induced Multiple Kernel Learning (PIMKL) exploits prior knowledge in the form of a molecular interaction network and annotated gene sets by optimizing a mixture of pathway-induced kernels with a multiple kernel learning (MKL) algorithm.

Abstract: Reliable identification of molecular biomarkers is essential for accurate patient stratification. While state-of-the-art machine learning approaches for sample classification continue to push boundaries in terms of performance, most of these methods are not able to integrate different data types and lack generalization power, limiting their application in a clinical setting. Furthermore, many methods behave as black boxes, and we have very little understanding about the mechanisms that lead to the prediction. While opaqueness concerning machine behaviour might not be a problem in deterministic domains, in health care, providing explanations about the molecular factors and phenotypes that are driving the classification is crucial to build trust in the performance of the predictive system. We propose Pathway Induced Multiple Kernel Learning (PIMKL), a novel methodology to reliably classify samples that can also help gain insights into the molecular mechanisms that underlie the classification. PIMKL exploits prior knowledge in the form of a molecular interaction network and annotated gene sets, by optimizing a mixture of pathway-induced kernels using a Multiple Kernel Learning (MKL) algorithm, an approach that has demonstrated excellent performance in different machine learning applications. After optimizing the combination of kernels for prediction of a specific phenotype, the model provides a stable molecular signature that can be interpreted in the light of the ingested prior knowledge and that can be used in transfer learning tasks.
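
The abstract describes two steps: building one kernel per pathway from the interaction network and the annotated gene sets, then optimizing a weighted mixture of those kernels for a phenotype. The NumPy sketch below illustrates that pipeline under loudly stated assumptions. The regularized-Laplacian form of each pathway kernel and the helper names (`regularized_laplacian`, `pathway_kernel`, `combine_pathway_kernels`) are hypothetical choices made for illustration. The mixture weights come from centered kernel-target alignment, a simple stand-in heuristic that is not the MKL algorithm used in PIMKL. The per-pathway weights are one way to read off a pathway-level signature.

```python
import numpy as np


def regularized_laplacian(adj, beta=1.0):
    # Assumed smoothing: (I + beta * L)^{-1} with L = D - A (positive definite for beta >= 0).
    laplacian = np.diag(adj.sum(axis=1)) - adj
    return np.linalg.inv(np.eye(adj.shape[0]) + beta * laplacian)


def pathway_kernel(x, genes, network):
    # Hypothetical pathway-induced kernel K_p = X_p M_p X_p^T restricted to one gene set.
    x_p = x[:, genes]                                   # samples x pathway genes
    m_p = regularized_laplacian(network[np.ix_(genes, genes)])
    return x_p @ m_p @ x_p.T                            # PSD because M_p is positive definite


def centered_alignment(k, y):
    # Centered kernel-target alignment between K and the ideal kernel y y^T.
    n = k.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n                 # centering matrix
    kc = h @ k @ h
    yc = h @ np.outer(y, y) @ h
    denom = np.linalg.norm(kc) * np.linalg.norm(yc)     # Frobenius norms
    return 0.0 if denom == 0 else float(np.sum(kc * yc) / denom)


def combine_pathway_kernels(x, y, pathways, network):
    # Stand-in for MKL: weight each kernel by its clipped alignment, normalized to sum to 1.
    kernels = {name: pathway_kernel(x, genes, network) for name, genes in pathways.items()}
    scores = {name: max(centered_alignment(k, y), 0.0) for name, k in kernels.items()}
    total = sum(scores.values())
    if total == 0.0:                                    # no pathway aligns: uniform fallback
        weights = {name: 1.0 / len(kernels) for name in kernels}
    else:
        weights = {name: s / total for name, s in scores.items()}
    combined = sum(weights[name] * kernels[name] for name in kernels)
    return combined, weights


# Usage on synthetic data (hypothetical toy network and gene sets).
rng = np.random.default_rng(0)
n_samples, n_genes = 40, 6
y = np.repeat([1.0, -1.0], n_samples // 2)              # balanced binary phenotype
x = rng.normal(size=(n_samples, n_genes))
x[:, :3] += 1.5 * y[:, None]                            # only genes 0-2 carry signal
network = np.zeros((n_genes, n_genes))
for i, j in [(0, 1), (1, 2), (3, 4), (4, 5)]:           # toy interaction edges
    network[i, j] = network[j, i] = 1.0
pathways = {"pathway_A": [0, 1, 2], "pathway_B": [3, 4, 5]}

combined, weights = combine_pathway_kernels(x, y, pathways, network)
print(weights)  # pathway_A should receive most of the weight
```

Because alignment scores each kernel on its own, this heuristic ignores redundancy between overlapping pathways. A genuine MKL algorithm optimizes the mixture weights jointly rather than one kernel at a time.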

Citations

Journal Article, doi:10.1016/J.CSBJ.2021.06.030

Integration strategies of multi-omics data for machine learning analysis.

Milan Picard, +4 more

- 01 Jan 2021

- Computational and structural biotechnolo...

TL;DR: The authors review challenges and existing multi-omics integration strategies, with special attention to machine learning applications, and group the most recent data integration methods and frameworks into five strategies: early, mixed, intermediate, late and hierarchical.

Citations: 341

Journal Article, doi:10.1016/J.COPBIO.2019.12.021

Incorporating biological structure into machine learning models in biomedicine.

Jake Crawford, +1 more

- 18 Jan 2020

- Current Opinion in Biotechnology

TL;DR: For machine learning in biomedicine, where sample size is limited and model interpretability is crucial, incorporating prior knowledge in the form of structured data can be particularly useful.

Citations: 35

Journal Article, doi:10.1016/J.ESWA.2019.113169

Informative top-k class associative rule for cancer biomarker discovery on microarray data

Huey Fang Ong, +4 more

- 15 May 2020

- Expert Systems With Applications

TL;DR: The authors introduce an enhanced associative classification algorithm that integrates microarray data with biological information from Gene Ontology, KEGG pathways and protein-protein interactions to generate informative class associative rules.

Citations: 15

Journal Article, doi:10.1039/d1mo00411e

Multi-omics data integration approaches for precision oncology.

Raidel Correa-Aguila, +2 more

- 26 Apr 2022

- Molecular Omics

TL;DR: The present review addresses the impact of current multi-omics data integration approaches, and their synergy with machine learning approaches, on the precision oncology field.

Citations: 14

...

References

Journal Article, doi:10.1198/TECH.2007.S518

Pattern Recognition and Machine Learning

Radford M. Neal

- 01 Aug 2007

- Technometrics

TL;DR: A review of Christopher M. Bishop's textbook Pattern Recognition and Machine Learning.

Citations: 30.8K

Journal Article, doi:10.1093/NAR/28.1.27

KEGG: Kyoto Encyclopedia of Genes and Genomes

Minoru Kanehisa, +1 more

- 01 Jan 1999

- Nucleic Acids Research

TL;DR: The Kyoto Encyclopedia of Genes and Genomes (KEGG) is a knowledge base for systematic analysis of gene functions in terms of the networks of genes and molecules.

Citations: 30K

Journal Article, doi:10.1016/J.CELS.2015.12.004

The Molecular Signatures Database Hallmark Gene Set Collection

Arthur Liberzon, +5 more

- 23 Dec 2015

- Cell systems

TL;DR: The authors combine automated approaches with expert curation to develop a collection of "hallmark" gene sets in MSigDB; each hallmark is derived from multiple "founder" sets, conveys a specific biological state or process, and displays coherent expression.

Citations: 10.5K

Journal Article, doi:10.1023/A:1012487302797

Gene Selection for Cancer Classification using Support Vector Machines

Isabelle Guyon, +3 more

- 11 Mar 2002

- Machine Learning

TL;DR: The authors propose a Support Vector Machine (SVM) method based on recursive feature elimination (RFE) to select a small subset of genes from broad patterns of gene expression data recorded on DNA microarrays.

Citations: 9.5K

Journal Article, doi:10.1093/NAR/GKS1193

NCBI GEO: archive for functional genomics data sets—update

Tanya Barrett, +16 more

- 27 Nov 2012

- Nucleic Acids Research

TL;DR: The Gene Expression Omnibus (GEO) is an international public repository for high-throughput microarray and next-generation sequencing functional genomic data sets submitted by the research community; it archives raw data, processed data and metadata, which are indexed, cross-linked and searchable.

Citations: 9.4K