Proceedings Article, DOI 10.1145/3520304.3533944
A genetic algorithm for classifying metagenomic data
Jolanta Kawulok, Michal Kawulok, et al.
09 Jul 2022
TL;DR: This article proposes a new technique that uses a genetic algorithm to select a subset of k-mer features for classification, and reports initial results for detecting type 2 diabetes from human gut metagenomic samples.
Abstract: The goal of metagenomic analysis is to extract relevant information about the organisms that have left their genetic traces in an environmental sample. Each sample undergoes nucleotide sequencing, and the resulting DNA fragments are decomposed into k-mers, that is, short sequences of k nucleotides. Based on the k-mers found and their occurrence frequencies, it is possible to identify the organisms present in the sample; this enables further analysis but requires large taxonomic datasets. Alternatively, depending on the specific goal of the analysis, the whole sample may be classified directly based on its k-mer profile. However, this is challenging due to the large number of possible k-mers (4^k for k-mers of length k), and choosing the most valuable ones remains an open research problem. In this paper, we propose a new technique that exploits a genetic algorithm to select a subset of k-mer features that are used for classification. We report initial, yet promising, results obtained for the problem of detecting type 2 diabetes from human gut metagenomic samples. We expect that the proposed classification framework will enhance the capabilities of metagenomic analysis without the need for costly taxonomic classification.
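The abstract does not describe the genetic algorithm's encoding, operators, fitness function, or classifier, so the following Python sketch is illustrative only and is not the authors' method. It assumes that a sample's k-mer profile is the vector of relative frequencies of all 4^k k-mers (no reverse-complement merging), that an individual is a binary mask over these features, and that fitness is leave-one-out 1-nearest-neighbour accuracy on the selected k-mers minus a small penalty per selected feature. Selection, crossover, and mutation are tournament, one-point, and bit-flip, respectively. The function names (kmer_vocab, kmer_profile, fitness, ga_select, synthetic_sample) and the two-class synthetic data are hypothetical.

```python
import random
from collections import Counter
from itertools import product


def kmer_vocab(k):
    """All 4**k k-mers over A, C, G, T in a fixed (lexicographic) order."""
    return ["".join(p) for p in product("ACGT", repeat=k)]


def kmer_profile(reads, k):
    """Relative frequency of every k-mer across all reads of one sample.

    Simplifications (assumptions): no reverse-complement merging, and k-mers
    containing symbols other than A, C, G, T (e.g. N) are skipped.
    """
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    total = sum(counts.values()) or 1  # avoid division by zero for empty input
    return [counts[km] / total for km in kmer_vocab(k)]


def fitness(mask, X, y, penalty=0.01):
    """Leave-one-out 1-NN accuracy on the selected features minus a size penalty."""
    idx = [f for f, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0  # an empty feature subset cannot classify anything
    correct = 0
    for i in range(len(X)):
        nearest = min((j for j in range(len(X)) if j != i),
                      key=lambda j: sum((X[i][f] - X[j][f]) ** 2 for f in idx))
        correct += y[nearest] == y[i]
    return correct / len(X) - penalty * len(idx) / len(mask)


def ga_select(X, y, pop_size=20, generations=20, p_mut=0.05, seed=0):
    """Evolve binary feature masks (hypothetical GA setup) and return the best one."""
    rng = random.Random(seed)
    n = len(X[0])
    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(((fitness(m, X, y), m) for m in pop),
                        key=lambda s: s[0], reverse=True)
        next_pop = [scored[0][1], scored[1][1]]  # elitism: keep the two best masks
        while len(next_pop) < pop_size:
            # Tournament selection: each parent is the best of 3 random individuals.
            a = max(rng.sample(scored, 3), key=lambda s: s[0])[1]
            b = max(rng.sample(scored, 3), key=lambda s: s[0])[1]
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation: each gene toggles with probability p_mut.
            child = [(not bit) if rng.random() < p_mut else bit for bit in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=lambda m: fitness(m, X, y))


# Toy usage on synthetic data (hypothetical): class 0 samples contain only A/C,
# class 1 samples only G/T, so k-mers made of those letters separate the classes.
data_rng = random.Random(1)


def synthetic_sample(alphabet, n_reads=5, read_len=50):
    return ["".join(data_rng.choice(alphabet) for _ in range(read_len))
            for _ in range(n_reads)]


X = ([kmer_profile(synthetic_sample("AC"), 2) for _ in range(6)]
     + [kmer_profile(synthetic_sample("GT"), 2) for _ in range(6)])
y = [0] * 6 + [1] * 6
best = ga_select(X, y)
print("selected k-mers:", [km for km, bit in zip(kmer_vocab(2), best) if bit])
print("fitness:", round(fitness(best, X, y), 3))
```

Because the 1-NN fitness is recomputed from scratch for every individual, this brute-force sketch only scales to toy data. For real metagenomic samples, k-mer counting would normally be delegated to a dedicated counter such as KMC 3, fitness values would be cached, and a held-out evaluation would be needed so that the selected k-mers do not simply overfit the training samples.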
References
A survey on feature selection methods
Girish Chandrashekar, Ferat Sahin, et al.
TL;DR: The survey aims to provide a generic introduction to variable elimination that can be applied to a wide array of machine learning problems, focusing on filter, wrapper, and embedded methods.
KMC 3: counting and manipulating k-mer statistics.
TL;DR: Deorowicz et al. introduced KMC 3, a significant improvement over the earlier KMC 2 algorithm, together with KMC tools for manipulating k-mer databases; their usefulness is demonstrated on a few real problems.
Metagenomic applications in microbial diversity, bioremediation, pollution monitoring, enzyme and drug discovery. A review
TL;DR: The review covers applications of metagenomics in bioremediation, pollution monitoring, enzyme and drug discovery, and diagnosis and monitoring.
DectICO: an alignment-free supervised metagenomic classification method based on feature extraction and dynamic selection
TL;DR: The alignment-free supervised classification method DectICO can accurately classify metagenomic samples without dependence on known microbial genomes and is more accurate than non-dynamic feature selection methods and a recently published recursive-SVM-based classification approach.
Metagenomics Approaches to Investigate the Gut Microbiome of COVID-19 Patients.
Sofia Sehli, Imane Allali, Rajaa Chahboune, Youssef Bakri, Najib Al Idrissi, Salsabil Hamdi, Chakib Nejjari, Saaïd Amzazi, Hassan Ghazal
TL;DR: The authors present an overview of the approaches and methods used in published studies of the gut microbiome of COVID-19 patients, and note that the accuracy of such studies depends on the appropriate choice and optimal use of metagenomics bioinformatics platforms and tools.