Evolutionary Sparse Learning for Phylogenomics.

doi:10.1093/MOLBEV/MSAB227

Open Access•Journal Article•10.1093/MOLBEV/MSAB227

Sudhir Kumar, +2 more

- 27 Oct 2021

- Molecular Biology and Evolution

- Vol. 38, Iss: 11, pp 4674-4682

10

TL;DR: This paper introduces evolutionary sparse learning (ESL), a supervised machine learning approach with sparsity constraints for phylogenomics. ESL builds models with genomic loci (e.g., genes, proteins, genomic segments, and positions) as parameters.

Abstract: We introduce a supervised machine learning approach with sparsity constraints for phylogenomics, referred to as evolutionary sparse learning (ESL). ESL builds models with genomic loci—such as genes, proteins, genomic segments, and positions—as parameters. Using the Least Absolute Shrinkage and Selection Operator, ESL selects only the most important genomic loci to explain a given phylogenetic hypothesis or presence/absence of a trait. ESL models do not directly involve conventional parameters such as rates of substitutions between nucleotides, rate variation among positions, and phylogeny branch lengths. Instead, ESL directly employs the concordance of variation across sequences in an alignment with the evolutionary hypothesis of interest. ESL provides a natural way to combine different molecular and nonmolecular data types and incorporate biological and functional annotations of genomic loci in model building. We propose positional, gene, function, and hypothesis sparsity scores, illustrate their use through an example, and suggest several applications of ESL. The ESL framework has the potential to drive the development of a new class of computational methods that will complement traditional approaches in evolutionary genomics, particularly for identifying influential loci and sequences given a phylogeny and building models to test hypotheses. ESL’s fast computational times and small memory footprint will also help democratize big data analytics and improve scientific rigor in phylogenomics.
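The idea of using a sparsity penalty to pick out alignment positions that agree with a clade hypothesis can be illustrated with a toy example. This is a minimal sketch and not the authors' implementation: it assumes a plain L1-penalized logistic regression fitted by proximal gradient descent, one-hot encoded nucleotides as features, and a made-up eight-sequence alignment. The function names, the penalty strength `lam`, and the step size are all illustrative assumptions.

```python
import math

def one_hot_alignment(seqs, alphabet="ACGT"):
    """Turn each (position, base) pair into a 0/1 feature column."""
    names = [(pos, base) for pos in range(len(seqs[0])) for base in alphabet]
    X = [[1.0 if s[pos] == base else 0.0 for pos, base in names] for s in seqs]
    return X, names

def soft_threshold(v, t):
    """Proximal operator of the L1 penalty: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def fit_sparse_logistic(X, y, lam=0.05, step=0.5, n_iter=500):
    """L1-penalized logistic regression via proximal gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        logits = [b + sum(wj * xj for wj, xj in zip(w, row)) for row in X]
        # Residual between predicted probability and the 0/1 label.
        err = [1.0 / (1.0 + math.exp(-z)) - yi for z, yi in zip(logits, y)]
        grad_w = [sum(err[i] * X[i][j] for i in range(n)) / n for j in range(d)]
        # Gradient step on the loss, then shrinkage for the L1 penalty.
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad_w)]
        b -= step * sum(err) / n  # the intercept is left unpenalized
    return w, b

# Hypothetical alignment: the first four sequences belong to the clade (y = 1).
# Only position 1 (A inside the clade, G outside) tracks the hypothesis.
seqs = ["AACT", "CAGT", "AATT", "CACT", "AGCT", "CGGT", "AGTT", "CGCT"]
y = [1, 1, 1, 1, 0, 0, 0, 0]

X, names = one_hot_alignment(seqs)
w, b = fit_sparse_logistic(X, y)
selected = sorted({pos for (pos, _), wj in zip(names, w) if wj != 0.0})
print(selected)
```

In this toy case the penalty zeroes out every feature except those at position 1, the only site whose variation is concordant with the clade labels. The model contains no substitution rates or branch lengths, which mirrors the abstract's description of ESL working directly from the concordance of sequence variation with the hypothesis.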

Citations

•Journal Article•10.1093/molbev/msac043

OUP accepted manuscript

01 Mar 2022

- Molecular Biology and Evolution

TL;DR: In this article, the authors propose a method to shrink the carbon footprint of molecular evolutionary analyses by aligning multiple sequences, optimizing substitution models, inferring evolutionary trees, testing phylogenies by bootstrap analysis, and estimating divergence times.

21

Journal Article•10.1186/s13015-023-00233-3

Constructing phylogenetic networks via cherry picking and machine learning

Giulia Bernardini, +3 more

- 16 Sep 2023

- Algorithms for Molecular Biology

TL;DR: Constructing phylogenetic networks via cherry picking and machine learning produces efficient heuristics applicable to large datasets, leveraging machine learning techniques to capture essential information on the structure of input trees.

4

•Journal Article•10.1093/bioinformatics/btac252

A LASSO-based approach to sample sites for phylogenetic tree search

Noa Ecker, +6 more

- 24 Jun 2022

- Bioinformatics

TL;DR: An artificial-intelligence-based approach, built on training a regularized Lasso-regression model, is proposed to select the optimal subset of sites, together with a formula for computing the log-likelihood of the entire data from this subset.

3

References

Journal Article•10.1111/J.2517-6161.1996.TB02080.X

Regression Shrinkage and Selection via the Lasso

Robert Tibshirani

- 01 Jan 1996

- Journal of the Royal Statistical Society...

TL;DR: A new method for estimation in linear models called the lasso, which minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant, is proposed.

45.4K

•Journal Article•10.1016/J.PATREC.2005.10.010

An introduction to ROC analysis

Tom Fawcett

- 01 Jun 2006

- Pattern Recognition Letters

TL;DR: The purpose of this article is to serve as an introduction to ROC graphs and as a guide for using them in research.

21.3K

•Journal Article•10.1111/J.1467-9868.2005.00503.X

Regularization and variable selection via the elastic net

Hui Zou, +1 more

- 01 Apr 2005

- Journal of the Royal Statistical Society...

TL;DR: It is shown that the elastic net often outperforms the lasso, while enjoying a similar sparsity of representation, and an algorithm called LARS‐EN is proposed for computing elastic net regularization paths efficiently, much like algorithm LARS does for the lasso.

20.2K

Journal Article•10.1198/016214501753382273

Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties

Jianqing Fan, +1 more

- 01 Dec 2001

- Journal of the American Statistical Asso...

TL;DR: In this article, penalized likelihood approaches are proposed to handle variable selection problems, and it is shown that the newly proposed estimators perform as well as the oracle procedure in variable selection; namely, they work as well as if the correct submodel were known.

10.1K