Multiple alignment by aligning alignments
Travis J. Wheeler, John Kececioglu
- 01 Jul 2007
- Vol. 23, Iss: 13, pp 559-568
Abstract: Motivation: Multiple sequence alignment is a fundamental task in bioinformatics. Current tools typically form an initial alignment by merging subalignments, and then polish this alignment by repeated splitting and merging of subalignments to obtain an improved final alignment. In general this form-and-polish strategy consists of several stages, and a profusion of methods have been tried at every stage. We carefully investigate: (1) how to utilize a new algorithm for aligning alignments that optimally solves the common subproblem of merging subalignments, and (2) what is the best choice of method for each stage to obtain the highest quality alignment.
Results: We study six stages in the form-and-polish strategy for multiple alignment: parameter choice, distance estimation, merge-tree construction, sequence-pair weighting, alignment merging, and polishing. For each stage, we consider novel approaches as well as standard ones. Interestingly, the greatest gains in alignment quality come from (i) estimating distances by a new approach using normalized alignment costs, and (ii) polishing by a new approach using 3-cuts. Experiments with a parameter-value oracle suggest large gains in quality may be possible through an input-dependent choice of alignment parameters, and we present a promising approach for building such an oracle. Combining the best approaches to each stage yields a new tool we call Opal that on benchmark alignments matches the quality of the top tools, without employing alignment consistency or hydrophobic gap penalties.
Availability: Opal, a multiple alignment tool that implements the best methods in our study, is freely available at http://opal.cs.arizona.edu.
Contact: [email protected]
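As an illustration of the distance-estimation stage named in the Results, the following is a minimal sketch of a distance derived from a normalized pairwise alignment cost. The abstract does not say how the costs are normalized or which cost parameters are used, so everything here is an assumption made for illustration: the unit mismatch and gap costs, the division by mean sequence length, and the function names `alignment_cost`, `normalized_distance`, and `distance_matrix`. It should not be read as Opal's actual method.

```python
from itertools import combinations


def alignment_cost(a: str, b: str, mismatch: int = 1, gap: int = 1) -> int:
    """Minimum cost of a global pairwise alignment with linear gap costs.

    Unit costs are an illustrative assumption, not the paper's parameters.
    """
    # prev[j] holds the cost of aligning the current prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            substitute = prev[j - 1] + (0 if a[i - 1] == b[j - 1] else mismatch)
            delete = prev[j] + gap      # a[i-1] aligned to a gap
            insert = curr[j - 1] + gap  # b[j-1] aligned to a gap
            curr.append(min(substitute, delete, insert))
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Alignment cost divided by mean sequence length (assumed normalization)."""
    mean_len = (len(a) + len(b)) / 2
    return alignment_cost(a, b) / mean_len if mean_len else 0.0


def distance_matrix(seqs: list[str]) -> list[list[float]]:
    """Symmetric matrix of normalized distances over all sequence pairs."""
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = normalized_distance(seqs[i], seqs[j])
    return d
```

A matrix produced this way could feed a merge-tree construction step. Normalizing keeps long sequence pairs from appearing distant merely because their raw alignment costs are larger.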