Multiple alignment-free sequence comparison

doi:10.1093/BIOINFORMATICS/BTT462

Open Access•Journal Article•10.1093/BIOINFORMATICS/BTT462

Jie Ren, +4 more

- 01 Nov 2013

- Bioinformatics

- Vol. 29, Iss: 21, pp 2690-2698

19

TL;DR: On real data all of the statistics perform similarly, but on simulated data the Shepp-type statistics are in some instances outperformed by the star-type statistics.

Abstract:

Motivation: Recently, a range of new statistics have become available for the alignment-free comparison of two sequences based on k-tuple word content. Here, we extend these statistics to the simultaneous comparison of more than two sequences. Our suite of statistics contains, first, $C^{*}_{\ell}$ and $C^{S}_{\ell}$, extensions of statistics for pairwise comparison of the joint k-tuple content of all the sequences, and second, $\bar{C}^{*}_{2}$, $\bar{C}^{S}_{2}$ and $\bar{C}^{geo}_{2}$, averages of sums of pairwise comparison statistics. The two tasks we consider are, first, to identify sequences that are similar to a set of target sequences, and, second, to measure the similarity within a set of sequences.

Results: Our investigation uses both simulated data and cis-regulatory module data, where the task is to identify cis-regulatory modules with similar transcription factor binding sites. We find that although for real data all of our statistics show a similar performance, on simulated data the Shepp-type statistics are in some instances outperformed by star-type statistics. The multiple alignment-free statistics are more sensitive to contamination in the data than the pairwise average statistics.

Availability: Our implementation of the five statistics is available as an R package named 'multiAlignFree' at http://www-rcf.usc.edu/~fsun/Programs/multiAlignFree/multiAlignFreemain.html.

Contact: reinert@stats.ox.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.
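The two ideas the abstract relies on, comparing sequences by k-tuple word content and averaging a pairwise statistic over a set of sequences, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses a plain, uncentred inner product of k-tuple counts, which is not one of the paper's five statistics (the abstract describes those only as star-type and Shepp-type), and it is unrelated to the 'multiAlignFree' R package. The function names `ktuple_counts`, `d2` and `average_pairwise` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def ktuple_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-tuple (word of length k) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a: str, seq_b: str, k: int) -> int:
    """Uncentred word-count score: sum over words of count_a * count_b.

    Assumption: a simplified stand-in, not a star-type or Shepp-type statistic.
    """
    counts_a = ktuple_counts(seq_a, k)
    counts_b = ktuple_counts(seq_b, k)
    # Counter returns 0 for words absent from seq_b.
    return sum(n * counts_b[word] for word, n in counts_a.items())


def average_pairwise(seqs: list[str], k: int) -> float:
    """Average the pairwise score over all unordered pairs in seqs."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(d2(a, b, k) for a, b in pairs) / len(pairs)
```

A call such as `average_pairwise(["ACGT", "ACGT", "TTTT"], 2)` scores how similar the set is overall; a raw score like this depends on sequence length and composition, so it is only a starting point for understanding normalised statistics of the kind the paper studies.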

Citations

•Journal Article•10.1186/S13059-017-1319-7

Alignment-free sequence comparison: benefits, applications, and tools

Andrzej Zielezinski, +3 more

- 03 Oct 2017

- Genome Biology

TL;DR: This work provides a guide to the currently available alignment-free sequence analysis tools and addresses how these methods work, how they compare to alignment-based methods, and what potential they hold for researchers' own work.

539

•Journal Article•10.1146/ANNUREV-BIODATASCI-080917-013431

Alignment-Free Sequence Analysis and Applications

Jie Ren, +8 more

- 20 Jul 2018

- Social Science Research Network

TL;DR: This article provides an updated review of word-count-based approaches for alignment-free sequence analysis, their applications, and related developments.

97

•Journal Article•10.1071/SB15001

Molecular homology and multiple-sequence alignment: an analysis of concepts and practice

David A. Morrison, +2 more

- 10 Sep 2015

- Australian Systematic Botany

TL;DR: This work presents examples of molecular-data levels at which homology might be considered, and proposes terminology with which to better describe and discuss molecular homology at these levels, and sheds light on the multitude of automated procedures that have been created for multiple-sequence alignment.

42

•Journal Article•10.1186/S12859-015-0806-7

Cnidaria: fast, reference-free clustering of raw and assembled genome and transcriptome NGS data

Saulo Alves Aflitos, +5 more

- 17 Nov 2015

- arXiv: Genomics

TL;DR: Cnidaria is a tool for clustering genomic and transcriptomic data with no limitation on genome size or phylogenetic distance, achieving 100% identification accuracy at the supra-species level and 78% accuracy at the species level.

15

•Journal Article•10.1186/S12859-016-0980-2

On the comparison of regulatory sequences with multiple resolution Entropic Profiles.

Matteo Comin, +1 more

- 18 Mar 2016

- BMC Bioinformatics

TL;DR: An alignment-free statistic is proposed, called $EP^{*}_{2}$, that is based on multiple resolution patterns derived from the Entropic Profiles (EPs). It is highly successful in discriminating functionally related enhancers and, in almost all experiments, outperforms fixed-resolution methods.

14

References

•Journal Article•10.1016/0022-2836(70)90057-4

A general method applicable to the search for similarities in the amino acid sequence of two proteins

Saul B. Needleman, +1 more

- 28 Mar 1970

- Journal of Molecular Biology

TL;DR: A computer-adaptable method for finding similarities in the amino acid sequences of two proteins has been developed. With it, one can determine whether significant homology exists between the proteins and trace their possible evolutionary development.

13.2K

•Journal Article•10.1016/0022-2836(81)90087-5

Identification of common molecular subsequences.

Temple F. Smith, +1 more

- 25 Mar 1981

- Journal of Molecular Biology

TL;DR: This letter extends the heuristic homology algorithm of Needleman & Wunsch (1970) to find a pair of segments, one from each of two long sequences, such that there is no other pair of segments with greater similarity (homology).

11.3K

•Journal Article•10.1093/BIOINFORMATICS/BTI623

ROCR: visualizing classifier performance in R

Tobias Sing, +3 more

- 15 Oct 2005

- Bioinformatics

TL;DR: ROCR is a package for evaluating and visualizing the performance of scoring classifiers in the statistical language R. It features over 25 performance measures that can be freely combined to create two-dimensional performance curves.

3.3K

•Book

The Regulatory Genome: Gene Regulatory Networks In Development And Evolution

Eric H. Davidson

- 30 May 2006

TL;DR: The "Regulatory Genome" for Animal Development and Gene Regulatory Networks: The Roots of Causality and Diversity in Animal Evolution are presented.

1.1K

•Journal Article

ChIP-seq Identification of Weakly Conserved Heart Enhancers

Matthew J. Blow

- 29 Sep 2010

- Lawrence Berkeley National Laboratory

TL;DR: This paper used ChIP-seq with the enhancer-associated protein p300 from mouse embryonic day 11.5 heart tissue to identify over three thousand candidate heart enhancers genome-wide.

407