Estimating the Repeat Structure and Length of DNA Sequences Using ℓ-Tuples
Xiaoman Li,Michael S. Waterman +1 more
TL;DR: This work estimates genome or BAC length from a set of random shotgun reads by first estimating the repeat structure of the sequence, which is sometimes of interest in its own right, using the ℓ-tuple content of the reads.
Abstract: In shotgun sequencing projects, the genome or BAC length is not always known. We approach estimating genome length by first estimating the repeat structure of the genome or BAC, sometimes of interest in its own right, on the basis of a set of random reads from a genome project. Moreover, we can find the consensus for repeat families before assembly. Our methods are based on the ℓ-tuple content of the reads.
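The abstract does not spell out the estimator, so the following is only a minimal sketch of the general idea of reading length and repeat structure off ℓ-tuple counts. It uses the generic count-spectrum heuristic, not the authors' method. The function names (`count_tuples`, `modal_depth`, `estimate_length`, `copy_number_spectrum`) and the `min_depth` cutoff are hypothetical, and the sketch assumes error-free reads in which single-copy ℓ-tuples dominate, so that the most common occurrence count approximates the single-copy coverage depth.

```python
from collections import Counter


def count_tuples(reads, ell):
    """Count every length-ell substring (an l-tuple) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - ell + 1):
            counts[read[i:i + ell]] += 1
    return counts


def modal_depth(counts, min_depth=2):
    """Most common occurrence count among tuples seen at least min_depth times.

    Assumption: single-copy tuples dominate, so this mode approximates the
    coverage depth of a single-copy tuple. Ties go to the smaller depth.
    """
    spectrum = Counter(c for c in counts.values() if c >= min_depth)
    if not spectrum:
        raise ValueError("no tuple occurs min_depth times; coverage too low")
    return max(spectrum, key=lambda d: (spectrum[d], -d))


def estimate_length(counts, min_depth=2):
    """Estimate the number of l-tuple positions in the underlying sequence.

    Total tuple occurrences divided by the single-copy depth; for a sequence
    much longer than ell this is close to the sequence length.
    """
    depth = modal_depth(counts, min_depth)
    total = sum(c for c in counts.values() if c >= min_depth)
    return total / depth


def copy_number_spectrum(counts, min_depth=2):
    """Map estimated copy number -> number of distinct tuples with that copy number."""
    depth = modal_depth(counts, min_depth)
    return Counter(round(c / depth) for c in counts.values() if c >= min_depth)


if __name__ == "__main__":
    # Toy input: three identical reads covering a 17-base sequence end to end.
    reads = ["AACAGATCCGCTGGTTA"] * 3
    counts = count_tuples(reads, ell=2)
    print(estimate_length(counts))
    print(copy_number_spectrum(counts))
```

In this sketch, tuples whose count is near an integer multiple of the modal depth are read as belonging to repeats with that copy number. Real reads would need error handling and a statistical model of the count distribution, both of which are omitted here.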
Citations
Genome-wide survey and genetic characteristics of Ophichthus evermanni (Jordan et Richardson, 1909) based on Illumina sequencing platform
03 May 2022
TL;DR: The first de novo assembled 1.97 Gb draft genome of Ophichthus evermanni was predicted based on k-mer analysis, without obvious nucleotide bias.
5
Rat Genome (Rattus norvegicus)
TL;DR: There is a strong correlation between local rates of microinsertions and microdeletions, nucleotide substitutions, and transposable element insertions in the rat and mouse lineages, although the events occurred independently since the divergence of the two branches.
5
Shotgun Sequence Assembly
Mihai Pop
TL;DR: Throughout this chapter, the main computational challenges imposed by the shotgun sequencing method are described, and the most widely used assembly algorithms are surveyed.
5
Estimating repeat spectra and genome length from low-coverage genome skims with RESPECT
Shahab Sarmashghi,Metin Balaban,Eleonora Rachtman,Behrouz Touri,Siavash Mirarab,Vineet Bafna +5 more
TL;DR: In this paper, a constrained optimization approach (Spline Linear Programming) is used to estimate the k-mer spectra of the genome, where the constraints are learned empirically on reads simulated at 1X coverage from 66 genomes.
•Dissertation
Fuzzy k-mers and their application to comparative genome assembly
John Healy
- 22 Oct 2013
4
References
•Book
The EM algorithm and extensions
Geoffrey J. McLachlan,Thriyambakam Krishnan +1 more
- 15 Nov 1996
TL;DR: The EM Algorithm and Extensions describes the formulation of the EM algorithm, details its methodology, discusses its implementation, and illustrates applications in many statistical contexts, opening the door to the tremendous potential of this remarkably versatile statistical tool.
Whole-genome random sequencing and assembly of Haemophilus influenzae Rd.
R. D. Fleischmann,M. D. Adams,Owen White,Rebecca A. Clayton,Ewen F. Kirkness,Anthony R. Kerlavage,Carol J. Bult,J. F. Tomb,Brian Dougherty,J. M. Merrick +9 more
TL;DR: An approach for genome analysis based on sequencing and assembly of unselected pieces of DNA from the whole chromosome has been applied to obtain the complete nucleotide sequence of the genome from the bacterium Haemophilus influenzae Rd.
6.2K
REPuter: the manifold applications of repeat analysis on a genomic scale.
Stefan Kurtz,Jomuna V. Choudhuri,Enno Ohlebusch,Chris Schleiermacher,Jens Stoye,Robert Giegerich +5 more
TL;DR: The wide scope of repeat analysis is circumscribed using applications in five different areas of sequence analysis: checking fragment assemblies, searching for low copy repeats, finding unique sequences, comparing gene structures and mapping of cDNA/EST sequences.
Goodness-of-Fit Techniques.
TL;DR: Chapters include an overview, tests based on regression and correlation, tests of chi-squared type, and tests for the uniform distribution.
1.6K
An Eulerian path approach to DNA fragment assembly
TL;DR: This work abandons the classical “overlap–layout–consensus” approach in favor of a new EULER algorithm that, for the first time, resolves the 20-year-old “repeat problem” in fragment assembly.