A fast, lock-free approach for efficient parallel counting of occurrences of k-mers
Guillaume Marçais,Carl Kingsford
TL;DR: This work proposes a new k-mer counting algorithm and associated implementation, called Jellyfish, which is fast and memory efficient, based on a multithreaded, lock-free hash table optimized for counting k-mers up to 31 bases in length.
Abstract: Motivation: Counting the number of occurrences of every k-mer (substring of length k) in a long string is a central subproblem in many applications, including genome assembly, error correction of sequencing reads, fast multiple sequence alignment and repeat detection. Recently, the deep sequence coverage generated by next-generation sequencing technologies has caused the amount of sequence to be processed during a genome project to grow rapidly, and has rendered current k-mer counting tools too slow and memory intensive. At the same time, large multicore computers have become commonplace in research facilities allowing for a new parallel computational paradigm.
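To make the problem statement concrete, here is a minimal single-threaded sketch of k-mer counting as defined above: slide a window of length k over the string and tally each substring. The function name `count_kmers` and the use of `std::unordered_map` are illustrative assumptions, not part of Jellyfish; this is the straightforward baseline, not the algorithm the paper proposes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical helper (not Jellyfish code): count every substring of
// length k in `seq` using an ordinary hash map keyed by the k-mer text.
std::unordered_map<std::string, std::size_t>
count_kmers(const std::string& seq, std::size_t k) {
    assert(k >= 1);
    std::unordered_map<std::string, std::size_t> counts;
    if (seq.size() < k) return counts;  // no complete k-mer fits
    for (std::size_t i = 0; i + k <= seq.size(); ++i) {
        ++counts[seq.substr(i, k)];     // one window per start position
    }
    return counts;
}
```

A string of length n contributes n - k + 1 windows, so storing each k-mer as text costs memory proportional to k per distinct key, which is one reason this simple approach becomes memory intensive on deep sequencing data.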
Results: We propose a new k-mer counting algorithm and associated implementation, called Jellyfish, which is fast and memory efficient. It is based on a multithreaded, lock-free hash table optimized for counting k-mers up to 31 bases in length. Due to their flexibility, suffix arrays have been the data structure of choice for solving many string problems. For the task of k-mer counting, important in many biological applications, Jellyfish offers a much faster and more memory-efficient solution.
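The general idea of a lock-free counting table can be sketched as follows. This is a simplified illustration under stated assumptions, not Jellyfish's actual data structure: it assumes compare-and-swap as the lock-free primitive, a fixed capacity with linear probing and no resizing, and a 2-bit-per-base encoding (so a 31-mer fits in 62 bits of a 64-bit word). The names `encode_kmer`, `LockFreeCounter`, and `count_parallel` are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <thread>
#include <vector>

// Pack the k-mer starting at `pos` into 2 bits per base (k <= 31).
// Returns false if a base other than A, C, G, T is encountered.
inline bool encode_kmer(const std::string& s, std::size_t pos,
                        std::size_t k, std::uint64_t& out) {
    assert(k >= 1 && k <= 31 && pos + k <= s.size());
    std::uint64_t v = 0;
    for (std::size_t i = 0; i < k; ++i) {
        std::uint64_t code;
        switch (s[pos + i]) {
            case 'A': code = 0; break;
            case 'C': code = 1; break;
            case 'G': code = 2; break;
            case 'T': code = 3; break;
            default: return false;
        }
        v = (v << 2) | code;
    }
    out = v;
    return true;
}

// Illustrative fixed-size open-addressing table (assumption: no resizing).
class LockFreeCounter {
public:
    explicit LockFreeCounter(std::size_t capacity)
        : keys_(capacity), counts_(capacity) {
        assert(capacity > 0);
        for (auto& k : keys_) k.store(0);
        for (auto& c : counts_) c.store(0);
    }

    // Add one occurrence of `kmer`; returns false if the table is full.
    bool add(std::uint64_t kmer) {
        const std::uint64_t tag = kmer + 1;  // 0 is reserved for "empty slot"
        const std::size_t n = keys_.size();
        std::size_t slot = hash(kmer) % n;
        for (std::size_t probe = 0; probe < n; ++probe) {
            std::uint64_t seen = keys_[slot].load();
            // Try to claim an empty slot; on failure `seen` holds the winner's tag.
            if (seen == 0 && keys_[slot].compare_exchange_strong(seen, tag)) {
                seen = tag;
            }
            if (seen == tag) {
                counts_[slot].fetch_add(1, std::memory_order_relaxed);
                return true;
            }
            slot = (slot + 1) % n;  // slot owned by another k-mer: keep probing
        }
        return false;
    }

    // Look up the count of `kmer` (0 if absent).
    std::uint64_t get(std::uint64_t kmer) const {
        const std::uint64_t tag = kmer + 1;
        const std::size_t n = keys_.size();
        std::size_t slot = hash(kmer) % n;
        for (std::size_t probe = 0; probe < n; ++probe) {
            const std::uint64_t seen = keys_[slot].load();
            if (seen == 0) return 0;
            if (seen == tag) return counts_[slot].load();
            slot = (slot + 1) % n;
        }
        return 0;
    }

private:
    static std::size_t hash(std::uint64_t x) {  // simple 64-bit mixer
        x ^= x >> 33;
        x *= 0xff51afd7ed558ccdULL;
        x ^= x >> 33;
        return static_cast<std::size_t>(x);
    }
    std::vector<std::atomic<std::uint64_t>> keys_;
    std::vector<std::atomic<std::uint64_t>> counts_;
};

// Count all k-mers of `seq` with `num_threads` workers sharing one table.
inline void count_parallel(const std::string& seq, std::size_t k,
                           unsigned num_threads, LockFreeCounter& table) {
    if (seq.size() < k) return;
    const std::size_t positions = seq.size() - k + 1;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            for (std::size_t i = t; i < positions; i += num_threads) {
                std::uint64_t code;
                if (encode_kmer(seq, i, k, code)) table.add(code);
            }
        });
    }
    for (auto& w : workers) w.join();
}
```

Because a slot's key is written at most once (from empty to a tag) and counts only ever increase through atomic adds, threads never need to hold a lock. Counts should be read only after all workers have joined, as `count_parallel` does.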
Availability: The Jellyfish software is written in C++ and is GPL licensed. It is available for download at http://www.cbcb.umd.edu/software/jellyfish.
Contact: [email protected]
Supplementary information: Supplementary data are available at Bioinformatics online.
Citations
The genome evolution and domestication of tropical fruit mango
Peng Wang,Yingfeng Luo,Huang Jianfeng,Shenghan Gao,Guopeng Zhu,Dang Zhiguo,Jiangtao Gai,Meng Yang,Min Zhu,Huangkai Zhang,Xiuxu Ye,Aiping Gao,Xinyu Tan,Sen Wang,Shuangyang Wu,Edgar B. Cahoon,Beibei Bai,Zhao Zhichang,Qian Li,Junya Wei,Chen Huarui,Luo Ruixiong,Deyong Gong,Kexuan Tang,Bing Zhang,Zhangguang Ni,Guodi Huang,Songnian Hu,Chen Yeyuan +34 more
TL;DR: Analysis of chromosome-scale mango genome sequences reveals photosynthesis and lipid metabolism are preferentially retained after a recent WGD event, and expansion of CHS genes is likely associated with urushiol biosynthesis in mango.
Evolution of DNA Methylation Patterns in the Brassicaceae is Driven by Differences in Genome Organization
TL;DR: It is found that the lineage-specific expansion and contraction of transposon and repeat sequences is the main driver of interspecific differences in DNA methylation, and the most heavily methylated portions of the genome are not conserved at the sequence level.
SCENIC+: single-cell multiomic inference of enhancers and gene regulatory networks
Carmen Bravo González-Blas,Seppe De Winter,Gert Hulselmans,Nikolai Hecker,Irina Matetovici,Valerie Christiaens,Suresh Poovathingal,Jasper Wouters,Sara Aibar,Stein Aerts +9 more
TL;DR: A new method for the inference of eGRNs, called SCENIC+, predicts genomic enhancers along with candidate upstream transcription factors (TFs) and links these enhancers to candidate target genes.
The crown-of-thorns starfish genome as a guide for biocontrol of this coral reef pest
Michael R. Hall,Kevin M. Kocot,Kenneth W. Baughman,Selene L. Fernandez-Valverde,Marie Gauthier,William L. Hatleberg,Arunkumar Krishnan,Carmel McDougall,Cherie A. Motti,Eiichi Shoguchi,Tianfang Wang,Xueyan Xiang,Min Zhao,Utpal Bose,Chuya Shinzato,Kanako Hisata,Manabu Fujie,Miyuki Kanda,Scott F. Cummins,Noriyuki Satoh,Sandie M. Degnan,Bernard M. Degnan +23 more
TL;DR: Insight is provided into COTS-specific communication that may guide the generation of peptide mimetics for use on reefs with COTS outbreaks, and on water-borne chemical plumes released from aggregating COTS.
A manually annotated Actinidia chinensis var. chinensis (kiwifruit) genome highlights the challenges associated with draft genomes and gene prediction in plants
Sarah M. Pilkington,Ross N. Crowhurst,Elena Hilario,Simona Nardozza,Lena G. Fraser,Yongyan Peng,Yongyan Peng,Kularajathevan Gunaseelan,Robert M. Simpson,Jibran Tahir,Simon C. Deroles,Kerry Robert Templeton,Zhiwei Luo,Marcus Davy,Canhong Cheng,Mark A McNeilage,Davide Scaglione,Yifei Liu,Qiong Zhang,P. M. Datson,Nihal De Silva,Susan E. Gardiner,H. Bassett,David Chagné,John McCallum,Helge Dzierzon,Cecilia H. Deng,Yen-Yi Wang,Lorna Barron,Kelvina I. Manako,Judith H. Bowen,Toshi Foster,Zoe A. Erridge,Heather R. Tiffin,Chethi N. Waite,Kevin M. Davies,Ella R. P. Grierson,William A. Laing,Rebecca Kirk,Xiuyin Chen,Marion Wood,Mirco Montefiori,David A. Brummell,Kathy E. Schwinn,Andrew Catanach,Christina G. Fullerton,Dawei Li,Sathiyamoorthy Meiyalaghan,Niels J. Nieuwenhuizen,Nicola C. Read,Roneel Prakash,Donald A. Hunter,Huaibi Zhang,Marian J. McKenzie,Mareike Knäbel,Alastair Harris,Andrew C. Allan,Andrew C. Allan,Andrew P. Gleave,Angela Chen,Bart J. Janssen,Blue Plunkett,Charles Ampomah-Dwamena,Charlotte Voogd,Davin Leif,Davin Leif,Declan J. Lafferty,Edwige J. F. Souleyre,Erika Varkonyi-Gasic,Francesco Gambi,Jenny Hanley,Jia-Long Yao,Joey Cheung,Karine M. David,Ben Warren,K.B. Marsh,Kimberley C. Snowden,Kui Lin-Wang,Lara Brian,Marcela Martínez-Sánchez,Mindy Y. Wang,Nadeesha R. Ileperuma,Nikolai Macnee,Robert Campin,Peter A. McAtee,Revel S.M. Drummond,Richard V. Espley,Hilary S. Ireland,Rongmei Wu,Ross G. Atkinson,Sakuntala Karunairetnam,Sean Bulley,Shayhan Chunkath,Zac Hanley,Roy Storey,Amali H. Thrimawithana,Susan Thomson,Charles David,Raffaele Testolin,Hongwen Huang,Roger P. Hellens,Robert J. Schaffer,Robert J. Schaffer +102 more
TL;DR: The use of the manual annotation tool WebApollo facilitated manual checking and correction of gene models enabling improvement of computational prediction, especially relevant for certain types of gene families such as the EXPANSIN like genes.
References
MUSCLE: multiple sequence alignment with high accuracy and high throughput
TL;DR: MUSCLE is a new computer program for creating multiple alignments of protein sequences that includes fast distance estimation using kmer counting, progressive alignment using a new profile function the authors call the log-expectation score, and refinement using tree-dependent restricted partitioning.
•Book
Introduction to Algorithms
Thomas H. Cormen,Charles E. Leiserson,Ronald L. Rivest +2 more
- 01 Jan 1990
TL;DR: The updated new edition of the classic Introduction to Algorithms is intended primarily for use in undergraduate or graduate courses in algorithms or data structures and presents a rich variety of algorithms and covers them in considerable depth while making their design and analysis accessible to all levels of readers.
MapReduce: simplified data processing on large clusters
Jeffrey Dean,Sanjay Ghemawat +1 more
- 06 Dec 2004
TL;DR: This paper presents the implementation of MapReduce, a programming model and an associated implementation for processing and generating large data sets that runs on a large cluster of commodity machines and is highly scalable.
MapReduce: simplified data processing on large clusters
Jeffrey Dean,Sanjay Ghemawat +1 more
TL;DR: This presentation explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.
A Whole-Genome Assembly of Drosophila
Eugene W. Myers,Granger G. Sutton,Arthur L. Delcher,Ian M. Dew,Dan P. Fasulo,Michael Flanigan,Saul A. Kravitz,Clark M. Mobarry,Knut Reinert,Karin A. Remington,Eric L. Anson,Randall Bolanos,Hui-Hsien Chou,Catherine Jordan,Aaron L. Halpern,Stefano Lonardi,Ellen M. Beasley,Rhonda C. Brandon,Lin Chen,Patrick J. Dunn,Zhongwu Lai,Yong Liang,Deborah R. Nusskern,Ming Zhan,Qing Zhang,Xiangqun Zheng,Gerald M. Rubin,Mark Raymond Adams,J. Craig Venter +28 more
TL;DR: The quality of a whole-genome assembly of Drosophila melanogaster and the nature of the computer algorithms that accomplished it are reported on and should be of substantial value to the scientific community.