BigDataScript: a scripting language for data pipelines

doi:10.1093/BIOINFORMATICS/BTU595

Open Access•Journal Article•10.1093/BIOINFORMATICS/BTU595

Pablo Cingolani, +2 more

- 01 Jan 2015

- Bioinformatics

- Vol. 31, Iss: 1, pp 10-16

39

TL;DR: By abstracting pipeline concepts at the programming language level, BDS simplifies the implementation, execution and management of complex bioinformatics pipelines, resulting in shorter development and debugging cycles as well as cleaner code.

Citations

•Journal Article•10.12688/F1000RESEARCH.29032.2

Sustainable data analysis with Snakemake.

Felix Mölder, +19 more

- 19 Apr 2021

- F1000Research

TL;DR: It is shown how the popular workflow management system Snakemake can be used to guarantee reproducibility, and how it enables an ergonomic, combined, unified representation of all steps involved in data analysis, ranging from raw data processing, to quality control and fine-grained, interactive exploration and plotting of final results.

1.2K

•Journal Article•10.1093/BIB/BBW020

A review of bioinformatic pipeline frameworks

Jeremy Leipzig

- 24 Mar 2016

- Briefings in Bioinformatics

TL;DR: The design philosophies of several current pipeline frameworks are surveyed and compared and practical recommendations are provided based on analysis requirements and the user base.

294

•Journal Article•10.1093/GIGASCIENCE/GIZ037

GenPipes: an open-source framework for distributed and scalable genomic analyses

Mathieu Bourgey, +22 more

- 01 Jun 2019

- GigaScience

TL;DR: GenPipes is a flexible Python-based framework that facilitates the development and deployment of multi-step workflows optimized for high-performance computing clusters and the cloud, and offers genomics researchers a simple method to analyze different types of data.

174

•Journal Article•10.1016/J.SCITOTENV.2021.147798

Deciphering microbial mechanisms underlying soil organic carbon storage in a wheat-maize rotation system.

Xingjie Wu, +8 more

- 15 May 2021

- Science of The Total Environment

TL;DR: A link between microbial life history strategies and soil organic carbon storage in agroecosystems is presumed but remains largely unexplored at the gene level. The authors aimed to elucidate whether and how differential organic material amendments (manure versus peat-vermiculite), relative to sole chemical fertilizer application, affect this link in a wheat-maize rotation field experiment.

40

•Posted Content•10.1101/041236

NextflowWorkbench: Reproducible and Reusable Workflows for Beginners and Experts

Jason P Kurs, +2 more

- 24 Feb 2016

- bioRxiv

TL;DR: The NextflowWorkbench is presented, which was designed for both beginners and experts. It blurs the distinction between user interface and scripting language, and it extends and reuses the popular Nextflow workflow description language and shares its advantages.

23

References

•Journal Article•10.1093/BIOINFORMATICS/BTP324

Fast and accurate short read alignment with Burrows–Wheeler transform

Heng Li, +1 more

- 01 Jul 2009

- Bioinformatics

TL;DR: Burrows-Wheeler Alignment tool (BWA) is implemented, a new read alignment package that is based on backward search with Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps.

55.5K

•Journal Article•10.1101/GR.107524.110

The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data

Aaron McKenna, +10 more

- 01 Sep 2010

- Genome Research

TL;DR: The GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.

27.2K

•Journal Article•10.4161/FLY.19695

A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3

Pablo Cingolani, +8 more

- 01 Apr 2012

- Fly

TL;DR: It appears that the 5′ and 3′ UTRs are reservoirs for genetic variations that change the termini of proteins during evolution of the Drosophila genus.

10.7K

•Journal Article•10.1093/BIOINFORMATICS/BTS480

Snakemake--a scalable bioinformatics workflow engine.

Johannes Köster, +1 more

- 01 Oct 2012

- Bioinformatics

TL;DR: Snakemake is a workflow engine that provides a readable Python-based workflow definition language and a powerful execution environment that scales from single-core workstations to compute clusters without modifying the workflow.

2.5K

Related Papers (5)

Fast and accurate short read alignment with Burrows–Wheeler transform

Heng Li, +1 more

- 01 Jul 2009

- Bioinformatics

Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences

Jeremy Goecks, +2 more

- 25 Aug 2010

- Genome Biology

The Sequence Alignment/Map format and SAMtools

Heng Li, +8 more

- 01 Aug 2009

- Bioinformatics

From FastQ Data to High‐Confidence Variant Calls: The Genome Analysis Toolkit Best Practices Pipeline

Géraldine A. Van der Auwera, +14 more

- 15 Oct 2013

- Current protocols in human genetics

A framework for variation discovery and genotyping using next-generation DNA sequencing data