Top 117 papers published in the topic of Bioconductor in 2017


Journal Article•10.1111/2041-210X.12628•

ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data


Guangchuang Yu¹, David K. Smith¹, Hongbo Zhu¹, Yi Guan¹, Tommy Tsan-Yuk Lam¹•Institutions (1)

01 Jan 2017-Methods in Ecology and Evolution

TL;DR: An R package, ggtree, that provides programmable visualization and annotation of phylogenetic trees; it can read more tree file formats than other software and supports visualization of phylo, multiphylo, phylo4, phylo4d, obkdata and phyloseq tree objects defined in other R packages.


Abstract: Summary: We present an R package, ggtree, which provides programmable visualization and annotation of phylogenetic trees. ggtree can read more tree file formats than other software, including Newick, Nexus, NHX, PHYLIP and jplace formats, and supports visualization of phylo, multiphylo, phylo4, phylo4d, obkdata and phyloseq tree objects defined in other R packages. It can also extract the tree/branch/node-specific and other data from the analysis outputs of BEAST, EPA, HyPhy, PAML, PHYLDOG, pplacer, r8s, RAxML and RevBayes software, and allows using these data to annotate the tree. The package allows colouring and annotation of a tree by numerical/categorical node attributes, manipulating a tree by rotating, collapsing and zooming out clades, highlighting user-selected clades or operational taxonomic units, and exploration of a large tree by zooming into a selected portion. A two-dimensional tree can be drawn by scaling the tree width based on an attribute of the nodes. A tree can be annotated with an associated numerical matrix (as a heat map), multiple sequence alignment, subplots or silhouette images. The package ggtree is released under the Artistic-2.0 license. The source code and documents are freely available through Bioconductor (http://www.bioconductor.org/packages/ggtree).


4,275 citations

Journal Article•10.1093/BIOINFORMATICS/BTW777•

Scater: pre-processing, quality control, normalization and visualization of single-cell RNA-seq data in R.


Davis J. McCarthy¹, Kieran R. Campbell², Aaron T. L. Lun³, Quin F. Wills²,⁴•Institutions (4)

European Bioinformatics Institute¹, Wellcome Trust Centre for Human Genetics², University of Cambridge³, John Radcliffe Hospital⁴

15 Apr 2017-Bioinformatics

TL;DR: The R/Bioconductor package scater is developed to facilitate rigorous pre‐processing, quality control, normalization and visualization of scRNA‐seq data and provides a convenient, flexible workflow to process raw sequencing reads into a high‐quality expression dataset ready for downstream analysis.


Abstract: Single-cell RNA sequencing (scRNA-seq) is increasingly used to study gene expression at the level of individual cells. However, preparing raw sequence data for further analysis is not a straightforward process. Biases, artifacts and other sources of unwanted variation are present in the data, requiring substantial time and effort to be spent on pre-processing, quality control (QC) and normalization. We have developed the R/Bioconductor package scater to facilitate rigorous pre-processing, quality control, normalization and visualization of scRNA-seq data. The package provides a convenient, flexible workflow to process raw sequencing reads into a high-quality expression dataset ready for downstream analysis. scater provides a rich suite of plotting tools for single-cell data and a flexible data structure that is compatible with existing tools and can be used as infrastructure for future software development. The open-source code, along with installation instructions, vignettes and case studies, is available through Bioconductor at http://bioconductor.org/packages/scater. Contact: davis@ebi.ac.uk. Supplementary data are available at Bioinformatics online.


1,661 citations

Journal Article•10.1093/BIOINFORMATICS/BTX513•

ChAMP: updated methylation analysis pipeline for Illumina BeadChips.


Yuan Tian¹, Tiffany Morris, Amy P. Webster², Zhen Yang¹, Stephan Beck², Andrew Feber², Andrew E. Teschendorff¹,²•Institutions (2)

CAS-MPG Partner Institute for Computational Biology¹, University College London²

15 Dec 2017-Bioinformatics

TL;DR: A significantly updated and improved version of the Bioconductor package ChAMP, which can be used to analyze EPIC and 450k data and many enhanced functionalities have been added, including correction for cell‐type heterogeneity, network analysis and a series of interactive graphical user interfaces.


Abstract: Summary: The Illumina Infinium HumanMethylationEPIC BeadChip is the new platform for high-throughput DNA methylation analysis, effectively doubling the coverage compared to the older 450 K array. Here we present a significantly updated and improved version of the Bioconductor package ChAMP, which can be used to analyze EPIC and 450k data. Many enhanced functionalities have been added, including correction for cell-type heterogeneity, network analysis and a series of interactive graphical user interfaces. / Availability and implementation: ChAMP is a BioC package available from https://bioconductor.org/packages/release/bioc/html/ChAMP.html. / Contact: a.teschendorff@ucl.ac.uk or s.beck@ucl.ac.uk or a.feber@ucl.ac.uk / Supplementary information: Supplementary data are available at Bioinformatics online.


906 citations

Journal Article•10.1186/S13059-017-1305-0•

Splatter: simulation of single-cell RNA sequencing data


Luke Zappia¹,², Belinda Phipson¹, Alicia Oshlack¹,²•Institutions (2)

Royal Children's Hospital¹, University of Melbourne²

12 Sep 2017-Genome Biology

TL;DR: The Splatter Bioconductor package is presented for simple, reproducible, and well-documented simulation of scRNA-seq data; it provides an interface to multiple simulation methods, including Splat, the authors' own simulation, which is based on a gamma-Poisson distribution.


Abstract: As single-cell RNA sequencing (scRNA-seq) technologies have rapidly developed, so have analysis methods. Many methods have been tested, developed, and validated using simulated datasets. Unfortunately, current simulations are often poorly documented, their similarity to real data is not demonstrated, or reproducible code is not available. Here, we present the Splatter Bioconductor package for simple, reproducible, and well-documented simulation of scRNA-seq data. Splatter provides an interface to multiple simulation methods including Splat, our own simulation, based on a gamma-Poisson distribution. Splat can simulate single populations of cells, populations with multiple cell types, or differentiation paths.
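
The gamma-Poisson idea mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the Splat model itself, which has more components than are described here. It only shows the basic hierarchy: a gamma-distributed rate followed by a Poisson draw. The function names and the `shape`, `scale` and `dispersion` parameters are hypothetical and chosen for illustration only.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def simulate_counts(n_genes, n_cells, shape=2.0, scale=1.5, dispersion=0.1, seed=0):
    """Simulate a genes x cells count matrix from a gamma-Poisson hierarchy.

    Assumed, illustrative model (not Splat's actual parameterisation):
      gene mean       ~ Gamma(shape, scale)
      per-cell rate   ~ Gamma(1 / dispersion, mean * dispersion)
      observed count  ~ Poisson(rate)
    """
    rng = random.Random(seed)
    gene_means = [rng.gammavariate(shape, scale) for _ in range(n_genes)]
    counts = []
    for mu in gene_means:
        row = []
        for _ in range(n_cells):
            # The gamma-distributed rate has expectation mu; the Poisson draw adds count noise.
            rate = rng.gammavariate(1.0 / dispersion, mu * dispersion)
            row.append(sample_poisson(rate, rng))
        counts.append(row)
    return gene_means, counts


if __name__ == "__main__":
    means, counts = simulate_counts(n_genes=3, n_cells=5)
    for mu, row in zip(means, counts):
        print(f"mean={mu:.2f} counts={row}")
```

Fixing the seed makes a run reproducible, which is the property the abstract emphasises for simulated datasets.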


867 citations

Journal Article•10.1093/BIOINFORMATICS/BTX346•

karyoploteR: an R/Bioconductor package to plot customizable genomes displaying arbitrary data.


Bernat Gel, Eduard Serra

01 Oct 2017-Bioinformatics

TL;DR: karyoploteR is an R/Bioconductor package that creates linear chromosomal representations of any genome with genomic annotations and experimental data plotted along them, allowing highly customizable plots to be built from arbitrary data with complete freedom over data positioning and representation.


Abstract: Motivation: Data visualization is a crucial tool for data exploration, analysis and interpretation. For the visualization of genomic data, there is no tool to create customizable non-circular plots of whole genomes from any species. Results: We have developed karyoploteR, an R/Bioconductor package to create linear chromosomal representations of any genome with genomic annotations and experimental data plotted along them. The plot creation process is inspired by R base graphics, with a main function creating karyoplots with no data and multiple additional functions, including custom functions written by the end-user, adding data and other graphical elements. This approach allows the creation of highly customizable plots from arbitrary data with complete freedom on data positioning and representation. Availability and implementation: karyoploteR is released under the Artistic-2.0 License. Source code and documentation are freely available through Bioconductor (http://www.bioconductor.org/packages/karyoploteR) and at the examples and tutorial page at https://bernatgel.github.io/karyoploter_tutorial. Contact: bgel@igtp.cat.


692 citations

Journal Article•10.1093/BIOINFORMATICS/BTX162•

SynergyFinder: a web application for analyzing drug combination dose-response matrix data.


Aleksandr Ianevski¹, Liye He¹, Tero Aittokallio¹,², Jing Tang¹,²•Institutions (2)

University of Helsinki¹, University of Turku²

01 Aug 2017-Bioinformatics

TL;DR: A web application is implemented that uses key functions of R‐package SynergyFinder, and provides not only the flexibility of using multiple synergy scoring models, but also a user‐friendly interface for visualizing the drug combination landscapes in an interactive manner.


Abstract: Summary: Rational design of drug combinations has become a promising strategy to tackle the drug sensitivity and resistance problem in cancer treatment. To systematically evaluate the pre-clinical significance of pairwise drug combinations, functional screening assays that probe combination effects in a dose-response matrix assay are commonly used. To facilitate the analysis of such drug combination experiments, we implemented a web application that uses key functions of R-package SynergyFinder, and provides not only the flexibility of using multiple synergy scoring models, but also a user-friendly interface for visualizing the drug combination landscapes in an interactive manner. Availability and implementation: The SynergyFinder web application is freely accessible at https://synergyfinder.fimm.fi; the R-package and its source code are freely available at http://bioconductor.org/packages/release/bioc/html/synergyfinder.html. Contact: jing.tang@helsinki.fi.


518 citations

Journal Article•10.1038/NMETH.4468•

Accessible, curated metagenomic data through ExperimentHub


Edoardo Pasolli¹, Lucas Schiffer², Paolo Manghi¹, Audrey Renson², Valerie Obenchain², Duy Tin Truong¹, Francesco Beghini¹, Faizan Malik², Marcel Ramos²,³, Jennifer Beam Dowd²,⁴, Curtis Huttenhower⁵,⁶, Martin Morgan³, Nicola Segata¹, Levi Waldron²•Institutions (6)

University of Trento¹, City University of New York², Roswell Park Cancer Institute³, King's College London⁴, Harvard University⁵, Broad Institute⁶

31 Oct 2017-Nature Methods

TL;DR: curatedMetagenomicData is a Bioconductor and command-line resource providing thousands of metagenomic profiles from the Human Microbiome Project and other publicly available datasets, distributed through ExperimentHub for convenient cloud-based delivery of data to the R desktop.


Abstract: We present curatedMetagenomicData, a Bioconductor and command-line resource providing thousands of metagenomic profiles from the Human Microbiome Project and other publicly available datasets, and ExperimentHub, for convenient cloud-based distribution of data to the R desktop. curatedMetagenomicData provides standardized per-participant metadata linked to bacterial, fungal, archaeal, and viral taxonomic abundances, as well as quantitative metabolic functional profiles, generated by the HUMAnN2 and MetaPhlAn2 pipelines. The resulting datasets can be immediately analyzed with a wide range of statistical methods, requiring a minimum of bioinformatic expertise and no preprocessing of data. We demonstrate exploratory data analysis, an investigation of gut "enterotypes", and a comparison of the accuracy of disease classification from different data types. These documented analyses can be reproduced efficiently on a laptop, without the barriers of working with large-scale, raw sequencing data. The development of curatedMetagenomicData will continue with the addition, curation, and analysis of further microbiome datasets.


440 citations

Journal Article•10.1093/BIOINFORMATICS/BTX183•

annotatr: genomic regions in context.


Raymond G. Cavalcante, Maureen A. Sartor¹•Institutions (1)

University of Michigan¹

01 Aug 2017-Bioinformatics

TL;DR: The annotatr Bioconductor package is developed to flexibly and quickly summarize and plot annotations of genomic regions, giving a better understanding of the genomic context of the regions.


Abstract: Motivation: Analysis of next-generation sequencing data often results in a list of genomic regions. These may include differentially methylated CpGs/regions, transcription factor binding sites, interacting chromatin regions, or GWAS-associated SNPs, among others. A common analysis step is to annotate such genomic regions to genomic annotations (promoters, exons, enhancers, etc.). Existing tools are limited by a lack of annotation sources and flexible options, the time it takes to annotate regions, an artificial one-to-one region-to-annotation mapping, a lack of visualization options to easily summarize data, or some combination thereof. Results: We developed the annotatr Bioconductor package to flexibly and quickly summarize and plot annotations of genomic regions. The annotatr package reports all intersections of regions and annotations, giving a better understanding of the genomic context of the regions. A variety of graphics functions are implemented to easily plot numerical or categorical data associated with the regions across the annotations, and across annotation intersections, providing insight into how characteristics of the regions differ across the annotations. We demonstrate that annotatr is up to 27× faster than comparable R packages. Overall, annotatr enables a richer biological interpretation of experiments. Availability and implementation: http://bioconductor.org/packages/annotatr/ and https://github.com/rcavalcante/annotatr. Contact: rcavalca@umich.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
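
The difference between a one-to-one mapping and reporting all intersections can be made concrete with a small Python sketch. It is not annotatr's implementation or API. The function name, the tuple layout and the half-open coordinate convention are assumptions made for illustration, and the quadratic loop ignores the speed concerns the abstract raises.

```python
def annotate_regions(regions, annotations):
    """Return every (region, annotation label) pair that overlaps.

    regions:     iterable of (chrom, start, end)
    annotations: iterable of (chrom, start, end, label)
    Coordinates are treated as half-open intervals [start, end) (an assumption).
    A region overlapping several annotations yields several hits,
    rather than being forced into a single "best" annotation.
    """
    hits = []
    for chrom, start, end in regions:
        for a_chrom, a_start, a_end, label in annotations:
            # Two half-open intervals overlap when each starts before the other ends.
            if chrom == a_chrom and start < a_end and a_start < end:
                hits.append(((chrom, start, end), label))
    return hits


if __name__ == "__main__":
    regions = [("chr1", 100, 200), ("chr1", 500, 600)]
    annotations = [
        ("chr1", 50, 150, "promoter"),
        ("chr1", 120, 400, "exon"),
        ("chr2", 100, 200, "enhancer"),
    ]
    for region, label in annotate_regions(regions, annotations):
        print(region, label)
```

In this toy input the first region is reported twice, once per overlapping annotation, and the second region is not reported at all.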


396 citations

Journal Article•10.12688/F1000RESEARCH.11622.3•

CyTOF workflow: differential discovery in high-throughput high-dimensional cytometry datasets


Malgorzata Nowicka¹,², Carsten Krieg¹, Helena L. Crowell¹,², Lukas M. Weber¹,², Felix J. Hartmann¹, Silvia Guglietta³, Burkhard Becher¹, Mitchell P. Levesque¹, Mark D. Robinson¹,²•Institutions (3)

University of Zurich¹, Swiss Institute of Bioinformatics², European Institute of Oncology³

26 May 2017-F1000Research

TL;DR: An updated R-based pipeline for differential analyses of HDCyto data, largely based on Bioconductor packages is presented, allowing overdispersion in cell count or aggregated signals across samples to be appropriately modeled.


Abstract: High-dimensional mass and flow cytometry (HDCyto) experiments have become a method of choice for high-throughput interrogation and characterization of cell populations. Here, we present an updated R-based pipeline for differential analyses of HDCyto data, largely based on Bioconductor packages. We computationally define cell populations using FlowSOM clustering, and facilitate an optional but reproducible strategy for manual merging of algorithm-generated clusters. Our workflow offers different analysis paths, including association of cell type abundance with a phenotype or changes in signalling markers within specific subpopulations, or differential analyses of aggregated signals. Importantly, the differential analyses we show are based on regression frameworks where the HDCyto data is the response; thus, we are able to model arbitrary experimental designs, such as those with batch effects, paired designs and so on. In particular, we apply generalized linear mixed models or linear mixed models to analyses of cell population abundance or cell-population-specific analyses of signaling markers, allowing overdispersion in cell count or aggregated signals across samples to be appropriately modeled. To support the formal statistical analyses, we encourage exploratory data analysis at every step, including quality control (e.g., multi-dimensional scaling plots), reporting of clustering results (dimensionality reduction, heatmaps with dendrograms) and differential analyses (e.g., plots of aggregated signals).


329 citations

Journal Article•10.1093/BIOINFORMATICS/BTW580•

DAPAR & ProStaR: software to perform statistical analyses in quantitative discovery proteomics


Samuel Wieczorek¹,², Florence Combes¹,², Cosmin Lazar¹,², Quentin Giai Gianetto¹,², Laurent Gatto, Alexia Dorffer¹,², Anne-Marie Hesse¹,², Yohann Couté¹,², Myriam Ferro¹,², Christophe Bruley¹,², Thomas Burger•Institutions (2)

French Institute of Health and Medical Research¹, University of Grenoble²

01 Jan 2017-Bioinformatics

TL;DR: DAPAR and ProStaR are software tools to perform the statistical analysis of label-free XIC-based quantitative discovery proteomics experiments; DAPAR contains procedures to filter, normalize, impute missing values, aggregate peptide intensities, perform null hypothesis significance tests and select the most likely differentially abundant proteins with a corresponding false discovery rate.


Abstract: DAPAR and ProStaR are software tools to perform the statistical analysis of label-free XIC-based quantitative discovery proteomics experiments. DAPAR contains procedures to filter, normalize, impute missing values, aggregate peptide intensities, perform null hypothesis significance tests and select the most likely differentially abundant proteins with a corresponding false discovery rate. ProStaR is a graphical user interface that allows friendly access to the DAPAR functionalities through a web browser. Availability and implementation: DAPAR and ProStaR are implemented in the R language and are available on the website of the Bioconductor project (http://www.bioconductor.org/). A complete tutorial and a toy dataset accompany the packages. Contact: samuel.wieczorek@cea.fr, florence.combes@cea.fr, thomas.burger@cea.fr.


319 citations

Journal Article•10.1093/BIOINFORMATICS/BTX378•

CancerSubtypes: an R/Bioconductor package for molecular cancer subtype identification, validation and visualization.


Taosheng Xu¹, Thuc Duy Le², Lin Liu², Su Ning¹, Rujing Wang¹, Bingyu Sun¹, Antonio Colaprico³, Gianluca Bontempi³, Jiuyong Li²•Institutions (3)

Hefei Institutes of Physical Science¹, University of South Australia², Université libre de Bruxelles³

01 Oct 2017-Bioinformatics

TL;DR: CancerSubtypes is an R package for identifying cancer subtypes using multi‐omics data, including gene expression, miRNA expression and DNA methylation data that provides a standardized framework for data pre‐processing, feature selection, and result follow‐up analyses.


Abstract: Summary: Identifying molecular cancer subtypes from multi-omics data is an important step in personalized medicine. We introduce CancerSubtypes, an R package for identifying cancer subtypes using multi-omics data, including gene expression, miRNA expression and DNA methylation data. CancerSubtypes integrates four main computational methods which are highly cited for cancer subtype identification and provides a standardized framework for data pre-processing, feature selection, and result follow-up analyses, including results computing, biology validation and visualization. The input and output of each step in the framework are packaged in the same data format, making it convenient to compare different methods. The package is useful for inferring cancer subtypes from an input genomic dataset, comparing the predictions from different well-known methods and testing new subtype discovery methods, as shown with different application scenarios in the Supplementary Material. Availability and implementation: The package is implemented in R and available under the GPL-2 license from the Bioconductor website (http://bioconductor.org/packages/CancerSubtypes/). Contact: thuc.le@unisa.edu.au or jiuyong.li@unisa.edu.au. Supplementary information: Supplementary data are available at Bioinformatics online.


Journal Article•10.1093/BIOINFORMATICS/BTX094•

Glimma: interactive graphics for gene expression analysis


Shian Su¹, Charity W. Law¹,², Casey Ah-Cann¹,², Marie Liesse Asselin-Labat¹,², Marnie E. Blewitt¹,², Matthew E. Ritchie¹,²•Institutions (2)

Walter and Eliza Hall Institute of Medical Research¹, University of Melbourne²

01 Jul 2017-Bioinformatics

TL;DR: The open‐source Glimma package creates interactive graphics for exploring gene expression analysis with a few simple R commands, and extends popular plots found in the limma package, to allow individual data points to be queried and additional annotation information to be displayed upon hovering or selecting particular points.


Abstract: Motivation: Summary graphics for RNA-sequencing and microarray gene expression analyses may contain upwards of tens of thousands of points. Details about certain genes or samples of interest are easily obscured in such dense summary displays. Incorporating interactivity into summary plots would enable additional information to be displayed on demand and facilitate intuitive data exploration. Results: The open-source Glimma package creates interactive graphics for exploring gene expression analysis with a few simple R commands. It extends popular plots found in the limma package, such as multi-dimensional scaling plots and mean-difference plots, to allow individual data points to be queried and additional annotation information to be displayed upon hovering or selecting particular points. It also offers links between plots so that more information can be revealed on demand. Glimma is widely applicable, supporting data analyses from a number of well-established Bioconductor workflows (limma, edgeR and DESeq2) and uses D3/JavaScript to produce HTML pages with interactive displays that enable more effective data exploration by end-users. Results from Glimma can be easily shared between bioinformaticians and biologists, enhancing reporting capabilities while maintaining reproducibility. Availability and implementation: The Glimma R package is available from http://bioconductor.org/packages/Glimma/. Contact: su.s@wehi.edu.au, law@wehi.edu.au or mritchie@wehi.edu.au.


Journal Article•10.1093/BIOINFORMATICS/BTX413•

pqsfinder: an exhaustive and imperfection-tolerant search tool for potential quadruplex-forming sequences in R.


Jiří Hon¹, Tomáš Martínek¹, Jaroslav Zendulka¹, Matej Lexa²•Institutions (2)

Brno University of Technology¹, Masaryk University²

01 Nov 2017-Bioinformatics

TL;DR: A newly developed Bioconductor package for identifying potential quadruplex‐forming sequences (PQS), which allows for sequence searches that accommodate possible divergences from the optimal G4 base composition and whose search algorithm is shown to have 96% accuracy on 392 known, experimentally observed G4 structures.


Abstract: Motivation: G-quadruplexes (G4s) are one of the non-B DNA structures easily observed in vitro and assumed to form in vivo. The latest experiments with G4-specific antibodies and G4-unwinding helicase mutants confirm this conjecture. These four-stranded structures have also been shown to influence a range of molecular processes in cells. As G4s are intensively studied, it is often desirable to screen DNA sequences and pinpoint the precise locations where they might form. Results: We describe and have tested a newly developed Bioconductor package for identifying potential quadruplex-forming sequences (PQS). The package is easy to use, flexible and customizable. It allows for sequence searches that accommodate possible divergences from the optimal G4 base composition. A novel aspect of our research was the creation and training (parametrization) of an advanced scoring model which resulted in increased precision compared to similar tools. We demonstrate that the algorithm behind the searches has a 96% accuracy on 392 currently known and experimentally observed G4 structures. We also carried out searches against the recent G4-seq data to verify how well we can identify the structures detected by that technology. The correlation with pqsfinder predictions was 0.622, higher than the correlation of 0.491 obtained with the second best tool, G4Hunter. Availability: http://bioconductor.org/packages/pqsfinder/. This paper is based on pqsfinder-1.4.1.


Journal Article•10.1371/JOURNAL.PCBI.1005562•

ROTS: An R package for reproducibility-optimized statistical testing


Tomi Suomi¹,², Fatemeh Seyednasrollah¹,², Maria K. Jaakkola¹,², Thomas Faux², Laura L. Elo²•Institutions (2)

University of Turku¹, Åbo Akademi University²

25 May 2017-PLOS Computational Biology

TL;DR: A Bioconductor R package for performing ROTS analysis conveniently on different types of omics data is introduced, and three case studies, involving proteomics and RNA-seq data from public repositories, are presented.


Abstract: Differential expression analysis is one of the most common types of analyses performed on various biological data (e.g. RNA-seq or mass spectrometry proteomics). It is the process that detects features, such as genes or proteins, showing statistically significant differences between the sample groups under comparison. A major challenge in the analysis is the choice of an appropriate test statistic, as different statistics have been shown to perform well in different datasets. To this end, the reproducibility-optimized test statistic (ROTS) adjusts a modified t-statistic according to the inherent properties of the data and provides a ranking of the features based on their statistical evidence for differential expression between two groups. ROTS has already been successfully applied in a range of different studies from transcriptomics to proteomics, showing competitive performance against other state-of-the-art methods. To promote its widespread use, we introduce here a Bioconductor R package for performing ROTS analysis conveniently on different types of omics data. To illustrate the benefits of ROTS in various applications, we present three case studies, involving proteomics and RNA-seq data from public repositories, including both bulk and single cell data. The package is freely available from Bioconductor (https://www.bioconductor.org/packages/ROTS).
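
To give a sense of what a "modified t-statistic" can look like, the Python sketch below computes a two-group statistic whose denominator is `a1 + a2 * s`, where `s` is the pooled standard error. The abstract does not spell out the formula, so this parameterisation is an assumption based on the general form used in the ROTS literature, and the names `a1` and `a2` are hypothetical. The sketch does not perform the reproducibility optimisation that chooses these parameters from the data, which is the core of ROTS.

```python
import math
from statistics import mean, variance


def modified_t(group1, group2, a1=0.0, a2=1.0):
    """Absolute two-sample statistic |mean1 - mean2| / (a1 + a2 * s).

    s is the pooled standard error of the difference in means.
    With a1 = 0 and a2 = 1 this reduces to the absolute value of the
    ordinary equal-variance t-statistic. A positive a1 damps features
    whose standard error is very small.
    """
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    std_err = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return abs(mean(group1) - mean(group2)) / (a1 + a2 * std_err)


if __name__ == "__main__":
    control = [1.0, 2.0, 3.0]
    treated = [4.0, 5.0, 6.0]
    print(modified_t(control, treated))           # ordinary |t|
    print(modified_t(control, treated, a1=1.0))   # damped variant
```

Ranking features by such a statistic for different `(a1, a2)` settings, and keeping the setting whose top-ranked lists are most reproducible across resampled data, is the idea the method's name refers to.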


Journal Article•10.1186/S12885-017-3689-3•

GRcalculator: an online tool for calculating and mining dose–response data


Nicholas A. Clark¹, Marc Hafner², Michal Kouril³, Elizabeth H. Williams², Jeremy L. Muhlich², Marcin Pilarczyk¹, Mario Niepel², Peter K. Sorger², Mario Medvedovic¹•Institutions (3)

University of Cincinnati¹, Harvard University², Cincinnati Children's Hospital Medical Center³

24 Oct 2017-BMC Cancer

TL;DR: GRcalculator is a powerful, user-friendly, and free tool that provides a unified platform for investigators to analyze dose–response data across diverse cell types and perturbagens and facilitates inclusion of GR metrics calculations within existing R analysis pipelines.


Abstract: Quantifying the response of cell lines to drugs or other perturbagens is the cornerstone of pre-clinical drug development and pharmacogenomics as well as a means to study factors that contribute to sensitivity and resistance. In dividing cells, traditional metrics derived from dose–response curves, such as IC50, AUC, and Emax, are confounded by the number of cell divisions taking place during the assay, which varies widely for biological and experimental reasons. Hafner et al. (Nat Meth 13:521–627, 2016) recently proposed an alternative way to quantify drug response, normalized growth rate (GR) inhibition, that is robust to such confounders. Adoption of the GR method is expected to improve the reproducibility of dose–response assays and the reliability of pharmacogenomic associations (Hafner et al. 500–502, 2017). We describe here an interactive website (www.grcalculator.org) for calculation, analysis, and visualization of dose–response data using the GR approach and for comparison of GR and traditional metrics. Data can be user-supplied or derived from published datasets. The web tools are implemented in the form of three integrated Shiny applications (grcalculator, grbrowser, and grtutorial) deployed through a Shiny server. Intuitive graphical user interfaces (GUIs) allow for interactive analysis and visualization of data. The Shiny applications make use of two R packages (shinyLi and GRmetrics) specifically developed for this purpose. The GRmetrics R package is also available via Bioconductor and can be used for offline data analysis and visualization. Source code for the Shiny applications and associated packages (shinyLi and GRmetrics) can be accessed at www.github.com/uc-bd2k/grcalculator and www.github.com/datarail/gr_metrics. GRcalculator is a powerful, user-friendly, and free tool to facilitate analysis of dose–response data. It generates publication-ready figures and provides a unified platform for investigators to analyze dose–response data across diverse cell types and perturbagens (including drugs, biological ligands, RNAi, etc.). GRcalculator also provides access to data collected by the NIH LINCS Program (http://www.lincsproject.org/) and other public domain datasets. The GRmetrics Bioconductor package provides computationally trained users with a platform for offline analysis of dose–response data and facilitates inclusion of GR metrics calculations within existing R analysis pipelines. These tools are therefore well suited to users in academia as well as industry.
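
As a rough illustration of the normalized growth rate inhibition idea, the Python sketch below computes a single GR value from three cell counts. It uses the formula GR = 2^(log2(x_treated / x_0) / log2(x_control / x_0)) - 1 as published for the GR method. The abstract itself does not state the formula, so treat it as background knowledge rather than something taken from this page. The function and argument names are hypothetical, and the sketch is not the GRmetrics API.

```python
import math


def gr_value(treated_count, control_count, initial_count):
    """Normalized growth rate inhibition for one dose.

    treated_count: cell count at the end of the assay under treatment
    control_count: cell count at the end of the assay in the untreated control
    initial_count: cell count at the time of treatment
    The control must have grown (control_count > initial_count),
    otherwise the ratio of log fold-changes is undefined.
    """
    if control_count <= initial_count:
        raise ValueError("control population must grow during the assay")
    growth_ratio = math.log2(treated_count / initial_count) / math.log2(control_count / initial_count)
    return 2.0 ** growth_ratio - 1.0


if __name__ == "__main__":
    x0, ctrl = 100, 400
    for treated in (400, 200, 100, 50):
        print(treated, round(gr_value(treated, ctrl, x0), 3))
```

Under this definition a value of 1 means no effect on growth, 0 means complete cytostasis, and negative values indicate net cell loss. Because the treated fold-change is scaled by the control fold-change, the metric is less sensitive to how many divisions occur during the assay.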


Journal Article•10.1158/0008-5472.CAN-17-0344•

Software for the Integration of Multiomics Experiments in Bioconductor.


Marcel Ramos¹,², Lucas Schiffer², Angela Re³, Rimsha Azhar², Azfar Basunia⁴, Carmen Rodríguez², Tiffany Chan², Phil Chapman⁵, Sean Davis, David Gomez-Cabrero⁶, Aedín C. Culhane⁴, Benjamin Haibe-Kains, Kasper D. Hansen⁷, Hanish Kodali², Marie Stephie Louis², Arvind Singh Mer⁸, Markus Riester⁹, Martin Morgan¹, Vincent J. Carey⁴,¹⁰, Levi Waldron²•Institutions (10)

Roswell Park Cancer Institute¹, City University of New York², Istituto Italiano di Tecnologia³, Harvard University⁴, University of Manchester⁵, King's College London⁶, Johns Hopkins University⁷, University Health Network⁸, Novartis⁹, Brigham and Women's Hospital¹⁰

01 Nov 2017-Cancer Research

TL;DR: The MultiAssayExperiment Bioconductor package reduces major obstacles to efficient, scalable, and reproducible statistical analysis of multiomics data and enhances data science applications of multiple omics datasets.


Abstract: Multiomics experiments are increasingly commonplace in biomedical research and add layers of complexity to experimental design, data integration, and analysis. R and Bioconductor provide a generic framework for statistical analysis and visualization, as well as specialized data classes for a variety of high-throughput data types, but methods are lacking for integrative analysis of multiomics experiments. The MultiAssayExperiment software package, implemented in R and leveraging Bioconductor software and design principles, provides for the coordinated representation of, storage of, and operation on multiple diverse genomics data. We provide the unrestricted multiple 'omics data for each cancer tissue in The Cancer Genome Atlas as ready-to-analyze MultiAssayExperiment objects and demonstrate in these and other datasets how the software simplifies data representation, statistical analysis, and visualization. The MultiAssayExperiment Bioconductor package reduces major obstacles to efficient, scalable, and reproducible statistical analysis of multiomics data and enhances data science applications of multiple omics datasets. Cancer Res; 77(21); e39-42. ©2017 AACR.

Posted Content•10.1101/096107•

Glimma: interactive graphics for gene expression analysis

Shian Su¹, Charity W. Law¹, Casey Ah-Cann¹, Marie Liesse Asselin-Labat¹, Marnie E. Blewitt¹, Matthew E. Ritchie¹•Institutions (1)

Walter and Eliza Hall Institute of Medical Research¹

05 Jan 2017-bioRxiv

TL;DR: The open-source Glimma package creates interactive graphics for exploring gene expression analysis with a few simple R commands, and extends popular plots found in the limma package to allow individual data points to be queried and additional annotation information to be displayed upon hovering or selecting particular points.

Abstract: Motivation: Summary graphics for RNA-sequencing and microarray gene expression analyses may contain upwards of tens of thousands of points. Details about certain genes or samples of interest are easily obscured in such dense summary displays. Incorporating interactivity into summary plots would enable additional information to be displayed on demand and facilitate intuitive data exploration. Results: The open-source Glimma package creates interactive graphics for exploring gene expression analysis with a few simple R commands. It extends popular plots found in the limma package, such as multi-dimensional scaling plots and mean-difference plots, to allow individual data points to be queried and additional annotation information to be displayed upon hovering or selecting particular points. It also offers links between plots so that more information can be revealed on demand. Glimma is widely applicable, supporting data analyses from a number of well established Bioconductor workflows (limma, edgeR and DESeq2) and uses D3/JavaScript to produce HTML pages with interactive displays that enable more effective data exploration by end-users. Results from Glimma can be easily shared between bioinformaticians and biologists, enhancing reporting capabilities while maintaining reproducibility. Availability and Implementation: The Glimma R package is available from http://bioconductor.org/packages/devel/bioc/html/Glimma.html.

Journal Article•10.1093/NAR/GKW852•

Flexible expressed region analysis for RNA-seq with derfinder

Leonardo Collado Torres¹, Abhinav Nellore¹, Alyssa C. Frazee¹, Christopher Wilks¹, Michael I. Love², Ben Langmead¹, Rafael A. Irizarry², Jeffrey T. Leek¹, Andrew E. Jaffe•Institutions (2)

Johns Hopkins University¹, Harvard University²

25 Jan 2017-Nucleic Acids Research

TL;DR: The derfinder software is presented, implementing a computationally efficient bump-hunting approach to identify DERs that permits genome-scale analyses in a large number of samples, and introducing a flexible statistical modeling framework, including multi-group and time-course analyses.

Abstract: Differential expression analysis of RNA sequencing (RNA-seq) data typically relies on reconstructing transcripts or counting reads that overlap known gene structures. We previously introduced an intermediate statistical approach called differentially expressed region (DER) finder that seeks to identify contiguous regions of the genome showing differential expression signal at single base resolution without relying on existing annotation or potentially inaccurate transcript assembly. We present the derfinder software that improves our annotation-agnostic approach to RNA-seq analysis by: (i) implementing a computationally efficient bump-hunting approach to identify DERs that permits genome-scale analyses in a large number of samples, (ii) introducing a flexible statistical modeling framework, including multi-group and time-course analyses and (iii) introducing a new set of data visualizations for expressed region analysis. We apply this approach to public RNA-seq data from the Genotype-Tissue Expression (GTEx) project and BrainSpan project to show that derfinder permits the analysis of hundreds of samples at base resolution in R, identifies expression outside of known gene boundaries and can be used to visualize expressed regions at base-resolution. In simulations, our base resolution approaches enable discovery in the presence of incomplete annotation and are nearly as powerful as feature-level methods when the annotation is complete. derfinder analysis using expressed region-level and single base-level approaches provides a compromise between full transcript reconstruction and feature-level analysis. The package is available from Bioconductor at www.bioconductor.org/packages/derfinder.
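
The idea of identifying contiguous regions from a per-base signal can be illustrated with a minimal Python sketch. It is not derfinder's implementation or API: the function name `find_regions`, the cutoff value and the toy statistic are assumptions made for illustration, and only the step of grouping consecutive bases whose statistic exceeds a cutoff is shown.

```python
from typing import List, Sequence, Tuple


def find_regions(stat: Sequence[float], cutoff: float) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index pairs of maximal runs with stat > cutoff."""
    regions: List[Tuple[int, int]] = []
    start = None  # index where the current run began, or None if not in a run
    for i, value in enumerate(stat):
        if value > cutoff:
            if start is None:
                start = i  # a new run begins at this base
        elif start is not None:
            regions.append((start, i))  # the run ended at the previous base
            start = None
    if start is not None:
        regions.append((start, len(stat)))  # run extends to the last base
    return regions


# Hypothetical per-base statistic (for example, a smoothed test statistic).
per_base_stat = [0.1, 2.5, 3.0, 0.2, 0.0, 4.1, 4.4, 5.0]
regions = find_regions(per_base_stat, cutoff=1.0)
print(regions)  # [(1, 3), (5, 8)]
```

Each returned pair marks one candidate region; a full analysis would then summarize and test each region, which this sketch leaves out.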

Journal Article•10.12688/F1000RESEARCH.12544.1•

Easy and efficient ensemble gene set testing with EGSEA

Monther Alhamdoosh¹, Charity W. Law²,³, Luyi Tian²,³, Julie Sheridan²,³, Milica Ng¹, Matthew E. Ritchie²,³•Institutions (3)

CSL Limited¹, Walter and Eliza Hall Institute of Medical Research², University of Melbourne³

14 Nov 2017-F1000Research

TL;DR: This workflow demonstrates how EGSEA can extend limma-based differential expression analyses for RNA-seq and microarray data using experiments that profile 3 distinct cell populations important for studying the origins of breast cancer.

Abstract: Gene set enrichment analysis is a popular approach for prioritising the biological processes perturbed in genomic datasets. The Bioconductor project hosts over 80 software packages capable of gene set analysis. Most of these packages search for enriched signatures amongst differentially regulated genes to reveal higher level biological themes that may be missed when focusing only on evidence from individual genes. With so many different methods on offer, choosing the best algorithm and visualization approach can be challenging. The EGSEA package solves this problem by combining results from up to 12 prominent gene set testing algorithms to obtain a consensus ranking of biologically relevant results. This workflow demonstrates how EGSEA can extend limma-based differential expression analyses for RNA-seq and microarray data using experiments that profile 3 distinct cell populations important for studying the origins of breast cancer. Following data normalization and set-up of an appropriate linear model for differential expression analysis, EGSEA builds gene signature specific indexes that link a wide range of mouse or human gene set collections obtained from MSigDB, GeneSetDB and KEGG to the gene expression data being investigated. EGSEA is then configured and the ensemble enrichment analysis run, returning an object that can be queried using several S4 methods for ranking gene sets and visualizing results via heatmaps, KEGG pathway views, GO graphs, scatter plots and bar plots. Finally, an HTML report that combines these displays can fast-track the sharing of results with collaborators, and thus expedite downstream biological validation. EGSEA is simple to use and can be easily integrated with existing gene expression analysis pipelines for both human and mouse data.
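
One simple way to turn several per-method rankings into a consensus ranking is to order gene sets by their mean rank. The Python sketch below shows that idea only; it is not EGSEA's API or its actual combining scheme, and the method names, gene set names and the assumption that every method ranks every gene set are hypothetical.

```python
from typing import Dict, List


def consensus_ranking(rankings: Dict[str, List[str]]) -> List[str]:
    """Order gene sets by their mean rank across methods (rank 1 = best).

    Assumes every method ranks the same collection of gene sets.
    """
    totals: Dict[str, float] = {}
    for ordered_sets in rankings.values():
        for position, gene_set in enumerate(ordered_sets, start=1):
            totals[gene_set] = totals.get(gene_set, 0.0) + position
    n_methods = len(rankings)
    # Sort by mean rank; break ties alphabetically so the result is deterministic.
    return sorted(totals, key=lambda s: (totals[s] / n_methods, s))


# Hypothetical rankings from three gene set testing methods.
rankings = {
    "method_a": ["set1", "set2", "set3"],
    "method_b": ["set2", "set1", "set3"],
    "method_c": ["set2", "set3", "set1"],
}
consensus = consensus_ranking(rankings)
print(consensus)  # ['set2', 'set1', 'set3']
```

Mean rank is only one possible aggregation; medians or combined p-values are alternatives that would change how outlying methods influence the result.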

Journal Article•10.1093/NAR/GKW1193•

QSEA-modelling of genome-wide DNA methylation from sequencing enrichment experiments.

Matthias Lienhard¹, Sabrina Grasse², Jana Rolff, Steffen Frese, Uwe Schirmer³, Michael Becker, Stefan T. Börno¹, Bernd Timmermann¹, Lukas Chavez³, Holger Sültmann³, Gunda Leschber, Iduna Fichtner, Michal R. Schweiger¹,², Ralf Herwig¹•Institutions (3)

Max Planck Society¹, Epigenomics AG², German Cancer Research Center³

07 Apr 2017-Nucleic Acids Research

TL;DR: As the central part of the QSEA workflow, a Bayesian statistical model transforms the enrichment read counts to absolute levels of methylation and thus enhances interpretability and facilitates comparison with other methylation assays.

Abstract: Genome-wide enrichment of methylated DNA followed by sequencing (MeDIP-seq) offers a reasonable compromise between experimental costs and genomic coverage. However, the computational analysis of these experiments is complex, and quantification of the enrichment signals in terms of absolute levels of methylation requires specific transformation. In this work, we present QSEA, Quantitative Sequence Enrichment Analysis, a comprehensive workflow for the modelling and subsequent quantification of MeDIP-seq data. As the central part of the workflow we have developed a Bayesian statistical model that transforms the enrichment read counts to absolute levels of methylation and, thus, enhances interpretability and facilitates comparison with other methylation assays. We suggest several calibration strategies for the critical parameters of the model, either using additional data or fairly general assumptions. By comparing the results with bisulfite sequencing (BS) validation data, we show the improvement of QSEA over existing methods. Additionally, we generated a clinically relevant benchmark data set consisting of methylation enrichment experiments (MeDIP-seq), BS-based validation experiments (Methyl-seq) as well as gene expression experiments (RNA-seq) derived from non-small cell lung cancer patients, and show that the workflow retrieves well-known lung tumour methylation markers that are causative for gene expression changes, demonstrating the applicability of QSEA for clinical studies. QSEA is implemented in R and available from the Bioconductor repository 3.4 (www.bioconductor.org/packages/qsea).

Journal Article•10.1093/BIB/BBW128•

Transformation and model choice for RNA-seq co-expression analysis.

Andrea Rau¹, Cathy Maugis-Rabusseau²•Institutions (2)

Université Paris-Saclay¹, Institut de Mathématiques de Toulouse²

08 Jan 2017-Briefings in Bioinformatics

TL;DR: This work investigates the use of data transformations in conjunction with Gaussian mixture models for RNA-seq co-expression analyses, as well as a penalized model selection criterion to select both an appropriate transformation and number of clusters present in the data.

Abstract: Although a large number of clustering algorithms have been proposed to identify groups of co-expressed genes from microarray data, the question of if and how such methods may be applied to RNA sequencing (RNA-seq) data remains unaddressed. In this work, we investigate the use of data transformations in conjunction with Gaussian mixture models for RNA-seq co-expression analyses, as well as a penalized model selection criterion to select both an appropriate transformation and number of clusters present in the data. This approach has the advantage of accounting for per-cluster correlation structures among samples, which can be strong in RNA-seq data. In addition, it provides a rigorous statistical framework for parameter estimation, an objective assessment of data transformations and number of clusters and the possibility of performing diagnostic checks on the quality and homogeneity of the identified clusters. We analyze four varied RNA-seq data sets to illustrate the use of transformations and model selection in conjunction with Gaussian mixture models. Finally, we propose a Bioconductor package coseq (co-expression of RNA-seq data) to facilitate implementation and visualization of the recommended RNA-seq co-expression analyses.

Posted Content•10.1101/235382•

Performance Assessment and Selection of Normalization Procedures for Single-Cell RNA-Seq

Michael B. Cole¹, Davide Risso², Allon Wagner¹, David DeTomaso¹, John Ngai¹, Elizabeth Purdom¹, Sandrine Dudoit¹, Nir Yosef¹•Institutions (2)

University of California, Berkeley¹, Cornell University²

16 Dec 2017-bioRxiv

TL;DR: It is shown that scone is able to correctly rank normalization methods according to their performance in a given dataset and that selecting the best performing normalization leads to higher agreement with independent validation data than lowly-ranked methods.

Abstract: Systematic measurement biases make data normalization an essential preprocessing step in single-cell RNA sequencing (scRNA-seq) analysis. There may be multiple, competing considerations behind the assessment of normalization performance, some of them study-specific. Because normalization can have a large impact on downstream results (e.g., clustering and differential expression), it is critically important that practitioners assess the performance of competing methods. We have developed scone - a flexible framework for assessing normalization performance based on a comprehensive panel of data-driven metrics. Through graphical summaries and quantitative reports, scone summarizes performance trade-offs and ranks large numbers of normalization methods by aggregate panel performance. The method is implemented in the open-source Bioconductor R software package scone. We demonstrate the effectiveness of scone on a collection of scRNA-seq datasets, generated with different protocols, including Fluidigm C1 and 10x platforms. We show that top-performing normalization methods lead to better agreement with independent validation data.

Journal Article•10.1186/S12864-017-3746-Y•

GUIDEseq: a bioconductor package to analyze GUIDE-Seq datasets for CRISPR-Cas nucleases

Lihua Julie Zhu¹, Michael S. Lawrence², Ankit Gupta¹, Hervé Pagès³, Alper Kucukural¹, Manuel Garber¹, Scot A. Wolfe¹•Institutions (3)

University of Massachusetts Medical School¹, Genentech², Fred Hutchinson Cancer Research Center³

15 May 2017-BMC Genomics

TL;DR: The GUIDEseq package enables analysis of GUIDE-data from various nuclease platforms for any species with a defined genomic sequence and annotates potential off-target sites that overlap with genes based on genome annotation information, as these may be the most important off-target sites for further characterization.

Abstract: Genome editing technologies developed around the CRISPR-Cas9 nuclease system have facilitated the investigation of a broad range of biological questions. These nucleases also hold tremendous promise for treating a variety of genetic disorders. In the context of their therapeutic application, it is important to identify the spectrum of genomic sequences that are cleaved by a candidate nuclease when programmed with a particular guide RNA, as well as the cleavage efficiency of these sites. Powerful new experimental approaches, such as GUIDE-seq, facilitate the sensitive, unbiased genome-wide detection of nuclease cleavage sites within the genome. Flexible bioinformatics analysis tools for processing GUIDE-seq data are needed. Here, we describe an open source, open development software suite, GUIDEseq, for GUIDE-seq data analysis and annotation as a Bioconductor package in R. The GUIDEseq package provides a flexible platform with more than 60 adjustable parameters for the analysis of datasets associated with custom nuclease applications. These parameters allow data analysis to be tailored to different nuclease platforms with different length and complexity in their guide and PAM recognition sequences or their DNA cleavage position. They also enable users to customize sequence aggregation criteria, and vary peak calling thresholds that can influence the number of potential off-target sites recovered. GUIDEseq also annotates potential off-target sites that overlap with genes based on genome annotation information, as these may be the most important off-target sites for further characterization. In addition, GUIDEseq enables the comparison and visualization of off-target site overlap between different datasets for a rapid comparison of different nuclease configurations or experimental conditions. For each identified off-target, the GUIDEseq package outputs mapped GUIDE-Seq read count as well as cleavage score from a user specified off-target cleavage score prediction algorithm permitting the identification of genomic sequences with unexpected cleavage activity. The GUIDEseq package enables analysis of GUIDE-data from various nuclease platforms for any species with a defined genomic sequence. This software package has been used successfully to analyze several GUIDE-seq datasets. The software, source code and documentation are freely available at http://www.bioconductor.org/packages/release/bioc/html/GUIDEseq.html.

Journal Article•10.1186/S12859-017-1482-6•

GSAR: Bioconductor package for Gene Set analysis in R

Yasir Rahmatallah¹, Boris Zybailov¹, Frank Emmert-Streib², Galina V. Glazko¹•Institutions (2)

University of Arkansas for Medical Sciences¹, Tampere University of Technology²

24 Jan 2017-BMC Bioinformatics

TL;DR: Package GSAR provides a set of multivariate non-parametric statistical methods that test a complex null hypothesis against specific alternatives, applicable to any type of omics data that can be represented in a matrix format.

Abstract: Gene set analysis (in a form of functionally related genes or pathways) has become the method of choice for analyzing omics data in general and gene expression data in particular. There are many statistical methods that either summarize gene-level statistics for a gene set or apply a multivariate statistic that accounts for intergene correlations. Most available methods detect complex departures from the null hypothesis but lack the ability to identify the specific alternative hypothesis that rejects the null. GSAR (Gene Set Analysis in R) is an open-source R/Bioconductor software package for gene set analysis (GSA). It implements self-contained multivariate non-parametric statistical methods testing a complex null hypothesis against specific alternatives, such as differences in mean (shift), variance (scale), or net correlation structure. The package also provides a graphical visualization tool, based on the union of two minimum spanning trees, for correlation networks to examine the change in the correlation structures of a gene set between two conditions and highlight influential genes (hubs). Package GSAR provides a set of multivariate non-parametric statistical methods that test a complex null hypothesis against specific alternatives. The methods in package GSAR are applicable to any type of omics data that can be represented in a matrix format. The package, with detailed instructions and examples, is freely available under the GPL (>= 2) license from the Bioconductor web site.

Journal Article•10.1093/BIOINFORMATICS/BTW759•

DNABarcodes: an R package for the systematic construction of DNA sample tags.

Tilo Buschmann

15 Mar 2017-Bioinformatics

TL;DR: An R package is presented that combines known algorithms and innovative methods for the efficient, flexible and near-optimal generation of robust barcode sets, designed for speed, versatility, provable correctness and large set sizes.

Abstract: Motivation DNA barcodes are commonly used for counting and discriminating purposes in molecular and cell biology. Not every set of DNA sequences is equally suitable for this goal. There is a growing demand for more sophisticated barcode designs, with only few tools available. We prepared an R package that combines known algorithms and innovative methods for the efficient, flexible and near-optimal generation of robust barcode sets. Results Our R-software package 'DNABarcodes' generates sets of DNA barcodes from a few basic input parameters (e.g. length, distance metric, minimum distance, chemical properties). It satisfies the specifics of most particular experimental demands in de novo design of barcodes. Additionally, the package allows analysing existing sets of DNA barcodes as well as the generation of subsets of those existing sets to improve their error correction and detection properties. 'DNABarcodes' was designed for speed, versatility, provable correctness and large set sizes. Availability and implementation The DNABarcodes R package is available from Bioconductor at http://bioconductor.org/packages/DNABarcodes under the GPL-2 license. Contact tilo.buschmann@izi.fraunhofer.de. Supplementary information Supplementary data are available at Bioinformatics online.
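
To make the input parameters concrete (length, distance metric, minimum distance), here is a minimal Python sketch of one basic strategy: a greedy scan that keeps a candidate only if it is far enough, in Hamming distance, from every barcode already chosen. This is an illustration under stated assumptions, not the DNABarcodes API or its algorithms; the function names are hypothetical, chemical property filters are omitted, and a greedy scan does not in general produce the largest possible set.

```python
from itertools import product
from typing import List


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def greedy_barcodes(length: int, min_dist: int) -> List[str]:
    """Scan all sequences in lexicographic order and keep each one that is
    at least min_dist away from every barcode kept so far."""
    chosen: List[str] = []
    for candidate in map("".join, product("ACGT", repeat=length)):
        if all(hamming(candidate, kept) >= min_dist for kept in chosen):
            chosen.append(candidate)
    return chosen


barcodes = greedy_barcodes(length=4, min_dist=3)
print(len(barcodes), barcodes[:3])
```

A minimum Hamming distance of 3 means any single substitution error leaves a read closer to its true barcode than to any other, which is why this parameter governs error correction.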

Journal Article•10.12688/F1000RESEARCH.12093.2•

bcbioRNASeq: R package for bcbio RNA-seq analysis

Michael J. Steinbaugh¹, Lorena Pantano¹, Rory Kirchner¹, Victor Barrera¹, Brad Chapman¹, Mary E. Piper¹, Meeta Mistry¹, Radhika S. Khetani¹, Kayleigh Rutherford¹, Oliver Hofmann², John N. Hutchinson¹, Shannan J. Ho Sui¹•Institutions (2)

Harvard University¹, University of Melbourne²

08 Nov 2017-F1000Research

TL;DR: bcbioRNASeq, a Bioconductor package that provides ready-to-render templates, objects and wrapper functions to post-process bcbio RNA sequencing output data, helps automate the generation of high-level RNA-seq reports, facilitating the quality control analyses, identification of differentially expressed genes and functional enrichment analyses.

Abstract: RNA-seq analysis involves multiple steps, from processing raw sequencing data to identifying, organizing, annotating, and reporting differentially expressed genes. bcbio is an open source, community-maintained framework providing automated and scalable RNA-seq methods for identifying gene abundance counts. We have developed bcbioRNASeq, a Bioconductor package that provides ready-to-render templates, objects and wrapper functions to post-process bcbio RNA sequencing output data. bcbioRNASeq helps automate the generation of high-level RNA-seq reports, facilitating the quality control analyses, identification of differentially expressed genes and functional enrichment analyses.

Posted Content•10.1101/103085•

Accessible, curated metagenomic data through ExperimentHub

Edoardo Pasolli¹, Lucas Schiffer², Audrey Renson², Valerie Obenchain³, Paolo Manghi¹, Duy Tin Truong¹, Francesco Beghini¹, Faizan Malik², Marcel Ramos², Jennifer Beam Dowd², Curtis Huttenhower⁴, Martin Morgan³, Nicola Segata¹, Levi Waldron²•Institutions (4)

University of Trento¹, City University of New York², Roswell Park Cancer Institute³, Harvard University⁴

27 Jan 2017-bioRxiv

TL;DR: CuratedMetagenomicData provides standardized per-participant metadata linked to bacterial, fungal, archaeal, and viral taxonomic abundances, as well as quantitative metabolic functional profiles, generated by the HUMAnN2 and MetaPhlAn2 pipelines.

Posted Content•10.1101/071761•

MutationalPatterns: comprehensive genome-wide analysis of mutational processes

Francis Blokzijl¹, Roel Janssen¹, Ruben van Boxtel¹, Edwin Cuppen¹•Institutions (1)

University Medical Center Utrecht¹

26 Oct 2017-bioRxiv

TL;DR: MutationalPatterns is an R/Bioconductor package that characterizes a broad range of mutational patterns and potential relations with (epi-)genomic features and offers an efficient method to quantify the contribution of known mutational signatures.

Abstract: Base substitution catalogs represent historical records of mutational processes that have been active in a system. Such processes can be distinguished by typical characteristics, like mutation type, sequence context, transcriptional and replicative strand bias, and distribution throughout the genome. MutationalPatterns is an R/Bioconductor package that characterizes this broad range of mutational patterns and potential relations with (epi-)genomic features. Furthermore, it offers an efficient method to quantify the contribution of known mutational signatures. Such analyses can be used to determine whether certain DNA repair mechanisms are perturbed and to further characterize the processes underlying known mutational signatures. Keywords: R, Base substitutions, Somatic mutations, Mutational signatures, Mutational processes, Transcriptional strand bias. Availability and implementation: The MutationalPatterns R package is freely available for download at https://www.bioconductor.org/packages/release/bioc/html/MutationalPatterns.html. The package documentation provides a detailed description of typical analysis workflows.
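
As a small illustration of characterizing mutations by type, the Python sketch below tallies base substitutions into the six classes commonly used in mutational signature analysis, reporting each change relative to the pyrimidine (C or T) of the reference base pair. It is not the MutationalPatterns API; the function name and the toy variant list are assumptions, and sequence context and strand bias are not modelled.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_type(ref: str, alt: str) -> str:
    """Label a single-base substitution relative to the pyrimidine reference."""
    if ref in "AG":
        # Purine reference: describe the same change on the complementary strand.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


# Hypothetical (reference, alternate) base pairs from a variant call set.
variants = [("C", "T"), ("G", "A"), ("A", "G"), ("T", "G"), ("G", "T")]
spectrum = Counter(substitution_type(ref, alt) for ref, alt in variants)
print(spectrum)
```

Here G>A is counted as C>T and G>T as C>A, so the five toy variants fall into four of the six classes.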

Journal Article•10.1186/S12859-016-1455-1•

MultiDataSet: an R package for encapsulating multiple data sets with application to omic data integration.

Carles Hernandez-Ferrer¹, Carlos Ruiz-Arenas¹, Alba Beltran-Gomila¹, Juan R. González¹•Institutions (1)

Pompeu Fabra University¹

17 Jan 2017-BMC Bioinformatics

TL;DR: MultiDataSet is a suitable class for data integration under R and Bioconductor framework that deals with the usual difficulties of managing multiple and non-complete data sets while offering a simple and general way of subsetting features and selecting samples.

Abstract: Reduction in the cost of genomic assays has generated large amounts of biomedical-related data. As a result, current studies perform multiple experiments in the same subjects. While Bioconductor’s methods and classes implemented in different packages manage individual experiments, there is not a standard class to properly manage different omic datasets from the same subjects. In addition, most R/Bioconductor packages that have been designed to integrate and visualize biological data often use basic data structures with no clear general methods, such as subsetting or selecting samples. To cover this need, we have developed MultiDataSet, a new R class based on Bioconductor standards, designed to encapsulate multiple data sets. MultiDataSet deals with the usual difficulties of managing multiple and non-complete data sets while offering a simple and general way of subsetting features and selecting samples. We illustrate the use of MultiDataSet in three common situations: 1) performing integration analysis with third party packages; 2) creating new methods and functions for omic data integration; 3) encapsulating new unimplemented data from any biological experiment. MultiDataSet is a suitable class for data integration under R and Bioconductor framework.
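
The problem of selecting samples across several incomplete data sets can be sketched in a few lines of Python. This is a conceptual illustration only, not the MultiDataSet class or its methods: the plain dictionary layout, the function names and the toy values are assumptions, and at least one data set is assumed to be present.

```python
from typing import Dict, Iterable, List

Assay = Dict[str, List[float]]  # sample ID -> measurements for that sample


def common_samples(assays: Dict[str, Assay]) -> List[str]:
    """Sample IDs measured in every data set (assumes at least one data set)."""
    shared = set.intersection(*(set(assay) for assay in assays.values()))
    return sorted(shared)


def select_samples(assays: Dict[str, Assay], samples: Iterable[str]) -> Dict[str, Assay]:
    """Restrict every data set to the requested samples, skipping missing ones."""
    keep = set(samples)
    return {
        name: {sample: values for sample, values in assay.items() if sample in keep}
        for name, assay in assays.items()
    }


# Hypothetical data: not every sample was measured in both experiments.
assays = {
    "expression": {"s1": [1.0, 2.0], "s2": [0.5, 0.1], "s3": [3.0, 3.5]},
    "methylation": {"s2": [0.9], "s3": [0.2], "s4": [0.4]},
}
shared = common_samples(assays)
subset = select_samples(assays, shared)
print(shared)  # ['s2', 's3']
```

Keeping one selection operation for all data sets, rather than subsetting each by hand, is the convenience that a dedicated container class provides.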

Journal Article•10.1093/BIOINFORMATICS/BTW761•

PanViz: interactive visualization of the structure of functionally annotated pangenomes.

Thomas Lin Pedersen¹, Intawat Nookaew²,³, David W. Ussery²,³, Maria Månsson•Institutions (3)

Technical University of Denmark¹, University of Arkansas for Medical Sciences², Oak Ridge National Laboratory³

01 Apr 2017-Bioinformatics

TL;DR: PanViz allows visualization of changes in gene group classification as different subsets of pangenomes are selected, as well as comparisons of individual genomes to pangenomes with gene ontology based navigation of gene groups.

Abstract: Summary PanViz is a novel, interactive, visualization tool for pangenome analysis. PanViz allows visualization of changes in gene group (groups of similar genes across genomes) classification as different subsets of pangenomes are selected, as well as comparisons of individual genomes to pangenomes with gene ontology based navigation of gene groups. Furthermore it allows for rich and complex visual querying of gene groups in the pangenome. PanViz visualizations require no external programs and are easily sharable, allowing for rapid pangenome analyses. Availability and implementation PanViz is written entirely in JavaScript and is available on https://github.com/thomasp85/PanViz . A companion R package that facilitates the creation of PanViz visualizations from a range of data formats is released through Bioconductor and is available at https://bioconductor.org/packages/PanVizGenerator . Contact thomasp85@gmail.com. Supplementary information Supplementary data are available at Bioinformatics online.
