Top 31 papers on Bioconductor published in 2007

Journal Article•10.1093/BIOINFORMATICS/BTM254•

GEOquery: a bridge between the Gene Expression Omnibus (GEO) and BioConductor

Sean Davis¹, Paul S. Meltzer¹•Institutions (1)

15 Jul 2007-Bioinformatics

TL;DR: A software tool is developed that allows access to the wealth of information within GEO directly from BioConductor, eliminating many of the formatting and parsing problems that have made such analyses labor-intensive in the past.

Abstract: Microarray technology has become a standard molecular biology tool. Experimental data have been generated on a huge number of organisms, tissue types, treatment conditions and disease states. The Gene Expression Omnibus (Barrett et al., 2005), developed by the National Center for Biotechnology Information (NCBI) at the National Institutes of Health, is a repository of nearly 140,000 gene expression experiments. The BioConductor project (Gentleman et al., 2004) is an open-source and open-development software project built in the R statistical programming environment (R Development Core Team, 2005) for the analysis and comprehension of genomic data. The tools contained in the BioConductor project represent many state-of-the-art methods for the analysis of microarray and genomics data. We have developed a software tool that allows access to the wealth of information within GEO directly from BioConductor, eliminating many of the formatting and parsing problems that have made such analyses labor-intensive in the past. The software, called GEOquery, effectively establishes a bridge between GEO and BioConductor. Easy access to GEO data from BioConductor will likely lead to new analyses of GEO data using novel and rigorous statistical and bioinformatic tools. Facilitating analyses and meta-analyses of microarray data will increase the efficiency with which biologically important conclusions can be drawn from published genomic data. Availability: GEOquery is available as part of the BioConductor project.

2,840 citations

Journal Article•10.1093/BIOINFORMATICS/BTL567•

Using GOstats to test gene lists for GO term association

Seth Falcon¹, Robert Gentleman¹•Institutions (1)

Fred Hutchinson Cancer Research Center¹

05 Jan 2007-Bioinformatics

TL;DR: The capabilities of GOstats are discussed; it is a Bioconductor package written in R that allows users to test GO terms for over- or under-representation using either a classical hypergeometric test or a conditional hypergeometric test that uses the relationships among GO terms to decorrelate the results.

Abstract: Motivation: Functional analyses based on the association of Gene Ontology (GO) terms to genes in a selected gene list are useful bioinformatic tools and the GOstats package has been widely used to perform such computations. In this paper we report significant improvements and extensions such as support for conditional testing. Results: We discuss the capabilities of GOstats, a Bioconductor package written in R, that allows users to test GO terms for over or under-representation using either a classical hypergeometric test or a conditional hypergeometric that uses the relationships among GO terms to decorrelate the results. Availability: GOstats is available as an R package from the Bioconductor project: http://bioconductor.org Contact: [email protected]

2,326 citations
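
The classical hypergeometric test named in the GOstats abstract can be illustrated with a short sketch. This is not GOstats code (GOstats is an R package); the function name and the example counts are hypothetical, and only the over-representation tail is computed. The conditional test described in the abstract is not reproduced here.

```python
from math import comb


def hypergeom_over_representation(universe_size, term_size, selected_size, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    X counts how many of the selected genes carry the GO term when
    selected_size genes are drawn without replacement from a universe
    in which term_size genes are annotated to that term.
    """
    max_overlap = min(term_size, selected_size)
    total = comb(universe_size, selected_size)
    tail = sum(
        comb(term_size, k) * comb(universe_size - term_size, selected_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical counts: 20 genes in the universe, 5 annotated to the term,
# 4 genes selected, 3 of the selected genes annotated.
p_value = hypergeom_over_representation(20, 5, 4, 3)
print(f"{p_value:.4f}")
```

A small P-value here would suggest that the term is over-represented in the selected list; testing for under-representation would use the lower tail instead.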

Journal Article•10.1198/JASA.2007.S179•

Bioinformatics and Computational Biology Solutions Using R and Bioconductor

J. Wade Davis

01 Mar 2007-Journal of the American Statistical Association

TL;DR: A two-page item in the Journal of the American Statistical Association (Vol. 102, No. 477, pp. 388-389) on the book Bioinformatics and Computational Biology Solutions Using R and Bioconductor.

Abstract: (2007). Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Journal of the American Statistical Association: Vol. 102, No. 477, pp. 388-389.

1,774 citations

Journal Article•10.1093/BIOINFORMATICS/BTM069•

pcaMethods—a bioconductor package providing PCA methods for incomplete data

Wolfram Stacklies¹, Henning Redestig¹, Matthias Scholz¹, Dirk Walther¹, Joachim Selbig¹•Institutions (1)

Max Planck Society¹

06 Mar 2007-Bioinformatics

TL;DR: pcaMethods is a Bioconductor-compliant library for computing principal component analysis (PCA) on incomplete data sets; the results can be analyzed directly or used to estimate missing values, enabling the use of statistical methods that are sensitive to missing values.

Abstract: Summary: pcaMethods is a Bioconductor compliant library for computing principal component analysis (PCA) on incomplete data sets. The results can be analyzed directly or used to estimate missing values to enable the use of missing value sensitive statistical methods. The package was mainly developed with microarray and metabolite data sets in mind, but can be applied to any other incomplete data set as well. Availability: http://www.bioconductor.org Contact: selbig@mpimp-golm.mpg.de Supplementary information: Please visit our webpage at http://bioinformatics.mpimp-golm.mpg.de/

1,134 citations

Journal Article•10.1093/BIOINFORMATICS/BTL646•

A faster circular binary segmentation algorithm for the analysis of array CGH data

Ennapadam Venkatraman¹, Adam B. Olshen¹•Institutions (1)

Memorial Sloan Kettering Cancer Center¹

20 Feb 2007-Bioinformatics

TL;DR: A hybrid approach that obtains the P-value of the test statistic in linear time is presented; simulations show that it provides a substantial gain in speed with only a negligible loss in accuracy, and that an early stopping rule further increases speed.

Abstract: Motivation: Array CGH technologies enable the simultaneous measurement of DNA copy number for thousands of sites on a genome. We developed the circular binary segmentation (CBS) algorithm to divide the genome into regions of equal copy number. The algorithm tests for change-points using a maximal t-statistic with a permutation reference distribution to obtain the corresponding P-value. The number of computations required for the maximal test statistic is O(N^2), where N is the number of markers. This makes the full permutation approach computationally prohibitive for the newer arrays that contain tens of thousands of markers and highlights the need for a faster algorithm. Results: We present a hybrid approach to obtain the P-value of the test statistic in linear time. We also introduce a rule for stopping early when there is strong evidence for the presence of a change. We show through simulations that the hybrid approach provides a substantial gain in speed with only a negligible loss in accuracy and that the stopping rule further increases speed. We also present the analyses of array CGH data from breast cancer cell lines to show the impact of the new approaches on the analysis of real data. Availability: An R version of the CBS algorithm has been implemented in the "DNAcopy" package of the Bioconductor project. The proposed hybrid method for the P-value is available in version 1.2.1 or higher and the stopping rule for declaring a change early is available in version 1.5.1 or higher. Contact: venkatre@mskcc.org Supplementary information: Supplementary data are available at Bioinformatics online.

1,075 citations
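
The cost that motivates the faster CBS algorithm can be seen in a simplified sketch of the full permutation approach. This is not the DNAcopy implementation: the function names, the example data, and the unscaled statistic (it omits the variance estimate of a true t-statistic) are assumptions made for illustration, and neither the hybrid P-value nor the stopping rule from the paper is implemented.

```python
import random
from itertools import accumulate
from math import sqrt


def max_arc_statistic(values):
    """Largest standardized mean difference between an arc and the rest.

    Every arc values[i:j] is compared with the remaining markers, so one
    evaluation visits O(N^2) arcs (prefix sums make each arc O(1)).
    """
    n = len(values)
    prefix = [0.0] + list(accumulate(values))
    total = prefix[-1]
    best = 0.0
    for i in range(n):
        for j in range(i + 1, n + 1):
            inside = j - i
            outside = n - inside
            if outside == 0:
                continue  # the arc covers every marker; nothing to compare
            mean_in = (prefix[j] - prefix[i]) / inside
            mean_out = (total - prefix[j] + prefix[i]) / outside
            stat = abs(mean_in - mean_out) / sqrt(1 / inside + 1 / outside)
            best = max(best, stat)
    return best


def permutation_p_value(values, n_perm=200, seed=0):
    """Permutation P-value of the maximal statistic (full permutation approach)."""
    rng = random.Random(seed)
    observed = max_arc_statistic(values)
    shuffled = list(values)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max_arc_statistic(shuffled) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


# Hypothetical log-ratios with one clear copy-number change.
log_ratios = [0.0] * 10 + [1.0] * 10
p_value = permutation_p_value(log_ratios)
print(p_value)
```

Each permutation repeats the quadratic scan, which is why the full permutation approach becomes prohibitive for arrays with tens of thousands of markers.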

Posted Content•

Bioconductor: Open Software Development for Computational Biology and Bioinformatics

Kurt Hornik¹, Robert Gentleman², Vincent J. Carey³, Douglas M. Bates⁴, Ben Bolstad⁵, Marcel Dettling⁶, Sandrine Dudoit, Byron Ellis², Laurent Gautier⁷, Yongchao Ge⁸, Jeff Gentry², Torsten Hothorn⁹, Wolfgang Huber¹⁰, Stefano Maria Iacus¹¹, Rafael A. Irizarry¹², Friedrich Leisch¹³, Cheng Li², Martin Maechler, A. J. Rossini¹⁴, Günther Sawitzki¹⁵, Colin A. Smith, Gordon K. Smyth, Luke Tierney¹⁶, Jean Y.H. Yang¹⁷, Jean Y.H. Yang¹⁸, Jianhua Zhang•Institutions (18)

Vienna University of Economics and Business¹, Harvard University², Brigham and Women's Hospital³, University of Wisconsin-Madison⁴, University of California, Berkeley⁵, ETH Zurich⁶, Technical University of Denmark⁷, Icahn School of Medicine at Mount Sinai⁸, University of Erlangen-Nuremberg⁹, German Cancer Research Center¹⁰, University of Milan¹¹, Johns Hopkins University¹², Vienna University of Technology¹³, University of Washington¹⁴, Heidelberg University¹⁵, University of Iowa¹⁶, University of Sydney¹⁷, University of Sydney School of Mathematics and Statistics¹⁸

26 Dec 2007-Social Science Research Network

TL;DR: The Bioconductor project is an initiative for the collaborative creation of extensible software for computational biology and bioinformatics; it aims to foster collaborative development and widespread use of innovative software, reduce barriers to entry into interdisciplinary scientific research, and promote the achievement of remote reproducibility of research results.

Abstract: The Bioconductor project is an initiative for the collaborative creation of extensible software for computational biology and bioinformatics. The goals of the project include: fostering collaborative development and widespread use of innovative software, reducing barriers to entry into interdisciplinary scientific research, and promoting the achievement of remote reproducibility of research results. We describe details of our aims and methods, identify current challenges, compare Bioconductor to other open bioinformatics projects, and provide working examples.

745 citations

Journal Article•10.1093/BIOSTATISTICS/KXL042•

Exploration, normalization, and genotype calls of high density oligonucleotide SNP array data

Benilton S. Carvalho¹, Henrik Bengtsson², Terence P. Speed³, Rafael A. Irizarry¹•Institutions (3)

Johns Hopkins University¹, University of California, Berkeley², Walter and Eliza Hall Institute of Medical Research³

01 Apr 2007-Biostatistics

TL;DR: A preprocessing methodology for a technology designed for the identification of DNA sequence variants in specific genes or regions of the human genome that are associated with phenotypes of interest such as disease is described.

Abstract: Summary: In most microarray technologies, a number of critical steps are required to convert raw intensity measurements into the data relied upon by data analysts, biologists, and clinicians. These data manipulations, referred to as preprocessing, can influence the quality of the ultimate measurements. In the last few years, the high-throughput measurement of gene expression has been the most popular application of microarray technology. For this application, various groups have demonstrated that the use of modern statistical methodology can substantially improve accuracy and precision of the gene expression measurements, relative to ad hoc procedures introduced by designers and manufacturers of the technology. Currently, other applications of microarrays are becoming more and more popular. In this paper, we describe a preprocessing methodology for a technology designed for the identification of DNA sequence variants in specific genes or regions of the human genome that are associated with phenotypes of interest such as disease. In particular, we describe a methodology useful for preprocessing Affymetrix single-nucleotide polymorphism chips and obtaining genotype calls with the preprocessed data. We demonstrate how our procedure improves existing approaches using data from 3 relatively large studies, including one in which large numbers of independent calls are available. The proposed methods are implemented in the package oligo available from Bioconductor.

298 citations

Journal Article•10.1093/NAR/GKM779•

X:Map: annotation and visualization of genome structure for Affymetrix exon array analysis

Tim Yates¹, Michal J. Okoniewski¹, Crispin J. Miller¹•Institutions (1)

University of Manchester¹

11 Oct 2007-Nucleic Acids Research

TL;DR: X:Map is a genome annotation database that provides information needed to associate each reporter on the exon array with the features of the genome it is targeting, and to relate these to gene and genome structure.

Abstract: Affymetrix exon arrays aim to target every known and predicted exon in the human, mouse or rat genomes, and have reporters that extend beyond protein coding regions to other areas of the transcribed genome. This combination of increased coverage and precision is important because a substantial proportion of protein coding genes are predicted to be alternatively spliced, and because many non-coding genes are known also to be of biological significance. In order to fully exploit these arrays, it is necessary to associate each reporter on the array with the features of the genome it is targeting, and to relate these to gene and genome structure. X:Map is a genome annotation database that provides this information. Data can be browsed using a novel Google-maps based interface, and analysed and further visualized through an associated BioConductor package. The database can be found at http://xmap.picr.man.ac.uk.

88 citations

Journal Article•10.1186/1471-2105-8-364•

A Latent Variable Approach for Meta-Analysis of Gene Expression Data from Multiple Microarray Experiments

Hyungwon Choi¹, Ronglai Shen¹, Arul M. Chinnaiyan¹, Debashis Ghosh²•Institutions (2)

University of Michigan¹, Pennsylvania State University²

27 Sep 2007-BMC Bioinformatics

TL;DR: A general probabilistic framework for combining high-throughput genomic data from several related microarray experiments using mixture models, which considers two methods for estimation of an index termed the probability of expression (POE).

Abstract: Background: With the explosion in data generated using microarray technology by different investigators working on similar experiments, it is of interest to combine results across multiple studies. Results: In this article, we describe a general probabilistic framework for combining high-throughput genomic data from several related microarray experiments using mixture models. A key feature of the model is the use of latent variables that represent quantities that can be combined across diverse platforms. We consider two methods for estimation of an index termed the probability of expression (POE). The first, reported in previous work by the authors, involves Markov Chain Monte Carlo (MCMC) techniques. The second method is a faster algorithm based on the expectation-maximization (EM) algorithm. The methods are illustrated with application to a meta-analysis of datasets for metastatic cancer. Conclusion: The statistical methods described in the paper are available as an R package, metaArray 1.8.1, which is at Bioconductor, whose URL is http://www.bioconductor.org/.

72 citations
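
The general idea of using EM to fit a mixture model and read off a posterior probability per measurement can be sketched as follows. This is explicitly not the POE model or the metaArray code: a plain two-component Gaussian mixture is swapped in as a stand-in, and the function names, starting values, and data are all hypothetical.

```python
from math import exp, pi, sqrt


def normal_pdf(x, mean, sd):
    return exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * sqrt(2 * pi))


def em_two_gaussians(data, n_iter=50):
    """Fit a two-component Gaussian mixture by EM.

    Returns the posterior probability that each value belongs to the
    higher-mean component, plus the estimated weight of that component.
    """
    low, high = min(data), max(data)
    mean0, mean1 = low, high
    spread = (high - low) / 4
    sd0 = sd1 = spread if spread > 0 else 1.0
    weight1 = 0.5
    posteriors = []
    for _ in range(n_iter):
        # E-step: posterior probability of the higher component for each value.
        posteriors = []
        for x in data:
            p0 = (1 - weight1) * normal_pdf(x, mean0, sd0)
            p1 = weight1 * normal_pdf(x, mean1, sd1)
            posteriors.append(p1 / (p0 + p1))
        # M-step: re-estimate means, standard deviations and the mixing weight.
        n1 = sum(posteriors)
        n0 = len(data) - n1
        mean1 = sum(r * x for r, x in zip(posteriors, data)) / n1
        mean0 = sum((1 - r) * x for r, x in zip(posteriors, data)) / n0
        var1 = sum(r * (x - mean1) ** 2 for r, x in zip(posteriors, data)) / n1
        var0 = sum((1 - r) * (x - mean0) ** 2 for r, x in zip(posteriors, data)) / n0
        sd1 = max(sqrt(var1), 1e-3)  # floor keeps the density finite
        sd0 = max(sqrt(var0), 1e-3)
        weight1 = n1 / len(data)
    return posteriors, weight1


# Hypothetical expression values: six near baseline, four clearly elevated.
values = [0.1, -0.2, 0.0, 0.3, -0.1, 0.2, 2.9, 3.1, 3.0, 3.2]
posteriors, weight_high = em_two_gaussians(values)
print([round(r, 3) for r in posteriors], round(weight_high, 3))
```

The appeal of a posterior probability in this setting is that it lies on a common 0 to 1 scale, which is the kind of quantity the abstract describes as combinable across diverse platforms.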

Journal Article•10.1093/BIB/BBM043•

A microarray analysis for differential gene expression in the soybean genome using Bioconductor and R

W. Gregory Alvord, Jean Roayaei, Octavio A. Quiñones, Katherine T. Schneider

01 Nov 2007-Briefings in Bioinformatics

TL;DR: Specific procedures are described for conducting quality assessment of Affymetrix GeneChip(R) soybean genome data and for performing analyses to determine differential gene expression, using the open-source R programming environment in conjunction with the open-source Bioconductor software.

Abstract: This article describes specific procedures for conducting quality assessment of Affymetrix GeneChip(R) soybean genome data and for performing analyses to determine differential gene expression using the open-source R programming environment in conjunction with the open-source Bioconductor software. We describe procedures for extracting those Affymetrix probe set IDs related specifically to the soybean genome on the Affymetrix soybean chip and demonstrate the use of exploratory plots including images of raw probe-level data, boxplots, density plots and M versus A plots. RNA degradation and recommended procedures from Affymetrix for quality control are discussed. An appropriate probe-level model provides an excellent quality assessment tool. To demonstrate this, we discuss and display chip pseudo-images of weights, residuals and signed residuals and additional probe-level modeling plots that may be used to identify aberrant chips. The Robust Multichip Averaging (RMA) procedure was used for background correction, normalization and summarization of the AffyBatch probe-level data to obtain expression level data and to discover differentially expressed genes. Examples of boxplots and MA plots are presented for the expression level data. Volcano plots and heatmaps are used to demonstrate the use of (log) fold changes in conjunction with ordinary and moderated t-statistics for determining interesting genes. We show, with real data, how implementation of functions in R and Bioconductor successfully identified differentially expressed genes that may play a role in soybean resistance to a fungal pathogen, Phakopsora pachyrhizi. Complete source code for performing all quality assessment and statistical procedures may be downloaded from our web source: http://css.ncifcrf.gov/services/download/MicroarraySoybean.zip.

66 citations
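
The M versus A plots mentioned in the abstract are built from two quantities per probe: the log fold change M and the average log intensity A. A minimal sketch of that arithmetic follows; it is not taken from the authors' downloadable source code, and the function name and intensities are hypothetical.

```python
from math import log2


def ma_values(intensity_1, intensity_2):
    """Return (M, A) for a pair of positive intensities.

    M is the log2 fold change and A is the mean log2 intensity.
    """
    m = log2(intensity_1) - log2(intensity_2)
    a = (log2(intensity_1) + log2(intensity_2)) / 2
    return m, a


# Hypothetical intensity pairs for three probes.
pairs = [(1024.0, 256.0), (512.0, 512.0), (64.0, 256.0)]
ma = [ma_values(x, y) for x, y in pairs]
print(ma)
```

Plotting M against A makes intensity-dependent bias visible, since probes with no differential expression should scatter around M = 0 at every A.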

Journal Article•10.1093/BIOINFORMATICS/BTM092•

Domain-enhanced analysis of microarray data using GO annotations

Jiajun Liu¹, Jacqueline M. Hughes-Oliver¹, J. Alan Menius¹•Institutions (1)

North Carolina State University¹

22 Mar 2007-Bioinformatics

TL;DR: This work uses a 'top-down' approach to perform domain aggregation by first combining gene expressions before testing for differentially expressed patterns, in contrast to the more standard 'bottom-up' approach.

Abstract: Motivation: New biological systems technologies give scientists the ability to measure thousands of bio-molecules including genes, proteins, lipids and metabolites. We use domain knowledge, e.g. the Gene Ontology, to guide analysis of such data. By focusing on domain-aggregated results at, say, the molecular function level, increased interpretability is available to biological scientists beyond what is possible if results are presented at the gene level. Results: We use a ‘top–down’ approach to perform domain aggregation by first combining gene expressions before testing for differentially expressed patterns. This is in contrast to the more standard ‘bottom–up’ approach, where genes are first tested individually then aggregated by domain knowledge. The benefits are greater sensitivity for detecting signals. Our method, domain-enhanced analysis (DEA), is assessed and compared to other methods using simulation studies and analysis of two publicly available leukemia data sets. Availability: Our DEA method uses functions available in R (http://www.r-project.org/) and SAS (http://www.sas.com/). The two experimental data sets used in our analysis are available in R as Bioconductor packages, ‘ALL’ and ‘golubEsets’ (http://www.bioconductor.org/). Contact: jliu6@stat.ncsu.edu Supplementary information: Supplementary data are available at Bioinformatics online.

Journal Article•10.1093/BIOINFORMATICS/BTM072•

Codelink: an R package for analysis of GE healthcare gene expression bioarrays

Diego Diez¹, Rebeca Álvarez¹, Ana Dopazo¹•Institutions (1)

Spanish National Research Council¹

01 May 2007-Bioinformatics

TL;DR: A package for reading and analyzing data from GE Healthcare Gene Expression Bioarrays within the R environment is reported; it is implemented in the R language, is open source, and is available for download free of charge through the Bioconductor project.

Abstract: Motivation: Microarray-based expression profiles have become a standard methodology in any high-throughput analysis. Several commercial platforms are available, each with its strengths and weaknesses. The R platform for statistical analysis and graphics is a powerful environment for the analysis of microarray data, because it has many integrated statistical methods available as well as the specialized microarray analysis project Bioconductor. Many packages have been added in the last few years, increasing the range of possible analyses. Here, we report the availability of a package for reading and analyzing data from GE Healthcare Gene Expression Bioarrays within the R environment. Availability: The software is implemented in the R language, is open source and available for download free of charge through the Bioconductor (http://www.bioconductor.org) project. Contact: diez@kuicr.kyoto-u.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online.

Journal Article•10.1093/BIOINFORMATICS/BTL638•

SNPchip: R classes and methods for SNP array data

Robert B. Scharpf¹, Jason C. Ting¹, Jonathan Pevsner¹, Ingo Ruczinski¹•Institutions (1)

Johns Hopkins University¹

10 Feb 2007-Bioinformatics

TL;DR: The R package SNPchip contains classes and methods useful for storing, visualizing and analyzing high density SNP data, including the ability to build statistical models for SNP-level data that operate on instances of the class, and to communicate with other R packages that add additional functionality.

Abstract: Summary: High-density single nucleotide polymorphism microarrays (SNP chips) provide information on a subject's genome, such as copy number and genotype (heterozygosity/homozygosity) at a SNP. While fluorescence in situ hybridization and karyotyping reveal many abnormalities, SNP chips provide a higher resolution map of the human genome that can be used to detect, e.g., aneuploidies, microdeletions, microduplications and loss of heterozygosity (LOH). As a variety of diseases are linked to such chromosomal abnormalities, SNP chips promise new insights for these diseases by aiding in the discovery of such regions, and may suggest targets for intervention. The R package SNPchip contains classes and methods useful for storing, visualizing and analyzing high density SNP data. Originally developed from the SNPscan web-tool, SNPchip utilizes S4 classes and extends other open source R tools available at Bioconductor. This has numerous advantages, including the ability to build statistical models for SNP-level data that operate on instances of the class, and to communicate with other R packages that add additional functionality. Availability: The package is available from the Bioconductor web page at www.bioconductor.org Contact: ingo@jhu.edu Supplementary information: The supplementary material as described in this article (case studies, installation guidelines and R code) is available from http://biostat.jhsph.edu/~iruczins/publications/sm/

Journal Article•10.1186/1471-2105-8-443•

Erratum to: Ringo – an R/Bioconductor package for analyzing ChIP-chip readouts

Joern Toedling¹, Oleg Skylar¹, Tammo Krueger², Jenny J. Fischer², Silke Sperling², Wolfgang Huber¹•Institutions (2)

European Bioinformatics Institute¹, Max Planck Society²

15 Nov 2007-BMC Bioinformatics

TL;DR: Chromatin immunoprecipitation combined with DNA microarrays (ChIP-chip) is a high-throughput assay for DNA-protein binding or post-translational chromatin/histone modifications; its readouts need to be bioinformatically annotated and compared to related datasets by statistical methods.

Journal Article•10.1093/BIOINFORMATICS/BTL628•

GGtools: analysis of genetics of gene expression in bioconductor

Vincent J. Carey¹, Martin Morgan², Seth Falcon², Ross Lazarus¹, Robert Gentleman²•Institutions (2)

Brigham and Women's Hospital¹, Fred Hutchinson Cancer Research Center²

01 Feb 2007-Bioinformatics

TL;DR: The central concepts and implementation of data structures and methods for studying genetics of gene expression with the GGtools package of Bioconductor are reviewed.

Abstract: Summary: This paper reviews the central concepts and implementation of data structures and methods for studying genetics of gene expression with the GGtools package of Bioconductor. Illustration with a HapMap+expression dataset is provided. Availability: Package GGtools is part of Bioconductor 1.9 (http://bioconductor.org). Open source with Artistic License. Contact: stvjc@channing.harvard.edu

Journal Article•10.1093/BIOINFORMATICS/BTM159•

CALIB: a Bioconductor package for estimating absolute expression levels from two-color microarray data

Hui Zhao¹, Kristof Engelen¹, Bart De Moor¹, Kathleen Marchal¹•Institutions (1)

Katholieke Universiteit Leuven¹

01 Jul 2007-Bioinformatics

TL;DR: A new Bioconductor package 'CALIB' for normalization of two-color microarray data is described, based on the measurements of external controls and estimates an absolute target level for each gene and condition pair, as opposed to working with log-ratios as a relative measure of expression.

Abstract: In this article we describe a new Bioconductor package ‘CALIB’ for normalization of two-color microarray data. This approach is based on the measurements of external controls and estimates an absolute target level for each gene and condition pair, as opposed to working with log-ratios as a relative measure of expression. Moreover, this method makes no assumptions regarding the distribution of gene expression divergence. Availability: http://bioconductor.org/packages/2.0/bioc Open Source Contact: Kathleen.marchal@biw.kuleuven.be

Book Chapter•10.1007/978-3-540-71681-5_10•

Estimating genome-wide copy number using allele specific mixture models

Wenyi Wang¹, Benilton S. Carvalho¹, Nathaniel D. Miller², Jonathan Pevsner², Aravinda Chakravarti¹, Rafael A. Irizarry¹•Institutions (2)

Johns Hopkins University¹, Kennedy Krieger Institute²

21 Apr 2007

TL;DR: This paper proposes a mixture model solution specifically designed for single-point estimation, that provides various advantages over the existing methodology and uses a 314 sample database, constructed with public datasets, to motivate and fit models for the conditional distribution of the observed intensities given allele specific copy numbers.

Abstract: Genomic changes such as copy number alterations are thought to be one of the major underlying causes of human phenotypic variation among normal and disease subjects [23,11,25,26,5,4,7,18]. These include chromosomal regions with so-called copy number alterations: instead of the expected two copies, a section of the chromosome for a particular individual may have zero copies (homozygous deletion), one copy (hemizygous deletions), or more than two copies (amplifications). The canonical example is Down syndrome, which is caused by an extra copy of chromosome 21. Identification of such abnormalities in smaller regions has been of great interest, because it is believed to be an underlying cause of cancer. More than one decade ago comparative genomic hybridization (CGH) technology was developed to detect copy number changes in a high-throughput fashion. However, this technology only provides a 10 MB resolution, which limits the ability to detect copy number alterations spanning small regions. It is widely believed that a copy number alteration as small as one base can have significant downstream effects, thus microarray manufacturers have developed technologies that provide much higher resolution. Unfortunately, strong probe effects and variation introduced by sample preparation procedures have made single-point copy number estimates too imprecise to be useful. CGH arrays use a two-color hybridization, usually comparing a sample of interest to a reference sample, which to some degree removes the probe effect. However, the resolution is not nearly high enough to provide single-point copy number estimates. Various groups have proposed statistical procedures that pool data from neighboring locations to successfully improve precision. However, these procedures need to average across relatively large regions to work effectively, thus greatly reducing the resolution.
Recently, regression-type models that account for probe-effect have been proposed and appear to improve accuracy as well as precision. In this paper, we propose a mixture model solution specifically designed for single-point estimation, that provides various advantages over the existing methodology. We use a 314 sample database, constructed with public datasets, to motivate and fit models for the conditional distribution of the observed intensities given allele specific copy numbers. With the estimated models in place we can compute posterior probabilities that provide a useful prediction rule as well as a confidence measure for each call. Software to implement this procedure will be available in the Bioconductor oligo package (http://www.bioconductor.org).

Proceedings Article•10.1142/9789812776136_0016•

SGDI: system for genomic data integration.

Vincent J. Carey¹, Jeff Gentry, Deepayan Sarkar, Robert Gentleman, Srini Ramaswamy•Institutions (1)

Brigham and Women's Hospital¹

1 Dec 2007

TL;DR: The approach consists of data capture and modeling processes rooted in R/Bioconductor, sample annotation and sequence constituent ontology management based in R, secure data archiving in PostgreSQL, and browser-based workspace creation and management rooted in Zope.

Abstract: This paper describes a framework for collecting, annotating, and archiving high-throughput assays from multiple experiments conducted on one or more series of samples. Specific applications include support for large-scale surveys of related transcriptional profiling studies, for investigations of the genetics of gene expression and for joint analysis of copy number variation and mRNA abundance. Our approach consists of data capture and modeling processes rooted in R/Bioconductor, sample annotation and sequence constituent ontology management based in R, secure data archiving in PostgreSQL, and browser-based workspace creation and management rooted in Zope. This effort has generated a completely transparent, extensible, and customizable interface to large archives of high-throughput assays. Sources and prototype interfaces are accessible at www.sgdi.org/software.

Sample Size Estimation for Microarray Experiments Using the ssize package.

Gregory R. Warnes

1 Jan 2007

TL;DR: A simple method for performing and visualizing sample size calculations for microarray experiments as implemented in the ssize R package, which is available from the Bioconductor project (http://www.bioconductor.org) web site.

Abstract: RNA Expression Microarray technology is widely applied in biomedical and pharmaceutical research. The huge number of RNA concentrations estimated for each sample makes it difficult to apply traditional sample size calculation techniques and has left most practitioners to rely on rule-of-thumb techniques. In this paper, we briefly describe and then demonstrate a simple method for performing and visualizing sample size calculations for microarray experiments as implemented in the ssize R package, which is available from the Bioconductor project (http://www.bioconductor.org) web site.

apt-get install cran bioc: On automated builds of 1700 R packages for Debian

Dirk Eddelbuettel, David Vernazobres¹, Albrecht Gebhardt•Institutions (1)

University of Wisconsin-Madison¹

1 Jan 2007

TL;DR: An effort to bring the R package repositories and the Debian Linux distribution together provides a unique statistical environment: essentially all CRAN, BioConductor and Omegahat packages can be installed automatically onto Debian (or Ubuntu) from pre-built binary packages with a single command.

Abstract: Within the world of the R system, language and environment, the CRAN and BioConductor archives have achieved remarkable success in attracting a consistent inflow of new packages of high quality contributions and extensions to the R system. At the same time, the Debian distribution (and its derivatives such as Ubuntu) has continued to make it easier for users to obtain a consistent and complete software installation. In Debian’s case, this has resulted in an unprecedented ten installable architectures. For Ubuntu, a focus on easier installation and added polish means that the ’barriers to entry’ for new users have been lowered, which has resulted in increased market- and mind share for Debian and Ubuntu. This paper presents an effort to bring the R package repositories and the Debian Linux distribution together. This provides a unique statistical environment: essentially all CRAN, BioConductor and Omegahat packages can be installed automatically onto Debian (or Ubuntu) from pre-built binary packages with a single command. Our initial reference builds cover well over 1700 packages taken from the CRAN, BioConductor and Omegahat repositories.

Ringo - R Investigation of NimbleGen Oligoarrays

Joern Toedling

1 Jan 2007

TL;DR: The package Ringo deals with the analysis of two-color oligonucleotide microarrays used in ChIP-chip projects; it employs functions from other packages of the Bioconductor project and provides additional ChIP-chip-specific and NimbleGen-specific functionalities.

Abstract: The package Ringo deals with the analysis of two-color oligonucleotide microarrays used in ChIP-chip projects. The package was started to facilitate the analysis of two-color microarrays from the company NimbleGen, but the package has a modular design, such that the platform-specific functionality is encapsulated and analogous two-color tiling array platforms can also be processed. The package employs functions from other packages of the Bioconductor project (Gentleman et al., 2004) and provides additional ChIP-chip-specific and NimbleGen-specific functionalities.

Review: Practical Design and Analysis of 2-Colour cDNA Microarray Experiments

Ben Routley, Mark Muldoon

23 Aug 2007

TL;DR: This review aims to make current techniques of statistical design, normalisation and linear analysis of cDNA microarray experiments accessible to a wider community.

Abstract: This review paper is aimed at biological researchers who are interested in or have begun to use cDNA microarrays for their investigations. Large microarray studies typically involve a multi-disciplinary team with various groups performing different aspects of the same experiment. This approach means that microarrays are less accessible to new researchers than more traditional biological techniques. This review aims to make current techniques of statistical design, normalisation and linear analysis of cDNA microarray experiments accessible to a wider community. These methods are illustrated with examples that use freely available packages implemented in Bioconductor and R.

Journal Article•10.1186/1471-2105-8-446•

Novel definition files for human GeneChips based on GeneAnnot

Francesco Ferrari¹, Stefania Bortoluzzi², Alessandro Coppe², Alexandra Sirota³, Marilyn Safran³, Michael Shmoish⁴, Sergio Ferrari¹, Doron Lancet³, Gian Antonio Danieli², Silvio Bicciato²•Institutions (4)

University of Modena and Reggio Emilia¹, University of Padua², Weizmann Institute of Science³, Technion – Israel Institute of Technology⁴

15 Nov 2007-BMC Bioinformatics

TL;DR: GeneAnnot-based custom CDFs solve the problem of a reliable reconstruction of expression levels and eliminate the existence of more than one probeset per gene, which often leads to discordant expression signals for the same transcript when gene differential expression is the focus of the analysis.

Abstract: Background: Improvements in genome sequence annotation revealed discrepancies in the original probeset/gene assignment in Affymetrix microarrays and the existence of differences between annotations and effective alignments of probes and transcription products. In the current generation of Affymetrix human GeneChips, most probesets include probes matching transcripts from more than one gene and probes which do not match any transcribed sequence. Results: We developed a novel set of custom Chip Definition Files (CDF) and the corresponding Bioconductor libraries for Affymetrix human GeneChips, based on the information contained in the GeneAnnot database. GeneAnnot-based CDFs are composed of unique custom-probesets, including only probes matching a single gene. Conclusion: GeneAnnot-based custom CDFs solve the problem of a reliable reconstruction of expression levels and eliminate the existence of more than one probeset per gene, which often leads to discordant expression signals for the same transcript when gene differential expression is the focus of the analysis. GeneAnnot CDFs are freely distributed and fully compliant with Affymetrix standards and all available software for gene expression analysis. The CDF libraries are available from http://www.xlab.unimo.it/GA_CDF, along with supplementary information (CDF libraries, installation guidelines and R code, CDF statistics, and analysis results).
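
The regrouping rule described above (one probeset per gene, keeping only probes that match a single gene) can be sketched in a few lines of Python. This is an illustration only: the probe-to-gene mapping, the identifiers and the function name are hypothetical, and the actual CDF construction from GeneAnnot is not reproduced here.

```python
from collections import defaultdict


def build_custom_probesets(probe_to_genes):
    """Group probes into one probeset per gene, keeping single-gene probes only.

    probe_to_genes maps a probe identifier to the set of genes it matches.
    Probes matching no gene, or more than one gene, are discarded.
    """
    probesets = defaultdict(list)
    for probe, genes in probe_to_genes.items():
        if len(genes) == 1:
            (gene,) = genes  # unpack the single matching gene
            probesets[gene].append(probe)
    return {gene: sorted(probes) for gene, probes in probesets.items()}


# Hypothetical example mapping (identifiers are made up for illustration).
example = {
    "probe1": {"GENE_A"},
    "probe2": {"GENE_A"},
    "probe3": {"GENE_A", "GENE_B"},  # ambiguous: dropped
    "probe4": set(),                 # matches nothing: dropped
    "probe5": {"GENE_B"},
}
custom = build_custom_probesets(example)
```

Because every retained probe maps to exactly one gene, each gene ends up with a single probeset, which is the property the abstract highlights.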

Journal Article•10.1093/BIOINFORMATICS/BTM469•

oneChannelGUI: a graphical interface to Bioconductor tools, designed for life scientists who are not familiar with R language

Remo Sanges¹, Francesca Cordero², Raffaele A. Calogero²•Institutions (2)

AREA Science Park¹, University of Turin²

15 Dec 2007-Bioinformatics

TL;DR: oneChannelGUI is an add-on Bioconductor package providing a new set of functions extending the capability of the affylmGUI package, providing a graphical interface (GUI) for Bioconductor libraries to be used for quality control, normalization, filtering, statistical validation and data mining for single channel microarrays.

Abstract: Summary: oneChannelGUI is an add-on Bioconductor package providing a new set of functions extending the capability of the affylmGUI package. This library provides a graphical interface (GUI) for Bioconductor libraries to be used for quality control, normalization, filtering, statistical validation and data mining for single channel microarrays. Affymetrix 3' expression (IVT) arrays as well as the new whole transcript expression arrays, i.e. gene/exon 1.0 ST, are currently implemented. oneChannelGUI is available for most platforms on which R runs, i.e. Windows and Unix-like machines. Availability: http://www.bioconductor.org/packages/2.0/bioc/html/oneChannelGUI.html

Journal Article•10.1186/1471-2105-8-S6-S8•

Graphs in molecular biology

Wolfgang Huber¹, Vincent J. Carey², Li Long³, Seth Falcon⁴, Robert Gentleman⁴•Institutions (4)

European Bioinformatics Institute¹, Brigham and Women's Hospital², Swiss Institute of Bioinformatics³, Fred Hutchinson Cancer Research Center⁴

27 Sep 2007-BMC Bioinformatics

TL;DR: A brief introduction is given to graph theoretical concepts and their areas of application in molecular biology, and a simple application to the integration of a protein-protein interaction and a co-expression network is presented.

Abstract: Graph theoretical concepts are useful for the description and analysis of interactions and relationships in biological systems. We give a brief introduction into some of the concepts and their areas of application in molecular biology. We discuss software that is available through the Bioconductor project and present a simple example application to the integration of a protein-protein interaction and a co-expression network.
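
As a rough illustration of what integrating two networks can mean, the Python sketch below treats both networks as undirected edge sets and keeps the interactions supported by both. The abstract does not describe the paper's own example in detail, so this is only one simple possibility; the gene names and function names are hypothetical.

```python
def edge_set(pairs):
    """Normalise undirected edges so that (a, b) and (b, a) compare equal.

    Self-loops are dropped.
    """
    return {frozenset(pair) for pair in pairs if pair[0] != pair[1]}


def shared_edges(ppi_pairs, coexpr_pairs):
    """Edges present in both the interaction and the co-expression network."""
    return edge_set(ppi_pairs) & edge_set(coexpr_pairs)


# Hypothetical toy networks.
ppi = [("A", "B"), ("B", "C"), ("C", "D")]
coexpr = [("B", "A"), ("C", "D"), ("D", "E")]
both = shared_edges(ppi, coexpr)
```

Representing each edge as a `frozenset` makes edge direction irrelevant, so an interaction listed as A-B in one network matches B-A in the other.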

Journal Article•10.1186/GB-2007-8-5-R79•

An annotation infrastructure for the analysis and interpretation of Affymetrix exon array data

Michal J. Okoniewski¹, Tim Yates¹, Siân Dibben¹, Crispin J. Miller¹•Institutions (1)

University of Manchester¹

11 May 2007-Genome Biology

TL;DR: Affymetrix exon arrays contain probesets intended to target every known and predicted exon in the entire genome, posing significant challenges for high-throughput genome-wide data analysis.

Abstract: Affymetrix exon arrays contain probesets intended to target every known and predicted exon in the entire genome, posing significant challenges for high-throughput genome-wide data analysis. X:MAP (http://xmap.picr.man.ac.uk), an annotation database, and exonmap (http://www.bioconductor.org/packages/2.0/bioc/html/exonmap.html), a BioConductor/R package, are designed to support fine-grained analysis of exon array data. The system supports the application of standard statistical techniques, prior to the use of genome scale annotation to provide gene-, transcript- and exon-level summaries and visualization tools.

Journal Article•10.1186/1471-2105-8-221•

Ringo--an R/Bioconductor package for analyzing ChIP-chip readouts.

Joern Toedling¹, Oleg Sklyar¹, Wolfgang Huber¹•Institutions (1)

European Bioinformatics Institute¹

26 Jun 2007-BMC Bioinformatics

TL;DR: A free, open-source R package, Ringo, is presented that facilitates the analysis of ChIP-chip experiments by providing functionality for data import, quality assessment, normalization and visualization of the data, and the detection of ChIP-enriched genomic regions.

Abstract: Background Chromatin immunoprecipitation combined with DNA microarrays (ChIP-chip) is a high-throughput assay for DNA-protein-binding or post-translational chromatin/histone modifications. However, the raw microarray intensity readings themselves are not immediately useful to researchers, but require a number of bioinformatic analysis steps. Identified enriched regions need to be bioinformatically annotated and compared to related datasets by statistical methods.

Journal Article•10.1093/BIOINFORMATICS/BTM357•

RefPlus: an R package extending the RMA Algorithm.

Chris Harbron¹, Kai-Ming Chang¹, Marie C. South¹•Institutions (1)

AstraZeneca¹

15 Sep 2007-Bioinformatics

TL;DR: The availability of the RefPlus package is reported; it contains functions to perform the Extrapolation Strategy and Extrapolation Averaging algorithms, which address a limitation of RMA.

Abstract: Summary: RMA has become a widely used methodology to preprocess Affymetrix gene expression microarrays. A limitation of RMA is that the calculated probeset intensities change when a set of microarrays is re-pre-processed after the inclusion of additional microarrays into the analysis set. Here we report the availability of the RefPlus package containing functions to perform the Extrapolation Strategy and Extrapolation Averaging algorithms which address these issues. Availability: The software is implemented in the R language and can be downloaded from the Bioconductor project website (http://
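
The limitation mentioned above arises because RMA estimates its parameters from whichever arrays are in the analysis set. The abstract does not describe the RefPlus algorithms, so the Python sketch below only illustrates the general idea, assumed here, of freezing a reference distribution once and normalizing later arrays against it. It covers a quantile-normalization step only, and all names are hypothetical.

```python
def reference_quantiles(reference_arrays):
    """Mean of the sorted intensities across a fixed set of reference arrays.

    All arrays are assumed to have the same number of probes.
    """
    sorted_arrays = [sorted(a) for a in reference_arrays]
    n = len(sorted_arrays)
    return [sum(column) / n for column in zip(*sorted_arrays)]


def normalize_to_reference(new_array, ref_quantiles):
    """Replace each intensity by the stored reference quantile of the same rank.

    Ties are broken by position in the array (a simplification).
    """
    order = sorted(range(len(new_array)), key=lambda i: new_array[i])
    normalized = [0.0] * len(new_array)
    for rank, index in enumerate(order):
        normalized[index] = ref_quantiles[rank]
    return normalized


# The reference is computed once; later arrays never change it.
ref = reference_quantiles([[1.0, 3.0, 2.0], [2.0, 4.0, 6.0]])
later = normalize_to_reference([10.0, 5.0, 7.0], ref)
```

Since `ref` is fixed, values already computed for earlier arrays stay the same when further arrays are processed.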

Journal Article•10.1186/1471-2105-8-439•

BGX: a Bioconductor package for the Bayesian integrated analysis of Affymetrix GeneChips.

Ernest Turro¹, Natalia Bochkina², Anne-Mette K Hein³, Sylvia Richardson¹•Institutions (3)

Imperial College London¹, University of Edinburgh², Aarhus University³

12 Nov 2007-BMC Bioinformatics

TL;DR: BGX is a new Bioconductor R package that implements an integrated Bayesian approach to the analysis of 3' GeneChip data that performs well relative to other widely used methods at estimating expression levels and fold changes.

Abstract: Affymetrix 3' GeneChip microarrays are widely used to profile the expression of thousands of genes simultaneously. They differ from many other microarray types in that GeneChips are hybridised using a single labelled extract and because they contain multiple 'match' and 'mismatch' sequences for each transcript. Most algorithms extract the signal from GeneChip experiments in a sequence of separate steps, including background correction and normalisation, which inhibits the simultaneous use of all available information. They principally provide a point estimate of gene expression and, in contrast to BGX, do not fully integrate the uncertainty arising from potentially heterogeneous responses of the probes. BGX is a new Bioconductor R package that implements an integrated Bayesian approach to the analysis of 3' GeneChip data. The software takes into account additive and multiplicative error, non-specific hybridisation and replicate summarisation in the spirit of the model outlined in [1]. It also provides a posterior distribution for the expression of each gene. Moreover, BGX can take into account probe affinity effects from probe sequence information where available. The package employs a novel adaptive Markov chain Monte Carlo (MCMC) algorithm that considerably raises the efficiency with which the posterior distributions are sampled. Finally, BGX incorporates various ways to analyse the results, such as ranking genes by expression level as well as statistically based methods for estimating the number of up- and down-regulated genes between two conditions. BGX performs well relative to other widely used methods at estimating expression levels and fold changes. It has the advantage that it provides a statistically sound measure of uncertainty for its estimates. BGX includes various analysis functions to visualise and exploit the rich output that is produced by the Bayesian model.

Proceedings Article•10.1109/BIBE.2007.4375552•

PANP - a New Method of Gene Detection on Oligonucleotide Expression Arrays

P. Warren¹, Deanne Taylor¹, P.G.V. Martini¹, J. Jackson¹, J. Bienkowska¹•Institutions (1)

Merck Serono¹

5 Nov 2007

TL;DR: A statistical method in R, called presence-absence calls with negative probesets (PANP), is developed; it uses sets of Affymetrix-reported probes with no known hybridization partners on two chip sets: HG-U133A and HG-U133 Plus 2.0.

Abstract: The method currently most used for probeset detection calls on Affymetrix GeneChip® Human Genome Arrays is provided as part of the MAS5 software. The MAS method uses Wilcoxon statistics for determining presence-absence (MAS-P/A) calls. However, MAS-P/A is only usable with MAS5 processing, which requires the use of both perfect match (PM) and mismatch (MM) probe data in order to call the resulting probeset present or absent. A considerable amount of recent research has convincingly shown that using MM data in gene expression analysis may be problematic. The RMA method, which uses PM data only, is one method that has been developed in response to this. However, there is no publicly available method that works with PM-only expression data to establish presence or absence of genes from the probesets in microarray data. It seems desirable to decouple the method used to generate gene expression values from the method used to make gene detection calls. We have therefore developed a statistical method in R, called presence-absence calls with negative probesets (PANP), which uses sets of Affymetrix-reported probes with no known hybridization partners on two chip sets: HG-U133A and HG-U133 Plus 2.0. PANP allows the use of any Affymetrix microarray data preprocessing method to generate expression values, including PM-only methods as well as PM and MM methods. We present our results on PANP and its performance using the set of 28 HG-U133A chips from a published Affymetrix Latin squares spike-in dataset as well as an internal TaqMan-validated human tissue dataset on the HG-U133 Plus 2.0 chipsets. We find that using these datasets, PANP out-performs the MAS-P/A method in several metrics of accuracy and precision using a variety of preprocessing methods: RMA, GCRMA, and even MAS5 itself. PANP out-performs MAS-P/A in probeset detection across a full range of concentrations, especially with low concentration transcripts.
An R software package has been prepared for PANP and is available in R as part of the Bioconductor package release at http://www.bioconductor.org.
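
The idea of making detection calls from a background distribution of negative probesets can be sketched in Python as follows. This is not the PANP package code. It assumes an empirical p-value (the fraction of negative-probeset values at or above the observed expression) and two cutoffs that split calls into present (P), marginal (M) and absent (A). The cutoff values, the marginal category and all names are assumptions made for illustration.

```python
from bisect import bisect_left


def detection_calls(expression, negative_values, tight=0.01, loose=0.02):
    """Return a 'P', 'M' or 'A' call for each probeset.

    expression      : dict mapping probeset id -> preprocessed expression value
    negative_values : expression values of probesets with no known target
    tight, loose    : p-value cutoffs (illustrative assumptions)
    """
    negatives = sorted(negative_values)
    n = len(negatives)
    calls = {}
    for probeset, value in expression.items():
        # Empirical p-value: share of negative values >= the observed value.
        p_value = (n - bisect_left(negatives, value)) / n
        if p_value < tight:
            calls[probeset] = "P"
        elif p_value < loose:
            calls[probeset] = "M"
        else:
            calls[probeset] = "A"
    return calls


# Hypothetical background of 100 negative-probeset values: 1, 2, ..., 100.
background = list(range(1, 101))
calls = detection_calls({"high": 200, "edge": 100, "low": 50}, background)
```

Because the calls depend only on expression values, the same function works regardless of which preprocessing method produced them, which is the decoupling the abstract argues for.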
