Top 25 papers published in the topic of Bioconductor in 2005


Journal Article•10.1093/BIOINFORMATICS/BTI525•

BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis


Steffen Durinck¹, Yves Moreau¹, Arek Kasprzyk², Sean Davis³, Bart De Moor¹, Alvis Brazma², Wolfgang Huber²•Institutions (3)

Katholieke Universiteit Leuven¹, Wellcome Trust², National Institutes of Health³

15 Aug 2005-Bioinformatics

TL;DR: The biomaRt package provides a tight integration of large, public or locally installed BioMart databases with data analysis in Bioconductor creating a powerful environment for biological data mining.


Abstract: Summary: biomaRt is a new Bioconductor package that integrates BioMart data resources with data analysis software in Bioconductor. It can annotate a wide range of gene or gene product identifiers (e.g. Entrez-Gene and Affymetrix probe identifiers) with information such as gene symbol, chromosomal coordinates, Gene Ontology and OMIM annotation. Furthermore, biomaRt enables retrieval of genomic sequences and single nucleotide polymorphism information, which can be used in data analysis. Fast and up-to-date data retrieval is possible as the package executes direct SQL queries to the BioMart databases (e.g. Ensembl). The biomaRt package provides a tight integration of large, public or locally installed BioMart databases with data analysis in Bioconductor, creating a powerful environment for biological data mining. Availability: http://www.bioconductor.org. LGPL Contact: steffen.durinck@esat.kuleuven.ac.be


2,312 citations

Journal Article•10.1093/BIOINFORMATICS/BTI394•

MADE4: an R package for multivariate analysis of gene expression data


Aedín C. Culhane¹, Jean Thioulouse², Guy Perrière², Desmond G. Higgins¹•Institutions (2)

University College Dublin¹, Claude Bernard University Lyon 1²

01 Jun 2005-Bioinformatics

TL;DR: MADE4 builds on the extensive multivariate statistical and graphical functions in the R package ade4, extends them for application to microarray data, and provides new graphical and visualization tools that aid interpretation of multivariate analysis of microarray data.


Abstract: Summary: MADE4, microarray ade4, is a software package that facilitates multivariate analysis of microarray gene-expression data. MADE4 accepts a wide variety of gene-expression data formats. MADE4 takes advantage of the extensive multivariate statistical and graphical functions in the R package ade4, extending these for application to microarray data. In addition, MADE4 provides new graphical and visualization tools that aid in interpretation of multivariate analysis of microarray data. Availability: The R package MADE4 is available from Bioconductor http://bioinf.vcd.ie/software and from Bioconductor http://www.bioconductor.org Contact: aedin.culhane@ucd.ie Supplementary information: MADE4 is well documented. There are tutorials, in the form of vignettes, which describe typical analyses. In addition, the MADE4 manual provides descriptions and examples for each function.


402 citations

Book•

Bioinformatics and Computational Biology Solutions Using R and Bioconductor (Statistics for Biology and Health)


Robert Gentleman, Vincent J. Carey, Wolfgang Huber, Rafael A. Irizarry, Sandrine Dudoit

1 Aug 2005

TL;DR: Code underlying all of the computations that are shown is made available on a companion website, and readers can reproduce every number, figure, and table on their own computers.


Abstract: Full four-color book. Some of the editors created the Bioconductor project and Robert Gentleman is one of the two originators of R. All methods are illustrated with publicly available data, and a major section of the book is devoted to fully worked case studies. Code underlying all of the computations that are shown is made available on a companion website, and readers can reproduce every number, figure, and table on their own computers.


330 citations

Book Chapter•10.1007/0-387-29362-0_15•

Multiple Testing Procedures: the multtest Package and Applications to Genomics


Katherine S. Pollard¹, Sandrine Dudoit², M. J. van der Laan²•Institutions (2)

University of California, San Francisco¹, University of California, Berkeley²

1 Jan 2005

TL;DR: The multtest package implements widely applicable resampling-based single-step and stepwise multiple testing procedures (MTPs) for controlling a broad class of Type I error rates.


Abstract: The Bioconductor R package multtest implements widely applicable resampling-based single-step and stepwise multiple testing procedures (MTP) for controlling a broad class of Type I error rates. The current version of multtest provides MTPs for tests concerning means, differences in means, and regression parameters in linear and Cox proportional hazards models. Typical testing scenarios are illustrated by applying various MTPs implemented in multtest to the Acute Lymphoblastic Leukemia (ALL) data set of Chiaretti et al. (2004), with the aim of identifying genes whose expression measures are associated with (possibly censored) biological and clinical outcomes.
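To make the distinction between single-step and stepwise procedures concrete, here is a minimal sketch in plain Python. It is not the multtest API and it is not resampling-based: it uses the classical Bonferroni (single-step) and Holm (step-down) adjustments on raw p-values, which are much simpler than the procedures the chapter covers. The function names and the example p-values are hypothetical.

```python
def bonferroni_adjust(pvalues):
    """Single-step adjustment: every p-value is scaled by the number of tests."""
    m = len(pvalues)
    return [min(1.0, m * p) for p in pvalues]


def holm_adjust(pvalues):
    """Step-down (Holm) adjustment.

    The smallest p-value is scaled by m, the next by m - 1, and so on;
    a running maximum keeps the adjusted values monotone in the raw order.
    """
    m = len(pvalues)
    # Indices of the hypotheses, ordered from smallest to largest raw p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * pvalues[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03]  # hypothetical raw p-values for three genes
    print("Bonferroni:", bonferroni_adjust(raw))
    print("Holm:      ", holm_adjust(raw))
```

The Holm values are never larger than the Bonferroni values, which is why step-down procedures are generally preferred when the same error rate is controlled.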


303 citations

Journal Article•10.1093/BIOINFORMATICS/BTI457•

Discovering molecular functions significantly related to phenotypes by combining gene expression data and biological information


Fatima Al-Shahrour, Ramon Diaz-Uriarte, Joaquín Dopazo

01 Jul 2005-Bioinformatics

TL;DR: A simple procedure is presented which combines experimental measurements with available biological information in a way that genes are simultaneously tested in groups related by common functional properties and constitutes a very sensitive tool for selecting genes with significant differential behaviour in the experimental conditions tested.


Abstract: Motivation: The analysis of genome-scale data from different high throughput techniques can be used to obtain lists of genes ordered according to their different behaviours under distinct experimental conditions corresponding to different phenotypes (e.g. differential gene expression between diseased samples and controls, different response to a drug, etc.). The order in which the genes appear in the list is a consequence of the biological roles that the genes play within the cell, which account, at molecular scale, for the macroscopic differences observed between the phenotypes studied. Typically, two steps are followed for understanding the biological processes that differentiate phenotypes at molecular level: first, genes with significant differential expression are selected on the basis of their experimental values and subsequently, the functional properties of these genes are analysed. Instead, we present a simple procedure which combines experimental measurements with available biological information in a way that genes are simultaneously tested in groups related by common functional properties. The method proposed constitutes a very sensitive tool for selecting genes with significant differential behaviour in the experimental conditions tested. Results: We propose the use of a method to scan ordered lists of genes. The method allows the understanding of the biological processes operating at molecular level behind the macroscopic experiment from which the list was generated. This procedure can be useful in situations where it is not possible to obtain statistically significant differences based on the experimental measurements (e.g. low prevalence diseases, etc.). Two examples demonstrate its application in two microarray experiments and the type of information that can be extracted. 
Availability: The software used for the association of significant Gene Ontology (GO) terms to sets of genes is available at http://www.fatigo.org and http://www.babelomics.org. Software for ranking genes according to phenotypes is available in GEPAS (http://www.gepas.org). The multtest program from the bioconductor package is available at http://www.bioconductor.org/repository/devel/package/html/multtest.html. Contact: jdopazo@ochoa.fib.es


134 citations

Journal Article•10.1093/BIOINFORMATICS/BTH458•

Network structures and algorithms in Bioconductor


Vincent J. Carey¹, Jeff Gentry², Elizabeth Whalen², Robert Gentleman²•Institutions (2)

Brigham and Women's Hospital¹, Harvard University²

01 Jan 2005-Bioinformatics

TL;DR: Interfaces to open source resources for visualization and network algorithms have been developed to support analysis of graphical structures in genomics and computational biology.


Abstract: Summary: In this paper, we review the central concepts and implementations of tools for working with network structures in Bioconductor. Interfaces to open source resources for visualization (AT&T Graphviz) and network algorithms (Boost) have been developed to support analysis of graphical structures in genomics and computational biology. Availability: Packages graph, Rgraphviz, RBGL of Bioconductor (www.bioconductor.org). Contact: stvjc@channing.harvard.edu


85 citations

Book Chapter•10.1002/047001153X.G405208•

Differential expression with the Bioconductor Project


Anja von Heydebreck¹, Wolfgang Huber², Robert Gentleman³•Institutions (3)

Max Planck Society¹, German Cancer Research Center², Harvard University³

15 Nov 2005

TL;DR: Different approaches to the identification of changes in gene expression that are associated with particular biological conditions are discussed and how they can be applied using software from the Bioconductor Project is illustrated.


Abstract: A basic, yet challenging task in the analysis of microarray gene expression data is the identification of changes in gene expression that are associated with particular biological conditions. We discuss different approaches to this task and illustrate how they can be applied using software from the Bioconductor Project. A central problem is the high dimensionality of gene expression space, which prohibits a comprehensive statistical analysis without focusing on particular aspects of the joint distribution of the genes' expression levels. Possible strategies are to do univariate gene-by-gene analysis, and to perform data-driven nonspecific filtering of genes before the actual statistical analysis. However, more focused strategies that make use of biologically relevant knowledge are more likely to increase our understanding of the data. Keywords: differential gene expression; microarrays; multiple testing; statistical software; biological metadata


58 citations

Journal Article•10.1093/BIOINFORMATICS/BTI397•

A robust neural networks approach for spatial and intensity-dependent normalization of cDNA microarray data


A. L. Tarca¹, J. E. K. Cooke¹•Institutions (1)

Laval University¹

01 Jun 2005-Bioinformatics

TL;DR: A comparison of the robust neural network method with other published methods demonstrates its potential in reducing both intensity-dependent bias and spatial-dependent bias, which translates to more reliable identification of truly regulated genes.


Abstract: Motivation: Microarray experiments are affected by numerous sources of non-biological variation that contribute systematic bias to the resulting data. In a dual-label (two-color) cDNA or long-oligonucleotide microarray, these systematic biases are often manifested as an imbalance of measured fluorescent intensities corresponding to Sample A versus those corresponding to Sample B. Systematic biases also affect between-slide comparisons. Making effective corrections for these systematic biases is a requisite for detecting the underlying biological variation between samples. Effective data normalization is therefore an essential step in the confident identification of biologically relevant differences in gene expression profiles. Several normalization methods for the correction of systemic bias have been described. While many of these methods have addressed intensity-dependent bias, few have addressed both intensity-dependent and spatiality-dependent bias. Results: We present a neural network-based normalization method for correcting the intensity- and spatiality-dependent bias in cDNA microarray datasets. In this normalization method, the dependence of the log-intensity ratio (M) on the average log-intensity (A) as well as on the spatial coordinates (X,Y) of spots is approximated with a feed-forward neural network function. Resistance to outliers is provided by assigning a weight to each spot based on how distant its M value is from the median over the spots whose A values are similar, as well as by using pseudospatial coordinates instead of spot row and column indices. A comparison of the robust neural network method with other published methods demonstrates its potential in reducing both intensity-dependent bias and spatial-dependent bias, which translates to more reliable identification of truly regulated genes.
Availability: The normalization method described in this paper is available as the library nnNorm in the BioConductor project (http://www.bioconductor.org). Scripts used to load the freely available data and generate some of the figures in this paper are available in the documentation accompanying this library. Contact: ltarca@rsvs.ulaval.ca
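For readers unfamiliar with the M and A quantities named in this abstract, the sketch below computes them from two-channel spot intensities. It assumes the conventional base-2 definitions, M = log2(R) - log2(G) and A = (log2(R) + log2(G)) / 2, which the abstract does not spell out. The median-centering step is a deliberately simple stand-in for normalization; it is not the neural network fit described by the authors and is unrelated to the nnNorm API. All names and intensities are hypothetical.

```python
import math
from statistics import median


def ma_values(red, green):
    """Return a list of (M, A) pairs, one per spot.

    M is the log-intensity ratio and A is the average log-intensity,
    both on a base-2 scale (an assumption; see the lead-in).
    """
    pairs = []
    for r, g in zip(red, green):
        log_r, log_g = math.log2(r), math.log2(g)
        pairs.append((log_r - log_g, (log_r + log_g) / 2.0))
    return pairs


def median_center(m_values):
    """Global normalization stand-in: subtract the median M from every spot."""
    center = median(m_values)
    return [m - center for m in m_values]


if __name__ == "__main__":
    red = [8.0, 4.0, 16.0]    # hypothetical red-channel intensities
    green = [2.0, 4.0, 8.0]   # hypothetical green-channel intensities
    pairs = ma_values(red, green)
    print(pairs)
    print(median_center([m for m, _ in pairs]))
```

A method such as the one in the abstract replaces the single median with a fitted function of A and the spot coordinates, so the correction can vary with intensity and position on the slide.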


47 citations

Book Chapter•10.1007/0-387-29362-0_14•

Analysis of Differential Gene Expression Studies


D. Scholtens, A. von Heydebreck

1 Jan 2005

TL;DR: This chapter discusses strategies for gene-at-a-time analyses, nonspecific and meta-data driven prefiltering techniques, and commonly used test statistics for detecting differential expression, and demonstrates the use of factorial models for probing complex biological systems.


Abstract: In this chapter, we focus on the analysis of differential gene expression studies. Many microarray studies are designed to detect genes associated with different phenotypes, for example, the comparison of cancer tumors and normal cells. In some multifactor experiments, genetic networks are perturbed with various treatments to understand the effects of those treatments and their interactions with each other in the dynamic cellular network. For even the simplest experiments, investigators must consider several issues for appropriate gene selection. We discuss strategies for gene-at-a-time analyses, nonspecific and meta-data driven prefiltering techniques, and commonly used test statistics for detecting differential expression. We show how these strategies and statistical tools are implemented and used in Bioconductor. We also demonstrate the use of factorial models for probing complex biological systems and highlight the importance of carefully coordinating known cellular behavior with statistical modeling to make biologically relevant inference from microarray studies.


43 citations

Book Chapter•10.1007/0-387-29362-0_25•

From CEL Files to Annotated Lists of Interesting Genes


R. A. Irizarry

1 Jan 2005

TL;DR: In this article, the authors demonstrate Bioconductor tools useful for creating such lists, starting from the raw probe level data (CEL files) and conclude with the creation of annotated reports.


Abstract: One of the most popular applications of microarray technology is the identification of genes that are differentially expressed in two populations. With Affymetrix GeneChip technology, there are several steps between hybridization and the selection of interesting genes. The steps of preprocessing to improve signal to noise ratios, choosing a summary statistic for appropriate ranking of genes, and deciding on a final filter for candidate genes are largely statistical in nature. In this chapter, we demonstrate Bioconductor tools useful for creating such lists. We start from the raw probe level data (CEL files) and conclude with the creation of annotated reports.


17 citations

Book Chapter•10.1007/0-387-29362-0_16•

Machine Learning Concepts and Tools for Statistical Genomics


Vincent J. Carey

1 Jan 2005

TL;DR: The most widely used families of machine learning methods are described, along with various approaches to learner assessment, and key problems of model selection and interpretation are reviewed in examples.


Abstract: In this chapter, supervised machine learning methods are described in the context of microarray applications. The most widely used families of machine learning methods are described, along with various approaches to learner assessment. The Bioconductor interfaces to machine learning tools are described and illustrated. Key problems of model selection and interpretation are reviewed in examples.


BIOC 2005 Lab: From CEL Files to Annotated Lists of Interesting Genes


James W. MacDonald, Rafael A. Irizarry

1 Jan 2005

TL;DR: This lab shows how to use Bioconductor tools to quickly create lists of genes that appear to be differentially expressed in two populations, starting from the raw probe level data (CEL files) and ending with annotated reports.


Abstract: The predominant use for microarrays is the measurement of genome-wide expression levels, and the most commonly used microarray platform is the Affymetrix GeneChip. Affymetrix GeneChip arrays use short oligonucleotides to probe for genes in an RNA sample. Genes are represented by a set of oligonucleotide probes, each with a length of 25 bases. Because of their short length, multiple probes are used to improve specificity. Affymetrix arrays typically use between 11 and 20 probe pairs, referred to as a probeset, for each gene. One component of these pairs is referred to as a perfect match probe (PM) and is designed to hybridize only with transcripts from the intended gene (specific hybridization). However, hybridization to the PM probes by other mRNA species (non-specific hybridization) is unavoidable. Therefore, the observed intensities need to be adjusted to be accurately quantified. The other component of a probe pair, the mismatch probe (MM), is constructed with the intention of measuring only the nonspecific component of the corresponding PM probe. Affymetrix’s strategy is to make MM probes identical to their PM counterpart except that the 13th base is exchanged with its complement. The identification of genes that are differentially expressed in two populations is a popular application of Affymetrix GeneChip technology. Due to the cost of this technology, experiments using a small number of arrays are common. A situation we often see is the case where three arrays are used for each population. In this lab, we give an example of how to quickly create lists of genes that are interesting in the sense that they appear to be differentially expressed, starting from the raw probe level data (CEL files). In Section 2, we briefly describe the functions necessary to import the data into Bioconductor. In Section 3, we talk about preprocessing. In Section 4, we describe ways to rank genes and decide on a cutoff. Finally, in Section 5, we describe how to make annotated reports and examine the PubMed literature related to the genes in our list.


Book Chapter•10.1007/0-387-29362-0_21•

Bioconductor Software for Graphs


Vincent J. Carey, Robert Gentleman, Wolfgang Huber, Jeff Gentry

1 Jan 2005

TL;DR: This chapter describes software tools for creating, manipulating, and visualizing graphs in the Bioconductor project and gives the rationale for the design decisions and brief outlines of how to make use of these tools.


Abstract: We describe software tools for creating, manipulating, and visualizing graphs in the Bioconductor project. We give the rationale for our design decisions and provide brief outlines of how to make use of these tools. The discussion mirrors that of Chapter 20 where the different mathematical constructs were described. It is worth differentiating between packages that are mainly infrastructure (sets of tools that can be used to create other pieces of software) and packages that are designed to provide an end-user application. The packages graph, RBGL, and Rgraphviz are infrastructure packages. Software developers may use these packages to construct tools aimed at specific applications areas, such as the GOstats package.


Journal Article•10.1186/1471-2105-6-211•

stam – a Bioconductor compliant R package for structured analysis of microarray data


Claudio Lottaz¹, Rainer Spang¹•Institutions (1)

Max Planck Society¹

25 Aug 2005-BMC Bioinformatics

TL;DR: Stam, a computational tool for semi-supervised molecular disease entity detection, automatically discovers molecular heterogeneities in phenotypically defined disease entities and suggests alternative molecular sub-entities of clinical phenotypes using both gene expression data and functional gene annotations.


Abstract: Genome wide microarray studies have the potential to unveil novel disease entities. Clinically homogeneous groups of patients can have diverse gene expression profiles. The definition of novel subclasses based on gene expression is a difficult problem not addressed systematically by currently available software tools. We present a computational tool for semi-supervised molecular disease entity detection. It automatically discovers molecular heterogeneities in phenotypically defined disease entities and suggests alternative molecular sub-entities of clinical phenotypes. This is done using both gene expression data and functional gene annotations. We provide stam, a Bioconductor compliant software package for the statistical programming environment R. We demonstrate that our tool detects gene expression patterns, which are characteristic for only a subset of patients from an established disease entity. We call such expression patterns molecular symptoms. Furthermore, stam finds novel sub-group stratifications of patients according to the absence or presence of molecular symptoms. Our software is easy to install and can be applied to a wide range of datasets. It provides the potential to reveal so far indistinguishable patient sub-groups of clinical relevance.


Book Chapter•10.1007/0-387-29362-0_18•

Browser-based Affymetrix Analysis and Annotation


Colin A. Smith

1 Jan 2005

TL;DR: This chapter will discuss the appropriate circumstances under which webbioc should be deployed and the pros and cons of using it versus the typical command line environment of R.


Abstract: webbioc is a CGI-based interface to Bioconductor methods for preprocessing and analyzing Affymetrix data. It wraps up the functionality of a number of Bioconductor packages into a consistent environment that can be deployed for use by small groups or large departments. Without ever seeing a command prompt, it will take the user from raw data to annotated lists of the most significantly differentially expressed genes. It will optionally make use of a back-end computer cluster for batch processing. This chapter will discuss the appropriate circumstances under which webbioc should be deployed and the pros and cons of using it versus the typical command line environment of R. Installation and configuration will be fully covered. Use of the Web-based interface will be visually demonstrated. Finally, we will describe how to expand the interface by adding additional analysis modules.


Book Chapter•10.1002/047001153X.G409207•

Bioconductor: software and development strategies for statistical genomics


Robert Gentleman¹, Vincent J. Carey²•Institutions (2)

Harvard University¹, Brigham and Women's Hospital²

15 Apr 2005

TL;DR: The requirements, language features, and methodology of design and development guiding the evolution of this project are described; its commitments are expected to foster the propagation of standards of transparency and explicit reproducibility from wet-lab science to in silico biology, where explicit reproduction of important published results is often very difficult.


Abstract: Bioconductor is an open source initiative for the creation and dissemination of methods in statistical genomics and computational biology based on R. This article describes the requirements, language features, and methodology of design and development guiding the evolution of this project. Commitments to software interoperability, computable task-oriented documentation, and full transparency of algorithm development and use are found to be valuable in reducing barriers to access faced by statistical, computational, or biological researchers attempting interdisciplinary work. These commitments are expected to foster the propagation of standards of transparency and explicit reproducibility from wet-lab science, where they are well accepted, to in silico biology, where explicit reproduction of important published results is often very difficult. Keywords: computational biology; open source software; object-oriented programming; documentation; network algorithms; software quality assurance; reproducible research


Book Chapter•10.1007/0-387-29362-0_7•

Meta-data Resources and Tools in Bioconductor


Robert Gentleman, Vincent J. Carey, J. Zhang

1 Jan 2005

TL;DR: This section considers some of the different sources of biological information as well as the software tools that can be used to access these data and to integrate them into an analysis.


Abstract: Closing the gap between knowledge of sequence and knowledge of function requires aggressive, integrative use of biological research databases of many different types. For greatest effectiveness, analysis processes and interpretation of analytic results must be guided using relevant knowledge about the systems under investigation. However, this knowledge is often widely scattered and encoded in a variety of formats. In this section, we consider some of the different sources of biological information as well as the software tools that can be used to access these data and to integrate them into an analysis. Bioconductor provides tools for creating, distributing, and accessing annotation resources in ways that have been found effective in workflows for statistical analysis of microarray and other high-throughput assays.


Journal Article•10.1093/BIOINFORMATICS/BTI574•

goCluster integrates statistical analysis and functional interpretation of microarray expression data


Gunnar Wrobel¹, Frédéric Chalmel¹, Michael Primig¹•Institutions (1)

Swiss Institute of Bioinformatics¹

01 Sep 2005-Bioinformatics

TL;DR: The software package provides four clustering algorithms and GeneOntology terms as prototype annotation data. The functional analysis is based on the hypergeometric distribution, and either the Bonferroni correction or the false discovery rate can be used to correct for multiple testing.


Abstract: Motivation: Several tools that facilitate the interpretation of transcriptional profiles using gene annotation data are available but most of them combine a particular statistical analysis strategy with functional information. goCluster extends this concept by providing a modular framework that facilitates integration of statistical and functional microarray data analysis with data interpretation. Results: goCluster enables scientists to employ annotation information, clustering algorithms and visualization tools in their array data analysis and interpretation strategy. The package provides four clustering algorithms and GeneOntology terms as prototype annotation data. The functional analysis is based on the hypergeometric distribution whereby the Bonferroni correction or the false discovery rate can be used to correct for multiple testing. The approach implemented in goCluster was successfully applied to interpret the results of complex mammalian and yeast expression data obtained with high density oligonucleotide microarrays (GeneChips). Availability: goCluster is available via the BioConductor portal at www.bioconductor.org. The software package, detailed documentation, user- and developer guides as well as other background information are also accessible via a web portal at http://www.bioz.unibas.ch/gocluster. Contact: michael.primig@unibas.ch
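The hypergeometric test mentioned in this abstract can be illustrated with a short standard-library sketch. It computes the upper-tail probability of seeing at least k annotated genes in a cluster drawn without replacement from the full gene population, then applies a Bonferroni correction. This is a generic illustration rather than goCluster's implementation; the function names, argument names, and counts are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, cluster_size, annotated, population):
    """P(X >= k) for X = number of annotated genes in the cluster.

    population:   total number of genes on the array
    annotated:    genes in the population carrying the annotation term
    cluster_size: number of genes in the cluster being tested
    """
    total = comb(population, cluster_size)
    upper = min(cluster_size, annotated)
    # math.comb returns 0 when the second argument exceeds the first,
    # so impossible configurations contribute nothing to the sum.
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, cluster_size - i)
        for i in range(k, upper + 1)
    )
    return favourable / total


def bonferroni(p_value, n_tests):
    """Bonferroni-corrected p-value for n_tests annotation terms."""
    return min(1.0, p_value * n_tests)


if __name__ == "__main__":
    # Hypothetical counts: 10 genes in total, 3 carry the term,
    # and a cluster of 2 genes contains at least 1 of them.
    p = hypergeom_upper_tail(1, 2, 3, 10)
    print(p, bonferroni(p, 5))
```

Bonferroni is the more conservative of the two corrections named in the abstract; a false discovery rate procedure would be applied to the whole vector of per-term p-values instead.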


Journal Article•10.1093/NAR/GKI497•

MIDAW: a web tool for statistical analysis of microarray data


Chiara Romualdi¹, Nicola Vitulo¹, Micky Del Favero¹, Gerolamo Lanfranchi¹•Institutions (1)

University of Padua¹

01 Jul 2005-Nucleic Acids Research

TL;DR: MIDAW (microarray data analysis web tool) is a web interface integrating a series of statistical algorithms that can be used for processing and interpretation of microarray data.


Abstract: MIDAW (microarray data analysis web tool) is a web interface integrating a series of statistical algorithms that can be used for processing and interpretation of microarray data. MIDAW consists of two main sections: data normalization and data analysis. In the normalization phase the simultaneous processing of several experiments with background correction, global and local mean and variance normalization are carried out. The data analysis section allows graphical display of expression data for descriptive purposes, estimation of missing values, reduction of data dimension, discriminant analysis and identification of marker genes. The statistical results are organized in dynamic web pages and tables, where the transcript/gene probes contained in a specific microarray platform can be linked (according to user choice) to external databases (GenBank, Entrez Gene, UniGene). Tutorial files help the user throughout the statistical analysis to ensure that the forms are filled out correctly. MIDAW has been developed using Perl and PHP and it uses R/Bioconductor languages and routines. MIDAW is GPL licensed and freely accessible at http://muscle.cribi.unipd.it/midaw/. Perl and PHP source codes are available from the authors upon request.


Journal Article•10.1093/BIOINFORMATICS/BTI108•

Identifying differentially expressed genes from microarray experiments via statistic synthesis


Yee Hwa Yang¹, Yuanyuan Xiao¹, Mark R. Segal¹•Institutions (1)

University of California, San Francisco¹

01 Apr 2005-Bioinformatics

TL;DR: The authors propose a distance synthesis scheme that integrates differing statistics to rank and select differentially expressed genes. On a set of Affymetrix spike-in datasets, in which the differentially expressed genes are known, the method compares favorably with the best individual statistics while achieving robustness properties that the individual statistics lack.


Abstract: Motivation: A common objective of microarray experiments is the detection of differential gene expression between samples obtained under different conditions. The task of identifying differentially expressed genes consists of two aspects: ranking and selection. Numerous statistics have been proposed to rank genes in order of evidence for differential expression. However, no one statistic is universally optimal and there is seldom any basis or guidance that can direct toward a particular statistic of choice. Results: Our new approach, which addresses both ranking and selection of differentially expressed genes, integrates differing statistics via a distance synthesis scheme. Using a set of (Affymetrix) spike-in datasets, in which differentially expressed genes are known, we demonstrate that our method compares favorably with the best individual statistics, while achieving robustness properties lacked by the individual statistics. We further evaluate performance on one other microarray study. Availability: The approach is implemented in an R package called DEDS, which is available for download from the Bioconductor website (http://www.bioconductor.org/). Contact: mark@biostat.ucsf.edu

Journal Article•10.1093/BIOINFORMATICS/BTI292•

Molecular decomposition of complex clinical phenotypes using biologically structured analysis of microarray data

Claudio Lottaz¹, Rainer Spang¹•Institutions (1)

Max Planck Society¹

01 May 2005-Bioinformatics

TL;DR: This paper describes a novel algorithm called Structured Analysis of Microarrays (StAM), which accounts for molecular heterogeneity of complex clinical phenotypes and, in addition to the expression data, exploits functional annotations from the Gene Ontology database to build biologically focussed classifiers.

Abstract: Motivation: Today, the characterization of clinical phenotypes by gene-expression patterns is widely used in clinical research. If the investigated phenotype is complex from the molecular point of view, new challenges arise and these have not been addressed systematically. For instance, the same clinical phenotype can be caused by various molecular disorders, such that one observes different characteristic expression patterns in different patients. Results: In this paper we describe a novel algorithm called Structured Analysis of Microarrays (StAM), which accounts for molecular heterogeneity of complex clinical phenotypes. Our algorithm goes beyond established methodology in several aspects: in addition to the expression data, it exploits functional annotations from the Gene Ontology database to build biologically focussed classifiers. These are used to uncover potential molecular disease subentities and associate them with biological processes without compromising overall prediction accuracy. Availability: Bioconductor compliant R package. Contact: Claudio.Lottaz@molgen.mpg.de Supplementary information: Complete analyses are available at http://compdiag.molgen.mpg.de/supplements/lottaz05

Book Chapter•10.1007/0-387-29362-0_2•

Preprocessing High-density Oligonucleotide Arrays

Benjamin M. Bolstad, Rafael A. Irizarry, Laurent Gautier, Z. Wu

1 Jan 2005

TL;DR: This chapter begins by describing how to import probe-level data into the system and how these data can be examined using the facilities of the AffyBatch class, and describes background adjustment, normalization, and summarization methods.

Abstract: High-density oligonucleotide expression arrays are a widely used microarray platform. Affymetrix GeneChip arrays dominate this market. An important distinction between the GeneChip and other technologies is that on GeneChips, multiple short probes are used to measure gene expression levels. This makes preprocessing particularly important when using this platform. This chapter begins by describing how to import probe-level data into the system and how these data can be examined using the facilities of the AffyBatch class. Then we will describe background adjustment, normalization, and summarization methods. Functionality for GeneChip probe-level data is provided by the affy, affyPLM, affycomp, gcrma, and affypdnn packages. All these tools are useful for preprocessing probe-level data stored in an AffyBatch object into expression-level data stored in an exprSet object. Because there are many competing methods for this preprocessing step, it is useful to have a way to assess the differences. In Bioconductor, this can be carried out using the affycomp package, which we discuss briefly.
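
As an illustration of the normalization step named above, the following Python sketch implements quantile normalization, one widely used way of making intensity distributions comparable across arrays. It is an assumption that this particular method is a useful example here; the abstract does not say which normalization methods the chapter covers in detail, and the function `quantile_normalize` is a hypothetical helper, not part of the affy package.

```python
def quantile_normalize(arrays):
    """Give every array the same empirical intensity distribution.

    arrays: list of equal-length lists of intensities, one list per array.
    Returns a new list of lists in the original probe order.
    """
    n_probes = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of the k-th smallest value across arrays.
    reference = [
        sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n_probes)
    ]
    normalized = []
    for a in arrays:
        # Probe indices ordered by intensity (ties broken by position).
        by_intensity = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, probe in enumerate(by_intensity):
            # Replace each value by the reference value of the same rank.
            out[probe] = reference[rank]
        normalized.append(out)
    return normalized

# Hypothetical example: two arrays with three probes each.
result = quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]])
```

After the call, both arrays contain the same set of values (the rank-wise means), arranged according to each array's own probe ranking.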

Journal Article•10.1093/BIOINFORMATICS/BTI605•

Simpleaffy: a BioConductor package for Affymetrix Quality Control and data analysis

Claire Wilson, Crispin J. Miller

15 Sep 2005-Bioinformatics

TL;DR: Simpleaffy is a BioConductor package that provides access to a variety of QC metrics for assessing the quality of RNA samples and of the intermediate stages of sample preparation and hybridization.

Abstract: Summary: Quality Control is a fundamental aspect of successful microarray data analysis. Simpleaffy is a BioConductor package that provides access to a variety of QC metrics for assessing the quality of RNA samples and of the intermediate stages of sample preparation and hybridization. Simpleaffy also offers fast implementations of popular algorithms for generating expression summaries and detection calls. Availability: Simpleaffy can be downloaded from http://www.bioconductor.org Contact: cmiller@picr.man.ac.uk Supplementary information: Additional information can be found on the supplementary website located at http://bioinformatics.picr.man.ac.uk

Journal Article•10.1093/BIOINFORMATICS/BTI436•

twilight; a Bioconductor package for estimating the local false discovery rate

Stefanie Scheid¹, Rainer Spang¹•Institutions (1)

Max Planck Society¹

15 Jun 2005-Bioinformatics

TL;DR: twilight is a Bioconductor compatible package for analysing the statistical significance of differentially expressed genes, based on the concept of the local false discovery rate (FDR), a generalization of the frequently used global FDR.

Abstract: Summary: twilight is a Bioconductor compatible package for analysing the statistical significance of differentially expressed genes. It is based on the concept of the local false discovery rate (FDR), a generalization of the frequently used global FDR. twilight implements the heuristic search algorithm for estimating the local FDR introduced in our earlier work. In addition to the raw significance measures, it produces diagnostic plots, which provide insight into the extent of differential expression across genes. Availability: http://www.bioconductor.org Contact: stefanie.scheid@molgen.mpg.de Supplementary information: Please visit our software webpage on http://compdiag.molgen.mpg.de/software
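
The local FDR concept can be made concrete with a small Python sketch. It treats the p-values as a mixture of a uniform null component and an alternative component, and estimates the local FDR at p as pi0 / f(p), where f is the mixture density. This is not twilight's heuristic search algorithm: the histogram density estimate, the tail-based estimate of pi0, and the names `local_fdr`, `n_bins`, and `lam` are all assumptions chosen to keep the example short.

```python
def local_fdr(p_values, n_bins=10, lam=0.5):
    """Histogram-based estimate of the local false discovery rate.

    Returns (pi0, fdr), where pi0 is the estimated proportion of true
    nulls and fdr[i] is the estimated local FDR of p_values[i].
    """
    n = len(p_values)
    # Assumption: p-values above lam are mostly nulls, which are uniform,
    # so their frequency estimates the null proportion pi0.
    pi0 = min(1.0, sum(p > lam for p in p_values) / (n * (1.0 - lam)))

    def bin_of(p):
        return min(int(p * n_bins), n_bins - 1)

    counts = [0] * n_bins
    for p in p_values:
        counts[bin_of(p)] += 1
    # Histogram estimate of the mixture density f(p); the null density is 1.
    density = [c * n_bins / n for c in counts]
    fdr = [min(1.0, pi0 / density[bin_of(p)]) for p in p_values]
    return pi0, fdr

# Hypothetical example: ten small p-values plus ten evenly spread ones.
p_values = [0.01] * 10 + [0.05 + 0.1 * k for k in range(10)]
pi0, fdr = local_fdr(p_values)
```

Unlike a global FDR, which describes a whole list of selected genes, the value returned for each gene reflects how crowded its own region of the p-value scale is relative to what the null alone would produce.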

Journal Article•10.1186/1471-2105-6-306•

WebArray: an online platform for microarray data analysis.

Xiao-Qin Xia, Michael McClelland, Yipeng Wang

21 Dec 2005-BMC Bioinformatics

TL;DR: WebArray is an online microarray data analysis platform that lets bench biologists use Bioconductor tools such as limma and affy to explore data from single/dual color microarray experiments, and provides a user-friendly interface for accessing a wide range of key functions of limma and others.

Abstract: Background: Many cutting-edge microarray analysis tools and algorithms, including the commonly used limma and affy packages in Bioconductor, need sophisticated knowledge of mathematics, statistics and computer skills for implementation. Commercially available software can provide a user-friendly interface at considerable cost. To facilitate the use of these tools for microarray data analysis on an open platform we developed an online microarray data analysis platform, WebArray, for bench biologists to utilize these tools to explore data from single/dual color microarray experiments.
