TL;DR: Efficient Common Array Dye Swap (eCADS) is proposed as a method for normalizing two-channel microarrays that accounts for both experimental design and intensity-dependent biases and preserves differential expression relationships.
Abstract: In normalizing two-channel expression arrays, the ANOVA approach explicitly incorporates the experimental design in its model, and the MA plot-based approach accounts for intensity-dependent biases. However, both approaches can lead to inaccurate normalization in fairly common scenarios. We propose a method called efficient Common Array Dye Swap (eCADS) for normalizing two-channel microarrays that accounts for both experimental design and intensity-dependent biases. Under reasonable experimental designs, eCADS preserves differential expression relationships and requires only a single array per sample pair.
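The quantities behind the MA plot and the dye-swap idea mentioned above can be illustrated with a minimal sketch. This is not the eCADS procedure itself, which the abstract does not specify; it only shows the standard definitions M = log2(R/G) and A = 0.5 * log2(R * G), and how averaging a forward array with its dye-swapped replicate cancels an additive dye bias. The function names `ma_values` and `dye_swap_average` are hypothetical.

```python
import math


def ma_values(red, green):
    """Return (M, A) lists for paired red/green spot intensities.

    M is the log-ratio and A is the mean log-intensity of each spot.
    """
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return m, a


def dye_swap_average(m_forward, m_swapped):
    """Combine a forward array with its dye-swapped replicate.

    In the swapped array the sample labels are reversed, so the true
    log-ratio changes sign while an additive dye bias does not.
    Half the difference therefore cancels the bias (assumed additive
    on the log scale and equal in both arrays).
    """
    return [(f - s) / 2 for f, s in zip(m_forward, m_swapped)]


# Toy spots: the first has four times more red than green signal.
m, a = ma_values([400.0, 100.0], [100.0, 100.0])

# A true log-ratio of 1.0 observed with a dye bias of 0.3 in both arrays.
corrected = dye_swap_average([1.0 + 0.3], [-1.0 + 0.3])
```

The cancellation holds only under the stated assumption that the dye bias is the same in both arrays; intensity-dependent bias is the reason MA plot-based corrections are applied in addition.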
TL;DR: PreP+07 is a standalone Windows application for microarray pre-processing whose reliability has been checked against Bioconductor, so that a laboratory researcher can carry out statistical pre-processing of his/her microarray results and obtain a list of differentially expressed genes without any programming skills.
Abstract: Microarray gene expression analysis is now a widely used technology that scientists handle routinely, but whose final interpretation usually requires the participation of a specialist. This is because the analysis requires a background in statistics that most users lack or grasp only vaguely. Moreover, programming skills may also be essential to analyse these data. An interactive, easy-to-use application therefore seems necessary to help researchers extract full information from their data and analyse them in a simple, powerful and confident way. PreP+07 is a standalone Windows XP application that presents a friendly interface for spot filtration, inter- and intra-slide normalization, duplicate resolution, dye-swapping, error removal and statistical analyses. Additionally, it contains two unique implementations of procedures (double scan and Supervised Lowess) and a complete set of graphical representations (MA plot, RG plot, QQ plot, PP plot, PN plot), and can deal with many data formats, such as tabulated text, GenePix GPR and ArrayPRO. PreP+07 performance has been compared with the equivalent functions in Bioconductor using a tomato chip with 13056 spots. The numbers of differentially expressed genes, based on p-values from PreP+07 and from the Bioconductor Limma package, were statistically identical when the data set was only normalized; however, slight variability was observed when the data were both normalized and scaled. The PreP+07 implementation provides a high degree of freedom in selecting and organizing a small set of widely used data processing protocols, and can handle many data formats. Its reliability has been proven so that a laboratory researcher can carry out statistical pre-processing of his/her microarray results and obtain a list of differentially expressed genes using PreP+07 without any programming skills.
All of this gives support to scientists that have been using previous PreP releases since its first version in 2003.
TL;DR: This study determined that methods based on invariant sets are better able to resolve the problem of asymmetry, and KDL and KDQ in combination with GCRMA provided the best performance among all approaches.
Abstract: Normalization of gene expression data has been studied for many years and various strategies have been formulated to deal with various types of data. Most normalization algorithms rely on the assumption that the number of up-regulated genes and the number of down-regulated genes are roughly the same. However, the well-known Golden Spike experiment presents a unique situation in which differentially regulated genes are biased toward one direction, thereby challenging the conclusions of previous benchmark studies. This study proposes two novel approaches, KDL and KDQ, based on kernel density estimation to improve upon the basic idea of invariant set selection. The key concept is to assign different importance scores to data points on the MA plot according to their proximity to the cluster of null genes, under the assumption that null genes are more densely distributed than those that are differentially regulated. The approaches are compared on the Golden Spike experiment as well as on simulated data, using ROC curves and compression rates. KDL and KDQ in combination with GCRMA provided the best performance among all approaches. This study determined that methods based on invariant sets are better able to resolve the problem of asymmetry. Normalization, either before or after expression summary for probesets, improves performance to a similar degree.
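The idea of scoring MA-plot points by how densely their neighbours are packed can be sketched as follows. This is not the authors' KDL or KDQ algorithm, whose details the abstract does not give; it is a simplified one-dimensional illustration that weights each M value by a Gaussian kernel density estimate and uses the weighted mean as a centering value. The function names and the `bandwidth` default are assumptions made for the example.

```python
import math


def gaussian_kde_weights(m_values, bandwidth=0.5):
    """Return a kernel density estimate evaluated at each M value.

    Points in the dense cluster (assumed to be null genes) receive
    larger weights than isolated, differentially regulated points.
    """
    n = len(m_values)
    norm = n * bandwidth * math.sqrt(2 * math.pi)
    weights = []
    for x in m_values:
        total = sum(
            math.exp(-0.5 * ((x - y) / bandwidth) ** 2) for y in m_values
        )
        weights.append(total / norm)
    return weights


def weighted_center(m_values, weights):
    """Density-weighted mean of the M values."""
    return sum(w * m for w, m in zip(weights, m_values)) / sum(weights)


# Toy data: a null cluster near 0.2 plus two up-regulated genes,
# mimicking the one-sided asymmetry of the Golden Spike data.
m = [0.1, 0.2, 0.2, 0.3, 3.0, 3.5]
w = gaussian_kde_weights(m)
center = weighted_center(m, w)
plain_mean = sum(m) / len(m)
```

On this toy input the density-weighted center lies closer to the null cluster than the unweighted mean does, which is the behaviour that makes density weighting attractive when regulation is one-sided.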
TL;DR: Without RMA background correction, median and quantile normalization showed similar normalization effects and yielded similar final CNV calls in terms of number and size; with either method, RMA background correction widened the intensity ratio distribution, which may suggest that it helps to detect more CNVs compared to no correction.
Abstract: Precise and reliable identification of CNV is still important to fully understand the effect of CNV on genetic diversity and the background of complex diseases. SNP markers have been used frequently to detect CNVs, but the analysis of SNP chip data for identifying CNV has not been well established. We compared various normalization methods for CNV analysis and suggest an optimal normalization procedure for reliable CNV calls. Four normal Koreans and NA10851 HapMap male samples were genotyped using Affymetrix Genome-Wide Human SNP array 5.0. We evaluated the effect of median and quantile normalization to find the optimal normalization for CNV detection based on SNP array data. We also explored the effect of Robust Multichip Average (RMA) background correction for each normalization process. In total, the following 4 combinations of normalization were tried: 1) Median normalization without RMA background correction, 2) Quantile normalization without RMA background correction, 3) Median normalization with RMA background correction, and 4) Quantile normalization with RMA background correction. CNVs were called using the SW-ARRAY algorithm. We applied the 4 different combinations of normalization and compared their effects using intensity ratio profiles, box plots, and MA plots. When we applied median and quantile normalizations without RMA background correction, both methods showed similar normalization effects and the final CNV calls were also similar in terms of number and size. In both median and quantile normalizations, RMA background correction resulted in widening the range of the intensity ratio distribution, which may suggest that RMA background correction may help to detect more CNVs compared to no correction.
Keywords: copy number variation, normalization, SNP, Robust Multiarray Average
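The two normalization methods compared in this abstract can be sketched in a few lines. This is a generic illustration, not the pipeline used in the study: it assumes each sample is a list of log-scale intensities of equal length, ignores ties in quantile normalization, and omits RMA background correction entirely. The function names are hypothetical.

```python
from statistics import median


def median_normalize(samples):
    """Shift each sample so that all samples share the same median.

    The common target is taken to be the mean of the sample medians
    (an assumption; other targets are possible).
    """
    medians = [median(s) for s in samples]
    target = sum(medians) / len(medians)
    return [[x - med + target for x in s] for s, med in zip(samples, medians)]


def quantile_normalize(samples):
    """Force all equal-length samples onto one common distribution.

    Each value is replaced by the mean, across samples, of the values
    that hold the same rank. Ties are broken by position, not averaged.
    """
    n = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [
        sum(s[i] for s in sorted_samples) / len(samples) for i in range(n)
    ]
    result = []
    for s in samples:
        order = sorted(range(n), key=lambda i: s[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        result.append(out)
    return result


# Two toy samples with three probes each.
samples = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
by_median = median_normalize(samples)
by_quantile = quantile_normalize(samples)
```

Median normalization only aligns the centers of the distributions, whereas quantile normalization makes the full distributions identical, so the two can differ more when the samples' spreads differ.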
TL;DR: In this article, the authors discuss the concern that intensity-dependent normalization can give biased estimates of differential expression and therefore can misclassify some moderately important genes as unexpressed.
Abstract: We discuss the concern that intensity-dependent normalization can give biased estimates of differential expression and, therefore, can misclassify some moderately important genes as unexpressed.