TL;DR: It is demonstrated that the mean expression value outperforms the current normalization strategy in terms of better reduction of technical variation and more accurate appreciation of biological changes.
Abstract: Gene expression analysis of microRNA molecules is becoming increasingly important. In this study we assess the use of the mean expression value of all expressed microRNAs in a given sample as a normalization factor for microRNA real-time quantitative PCR data and compare its performance to the currently adopted approach. We demonstrate that the mean expression value outperforms the current normalization strategy in terms of better reduction of technical variation and more accurate appreciation of biological changes.
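The mean-expression normalization described above can be sketched as follows. This is a minimal illustration, not the study's implementation: it assumes Cq values on a log2 scale, the detection cutoff `cq_cutoff=35.0` is a hypothetical choice for deciding which microRNAs count as expressed, and the function name and the miRNA labels are made up for the example.

```python
from statistics import mean

def mean_expression_normalize(cq_by_mirna, cq_cutoff=35.0):
    """Subtract the sample's mean Cq over expressed miRNAs from each Cq.

    cq_by_mirna: dict mapping miRNA name -> Cq value for ONE sample.
    cq_cutoff:   hypothetical threshold (an assumption, not from the study);
                 Cq values at or above it are treated as "not expressed".
    """
    expressed = {name: cq for name, cq in cq_by_mirna.items() if cq < cq_cutoff}
    if not expressed:
        raise ValueError("no expressed miRNAs in this sample")
    sample_mean = mean(expressed.values())
    # Delta-Cq relative to the sample mean; lower values mean higher expression.
    return {name: cq - sample_mean for name, cq in expressed.items()}

# Hypothetical sample: miR-D falls above the cutoff and is dropped.
sample = {"miR-A": 22.0, "miR-B": 26.0, "miR-C": 30.0, "miR-D": 38.0}
normalized = mean_expression_normalize(sample)
```

Because the normalization factor is computed per sample from all expressed microRNAs, no reference gene has to be chosen in advance.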
TL;DR: The concept of normalization in transcript quantification is introduced here in an attempt to convince molecular biologists, and non-specialists, that systematic validation of reference genes is essential for producing accurate, reliable data in qRT-PCR analyses, and thus should be an integral component of them.
Abstract: Quantitative RT-PCR (reverse transcription polymerase chain reaction, also known as qRT-PCR or real-time RT-PCR) has been used in large proportions of transcriptome analyses published to date. The accuracy of the results obtained by this method strongly depends on accurate transcript normalization using stably expressed genes, known as references. Statistical algorithms have been developed recently to help validate reference genes but, surprisingly, this robust approach is under-utilized in plants. Instead, putative 'housekeeping' genes tend to be used as references without any proper validation. The concept of normalization in transcript quantification is introduced here and the factors affecting its reliability in qRT-PCR are discussed in an attempt to convince molecular biologists, and non-specialists, that systematic validation of reference genes is essential for producing accurate, reliable data in qRT-PCR analyses, and thus should be an integral component of them.
TL;DR: This work fills in a long open gap in the characterization of the minimax rate for the multi-armed bandit problem and proposes a new family of randomized algorithms based on an implicit normalization, as well as a new analysis.
Abstract: We fill in a long open gap in the characterization of the minimax rate for the multi-armed bandit problem. Concretely, we remove an extraneous logarithmic factor in the previously known upper bound and propose a new family of randomized algorithms based on an implicit normalization, as well as a new analysis. We also consider the stochastic case, and prove that an appropriate modification of the upper confidence bound policy UCB1 (Auer et al., 2002) achieves the distribution-free optimal rate while still having a distribution-dependent rate logarithmic in the number of plays.
TL;DR: An organized study of 16 external validation measures for K-means clustering by introducing the importance of measure normalization in the evaluation of the clustering performance on data with imbalanced class distributions and revealing the interrelationships among these external measures.
Abstract: Clustering validation is a long-standing challenge in the clustering literature. While many validation measures have been developed for evaluating the performance of clustering algorithms, these measures often provide inconsistent information about the clustering performance, and the most suitable measures to use in practice remain unknown. This paper fills this void by giving an organized study of 16 external validation measures for K-means clustering. Specifically, we first introduce the importance of measure normalization in the evaluation of the clustering performance on data with imbalanced class distributions. We also provide normalization solutions for several measures. In addition, we summarize the major properties of these external measures. These properties can serve as guidance for the selection of validation measures in different application scenarios. Finally, we reveal the interrelationships among these external measures. By mathematical transformation, we show that some validation measures are equivalent. Also, some measures have consistent validation performances. Most importantly, we provide a guideline for selecting the most suitable validation measures for K-means clustering.
TL;DR: A method to select nonchanging miRNAs (invariants) and use them to compute linear regression normalization coefficients or variance stabilizing normalization (VSN) parameters is developed and can be applied to other data sets including those from one color miRNA microarray platforms, focused gene expression arrays, and gene expression analysis using quantitative PCR.
Abstract: Profiling miRNA levels in cells with miRNA microarrays is becoming a widely used technique. Although normalization methods for mRNA gene expression arrays are well established, miRNA array normalization has so far not been investigated in detail. In this study we investigate the impact of normalization on data generated with the Agilent miRNA array platform. We have developed a method to select nonchanging miRNAs (invariants) and use them to compute linear regression normalization coefficients or variance stabilizing normalization (VSN) parameters. We compared the invariants normalization to normalization by scaling, quantile, and VSN with default parameters as well as to no normalization using samples with strong differential expression of miRNAs (heart–brain comparison) and samples where only a few miRNAs are affected (by p53 overexpression in squamous carcinoma cells versus control). All normalization methods performed better than no normalization. Normalization procedures based on the set of invariants and quantile were the most robust over all experimental conditions tested. Our method of invariant selection and normalization is not limited to Agilent miRNA arrays and can be applied to other data sets including those from one color miRNA microarray platforms, focused gene expression arrays, and gene expression analysis using quantitative PCR.
TL;DR: The debate about which similarity measure one should use for the normalization in the case of Author Co-citation Analysis (ACA) is further complicated when one distinguishes between the symmetrical co-citation--or, more generally, co-occurrence--matrix and the underlying asymmetrical citation--occurrence--matrix.
Abstract: The debate about which similarity measure one should use for the normalization in the case of Author Co-citation Analysis (ACA) is further complicated when one distinguishes between the symmetrical co-citation--or, more generally, co-occurrence--matrix and the underlying asymmetrical citation--occurrence--matrix. In the Web environment, the approach of retrieving original citation data is often not feasible. In that case, one should use the Jaccard index, but preferentially after adding the number of total citations (occurrences) on the main diagonal. Unlike Salton's cosine and the Pearson correlation, the Jaccard index abstracts from the shape of the distributions and focuses only on the intersection and the sum of the two sets. Since the correlations in the co-occurrence matrix may partially be spurious, this property of the Jaccard index can be considered as an advantage in this case.
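A minimal sketch of the Jaccard normalization discussed above, assuming a symmetric co-occurrence matrix whose main diagonal already holds each author's total number of citations (occurrences). The function name and the toy matrix are illustrative only and do not come from the paper.

```python
def jaccard_from_cooccurrence(matrix):
    """Jaccard index for every pair (i, j) of a co-occurrence matrix.

    matrix[i][j] is the co-occurrence count of i and j (the intersection);
    matrix[i][i] is assumed to hold the total occurrences of i.
    J(i, j) = c_ij / (c_ii + c_jj - c_ij)
    """
    n = len(matrix)
    result = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            union = matrix[i][i] + matrix[j][j] - matrix[i][j]
            result[i][j] = matrix[i][j] / union if union else 0.0
    return result

# Toy data: three authors with 10, 8 and 5 total citations on the diagonal.
cooc = [
    [10, 4, 0],
    [4, 8, 2],
    [0, 2, 5],
]
jac = jaccard_from_cooccurrence(cooc)
```

Only the intersection and the two totals enter the formula, which is the property the abstract contrasts with the cosine and the Pearson correlation.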
TL;DR: In this paper, the authors present constraints on $\sigma_8$ and the total matter density from local cluster counts as a function of X-ray temperature, taking care to incorporate and minimize systematic errors that plagued previous work with this method.
Abstract: The number density of galaxy clusters provides tight statistical constraints on the matter fluctuation power spectrum normalization, traditionally phrased in terms of $\sigma_8$, the root-mean-square mass fluctuation in spheres with radius $8\,h^{-1}$ Mpc. We present constraints on $\sigma_8$ and the total matter density $\Omega_{m0}$ from local cluster counts as a function of X-ray temperature, taking care to incorporate and minimize systematic errors that plagued previous work with this method. In particular, we present new determinations of the cluster luminosity-temperature and mass-temperature relations, including their intrinsic scatter, and a determination of the Jenkins mass function parameters for the same mass definition as the mass-temperature calibration. Marginalizing over the 12 uninteresting parameters associated with this method, we find that the local cluster temperature function implies $\sigma_8(\Omega_{m0}/0.32)^{\alpha} = 0.86 \pm 0.04$ with $\alpha = 0.30$ and $0.41$ for $\Omega_{m0} \le 0.32$ and $\Omega_{m0} \ge 0.32$, respectively (68% confidence for two parameters). This result agrees with a wide range of recent independent determinations, and we find no evidence of any additional sources of systematic error for the X-ray cluster temperature function determination of the matter power spectrum normalization. The joint WMAP5 + cluster constraints are $\Omega_{m0} = 0.30^{+0.03}_{-0.02}$ and $\sigma_8 = 0.85^{+0.04}_{-0.02}$ (68% confidence for two parameters).
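To make the quoted scaling relation concrete, the sketch below solves it for the central value of σ8 at a given Ωm0, using the exponent 0.30 at or below the pivot 0.32 and 0.41 above it. It ignores the ±0.04 uncertainty and the parameter covariance entirely, and the function name is an assumption made for illustration.

```python
def sigma8_from_omega_m(omega_m0, amplitude=0.86, pivot=0.32):
    """Central sigma_8 implied by sigma_8 * (omega_m0 / pivot)**alpha = amplitude.

    alpha is 0.30 for omega_m0 <= pivot and 0.41 above it, as quoted in the
    abstract. Uncertainties are deliberately not propagated here.
    """
    if omega_m0 <= 0:
        raise ValueError("omega_m0 must be positive")
    alpha = 0.30 if omega_m0 <= pivot else 0.41
    return amplitude * (omega_m0 / pivot) ** (-alpha)

# Lower matter density implies a higher sigma_8, and vice versa.
low_density = sigma8_from_omega_m(0.25)
at_pivot = sigma8_from_omega_m(0.32)
high_density = sigma8_from_omega_m(0.40)
```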
TL;DR: It is shown that distributions of spatially proximal bandpass filter responses are better described as elliptical than as linearly transformed independent sources, and it is demonstrated that the reduction in dependency achieved by applying RG to either nearby pairs or blocks of bandpass filters is significantly greater than that achieved by ICA.
Abstract: We consider the problem of efficiently encoding a signal by transforming it to a new representation whose components are statistically independent. A widely studied linear solution, known as independent component analysis (ICA), exists for the case when the signal is generated as a linear transformation of independent nongaussian sources. Here, we examine a complementary case, in which the source is nongaussian and elliptically symmetric. In this case, no invertible linear transform suffices to decompose the signal into independent components, but we show that a simple nonlinear transformation, which we call radial gaussianization (RG), is able to remove all dependencies. We then examine this methodology in the context of natural image statistics. We first show that distributions of spatially proximal bandpass filter responses are better described as elliptical than as linearly transformed independent sources. Consistent with this, we demonstrate that the reduction in dependency achieved by applying RG to either nearby pairs or blocks of bandpass filter responses is significantly greater than that achieved by ICA. Finally, we show that the RG transformation may be closely approximated by divisive normalization, which has been used to model the nonlinear response properties of visual neurons.
TL;DR: In this paper, a series of Monte Carlo simulations were performed to compare the Blom, Tukey, Van der Waerden and Rankit approximations in terms of achieving the T score's specified mean and standard deviation and unit normal skewness and kurtosis.
Abstract: The purpose of this article is to provide an empirical comparison of rank-based normalization methods for standardized test scores. A series of Monte Carlo simulations were performed to compare the Blom, Tukey, Van der Waerden and Rankit approximations in terms of achieving the T score’s specified mean and standard deviation and unit normal skewness and kurtosis. All four normalization methods were accurate on the mean but were variably inaccurate on the standard deviation. Overall, deviation from the target moments was pronounced for the even moments but slight for the odd moments. Rankit emerged as the most accurate method across all sample sizes and distributions; thus, it should be the default selection for score normalization in the social and behavioral sciences. However, small samples and skewed distributions degrade the performance of all methods, and practitioners should take these conditions into account when making decisions based on standardized test scores.
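The four approximations compared above differ only in how a rank r out of n is turned into a proportion before the inverse normal transform. The sketch below uses the commonly cited forms (Blom: (r − 3/8)/(n + 1/4); Tukey: (r − 1/3)/(n + 1/3); Van der Waerden: r/(n + 1); Rankit: (r − 1/2)/n) and the conventional T-score scaling of mean 50 and standard deviation 10. Averaging ranks for ties and the function name are assumptions for this example, not details taken from the article.

```python
from statistics import NormalDist

# (a, b) constants such that the proportion is p = (r - a) / (n + b).
OFFSETS = {
    "blom": (3 / 8, 1 / 4),
    "tukey": (1 / 3, 1 / 3),
    "van_der_waerden": (0.0, 1.0),
    "rankit": (1 / 2, 0.0),
}

def rank_based_t_scores(scores, method="rankit"):
    """Convert raw scores to normalized T scores (target mean 50, SD 10)."""
    a, b = OFFSETS[method]
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    pos = 0
    while pos < n:
        # Find the run of tied scores and give each the average 1-based rank.
        end = pos
        while end + 1 < n and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg_rank
        pos = end + 1
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [50 + 10 * std_normal.inv_cdf((r - a) / (n + b)) for r in ranks]

t_scores = rank_based_t_scores([3, 1, 2], method="rankit")
```

With small samples the resulting T scores cannot reach the target standard deviation exactly, which is the kind of deviation the simulations quantify.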
TL;DR: Microarray-based normalization and statistical analysis (significance testing) methods are applied to analyze quantitative proteomics data generated from the metabolic labeling of a marine bacterium to determine an appropriate statistical method for assessing differential abundance.
TL;DR: Two normalization methods are developed that remove technical between‐sample variation by aligning prominent features (landmarks) in the raw data on a per‐channel basis, thereby facilitating the use of automated analyses on large flow cytometry data sets.
Abstract: Between-sample variation in high throughput flow cytometry data poses a significant challenge for analysis of large scale data sets, such as those derived from multi-center clinical trials. It is often hard to match biologically relevant cell populations across samples due to technical variation in sample acquisition and instrumentation differences. Thus normalization of data is a critical step prior to analysis, particularly in large-scale data sets from clinical trials, where group specific differences may be subtle and patient-to-patient variation common. We have developed two normalization methods that remove technical between-sample variation by aligning prominent features (landmarks) in the raw data on a per-channel basis. These algorithms were tested on two independent flow cytometry data sets by comparing manually gated data, either individually for each sample or using static gating templates, before and after normalization. Our results show a marked improvement in the overlap between manual and static gating when the data are normalized, thereby facilitating the use of automated analyses on large flow cytometry data sets. Such automated analyses are essential for high throughput flow cytometry.
TL;DR: This work identifies the various sources of bias and shows that most of them can be eliminated by an appropriate normalization, and introduces a measure based on ranks of distances that outperforms existing distance-based measures concerning both sensitivity and specificity for directional couplings.
Abstract: To detect directional couplings from time series, various measures based on distances in reconstructed state spaces have been introduced. These measures can, however, be biased by asymmetries in the dynamics' structure, noise color, or noise level, which are ubiquitous in experimental signals. Using theoretical reasoning and results from model systems, we identify the various sources of bias and show that most of them can be eliminated by an appropriate normalization. We furthermore diminish the remaining biases by introducing a measure based on ranks of distances. This rank-based measure outperforms existing distance-based measures concerning both sensitivity and specificity for directional couplings. Therefore, our findings are relevant for a reliable detection of directional couplings from experimental signals.
TL;DR: In this paper, a least square algorithm with appropriate normalization is used for solving the over-determined system of equations with noise-polluted data, and proper selection of measured frequency points improved the accuracy and convergence in finite element model updating.
TL;DR: In this paper, the procedures used to calibrate the 532-nm measurements acquired during the nighttime portions of the CALIPSO orbits are described and compared to validation data acquired by the NASA airborne high-spectral resolution lidar.
Abstract: The Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observation (CALIPSO) mission was launched in April 2006 and has continuously acquired collocated multisensor observations of the spatial and optical properties of clouds and aerosols in the earth’s atmosphere. The primary payload aboard CALIPSO is the Cloud-Aerosol Lidar with Orthogonal Polarization (CALIOP), which makes range-resolved measurements of elastic backscatter at 532 and 1064 nm and linear depolarization ratios at 532 nm. CALIOP measurements are important in reducing uncertainties that currently limit understanding of the global climate system, and it is essential that these measurements be accurately calibrated. This work describes the procedures used to calibrate the 532-nm measurements acquired during the nighttime portions of the CALIPSO orbits. Accurate nighttime calibration of the 532-nm parallel-channel data is fundamental to the success of the CALIOP measurement scheme, because the nighttime calibration is used to infer calibration across the day side of the orbits and all other channels are calibrated relative to the 532-nm parallel channel. The theoretical basis of the molecular normalization technique as applied to space-based lidar measurements is reviewed, and a comprehensive overview of the calibration algorithm implementation is provided. Also included is a description of a data filtering procedure that detects and removes spurious high-energy events that would otherwise introduce large errors into the calibration. Error estimates are derived and comparisons are made to validation data acquired by the NASA airborne high‐spectral resolution lidar. Similar analyses are also presented for the 532-nm perpendicular-channel calibration technique.
TL;DR: GeNo is presented, a highly competitive system for gene name normalization, which obtains an F-measure performance of 86.4% (precision: 87.8%, recall: 85.0%) on the BioCreAtIvE-II test set, thus being on a par with the best system on that task.
Abstract: Motivation: The recognition and normalization of textual mentions of gene and protein names is both particularly important and challenging. Its importance lies in the fact that they constitute the crucial conceptual entities in biomedicine. Their recognition and normalization remains a challenging task because of widespread gene name ambiguities within species, across species, with common English words and with medical sublanguage terms.
Results: We present GeNo, a highly competitive system for gene name normalization, which obtains an F-measure performance of 86.4% (precision: 87.8%, recall: 85.0%) on the BioCreAtIvE-II test set, thus being on a par with the best system on that task. Our system tackles the complex gene normalization problem by employing a carefully crafted suite of symbolic and statistical methods, and by fully relying on publicly available software and data resources, including extensive background knowledge based on semantic profiling. A major goal of our work is to present GeNo's architecture in a lucid and perspicuous way to pave the way to full reproducibility of our results.
Availability: GeNo, including its underlying resources, will be available from www.julielab.de. It is also currently deployed in the Semedico search engine at www.semedico.org.
Contact: joachim.wermter@uni-jena.de
TL;DR: EigenMS is an adaptation of the surrogate variable analysis (SVA) algorithm of Leek and Storey, with the adaptations including a novel approach to preventing overfitting that facilitates the incorporation of EigenMS into an existing proteomics analysis pipeline.
Abstract: Motivation: LC-MS allows for the identification and quantification of proteins from biological samples. As with any high-throughput technology, systematic biases are often observed in LC-MS data, making normalization an important preprocessing step. Normalization models need to be flexible enough to capture biases of arbitrary complexity, while avoiding overfitting that would invalidate downstream statistical inference. Careful normalization of MS peak intensities would enable greater accuracy and precision in quantitative comparisons of protein abundance levels.
Results: We propose an algorithm, called EigenMS, that uses singular value decomposition to capture and remove biases from LC-MS peak intensity measurements. EigenMS is an adaptation of the surrogate variable analysis (SVA) algorithm of Leek and Storey, with the adaptations including (i) the handling of the widespread missing measurements that are typical in LC-MS, and (ii) a novel approach to preventing overfitting that facilitates the incorporation of EigenMS into an existing proteomics analysis pipeline. EigenMS is demonstrated using both large-scale calibration measurements and simulations to perform well relative to existing alternatives.
Availability: The software has been made available in the open source proteomics platform DAnTE (Polpitiya et al., 2008) (http://omics.pnl.gov/software/), as well as in standalone software available at SourceForge (http://sourceforge.net).
Contact: yuliya@stat.tamu.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
TL;DR: In this article, the authors consider the generic problem of performing a global fit to many independent data sets each with a different overall multiplicative normalization uncertainty, and develop a method which is unbiased, based on a self-consistent iterative procedure.
Abstract: We consider the generic problem of performing a global fit to many independent data sets, each with a different overall multiplicative normalization uncertainty. We show that the methods in common use to treat multiplicative uncertainties lead to systematic biases. We develop a method which is unbiased, based on a self-consistent iterative procedure. We demonstrate the use of this method by applying it to the determination of parton distribution functions with the NNPDF methodology, which uses a Monte Carlo method for uncertainty estimation.
TL;DR: The resource allocation simulations indicated that marginal improvements in the precision of a group exposure mean would occur above three RVE repeats for EMG collected on one day, or beyond two RVEs for EMG collected on two or more days.
TL;DR: The local appearance-based face recognition approach is found to be robust against errors introduced by face model fitting and shows a significant improvement in accuracy.
Abstract: We focused this work on handling variation in facial appearance caused by 3D head pose. A pose normalization approach based on fitting active appearance models (AAM) on a given face image was investigated. Profile faces with different rotation angles in depth were warped into shape-free frontal view faces. Face recognition experiments were carried out on the pose normalized facial images with a local appearance-based approach. The experimental results showed a significant improvement in accuracy. The local appearance-based face recognition approach is found to be robust against errors introduced by face model fitting.
TL;DR: The results show that the non-linear normalization method can be used to analyze ChIP-seq data across multiple samples and imply that there may be a dysregulation of cell cycle and gene expression control pathways in the tamoxifen-resistant cells.
Abstract: Motivation: Antibody-based chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is a relatively new method to study the binding patterns of specific protein molecules over the entire genome. ChIP-seq technology allows scientists to obtain more comprehensive results in a shorter time. Here, we present a non-linear normalization algorithm and a mixture modeling method for comparing ChIP-seq data from multiple samples and characterizing genes based on their RNA polymerase II (Pol II) binding patterns.
Results: We apply a two-step non-linear normalization method based on a locally weighted regression (LOESS) approach to compare ChIP-seq data across multiple samples and model the difference using an Exponential-NormalK mixture model. The fitted model is used to identify genes associated with differential binding sites based on local false discovery rate (fdr). These genes are then standardized and hierarchically clustered to characterize their Pol II binding patterns. As a case study, we apply the analysis procedure to compare a normal breast cancer cell line (MCF7) with a tamoxifen-resistant (OHT) cell line. We find enriched regions that are associated with cancer (P < 0.0001). Our findings also imply that there may be a dysregulation of cell cycle and gene expression control pathways in the tamoxifen-resistant cells. These results show that the non-linear normalization method can be used to analyze ChIP-seq data across multiple samples.
Availability: Data are available at http://www.bmi.osu.edu/~khuang/Data/ChIP/RNAPII/
Contact: taslim.2@osu.edu; khuang@bmi.osu.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
TL;DR: It is demonstrated that the physics of antenna arrays and the propagation channel should be taken into account when a normalization is chosen, so that SNR has proper physical meaning and the conclusions are physical and correspond to realistic systems.
Abstract: Various normalizations of the MIMO channel matrix are discussed from a physical perspective. It is demonstrated that the physics of antenna arrays and the propagation channel should be taken into account when a normalization is chosen, so that SNR has proper physical meaning and the conclusions are physical and correspond to realistic systems. The antenna array geometry and the transmission strategy (coherent/non-coherent) limit the choice of normalization and determine how the capacity and other performance metrics scale with the number of antennas, an effect that is more pronounced for densely populated antenna arrays. This is especially important for an asymptotic analysis, when the number of antennas increases to infinity. Limitations of such analysis from the physical perspective are pointed out.
TL;DR: In this paper, an approximate solution of the Schrödinger equation with the Hulthén potential is obtained in D dimensions with an exponential approximation of the centrifugal term, and the expectation values $\langle r^{-2} \rangle$ and $\langle V(r) \rangle$ are also obtained by using the Hellmann–Feynman theorem.
Abstract: An approximate solution of the Schrödinger equation with the Hulthén potential is obtained in D dimensions with an exponential approximation of the centrifugal term. A solution to the corresponding hyperradial equation is given by using the conventional Nikiforov–Uvarov method. The normalization constants for the Hulthén potential are also computed. The expectation values $\langle r^{-2} \rangle$ and $\langle V(r) \rangle$ are also obtained by using the Hellmann–Feynman theorem.
TL;DR: Some standards that need to be met for papers in these areas to be seriously considered are described, and it is asked that prospective authors consider these points carefully before submission of their papers to Bioinformatics.
Abstract: Over the last decade or so, there have been large numbers of methods published on approaches for normalization, variable (gene) selection, classification, and clustering of microarray data. As indicated in the scope document for Bioinformatics, this requires papers describing new methods for these problems to meet a very high standard, showing important improvement in results for real biological data, as well as novelty. In this editorial, we describe some standards that need to be met for papers in these areas to be seriously considered. We ask that prospective authors consider these points carefully before submission of their papers to Bioinformatics. The Role of Simulation. Simulation can be useful in investigating the properties of various methods of data analysis. Yet there are important barriers to credible use of simulation in microarray studies, largely due to what we don’t know about the statistical distribution of measured gene expression levels. First, the distribution across transcripts of true expression values is dependent on the biological state of the tissue or cell, and for a given state this is unknown, even in distributional form, and may further exhibit gene-specific and platform-specific effects. Second, the correlation within biological replicates of true expression is unknown, and is likely unknowable in detail given that it is
TL;DR: Three major principles for the selection of indicator data normalization methods in multi-attribute evaluation are presented in this paper and a new normalization method for negative indicators is proposed.
Abstract: Three major principles for the selection of indicator data normalization methods in multi-attribute evaluation are presented in this paper. Principle 1: The relative gap between the data for the same indicator should remain constant; Principle 2: The relative gap between different indicators should remain variable; and Principle 3: The maximum values after normalization should be equal. According to these three principles, a normalization method for positive indicators is selected from several alternatives, and a new normalization method for negative indicators is proposed. These two methods are well suited to comparison among panel data. The requirements for data normalization methods differ when the evaluation goals differ; ranking-order-based evaluation is insensitive to the choice of data normalization method.
TL;DR: This paper proposes an approach to compute and evaluate view-normalized body part trajectories of pedestrians from monocular video sequences that is fully automatic as it requires neither manual initialization nor camera calibration.
TL;DR: A polynomial-time algorithm is derived for computing the distribution for trees around a given tree, and it is shown how the distribution can be approximated by a Poisson distribution determined by the proportion of leaves that lie in "cherries" of the given tree.
Abstract: The Robinson-Foulds (RF) distance is by far the most widely used measure of dissimilarity between trees. Although the distribution of these distances has been investigated for 20 years, an algorithm that is explicitly polynomial time has yet to be described for computing the distribution for trees around a given tree. In this paper, we derive a polynomial-time algorithm for this distribution. We show how the distribution can be approximated by a Poisson distribution determined by the proportion of leaves that lie in "cherries" of the given tree. We also describe how our results can be used to derive normalization constants that are required in a recently proposed maximum likelihood approach to supertree construction.
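For readers unfamiliar with the RF distance itself, the following sketch computes it for two trees on the same leaf set by counting the non-trivial leaf bipartitions (splits) found in one tree but not the other. It only illustrates the distance, not the paper's polynomial-time algorithm for its distribution. Representing trees as nested tuples of string labels, and all function names, are assumptions made for this example.

```python
def leaves(tree):
    """Return the set of leaf labels under a nested-tuple tree node."""
    if not isinstance(tree, tuple):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))

def splits(tree):
    """Collect the non-trivial bipartitions induced by the tree's edges.

    Each split is stored as the side that does NOT contain a fixed
    reference leaf, so the same split always has the same representation.
    """
    all_leaves = leaves(tree)
    reference = min(all_leaves)
    found = set()

    def visit(node):
        if not isinstance(node, tuple):
            return
        clade = leaves(node)
        side = all_leaves - clade if reference in clade else clade
        # Keep only splits with at least two leaves on each side.
        if 2 <= len(side) <= len(all_leaves) - 2:
            found.add(side)
        for child in node:
            visit(child)

    visit(tree)
    return found

def rf_distance(tree_a, tree_b):
    """Number of splits present in exactly one of the two trees."""
    if leaves(tree_a) != leaves(tree_b):
        raise ValueError("trees must share the same leaf set")
    return len(splits(tree_a) ^ splits(tree_b))

tree_1 = ((("a", "b"), "c"), ("d", "e"))
tree_2 = ((("a", "c"), "b"), ("d", "e"))
distance = rf_distance(tree_1, tree_2)
```

The leaf sets are recomputed at every node, which keeps the sketch short at the cost of quadratic running time.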
TL;DR: The proper power alignment of the predistorter following an adequate choice of the normalization gain shows a significant improvement in the measured adjacent channel power ratio at the linearized amplifier output.
Abstract: In this paper, a study of the power alignment issue in digital predistorters is presented. The proper alignment is achieved by adjusting the normalization gain used to synthesize the predistortion function. The dependencies of the linearity and power efficiency of the linearized amplifier upon the gain normalization factor are investigated, and it is shown that the efficiency of the linearized amplifier is almost unaffected by variation of the normalization gain. Conversely, the linearity performance of the linearized power amplifier is found to be dependent on the gain normalization factor, as a consequence of the average power variation through the predistorter. Indeed, the proper power alignment of the predistorter following an adequate choice of the normalization gain shows a significant improvement in the measured adjacent channel power ratio at the linearized amplifier output.
TL;DR: This work proposes a new parameter estimation technique that does not require computing an intractable normalization factor or sampling from the equilibrium distribution of the model, and demonstrates parameter estimation in Ising models, deep belief networks and an independent component analysis model of natural scenes.
Abstract: Fitting probabilistic models to data is often difficult, due to the general intractability of the partition function and its derivatives. Here we propose a new parameter estimation technique that does not require computing an intractable normalization factor or sampling from the equilibrium distribution of the model. This is achieved by establishing dynamics that would transform the observed data distribution into the model distribution, and then setting as the objective the minimization of the KL divergence between the data distribution and the distribution produced by running the dynamics for an infinitesimal time. Score matching, minimum velocity learning, and certain forms of contrastive divergence are shown to be special cases of this learning technique. We demonstrate parameter estimation in Ising models, deep belief networks and an independent component analysis model of natural scenes. In the Ising model case, current state of the art techniques are outperformed by at least an order of magnitude in learning time, with lower error in recovered coupling parameters.
TL;DR: In this paper, the effect of RNA degradation on the quantification of the relative expression of nine genes (18S, ACTB, ATUB, B2M, GAPDH, HPRT, POLR2L, PSMB6 and RPLP0) was evaluated.
Abstract: Reverse transcription-quantitative polymerase chain reaction (RT-qPCR) is the gold standard technique for mRNA quantification, but appropriate normalization is required to obtain reliable data. Normalization to accurately quantitated RNA has been proposed as the most reliable method for in vivo biopsies. However, this approach does not correct for differences in RNA integrity. In this study, we evaluated the effect of RNA degradation on the quantification of the relative expression of nine genes (18S, ACTB, ATUB, B2M, GAPDH, HPRT, POLR2L, PSMB6 and RPLP0) that cover a wide expression spectrum. Our results show that RNA degradation could introduce up to 100% error in gene expression measurements when RT-qPCR data were normalized to total RNA. To achieve greater resolution of small differences in transcript levels in degraded samples, we improved this normalization method by developing a corrective algorithm that compensates for the loss of RNA integrity. This approach allowed us to achieve higher accuracy, since the average error for quantitative measurements was reduced to 8%. Finally, we applied our normalization strategy to the quantification of EGFR, HER2 and HER3 in 104 rectal cancer biopsies. Taken together, our data show that normalization of gene expression measurements that also takes RNA degradation into account allows much more reliable sample comparison. We developed a new normalization method for RT-qPCR data that compensates for loss of RNA integrity and therefore allows accurate gene expression quantification in human biopsies.
TL;DR: The genome alteration detection analysis approach introduced in previous work is extended to a multiple-sample model in which the copy number component is independent for each sample and uses a sparse Bayesian prior, while the reference hybridization level is not necessarily sparse but is identical across all samples.
Abstract: Motivation: The complexity of a large number of recently discovered copy number polymorphisms is much higher than initially thought, thus making it more difficult to detect them in the presence of significant measurement noise. In this scenario, separate normalization and segmentation is prone to lead to many false detections of changes in copy number. New approaches capable of jointly modeling the copy number and the non-copy number (noise) hybridization effects across multiple samples will potentially lead to more accurate results.
Methods: In this article, the genome alteration detection analysis (GADA) approach introduced in our previous work is extended to a multiple sample model. The copy number component is independent for each sample and uses a sparse Bayesian prior, while the reference hybridization level is not necessarily sparse but identical on all samples. The expectation maximization (EM) algorithm used to fit the model iteratively determines whether the observed hybridization levels are more likely due to a copy number variation or to a shared hybridization bias.
Results: The newly proposed approach is compared with the currently used strategy of separate normalization followed by independent segmentation of each array. Real microarray data obtained from HapMap samples are randomly partitioned to create different reference sets. Using the new approach, copy number and reference intensity estimates are significantly less variable when the reference set changes, and higher consistency is obtained in the copy numbers detected within HapMap family trios. Finally, the running time to fit the model grows linearly in the number of samples and probes.
Availability: http://biron.usc.edu/~piquereg/GADA
Contact: rpique@ieee.org; shahab@chla.usc.edu
Supplementary information: Supplementary data are available at Bioinformatics online.