Top 420 papers published in the topic of Normalization (statistics) in 2011

Showing papers on "Normalization (statistics)" published in 2011

Proceedings Article•

Analysis of i-vector Length Normalization in Speaker Recognition Systems.

Daniel Garcia-Romero¹, Carol Y. Espy-Wilson¹•Institutions (1)

1 Jan 2011

TL;DR: The proposed approach deals with the non-Gaussian behavior of i-vectors by performing a simple length normalization, which allows the use of probabilistic models with Gaussian assumptions that yield equivalent performance to that of more complicated systems based on heavy-tailed assumptions.

Abstract: We present a method to boost the performance of probabilistic generative models that work with i-vector representations. The proposed approach deals with the non-Gaussian behavior of i-vectors by performing a simple length normalization. This non-linear transformation allows the use of probabilistic models with Gaussian assumptions that yield equivalent performance to that of more complicated systems based on heavy-tailed assumptions. Significant performance improvements are demonstrated on the telephone portion of NIST SRE 2010.

1,182 citations
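
The length normalization named in the abstract above can be illustrated with a minimal sketch: each vector is scaled to unit Euclidean length. The function name `length_normalize` and the toy 3-dimensional vector are assumptions for illustration; any preprocessing the authors apply before this step is not described in the abstract and is omitted here.

```python
import math

def length_normalize(vector):
    """Scale a vector to unit Euclidean length (hypothetical helper)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot length-normalize the zero vector")
    return [x / norm for x in vector]

# Toy stand-in for an i-vector; real i-vectors have hundreds of dimensions.
ivector = [3.0, 4.0, 0.0]
unit = length_normalize(ivector)
print(unit)
```

After this step every vector lies on the unit sphere, so only its direction carries information.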

Journal Article•10.1186/1471-2105-12-480•

GC-content normalization for RNA-Seq data.

Davide Risso¹, Katja Schwartz², Gavin Sherlock², Sandrine Dudoit³•Institutions (3)

University of Padua¹, Stanford University², University of California, Berkeley³

17 Dec 2011-BMC Bioinformatics

TL;DR: The authors' within-lane normalization procedures, followed by between-lane normalization, reduce GC-content bias and lead to more accurate estimates of expression fold-changes and tests of differential expression.

Abstract: Transcriptome sequencing (RNA-Seq) has become the assay of choice for high-throughput studies of gene expression. However, as is the case with microarrays, major technology-related artifacts and biases affect the resulting expression measures. Normalization is therefore essential to ensure accurate inference of expression levels and subsequent analyses thereof. We focus on biases related to GC-content and demonstrate the existence of strong sample-specific GC-content effects on RNA-Seq read counts, which can substantially bias differential expression analysis. We propose three simple within-lane gene-level GC-content normalization approaches and assess their performance on two different RNA-Seq datasets, involving different species and experimental designs. Our methods are compared to state-of-the-art normalization procedures in terms of bias and mean squared error for expression fold-change estimation and in terms of Type I error and p-value distributions for tests of differential expression. The exploratory data analysis and normalization methods proposed in this article are implemented in the open-source Bioconductor R package EDASeq. Our within-lane normalization procedures, followed by between-lane normalization, reduce GC-content bias and lead to more accurate estimates of expression fold-changes and tests of differential expression. Such results are crucial for the biological interpretation of RNA-Seq experiments, where downstream analyses can be sensitive to the supplied lists of genes.

941 citations

Journal Article•10.1038/NN.2815•

A normalization model of multisensory integration

Tomokazu Ohshiro¹, Dora E. Angelaki², Gregory C. DeAngelis¹•Institutions (2)

University of Rochester¹, Washington University in St. Louis²

01 Jun 2011-Nature Neuroscience

TL;DR: This model, which uses a simple functional operation (normalization) for which there is considerable experimental support, also accounts for the recent observation that the mathematical rule by which multisensory neurons combine their inputs changes with cue reliability.

Abstract: Responses of neurons that integrate multiple sensory inputs are traditionally characterized in terms of a set of empirical principles. However, a simple computational framework that accounts for these empirical features of multisensory integration has not been established. We propose that divisive normalization, acting at the stage of multisensory integration, can account for many of the empirical principles of multisensory integration shown by single neurons, such as the principle of inverse effectiveness and the spatial principle. This model, which uses a simple functional operation (normalization) for which there is considerable experimental support, also accounts for the recent observation that the mathematical rule by which multisensory neurons combine their inputs changes with cue reliability. The normalization model, which makes a strong testable prediction regarding cross-modal suppression, may therefore provide a simple unifying computational account of the important features of multisensory integration by neurons.

343 citations
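
Divisive normalization, the operation at the centre of the abstract above, can be sketched in its generic textbook form: each unit's driving input is raised to a power and divided by a constant plus the pooled activity of all units. This is not the authors' specific model; the function name, the parameters `sigma` and `exponent`, and the two-unit example are assumptions for illustration, and drives are assumed non-negative.

```python
def divisive_normalization(drives, sigma=1.0, exponent=2.0):
    """Generic divisive normalization: d_i^n / (sigma^n + sum_j d_j^n)."""
    powered = [d ** exponent for d in drives]
    pool = sigma ** exponent + sum(powered)
    return [p / pool for p in powered]

# Unit 0 driven alone, then with an equally driven second unit in the pool.
alone = divisive_normalization([4.0, 0.0])[0]
paired = divisive_normalization([4.0, 4.0])[0]
print(alone, paired)
```

The response of unit 0 drops when the second unit becomes active, even though its own input is unchanged, which is the kind of suppression the normalization account relies on.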

Journal Article•10.1523/JNEUROSCI.1237-11.2011•

Reward Value-Based Gain Control: Divisive Normalization in Parietal Cortex

Kenway Louie¹, Lauren E. Grattan², Paul W. Glimcher•Institutions (2)

Center for Neural Science¹, New York University²

20 Jul 2011-The Journal of Neuroscience

TL;DR: It is shown that neurons in the monkey lateral intraparietal cortex encode a relative form of saccadic value, explicitly dependent on the values of the other available alternatives, which provides a possible mechanistic basis for behavioral context-dependent violations of rationality.

Abstract: The representation of value is a critical component of decision making. Rational choice theory assumes that options are assigned absolute values, independent of the value or existence of other alternatives. However, context-dependent choice behavior in both animals and humans violates this assumption, suggesting that biological decision processes rely on comparative evaluation. Here we show that neurons in the monkey lateral intraparietal cortex encode a relative form of saccadic value, explicitly dependent on the values of the other available alternatives. Analogous to extra-classical receptive field effects in visual cortex, this relative representation incorporates target values outside the response field and is observed in both stimulus-driven activity and baseline firing rates. This context-dependent modulation is precisely described by divisive normalization, indicating that this standard form of sensory gain control may be a general mechanism of cortical computation. Such normalization in decision circuits effectively implements an adaptive gain control for value coding and provides a possible mechanistic basis for behavioral context-dependent violations of rationality.

333 citations

Posted Content•

Normalized Mutual Information to evaluate overlapping community finding algorithms

Aaron McDaid, Derek Greene, Neil Hurley

11 Oct 2011-arXiv: Physics and Society

TL;DR: This paper demonstrates unintuitive behaviour of a popular measure based on normalized mutual information and shows how it can be corrected by using a more conventional normalization.

Abstract: Given the increasing popularity of algorithms for overlapping clustering, in particular in social network analysis, quantitative measures are needed to measure the accuracy of a method. Given a set of true clusters, and the set of clusters found by an algorithm, these sets of clusters must be compared to see how similar or different the sets are. A normalized measure is desirable in many contexts, for example assigning a value of 0 where the two sets are totally dissimilar, and 1 where they are identical. A measure based on normalized mutual information, [1], has recently become popular. We demonstrate unintuitive behaviour of this measure, and show how this can be corrected by using a more conventional normalization. We compare the results to that of other measures, such as the Omega index [2].

332 citations
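
As a rough illustration of a normalized measure that is 0 for unrelated clusterings and 1 for identical ones, the sketch below computes mutual information divided by the larger of the two entropies. It handles only non-overlapping partitions given as label lists; the overlapping-cover measure discussed in the abstract above is more involved and is not reproduced here. The function names are assumptions for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in nats) of a label list."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def mutual_information(a, b):
    """Mutual information (in nats) between two labelings of the same items."""
    n = len(a)
    count_a, count_b, joint = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * math.log((c / n) / ((count_a[x] / n) * (count_b[y] / n)))
        for (x, y), c in joint.items()
    )

def nmi_max(a, b):
    """Mutual information normalized by max(H(a), H(b))."""
    denom = max(entropy(a), entropy(b))
    if denom == 0.0:
        return 1.0  # both labelings put every item in one cluster
    return mutual_information(a, b) / denom

same = nmi_max([0, 0, 1, 1], [1, 1, 0, 0])       # same partition, renamed labels
unrelated = nmi_max([0, 0, 1, 1], [0, 1, 0, 1])  # statistically independent
print(same, unrelated)
```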

Journal Article•10.7763/IJCTE.2011.V3.288•

Statistical Normalization and Back Propagation for Classification

T. Jayalakshmi, A. Santhakumaran

01 Jan 2011-International Journal of Computer Theory and Engineering

298 citations

Journal Article•10.1109/LSP.2011.2158998•

Illumination Normalization Based on Weber's Law With Application to Face Recognition

Biao Wang¹, Weifeng Li¹, Wenming Yang¹, Qingmin Liao¹•Institutions (1)

Tsinghua University¹

09 Jun 2011-IEEE Signal Processing Letters

TL;DR: A novel illumination insensitive representation of face images under varying illuminations is exploited via a ratio image, called “Weber-face,” where a ratio between local intensity variation and the background is computed.

Abstract: Weber's law suggests that for a stimulus, the ratio between the smallest perceptual change and the background is a constant, which implies stimuli are perceived not in absolute terms but in relative terms. Inspired by this, we exploit and analyze a novel illumination insensitive representation of face images under varying illuminations via a ratio image, called “Weber-face,” where a ratio between local intensity variation and the background is computed. Experimental results on both CMU-PIE and Yale B face databases show that Weber-face performs better than the existing representative approaches.

273 citations
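
The ratio-image idea in the abstract above can be sketched as follows: for each pixel, sum the differences between the pixel and its neighbours, then divide by the pixel's own intensity. The 3x3 neighbourhood, the `atan` compression, the `alpha` gain, the border handling, and the function name are all assumptions of this sketch; the abstract does not state the authors' exact formula or any smoothing step.

```python
import math

def weber_ratio_image(image, alpha=2.0, eps=1e-6):
    """Per-pixel ratio of local intensity variation to the pixel's own intensity."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            center = image[r][c]
            diff = 0.0
            # Sum differences to the 3x3 neighbours that fall inside the image.
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        diff += center - image[rr][cc]
            out[r][c] = math.atan(alpha * diff / max(center, eps))
    return out

face = [[10.0, 20.0, 10.0], [20.0, 80.0, 20.0], [10.0, 20.0, 10.0]]
brighter = [[3.0 * v for v in row] for row in face]  # same scene, 3x illumination
ratio_a = weber_ratio_image(face)
ratio_b = weber_ratio_image(brighter)
```

Because both the numerator and the denominator scale with a uniform change in illumination, `ratio_a` and `ratio_b` agree, which is the insensitivity the representation is after.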

Journal Article•10.1007/S00216-011-4929-Z•

Normalization in MALDI-TOF imaging datasets of proteins: practical considerations

Sören-Oliver Deininger, Dale S. Cornett¹, Rainer Paape, Michael Becker, Charles Pineau², Sandra Rauser, Axel Walch, Eryk Wolski•Institutions (2)

Bruker¹, French Institute of Health and Medical Research²

12 Apr 2011-Analytical and Bioanalytical Chemistry

TL;DR: This work investigated whether normalization can be improved if dominant signals are excluded from the calculation and found two alternatives: normalization on the spectra noise level or on the median of signal intensities in the spectrum to be significantly more robust against artifact generation.

Abstract: Normalization is critically important for the proper interpretation of matrix-assisted laser desorption/ionization (MALDI) imaging datasets. The effects of the commonly used normalization techniques based on total ion count (TIC) or vector norm normalization are significant, and they are frequently beneficial. In certain cases, however, these normalization algorithms may produce misleading results and possibly lead to wrong conclusions, e.g., regarding potential biomarker distributions. This is typical for tissues in which signals of prominent abundance are present in confined areas, such as insulin in the pancreas or β-amyloid peptides in the brain. In this work, we investigated whether normalization can be improved if dominant signals are excluded from the calculation. Because manual interaction with the data (e.g., defining the abundant signals) is not desired for routine analysis, we investigated two alternatives: normalization on the spectra noise level or on the median of signal intensities in the spectrum. Normalization on the median and the noise level was found to be significantly more robust against artifact generation compared to normalization on the TIC. Therefore, we propose to include these normalization methods in the standard “toolbox” of MALDI imaging for reliable results under conditions of automation.

225 citations
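
The contrast between TIC and median normalization described in the abstract above can be shown on toy data. The sketch divides each spectrum either by its summed intensity or by its median intensity; the function names and the five-point spectra are assumptions for illustration, and noise-level normalization is not shown.

```python
from statistics import median

def normalize_tic(spectrum):
    """Divide every intensity by the total ion count (sum of intensities)."""
    total = sum(spectrum)
    if total == 0:
        raise ValueError("spectrum has zero total intensity")
    return [x / total for x in spectrum]

def normalize_median(spectrum):
    """Divide every intensity by the median intensity of the spectrum."""
    med = median(spectrum)
    if med == 0:
        raise ValueError("spectrum has zero median intensity")
    return [x / med for x in spectrum]

# Two toy spectra that differ only in one dominant peak (last value).
background = [1.0, 2.0, 3.0, 4.0, 5.0]
with_peak = [1.0, 2.0, 3.0, 4.0, 500.0]

tic_a, tic_b = normalize_tic(background), normalize_tic(with_peak)
med_a, med_b = normalize_median(background), normalize_median(with_peak)
```

Under TIC normalization the dominant peak pushes down every other signal in `with_peak`, while the median-normalized values of the four shared peaks stay identical across the two spectra.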

Journal Article•10.1016/J.PROENV.2011.12.040•

A method of SVM with Normalization in Intrusion Detection

Weijun Li¹, Zhenyu Liu²•Institutions (2)

Guangdong University of Technology¹, South China University of Technology²

01 Jan 2011-Procedia environmental sciences

TL;DR: Experimental results show that the method using SVM with normalization performs much better than the method using SVM without normalization in classifying intrusion data of KDD99, and that Min-Max Normalization performs better in speed, accuracy of cross validation, and quantity of support vectors than other normalization methods.

Abstract: Network intrusion is always hidden in a mass of routine data and the differences between these data are very large. Normalization can help to speed up the learning phase and avoid numerical problems such as precision loss from arithmetic overflows. Some normalization methods are analyzed and simulated. Experimental results show that the method using SVM with normalization performs much better than the method using SVM without normalization in classifying intrusion data of KDD99, and that Min-Max Normalization performs better in speed, accuracy of cross validation, and quantity of support vectors than other normalization methods.

205 citations
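
Min-max normalization, the method the abstract above reports as performing best, rescales each feature linearly so that its smallest value maps to one bound and its largest to the other. A minimal sketch follows; the function name, the default [0, 1] target range, the constant-feature handling, and the sample values are assumptions for illustration and are not taken from the paper or from KDD99.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale one feature column into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant feature: no spread to rescale, map everything to new_min.
        return [new_min for _ in values]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]

# Hypothetical feature column with a wide range of raw values.
raw_feature = [0.0, 50.0, 200.0]
scaled = min_max_normalize(raw_feature)
print(scaled)
```

Bringing features with very different ranges onto a common scale keeps any single feature from dominating the kernel computations in an SVM.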

Proceedings Article•

An Investigation of Depressed Speech Detection: Features and Normalization.

Nicholas Cummins¹, Julien Epps¹, Michael Breakspear¹, Roland Goecke•Institutions (1)

University of New South Wales¹

27 Aug 2011

TL;DR: Open questions about how speech segments should be selected, what features provide good discrimination, and what benefits feature normalization might bring given the speaker-specific nature of mental disorders are addressed empirically using classifier configurations employed in emotion recognition from speech.

Abstract: In recent years, the problem of automatic detection of mental illness from the speech signal has gained some initial interest; however, remaining questions include how speech segments should be selected, what features provide good discrimination, and what benefits feature normalization might bring given the speaker-specific nature of mental disorders. In this paper, these questions are addressed empirically using classifier configurations employed in emotion recognition from speech, evaluated on a 47-speaker depressed/neutral read sentence speech database. Results demonstrate that (1) detailed spectral features are well suited to the task, (2) speaker normalization provides benefits mainly for less detailed features, and (3) dynamic information appears to provide little benefit. Classification accuracy using a combination of MFCC and formant based features approached 80% for this database.

197 citations

Proceedings Article•10.1145/2063576.2063584•

Lower-bounding term frequency normalization

Yuanhua Lv¹, ChengXiang Zhai¹•Institutions (1)

University of Illinois at Urbana–Champaign¹

24 Oct 2011

TL;DR: This paper proposes a general and efficient method to introduce a sufficiently large lower bound for TF normalization which can be shown analytically to fix or alleviate the problem of very long documents being overly penalized.

Abstract: In this paper, we reveal a common deficiency of the current retrieval models: the component of term frequency (TF) normalization by document length is not lower-bounded properly; as a result, very long documents tend to be overly penalized. In order to analytically diagnose this problem, we propose two desirable formal constraints to capture the heuristic of lower-bounding TF, and use constraint analysis to examine several representative retrieval functions. Analysis results show that all these retrieval functions can only satisfy the constraints for a certain range of parameter values and/or for a particular set of query terms. Empirical results further show that the retrieval performance tends to be poor when the parameter is out of the range or the query term is not in the particular set. To solve this common problem, we propose a general and efficient method to introduce a sufficiently large lower bound for TF normalization which can be shown analytically to fix or alleviate the problem. Our experimental results demonstrate that the proposed method, incurring almost no additional computational cost, can be applied to state-of-the-art retrieval functions, such as Okapi BM25, language models, and the divergence from randomness approach, to significantly improve the average precision, especially for verbose queries.
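
The deficiency described in the abstract above can be seen in the term-frequency component of Okapi BM25, which shrinks toward zero as a document gets very long. The sketch below compares that component with a lower-bounded variant that adds a constant `delta` for every matched term, in the form commonly cited as BM25+. The formula is not given in the abstract, so the functions, parameter defaults (`k1`, `b`, `delta`), and example lengths are assumptions for illustration.

```python
def bm25_tf(tf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Length-normalized term-frequency component of Okapi BM25."""
    if tf == 0:
        return 0.0
    length_norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return (k1 + 1.0) * tf / (length_norm + tf)

def bm25_tf_lower_bounded(tf, doc_len, avg_doc_len, k1=1.2, b=0.75, delta=1.0):
    """Same component with a constant lower bound added for matched terms."""
    if tf == 0:
        return 0.0
    return bm25_tf(tf, doc_len, avg_doc_len, k1, b) + delta

# One occurrence of a query term in a document 10,000 times the average length.
plain = bm25_tf(1, 1_000_000, 100)
bounded = bm25_tf_lower_bounded(1, 1_000_000, 100)
print(plain, bounded)
```

With the plain component, a matching term in a very long document scores almost the same as a term that does not occur at all; the lower bound keeps a clear gap between the two cases.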

Journal Article•10.1371/JOURNAL.PONE.0017762•

Robust RT-qPCR Data Normalization: Validation and Selection of Internal Reference Genes during Post-Experimental Data Analysis

Daijun Ling¹, Paul M. Salvaterra¹•Institutions (1)

Beckman Research Institute¹

15 Mar 2011-PLOS ONE

TL;DR: This work measured the expression of 20 candidate reference genes and 7 target genes in 15 Drosophila head cDNA samples using RT-qPCR to establish a method for determination of the most stable normalizing factor (NF) across samples for robust data normalization.

Abstract: Reverse transcription and real-time PCR (RT-qPCR) has been widely used for rapid quantification of relative gene expression. To offset technical confounding variations, stably-expressed internal reference genes are measured simultaneously along with target genes for data normalization. Statistical methods have been developed for reference validation; however, normalization of RT-qPCR data still remains arbitrary due to pre-experimental determination of particular reference genes. To establish a method for determination of the most stable normalizing factor (NF) across samples for robust data normalization, we measured the expression of 20 candidate reference genes and 7 target genes in 15 Drosophila head cDNA samples using RT-qPCR. The 20 reference genes exhibit sample-specific variation in their expression stability. Unexpectedly, the NF variation across samples does not exhibit a continuous decrease with pairwise inclusion of more reference genes, suggesting that either too few or too many reference genes may be detrimental to the robustness of data normalization. The optimal number of reference genes predicted by the minimal and most stable NF variation differs greatly, from 1 to more than 10, based on particular sample sets. We also found that GstD1, InR and Hsp70 expression exhibits an age-dependent increase in fly heads; however, their relative expression levels are significantly affected by NF using different numbers of reference genes. Because they are highly dependent on the actual data, RT-qPCR reference genes thus have to be validated and selected at the post-experimental data analysis stage rather than by pre-experimental determination.

Journal Article•10.1016/J.CSDA.2010.12.012•

Hyper least squares fitting of circles and ellipses

Kenichi Kanatani¹, Prasanna Rangarajan²•Institutions (2)

Okayama University¹, Southern Methodist University²

01 Jun 2011-Computational Statistics & Data Analysis

TL;DR: This work extends the circle fitting method of Rangarajan and Kanatani (2009) to accommodate ellipse fitting and relies on algebraic distance minimization with a carefully chosen scale normalization to derive an estimator far superior to the standard LS and slightly better than the Taubin estimator.

Journal Article•10.1186/1471-2105-12-250•

NORMA-Gene: a simple and robust method for qPCR normalization based on target gene data.

Lars-Henrik Heckmann¹, Peter Sørensen¹, Paul Henning Krogh¹, Jesper Givskov Sørensen¹•Institutions (1)

Aarhus University¹

21 Jun 2011-BMC Bioinformatics

TL;DR: The NORMA-Gene algorithm is presented, which is based on a data-driven normalization and is useful for as little as five target genes comprising the data-set, allowing researchers to focus their efforts on studying target genes of biological relevance.

Abstract: Normalization of target gene expression, measured by real-time quantitative PCR (qPCR), is a requirement for reducing experimental bias and thereby improving data quality. The currently used normalization approach is based on using one or more reference genes. Yet, this approach extends the experimental work load and suffers from assumptions that may be difficult to meet and to validate. We developed a data driven normalization algorithm (NORMA-Gene). An analysis of the performance of NORMA-Gene compared to reference gene normalization on artificially generated data-sets showed that the NORMA-Gene normalization yielded more precise results under a large range of parameters tested. Furthermore, when tested on three very different real qPCR data-sets NORMA-Gene was shown to be best at reducing variance due to experimental bias in all three data-sets compared to normalization based on the use of reference gene(s). Here we present the NORMA-Gene algorithm that is applicable to all biological and biomedical qPCR studies, especially those that are based on a limited number of assayed genes. The method is based on a data-driven normalization and is useful for as little as five target genes comprising the data-set. NORMA-Gene does not require the identification and validation of reference genes allowing researchers to focus their efforts on studying target genes of biological relevance.

Journal Article•10.1186/1755-8794-4-84•

Batch effect correction for genome-wide methylation data with Illumina Infinium platform

Zhifu Sun¹, High Seng Chai¹, Yanhong Wu¹, Wendy M. White¹, Krishna Vanaja Donkena¹, Christopher J. Klein¹, Vesna D. Garovic¹, Terry M. Therneau¹, Jean-Pierre A. Kocher¹•Institutions (1)

Mayo Clinic¹

16 Dec 2011-BMC Medical Genomics

TL;DR: Genome-wide methylation data from Infinium Methylation BeadChip can be susceptible to batch effects with profound impacts on downstream analyses and conclusions, and EB correction along with normalization is recommended for effective batch effect removal.

Abstract: Genome-wide methylation profiling has led to more comprehensive insights into gene regulation mechanisms and potential therapeutic targets. Illumina Human Methylation BeadChip is one of the most commonly used genome-wide methylation platforms. Similar to other microarray experiments, methylation data is susceptible to various technical artifacts, particularly batch effects. To date, little attention has been given to issues related to normalization and batch effect correction for this kind of data. We evaluated three common normalization approaches and investigated their performance in batch effect removal using three datasets with different degrees of batch effects generated from HumanMethylation27 platform: quantile normalization at average β value (QNβ); two step quantile normalization at probe signals implemented in "lumi" package of R (lumi); and quantile normalization of A and B signal separately (ABnorm). Subsequent Empirical Bayes (EB) batch adjustment was also evaluated. Each normalization could remove a portion of batch effects and their effectiveness differed depending on the severity of batch effects in a dataset. For the dataset with minor batch effects (Dataset 1), normalization alone appeared adequate and "lumi" showed the best performance. However, all methods left substantial batch effects intact in the datasets with obvious batch effects and further correction was necessary. Without any correction, 50 and 66 percent of CpGs were associated with batch effects in Dataset 2 and 3, respectively. After QNβ, lumi or ABnorm, the number of CpGs associated with batch effects were reduced to 24, 32, and 26 percent for Dataset 2; and 37, 46, and 35 percent for Dataset 3, respectively. Additional EB correction effectively removed such remaining non-biological effects. More importantly, the two-step procedure almost tripled the numbers of CpGs associated with the outcome of interest for the two datasets. 
Genome-wide methylation data from Infinium Methylation BeadChip can be susceptible to batch effects with profound impacts on downstream analyses and conclusions. Normalization can reduce part but not all batch effects. EB correction along with normalization is recommended for effective batch effect removal.
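
All three approaches compared in the abstract above build on quantile normalization. A generic sketch of that operation is given below: each sample's values are replaced by the across-sample mean of the values holding the same rank. It is not QNβ, lumi, or ABnorm specifically; the function name, the positional tie-breaking, and the two-sample toy data are assumptions for illustration.

```python
def quantile_normalize(samples):
    """Give every sample (one list of values per sample) the same distribution."""
    n_features = len(samples[0])
    if any(len(s) != n_features for s in samples):
        raise ValueError("all samples must have the same number of features")
    # Mean of the k-th smallest value across samples, for each rank k.
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [sum(col) / len(samples) for col in zip(*sorted_samples)]
    normalized = []
    for s in samples:
        # Feature indices ordered by value; ties are broken by position.
        order = sorted(range(n_features), key=lambda i: s[i])
        out = [0.0] * n_features
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized

arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
qn = quantile_normalize(arrays)
print(qn)
```

After normalization every sample contains exactly the same set of values, and only the assignment of those values to features differs between samples.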

Journal Article•10.1186/1465-6906-12-S1-P17•

Metastats: an improved statistical method for analysis of metagenomic data

Joseph N. Paulson¹, Mihai Pop¹, Héctor Corrada Bravo¹•Institutions (1)

University of Maryland, College Park¹

19 Sep 2011-Genome Biology

TL;DR: New approaches for data normalization that allow a more accurate assessment of differential abundance by reducing the covariance between individual features implicitly introduced by the traditionally used ratio-based normalization are described.

Abstract: Metagenomic studies were originally focused on exploratory/validation projects but are rapidly being applied in a clinical setting. In this setting, researchers are interested in finding characteristics of the microbiome that correlate with the clinical status of the corresponding sample. Comparatively few computational/statistical tools have been developed that can assist in this process. Rather, most developments in the metagenomics community have focused on methods that compare samples as a whole. Specifically, the focus has been on developing robust methods for determining the level of similarity or difference between samples, rather than on identifying the specific characteristics that distinguish different samples from each other. Metastats [1] was the first statistical method developed specifically to address the questions asked in clinical studies. Metastats allows a comparison of metagenomic samples (represented as counts of individual features such as organisms, genes and functional groups) from two treatment populations (for example, healthy versus disease) and identifies those features that statistically distinguish the two populations. Here, we present major improvements to the Metastats software and the underlying statistical methods. First, we describe new approaches for data normalization that allow a more accurate assessment of differential abundance by reducing the covariance between individual features implicitly introduced by the traditionally used ratio-based normalization. These normalization techniques are also of interest for time-series analyses or in the estimation of microbial networks. A second extension of Metastats is a mixed-model zero-inflated Gaussian distribution that allows Metastats to account for a common characteristic of metagenomic data: the presence of many features with zero counts owing to undersampling of the community. 
The number of ‘missing features’ (zero counts) correlates with the amount of sequencing performed, thereby biasing abundance measurements and the differential abundance statistics derived from them. Using simulated and real data, we show that these methods significantly improve the accuracy of Metastats. We also describe the addition of several new statistical tests to our code (including presence/absence and the corresponding odds ratio, and penetrance calculations) that improve the usability of our software in clinical practice.

Journal Article•10.1128/AEM.05491-11•

Evaluation of subsampling-based normalization strategies for tagged high-throughput sequencing data sets from gut microbiomes

Daniel Aguirre de Cárcer¹, Stuart E. Denman¹, Christopher S. McSweeney¹, Mark Morrison¹•Institutions (1)

Commonwealth Scientific and Industrial Research Organisation¹

15 Dec 2011-Applied and Environmental Microbiology

TL;DR: Several subsampling-based normalization strategies were applied to different high-throughput sequencing data sets originating from human and murine gut environments, and their effects on the data sets' characteristics and normalization efficiencies, as measured by several β-diversity metrics, were compared.

Abstract: Several subsampling-based normalization strategies were applied to different high-throughput sequencing data sets originating from human and murine gut environments. Their effects on the data sets' characteristics and normalization efficiencies, as measured by several β-diversity metrics, were compared. For both data sets, subsampling to the median rather than the minimum number appeared to improve the analysis.
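
Subsampling to a common depth, as discussed in the abstract above, can be sketched as drawing reads at random without replacement from each sample's count table until a target depth is reached. The sketch uses the median depth as the target. The function name, the fixed seed, the taxon labels, and the choice to leave samples at or below the target unchanged are assumptions of this sketch, not details taken from the paper.

```python
import random
import statistics
from collections import Counter

def subsample_counts(counts, depth, seed=0):
    """Randomly draw `depth` reads without replacement from a taxon-count table."""
    total = sum(counts.values())
    if total <= depth:
        # Assumption of this sketch: shallow samples are returned unchanged.
        return dict(counts)
    # Expand the table into one entry per read, then sample from that pool.
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    drawn = random.Random(seed).sample(pool, depth)
    return dict(Counter(drawn))

samples = [
    {"taxon_a": 10, "taxon_b": 5},
    {"taxon_a": 40, "taxon_b": 20},
    {"taxon_a": 100, "taxon_b": 100},
]
depths = [sum(s.values()) for s in samples]
target = int(statistics.median(depths))
normalized = [subsample_counts(s, target) for s in samples]
```

Using the median instead of the minimum discards fewer reads from deep samples, at the cost of leaving the shallowest samples below the common depth.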

Journal Article•10.1186/1471-2105-12-467•

Empirical comparison of cross-platform normalization methods for gene expression data

Jason Rudy¹, Faramarz Valafar¹•Institutions (1)

San Diego State University¹

07 Dec 2011-BMC Bioinformatics

TL;DR: Of the four successful methods, XPN generally shows the highest inter-platform concordance when treatment groups are equally sized, while DWD is most robust to differently sized treatment groups and consistently shows the smallest loss in gene detection.

Abstract: Simultaneous measurement of gene expression on a genomic scale can be accomplished using microarray technology or by sequencing based methods. Researchers who perform high throughput gene expression assays often deposit their data in public databases, but heterogeneity of measurement platforms leads to challenges for the combination and comparison of data sets. Researchers wishing to perform cross platform normalization face two major obstacles. First, a choice must be made about which method or methods to employ. Nine are currently available, and no rigorous comparison exists. Second, software for the selected method must be obtained and incorporated into a data analysis workflow.

Journal Article•10.1002/ASI.21424•

The source normalized impact per paper is a valid and sophisticated indicator of journal citation impact

Henk F. Moed¹•Institutions (1)

Elsevier¹

01 Jan 2011-Journal of the Association for Information Science and Technology

TL;DR: This paper is a reply to the article "Scopus's Source Normalized Impact per Paper (SNIP) versus a Journal Impact Factor based on Fractional Counting of Citations", published by Loet Leydesdorff and Tobias Opthof (arXiv:1004.3580v2 [cs.DL]).

Abstract: This paper is a reply to the article "Scopus's Source Normalized Impact per Paper (SNIP) versus a Journal Impact Factor based on Fractional Counting of Citations", published by Loet Leydesdorff and Tobias Opthof (arXiv:1004.3580v2 [cs.DL]). It clarifies the relationship between SNIP and Elsevier's Scopus. Since Leydesdorff and Opthof's description of SNIP is not complete, it indicates four key differences between SNIP and the indicator proposed by the two authors, and argues why the former is more valid than the latter. Nevertheless, the idea of fractional citation counting deserves further exploration. The paper discusses difficulties that arise if one attempts to apply this principle at the level of individual (citing) papers.

Journal Article•10.1002/PMIC.201100078•

A Statistical Selection Strategy for Normalization Procedures in LC-MS Proteomics Experiments through Dataset Dependent Ranking of Normalization Scaling Factors

Bobbie-Jo M. Webb-Robertson¹, Melissa M. Matzke¹, Jon M. Jacobs¹, Joel G. Pounds¹, Katrina M. Waters¹•Institutions (1)

Pacific Northwest National Laboratory¹

01 Dec 2011-Proteomics

TL;DR: A novel approach is presented to evaluate normalization strategies, which includes the peptide selection component associated with the derivation of normalization values, which improves the structure of the data without introducing bias into the normalized peak intensities.

Abstract: Quantification of LC-MS peak intensities assigned during peptide identification in a typical comparative proteomics experiment will deviate from run-to-run of the instrument due to both technical and biological variation. Thus, normalization of peak intensities across an LC-MS proteomics dataset is a fundamental step in pre-processing. However, the downstream analysis of LC-MS proteomics data can be dramatically affected by the normalization method selected. Current normalization procedures for LC-MS proteomics data are presented in the context of normalization values derived from subsets of the full collection of identified peptides. The distribution of these normalization values is unknown a priori. If they are not independent from the biological factors associated with the experiment the normalization process can introduce bias into the data, possibly affecting downstream statistical biomarker discovery. We present a novel approach to evaluate normalization strategies, which includes the peptide selection component associated with the derivation of normalization values. Our approach evaluates the effect of normalization on the between-group variance structure in order to identify the most appropriate normalization methods that improve the structure of the data without introducing bias into the normalized peak intensities.

Journal Article•10.1016/J.ISPRSJPRS.2011.03.003•

Pre-processing of a sample of multi-scene and multi-date Landsat imagery used to monitor forest cover changes over the tropics

Catherine Bodart, Hugh Eva, René Beuchle, Rastislav Raši, Dario Simonetti, Hans-Jürgen Stibig, Andreas Brink, Erik Lindquist¹, Frédéric Achard•Institutions (1)

Food and Agriculture Organization¹

01 Sep 2011-Isprs Journal of Photogrammetry and Remote Sensing

TL;DR: In this article, the TREES-3 project has processed more than 12,000 Landsat TM and ETM+ data subsets systematically distributed over the tropics, and the results show that the haze correction algorithm has improved the visual appearance of the image and significantly corrected the digital numbers for the red band.

Abstract: In support of the Remote Sensing Survey of the global Forest Resource Assessment 2010, the TREES-3 project has processed more than 12,000 Landsat TM and ETM+ data subsets systematically distributed over the tropics. The project aims at deriving area estimates of tropical forest cover change for the periods 1990–2000–2005. The paper presents the pre-processing steps applied in an operational and robust manner to this large amount of multi-date and multi-scene imagery: conversion to top-of-atmosphere reflectance, cloud and cloud shadow detection, haze correction and image radiometric normalization. The results show that the haze correction algorithm has improved the visual appearance of the image and significantly corrected the digital numbers for Landsat visible bands, especially the red band. The impact of the normalization procedures (forest normalization and relative normalization) was assessed on 210 image pairs: in all cases the correlation between the spectral values of the same land cover in both images was improved. The developed automatic pre-processing chain provided a consistent multi-temporal data set across the tropics that will constitute the basis for an automatic object-based supervised classification.

Journal Article•10.1002/ASI.21511•

How to evaluate universities in terms of their relative citation impacts: Fractional counting of citations and the normalization of differences among disciplines

Loet Leydesdorff¹, Jung Cheol Shin²•Institutions (2)

University of Amsterdam¹, Seoul National University²

01 Jun 2011-Journal of the Association for Information Science and Technology

TL;DR: Using publication and citation data of seven Korean research universities, the advantages and the differences in the rankings are demonstrated, the possible statistics are explained, and ways to visualize the differences in (citing) audiences in terms of a network are suggested.

Abstract: Fractional counting of citations can improve on ranking of multidisciplinary research units (such as universities) by normalizing the differences among fields of science in terms of differences in citation behavior. Furthermore, normalization in terms of citing papers abolishes the unsolved questions in scientometrics about the delineation of fields of science in terms of journals and normalization when comparing among different (sets of) journals. Using publication and citation data of seven Korean research universities, we demonstrate the advantages and the differences in the rankings, explain the possible statistics, and suggest ways to visualize the differences in (citing) audiences in terms of a network. © 2011 Wiley Periodicals, Inc.
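The fractional counting idea can be illustrated with a minimal sketch. It assumes the common formulation in which each citation is weighted by 1/N, where N is the number of references in the citing paper; the function name, the input layout, and the `owner` mapping from cited paper to research unit are hypothetical and are not taken from the article.

```python
from collections import defaultdict


def fractional_citation_counts(reference_lists, owner):
    """Sum fractional citation credit per research unit.

    reference_lists: one list of cited-paper IDs per citing paper.
    owner: dict mapping a cited-paper ID to its unit (e.g. a university).
           Cited papers missing from `owner` still count toward the length
           of the reference list but receive no credit here.
    """
    totals = defaultdict(float)
    for refs in reference_lists:
        if not refs:
            continue  # a citing paper without references contributes nothing
        weight = 1.0 / len(refs)  # each citation is worth 1/N
        for cited in refs:
            unit = owner.get(cited)
            if unit is not None:
                totals[unit] += weight
    return dict(totals)


# Hypothetical toy data: two citing papers with 4 and 2 references.
example_refs = [["p1", "p2", "x", "y"], ["p1", "z"]]
example_owner = {"p1": "U1", "p2": "U2"}
example_totals = fractional_citation_counts(example_refs, example_owner)
```

Under this weighting, a citation from a paper with a long reference list counts for less than one from a paper with a short list, which is how differences in citation behavior among fields are damped.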

Journal Article•10.1016/J.JBIOMECH.2010.09.015•

Methods to temporally align gait cycle data

Nathaniel E. Helwig¹, Sungjin Hong¹, Elizabeth T. Hsiao-Wecksler¹, John D. Polk¹•Institutions (1)

University of Illinois at Urbana–Champaign¹

03 Feb 2011-Journal of Biomechanics

TL;DR: It is demonstrated that piecewise temporal alignment techniques outperform other commonly used alignment methods (normalization to percent gait cycle, dynamic time warping, and derivative dynamic time warping) in typical biomechanical and clinical alignment tasks.
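Of the methods named above, normalization to percent gait cycle is the simplest to sketch. The example below assumes it means linearly resampling one cycle onto an evenly spaced 0 to 100% axis; the function name and the default of 101 points are illustrative choices, and this is the baseline method, not the piecewise alignment studied in the article.

```python
import numpy as np


def to_percent_gait_cycle(signal, n_points=101):
    """Linearly resample one gait cycle onto an evenly spaced 0-100% axis."""
    signal = np.asarray(signal, dtype=float)
    if signal.ndim != 1 or signal.size < 2:
        raise ValueError("expected a 1-D signal with at least two samples")
    # Original samples are assumed evenly spaced in time over the cycle.
    original_axis = np.linspace(0.0, 100.0, num=signal.size)
    target_axis = np.linspace(0.0, 100.0, num=n_points)
    return np.interp(target_axis, original_axis, signal)


# Hypothetical three-sample cycle resampled to five points.
resampled = to_percent_gait_cycle([0.0, 1.0, 2.0], n_points=5)
```

Because the whole cycle is stretched or compressed uniformly, events within the cycle (for example, toe-off) can still fall at different percentages across cycles.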

Journal Article•10.5194/HESS-15-1387-2011•

An objective approach for feature extraction: distribution analysis and statistical descriptors for scale choice and channel network identification

Giulia Sofia¹, Paolo Tarolli¹, Federico Cazorzi², G. Dalla Fontana¹•Institutions (2)

University of Padua¹, University of Udine²

06 May 2011-Hydrology and Earth System Sciences

TL;DR: The advantage of the proposed methodology, and the efficiency and accurate localization of extracted features are demonstrated using LiDAR data of two different areas and comparing both extractions with field surveyed networks.

Abstract: A statistical approach to LiDAR-derived topographic attributes for the automatic extraction of channel networks and for the choice of the scale to apply for parameter evaluation is presented in this paper. The basis of this approach is to use distribution analysis and statistical descriptors to identify channels where terrain geometry denotes significant convergences. Two case study areas with different morphology and degree of organization are used with their 1 m LiDAR Digital Terrain Models (DTMs). Topographic attribute maps (curvature and openness) for various window sizes are derived from the DTMs in order to detect surface convergences. A statistical analysis of the value distributions for each window size is carried out for the choice of the optimum kernel. We propose a three-step method to extract the network, based on (a) the normalization and overlapping of openness and minimum curvature to highlight the more likely surface convergences, (b) a weighting of the upslope area according to these normalized maps to identify drainage flow paths and flow accumulation consistent with terrain geometry, and (c) the standard score normalization of the weighted upslope area and the use of standard score values as a non-subjective threshold for channel network identification. As a final step for optimal definition and representation of the whole network, a noise-filtering and connection procedure is applied. The advantages of the proposed methodology, and the efficiency and accurate localization of extracted features, are demonstrated using LiDAR data of two different areas and comparing both extractions with field-surveyed networks.
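Step (c) can be sketched as follows, assuming the standard score is the usual z-score computed over the whole map. The function names and the `z_threshold` parameter are hypothetical; the abstract does not state which threshold value the authors use.

```python
import numpy as np


def standard_score(values):
    """Return z-scores: (value - mean) / standard deviation."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std()


def channel_mask(weighted_upslope_area, z_threshold):
    """Flag cells whose standard score exceeds the chosen threshold."""
    return standard_score(weighted_upslope_area) > z_threshold


# Hypothetical weighted upslope-area values for five cells.
area = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
scores = standard_score(area)
mask = channel_mask(area, z_threshold=1.0)
```

Expressing the threshold in standard deviations rather than raw area units is what lets the same rule be applied to maps with different value ranges.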

Journal Article•10.1186/1471-2105-12-405•

An integrated workflow for robust alignment and simplified quantitative analysis of NMR spectrometry data.

Trung Nghia Vu¹, Dirk Valkenborg²,³, Koen Smets¹, Kim A. Verwaest¹, Roger Dommisse¹, Filip Lemière¹, Alain Verschoren¹, Bart Goethals¹, Kris Laukens¹•Institutions (3)

University of Antwerp¹, Flemish Institute for Technological Research², University of Hasselt³

20 Oct 2011-BMC Bioinformatics

TL;DR: A novel suite of informatics tools for the quantitative analysis of NMR metabolomic profile data is introduced, embedded into a modular and statistically sound framework that is implemented as an R package called "speaq" ("spectrum alignment and quantitation").

Abstract: Nuclear magnetic resonance spectroscopy (NMR) is a powerful technique to reveal and compare quantitative metabolic profiles of biological tissues. However, chemical and physical sample variations make the analysis of the data challenging, and typically require the application of a number of preprocessing steps prior to data interpretation. For example, noise reduction, normalization, baseline correction, peak picking, spectrum alignment and statistical analysis are indispensable components in any NMR analysis pipeline. We introduce a novel suite of informatics tools for the quantitative analysis of NMR metabolomic profile data. The core of the processing cascade is a novel peak alignment algorithm, called hierarchical Cluster-based Peak Alignment (CluPA). The algorithm aligns a target spectrum to the reference spectrum in a top-down fashion by building a hierarchical cluster tree from peak lists of reference and target spectra and then dividing the spectra into smaller segments based on the most distant clusters of the tree. To reduce the computational time to estimate the spectral misalignment, the method makes use of Fast Fourier Transformation (FFT) cross-correlation. Since the method returns a high-quality alignment, we can propose a simple methodology to study the variability of the NMR spectra. For each aligned NMR data point the ratio of the between-group and within-group sum of squares (BW-ratio) is calculated to quantify the difference in variability between and within predefined groups of NMR spectra. This differential analysis is related to the calculation of the F-statistic or a one-way ANOVA, but without distributional assumptions. Statistical inference based on the BW-ratio is achieved by bootstrapping the null distribution from the experimental data. The workflow performance was evaluated using a previously published dataset. 
Correlation maps, spectral and grey scale plots show clear improvements in comparison to other methods, and the down-to-earth quantitative analysis works well for the CluPA-aligned spectra. The whole workflow is embedded into a modular and statistically sound framework that is implemented as an R package called "speaq" ("spectrum alignment and quantitation"), which is freely available from http://code.google.com/p/speaq/.
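The BW-ratio described in the abstract can be sketched as below, assuming the usual definitions of between-group and within-group sums of squares computed independently at each aligned data point. The function name and array layout are illustrative, this is not the speaq implementation, and the bootstrap step used for inference is left out.

```python
import numpy as np


def bw_ratio(spectra, groups):
    """Per-point ratio of between-group to within-group sum of squares.

    spectra: array of shape (n_spectra, n_points), already aligned.
    groups:  one group label per spectrum.
    Points with zero within-group variation give inf or nan.
    """
    spectra = np.asarray(spectra, dtype=float)
    groups = np.asarray(groups)
    overall_mean = spectra.mean(axis=0)
    between = np.zeros(spectra.shape[1])
    within = np.zeros(spectra.shape[1])
    for label in np.unique(groups):
        members = spectra[groups == label]
        group_mean = members.mean(axis=0)
        # Between: group size times squared distance of group mean from overall mean.
        between += members.shape[0] * (group_mean - overall_mean) ** 2
        # Within: squared deviations of each spectrum from its own group mean.
        within += ((members - group_mean) ** 2).sum(axis=0)
    return between / within


# Hypothetical data: four spectra, two data points, two groups.
toy_spectra = [[1.0, 1.0], [3.0, 2.0], [5.0, 1.0], [7.0, 2.0]]
toy_groups = [0, 0, 1, 1]
toy_ratio = bw_ratio(toy_spectra, toy_groups)
```

In the toy data the first point separates the groups (ratio 4.0) while the second does not (ratio 0.0), since both groups share the same mean there.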

Journal Article•10.1007/S13361-011-0237-2•

Evaluation of normalization methods on GeLC-MS/MS label-free spectral counting data to correct for variation during proteomic workflows.

Emine Gokce¹, Christopher M. Shuford¹, William L. Franck¹, Ralph A. Dean¹, David C. Muddiman¹•Institutions (1)

North Carolina State University¹

24 Sep 2011-Journal of the American Society for Mass Spectrometry

TL;DR: The correlation between SpCs of the same proteins across the different data sets was investigated and it was reported that TSpC normalization and NSAF normalization yielded almost ideal slopes of unity for normalized SpC versus average normalized SpCs, while NSP did not afford effective corrections of the unnormalized data.
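Two of the normalizations named above can be sketched briefly. NSAF is shown in its standard form, each protein's spectral count divided by its length and then by the sum of those quotients over all proteins in the run. The total-count version, which divides each count by the run total, is only an assumed form of TSpC normalization and may differ in scaling from what the article used. Function names are hypothetical.

```python
import numpy as np


def nsaf(spectral_counts, protein_lengths):
    """Normalized spectral abundance factor for the proteins in one run."""
    counts = np.asarray(spectral_counts, dtype=float)
    lengths = np.asarray(protein_lengths, dtype=float)
    saf = counts / lengths   # spectral abundance factor: SpC / length
    return saf / saf.sum()   # normalize so the run sums to 1


def total_count_normalize(spectral_counts):
    """Assumed TSpC-style normalization: each count as a fraction of the run total."""
    counts = np.asarray(spectral_counts, dtype=float)
    return counts / counts.sum()


# Hypothetical run with three proteins.
toy_counts = [10, 20, 30]
toy_lengths = [100, 100, 300]
toy_nsaf = nsaf(toy_counts, toy_lengths)
toy_fraction = total_count_normalize(toy_counts)
```

The third protein has the most spectra but is three times longer, so its NSAF equals that of the first protein (0.25 each), which is the length correction at work.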

Journal Article•10.1039/C1JA10194C•

A simplified spectrum standardization method for laser-induced breakdown spectroscopy measurements

Lizhi Li¹, Zhe Wang¹, Tingbi Yuan¹, Zongyu Hou¹, Zheng Li¹, Weidou Ni¹•Institutions (1)

Tsinghua University¹

01 Nov 2011-Journal of Analytical Atomic Spectrometry

TL;DR: In this paper, the Taylor expansion was applied near the standard plasma condition to obtain the standard state value of the characteristic line intensity from theory, and the results showed that measurement precision and accuracy can be greatly improved by the application of this normalization method in measuring the Cu concentration for 29 brass alloy samples.

Abstract: Relatively high uncertainty (or low repeatability) is one of the main bottlenecks for wide application of LIBS quantitative measurements. The change of plasma temperature and electron number density from pulse to pulse weakens the correlation between the ablation mass and total or part of the spectral area for the same sample, making the normally applied normalization method not effective enough for uncertainty reduction. In the present work, it was assumed that there exists a standard state for samples with similar matrix, with a standard plasma temperature, electron number density, and total number density of the element of interest. Therefore, Taylor expansion can be applied near the standard plasma condition to obtain the standard state value of the characteristic line intensity from theory. The temperature variation was regarded to be proportional to the variation of the logarithm of the ratio of two spectral line intensities of the element of interest, the variation of electron number density was regarded to be proportional to the variation of the full width at half maximum (FWHM), and the variation of total number density was regarded to be proportional to the variation of the sum of the multiple spectral line intensities of the measured element. Based on these assumptions, the calibration model was established. The results show that measurement precision and accuracy can be greatly improved by the application of this normalization method in measuring the Cu concentration for 29 brass alloy samples. The average relative standard deviation (RSD) value, the coefficient of determination (R²), the root mean square error of prediction (RMSEP), and average value of the maximum relative error were 2.92%, 0.99, 1.46%, and 8.42%, respectively, while the values for normalization with the whole spectrum area were 8.61%, 0.95, 3.28%, and 29.19%, respectively, showing significant improvement.

Proceedings Article•

Minimum Probability Flow Learning

Jascha Sohl-Dickstein¹, Peter Battaglino¹, Michael R. DeWeese¹•Institutions (1)

University of California, Berkeley¹

28 Jun 2011

TL;DR: In this paper, the authors propose a new parameter estimation technique that does not require computing an intractable normalization factor or sampling from the equilibrium distribution of the model, which is achieved by establishing dynamics that would transform the observed data distribution into the model distribution, and then setting as the objective the minimization of the KL divergence between the data distribution and the distribution produced by running the dynamics for an infinitesimal time.

Abstract: Fitting probabilistic models to data is often difficult, due to the general intractability of the partition function and its derivatives. Here we propose a new parameter estimation technique that does not require computing an intractable normalization factor or sampling from the equilibrium distribution of the model. This is achieved by establishing dynamics that would transform the observed data distribution into the model distribution, and then setting as the objective the minimization of the KL divergence between the data distribution and the distribution produced by running the dynamics for an infinitesimal time. Score matching, minimum velocity learning, and certain forms of contrastive divergence are shown to be special cases of this learning technique. We demonstrate parameter estimation in Ising models, deep belief networks and an independent component analysis model of natural scenes. In the Ising model case, current state of the art techniques are outperformed by at least an order of magnitude in learning time, with lower error in recovered coupling parameters.

Journal Article•10.1007/S10107-009-0300-Y•

On the separation of disjunctive cuts

Matteo Fischetti¹, Andrea Lodi², Andrea Tramontani²•Institutions (2)

University of Padua¹, University of Bologna²

01 Jun 2011-Mathematical Programming

TL;DR: This paper investigates the main ingredients of a disjunctive cut separation procedure, and analyzes their impact on the quality of the root-node bound for a set of instances taken from the MIPLIB library.

Abstract: Disjunctive cuts for Mixed-Integer Linear Programs (MIPs) were introduced by Egon Balas in the late 1970s and have been successfully exploited in practice since the late 1990s. In this paper we investigate the main ingredients of a disjunctive cut separation procedure, and analyze their impact on the quality of the root-node bound for a set of instances taken from the MIPLIB library. We compare alternative normalization conditions, and try to better understand their role. In particular, we point out that constraints that become redundant (because of the disjunction used) can produce over-weak cuts, and analyze this property with respect to the normalization used. Finally, we introduce a new normalization condition and analyze its theoretical properties and computational behavior. Along the way, we make use of a number of small numerical examples to illustrate some basic (and often misinterpreted) disjunctive programming features.

Journal Article•10.1016/J.BIOPSYCH.2010.05.023•

Changed relative to what? Housekeeping genes and normalization strategies in human brain gene expression studies

Elizabeth M. Tunbridge¹, Sharon L. Eastwood¹, Paul Harrison¹•Institutions (1)

University of Oxford¹

15 Jan 2011-Biological Psychiatry

TL;DR: The rationales for normalization are reviewed and it is argued that in well-conducted psychiatric gene expression studies using human brain tissue, it is reducing intersubject variability rather than experimental error that is the major benefit of normalization.
