Scispace (Formerly Typeset)
Showing papers on "Normalization (statistics)" published in 2011
Proceedings Article•
Analysis of i-vector Length Normalization in Speaker Recognition Systems.


Daniel Garcia-Romero1, Carol Y. Espy-Wilson1•
University of Maryland, College Park1
1 Jan 2011
TL;DR: The proposed approach deals with the non-Gaussian behavior of i-vectors by performing a simple length normalization, which allows the use of probabilistic models with Gaussian assumptions that yield equivalent performance to that of more complicated systems based on Heavy-Tailed assumptions.
Abstract: We present a method to boost the performance of probabilistic generative models that work with i-vector representations. The proposed approach deals with the non-Gaussian behavior of i-vectors by performing a simple length normalization. This non-linear transformation allows the use of probabilistic models with Gaussian assumptions that yield equivalent performance to that of more complicated systems based on Heavy-Tailed assumptions. Significant performance improvements are demonstrated on the telephone portion of NIST SRE 2010.

1,182 citations
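The abstract above describes length normalization only in words. As a minimal sketch, assuming it means scaling each vector to unit Euclidean length, the operation could look like the following. The function name is hypothetical, and any preprocessing the authors may apply before this step is not shown because the abstract does not describe it.

```python
import math


def length_normalize(vector):
    """Scale a vector so that its Euclidean (L2) norm equals 1."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        # A zero vector has no direction, so it cannot be projected
        # onto the unit sphere.
        raise ValueError("cannot length-normalize a zero vector")
    return [x / norm for x in vector]


if __name__ == "__main__":
    # Toy 2-dimensional stand-in for an i-vector (real ones are much longer).
    print(length_normalize([3.0, 4.0]))
```

After this step every vector lies on the unit sphere, so only its direction carries information.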

Journal Article•10.1186/1471-2105-12-480•
GC-content normalization for RNA-Seq data.


Davide Risso1, Katja Schwartz2, Gavin Sherlock2, Sandrine Dudoit3•
University of Padua1, Stanford University2, University of California, Berkeley3
17 Dec 2011-BMC Bioinformatics
TL;DR: The authors' within-lane normalization procedures, followed by between-lane normalization, reduce GC-content bias and lead to more accurate estimates of expression fold-changes and tests of differential expression.
Abstract: Transcriptome sequencing (RNA-Seq) has become the assay of choice for high-throughput studies of gene expression. However, as is the case with microarrays, major technology-related artifacts and biases affect the resulting expression measures. Normalization is therefore essential to ensure accurate inference of expression levels and subsequent analyses thereof. We focus on biases related to GC-content and demonstrate the existence of strong sample-specific GC-content effects on RNA-Seq read counts, which can substantially bias differential expression analysis. We propose three simple within-lane gene-level GC-content normalization approaches and assess their performance on two different RNA-Seq datasets, involving different species and experimental designs. Our methods are compared to state-of-the-art normalization procedures in terms of bias and mean squared error for expression fold-change estimation and in terms of Type I error and p-value distributions for tests of differential expression. The exploratory data analysis and normalization methods proposed in this article are implemented in the open-source Bioconductor R package EDASeq. Our within-lane normalization procedures, followed by between-lane normalization, reduce GC-content bias and lead to more accurate estimates of expression fold-changes and tests of differential expression. Such results are crucial for the biological interpretation of RNA-Seq experiments, where downstream analyses can be sensitive to the supplied lists of genes.

941 citations

Journal Article•10.1038/NN.2815•
A normalization model of multisensory integration


Tomokazu Ohshiro1, Dora E. Angelaki2, Gregory C. DeAngelis1•
University of Rochester1, Washington University in St. Louis2
01 Jun 2011-Nature Neuroscience
TL;DR: This model, which uses a simple functional operation (normalization) for which there is considerable experimental support, also accounts for the recent observation that the mathematical rule by which multisensory neurons combine their inputs changes with cue reliability.
Abstract: Responses of neurons that integrate multiple sensory inputs are traditionally characterized in terms of a set of empirical principles. However, a simple computational framework that accounts for these empirical features of multisensory integration has not been established. We propose that divisive normalization, acting at the stage of multisensory integration, can account for many of the empirical principles of multisensory integration shown by single neurons, such as the principle of inverse effectiveness and the spatial principle. This model, which uses a simple functional operation (normalization) for which there is considerable experimental support, also accounts for the recent observation that the mathematical rule by which multisensory neurons combine their inputs changes with cue reliability. The normalization model, which makes a strong testable prediction regarding cross-modal suppression, may therefore provide a simple unifying computational account of the important features of multisensory integration by neurons.

343 citations
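The abstract above does not state the model's equations. As a rough illustration only, a generic textbook form of divisive normalization divides each unit's driving input by a constant plus the pooled activity of the whole population. The function name and the parameters `sigma` and `exponent` are assumptions made for this sketch, not values taken from the paper.

```python
def divisive_normalization(drives, sigma=1.0, exponent=2.0):
    """Generic divisive normalization over a population of units.

    Each response is drive**exponent divided by
    (sigma**exponent + mean of all drives**exponent).
    sigma and exponent are illustrative assumptions.
    """
    if not drives:
        raise ValueError("need at least one unit")
    powered = [d ** exponent for d in drives]
    pool = sum(powered) / len(powered)  # shared normalization pool
    return [p / (sigma ** exponent + pool) for p in powered]


if __name__ == "__main__":
    # The first unit's drive is the same in both cases; only its
    # neighbour changes, yet the first unit's response drops.
    print(divisive_normalization([2.0, 0.0]))
    print(divisive_normalization([2.0, 2.0]))
```

Because the pool is shared, a unit's response depends on the activity of the other units, which is the kind of context dependence that normalization models rely on.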

Journal Article•10.1523/JNEUROSCI.1237-11.2011•
Reward Value-Based Gain Control: Divisive Normalization in Parietal Cortex


Kenway Louie1, Lauren E. Grattan2, Paul W. Glimcher•
Center for Neural Science1, New York University2
20 Jul 2011-The Journal of Neuroscience
TL;DR: It is shown that neurons in the monkey lateral intraparietal cortex encode a relative form of saccadic value, explicitly dependent on the values of the other available alternatives, which provides a possible mechanistic basis for behavioral context-dependent violations of rationality.
Abstract: The representation of value is a critical component of decision making. Rational choice theory assumes that options are assigned absolute values, independent of the value or existence of other alternatives. However, context-dependent choice behavior in both animals and humans violates this assumption, suggesting that biological decision processes rely on comparative evaluation. Here we show that neurons in the monkey lateral intraparietal cortex encode a relative form of saccadic value, explicitly dependent on the values of the other available alternatives. Analogous to extra-classical receptive field effects in visual cortex, this relative representation incorporates target values outside the response field and is observed in both stimulus-driven activity and baseline firing rates. This context-dependent modulation is precisely described by divisive normalization, indicating that this standard form of sensory gain control may be a general mechanism of cortical computation. Such normalization in decision circuits effectively implements an adaptive gain control for value coding and provides a possible mechanistic basis for behavioral context-dependent violations of rationality.

333 citations

Posted Content•
Normalized Mutual Information to evaluate overlapping community finding algorithms


Aaron McDaid, Derek Greene, Neil Hurley
11 Oct 2011-arXiv: Physics and Society
TL;DR: The authors demonstrate unintuitive behaviour of a popular measure based on normalized mutual information and show how it can be corrected by using a more conventional normalization.
Abstract: Given the increasing popularity of algorithms for overlapping clustering, in particular in social network analysis, quantitative measures are needed to measure the accuracy of a method. Given a set of true clusters, and the set of clusters found by an algorithm, these sets of clusters must be compared to see how similar or different the sets are. A normalized measure is desirable in many contexts, for example assigning a value of 0 where the two sets are totally dissimilar, and 1 where they are identical. A measure based on normalized mutual information, [1], has recently become popular. We demonstrate unintuitive behaviour of this measure, and show how this can be corrected by using a more conventional normalization. We compare the results to those of other measures, such as the Omega index [2].

332 citations

Journal Article•10.7763/IJCTE.2011.V3.288•
Statistical Normalization and Back Propagation for Classification


T. Jayalakshmi, A. Santhakumaran
01 Jan 2011-International Journal of Computer Theory and Engineering

298 citations

Journal Article•10.1109/LSP.2011.2158998•
Illumination Normalization Based on Weber's Law With Application to Face Recognition


Biao Wang1, Weifeng Li1, Wenming Yang1, Qingmin Liao1•
Tsinghua University1
09 Jun 2011-IEEE Signal Processing Letters
TL;DR: A novel illumination insensitive representation of face images under varying illuminations is exploited via a ratio image, called “Weber-face,” where a ratio between local intensity variation and the background is computed.
Abstract: Weber's law suggests that for a stimulus, the ratio between the smallest perceptual change and the background is a constant, which implies stimuli are perceived not in absolute terms but in relative terms. Inspired from this, we exploit and analyze a novel illumination insensitive representation of face images under varying illuminations via a ratio image, called “Weber-face,” where a ratio between local intensity variation and the background is computed. Experimental results on both CMU-PIE and Yale B face databases show that Weber-face performs better than the existing representative approaches.

273 citations

Journal Article•10.1007/S00216-011-4929-Z•
Normalization in MALDI-TOF imaging datasets of proteins: practical considerations


Sören-Oliver Deininger, Dale S. Cornett1, Rainer Paape, Michael Becker, Charles Pineau2, Sandra Rauser, Axel Walch, Eryk Wolski •
Bruker1, French Institute of Health and Medical Research2
12 Apr 2011-Analytical and Bioanalytical Chemistry
TL;DR: This work investigated whether normalization can be improved if dominant signals are excluded from the calculation, and found two alternatives, normalization on the spectra noise level or on the median of signal intensities in the spectrum, to be significantly more robust against artifact generation.
Abstract: Normalization is critically important for the proper interpretation of matrix-assisted laser desorption/ionization (MALDI) imaging datasets. The effects of the commonly used normalization techniques based on total ion count (TIC) or vector norm normalization are significant, and they are frequently beneficial. In certain cases, however, these normalization algorithms may produce misleading results and possibly lead to wrong conclusions, e.g. regarding potential biomarker distributions. This is typical for tissues in which signals of prominent abundance are present in confined areas, such as insulin in the pancreas or β-amyloid peptides in the brain. In this work, we investigated whether normalization can be improved if dominant signals are excluded from the calculation. Because manual interaction with the data (e.g., defining the abundant signals) is not desired for routine analysis, we investigated two alternatives: normalization on the spectra noise level or on the median of signal intensities in the spectrum. Normalization on the median and the noise level was found to be significantly more robust against artifact generation compared to normalization on the TIC. Therefore, we propose to include these normalization methods in the standard “toolbox” of MALDI imaging for reliable results under conditions of automation.

225 citations
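The contrast between TIC and median normalization in the abstract above can be sketched on toy data. This is an illustration under simplifying assumptions: a spectrum is a plain list of peak intensities, the function names are hypothetical, and the intensity values are made up for the example rather than taken from the paper.

```python
import statistics


def normalize_tic(spectrum):
    """Divide every intensity by the total ion count (sum of intensities)."""
    total = sum(spectrum)
    if total <= 0:
        raise ValueError("total ion count must be positive")
    return [x / total for x in spectrum]


def normalize_median(spectrum):
    """Divide every intensity by the median intensity of the spectrum."""
    med = statistics.median(spectrum)
    if med <= 0:
        raise ValueError("median intensity must be positive")
    return [x / med for x in spectrum]


if __name__ == "__main__":
    # Two made-up pixels with identical background peaks; the second
    # also contains one dominant signal.
    plain = [10.0, 10.0, 10.0, 10.0, 10.0]
    dominated = [10.0, 10.0, 10.0, 10.0, 1000.0]
    print(normalize_tic(plain)[0], normalize_tic(dominated)[0])
    print(normalize_median(plain)[0], normalize_median(dominated)[0])
```

In this toy case the dominant peak inflates the TIC, so the identical background peak appears about twenty times weaker after TIC normalization, while the median is unaffected and the background peak keeps the same normalized value in both pixels.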

Journal Article•10.1016/J.PROENV.2011.12.040•
A method of SVM with Normalization in Intrusion Detection


Weijun Li1, Zhenyu Liu2•
Guangdong University of Technology1, South China University of Technology2
01 Jan 2011-Procedia environmental sciences
TL;DR: Experimental results show that SVM with normalization performs much better than SVM without normalization in classifying the KDD99 intrusion data, and that Min-Max Normalization performs better in speed, cross-validation accuracy, and number of support vectors than the other normalization methods.
Abstract: Network intrusions are always hidden in a mass of routine data, and the differences between these data are very large. Normalization can help to speed up the learning phase and avoid numerical problems such as precision loss from arithmetic overflows. Several normalization methods are analyzed and simulated. Experimental results show that SVM with normalization performs much better than SVM without normalization in classifying the KDD99 intrusion data, and that Min-Max Normalization performs better in speed, cross-validation accuracy, and number of support vectors than the other normalization methods.

205 citations
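Min-Max normalization, named in the abstract above, is commonly defined as a linear rescaling of each feature to a fixed target range. The sketch below uses that common definition; the function name, the default range of 0 to 1, and the handling of constant features are assumptions for illustration, not details from the paper.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale one feature column to the range [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to new_min
        # (an arbitrary choice made for this sketch).
        return [new_min for _ in values]
    span = new_max - new_min
    return [new_min + (v - lo) / (hi - lo) * span for v in values]


if __name__ == "__main__":
    # Made-up feature column with a wide range, e.g. byte counts.
    print(min_max_normalize([0.0, 50.0, 100000.0]))
```

Rescaling each feature this way keeps features with very large raw values from dominating distance or kernel computations.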

Proceedings Article•
An Investigation of Depressed Speech Detection: Features and Normalization.


Nicholas Cummins1, Julien Epps1, Michael Breakspear1, Roland Goecke•
University of New South Wales1
27 Aug 2011
TL;DR: Open questions, including how speech segments should be selected, what features provide good discrimination, and what benefits feature normalization might bring given the speaker-specific nature of mental disorders, are addressed empirically using classifier configurations employed in emotion recognition from speech.
Abstract: In recent years, the problem of automatic detection of mental illness from the speech signal has gained some initial interest; however, remaining questions include how speech segments should be selected, what features provide good discrimination, and what benefits feature normalization might bring given the speaker-specific nature of mental disorders. In this paper, these questions are addressed empirically using classifier configurations employed in emotion recognition from speech, evaluated on a 47-speaker depressed/neutral read sentence speech database. Results demonstrate that (1) detailed spectral features are well suited to the task, (2) speaker normalization provides benefits mainly for less detailed features, and (3) dynamic information appears to provide little benefit. Classification accuracy using a combination of MFCC and formant based features approached 80% for this database.

197 citations

Proceedings Article•10.1145/2063576.2063584•
Lower-bounding term frequency normalization


Yuanhua Lv1, ChengXiang Zhai1•
University of Illinois at Urbana–Champaign1
24 Oct 2011
TL;DR: This paper proposes a general and efficient method to introduce a sufficiently large lower bound for TF normalization which can be shown analytically to fix or alleviate the problem of very long documents being overly penalized.
Abstract: In this paper, we reveal a common deficiency of the current retrieval models: the component of term frequency (TF) normalization by document length is not lower-bounded properly; as a result, very long documents tend to be overly penalized. In order to analytically diagnose this problem, we propose two desirable formal constraints to capture the heuristic of lower-bounding TF, and use constraint analysis to examine several representative retrieval functions. Analysis results show that all these retrieval functions can only satisfy the constraints for a certain range of parameter values and/or for a particular set of query terms. Empirical results further show that the retrieval performance tends to be poor when the parameter is out of the range or the query term is not in the particular set. To solve this common problem, we propose a general and efficient method to introduce a sufficiently large lower bound for TF normalization which can be shown analytically to fix or alleviate the problem. Our experimental results demonstrate that the proposed method, incurring almost no additional computational cost, can be applied to state-of-the-art retrieval functions, such as Okapi BM25, language models, and the divergence from randomness approach, to significantly improve the average precision, especially for verbose queries.
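The abstract above does not give the formula for the lower bound. One way such a bound could look, assuming the standard Okapi BM25 term-frequency component and a constant `delta` added for any term that occurs in the document, is sketched below. The function names and the values of `k1`, `b`, and `delta` are assumptions for illustration, and the IDF factor is omitted.

```python
def bm25_tf(tf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Standard BM25 term-frequency component (IDF factor omitted)."""
    if tf <= 0:
        return 0.0
    length_norm = 1.0 - b + b * doc_len / avg_doc_len
    return (k1 + 1.0) * tf / (k1 * length_norm + tf)


def lower_bounded_tf(tf, doc_len, avg_doc_len, k1=1.2, b=0.75, delta=1.0):
    """BM25 component plus a constant floor for terms that are present.

    delta is an assumed constant; it guarantees that a matching term
    contributes at least delta however long the document is.
    """
    if tf <= 0:
        return 0.0
    return bm25_tf(tf, doc_len, avg_doc_len, k1, b) + delta


if __name__ == "__main__":
    # A single occurrence in an extremely long document (made-up lengths).
    print(bm25_tf(1, 1e9, 100.0))
    print(lower_bounded_tf(1, 1e9, 100.0))
```

Without the floor, the score of a matching term approaches zero as the document grows, so a very long document containing the term can score almost the same as one that lacks it. With the floor, the gap between "present" and "absent" never falls below `delta`.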
Journal Article•10.1371/JOURNAL.PONE.0017762•
Robust RT-qPCR Data Normalization: Validation and Selection of Internal Reference Genes during Post-Experimental Data Analysis


Daijun Ling1, Paul M. Salvaterra1•
Beckman Research Institute1
15 Mar 2011-PLOS ONE
TL;DR: This work measured the expression of 20 candidate reference genes and 7 target genes in 15 Drosophila head cDNA samples using RT-qPCR to establish a method for determination of the most stable normalizing factor (NF) across samples for robust data normalization.
Abstract: Reverse transcription and real-time PCR (RT-qPCR) has been widely used for rapid quantification of relative gene expression. To offset technical confounding variations, stably-expressed internal reference genes are measured simultaneously along with target genes for data normalization. Statistical methods have been developed for reference validation; however, normalization of RT-qPCR data still remains arbitrary due to pre-experimental determination of particular reference genes. To establish a method for determination of the most stable normalizing factor (NF) across samples for robust data normalization, we measured the expression of 20 candidate reference genes and 7 target genes in 15 Drosophila head cDNA samples using RT-qPCR. The 20 reference genes exhibit sample-specific variation in their expression stability. Unexpectedly, the NF variation across samples does not exhibit a continuous decrease with pairwise inclusion of more reference genes, suggesting that either too few or too many reference genes may be detrimental to the robustness of data normalization. The optimal number of reference genes predicted by the minimal and most stable NF variation differs greatly, from 1 to more than 10, based on particular sample sets. We also found that GstD1, InR and Hsp70 expression exhibits an age-dependent increase in fly heads; however, their relative expression levels are significantly affected by NF using different numbers of reference genes. Because they are highly dependent on the actual data, RT-qPCR reference genes thus have to be validated and selected at the post-experimental data analysis stage rather than by pre-experimental determination.
Journal Article•10.1016/J.CSDA.2010.12.012•
Hyper least squares fitting of circles and ellipses


Kenichi Kanatani1, Prasanna Rangarajan2•
Okayama University1, Southern Methodist University2
01 Jun 2011-Computational Statistics & Data Analysis
TL;DR: This work extends the circle fitting method of Rangarajan and Kanatani (2009) to accommodate ellipse fitting and relies on algebraic distance minimization with a carefully chosen scale normalization to derive an estimator far superior to the standard LS and slightly better than the Taubin estimator.
Journal Article•10.1186/1471-2105-12-250•
NORMA-Gene: a simple and robust method for qPCR normalization based on target gene data.


Lars-Henrik Heckmann1, Peter Sørensen1, Paul Henning Krogh1, Jesper Givskov Sørensen1•
Aarhus University1
21 Jun 2011-BMC Bioinformatics
TL;DR: The NORMA-Gene algorithm is presented, which is based on a data-driven normalization and is useful for as few as five target genes comprising the data-set, allowing researchers to focus their efforts on studying target genes of biological relevance.
Abstract: Normalization of target gene expression, measured by real-time quantitative PCR (qPCR), is a requirement for reducing experimental bias and thereby improving data quality. The currently used normalization approach is based on using one or more reference genes. Yet, this approach extends the experimental work load and suffers from assumptions that may be difficult to meet and to validate. We developed a data driven normalization algorithm (NORMA-Gene). An analysis of the performance of NORMA-Gene compared to reference gene normalization on artificially generated data-sets showed that the NORMA-Gene normalization yielded more precise results under a large range of parameters tested. Furthermore, when tested on three very different real qPCR data-sets, NORMA-Gene was shown to be best at reducing variance due to experimental bias in all three data-sets compared to normalization based on the use of reference gene(s). Here we present the NORMA-Gene algorithm that is applicable to all biological and biomedical qPCR studies, especially those that are based on a limited number of assayed genes. The method is based on a data-driven normalization and is useful for as few as five target genes comprising the data-set. NORMA-Gene does not require the identification and validation of reference genes, allowing researchers to focus their efforts on studying target genes of biological relevance.
Journal Article•10.1186/1755-8794-4-84•
Batch effect correction for genome-wide methylation data with Illumina Infinium platform


Zhifu Sun1, High Seng Chai1, Yanhong Wu1, Wendy M. White1, Krishna Vanaja Donkena1, Christopher J. Klein1, Vesna D. Garovic1, Terry M. Therneau1, Jean-Pierre A. Kocher1 •
Mayo Clinic1
16 Dec 2011-BMC Medical Genomics
TL;DR: Genome-wide methylation data from Infinium Methylation BeadChip can be susceptible to batch effects with profound impacts on downstream analyses and conclusions, and EB correction along with normalization is recommended for effective batch effect removal.
Abstract: Genome-wide methylation profiling has led to more comprehensive insights into gene regulation mechanisms and potential therapeutic targets. Illumina Human Methylation BeadChip is one of the most commonly used genome-wide methylation platforms. Similar to other microarray experiments, methylation data is susceptible to various technical artifacts, particularly batch effects. To date, little attention has been given to issues related to normalization and batch effect correction for this kind of data. We evaluated three common normalization approaches and investigated their performance in batch effect removal using three datasets with different degrees of batch effects generated from HumanMethylation27 platform: quantile normalization at average β value (QNβ); two step quantile normalization at probe signals implemented in "lumi" package of R (lumi); and quantile normalization of A and B signal separately (ABnorm). Subsequent Empirical Bayes (EB) batch adjustment was also evaluated. Each normalization could remove a portion of batch effects and their effectiveness differed depending on the severity of batch effects in a dataset. For the dataset with minor batch effects (Dataset 1), normalization alone appeared adequate and "lumi" showed the best performance. However, all methods left substantial batch effects intact in the datasets with obvious batch effects and further correction was necessary. Without any correction, 50 and 66 percent of CpGs were associated with batch effects in Dataset 2 and 3, respectively. After QNβ, lumi or ABnorm, the proportion of CpGs associated with batch effects was reduced to 24, 32, and 26 percent for Dataset 2; and 37, 46, and 35 percent for Dataset 3, respectively. Additional EB correction effectively removed such remaining non-biological effects. More importantly, the two-step procedure almost tripled the numbers of CpGs associated with the outcome of interest for the two datasets.
Genome-wide methylation data from Infinium Methylation BeadChip can be susceptible to batch effects with profound impacts on downstream analyses and conclusions. Normalization can reduce some but not all batch effects. EB correction along with normalization is recommended for effective batch effect removal.
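All three approaches compared in the abstract above build on quantile normalization. The sketch below shows only the generic procedure (replace each value by the mean of the values that share its rank across samples), not QNβ, lumi, or ABnorm specifically. The function name, the column-per-sample layout, and the tie handling (ties broken by position) are assumptions for illustration.

```python
def quantile_normalize(columns):
    """Generic quantile normalization of equally long sample columns.

    Each value is replaced by the mean, across samples, of the values
    holding the same rank, so all samples end up with the same
    distribution. Ties are broken by position (a simplification).
    """
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all columns must have the same length")
    sorted_cols = [sorted(col) for col in columns]
    # Mean of the i-th smallest value across all samples.
    rank_means = [sum(col[i] for col in sorted_cols) / len(columns)
                  for i in range(n)]
    result = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])  # indices by rank
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        result.append(out)
    return result


if __name__ == "__main__":
    # Two made-up samples measured on three probes.
    print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
```

After normalization both samples contain exactly the same set of values, and each probe keeps its within-sample rank.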
Journal Article•10.1186/1465-6906-12-S1-P17•
Metastats: an improved statistical method for analysis of metagenomic data


Joseph N. Paulson1, Mihai Pop1, Héctor Corrada Bravo1•
University of Maryland, College Park1
19 Sep 2011-Genome Biology
TL;DR: New approaches for data normalization that allow a more accurate assessment of differential abundance by reducing the covariance between individual features implicitly introduced by the traditionally used ratio-based normalization are described.
Abstract: Metagenomic studies were originally focused on exploratory/validation projects but are rapidly being applied in a clinical setting. In this setting, researchers are interested in finding characteristics of the microbiome that correlate with the clinical status of the corresponding sample. Comparatively few computational/statistical tools have been developed that can assist in this process. Rather, most developments in the metagenomics community have focused on methods that compare samples as a whole. Specifically, the focus has been on developing robust methods for determining the level of similarity or difference between samples, rather than on identifying the specific characteristics that distinguish different samples from each other. Metastats [1] was the first statistical method developed specifically to address the questions asked in clinical studies. Metastats allows a comparison of metagenomic samples (represented as counts of individual features such as organisms, genes and functional groups) from two treatment populations (for example, healthy versus disease) and identifies those features that statistically distinguish the two populations. Here, we present major improvements to the Metastats software and the underlying statistical methods. First, we describe new approaches for data normalization that allow a more accurate assessment of differential abundance by reducing the covariance between individual features implicitly introduced by the traditionally used ratio-based normalization. These normalization techniques are also of interest for time-series analyses or in the estimation of microbial networks. A second extension of Metastats is a mixed-model zero-inflated Gaussian distribution that allows Metastats to account for a common characteristic of metagenomic data: the presence of many features with zero counts owing to undersampling of the community. 
The number of ‘missing features’ (zero counts) correlates with the amount of sequencing performed, thereby biasing abundance measurements and the differential abundance statistics derived from them. Using simulated and real data, we show that these methods significantly improve the accuracy of Metastats. We also describe the addition of several new statistical tests to our code (including presence/absence and the corresponding odds ratio, and penetrance calculations) that improve the usability of our software in clinical practice.
Journal Article•10.1128/AEM.05491-11•
Evaluation of subsampling-based normalization strategies for tagged high-throughput sequencing data sets from gut microbiomes


Daniel Aguirre de Cárcer1, Stuart E. Denman1, Christopher S. McSweeney1, Mark Morrison1•
Commonwealth Scientific and Industrial Research Organisation1
15 Dec 2011-Applied and Environmental Microbiology
TL;DR: Several subsampling-based normalization strategies were applied to different high-throughput sequencing data sets originating from human and murine gut environments, and their effects on the data sets' characteristics and normalization efficiencies, as measured by several β-diversity metrics, were compared.
Abstract: Several subsampling-based normalization strategies were applied to different high-throughput sequencing data sets originating from human and murine gut environments. Their effects on the data sets' characteristics and normalization efficiencies, as measured by several β-diversity metrics, were compared. For both data sets, subsampling to the median rather than the minimum number appeared to improve the analysis.
Journal Article•10.1186/1471-2105-12-467•
Empirical comparison of cross-platform normalization methods for gene expression data


Jason Rudy1, Faramarz Valafar1•
San Diego State University1
07 Dec 2011-BMC Bioinformatics
TL;DR: Of the four successful methods, XPN generally shows the highest inter-platform concordance when treatment groups are equally sized, while DWD is most robust to differently sized treatment groups and consistently shows the smallest loss in gene detection.
Abstract: Simultaneous measurement of gene expression on a genomic scale can be accomplished using microarray technology or by sequencing based methods. Researchers who perform high throughput gene expression assays often deposit their data in public databases, but heterogeneity of measurement platforms leads to challenges for the combination and comparison of data sets. Researchers wishing to perform cross platform normalization face two major obstacles. First, a choice must be made about which method or methods to employ. Nine are currently available, and no rigorous comparison exists. Second, software for the selected method must be obtained and incorporated into a data analysis workflow.
Journal Article•10.1002/ASI.21424•
The source normalized impact per paper is a valid and sophisticated indicator of journal citation impact


Henk F. Moed1•
Elsevier1
01 Jan 2011-Journal of the Association for Information Science and Technology
TL;DR: This article is a reply to the article "Scopus's Source Normalized Impact per Paper (SNIP) versus a Journal Impact Factor based on Fractional Counting of Citations", published by Loet Leydesdorff and Tobias Opthof (arXiv:1004.3580v2 [cs.DL]).
Abstract: This paper is a reply to the article "Scopus's Source Normalized Impact per Paper (SNIP) versus a Journal Impact Factor based on Fractional Counting of Citations", published by Loet Leydesdorff and Tobias Opthof (arXiv:1004.3580v2 [cs.DL]). It clarifies the relationship between SNIP and Elsevier's Scopus. Since Leydesdorff and Opthof's description of SNIP is not complete, it indicates four key differences between SNIP and the indicator proposed by the two authors, and argues why the former is more valid than the latter. Nevertheless, the idea of fractional citation counting deserves further exploration. The paper discusses difficulties that arise if one attempts to apply this principle at the level of individual (citing) papers.
Journal Article•10.1002/PMIC.201100078•
A Statistical Selection Strategy for Normalization Procedures in LC-MS Proteomics Experiments through Dataset Dependent Ranking of Normalization Scaling Factors


Bobbie-Jo M. Webb-Robertson1, Melissa M. Matzke1, Jon M. Jacobs1, Joel G. Pounds1, Katrina M. Waters1 •
Pacific Northwest National Laboratory1
01 Dec 2011-Proteomics
TL;DR: A novel approach is presented to evaluate normalization strategies, which includes the peptide selection component associated with the derivation of normalization values, which improves the structure of the data without introducing bias into the normalized peak intensities.
Abstract: Quantification of LC-MS peak intensities assigned during peptide identification in a typical comparative proteomics experiment will deviate from run-to-run of the instrument due to both technical and biological variation. Thus, normalization of peak intensities across an LC-MS proteomics dataset is a fundamental step in pre-processing. However, the downstream analysis of LC-MS proteomics data can be dramatically affected by the normalization method selected. Current normalization procedures for LC-MS proteomics data are presented in the context of normalization values derived from subsets of the full collection of identified peptides. The distribution of these normalization values is unknown a priori. If they are not independent from the biological factors associated with the experiment the normalization process can introduce bias into the data, possibly affecting downstream statistical biomarker discovery. We present a novel approach to evaluate normalization strategies, which includes the peptide selection component associated with the derivation of normalization values. Our approach evaluates the effect of normalization on the between-group variance structure in order to identify the most appropriate normalization methods that improve the structure of the data without introducing bias into the normalized peak intensities.
Journal Article•10.1016/J.ISPRSJPRS.2011.03.003•
Pre-processing of a sample of multi-scene and multi-date Landsat imagery used to monitor forest cover changes over the tropics

[...]

Catherine Bodart, Hugh Eva, René Beuchle, Rastislav Raši, Dario Simonetti, Hans-Jürgen Stibig, Andreas Brink, Erik Lindquist1, Frédéric Achard •
Food and Agriculture Organization1
01 Sep 2011-ISPRS Journal of Photogrammetry and Remote Sensing
TL;DR: The TREES-3 project processed more than 12,000 Landsat TM and ETM+ data subsets systematically distributed over the tropics; the results show that the haze correction algorithm improved the visual appearance of the images and significantly corrected the digital numbers of the Landsat visible bands, especially the red band.
Abstract: In support of the Remote Sensing Survey of the global Forest Resource Assessment 2010, the TREES-3 project has processed more than 12,000 Landsat TM and ETM+ data subsets systematically distributed over the tropics. The project aims at deriving area estimates of tropical forest cover change for the periods 1990–2000–2005. The paper presents the pre-processing steps applied in an operational and robust manner to this large amount of multi-date and multi-scene imagery: conversion to top-of-atmosphere reflectance, cloud and cloud shadow detection, haze correction and image radiometric normalization. The results show that the haze correction algorithm has improved the visual appearance of the image and significantly corrected the digital numbers for Landsat visible bands, especially the red band. The impact of the normalization procedures (forest normalization and relative normalization) was assessed on 210 image pairs: in all cases the correlation between the spectral values of the same land cover in both images was improved. The developed automatic pre-processing chain provided a consistent multi-temporal data set across the tropics that will constitute the basis for an automatic object-based supervised classification.
Journal Article•10.1002/ASI.21511•
How to evaluate universities in terms of their relative citation impacts: Fractional counting of citations and the normalization of differences among disciplines

[...]

Loet Leydesdorff1, Jung Cheol Shin2•
University of Amsterdam1, Seoul National University2
01 Jun 2011-Journal of the Association for Information Science and Technology
TL;DR: Using publication and citation data of seven Korean research universities, the authors demonstrate the advantages of fractional counting and the resulting differences in the rankings, explain the possible statistics, and suggest ways to visualize the differences in (citing) audiences in terms of a network.
Abstract: Fractional counting of citations can improve on ranking of multidisciplinary research units (such as universities) by normalizing the differences among fields of science in terms of differences in citation behavior. Furthermore, normalization in terms of citing papers abolishes the unsolved questions in scientometrics about the delineation of fields of science in terms of journals and normalization when comparing among different (sets of) journals. Using publication and citation data of seven Korean research universities, we demonstrate the advantages and the differences in the rankings, explain the possible statistics, and suggest ways to visualize the differences in (citing) audiences in terms of a network. © 2011 Wiley Periodicals, Inc.
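Illustrative sketch (not taken from the paper): the abstract does not spell out the counting rule, so the code below assumes the weighting commonly used for fractional counting, in which each citation is worth 1/n, where n is the number of references in the citing paper. The function name and the toy data are hypothetical.

```python
from collections import defaultdict


def fractional_citation_counts(citing_papers):
    """Return {cited_id: fractional citation count}.

    citing_papers maps a citing paper id to the list of ids it references.
    Assumption: each citing paper distributes a total weight of 1 equally
    over its (de-duplicated) reference list.
    """
    counts = defaultdict(float)
    for refs in citing_papers.values():
        unique_refs = set(refs)
        if not unique_refs:
            continue  # a paper without references contributes nothing
        weight = 1.0 / len(unique_refs)
        for cited in unique_refs:
            counts[cited] += weight
    return dict(counts)


# Toy example: a field with short reference lists and one with long lists.
citing = {
    "math_paper": ["A", "B"],
    "biomed_paper": ["A", "C", "D", "E"],
}
counts = fractional_citation_counts(citing)
```

Under this assumption a citation from a paper with a long reference list counts for less than one from a paper with a short list, which is how differences in citation behavior among fields are damped.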
Journal Article•10.1016/J.JBIOMECH.2010.09.015•
Methods to temporally align gait cycle data

[...]

Nathaniel E. Helwig1, Sungjin Hong1, Elizabeth T. Hsiao-Wecksler1, John D. Polk1•
University of Illinois at Urbana–Champaign1
03 Feb 2011-Journal of Biomechanics
TL;DR: It is demonstrated that piecewise temporal alignment techniques outperform other commonly used alignment methods (normalization to percent gait cycle, dynamic time warping, and derivative dynamic time warping) in typical biomechanical and clinical alignment tasks.
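Illustrative sketch (not the authors' implementation): normalization to percent gait cycle is usually a linear resampling of each cycle onto a common 0 to 100% axis, and a piecewise variant additionally pins an intermediate event to a fixed percentage. The function names, the 101-point grid, and the single-landmark form with a default of 60% are assumptions made for this example only.

```python
import numpy as np


def to_percent_gait_cycle(signal, n_points=101):
    """Linearly resample one gait cycle onto 0..100% of the cycle."""
    signal = np.asarray(signal, dtype=float)
    original = np.linspace(0.0, 100.0, num=signal.size)
    target = np.linspace(0.0, 100.0, num=n_points)
    return np.interp(target, original, signal)


def piecewise_percent_gait_cycle(signal, event_index, event_percent=60.0,
                                 n_points=101):
    """Piecewise-linear variant: the sample at event_index is mapped to
    event_percent, and the two segments around it are stretched separately.
    Requires 0 < event_index < len(signal) - 1.
    """
    signal = np.asarray(signal, dtype=float)
    sample_idx = np.arange(signal.size)
    # Warped time axis: 0 -> 0%, event_index -> event_percent, last -> 100%.
    warped = np.interp(sample_idx,
                       [0, event_index, signal.size - 1],
                       [0.0, event_percent, 100.0])
    target = np.linspace(0.0, 100.0, num=n_points)
    return np.interp(target, warped, signal)
```

With plain percent-cycle normalization, only the start and end of each cycle are aligned across trials; the piecewise form also aligns the chosen event.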
Journal Article•10.5194/HESS-15-1387-2011•
An objective approach for feature extraction: distribution analysis and statistical descriptors for scale choice and channel network identification

[...]

Giulia Sofia1, Paolo Tarolli1, Federico Cazorzi2, G. Dalla Fontana1•
University of Padua1, University of Udine2
06 May 2011-Hydrology and Earth System Sciences
TL;DR: The advantage of the proposed methodology, and the efficiency and accurate localization of extracted features are demonstrated using LiDAR data of two different areas and comparing both extractions with field surveyed networks.
Abstract: A statistical approach to LiDAR derived topographic attributes for the automatic extraction of channel network and for the choice of the scale to apply for parameter evaluation is presented in this paper. The basis of this approach is to use distribution analysis and statistical descriptors to identify channels where terrain geometry denotes significant convergences. Two case study areas with different morphology and degree of organization are used with their 1 m LiDAR Digital Terrain Models (DTMs). Topographic attribute maps (curvature and openness) for various window sizes are derived from the DTMs in order to detect surface convergences. A statistical analysis on value distributions considering each window size is carried out for the choice of the optimum kernel. We propose a three-step method to extract the network based (a) on the normalization and overlapping of openness and minimum curvature to highlight the more likely surface convergences, (b) a weighting of the upslope area according to these normalized maps to identify drainage flow paths and flow accumulation consistent with terrain geometry, (c) the standard score normalization of the weighted upslope area and the use of standard score values as non subjective threshold for channel network identification. As a final step for optimal definition and representation of the whole network, a noise-filtering and connection procedure is applied. The advantage of the proposed methodology, and the efficiency and accurate localization of extracted features are demonstrated using LiDAR data of two different areas and comparing both extractions with field surveyed networks.
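Illustrative sketch of step (c) only: standard score (z-score) normalization of a weighted upslope area grid, followed by a threshold on the z values. The abstract does not give the threshold the authors use, so the `threshold` parameter and the function name are hypothetical, and steps (a) and (b) are assumed to have already produced the input grid.

```python
import numpy as np


def zscore_channel_mask(weighted_upslope_area, threshold=1.0):
    """Return a boolean grid marking cells whose standard score exceeds
    `threshold` (a placeholder value, not taken from the paper)."""
    area = np.asarray(weighted_upslope_area, dtype=float)
    std = area.std()
    if std == 0.0:
        # A constant grid has no cell that stands out.
        return np.zeros(area.shape, dtype=bool)
    z = (area - area.mean()) / std
    return z > threshold
```

Because the threshold is expressed in standard deviations of the grid itself, it adapts to each study area instead of relying on a fixed contributing-area value.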
Journal Article•10.1186/1471-2105-12-405•
An integrated workflow for robust alignment and simplified quantitative analysis of NMR spectrometry data.

[...]

Trung Nghia Vu1, Dirk Valkenborg2,3, Koen Smets1, Kim A. Verwaest1, Roger Dommisse1, Filip Lemière1, Alain Verschoren1, Bart Goethals1, Kris Laukens1 •
University of Antwerp1, Flemish Institute for Technological Research2, University of Hasselt3
20 Oct 2011-BMC Bioinformatics
TL;DR: A novel suite of informatics tools for the quantitative analysis of NMR metabolomic profile data is introduced, embedded into a modular and statistically sound framework that is implemented as an R package called "speaq" ("spectrum alignment and quantitation").
Abstract: Nuclear magnetic resonance spectroscopy (NMR) is a powerful technique to reveal and compare quantitative metabolic profiles of biological tissues. However, chemical and physical sample variations make the analysis of the data challenging, and typically require the application of a number of preprocessing steps prior to data interpretation. For example, noise reduction, normalization, baseline correction, peak picking, spectrum alignment and statistical analysis are indispensable components in any NMR analysis pipeline. We introduce a novel suite of informatics tools for the quantitative analysis of NMR metabolomic profile data. The core of the processing cascade is a novel peak alignment algorithm, called hierarchical Cluster-based Peak Alignment (CluPA). The algorithm aligns a target spectrum to the reference spectrum in a top-down fashion by building a hierarchical cluster tree from peak lists of reference and target spectra and then dividing the spectra into smaller segments based on the most distant clusters of the tree. To reduce the computational time to estimate the spectral misalignment, the method makes use of Fast Fourier Transformation (FFT) cross-correlation. Since the method returns a high-quality alignment, we can propose a simple methodology to study the variability of the NMR spectra. For each aligned NMR data point the ratio of the between-group and within-group sum of squares (BW-ratio) is calculated to quantify the difference in variability between and within predefined groups of NMR spectra. This differential analysis is related to the calculation of the F-statistic or a one-way ANOVA, but without distributional assumptions. Statistical inference based on the BW-ratio is achieved by bootstrapping the null distribution from the experimental data. The workflow performance was evaluated using a previously published dataset. 
Correlation maps, spectral and grey scale plots show clear improvements in comparison to other methods, and the down-to-earth quantitative analysis works well for the CluPA-aligned spectra. The whole workflow is embedded into a modular and statistically sound framework that is implemented as an R package called "speaq" ("spectrum alignment and quantitation"), which is freely available from http://code.google.com/p/speaq/ .
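Illustrative sketch (not the speaq implementation): the BW-ratio described above can be computed per data point as the between-group sum of squares divided by the within-group sum of squares. The abstract does not detail the bootstrap scheme, so the null distribution below is an assumption: spectra are resampled with replacement from the pooled data while the group labels are kept fixed. Function names and parameters are hypothetical.

```python
import numpy as np


def bw_ratio(spectra, labels):
    """Between-group / within-group sum of squares for each data point.

    spectra: array of shape (n_spectra, n_points); labels: length n_spectra.
    """
    spectra = np.asarray(spectra, dtype=float)
    labels = np.asarray(labels)
    grand_mean = spectra.mean(axis=0)
    between = np.zeros(spectra.shape[1])
    within = np.zeros(spectra.shape[1])
    for group in np.unique(labels):
        members = spectra[labels == group]
        group_mean = members.mean(axis=0)
        between += members.shape[0] * (group_mean - grand_mean) ** 2
        within += ((members - group_mean) ** 2).sum(axis=0)
    # Points with zero within-group variation are reported as infinite.
    return np.divide(between, within,
                     out=np.full_like(between, np.inf), where=within > 0)


def bootstrap_pvalues(spectra, labels, n_boot=500, seed=0):
    """Per-point p-values from a bootstrap null (assumed scheme: pooled
    resampling of spectra with replacement, labels left unchanged)."""
    spectra = np.asarray(spectra, dtype=float)
    rng = np.random.default_rng(seed)
    observed = bw_ratio(spectra, labels)
    n_spectra = spectra.shape[0]
    exceed = np.zeros_like(observed)
    for _ in range(n_boot):
        idx = rng.integers(0, n_spectra, size=n_spectra)
        exceed += bw_ratio(spectra[idx], labels) >= observed
    return (exceed + 1.0) / (n_boot + 1.0)
```

No F-distribution is consulted: significance comes only from how often the resampled ratio reaches the observed one, which matches the abstract's point about avoiding distributional assumptions.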
Journal Article•10.1007/S13361-011-0237-2•
Evaluation of normalization methods on GeLC-MS/MS label-free spectral counting data to correct for variation during proteomic workflows.

[...]

Emine Gokce1, Christopher M. Shuford1, William L. Franck1, Ralph A. Dean1, David C. Muddiman1 •
North Carolina State University1
24 Sep 2011-Journal of the American Society for Mass Spectrometry
TL;DR: The correlation between SpCs of the same proteins across the different data sets was investigated: TSpC normalization and NSAF normalization yielded almost ideal slopes of unity for normalized SpC versus average normalized SpC, while NSP did not effectively correct the unnormalized data.
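Illustrative sketch (the listing gives no formulas, so these are the commonly used definitions rather than the paper's exact procedure): total spectral count (TSpC) normalization rescales every run to a common total, here the mean total across runs, and the normalized spectral abundance factor (NSAF) divides each protein's spectral count by its length and then by the sum of those ratios within the run. Function names and the toy numbers are hypothetical.

```python
import numpy as np


def tspc_normalize(spc):
    """Scale each run (column) so its total spectral count equals the
    mean total across runs. spc has shape (n_proteins, n_runs)."""
    spc = np.asarray(spc, dtype=float)
    totals = spc.sum(axis=0)
    return spc * (totals.mean() / totals)


def nsaf(spc, lengths):
    """NSAF per protein and run: (SpC / L) / sum over proteins of (SpC / L)."""
    spc = np.asarray(spc, dtype=float)
    saf = spc / np.asarray(lengths, dtype=float)[:, None]
    return saf / saf.sum(axis=0)


# Toy data: run 2 has twice the sampling depth of run 1.
spc = np.array([[10, 20], [30, 60], [60, 120]])
lengths = np.array([100, 300, 600])  # protein lengths in residues
tspc_values = tspc_normalize(spc)
nsaf_values = nsaf(spc, lengths)
```

In the toy data both normalizations make the two runs identical, which corresponds to the slope of unity mentioned in the TL;DR.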
Journal Article•10.1039/C1JA10194C•
A simplified spectrum standardization method for laser-induced breakdown spectroscopy measurements

[...]

Lizhi Li1, Zhe Wang1, Tingbi Yuan1, Zongyu Hou1, Zheng Li1, Weidou Ni1 •
Tsinghua University1
01 Nov 2011-Journal of Analytical Atomic Spectrometry
TL;DR: In this paper, the Taylor expansion was applied near the standard plasma condition to obtain the standard state value of the characteristic line intensity from theory, and the results showed that measurement precision and accuracy can be greatly improved by the application of this normalization method in measuring the Cu concentration for 29 brass alloy samples.
Abstract: Relatively high uncertainty (or low repeatability) is one of the main bottlenecks for wide application of LIBS quantitative measurements. The change of plasma temperature and electron number density from pulse to pulse weakens the correlation between the ablation mass and total or part of the spectral area for the same sample, making the normally applied normalization method not effective enough for uncertainty reduction. In the present work, it was assumed that there existed a standard state for samples with similar matrix, where there is a standard plasma temperature, electron number density, and total number density of the element of interest. Therefore, Taylor expansion can be applied near the standard plasma condition to obtain the standard state value of the characteristic line intensity from theory. The temperature variation was regarded to be proportional to the variation of the logarithm of the ratio of two spectral line intensities of the element of interest, the variation of electron number density was regarded to be proportional to the variation of the full width at half maximum (FWHM), and the variation of total number density was regarded to be proportional to the variation of the sum of the multiple spectral line intensities of the measured element. Based on these assumptions, the calibration model was established. The results show that measurement precision and accuracy can be greatly improved by the application of this normalization method in measuring the Cu concentration for 29 brass alloy samples. The average relative standard deviation (RSD) value, the coefficient of determination (R²), the root mean square error of prediction (RMSEP), and average value of the maximum relative error were 2.92%, 0.99, 1.46%, 8.42%, respectively, while the values for normalization with the whole spectrum area were: 8.61%, 0.95, 3.28%, 29.19%, respectively, showing significant improvement.
Proceedings Article•
Minimum Probability Flow Learning

[...]

Jascha Sohl-Dickstein1, Peter Battaglino1, Michael R. DeWeese1•
University of California, Berkeley1
28 Jun 2011
TL;DR: In this paper, the authors propose a new parameter estimation technique that does not require computing an intractable normalization factor or sampling from the equilibrium distribution of the model, which is achieved by establishing dynamics that would transform the observed data distribution into the model distribution, and then setting as the objective the minimization of the KL divergence between the data distribution and the distribution produced by running the dynamics for an infinitesimal time.
Abstract: Fitting probabilistic models to data is often difficult, due to the general intractability of the partition function and its derivatives. Here we propose a new parameter estimation technique that does not require computing an intractable normalization factor or sampling from the equilibrium distribution of the model. This is achieved by establishing dynamics that would transform the observed data distribution into the model distribution, and then setting as the objective the minimization of the KL divergence between the data distribution and the distribution produced by running the dynamics for an infinitesimal time. Score matching, minimum velocity learning, and certain forms of contrastive divergence are shown to be special cases of this learning technique. We demonstrate parameter estimation in Ising models, deep belief networks and an independent component analysis model of natural scenes. In the Ising model case, current state of the art techniques are outperformed by at least an order of magnitude in learning time, with lower error in recovered coupling parameters.
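Illustrative sketch (not the authors' code): for an Ising model with spins in {-1, +1}, one common way to write the minimum probability flow objective is to connect every data state to the states one spin flip away and sum exp((E_data - E_neighbor) / 2) over those pairs. The energy convention, the function names, and the simplification of not excluding neighbors that also appear in the data set are assumptions of this example. Fitting would minimize the objective over `J` and `h` with a gradient-based optimizer, which is omitted here.

```python
import numpy as np


def ising_energy(X, J, h):
    """E(x) = -0.5 * x^T J x - h^T x for each row x of X.
    Assumes J is symmetric with a zero diagonal."""
    X = np.asarray(X, dtype=float)
    return -0.5 * np.einsum("ni,ij,nj->n", X, J, X) - X @ h


def mpf_objective(X, J, h, epsilon=1.0):
    """MPF objective with single-spin-flip connectivity.

    No partition function is evaluated: only energy differences between
    each data state and its one-flip neighbors enter the sum.
    """
    X = np.asarray(X, dtype=float)
    local_field = X @ J + h                 # (J x)_k + h_k for every sample
    delta_energy = 2.0 * X * local_field    # E(x with spin k flipped) - E(x)
    return epsilon * np.exp(-0.5 * delta_energy).sum() / X.shape[0]
```

The intractable normalization factor cancels in every energy difference, which is why the objective can be evaluated directly from the data.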
Journal Article•10.1007/S10107-009-0300-Y•
On the separation of disjunctive cuts

[...]

Matteo Fischetti1, Andrea Lodi2, Andrea Tramontani2•
University of Padua1, University of Bologna2
01 Jun 2011-Mathematical Programming
TL;DR: This paper investigates the main ingredients of a disjunctive cut separation procedure, and analyzes their impact on the quality of the root-node bound for a set of instances taken from MIPLIB library.
Abstract: Disjunctive cuts for Mixed-Integer Linear Programs (MIPs) were introduced by Egon Balas in the late 1970s and have been successfully exploited in practice since the late 1990s. In this paper we investigate the main ingredients of a disjunctive cut separation procedure, and analyze their impact on the quality of the root-node bound for a set of instances taken from MIPLIB library. We compare alternative normalization conditions, and try to better understand their role. In particular, we point out that constraints that become redundant (because of the disjunction used) can produce over-weak cuts, and analyze this property with respect to the normalization used. Finally, we introduce a new normalization condition and analyze its theoretical properties and computational behavior. Along the way, we make use of a number of small numerical examples to illustrate some basic (and often misinterpreted) disjunctive programming features.
Journal Article•10.1016/J.BIOPSYCH.2010.05.023•
Changed relative to what? Housekeeping genes and normalization strategies in human brain gene expression studies

[...]

Elizabeth M. Tunbridge1, Sharon L. Eastwood1, Paul Harrison1•
University of Oxford1
15 Jan 2011-Biological Psychiatry
TL;DR: The rationales for normalization are reviewed and it is argued that in well-conducted psychiatric gene expression studies using human brain tissue, it is reducing intersubject variability rather than experimental error that is the major benefit of normalization.
...
