Top 460 papers published in the topic of Normalization (statistics) in 2013

Showing papers on "Normalization (statistics)" published in 2013

Journal Article•10.1093/BIOINFORMATICS/BTS680•

A beta-mixture quantile normalization method for correcting probe design bias in Illumina Infinium 450 k DNA methylation data

Andrew E. Teschendorff¹, Francesco Marabita¹, Matthias Lechner¹, Thomas E. Bartlett¹, Jesper Tegnér¹, David Gomez-Cabrero¹, Stephan Beck¹•Institutions (1)

University College London¹

01 Jan 2013-Bioinformatics

TL;DR: A novel model-based intra-array normalization strategy for 450 k data, called BMIQ (Beta MIxture Quantile dilation), to adjust the beta-values of type2 design probes into a statistical distribution characteristic of type1 probes is proposed.

Abstract: Motivation: The Illumina Infinium 450 k DNA Methylation Beadchip is a prime candidate technology for Epigenome-Wide Association Studies (EWAS). However, a difficulty associated with these beadarrays is that probes come in two different designs, characterized by widely different DNA methylation distributions and dynamic range, which may bias downstream analyses. A key statistical issue is therefore how best to adjust for the two different probe designs. Results: Here we propose a novel model-based intra-array normalization strategy for 450 k data, called BMIQ (Beta MIxture Quantile dilation), to adjust the beta-values of type2 design probes into a statistical distribution characteristic of type1 probes. The strategy involves application of a three-state beta-mixture model to assign probes to methylation states, subsequent transformation of probabilities into quantiles and finally a methylation-dependent dilation transformation to preserve the monotonicity and continuity of the data. We validate our method on cell-line data, fresh frozen and paraffin-embedded tumour tissue samples and demonstrate that BMIQ compares favourably with two competing methods. Specifically, we show that BMIQ improves the robustness of the normalization procedure, reduces the technical variation and bias of type2 probe values and successfully eliminates the type1 enrichment bias caused by the lower dynamic range of type2 probes. BMIQ will be useful as a preprocessing step for any study using the Illumina Infinium 450 k platform. Availability: BMIQ is freely available from http://code.google.com/p/bmiq/. Contact: a.teschendorff@ucl.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online
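
BMIQ itself fits a three-state beta-mixture model before mapping quantiles, so the sketch below is not BMIQ. It only illustrates the underlying idea of placing type2 beta-values on the type1 distribution, using plain empirical quantile matching as a much simpler stand-in; the function name `match_type2_to_type1` is hypothetical.

```python
import numpy as np


def match_type2_to_type1(beta_type1, beta_type2):
    """Map each type2 beta-value onto the type1 distribution by rank.

    Illustrative stand-in only: plain empirical quantile matching,
    NOT the BMIQ beta-mixture model.
    """
    beta_type1 = np.asarray(beta_type1, dtype=float)
    beta_type2 = np.asarray(beta_type2, dtype=float)
    # Empirical quantile (strictly inside (0, 1)) of each type2 value within its own design.
    ranks = beta_type2.argsort().argsort()
    q = (ranks + 0.5) / beta_type2.size
    # Read the type1 distribution off at those quantiles.
    return np.quantile(beta_type1, q)
```

Because the mapping goes through ranks, it preserves the ordering of type2 values (the monotonicity the abstract emphasizes), but unlike BMIQ it does not treat unmethylated, hemimethylated and methylated probes separately.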

1,685 citations

Journal Article•10.1093/BIB/BBS046•

A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis

Marie-Agnès Dillies, Andrea Rau, Julie Aubert, Christelle Hennequet-Antier¹, Marine Jeanmougin, Nicolas Servant, Céline Keime, Guillemette Marot, David Castel, Jordi Estellé, Gregory Guernec, Bernd Jagla¹, Luc Jouneau², Denis Laloë, Caroline Le Gall, Brigitte Schaeffer², Stéphane Le Crom, Mickaël Guedj², Florence Jaffrézic•Institutions (2)

Pasteur Institute¹, Institut national de la recherche agronomique²

01 Nov 2013-Briefings in Bioinformatics

TL;DR: This work focuses on a comprehensive comparison of seven recently proposed normalization methods for the differential analysis of RNA-seq data, with an emphasis on the use of varied real and simulated datasets involving different species and experimental designs to represent data characteristics commonly observed in practice.

Abstract: During the last 3 years, a number of approaches for the normalization of RNA sequencing data have emerged in the literature, differing both in the type of bias adjustment and in the statistical strategy adopted. However, as data continue to accumulate, there has been no clear consensus on the appropriate normalization method to be used or the impact of a chosen method on the downstream analysis. In this work, we focus on a comprehensive comparison of seven recently proposed normalization methods for the differential analysis of RNA-seq data, with an emphasis on the use of varied real and simulated datasets involving different species and experimental designs to represent data characteristics commonly observed in practice. Based on this comparison study, we propose practical recommendations on the appropriate normalization method to be used and its impact on the differential analysis of RNA-seq data.
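
The abstract does not name the seven methods compared. One between-sample scheme widely used for RNA-seq counts is the median-of-ratios size factor popularized by DESeq; the sketch below shows it only as an illustrative example of this kind of normalization, not as the paper's recommendation. The function name is hypothetical.

```python
import numpy as np


def median_of_ratios_size_factors(counts):
    """Per-sample size factors via the median-of-ratios scheme.

    counts: genes x samples array of raw read counts.
    """
    counts = np.asarray(counts, dtype=float)
    # Genes with a zero in any sample have a zero geometric mean; skip them.
    expressed = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[expressed])
    # Per-gene log geometric mean across samples acts as a pseudo-reference sample.
    log_reference = log_counts.mean(axis=1)
    # Each sample's factor is the median ratio of its counts to the reference.
    log_ratios = log_counts - log_reference[:, None]
    return np.exp(np.median(log_ratios, axis=0))


counts = np.array([[10, 20], [30, 60], [50, 100]])
size_factors = median_of_ratios_size_factors(counts)
normalized = counts / size_factors  # sample 2 was simply sequenced twice as deeply
```

Using the median rather than the total count keeps a few very highly expressed or strongly changing genes from dominating the factor.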

1,380 citations

Journal Article•10.1186/1471-2164-14-293•

A data-driven approach to preprocessing Illumina 450K methylation array data

Ruth Pidsley¹, Chloe C. Y. Wong¹, Manuela Volta¹, Katie Lunnon¹, Jonathan Mill¹², Leonard C. Schalkwyk¹•Institutions (2)

King's College London¹, University of Exeter²

01 May 2013-BMC Genomics

TL;DR: It is demonstrated that quantile normalization methods produce marked improvement, even in highly consistent data, by all three metrics, and that careful selection of preprocessing steps can minimize variance and thus improve statistical power, especially for the detection of the small absolute DNA methylation changes likely associated with complex disease phenotypes.

Abstract: As the most stable and experimentally accessible epigenetic mark, DNA methylation is of great interest to the research community. The landscape of DNA methylation across tissues, through development and in disease pathogenesis is not yet well characterized. Thus there is a need for rapid and cost-effective methods for assessing genome-wide levels of DNA methylation. The Illumina Infinium HumanMethylation450 (450K) BeadChip is a very useful addition to the available methods for DNA methylation analysis but its complex design, incorporating two different assay methods, requires careful consideration. Accordingly, several normalization schemes have been published. We have taken advantage of known DNA methylation patterns associated with genomic imprinting and X-chromosome inactivation (XCI), in addition to the performance of SNP genotyping assays present on the array, to derive three independent metrics which we use to test alternative schemes of correction and normalization. These metrics also have potential utility as quality scores for datasets. The standard index of DNA methylation at any specific CpG site is β = M/(M + U + 100) where M and U are methylated and unmethylated signal intensities, respectively. Betas (βs) calculated from raw signal intensities (the default GenomeStudio behavior) perform well, but using 11 methylomic datasets we demonstrate that quantile normalization methods produce marked improvement, even in highly consistent data, by all three metrics. The commonly used procedure of normalizing betas is inferior to the separate normalization of M and U, and it is also advantageous to normalize Type I and Type II assays separately. More elaborate manipulation of quantiles proves to be counterproductive. Careful selection of preprocessing steps can minimize variance and thus improve statistical power, especially for the detection of the small absolute DNA methylation changes likely associated with complex disease phenotypes.
For the convenience of the research community we have created a user-friendly R software package called wateRmelon, downloadable from Bioconductor, compatible with the existing methylumi, minfi and IMA packages, that allows others to utilize the same normalization methods and data quality tests on 450K data.
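
The abstract's two recommendations (normalize M and U separately rather than the betas, and treat Type I and Type II assays separately) can be sketched as follows. This is a minimal illustration assuming plain quantile normalization across arrays, with ties broken by order; it is not the wateRmelon implementation, and all function names are hypothetical.

```python
import numpy as np


def quantile_normalize(x):
    """Give every column (array) of a probes x arrays matrix the same distribution.

    The shared distribution is the mean of the sorted columns; ties are
    broken by order, which is a simplification.
    """
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, axis=0)
    reference = np.sort(x, axis=0).mean(axis=1)
    out = np.empty_like(x)
    for j in range(x.shape[1]):
        out[order[:, j], j] = reference
    return out


def beta_values(meth, unmeth, offset=100.0):
    """beta = M / (M + U + 100), as defined in the abstract."""
    return meth / (meth + unmeth + offset)


def normalize_separately(meth, unmeth, is_type1):
    """Quantile-normalize M and U separately within each probe design, then compute betas."""
    meth = np.asarray(meth, dtype=float)
    unmeth = np.asarray(unmeth, dtype=float)
    is_type1 = np.asarray(is_type1, dtype=bool)
    meth_n = np.empty_like(meth)
    unmeth_n = np.empty_like(unmeth)
    for mask in (is_type1, ~is_type1):
        meth_n[mask] = quantile_normalize(meth[mask])
        unmeth_n[mask] = quantile_normalize(unmeth[mask])
    return beta_values(meth_n, unmeth_n)
```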

1,124 citations

Journal Article•10.1002/CYTO.A.22271•

Normalization of mass cytometry data with bead standards

Rachel Finck¹, Erin F. Simonds¹, Astraea Jager¹, Smita Krishnaswamy², Karen Sachs¹, Wendy J. Fantl¹, Dana Pe'er², Garry P. Nolan¹, Sean C. Bendall¹•Institutions (2)

Stanford University¹, Columbia University²

01 May 2013-Cytometry Part A

TL;DR: The protocol described here includes simultaneous measurements of beads and cells on the mass cytometer, subsequent extraction of the bead‐based signature, and the application of an algorithm enabling correction of both short‐ and long‐term signal fluctuations.

Abstract: Mass cytometry uses atomic mass spectrometry combined with isotopically pure reporter elements to currently measure as many as 40 parameters per single cell. As with any quantitative technology, there is a fundamental need for quality assurance and normalization protocols. In the case of mass cytometry, the signal variation over time due to changes in instrument performance combined with intervals between scheduled maintenance must be accounted for and then normalized. Here, samples were mixed with polystyrene beads embedded with metal lanthanides, allowing monitoring of mass cytometry instrument performance over multiple days of data acquisition. The protocol described here includes simultaneous measurements of beads and cells on the mass cytometer, subsequent extraction of the bead-based signature, and the application of an algorithm enabling correction of both short- and long-term signal fluctuations. The variation in the intensity of the beads that remains after normalization may also be used to determine data quality. Application of the algorithm to a one-month longitudinal analysis of a human peripheral blood sample reduced the range of median signal fluctuation from 4.9-fold to 1.3-fold.
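
The abstract does not spell out the correction algorithm. The following minimal sketch only shows the general idea of bead-based drift correction, and several choices in it are assumptions: a running median over bead events for smoothing, a run-wide mean as the baseline, and a single multiplicative factor per time point fitted across bead channels by least squares through the origin. The function name and parameters are hypothetical.

```python
import numpy as np


def bead_normalize(cell_times, cell_values, bead_times, bead_values, window=5):
    """Rescale cell intensities using a time-varying, bead-derived factor.

    bead_values: (n_bead_events, n_bead_channels) intensities of the bead metals.
    cell_values: (n_cells, n_channels) intensities to correct.
    """
    order = np.argsort(bead_times)
    bt = np.asarray(bead_times, dtype=float)[order]
    bv = np.asarray(bead_values, dtype=float)[order]
    half = window // 2
    # Running median over neighbouring bead events damps single-bead noise.
    smoothed = np.array([np.median(bv[max(0, i - half): i + half + 1], axis=0)
                         for i in range(len(bt))])
    # Baseline: average bead signal over the whole acquisition.
    baseline = bv.mean(axis=0)
    # One factor per bead event: least-squares slope (through the origin)
    # mapping the smoothed bead signal onto the baseline.
    slopes = (smoothed @ baseline) / np.sum(smoothed ** 2, axis=1)
    # Interpolate the factor to each cell's acquisition time and apply it.
    cell_factor = np.interp(cell_times, bt, slopes)
    return np.asarray(cell_values, dtype=float) * cell_factor[:, None]
```

If the instrument loses half its sensitivity midway through a run, cells acquired before and after the drop end up on the same scale; the residual spread of the corrected bead intensities can then serve as the data-quality indicator the abstract mentions.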

774 citations

Journal Article•10.1186/1471-2105-14-219•

TCC: an R package for comparing tag count data with robust normalization strategies

Jianqiang Sun¹, Tomoaki Nishiyama², Kentaro Shimizu¹, Koji Kadota¹•Institutions (2)

University of Tokyo¹, Kanazawa University²

09 Jul 2013-BMC Bioinformatics

TL;DR: DEGES in TCC is essential for accurate normalization of tag count data, especially when up- and down-regulated DEGs in one of the samples are extremely biased in their number.

Abstract: Differential expression analysis based on “next-generation” sequencing technologies is a fundamental means of studying RNA expression. We recently developed a multi-step normalization method (called TbT) for two-group RNA-seq data with replicates and demonstrated that the statistical methods available in four R packages (edgeR, DESeq, baySeq, and NBPSeq) together with TbT can produce a well-ranked gene list in which true differentially expressed genes (DEGs) are top-ranked and non-DEGs are bottom ranked. However, the advantages of the current TbT method come at the cost of a huge computation time. Moreover, the R packages did not have normalization methods based on such a multi-step strategy. TCC (an acronym for Tag Count Comparison) is an R package that provides a series of functions for differential expression analysis of tag count data. The package incorporates multi-step normalization methods, whose strategy is to remove potential DEGs before performing the data normalization. The normalization function based on this DEG elimination strategy (DEGES) includes (i) the original TbT method based on DEGES for two-group data with or without replicates, (ii) much faster methods for two-group data with or without replicates, and (iii) methods for multi-group comparison. TCC provides a simple unified interface to perform such analyses with combinations of functions provided by edgeR, DESeq, and baySeq. Additionally, a function for generating simulation data under various conditions and alternative DEGES procedures consisting of functions in the existing packages are provided. Bioinformatics scientists can use TCC to evaluate their methods, and biologists familiar with other R packages can easily learn what is done in TCC. DEGES in TCC is essential for accurate normalization of tag count data, especially when up- and down-regulated DEGs in one of the samples are extremely biased in their number. 
TCC is useful for analyzing tag count data in various scenarios ranging from unbiased to extremely biased differential expression. TCC is available at http://www.iu.a.u-tokyo.ac.jp/~kadota/TCC/ and will appear in Bioconductor ( http://bioconductor.org/ ) from ver. 2.13.
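
The DEG elimination strategy (DEGES) can be outlined as: normalize, flag potential DEGs, renormalize without them, and repeat. In the sketch below, total-count scaling and a crude fold-change cutoff stand in for the normalization and DE-testing functions that TCC actually combines from edgeR, DESeq and baySeq; function names and defaults are hypothetical.

```python
import numpy as np


def total_count_factors(counts):
    """Library-size factors scaled to mean 1 (stand-in normalization step)."""
    lib_sizes = counts.sum(axis=0)
    return lib_sizes / lib_sizes.mean()


def deges(counts, group, n_iter=3, fold_threshold=2.0):
    """DEGES-style multi-step normalization (illustrative sketch).

    counts: genes x samples; group: True for samples in the second group.
    Returns the final factors and a mask of genes treated as non-DE.
    """
    counts = np.asarray(counts, dtype=float)
    group = np.asarray(group, dtype=bool)
    keep = np.ones(counts.shape[0], dtype=bool)  # start by treating all genes as non-DE
    for _ in range(n_iter):
        # (1) normalize using only the genes currently treated as non-DE
        factors = total_count_factors(counts[keep])
        norm = counts / factors
        # (2) flag potential DEGs with a crude fold-change cutoff (+1 pseudocount)
        mean_a = norm[:, ~group].mean(axis=1) + 1.0
        mean_b = norm[:, group].mean(axis=1) + 1.0
        keep = np.abs(np.log2(mean_b / mean_a)) < np.log2(fold_threshold)
    # (3) final normalization with potential DEGs removed
    return total_count_factors(counts[keep]), keep
```

When one group carries a strongly up-regulated gene, the first pass inflates that group's factor; dropping the gene and renormalizing brings the factors back to what the non-DE genes support.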

545 citations

Journal Article•10.1093/BIOINFORMATICS/BTT474•

DNorm: disease name normalization with pairwise learning to rank.

Robert Leaman¹, Rezarta Islamaj Dogan¹, Zhiyong Lu¹•Institutions (1)

Arizona State University¹

15 Nov 2013-Bioinformatics

TL;DR: This article introduces the first machine learning approach to disease name normalization (DNorm), using the NCBI disease corpus and the MEDIC vocabulary (which combines MeSH® and OMIM); the method is a high-performing and mathematically principled framework for learning similarities between mentions and concept names directly from training data.

Abstract: Motivation: Despite the central role of diseases in biomedical research, there have been far fewer attempts to automatically determine which diseases are mentioned in a text (the task of disease name normalization, DNorm) compared with other normalization tasks in biomedical text mining research. Methods: In this article we introduce the first machine learning approach for DNorm, using the NCBI disease corpus and the MEDIC vocabulary, which combines MeSH and OMIM. Our method is a high-performing and mathematically principled framework for learning similarities between mentions and concept names directly from training data. The technique is based on pairwise learning to rank, which has not previously been applied to the normalization task but has proven successful in large optimization problems for information retrieval. Results: We compare our method with several techniques based on lexical normalization and matching, MetaMap and Lucene. Our algorithm achieves 0.782 micro-averaged F-measure and 0.809 macro-averaged F-measure, an increase over the highest performing baseline method of 0.121 and 0.098, respectively. Availability: The source code for DNorm is available at http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/DNorm, along with a web-based demonstration and links to the NCBI disease corpus. Results on PubMed abstracts are available in PubTator: http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/PubTator
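
Pairwise learning to rank can be sketched with a bilinear score between a mention vector m and a concept-name vector n, score(m, n) = mᵀWn, where W is updated whenever a correct name fails to outrank an incorrect one by a margin. The bilinear form, starting W from the identity (plain dot-product similarity), the toy bag-of-words vectors and the learning rate are assumptions of this sketch, not details taken from the abstract.

```python
import numpy as np


def score(W, m, n):
    """Bilinear similarity m^T W n between a mention vector and a name vector."""
    return m @ W @ n


def train_pairwise(W, triples, lr=0.1, epochs=10, margin=1.0):
    """Pairwise learning to rank with a margin (hinge) loss and SGD.

    triples: (mention, correct_name, incorrect_name) vectors.
    """
    W = W.copy()
    for _ in range(epochs):
        for m, pos, neg in triples:
            # Update only when the correct name does not win by the margin.
            if score(W, m, pos) - score(W, m, neg) < margin:
                # Gradient of score(m, pos) - score(m, neg) with respect to W.
                W += lr * (np.outer(m, pos) - np.outer(m, neg))
    return W


# Toy vocabulary: heart, attack, myocardial, infarction, disease
mention = np.array([1.0, 1.0, 0.0, 0.0, 0.0])  # "heart attack"
correct = np.array([0.0, 0.0, 1.0, 1.0, 0.0])  # "myocardial infarction"
wrong = np.array([1.0, 0.0, 0.0, 0.0, 1.0])    # "heart disease"
W = train_pairwise(np.eye(5), [(mention, correct, wrong)])
```

Plain word overlap initially prefers "heart disease"; after training, the learned W links "heart attack" to "myocardial infarction" even though they share no words.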

544 citations

Journal Article•10.1371/JOURNAL.PONE.0080635•

Maximum allowed solvent accessibilites of residues in proteins.

Matthew Z. Tien¹, A. Meyer²³, Dariya K. Sydykova³, Stephanie J. Spielman³, Claus O. Wilke³•Institutions (3)

University of Chicago¹, Texas Tech University Health Sciences Center², University of Texas at Austin³

21 Nov 2013-PLOS ONE

TL;DR: It is concluded that previously published ASA normalization values were too small, primarily because the conformations that maximize ASA had not been correctly identified, and a new normalization scale is derived that does provide a tight upper bound on observed ASA values.

Abstract: The relative solvent accessibility (RSA) of a residue in a protein measures the extent of burial or exposure of that residue in the 3D structure. RSA is frequently used to describe a protein's biophysical or evolutionary properties. To calculate RSA, a residue's solvent accessibility (ASA) needs to be normalized by a suitable reference value for the given amino acid; several normalization scales have previously been proposed. However, these scales do not provide tight upper bounds on ASA values frequently observed in empirical crystal structures. Instead, they underestimate the largest allowed ASA values, by up to 20%. As a result, many empirical crystal structures contain residues that seem to have RSA values in excess of one. Here, we derive a new normalization scale that does provide a tight upper bound on observed ASA values. We pursue two complementary strategies, one based on extensive analysis of empirical structures and one based on systematic enumeration of biophysically allowed tripeptides. Both approaches yield congruent results that consistently exceed published values. We conclude that previously published ASA normalization values were too small, primarily because the conformations that maximize ASA had not been correctly identified. As an application of our results, we show that empirically derived hydrophobicity scales are sensitive to accurate RSA calculation, and we derive new hydrophobicity scales that show increased correlation with experimentally measured scales.
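
RSA is the ratio of a residue's ASA to the reference value for its amino-acid type. The reference numbers below are hypothetical placeholders, chosen only to show how a too-small normalization value yields an RSA above one; real values should come from a published scale such as the one derived in this paper. Function names are hypothetical.

```python
def relative_solvent_accessibility(asa, max_asa):
    """RSA = ASA / maximum allowed ASA for the residue type."""
    if max_asa <= 0:
        raise ValueError("reference ASA must be positive")
    return asa / max_asa


def rsa_profile(residues, asa_values, scale):
    """RSA for each residue, given a per-residue-type normalization scale."""
    return [relative_solvent_accessibility(asa, scale[res])
            for res, asa in zip(residues, asa_values)]


# Hypothetical reference values (square angstroms), for illustration only.
too_small_scale = {"LYS": 200.0}
larger_scale = {"LYS": 240.0}
print(rsa_profile(["LYS"], [215.0], too_small_scale))  # RSA above 1
print(rsa_profile(["LYS"], [215.0], larger_scale))     # RSA below 1
```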

464 citations

Journal Article•10.1016/J.AB.2012.10.010•

Stain-Free technology as a normalization tool in Western blot analysis

Anne Gürtler, Nancy Kunz¹, Maria Gomolka, Sabine Hornhardt, Anna A. Friedl, Kevin Mcdonald¹, Jonathan E. Kohn¹, Anton Posch¹•Institutions (1)

Bio-Rad Laboratories¹

15 Feb 2013-Analytical Biochemistry

TL;DR: Stain-Free technology appears to be more reliable, more robust, and more sensitive to small effects of protein regulation than normalization to the housekeeping protein (HKP) GAPDH.
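
Both approaches compared above divide the target band by a per-lane loading signal; they differ only in what that signal is (total protein in the lane for Stain-Free, or a housekeeping band such as GAPDH). A minimal sketch with a hypothetical function name and made-up intensities:

```python
def normalize_to_loading(target, loading, control_lane=0):
    """Divide each lane's target band by its loading signal, relative to a control lane.

    loading can be a total-protein (stain-free) lane signal or a housekeeping band.
    """
    if len(target) != len(loading):
        raise ValueError("one loading value per lane is required")
    ratios = [t / l for t, l in zip(target, loading)]
    return [r / ratios[control_lane] for r in ratios]


# Lane 2 received 1.5x more protein, so its raw target signal is 1.5x higher.
print(normalize_to_loading([100.0, 150.0], [1000.0, 1500.0]))  # [1.0, 1.0]
```

The arithmetic is identical for both loading controls; the reliability question the paper addresses is whether the loading signal itself stays proportional to the amount of protein loaded.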

369 citations

Journal Article•10.1073/PNAS.1217854110•

Normalization is a general neural mechanism for context-dependent decision making

Kenway Louie¹, Mel W. Khaw¹, Paul W. Glimcher²•Institutions (2)

New York University¹, Center for Neural Science²

09 Apr 2013-Proceedings of the National Academy of Sciences of the United States of America

TL;DR: Choice models using normalization generate significant choice phenomena driven by either the value or the number of alternative options, suggesting that the neural mechanism of value coding critically influences stochastic choice behavior; the work also provides a generalizable quantitative framework for examining context effects in decision making.

Abstract: Understanding the neural code is critical to linking brain and behavior. In sensory systems, divisive normalization seems to be a canonical neural computation, observed in areas ranging from retina to cortex and mediating processes including contrast adaptation, surround suppression, visual attention, and multisensory integration. Recent electrophysiological studies have extended these insights beyond the sensory domain, demonstrating an analogous algorithm for the value signals that guide decision making, but the effects of normalization on choice behavior are unknown. Here, we show that choice models using normalization generate significant (and classically irrational) choice phenomena driven by either the value or number of alternative options. In value-guided choice experiments, both monkey and human choosers show novel context-dependent behavior consistent with normalization. These findings suggest that the neural mechanism of value coding critically influences stochastic choice behavior and provide a generalizable quantitative framework for examining context effects in decision making.
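
A minimal sketch of how normalization can produce context effects: values are divided by σ_H + ΣV_j, independent Gaussian noise is added to the normalized values, and the larger noisy value is chosen. The semi-saturation constant σ_H, the noise model and all parameter values are illustrative assumptions, not the fitted model from the paper.

```python
import math


def normalize_values(values, sigma_h=1.0):
    """Divisive normalization: V_i / (sigma_H + sum_j V_j)."""
    denom = sigma_h + sum(values)
    return [v / denom for v in values]


def p_first_over_second(values, noise_sd=0.1, sigma_h=1.0):
    """P(option 0 is chosen over option 1) with Gaussian noise on normalized values."""
    n = normalize_values(values, sigma_h)
    # The difference of two independent N(0, sd^2) noise terms has sd * sqrt(2).
    z = (n[0] - n[1]) / (noise_sd * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF


print(p_first_over_second([10.0, 8.0, 0.0]))  # worthless third option
print(p_first_over_second([10.0, 8.0, 6.0]))  # valuable third option: lower
```

A more valuable third option enlarges the denominator and shrinks the normalized gap between the first two, so the choice between them becomes more random, a violation of the independence from irrelevant alternatives assumed by many classical choice models.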

337 citations

Journal Article•10.1038/NRN3424•

Erratum: Normalization as a canonical neural computation

Matteo Carandini, David J. Heeger

01 Feb 2013-Nature Reviews Neuroscience

TL;DR: On page 52 of this article, in the legend for figure 1, the text “lower concentrations are shown by lighter colours” should have read “lower concentrations are shown by darker colours”.

Abstract: Nature Reviews Neuroscience 13, 51–62 (2012) On page 52 of this article, in the legend for figure 1, the text “lower concentrations are shown by lighter colours” should have read “lower concentrations are shown by darker colours”. This has been corrected in the online version.

290 citations

Journal Article•10.47893/IJCCT.2013.1201•

Min Max Normalization Based Data Perturbation Method for Privacy Protection

Yogendra Kumar Jain¹, Santosh Kumar Bhandare¹•Institutions (1)

Samrat Ashok Technological Institute¹

01 Oct 2013
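
Only the title of this paper is listed here, so its perturbation scheme is not described. For reference, the standard min-max normalization the title refers to maps each attribute linearly onto a new range [new_min, new_max]; a minimal sketch (function name hypothetical):

```python
def min_max_normalize(column, new_min=0.0, new_max=1.0):
    """Linearly map values from [min, max] onto [new_min, new_max]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        raise ValueError("min-max normalization needs at least two distinct values")
    return [(x - lo) / (hi - lo) * (new_max - new_min) + new_min for x in column]


ages = [10, 20, 30]
print(min_max_normalize(ages))          # [0.0, 0.5, 1.0]
print(min_max_normalize(ages, 0, 100))  # [0.0, 50.0, 100.0]
```

The mapping preserves the ordering and relative spacing of values while hiding the original scale, which is why it is a natural building block for perturbation-based privacy schemes.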

Journal Article•10.1109/TPAMI.2012.233•

Coupled Gaussian processes for pose-invariant facial expression recognition

Ognjen Rudovic¹, Maja Pantic¹, Ioannis Patras²•Institutions (2)

Imperial College London¹, Queen Mary University of London²

01 Jun 2013-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: The proposed Coupled Scaled Gaussian Process Regression model for head-pose normalization outperforms state-of-the-art regression-based approaches to head-pose normalization, 2D and 3D Point Distribution Models (PDMs), and Active Appearance Models (AAMs), especially in cases of unknown poses and imbalanced training data.

Abstract: We propose a method for head-pose invariant facial expression recognition that is based on a set of characteristic facial points. To achieve head-pose invariance, we propose the Coupled Scaled Gaussian Process Regression (CSGPR) model for head-pose normalization. In this model, we first learn independently the mappings between the facial points in each pair of (discrete) nonfrontal poses and the frontal pose, and then perform their coupling in order to capture dependences between them. During inference, the outputs of the coupled functions from different poses are combined using a gating function, devised based on the head-pose estimation for the query points. The proposed model outperforms state-of-the-art regression-based approaches to head-pose normalization, 2D and 3D Point Distribution Models (PDMs), and Active Appearance Models (AAMs), especially in cases of unknown poses and imbalanced training data. To the best of our knowledge, the proposed method is the first one that is able to deal with expressive faces in the range from -45° to +45° pan rotation and -30° to +30° tilt rotation, and with continuous changes in head pose, despite the fact that training was conducted on a small set of discrete poses. We evaluate the proposed method on synthetic and real images depicting acted and spontaneously displayed facial expressions.

Journal Article•10.1007/S00216-012-6517-2•

Normalization procedures and reference material selection in stable HCNOS isotope analyses: an overview

Grzegorz Skrzypek¹•Institutions (1)

University of Western Australia¹

01 Mar 2013-Analytical and Bioanalytical Chemistry

TL;DR: The uncertainties of stable isotope results depend not only on the technical aspects of measurements, but also on how raw data are normalized to one of the international isotope scales, so unification of the data processing protocols employed is highly desirable.

Abstract: The uncertainties of stable isotope results depend not only on the technical aspects of measurements, but also on how raw data are normalized to one of the international isotope scales. The inconsistency in the normalization methods used and in the selection of standards may lead to substantial differences in the results obtained. Therefore, unification of the data processing protocols employed is highly desirable. The best performing methods are two-point or multipoint normalization methods based on linear regression. Linear regression is most robust when based on standards that cover the entire range of δ values typically observed in nature, regardless of the δ values of the samples analysed. The uncertainty can be reduced by 50% if measurements of two different standards are performed four times, or measurements of four standards are performed twice, with each batch of samples. Chemical matrix matching between standards and samples seems to be critical for δ¹⁸O of nitrate or δ²H of hair samples (thermal conversion/elemental analyser), for example; however, it is not necessarily always critical for all types of samples and techniques (e.g. not for most δ¹⁵N and δ¹³C elemental analyser analyses). To ensure that all published data can be recalculated, if δ values of standards or the isotope scales are to be updated, the details of the normalization technique and the δ values of the standards used should always be clearly reported.
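
Two-point or multipoint normalization amounts to regressing the accepted δ values of the standards on their measured values and applying the fitted line to the samples. The sketch assumes ordinary least squares; the standard values are hypothetical, not those of any real reference material, and the function names are hypothetical.

```python
def fit_normalization(measured, accepted):
    """Least-squares line mapping measured delta values of standards to accepted values.

    Two standards give two-point normalization; more give multipoint normalization.
    Replicate measurements of a standard can simply be listed more than once.
    """
    if len(measured) != len(accepted) or len(set(measured)) < 2:
        raise ValueError("need at least two distinct standards")
    n = len(measured)
    mx = sum(measured) / n
    my = sum(accepted) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(measured, accepted))
    sxx = sum((x - mx) ** 2 for x in measured)
    slope = sxy / sxx
    return slope, my - slope * mx


def normalize_delta(values, slope, intercept):
    """Express measured sample delta values on the reference scale."""
    return [slope * v + intercept for v in values]


# Hypothetical standards (per mil) bracketing the natural range, two replicates each.
slope, intercept = fit_normalization([-10.0, -10.0, 20.0, 20.0],
                                     [-11.0, -11.0, 22.0, 22.0])
samples = normalize_delta([10.0], slope, intercept)
```

Standards far apart pin down the slope better than standards clustered near the samples, which is the reason the abstract recommends standards covering the natural range of δ values.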

Journal Article•10.1007/S11192-012-0913-4•

Source normalized indicators of citation impact: an overview of different approaches and an empirical comparison

Ludo Waltman¹, Nees Jan van Eck¹•Institutions (1)

Leiden University¹

01 Sep 2013-Scientometrics

TL;DR: In this paper, an overview of the source normalization approaches is provided and empirically compared with a traditional normalization approach based on a field classification system, and the issue of the selection of the journals to be included in a normalization for field differences is discussed.

Abstract: Different scientific fields have different citation practices. Citation-based bibliometric indicators need to normalize for such differences between fields in order to allow for meaningful between-field comparisons of citation impact. Traditionally, normalization for field differences has usually been done based on a field classification system. In this approach, each publication belongs to one or more fields and the citation impact of a publication is calculated relative to the other publications in the same field. Recently, the idea of source normalization was introduced, which offers an alternative approach to normalize for field differences. In this approach, normalization is done by looking at the referencing behavior of citing publications or citing journals. In this paper, we provide an overview of a number of source normalization approaches and we empirically compare these approaches with a traditional normalization approach based on a field classification system. We also pay attention to the issue of the selection of the journals to be included in a normalization for field differences. Our analysis indicates a number of problems of the traditional classification-system-based normalization approach, suggesting that source normalization approaches may yield more accurate results.
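
The contrast between the two families can be sketched as follows. The traditional score divides a paper's citations by its field's mean; the source-normalized score shown is one simple variant of the idea (fractional citation counting), in which each citation is weighted by the inverse of the citing publication's reference count. Which variants the paper compares, and how they handle details such as which references count, are not specified here; function names are hypothetical.

```python
def field_normalized_score(citations, field_mean_citations):
    """Traditional approach: citations relative to the mean of the publication's field."""
    return citations / field_mean_citations


def source_normalized_score(citing_reference_counts):
    """Source normalization by fractional citation counting (one simple variant).

    Each citation is weighted by 1 / (number of references in the citing
    publication), so citations from fields with long reference lists count less.
    """
    return sum(1.0 / r for r in citing_reference_counts if r > 0)


# Hypothetical: paper A is cited twice by papers with 10 references each;
# paper B is cited four times by papers with 40 references each.
score_a = source_normalized_score([10, 10])
score_b = source_normalized_score([40, 40, 40, 40])
```

Paper B has more raw citations but scores lower, because it sits in a field where citing is cheap; no field classification is needed to reach that conclusion.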

Journal Article•10.1007/S11192-012-0940-1•

On bibliographic networks

Vladimir Batagelj¹, Monika Cerinšek•Institutions (1)

University of Ljubljana¹

20 Jan 2013-arXiv: Social and Information Networks

TL;DR: In this paper, the authors show that the bibliographic data can be transformed into a collection of compatible networks using network multiplication, and they also discuss the question when the multiplication of sparse networks preserves sparseness.

Abstract: In the paper we show that the bibliographic data can be transformed into a collection of compatible networks. Using network multiplication different interesting derived networks can be obtained. In defining them an appropriate normalization should be considered. The proposed approach can be applied also to other collections of compatible networks. We also discuss the question when the multiplication of sparse networks preserves sparseness. The proposed approaches are illustrated with analyses of a collection of networks on the topic "social network" obtained from the Web of Science.
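
Network multiplication can be illustrated with a works × authors matrix WA: the product WAᵀWA is an authors × authors co-authorship network. The sketch shows one possible normalization (each work spreading a total weight of 1 over its authors before multiplying); that this matches the normalization the paper recommends is an assumption, and the function name is hypothetical.

```python
import numpy as np


def coauthorship(wa, fractional=True):
    """Derive an authors x authors network by multiplying WA^T and WA.

    wa: works x authors two-mode network (1 if the author wrote the work).
    With fractional=True each work first spreads a total weight of 1 over its
    authors, so works with many authors do not dominate the derived network.
    """
    wa = np.asarray(wa, dtype=float)
    if fractional:
        row_sums = wa.sum(axis=1, keepdims=True)
        wa = np.divide(wa, row_sums, out=np.zeros_like(wa), where=row_sums > 0)
    return wa.T @ wa


wa = np.array([[1, 1],   # work 1: authors 1 and 2
               [1, 0]])  # work 2: author 1 only
raw = coauthorship(wa, fractional=False)  # entries count shared works
frac = coauthorship(wa)                   # each work contributes total weight 1
```

In the raw product a single paper with hundreds of authors would add a full unit to every pair among them; after normalization its total contribution to the derived network is 1, the same as a single-author paper.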

Journal Article•10.1016/J.JNEUMETH.2013.03.019•

Preprocessing effects of 22 linear univariate features on the performance of seizure prediction methods.

Jalil Rasekhi¹, Mohammad Reza Karami Mollaei¹, Mojtaba Bandarabadi², Cesar Teixeira², António Dourado²•Institutions (2)

Babol Noshirvani University of Technology¹, University of Coimbra²

15 Jul 2013-Journal of Neuroscience Methods

TL;DR: Combining multiple linear univariate features in one feature space and classifying the feature space using machine learning methods could predict epileptic seizures in patients suffering from refractory epilepsy.

Journal Article•

Multivariate functional principal component analysis: A normalization approach

Jeng-Min Chiou, Ya-Fang Yang, Yu-Ting Chen

01 May 2013-Annals of Statistics

TL;DR: In this article, a modified version of the classical Karhunen-Loeve expansion for a vector-valued random process, called a normalized multivariate functional principal component (mFPCn) approach, was proposed as a general stochastic representation for multivariate random functions.

Abstract: This study proposes a modified version of the classical Karhunen-Loeve expansion for a vector-valued random process, called a normalized multivariate functional principal component (mFPCn) approach, as a general stochastic representation for multivariate random functions. The mFPCn approach takes the varying extent of variations between the components of multivariate random functions into account and takes advantage of component dependency through the pairwise cross-covariance functions. The multivariate approach leads to a single set of multivariate functional principal component scores, which serves well as the proxy of multivariate functional data. We derive the consistency properties for the estimates of the mFPCn model components, and the asymptotic distributions for statistical inferences. We illustrate the finite sample performance of the mFPCn approach through the analysis of a traffic flow data set, including an application to clustering multivariate functional data derived from the mFPCn approach and a simulation study. The mFPCn approach serves as a basic and useful statistical tool for multivariate functional data analysis.
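
A discretized sketch of the normalization idea: each component is centred and divided by its pointwise standard deviation, the standardized components are stacked, and a single SVD yields one shared set of scores. Evaluating all curves on a common grid and ignoring quadrature weights are simplifying assumptions; this is not the authors' estimator, which is defined for functional data and comes with the consistency results described in the abstract. The function name is hypothetical.

```python
import numpy as np


def normalized_mfpca(curves, n_components=2):
    """Discretized multivariate FPCA with component-wise normalization.

    curves: array (n_subjects, n_variables, n_timepoints) on a common time grid.
    Returns scores (n_subjects, n_components) and eigenfunctions
    (n_components, n_variables, n_timepoints).
    """
    x = np.asarray(curves, dtype=float)
    n, p, t = x.shape
    mean = x.mean(axis=0)
    sd = x.std(axis=0, ddof=1)
    # Standardize each variable pointwise so that components with large
    # variation do not dominate the joint decomposition.
    z = ((x - mean) / sd).reshape(n, p * t)
    # SVD of the stacked, standardized curves gives one shared set of eigenfunctions.
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    eigenfunctions = vt[:n_components].reshape(n_components, p, t)
    return scores, eigenfunctions
```

Without the division by `sd`, a component measured on a scale 100 times larger would absorb nearly all of the leading eigenfunctions; with it, both components contribute on an equal footing.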

Journal Article•10.1093/BIOINFORMATICS/BTT511•

SAMstrt: statistical test for differential expression in single-cell transcriptome with spike-in normalization

Shintaro Katayama¹, Virpi Töhönen¹, Sten Linnarsson¹, Juha Kere¹•Institutions (1)

Science for Life Laboratory¹

15 Nov 2013-Bioinformatics

TL;DR: SAMstrt is an extension of SAMseq, a statistical method for differential expression, that enables spike-in normalization and statistical testing based on the estimated absolute number of transcripts per cell for single-cell RNA-seq methods.

Abstract: Motivation: Recent transcriptome studies have revealed that total transcript numbers vary by cell type and condition; therefore, the statistical assumptions for single-cell transcriptome studies must be revisited. SAMstrt is an extension of SAMseq, a statistical method for differential expression, that enables spike-in normalization and statistical testing based on the estimated absolute number of transcripts per cell for single-cell RNA-seq methods. Availability and Implementation: SAMstrt is implemented in R and available on GitHub (https://github.com/shka/R-SAMstrt). Contact: shintaro.katayama@ki.se Supplementary Information: Supplementary data are available at Bioinformatics online.
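
SAMstrt's exact scaling procedure is not given here. The sketch shows only the principle of spike-in normalization, assuming the same known number of molecules of each spike-in is added to every cell, so that reads per spike-in molecule convert endogenous read counts into estimated molecule counts. The function name is hypothetical.

```python
import numpy as np


def spike_in_absolute_counts(reads, is_spike, spike_molecules):
    """Estimate molecules per cell for endogenous genes from spike-in reads.

    reads: genes x cells read counts (spike-in rows included).
    is_spike: boolean mask marking spike-in rows.
    spike_molecules: known molecules of each spike-in added to every cell.
    """
    reads = np.asarray(reads, dtype=float)
    is_spike = np.asarray(is_spike, dtype=bool)
    # Reads obtained per spike-in molecule, separately for each cell.
    reads_per_molecule = reads[is_spike].sum(axis=0) / float(np.sum(spike_molecules))
    return reads[~is_spike] / reads_per_molecule


reads = np.array([[100, 200],   # endogenous gene
                  [50, 100],    # spike-in 1 (5 molecules per cell)
                  [10, 20]])    # spike-in 2 (1 molecule per cell)
molecules = spike_in_absolute_counts(reads, [False, True, True], [5, 1])
```

Both cells come out with the same estimated transcript count even though the second was sequenced twice as deeply; unlike total-count scaling, this preserves genuine differences in total transcript numbers between cells.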

Journal Article•10.1109/TASL.2013.2255278•

On Acoustic Emotion Recognition: Compensating for Covariate Shift

Ali Hassan¹, Robert I. Damper¹, Mahesan Niranjan¹•Institutions (1)

University of Southampton¹

01 Jul 2013-IEEE Transactions on Audio, Speech, and Language Processing

TL;DR: In this paper, the authors employ three algorithms from the domain of transfer learning that apply importance weights (IWs) within a support vector machine classifier to reduce the effects of covariate shift.

Abstract: Pattern recognition tasks often face the situation that training data are not fully representative of test data. This problem is well-recognized in speech recognition, where methods like cepstral mean normalization (CMN), vocal tract length normalization (VTLN) and maximum likelihood linear regression (MLLR) are used to compensate for channel and speaker differences. Speech emotion recognition (SER) is an important emerging field in human-computer interaction and faces the same data shift problems, a fact which has been generally overlooked in this domain. In this paper, we show that compensating for channel and speaker differences can give significant improvements in SER by modelling these differences as a covariate shift. We employ three algorithms from the domain of transfer learning that apply importance weights (IWs) within a support vector machine classifier to reduce the effects of covariate shift. We test these methods on the FAU Aibo Emotion Corpus, which was used in the Interspeech 2009 Emotion Challenge. It consists of two separate parts recorded independently at different schools; hence the two parts exhibit covariate shift. Results show that the IW methods outperform combined CMN and VTLN and significantly improve on the baseline performance of the Challenge. The best of the three methods also improves significantly on the winning contribution to the Challenge.
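
Of the compensation methods mentioned, cepstral mean normalization is the simplest to show: a stationary channel adds a roughly constant offset to each cepstral coefficient, so subtracting the per-utterance mean removes it. This sketch covers only CMN; it does not implement the importance-weighting methods the paper studies, and the function name is hypothetical.

```python
import numpy as np


def cepstral_mean_normalize(features):
    """Subtract the per-utterance mean of each cepstral coefficient.

    features: frames x coefficients (e.g. MFCCs) for one utterance.
    """
    features = np.asarray(features, dtype=float)
    return features - features.mean(axis=0, keepdims=True)


rng = np.random.default_rng(0)
utterance = rng.normal(size=(200, 13))
channel_offset = rng.normal(size=13)  # stationary channel: constant cepstral shift
same_speech_other_channel = utterance + channel_offset
```

After normalization, the two versions of the utterance are identical; what CMN cannot remove are speaker differences and other shifts that are not constant offsets, which motivates the covariate-shift methods above.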

Journal Article•10.4161/CIB.25849•

Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments: A matter of relative size of studied transcriptomes.

Elie Maza¹, Pierre Frasse², Pavel Senin, Mondher Bouzayen, Mohamed Zouine³•Institutions (3)

University of Toulouse¹, Entertainments National Service Association², Institut national de la recherche agronomique³

30 Jul 2013-Communicative & Integrative Biology

TL;DR: This study compares the most widespread normalization procedures and proposes a novel one aiming at removing an inherent bias of studied transcriptomes related to their relative size, named “Median Ratio Normalization” (MRN).

Abstract: In recent years, RNA-Seq technologies have become a powerful tool for transcriptome studies. However, computational methods dedicated to the analysis of high-throughput sequencing data are yet to be standardized. In particular, it is known that the choice of a normalization procedure leads to great variability in the results of differential gene expression analysis. The present study compares the most widespread normalization procedures and proposes a novel one aiming at removing an inherent bias of studied transcriptomes related to their relative size. Comparisons of the normalization procedures are performed on real and simulated data sets. Analyses of real RNA-Seq data sets, performed with all the different normalization methods, show that only 50% of the significantly differentially expressed genes are common to all methods. This result highlights the influence of the normalization step on the differential expression analysis. Analyses of real and simulated data sets give similar results, showing three groups of procedures with the same behavior. The group including the novel method, named "Median Ratio Normalization" (MRN), gives the lowest number of false discoveries. Within this group, the MRN method is less sensitive to changes in parameters related to the relative size of transcriptomes, such as the number of down- and upregulated genes and the gene expression levels. The newly proposed MRN method efficiently deals with the intrinsic bias resulting from the relative size of studied transcriptomes. Validation with real and simulated data sets confirmed that MRN is more consistent and robust than existing methods.
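
The abstract does not spell out the MRN computation, so the following sketch shows a related and widely used scheme instead: DESeq-style median-of-ratios size factors, where each sample is compared with a per-gene geometric-mean pseudo-reference. It is a minimal Python illustration under that assumption, not the MRN method itself, and the function name is hypothetical.

```python
import numpy as np


def median_of_ratios_size_factors(counts):
    """DESeq-style size factors (a related scheme, not MRN itself).

    counts: (genes, samples) array of raw read counts.
    """
    counts = np.asarray(counts, dtype=float)
    # Keep only genes with a nonzero count in every sample so logs are finite.
    expressed = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[expressed])
    # Pseudo-reference: geometric mean of each gene across samples (in logs).
    log_ref = log_counts.mean(axis=1, keepdims=True)
    # Size factor: median ratio of each sample to the pseudo-reference.
    return np.exp(np.median(log_counts - log_ref, axis=0))
```

Dividing each column of the count matrix by its size factor yields normalized counts; taking the median makes the factor robust to a minority of strongly up- or downregulated genes.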

Journal Article•10.1103/PHYSREVC.88.064002•

Coarse-grained potential analysis of neutron-proton and proton-proton scattering below the pion production threshold

R. Navarro Pérez, J. E. Amaro, E. Ruiz Arriola

06 Dec 2013-Physical Review C

TL;DR: In this article, the authors present a successful fit to neutron-proton and proton-proton scattering data below the pion production threshold using the delta-shell representation; the analysis includes data from 1950 to 2013.

Abstract: Using the delta-shell representation, we present a successful fit to neutron-proton and proton-proton scattering data below the pion production threshold. A detailed overview of the theory necessary to calculate observables with this potential is presented. A new data selection process is used to obtain the largest mutually consistent database. The analysis includes data from the years 1950 to 2013. Using 46 parameters, we obtain χ²/N_data = 1.04 with N_data = 6713, including normalization data. Phase shifts with error bars are provided.
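
As a check of scale, χ²/N_data = 1.04 with N_data = 6713 corresponds to a total χ² of about 6,982. In nucleon-nucleon analyses, data sets are often allowed a floating overall normalization; the following minimal Python sketch fits a single factor Z for one data set by weighted least squares. It is an assumption-based illustration: it omits the penalty term for a quoted normalization uncertainty that such analyses usually add, and the function name is hypothetical.

```python
import numpy as np


def chi2_with_floated_norm(observed, predicted, sigma):
    """Chi-square after fitting one overall normalization factor Z.

    Minimizes sum(((observed - Z * predicted) / sigma) ** 2) over Z.
    Returns (Z, chi2) at the minimum.
    """
    o, t, s = (np.asarray(a, dtype=float) for a in (observed, predicted, sigma))
    w = 1.0 / s**2
    # Setting d(chi2)/dZ = 0 gives the weighted least-squares solution.
    z = np.sum(w * o * t) / np.sum(w * t * t)
    chi2 = np.sum(w * (o - z * t) ** 2)
    return z, chi2
```

Because the optimal Z has a closed form, it can be recomputed cheaply for every data set at each step of the outer fit over the potential parameters.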

Proceedings Article•

A Log-Linear Model for Unsupervised Text Normalization

Yi Yang¹, Jacob Eisenstein¹•Institutions (1)

Georgia Institute of Technology¹

1 Oct 2013

TL;DR: This work uses the output of UNLOL to automatically normalize a large corpus of social media text, revealing a set of coherent orthographic styles that underlie online language variation.

Abstract: We present a unified unsupervised statistical model for text normalization. The relationship between standard and non-standard tokens is characterized by a log-linear model, permitting arbitrary features. The weights of these features are trained in a maximum-likelihood framework, employing a novel sequential Monte Carlo training algorithm to overcome the large label space, which would be impractical for traditional dynamic programming solutions. This model is implemented in a normalization system called UNLOL, which achieves the best known results on two normalization datasets, outperforming more complex systems. We use the output of UNLOL to automatically normalize a large corpus of social media text, revealing a set of coherent orthographic styles that underlie online language variation.
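
In a log-linear model, each candidate standard form s for an observed token t is scored as exp(w · f(t, s)) and normalized over the candidate set. The following minimal Python sketch uses hand-set weights and three hypothetical features purely for illustration; UNLOL instead learns its weights without supervision via sequential Monte Carlo, which is not reproduced here.

```python
import math


def is_subsequence(short, text):
    """True if the characters of `short` appear in order within `text`."""
    remaining = iter(text)
    # `ch in remaining` advances the iterator, so character order is enforced.
    return all(ch in remaining for ch in short)


def toy_features(token, candidate):
    """Hypothetical feature function f(token, candidate), for illustration."""
    return {
        "subsequence": float(is_subsequence(token, candidate)),
        "length_diff": float(abs(len(token) - len(candidate))),
        "is_identity": float(token == candidate),
    }


def candidate_distribution(token, candidates, features, weights):
    """p(candidate | token) proportional to exp(w . f(token, candidate))."""
    scores = {
        cand: sum(weights.get(name, 0.0) * value
                  for name, value in features(token, cand).items())
        for cand in candidates
    }
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores.values())
    unnormalized = {cand: math.exp(s - top) for cand, s in scores.items()}
    total = sum(unnormalized.values())
    return {cand: v / total for cand, v in unnormalized.items()}
```

With illustrative weights such as {"subsequence": 2.0, "length_diff": -0.2, "is_identity": -1.0}, the token "tmrw" assigns its highest probability to "tomorrow" among the candidates "tomorrow", "tmrw", and "twitter".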

Journal Article•10.1021/AC401559V•

Measurement of DNA concentration as a normalization strategy for metabolomic data from adherent cell lines

Leslie P. Silva¹, Philip L. Lorenzi¹, Preeti Purwaha¹, Valeda Yong¹, David H. Hawke¹, John N. Weinstein¹•Institutions (1)

University of Texas MD Anderson Cancer Center¹

02 Oct 2013-Analytical Chemistry

TL;DR: It is concluded that DNA concentration is a widely applicable method for normalizing metabolomic data from adherent cell lines.

Abstract: Metabolomics is a rapidly advancing field, and much of our understanding of the subject has come from research on cell lines. However, the results and interpretation of such studies depend on appropriate normalization of the data; ineffective or poorly chosen normalization methods can lead to frankly erroneous conclusions. That is a recurrent challenge because robust, reliable methods for normalization of data from cells have not been established. In this study, we have compared several methods for normalization of metabolomic data from cell extracts. Total protein concentration, cell count, and DNA concentration exhibited strong linear correlations with seeded cell number, but DNA concentration was found to be the most generally useful method for the following reasons: (1) DNA concentration showed the greatest consistency across a range of cell numbers; (2) DNA concentration was the closest to proportional with cell number; (3) DNA samples could be collected from the same dish as the metabolites; and (4)...
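
Normalizing by DNA concentration amounts to dividing each sample's metabolite signals by the DNA measured from the same dish. The following is a minimal Python sketch, assuming a samples-by-metabolites intensity array; the names are hypothetical, and it presumes DNA content is proportional to cell number, which is the property the abstract reports.

```python
import numpy as np


def normalize_by_dna(intensities, dna_amount):
    """Divide each sample's metabolite intensities by its DNA amount.

    intensities: (samples, metabolites) peak areas or intensities
    dna_amount: (samples,) DNA concentration measured from the same dish
    """
    intensities = np.asarray(intensities, dtype=float)
    dna_amount = np.asarray(dna_amount, dtype=float)
    # Broadcast one divisor per sample (row) across all metabolites.
    return intensities / dna_amount[:, None]
```

Two dishes that differ only in cell number, and therefore proportionally in both signal and DNA, give identical normalized profiles.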

Proceedings Article•10.1109/ASRU.2013.6707731•

Score normalization and system combination for improved keyword spotting

Damianos Karakos¹, Richard Schwartz¹, Stavros Tsakalidis¹, Le Zhang¹, Shivesh Ranjan¹, Tim Ng¹, Roger Hsiao¹, Guruprasad Saikumar¹, Ivan Bulyko¹, Long Nguyen¹, John Makhoul¹, Frantisek Grezl², Mirko Hannemann², Martin Karafiat², Igor Szöke², Karel Vesely², Lori Lamel, Viet Bac Le³•Institutions (3)

Raytheon¹, Brno University of Technology², Vocapia Research³

1 Dec 2013

TL;DR: Two techniques, score normalization and system combination, are shown to improve Keyword Spotting (KWS) performance under the ATWV/MTWV measures; a variant of these methods achieved the highest performance in the official surprise language evaluation of the IARPA-funded Babel project in April 2013.

Abstract: We present two techniques that are shown to yield improved Keyword Spotting (KWS) performance when using the ATWV/MTWV performance measures: (i) score normalization, where the scores of different keywords become commensurate with each other and correspond more closely than raw posteriors to the probability of being correct; and (ii) system combination, where the detections of multiple systems are merged and their scores are interpolated with weights optimized using MTWV as the maximization criterion. Both score normalization and system combination yield significant gains in ATWV/MTWV, sometimes on the order of 8-10 points (absolute), in five different languages. A variant of these methods resulted in the highest performance in the official surprise language evaluation for the IARPA-funded Babel project in April 2013.
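
The abstract does not give the normalization formula, so the following sketch shows a common per-keyword scheme as an assumption: each keyword's detection scores are raised to a power gamma and divided by their sum over all detections of that keyword, which puts scores for different keywords on a comparable scale. It is a minimal Python illustration that may differ from the paper's exact method, and the names are hypothetical.

```python
from collections import defaultdict


def sum_to_one_normalize(detections, gamma=1.0):
    """Rescale raw detection scores so each keyword's scores sum to one.

    detections: list of (keyword, score) pairs with raw posterior scores.
    gamma: hypothetical exponent applied to scores before normalizing.
    """
    totals = defaultdict(float)
    for keyword, score in detections:
        totals[keyword] += score ** gamma
    # Divide each detection by the total for its own keyword.
    return [(keyword, score ** gamma / totals[keyword])
            for keyword, score in detections]
```

A rare keyword with a single weak detection ends up with a normalized score of 1.0, so a single global decision threshold treats it comparably to a frequent keyword with many detections.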

Journal Article•10.1007/S11192-012-0898-Z•

Opinion paper: thoughts and facts on bibliometric indicators

Wolfgang Glänzel¹, Henk F. Moed²•Institutions (2)

Katholieke Universiteit Leuven¹, Elsevier²

01 Jul 2013-Scientometrics

TL;DR: This paper aims at contributing to the on-going discussion about building and applying bibliometric indicators by shedding light on their properties and requirements concerning six different aspects: deterministic versus probabilistic approach, application-related properties, the time dependence, normalization issues, size dependence and network indicators.

Abstract: This paper aims at contributing to the on-going discussion about building and applying bibliometric indicators. It sheds light on their properties and requirements concerning six different aspects: deterministic versus probabilistic approach, application-related properties, the time dependence, normalization issues, size dependence and network indicators.
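
One widely used form of normalization for citation indicators divides each paper's citation count by the mean citation count of its field and publication year, then averages these ratios (the mean normalized citation score). The following is a minimal Python sketch, assuming the field-and-year baselines are already computed; it illustrates the general idea, not a method proposed in this paper, and the names are hypothetical.

```python
def mean_normalized_citation_score(papers, expected):
    """Average of each paper's citations divided by its field/year baseline.

    papers: list of (field, year, citations) tuples
    expected: dict mapping (field, year) -> mean citations of all papers
              in that field and year (the reference set)
    """
    ratios = [cites / expected[(field, year)] for field, year, cites in papers]
    return sum(ratios) / len(ratios)
```

A value of 1.0 means the set of papers is cited at the world average for its fields and years, regardless of how citation-dense those fields are.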

Journal Article•10.1214/14-AOS1285•

Role of Normalization in Spectral Clustering for Stochastic Blockmodels

Purnamrita Sarkar, Peter J. Bickel

05 Oct 2013-arXiv: Machine Learning

TL;DR: This paper theoretically shows that normalization shrinks the spread of points in a class by a constant fraction under a broad parameter regime and obtains sharp deviation bounds of empirical principal eigenvalues of graphs generated from a stochastic blockmodel.

Abstract: Spectral clustering is a technique that clusters elements using the top few eigenvectors of their (possibly normalized) similarity matrix. The quality of spectral clustering is closely tied to the convergence properties of these principal eigenvectors. This rate of convergence has been shown to be identical for both the normalized and unnormalized variants in recent random matrix theory literature. However, normalization for spectral clustering is commonly believed to be beneficial [Stat. Comput. 17 (2007) 395-416]. Indeed, our experiments show that normalization improves prediction accuracy. In this paper, for the popular stochastic blockmodel, we theoretically show that normalization shrinks the spread of points in a class by a constant fraction under a broad parameter regime. As a byproduct of our work, we also obtain sharp deviation bounds of empirical principal eigenvalues of graphs generated from a stochastic blockmodel.
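
The following minimal Python sketch shows two-cluster spectral clustering on a simulated stochastic blockmodel, comparing the unnormalized adjacency matrix A with the normalized D^{-1/2} A D^{-1/2}. The block sizes and edge probabilities are hypothetical, and the sketch shows only the mechanics; it does not reproduce the paper's theoretical result about the spread of points within a class.

```python
import numpy as np


def spectral_bipartition(adjacency, normalized=True):
    """Split nodes into two clusters using the second leading eigenvector.

    normalized=True uses D^{-1/2} A D^{-1/2}; otherwise the raw adjacency A.
    Assumes every node has at least one edge (nonzero degree).
    """
    a = np.asarray(adjacency, dtype=float)
    if normalized:
        d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
        a = d_inv_sqrt[:, None] * a * d_inv_sqrt[None, :]
    # eigh returns eigenvalues of a symmetric matrix in ascending order.
    _, vecs = np.linalg.eigh(a)
    # The sign pattern of the second-largest eigenvector separates two blocks.
    return (vecs[:, -2] > 0).astype(int)


def sample_sbm(sizes, p_in, p_out, rng):
    """Sample a symmetric adjacency matrix (no self-loops) from a blockmodel."""
    labels = np.repeat(np.arange(len(sizes)), sizes)
    probs = np.where(labels[:, None] == labels[None, :], p_in, p_out)
    upper = np.triu(rng.random(probs.shape) < probs, k=1)
    adjacency = (upper | upper.T).astype(float)
    return adjacency, labels
```

With well-separated blocks (for example p_in = 0.5 and p_out = 0.05), both variants typically recover the partition; differences between them are expected in harder regimes with sparser or more heterogeneous graphs.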

Patent•

Loudness normalization based on user feedback

Frank M. Baumgarte¹•Institutions (1)

Apple Inc.¹

25 Nov 2013

TL;DR: A system for performing loudness normalization based on user feedback is described, in which a plurality of audio playback devices communicate the volume settings used during playback of pieces of sound program content to a content delivery system.

Abstract: A system for performing loudness normalization based on user feedback is described herein. The system includes a content delivery system and a plurality of audio playback devices. The audio playback devices may communicate volume settings used during playback of pieces of sound program content to the content delivery system. Based on these collected data points, a statistical analysis may be performed to generate loudness adjustment values for pieces of sound program content. The loudness adjustment values may be communicated to the audio playback devices through metadata in associated pieces of sound program content or as separate communications from the content delivery system. An offline version is also described that supports individual loudness normalization adjustments based on a single user's preferences. Under either system, loudness normalization may be achieved based on real world volume settings for individual pieces of sound program content played using various playback configurations.
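
The abstract leaves the statistical analysis unspecified. One simple possibility, sketched below in Python as an assumption, compares the median volume setting users chose for each piece with the median across all reports: a piece that listeners consistently turn up is quieter than typical, so it receives a positive gain adjustment. The names, the use of the median, and the assumption that volume settings in dB are comparable across devices are all hypothetical.

```python
import statistics
from collections import defaultdict


def loudness_adjustments(reports):
    """Map each content item to a gain adjustment in dB.

    reports: list of (content_id, volume_db) pairs reported by playback devices.
    A positive adjustment means users turn this item up relative to typical
    content, so the item should be boosted by that amount.
    """
    by_item = defaultdict(list)
    for content_id, volume_db in reports:
        by_item[content_id].append(volume_db)
    # Typical volume setting across all reports.
    global_median = statistics.median(volume for _, volume in reports)
    return {content_id: statistics.median(volumes) - global_median
            for content_id, volumes in by_item.items()}
```

The median is used here so that a few listeners with unusual playback setups do not dominate an item's adjustment.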

Journal Article•10.1063/1.4818323•

Absolute cross-section normalization of magnetic neutron scattering data

Guangyong Xu¹, Zhijun Xu, John M. Tranquada•Institutions (1)

Brookhaven National Laboratory¹

19 Aug 2013-Review of Scientific Instruments

TL;DR: Various methods for obtaining the resolution volume in neutron scattering experiments, needed to perform absolute normalization of inelastic magnetic neutron scattering data, are discussed, together with the advantages of the different normalization processes.

Abstract: We discuss various methods to obtain the resolution volume for neutron scattering experiments, in order to perform absolute normalization on inelastic magnetic neutron scattering data. Examples from previous experiments are given. We also try to provide clear definitions of a number of physical quantities which are commonly used to describe neutron magnetic scattering results, including the dynamic spin correlation function and the imaginary part of the dynamic susceptibility. Formulas that can be used for general purposes are provided and the advantages of the different normalization processes are discussed.

Journal Article•10.1021/AC401400B•

Combination of Injection Volume Calibration by Creatinine and MS Signals’ Normalization to Overcome Urine Variability in LC-MS-Based Metabolomics Studies

Yanhua Chen¹, Guoqing Shen¹, Ruiping Zhang¹, Jiuming He¹, Yi Zhang¹, Jing Xu¹, Wei Yang¹, Xiaoguang Chen¹, Yongmei Song¹, Zeper Abliz¹•Institutions (1)

Peking Union Medical College¹

02 Aug 2013-Analytical Chemistry

TL;DR: The results showed that the calibration of injection volumes based on creatinine values could effectively eliminate intragroup differences caused by variations in the concentrations of urinary metabolites, thus giving better parallelism and clustering effects and peak area normalization could further eliminate intraclass differences.

Abstract: It is essential to choose one preprocessing method for liquid chromatography–mass spectrometry (LC-MS)-based metabolomics studies of urine samples in order to overcome their variability. However, the commonly used normalization methods do not substantially reduce the high variabilities arising from differences in urine concentration, especially for signal saturation (abundant metabolites exceed the dynamic range of the instrumentation) or missing values. Herein, a simple preacquisition strategy based on differential injection volumes calibrated by creatinine (to reduce the concentration differences between the samples), combined with normalization to “total useful MS signals” or “all MS signals”, is proposed to overcome urine variabilities. This strategy was first systematically compared with other popular normalization methods by application to serially diluted urine samples. Then, the method has been verified using rat urine samples of pre- and postinoculation of Walker 256 carcinoma cells. The results ...
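
The two-stage strategy can be sketched as follows: choose each sample's injection volume inversely proportional to its creatinine level, then divide each sample's peak areas by its total signal. This minimal Python sketch works under those assumptions; the reference volume and reference creatinine are hypothetical parameters, and it implements the "all MS signals" variant, since the abstract does not specify how "useful" signals are selected.

```python
import numpy as np


def creatinine_injection_volumes(creatinine, reference_volume_ul, reference_creatinine):
    """Scale injection volume inversely with creatinine so that each
    injection carries a similar amount of urinary material."""
    creatinine = np.asarray(creatinine, dtype=float)
    return reference_volume_ul * reference_creatinine / creatinine


def total_signal_normalize(peak_areas):
    """Divide each sample's (row's) peak areas by that sample's total signal."""
    peak_areas = np.asarray(peak_areas, dtype=float)
    return peak_areas / peak_areas.sum(axis=1, keepdims=True)
```

The first step acts before acquisition, reducing concentration differences that could push abundant metabolites past the instrument's dynamic range; the second step then corrects residual differences in overall signal.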

Journal Article•10.1145/2414425.2414428•

Named entity recognition for tweets

Xiaohua Liu¹, Furu Wei², Shaodian Zhang³, Ming Zhou²•Institutions (3)

Harbin Institute of Technology¹, Microsoft², Shanghai Jiao Tong University³

01 Feb 2013-ACM Transactions on Intelligent Systems and Technology

TL;DR: A novel method combining tweet normalization, a K-Nearest Neighbors (KNN) classifier coupled with a linear Conditional Random Fields (CRF) model, and a semisupervised learning framework is proposed to address the challenges of Named Entity Recognition for tweets.

Abstract: Two main challenges of Named Entity Recognition (NER) for tweets are the insufficient information in a tweet and the lack of training data. We propose a novel method consisting of three core elements: (1) normalization of tweets; (2) combination of a K-Nearest Neighbors (KNN) classifier with a linear Conditional Random Fields (CRF) model; and (3) semisupervised learning framework. The tweet normalization preprocessing corrects common ill-formed words using a global linear model. The KNN-based classifier conducts prelabeling to collect global coarse evidence across tweets while the CRF model conducts sequential labeling to capture fine-grained information encoded in a tweet. The semisupervised learning plus the gazetteers alleviate the lack of training data. Extensive experiments show the advantages of our method over the baselines as well as the effectiveness of normalization, KNN, and semisupervised learning.
