TL;DR: A novel model-based intra-array normalization strategy for 450 k data, called BMIQ (Beta MIxture Quantile dilation), to adjust the beta-values of type2 design probes into a statistical distribution characteristic of type1 probes is proposed.
Abstract: Motivation: The Illumina Infinium 450 k DNA Methylation Beadchip is a prime candidate technology for Epigenome-Wide Association Studies (EWAS). However, a difficulty associated with these beadarrays is that probes come in two different designs, characterized by widely different DNA methylation distributions and dynamic range, which may bias downstream analyses. A key statistical issue is therefore how best to adjust for the two different probe designs.
Results: Here we propose a novel model-based intra-array normalization strategy for 450 k data, called BMIQ (Beta MIxture Quantile dilation), to adjust the beta-values of type2 design probes into a statistical distribution characteristic of type1 probes. The strategy involves application of a three-state beta-mixture model to assign probes to methylation states, subsequent transformation of probabilities into quantiles and finally a methylation-dependent dilation transformation to preserve the monotonicity and continuity of the data. We validate our method on cell-line data, fresh frozen and paraffin-embedded tumour tissue samples and demonstrate that BMIQ compares favourably with two competing methods. Specifically, we show that BMIQ improves the robustness of the normalization procedure, reduces the technical variation and bias of type2 probe values and successfully eliminates the type1 enrichment bias caused by the lower dynamic range of type2 probes. BMIQ will be useful as a preprocessing step for any study using the Illumina Infinium 450 k platform.
Availability: BMIQ is freely available from http://code.google.com/p/bmiq/.
Contact: a.teschendorff@ucl.ac.uk
Supplementary information: Supplementary data are available at Bioinformatics online.
TL;DR: This work focuses on a comprehensive comparison of seven recently proposed normalization methods for the differential analysis of RNA-seq data, with an emphasis on the use of varied real and simulated datasets involving different species and experimental designs to represent data characteristics commonly observed in practice.
Abstract: During the last 3 years, a number of approaches for the normalization of RNA sequencing data have emerged in the literature, differing both in the type of bias adjustment and in the statistical strategy adopted. However, as data continue to accumulate, there has been no clear consensus on the appropriate normalization method to be used or the impact of a chosen method on the downstream analysis. In this work, we focus on a comprehensive comparison of seven recently proposed normalization methods for the differential analysis of RNA-seq data, with an emphasis on the use of varied real and simulated datasets involving different species and experimental designs to represent data characteristics commonly observed in practice. Based on this comparison study, we propose practical recommendations on the appropriate normalization method to be used and its impact on the differential analysis of RNA-seq data.
TL;DR: It is demonstrated that quantile normalization methods produce marked improvement, even in highly consistent data, by all three metrics, and that careful selection of preprocessing steps can minimize variance and thus improve statistical power, especially for the detection of the small absolute DNA methylation changes likely associated with complex disease phenotypes.
Abstract: As the most stable and experimentally accessible epigenetic mark, DNA methylation is of great interest to the research community. The landscape of DNA methylation across tissues, through development and in disease pathogenesis is not yet well characterized. Thus there is a need for rapid and cost effective methods for assessing genome-wide levels of DNA methylation. The Illumina Infinium HumanMethylation450 (450K) BeadChip is a very useful addition to the available methods for DNA methylation analysis but its complex design, incorporating two different assay methods, requires careful consideration. Accordingly, several normalization schemes have been published. We have taken advantage of known DNA methylation patterns associated with genomic imprinting and X-chromosome inactivation (XCI), in addition to the performance of SNP genotyping assays present on the array, to derive three independent metrics which we use to test alternative schemes of correction and normalization. These metrics also have potential utility as quality scores for datasets. The standard index of DNA methylation at any specific CpG site is β = M/(M + U + 100) where M and U are methylated and unmethylated signal intensities, respectively. Betas (βs) calculated from raw signal intensities (the default GenomeStudio behavior) perform well, but using 11 methylomic datasets we demonstrate that quantile normalization methods produce marked improvement, even in highly consistent data, by all three metrics. The commonly used procedure of normalizing betas is inferior to the separate normalization of M and U, and it is also advantageous to normalize Type I and Type II assays separately. More elaborate manipulation of quantiles proves to be counterproductive. Careful selection of preprocessing steps can minimize variance and thus improve statistical power, especially for the detection of the small absolute DNA methylation changes likely associated with complex disease phenotypes. 
For the convenience of the research community we have created a user-friendly R software package called wateRmelon, downloadable from Bioconductor, compatible with the existing methylumi, minfi and IMA packages, that allows others to utilize the same normalization methods and data quality tests on 450K data.
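The β index and the idea of normalizing M and U separately can be illustrated with a minimal Python sketch. This is not the wateRmelon code (which is an R package): the function names `beta_value` and `quantile_normalize`, the generic rank-based quantile normalization, and the intensities are all assumptions made for illustration. Separate handling of Type I and Type II assays is not shown.

```python
def beta_value(m, u, offset=100.0):
    """beta = M / (M + U + offset), with offset 100 as in the abstract."""
    return m / (m + u + offset)


def quantile_normalize(samples):
    """Generic quantile normalization (an assumed, textbook form).

    samples: list of samples, each a list of probe intensities in the same
    probe order. Every sample is forced onto the mean of the sorted samples.
    Ties are broken by position.
    """
    n = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [sum(s[k] for s in sorted_samples) / len(samples) for k in range(n)]
    normalized = []
    for s in samples:
        order = sorted(range(n), key=lambda i: s[i])  # probe indices, smallest first
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


# Hypothetical methylated (M) and unmethylated (U) intensities: 2 samples, 3 probes.
M = [[900.0, 100.0, 500.0], [1800.0, 300.0, 900.0]]
U = [[100.0, 900.0, 500.0], [300.0, 1500.0, 1200.0]]

# Normalize M and U separately, then form betas from the normalized intensities.
M_norm = quantile_normalize(M)
U_norm = quantile_normalize(U)
betas = [[beta_value(m, u) for m, u in zip(ms, us)] for ms, us in zip(M_norm, U_norm)]
```

After normalization both samples share the same set of M intensities, so differences between their betas reflect probe ranks rather than overall signal strength.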
TL;DR: The protocol described here includes simultaneous measurements of beads and cells on the mass cytometer, subsequent extraction of the bead‐based signature, and the application of an algorithm enabling correction of both short‐ and long‐term signal fluctuations.
Abstract: Mass cytometry uses atomic mass spectrometry combined with isotopically pure reporter elements to currently measure as many as 40 parameters per single cell. As with any quantitative technology, there is a fundamental need for quality assurance and normalization protocols. In the case of mass cytometry, the signal variation over time due to changes in instrument performance combined with intervals between scheduled maintenance must be accounted for and then normalized. Here, samples were mixed with polystyrene beads embedded with metal lanthanides, allowing monitoring of mass cytometry instrument performance over multiple days of data acquisition. The protocol described here includes simultaneous measurements of beads and cells on the mass cytometer, subsequent extraction of the bead-based signature, and the application of an algorithm enabling correction of both short- and long-term signal fluctuations. The variation in the intensity of the beads that remains after normalization may also be used to determine data quality. Application of the algorithm to a one-month longitudinal analysis of a human peripheral blood sample reduced the range of median signal fluctuation from 4.9-fold to 1.3-fold.
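The bead-based correction described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the published algorithm: it uses a single channel, fixed time bins instead of any smoothing, and the global bead median as the baseline. The function names and the data are hypothetical.

```python
from statistics import median


def bead_correction_factors(bead_events, window):
    """Map each time bin to baseline / local bead median.

    bead_events: list of (time, intensity) pairs for bead events in one channel.
    window: width of a time bin, in the same units as `time`.
    """
    baseline = median(i for _, i in bead_events)
    bins = {}
    for t, intensity in bead_events:
        bins.setdefault(int(t // window), []).append(intensity)
    return {b: baseline / median(vals) for b, vals in bins.items()}


def normalize_cells(cell_events, factors, window):
    """Scale each cell intensity by the factor of its time bin.

    Raises KeyError if a cell falls in a bin that contains no bead events.
    """
    return [(t, i * factors[int(t // window)]) for t, i in cell_events]


# Hypothetical run in which the instrument signal halves between the two bins.
beads = [(1, 100.0), (2, 100.0), (11, 50.0), (12, 50.0)]
cells = [(5, 200.0), (15, 100.0)]

factors = bead_correction_factors(beads, window=10)
normalized = normalize_cells(cells, factors, window=10)
```

In this toy run the two cells, measured at different instrument sensitivities, end up with the same corrected intensity.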
TL;DR: DEGES in TCC is essential for accurate normalization of tag count data, especially when up- and down-regulated DEGs in one of the samples are extremely biased in their number.
Abstract: Differential expression analysis based on “next-generation” sequencing technologies is a fundamental means of studying RNA expression. We recently developed a multi-step normalization method (called TbT) for two-group RNA-seq data with replicates and demonstrated that the statistical methods available in four R packages (edgeR, DESeq, baySeq, and NBPSeq) together with TbT can produce a well-ranked gene list in which true differentially expressed genes (DEGs) are top-ranked and non-DEGs are bottom ranked. However, the advantages of the current TbT method come at the cost of a huge computation time. Moreover, the R packages did not have normalization methods based on such a multi-step strategy. TCC (an acronym for Tag Count Comparison) is an R package that provides a series of functions for differential expression analysis of tag count data. The package incorporates multi-step normalization methods, whose strategy is to remove potential DEGs before performing the data normalization. The normalization function based on this DEG elimination strategy (DEGES) includes (i) the original TbT method based on DEGES for two-group data with or without replicates, (ii) much faster methods for two-group data with or without replicates, and (iii) methods for multi-group comparison. TCC provides a simple unified interface to perform such analyses with combinations of functions provided by edgeR, DESeq, and baySeq. Additionally, a function for generating simulation data under various conditions and alternative DEGES procedures consisting of functions in the existing packages are provided. Bioinformatics scientists can use TCC to evaluate their methods, and biologists familiar with other R packages can easily learn what is done in TCC. DEGES in TCC is essential for accurate normalization of tag count data, especially when up- and down-regulated DEGs in one of the samples are extremely biased in their number. 
TCC is useful for analyzing tag count data in various scenarios ranging from unbiased to extremely biased differential expression. TCC is available at http://www.iu.a.u-tokyo.ac.jp/~kadota/TCC/ and will appear in Bioconductor (http://bioconductor.org/) from ver. 2.13.
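The DEG elimination strategy (normalize, flag potential DEGs, normalize again without them) can be sketched in Python as a toy example. This is not TCC's implementation, which relies on functions from edgeR, DESeq and baySeq: here the normalization is a plain library-size factor and the DEG detection is a crude ranking by absolute log fold change, both assumptions chosen only to keep the example self-contained. All names and counts are hypothetical.

```python
import math


def size_factors(counts_a, counts_b, keep):
    """Library-size factors for two samples using only the genes in `keep`."""
    total_a = sum(counts_a[g] for g in keep)
    total_b = sum(counts_b[g] for g in keep)
    mean_total = (total_a + total_b) / 2
    return total_a / mean_total, total_b / mean_total


def deges_like(counts_a, counts_b, frac_deg=0.25, pseudo=1.0):
    """Three-step sketch: normalize, flag potential DEGs, re-normalize."""
    genes = list(range(len(counts_a)))
    # Step 1: provisional normalization on all genes.
    fa, fb = size_factors(counts_a, counts_b, genes)

    # Step 2: flag potential DEGs by largest |log fold change|
    # (a stand-in for a proper statistical test).
    def abs_lfc(g):
        return abs(math.log((counts_a[g] / fa + pseudo) / (counts_b[g] / fb + pseudo)))

    n_deg = int(len(genes) * frac_deg)
    flagged = set(sorted(genes, key=abs_lfc, reverse=True)[:n_deg])
    # Step 3: final normalization on the remaining genes only.
    keep = [g for g in genes if g not in flagged]
    return size_factors(counts_a, counts_b, keep), flagged


# Hypothetical counts: genes 6 and 7 are up-regulated in sample B only.
sample_a = [100] * 8
sample_b = [100] * 6 + [400, 400]

naive = size_factors(sample_a, sample_b, list(range(8)))
final, flagged = deges_like(sample_a, sample_b)
```

With all genes included, the one-sided up-regulation inflates the factor for sample B; after the flagged genes are removed, both samples receive the same factor, which is the behaviour the abstract attributes to DEGES when differential expression is biased towards one group.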
TL;DR: This article introduces the first machine learning approach for DNorm, using the NCBI disease corpus and the MEDIC vocabulary, which combines MeSH® and OMIM; the method is a high-performing and mathematically principled framework for learning similarities between mentions and concept names directly from training data.
Abstract: Motivation: Despite the central role of diseases in biomedical research, there have been far fewer attempts to automatically determine which diseases are mentioned in a text (the task of disease name normalization, DNorm) compared with other normalization tasks in biomedical text mining research. Methods: In this article we introduce the first machine learning approach for DNorm, using the NCBI disease corpus and the MEDIC vocabulary, which combines MeSH and OMIM. Our method is a high-performing and mathematically principled framework for learning similarities between mentions and concept names directly from training data. The technique is based on pairwise learning to rank, which has not previously been applied to the normalization task but has proven successful in large optimization problems for information retrieval. Results: We compare our method with several techniques based on lexical normalization and matching, MetaMap and Lucene. Our algorithm achieves 0.782 micro-averaged F-measure and 0.809 macro-averaged F-measure, an increase over the highest performing baseline method of 0.121 and 0.098, respectively. Availability: The source code for DNorm is available at http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/DNorm, along with a web-based demonstration and links to the NCBI disease corpus. Results on PubMed abstracts are available in PubTator: http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/PubTator
TL;DR: It is concluded that previously published ASA normalization values were too small, primarily because the conformations that maximize ASA had not been correctly identified, and a new normalization scale is derived that does provide a tight upper bound on observed ASA values.
Abstract: The relative solvent accessibility (RSA) of a residue in a protein measures the extent of burial or exposure of that residue in the 3D structure. RSA is frequently used to describe a protein's biophysical or evolutionary properties. To calculate RSA, a residue's solvent accessibility (ASA) needs to be normalized by a suitable reference value for the given amino acid; several normalization scales have previously been proposed. However, these scales do not provide tight upper bounds on ASA values frequently observed in empirical crystal structures. Instead, they underestimate the largest allowed ASA values, by up to 20%. As a result, many empirical crystal structures contain residues that seem to have RSA values in excess of one. Here, we derive a new normalization scale that does provide a tight upper bound on observed ASA values. We pursue two complementary strategies, one based on extensive analysis of empirical structures and one based on systematic enumeration of biophysically allowed tripeptides. Both approaches yield congruent results that consistently exceed published values. We conclude that previously published ASA normalization values were too small, primarily because the conformations that maximize ASA had not been correctly identified. As an application of our results, we show that empirically derived hydrophobicity scales are sensitive to accurate RSA calculation, and we derive new hydrophobicity scales that show increased correlation with experimentally measured scales.
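The effect of an underestimated reference value on RSA can be shown with a short Python sketch. The maximum-ASA numbers below are illustrative placeholders, not values from the paper or from any published scale, and the function name is hypothetical.

```python
def relative_solvent_accessibility(asa, residue, max_asa):
    """RSA = observed ASA divided by the reference (maximum) ASA of that amino acid."""
    return asa / max_asa[residue]


# Placeholder reference values (arbitrary units), chosen so that the "old"
# scale is 20% below the "tight" one; they are NOT published numbers.
old_scale = {"XAA": 100.0}
tight_scale = {"XAA": 125.0}

observed_asa = 110.0  # hypothetical, highly exposed residue

rsa_old = relative_solvent_accessibility(observed_asa, "XAA", old_scale)
rsa_tight = relative_solvent_accessibility(observed_asa, "XAA", tight_scale)
```

Under the underestimated scale the residue appears more than fully exposed (RSA above one), whereas a tight upper bound keeps RSA within [0, 1].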
TL;DR: Stain-Free technology appears to be more reliable, more robust, and more sensitive to small effects of protein regulation when compared with HKP normalization with GAPDH.
TL;DR: It is shown that choice models using normalization generate significant choice phenomena driven by either the value or number of alternative options, suggesting that the neural mechanism of value coding critically influences stochastic choice behavior and provide a generalizable quantitative framework for examining context effects in decision making.
Abstract: Understanding the neural code is critical to linking brain and behavior. In sensory systems, divisive normalization seems to be a canonical neural computation, observed in areas ranging from retina to cortex and mediating processes including contrast adaptation, surround suppression, visual attention, and multisensory integration. Recent electrophysiological studies have extended these insights beyond the sensory domain, demonstrating an analogous algorithm for the value signals that guide decision making, but the effects of normalization on choice behavior are unknown. Here, we show that choice models using normalization generate significant (and classically irrational) choice phenomena driven by either the value or number of alternative options. In value-guided choice experiments, both monkey and human choosers show novel context-dependent behavior consistent with normalization. These findings suggest that the neural mechanism of value coding critically influences stochastic choice behavior and provide a generalizable quantitative framework for examining context effects in decision making.
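A small Python sketch can illustrate why normalized value coding produces context effects. It assumes a common textbook form of divisive normalization, v_i / (sigma + sum of all values), followed by a softmax choice rule; the functional form, the parameters `sigma` and `gain`, and the option values are assumptions for illustration, not the model fitted in the paper.

```python
import math


def divisive_normalization(values, sigma=1.0, gain=10.0):
    """Scale each value by the summed value of all options (assumed form)."""
    total = sum(values)
    return [gain * v / (sigma + total) for v in values]


def softmax_choice(values):
    """Choice probabilities from a softmax over the given values."""
    exps = [math.exp(v) for v in values]
    z = sum(exps)
    return [e / z for e in exps]


def relative_preference(v1, v2, v3):
    """P(choose 1) / (P(choose 1) + P(choose 2)) in a three-option set."""
    p = softmax_choice(divisive_normalization([v1, v2, v3]))
    return p[0] / (p[0] + p[1])


low_distractor = relative_preference(5.0, 4.0, 0.0)
high_distractor = relative_preference(5.0, 4.0, 10.0)
```

A softmax on raw values would leave the relative preference between options 1 and 2 unchanged by option 3. With normalization, a more valuable third option compresses the difference between the first two and pushes the relative preference towards 0.5, a value-driven context effect of the kind the abstract describes.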
TL;DR: On page 52 of this article, in the legend for figure 1, the text “lower concentrations are shown by lighter colours” should have read “lower concentrations are shown by darker colours”.
Abstract: Nature Reviews Neuroscience 13, 51–62 (2012) On page 52 of this article, in the legend for figure 1, the text “lower concentrations are shown by lighter colours” should have read “lower concentrations are shown by darker colours”. This has been corrected in the online version.
TL;DR: The proposed Coupled Scaled Gaussian Process Regression model for head-pose normalization outperforms state-of-the-art regression-based approaches to head-pose normalization, 2D and 3D Point Distribution Models (PDMs), and Active Appearance Models (AAMs), especially in cases of unknown poses and imbalanced training data.
Abstract: We propose a method for head-pose invariant facial expression recognition that is based on a set of characteristic facial points. To achieve head-pose invariance, we propose the Coupled Scaled Gaussian Process Regression (CSGPR) model for head-pose normalization. In this model, we first learn independently the mappings between the facial points in each pair of (discrete) nonfrontal poses and the frontal pose, and then perform their coupling in order to capture dependences between them. During inference, the outputs of the coupled functions from different poses are combined using a gating function, devised based on the head-pose estimation for the query points. The proposed model outperforms state-of-the-art regression-based approaches to head-pose normalization, 2D and 3D Point Distribution Models (PDMs), and Active Appearance Models (AAMs), especially in cases of unknown poses and imbalanced training data. To the best of our knowledge, the proposed method is the first one that is able to deal with expressive faces in the range from $(-45^\circ)$ to $(+45^\circ)$ pan rotation and $(-30^\circ)$ to $(+30^\circ)$ tilt rotation, and with continuous changes in head pose, despite the fact that training was conducted on a small set of discrete poses. We evaluate the proposed method on synthetic and real images depicting acted and spontaneously displayed facial expressions.
TL;DR: The uncertainties of stable isotope results depend not only on the technical aspects of measurements, but also on how raw data are normalized to one of the international isotope scales, so unification of the data processing protocols employed is highly desirable.
Abstract: The uncertainties of stable isotope results depend not only on the technical aspects of measurements, but also on how raw data are normalized to one of the international isotope scales. The inconsistency in the normalization methods used and in the selection of standards may lead to substantial differences in the results obtained. Therefore, unification of the data processing protocols employed is highly desirable. The best performing methods are two-point or multipoint normalization methods based on linear regression. Linear regression is most robust when based on standards that cover the entire range of δ values typically observed in nature, regardless of the δ values of the samples analysed. The uncertainty can be reduced by 50 % if measurements of two different standards are performed four times, or measurements of four standards are performed twice, with each batch of samples. Chemical matrix matching between standards and samples seems to be critical for δ 18O of nitrate or δ 2H of hair samples (thermal conversion/elemental analyser), for example; however, it is not necessarily always critical for all types of samples and techniques (e.g. not for most δ 15N and δ 13C elemental analyser analyses). To ensure that all published data can be recalculated, if δ values of standards or the isotope scales are to be updated, the details of the normalization technique and the δ values of the standards used should always be clearly reported.
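Two-point normalization amounts to fitting a line through two standards and applying it to the samples. A minimal Python sketch follows. The δ values are invented for illustration and do not correspond to any real reference material; with more than two standards, a least-squares regression would replace the exact two-point line.

```python
def two_point_normalization(measured_std1, true_std1, measured_std2, true_std2):
    """Return a function mapping measured delta values onto the reference scale.

    The mapping is the straight line through the two standards.
    """
    slope = (true_std2 - true_std1) / (measured_std2 - measured_std1)
    intercept = true_std1 - slope * measured_std1
    return lambda measured: slope * measured + intercept


# Hypothetical standards bracketing the sample range (values in per mil).
normalize = two_point_normalization(
    measured_std1=-9.0, true_std1=-10.0,
    measured_std2=19.0, true_std2=20.0,
)

sample_on_scale = normalize(12.0)
```

Both standards map exactly onto their assigned values, and samples between them are stretched and shifted by the same line.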
TL;DR: In this paper, an overview of a number of source normalization approaches is provided, these approaches are empirically compared with a traditional normalization approach based on a field classification system, and the selection of the journals to be included in a normalization for field differences is discussed.
Abstract: Different scientific fields have different citation practices. Citation-based bibliometric indicators need to normalize for such differences between fields in order to allow for meaningful between-field comparisons of citation impact. Traditionally, normalization for field differences has usually been done based on a field classification system. In this approach, each publication belongs to one or more fields and the citation impact of a publication is calculated relative to the other publications in the same field. Recently, the idea of source normalization was introduced, which offers an alternative approach to normalize for field differences. In this approach, normalization is done by looking at the referencing behavior of citing publications or citing journals. In this paper, we provide an overview of a number of source normalization approaches and we empirically compare these approaches with a traditional normalization approach based on a field classification system. We also pay attention to the issue of the selection of the journals to be included in a normalization for field differences. Our analysis indicates a number of problems of the traditional classification-system-based normalization approach, suggesting that source normalization approaches may yield more accurate results.
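The traditional classification-based approach can be sketched in Python as citations divided by the mean citation count of the publication's field. This is a simplified illustration that assumes each publication belongs to exactly one field and that every field has a non-zero mean. The source normalization approaches discussed in the abstract are not shown, and the data are invented.

```python
from collections import defaultdict


def field_normalized_scores(publications):
    """publications: list of (pub_id, field, citations) tuples.

    Returns pub_id -> citations / mean citations of the same field.
    """
    totals = defaultdict(lambda: [0, 0])  # field -> [citation sum, publication count]
    for _, field, citations in publications:
        totals[field][0] += citations
        totals[field][1] += 1
    field_mean = {f: s / n for f, (s, n) in totals.items()}
    return {pid: c / field_mean[f] for pid, f, c in publications}


# Hypothetical data: a low-citation field and a high-citation field.
pubs = [
    ("m1", "mathematics", 2), ("m2", "mathematics", 4),
    ("b1", "biology", 20), ("b2", "biology", 40),
]
scores = field_normalized_scores(pubs)
```

After normalization the top paper in each field receives the same score, although the raw citation counts differ by a factor of ten.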
TL;DR: In this paper, the authors show that the bibliographic data can be transformed into a collection of compatible networks using network multiplication, and they also discuss the question when the multiplication of sparse networks preserves sparseness.
Abstract: In this paper we show that bibliographic data can be transformed into a collection of compatible networks. Using network multiplication, different interesting derived networks can be obtained. In defining them, an appropriate normalization should be considered. The proposed approach can also be applied to other collections of compatible networks. We also discuss the question of when the multiplication of sparse networks preserves sparseness. The proposed approaches are illustrated with analyses of a collection of networks on the topic "social network" obtained from the Web of Science.
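Network multiplication can be illustrated with two small two-mode networks stored as matrices. In this Python sketch, the works × authors and works × keywords networks, and the row normalization that gives every work a total weight of 1, are assumptions chosen for illustration; the paper may define its networks and normalizations differently.

```python
def transpose(m):
    return [list(row) for row in zip(*m)]


def multiply(a, b):
    """Matrix product of two networks given as dense lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def row_normalize(m):
    """Divide each row by its sum so that every row has total weight 1."""
    return [[x / sum(row) if sum(row) else 0.0 for x in row] for row in m]


# Hypothetical data: work 0 has authors 0 and 1, work 1 has author 1 only.
WA = [[1, 1],
      [0, 1]]
# Work 0 has keyword 0; work 1 has keywords 0 and 1.
WK = [[1, 0],
      [1, 1]]

# Derived authors x keywords network: number of works linking author and keyword.
AK = multiply(transpose(WA), WK)

# Normalized variant: each work's contribution is split among its authors.
AK_fractional = multiply(transpose(row_normalize(WA)), WK)
```

The unnormalized product counts every co-authored work in full for each author, while the normalized one splits each work among its authors, which is why the choice of normalization changes the derived network.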
TL;DR: Combining multiple linear univariate features in one feature space and classifying the feature space using machine learning methods could predict epileptic seizures in patients suffering from refractory epilepsy.
TL;DR: In this article, a modified version of the classical Karhunen-Loeve expansion for a vector-valued random process, called a normalized multivariate functional principal component (mFPCn) approach, was proposed as a general stochastic representation for multivariate random functions.
Abstract: This study proposes a modified version of the classical Karhunen-Loeve expansion for a vector-valued random process, called a normalized multivariate functional principal component (mFPCn) approach, as a general stochastic representation for multivariate random functions. The mFPCn approach takes the varying extent of variations between the components of multivariate random functions into account and takes advantage of component dependency through the pairwise cross-covariance functions. The multivariate approach leads to a single set of multivariate functional principal component scores, which serves well as the proxy of multivariate functional data. We derive the consistency properties for the estimates of the mFPCn model components, and the asymptotic distributions for statistical inferences. We illustrate the finite sample performance of the mFPCn approach through the analysis of a traffic flow data set, including an application to clustering multivariate functional data derived from the mFPCn approach and a simulation study. The mFPCn approach serves as a basic and useful statistical tool for multivariate functional data analysis.
TL;DR: SAMstrt is an extension of SAMseq, a statistical method for differential expression, that enables spike-in normalization and statistical testing based on the estimated absolute number of transcripts per cell for single-cell RNA-seq methods.
Abstract: Motivation: Recent transcriptome studies have revealed that total transcript numbers vary by cell type and condition; therefore, the statistical assumptions for single-cell transcriptome studies must be revisited. SAMstrt is an extension code for SAMseq, which is a statistical method for differential expression, to enable spike-in normalization and statistical testing based on the estimated absolute number of transcripts per cell for single-cell RNA-seq methods. Availability and Implementation: SAMstrt is implemented in R and available on GitHub (https://github.com/shka/R-SAMstrt). Contact: shintaro.katayama@ki.se Supplementary Information: Supplementary data are available at Bioinformatics online.
TL;DR: In this paper, the authors employ three algorithms from the domain of transfer learning that apply importance weights (IWs) within a support vector machine classifier to reduce the effects of covariate shift.
Abstract: Pattern recognition tasks often face the situation that training data are not fully representative of test data. This problem is well-recognized in speech recognition, where methods like cepstral mean normalization (CMN), vocal tract length normalization (VTLN) and maximum likelihood linear regression (MLLR) are used to compensate for channel and speaker differences. Speech emotion recognition (SER) is an important emerging field in human-computer interaction and faces the same data shift problems, a fact which has been generally overlooked in this domain. In this paper, we show that compensating for channel and speaker differences can give significant improvements in SER by modelling these differences as a covariate shift. We employ three algorithms from the domain of transfer learning that apply importance weights (IWs) within a support vector machine classifier to reduce the effects of covariate shift. We test these methods on the FAU Aibo Emotion Corpus, which was used in the Interspeech 2009 Emotion Challenge. It consists of two separate parts recorded independently at different schools; hence the two parts exhibit covariate shift. Results show that the IW methods outperform combined CMN and VTLN and significantly improve on the baseline performance of the Challenge. The best of the three methods also improves significantly on the winning contribution to the Challenge.
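Of the compensation methods named above, cepstral mean normalization is the simplest to show in code. The Python sketch below gives the standard per-utterance form, in which the mean of each coefficient is subtracted across frames. It does not cover the importance-weighting algorithms the paper evaluates, and the feature values are invented.

```python
def cepstral_mean_normalization(frames):
    """frames: list of feature vectors (one per frame) from a single utterance.

    Subtracts the per-coefficient mean over the utterance from every frame.
    """
    n = len(frames)
    means = [sum(col) / n for col in zip(*frames)]
    return [[x - m for x, m in zip(frame, means)] for frame in frames]


# Hypothetical two-frame utterance, and the same utterance with a constant
# offset added to every coefficient to mimic a different channel.
clean = [[1.0, 2.0], [3.0, 4.0]]
shifted = [[11.0, 12.0], [13.0, 14.0]]

clean_cmn = cepstral_mean_normalization(clean)
shifted_cmn = cepstral_mean_normalization(shifted)
```

A constant offset in the cepstral domain disappears after mean subtraction, so both versions of the utterance yield identical features.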
TL;DR: This study compares the most widespread normalization procedures and proposes a novel one aiming at removing an inherent bias of studied transcriptomes related to their relative size, named “Median Ratio Normalization” (MRN).
Abstract: In recent years, RNA-Seq technologies became a powerful tool for transcriptome studies. However, computational methods dedicated to the analysis of high-throughput sequencing data are yet to be standardized. In particular, it is known that the choice of a normalization procedure leads to a great variability in results of differential gene expression analysis. The present study compares the most widespread normalization procedures and proposes a novel one aiming at removing an inherent bias of studied transcriptomes related to their relative size. Comparisons of the normalization procedures are performed on real and simulated data sets. Real RNA-Seq data sets analyses, performed with all the different normalization methods, show that only 50% of significantly differentially expressed genes are common. This result highlights the influence of the normalization step on the differential expression analysis. Real and simulated data sets analyses give similar results showing 3 different groups of procedures having the same behavior. The group including the novel method named "Median Ratio Normalization" (MRN) gives the lower number of false discoveries. Within this group the MRN method is less sensitive to the modification of parameters related to the relative size of transcriptomes such as the number of down- and upregulated genes and the gene expression levels. The newly proposed MRN method efficiently deals with intrinsic bias resulting from relative size of studied transcriptomes. Validation with real and simulated data sets confirmed that MRN is more consistent and robust than existing methods.
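The abstract does not spell out the steps of MRN, so the Python sketch below is not MRN. As a point of reference it shows a different, related technique: the median-of-ratios size factor in the style of DESeq, where each sample's factor is the median ratio of its counts to the per-gene geometric mean across samples. Function names and counts are hypothetical.

```python
import math
from statistics import median


def median_of_ratios_size_factors(counts):
    """counts: list of samples, each a list of gene counts in the same gene order.

    Genes with a zero count in any sample are skipped so that the geometric
    mean is defined; at least one gene must remain.
    """
    n_genes = len(counts[0])
    usable = [g for g in range(n_genes) if all(sample[g] > 0 for sample in counts)]
    log_geo_mean = {
        g: sum(math.log(sample[g]) for sample in counts) / len(counts) for g in usable
    }
    return [
        median(math.exp(math.log(sample[g]) - log_geo_mean[g]) for g in usable)
        for sample in counts
    ]


# Hypothetical counts: sample B is sequenced twice as deeply as sample A,
# except for the last gene, whose count is the same in both.
sample_a = [10, 20, 30, 100]
sample_b = [20, 40, 60, 100]

factors = median_of_ratios_size_factors([sample_a, sample_b])
```

Because the median ignores the minority of genes that behave differently, the factors recover the twofold depth difference despite the deviating gene.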
TL;DR: In this article, the authors present a successful fit to neutron-proton and proton-proton scattering data below pion production threshold using the delta-shell representation; the analysis includes data from the years 1950 to 2013.
Abstract: Using the delta-shell representation we present a successful fit to neutron-proton and proton-proton scattering data below pion production threshold. A detailed overview of the theory necessary to calculate observables with this potential is presented. A new data selection process is used to obtain the largest mutually consistent data base. The analysis includes data within the years 1950 to 2013. Using 46 parameters we obtain chi^2/Ndata = 1.04 with Ndata = 6713 including normalization data. Phase shifts with error bars are provided.
TL;DR: This work uses the output of UNLOL to automatically normalize a large corpus of social media text, revealing a set of coherent orthographic styles that underlie online language variation.
Abstract: We present a unified unsupervised statistical model for text normalization. The relationship between standard and non-standard tokens is characterized by a log-linear model, permitting arbitrary features. The weights of these features are trained in a maximum-likelihood framework, employing a novel sequential Monte Carlo training algorithm to overcome the large label space, which would be impractical for traditional dynamic programming solutions. This model is implemented in a normalization system called UNLOL, which achieves the best known results on two normalization datasets, outperforming more complex systems. We use the output of UNLOL to automatically normalize a large corpus of social media text, revealing a set of coherent orthographic styles that underlie online language variation.
TL;DR: It is concluded that DNA concentration is a widely applicable method for normalizing metabolomic data from adherent cell lines.
Abstract: Metabolomics is a rapidly advancing field, and much of our understanding of the subject has come from research on cell lines. However, the results and interpretation of such studies depend on appropriate normalization of the data; ineffective or poorly chosen normalization methods can lead to frankly erroneous conclusions. That is a recurrent challenge because robust, reliable methods for normalization of data from cells have not been established. In this study, we have compared several methods for normalization of metabolomic data from cell extracts. Total protein concentration, cell count, and DNA concentration exhibited strong linear correlations with seeded cell number, but DNA concentration was found to be the most generally useful method for the following reasons: (1) DNA concentration showed the greatest consistency across a range of cell numbers; (2) DNA concentration was the closest to proportional with cell number; (3) DNA samples could be collected from the same dish as the metabolites; and (4)...
TL;DR: Two techniques are shown to yield improved Keyword Spotting (KWS) performance when using the ATWV/MTWV performance measures, which resulted in the highest performance for the official surprise language evaluation for the IARPA-funded Babel project in April 2013.
Abstract: We present two techniques that are shown to yield improved Keyword Spotting (KWS) performance when using the ATWV/MTWV performance measures: (i) score normalization, where the scores of different keywords become commensurate with each other and they more closely correspond to the probability of being correct than raw posteriors; and (ii) system combination, where the detections of multiple systems are merged together, and their scores are interpolated with weights which are optimized using MTWV as the maximization criterion. Both score normalization and system combination approaches show that significant gains in ATWV/MTWV can be obtained, sometimes on the order of 8-10 points (absolute), in five different languages. A variant of these methods resulted in the highest performance for the official surprise language evaluation for the IARPA-funded Babel project in April 2013.
TL;DR: This paper aims at contributing to the on-going discussion about building and applying bibliometric indicators by shedding light on their properties and requirements concerning six different aspects: deterministic versus probabilistic approach, application-related properties, the time dependence, normalization issues, size dependence and network indicators.
Abstract: This paper aims at contributing to the on-going discussion about building and applying bibliometric indicators. It sheds light on their properties and requirements concerning six different aspects: deterministic versus probabilistic approach, application-related properties, the time dependence, normalization issues, size dependence and network indicators.
TL;DR: This paper theoretically shows that normalization shrinks the spread of points in a class by a constant fraction under a broad parameter regime and obtains sharp deviation bounds of empirical principal eigenvalues of graphs generated from a stochastic blockmodel.
Abstract: Spectral clustering is a technique that clusters elements using the top few eigenvectors of their (possibly normalized) similarity matrix. The quality of spectral clustering is closely tied to the convergence properties of these principal eigenvectors. This rate of convergence has been shown to be identical for both the normalized and unnormalized variants in recent random matrix theory literature. However, normalization for spectral clustering is commonly believed to be beneficial [Stat. Comput. 17 (2007) 395-416]. Indeed, our experiments show that normalization improves prediction accuracy. In this paper, for the popular stochastic blockmodel, we theoretically show that normalization shrinks the spread of points in a class by a constant fraction under a broad parameter regime. As a byproduct of our work, we also obtain sharp deviation bounds of empirical principal eigenvalues of graphs generated from a stochastic blockmodel.
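The comparison described in this abstract can be explored numerically with a small sketch. The Python code below samples a graph from a stochastic blockmodel, embeds the nodes using the top eigenvectors of either the raw adjacency matrix or its degree-normalized version, and reports a within-class spread relative to the distance between class centroids. The function names, the block sizes and edge probabilities, and the particular spread measure are assumptions chosen for illustration; they are not taken from the paper.

```python
import numpy as np


def sample_sbm(sizes, p_in, p_out, rng):
    """Sample a symmetric 0/1 adjacency matrix from a stochastic blockmodel."""
    labels = np.repeat(np.arange(len(sizes)), sizes)
    n = labels.size
    same_block = labels[:, None] == labels[None, :]
    probs = np.where(same_block, p_in, p_out)
    upper = np.triu(rng.random((n, n)) < probs, k=1)  # k=1 excludes self-loops
    adj = (upper | upper.T).astype(float)
    return adj, labels


def spectral_embedding(adj, k, normalize):
    """Top-k eigenvectors (by |eigenvalue|) of A, or of D^-1/2 A D^-1/2."""
    mat = adj
    if normalize:
        deg = adj.sum(axis=1)
        inv_sqrt = np.zeros_like(deg)
        inv_sqrt[deg > 0] = 1.0 / np.sqrt(deg[deg > 0])
        mat = adj * inv_sqrt[:, None] * inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(mat)
    top = np.argsort(np.abs(vals))[::-1][:k]
    return vecs[:, top]


def relative_spread(embedding, labels):
    """Mean within-class squared distance to the class centroid, divided by
    the mean squared distance between class centroids."""
    classes = np.unique(labels)
    centroids = np.array([embedding[labels == c].mean(axis=0) for c in classes])
    within = np.mean([
        np.mean(np.sum((embedding[labels == c] - centroids[i]) ** 2, axis=1))
        for i, c in enumerate(classes)
    ])
    diffs = centroids[:, None, :] - centroids[None, :, :]
    sq_dist = np.sum(diffs ** 2, axis=2)
    between = sq_dist[np.triu_indices(len(classes), k=1)].mean()
    return within / between


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    adj, labels = sample_sbm([150, 150], p_in=0.10, p_out=0.03, rng=rng)
    for normalize in (False, True):
        emb = spectral_embedding(adj, k=2, normalize=normalize)
        name = "normalized" if normalize else "unnormalized"
        print(name, relative_spread(emb, labels))
```

A smaller relative spread means the classes are tighter in the embedding and therefore easier for a downstream step such as k-means to separate. Results on a single random draw vary, so any comparison should be averaged over several seeds.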
TL;DR: A system for performing loudness normalization based on user feedback is described, in which a plurality of audio playback devices communicate the volume settings used during playback of pieces of sound program content to a content delivery system.
Abstract: A system for performing loudness normalization based on user feedback is described herein. The system includes a content delivery system and a plurality of audio playback devices. The audio playback devices may communicate volume settings used during playback of pieces of sound program content to the content delivery system. Based on these collected data points, a statistical analysis may be performed to generate loudness adjustment values for pieces of sound program content. The loudness adjustment values may be communicated to the audio playback devices through metadata in associated pieces of sound program content or as separate communications from the content delivery system. An offline version is also described that supports individual loudness normalization adjustments based on a single user's preferences. Under either system, loudness normalization may be achieved based on real world volume settings for individual pieces of sound program content played using various playback configurations.
TL;DR: Various methods to obtain the resolution volume for neutron scattering experiments, in order to perform absolute normalization on inelastic magnetic neutron scattering data, are discussed, along with the advantages of the different normalization processes.
Abstract: We discuss various methods to obtain the resolution volume for neutron scattering experiments, in order to perform absolute normalization on inelastic magnetic neutron scattering data. Examples from previous experiments are given. We also try to provide clear definitions of a number of physical quantities which are commonly used to describe neutron magnetic scattering results, including the dynamic spin correlation function and the imaginary part of the dynamic susceptibility. Formulas that can be used for general purposes are provided and the advantages of the different normalization processes are discussed.
TL;DR: The results showed that calibrating injection volumes based on creatinine values could effectively eliminate intragroup differences caused by variations in the concentrations of urinary metabolites, giving better parallelism and clustering effects, and that peak area normalization could further eliminate intraclass differences.
Abstract: It is essential to choose one preprocessing method for liquid chromatography–mass spectrometry (LC-MS)-based metabolomics studies of urine samples in order to overcome their variability. However, the commonly used normalization methods do not substantially reduce the high variabilities arising from differences in urine concentration, especially in cases of signal saturation (where abundant metabolites exceed the dynamic range of the instrumentation) or missing values. Herein, a simple preacquisition strategy based on differential injection volumes calibrated by creatinine (to reduce the concentration differences between the samples), combined with normalization to “total useful MS signals” or “all MS signals”, is proposed to overcome urine variabilities. This strategy was first systematically compared with other popular normalization methods by application to serially diluted urine samples. The method was then verified using rat urine samples collected before and after inoculation with Walker 256 carcinoma cells. The results ...
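The two-stage strategy in this abstract can be sketched in a few lines of Python. The abstract does not state the calibration formula, so the inverse-proportional rule below (inject less of a more concentrated sample so that the injected creatinine amount stays roughly constant) is an assumption, as are the function names and the reference values. The sum over all peak areas stands in for the "all MS signals" variant; the selection rule behind "total useful MS signals" is not given here and is not modelled.

```python
def injection_volumes(creatinine, reference_creatinine, base_volume_ul):
    """Scale each sample's injection volume inversely to its creatinine level,
    so that roughly the same amount of creatinine is injected per sample.

    Assumed calibration rule; the abstract does not specify the formula.
    """
    return [base_volume_ul * reference_creatinine / c for c in creatinine]


def normalize_to_total_signal(peak_areas):
    """Divide every peak area in a sample by that sample's summed peak area.

    peak_areas: list of per-sample lists of peak areas.
    Samples with zero total signal are returned unchanged.
    """
    normalized = []
    for sample in peak_areas:
        total = sum(sample)
        if total > 0:
            normalized.append([area / total for area in sample])
        else:
            normalized.append(list(sample))
    return normalized
```

Adjusting the volume before acquisition addresses saturation, which a purely post-acquisition normalization cannot undo, while the total-signal step removes residual concentration differences afterwards.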
TL;DR: A novel method is proposed to address the challenges of Named Entity Recognition for tweets, consisting of tweet normalization, a combination of a K-Nearest Neighbors (KNN) classifier with a linear Conditional Random Fields (CRF) model, and a semisupervised learning framework.
Abstract: Two main challenges of Named Entity Recognition (NER) for tweets are the insufficient information in a tweet and the lack of training data. We propose a novel method consisting of three core elements: (1) normalization of tweets; (2) combination of a K-Nearest Neighbors (KNN) classifier with a linear Conditional Random Fields (CRF) model; and (3) semisupervised learning framework. The tweet normalization preprocessing corrects common ill-formed words using a global linear model. The KNN-based classifier conducts prelabeling to collect global coarse evidence across tweets while the CRF model conducts sequential labeling to capture fine-grained information encoded in a tweet. The semisupervised learning plus the gazetteers alleviate the lack of training data. Extensive experiments show the advantages of our method over the baselines as well as the effectiveness of normalization, KNN, and semisupervised learning.