TL;DR: The proposed approach deals with the non-Gaussian behavior of i-vectors by performing a simple length normalization, which allows the use of probabilistic models with Gaussian assumptions that yield equivalent performance to that of more complicated systems based on Heavy-Tailed assumptions.
Abstract: We present a method to boost the performance of probabilistic generative models that work with i-vector representations. The proposed approach deals with the non-Gaussian behavior of i-vectors by performing a simple length normalization. This non-linear transformation allows the use of probabilistic models with Gaussian assumptions that yield equivalent performance to that of more complicated systems based on Heavy-Tailed assumptions. Significant performance improvements are demonstrated on the telephone portion of NIST SRE 2010.
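As a rough illustration of the length normalization named in the abstract above, the sketch below scales a vector to unit Euclidean length. The function name `length_normalize` and the toy values are assumptions made for illustration; the abstract does not say whether any other preprocessing accompanies the normalization, so none is shown.

```python
import math


def length_normalize(vector):
    """Scale a vector to unit Euclidean (L2) length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot length-normalize a zero vector")
    return [x / norm for x in vector]


# Toy values only; a real i-vector has hundreds of dimensions.
ivector = [3.0, 4.0, 12.0]
unit = length_normalize(ivector)
```

After this step every vector lies on the unit sphere, so only its direction carries information.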
TL;DR: The authors' within-lane normalization procedures, followed by between-lane normalization, reduce GC-content bias and lead to more accurate estimates of expression fold-changes and tests of differential expression.
Abstract: Transcriptome sequencing (RNA-Seq) has become the assay of choice for high-throughput studies of gene expression. However, as is the case with microarrays, major technology-related artifacts and biases affect the resulting expression measures. Normalization is therefore essential to ensure accurate inference of expression levels and subsequent analyses thereof. We focus on biases related to GC-content and demonstrate the existence of strong sample-specific GC-content effects on RNA-Seq read counts, which can substantially bias differential expression analysis. We propose three simple within-lane gene-level GC-content normalization approaches and assess their performance on two different RNA-Seq datasets, involving different species and experimental designs. Our methods are compared to state-of-the-art normalization procedures in terms of bias and mean squared error for expression fold-change estimation and in terms of Type I error and p-value distributions for tests of differential expression. The exploratory data analysis and normalization methods proposed in this article are implemented in the open-source Bioconductor R package EDASeq. Our within-lane normalization procedures, followed by between-lane normalization, reduce GC-content bias and lead to more accurate estimates of expression fold-changes and tests of differential expression. Such results are crucial for the biological interpretation of RNA-Seq experiments, where downstream analyses can be sensitive to the supplied lists of genes.
TL;DR: This model, which uses a simple functional operation (normalization) for which there is considerable experimental support, also accounts for the recent observation that the mathematical rule by which multisensory neurons combine their inputs changes with cue reliability.
Abstract: Responses of neurons that integrate multiple sensory inputs are traditionally characterized in terms of a set of empirical principles. However, a simple computational framework that accounts for these empirical features of multisensory integration has not been established. We propose that divisive normalization, acting at the stage of multisensory integration, can account for many of the empirical principles of multisensory integration shown by single neurons, such as the principle of inverse effectiveness and the spatial principle. This model, which uses a simple functional operation (normalization) for which there is considerable experimental support, also accounts for the recent observation that the mathematical rule by which multisensory neurons combine their inputs changes with cue reliability. The normalization model, which makes a strong testable prediction regarding cross-modal suppression, may therefore provide a simple unifying computational account of the important features of multisensory integration by neurons.
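A minimal sketch of divisive normalization in its generic textbook form may help: each unit's powered input is divided by a constant plus the pooled activity of all units. The exact equations of the model in the abstract above are not given there, so the function name, the parameters `sigma` and `exponent`, and the mean-based pool are all assumptions.

```python
def divisive_normalization(drives, sigma=1.0, exponent=2.0):
    """Divide each powered drive by a constant plus the mean powered drive.

    `drives` are non-negative input drives, one per unit. This is a generic
    form of divisive normalization, not the specific model of any one paper.
    """
    if not drives:
        return []
    powered = [d ** exponent for d in drives]
    pool = sum(powered) / len(powered)  # shared normalization pool
    return [p / (sigma ** exponent + pool) for p in powered]


# The first unit's response drops when a strong input joins the pool.
alone = divisive_normalization([1.0, 2.0])
with_strong_input = divisive_normalization([1.0, 2.0, 10.0])
```

In this toy setting a unit's own drive is unchanged between the two calls, yet its output falls because the shared pool grows, which is the suppressive signature of the operation.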
TL;DR: It is shown that neurons in the monkey lateral intraparietal cortex encode a relative form of saccadic value, explicitly dependent on the values of the other available alternatives, which provides a possible mechanistic basis for behavioral context-dependent violations of rationality.
Abstract: The representation of value is a critical component of decision making. Rational choice theory assumes that options are assigned absolute values, independent of the value or existence of other alternatives. However, context-dependent choice behavior in both animals and humans violates this assumption, suggesting that biological decision processes rely on comparative evaluation. Here we show that neurons in the monkey lateral intraparietal cortex encode a relative form of saccadic value, explicitly dependent on the values of the other available alternatives. Analogous to extra-classical receptive field effects in visual cortex, this relative representation incorporates target values outside the response field and is observed in both stimulus-driven activity and baseline firing rates. This context-dependent modulation is precisely described by divisive normalization, indicating that this standard form of sensory gain control may be a general mechanism of cortical computation. Such normalization in decision circuits effectively implements an adaptive gain control for value coding and provides a possible mechanistic basis for behavioral context-dependent violations of rationality.
TL;DR: This work demonstrates unintuitive behaviour of a popular measure based on normalized mutual information for comparing sets of overlapping clusters, and shows how this can be corrected by using a more conventional normalization.
Abstract: Given the increasing popularity of algorithms for overlapping clustering, in particular in social network analysis, quantitative measures are needed to measure the accuracy of a method. Given a set of true clusters, and the set of clusters found by an algorithm, these sets of clusters must be compared to see how similar or different the sets are. A normalized measure is desirable in many contexts, for example assigning a value of 0 where the two sets are totally dissimilar, and 1 where they are identical. A measure based on normalized mutual information, [1], has recently become popular. We demonstrate unintuitive behaviour of this measure, and show how this can be corrected by using a more conventional normalization. We compare the results to those of other measures, such as the Omega index [2].
TL;DR: A novel illumination insensitive representation of face images under varying illuminations is exploited via a ratio image, called “Weber-face,” where a ratio between local intensity variation and the background is computed.
Abstract: Weber's law suggests that for a stimulus, the ratio between the smallest perceptual change and the background is a constant, which implies stimuli are perceived not in absolute terms but in relative terms. Inspired by this, we exploit and analyze a novel illumination insensitive representation of face images under varying illuminations via a ratio image, called “Weber-face,” where a ratio between local intensity variation and the background is computed. Experimental results on both CMU-PIE and Yale B face databases show that Weber-face performs better than the existing representative approaches.
TL;DR: This work investigated whether normalization can be improved if dominant signals are excluded from the calculation and found two alternatives: normalization on the spectra noise level or on the median of signal intensities in the spectrum to be significantly more robust against artifact generation.
Abstract: Normalization is critically important for the proper interpretation of matrix-assisted laser desorption/ionization (MALDI) imaging datasets. The effects of the commonly used normalization techniques based on total ion count (TIC) or vector norm normalization are significant, and they are frequently beneficial. In certain cases, however, these normalization algorithms may produce misleading results and possibly lead to wrong conclusions, e.g. regarding potential biomarker distributions. This is typical for tissues in which signals of prominent abundance are present in confined areas, such as insulin in the pancreas or β-amyloid peptides in the brain. In this work, we investigated whether normalization can be improved if dominant signals are excluded from the calculation. Because manual interaction with the data (e.g., defining the abundant signals) is not desired for routine analysis, we investigated two alternatives: normalization on the spectra noise level or on the median of signal intensities in the spectrum. Normalization on the median and the noise level was found to be significantly more robust against artifact generation compared to normalization on the TIC. Therefore, we propose to include these normalization methods in the standard “toolbox” of MALDI imaging for reliable results under conditions of automation.
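The contrast between TIC and median normalization described above can be sketched on toy numbers. The two function names and the five-point "spectra" are assumptions for illustration; real MALDI spectra have thousands of points and the paper's implementation details are not given in the abstract.

```python
from statistics import median


def normalize_tic(spectrum):
    """Divide every intensity by the total ion count (sum of intensities)."""
    total = sum(spectrum)
    if total == 0:
        raise ValueError("empty or all-zero spectrum")
    return [v / total for v in spectrum]


def normalize_median(spectrum):
    """Divide every intensity by the median intensity of the spectrum."""
    med = median(spectrum)
    if med == 0:
        raise ValueError("median intensity is zero")
    return [v / med for v in spectrum]


# Two toy spectra that differ only in one dominant signal (last position).
background = [10.0, 12.0, 11.0, 9.0, 10.0]
with_dominant = [10.0, 12.0, 11.0, 9.0, 1000.0]

tic_a, tic_b = normalize_tic(background)[0], normalize_tic(with_dominant)[0]
med_a, med_b = normalize_median(background)[0], normalize_median(with_dominant)[0]
```

In this example the first peak has the same raw intensity in both spectra, but its TIC-normalized value shrinks by roughly a factor of 20 once the dominant signal is present, while its median-normalized value changes by about 10 percent.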
TL;DR: Experimental results show that SVM with normalization performs much better than SVM without normalization in classifying the KDD99 intrusion data, and that Min-Max Normalization outperforms other normalization methods in speed, cross-validation accuracy, and number of support vectors.
Abstract: Network intrusion is always hidden in a mass of routine data and the differences between these data are very large. Normalization can help to speed up the learning phase and avoid numerical problems such as precision loss from arithmetic overflows. Some normalization methods are analyzed and simulated. Experimental results show that the method using SVM with normalization has much better performance compared to the method using SVM without normalization in classifying intrusion data of KDD99, and that Min-Max Normalization has better performance in speed, accuracy of cross validation and quantity of support vectors than other normalization methods.
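For reference, min-max normalization rescales each feature linearly onto a fixed interval. The sketch below assumes the common target range [0, 1]; the function name, the handling of constant columns, and the sample values are illustrative assumptions, not details from the abstract.

```python
def min_max_normalize(column, new_min=0.0, new_max=1.0):
    """Linearly rescale a list of numbers onto [new_min, new_max]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant feature carries no information; map it to new_min.
        return [new_min for _ in column]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in column]


# Hypothetical feature column with a wide value range.
durations = [0.0, 2.0, 10.0, 50.0]
scaled = min_max_normalize(durations)
```

Each feature would normally be scaled separately, with the minimum and maximum taken from the training data and reused on test data.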
TL;DR: Open questions on how speech segments should be selected, what features provide good discrimination, and what benefits feature normalization might bring given the speaker-specific nature of mental disorders are addressed empirically using classifier configurations employed in emotion recognition from speech.
Abstract: In recent years, the problem of automatic detection of mental illness from the speech signal has gained some initial interest; however, questions remaining include how speech segments should be selected, what features provide good discrimination, and what benefits feature normalization might bring given the speaker-specific nature of mental disorders. In this paper, these questions are addressed empirically using classifier configurations employed in emotion recognition from speech, evaluated on a 47-speaker depressed/neutral read sentence speech database. Results demonstrate that (1) detailed spectral features are well suited to the task, (2) speaker normalization provides benefits mainly for less detailed features, and (3) dynamic information appears to provide little benefit. Classification accuracy using a combination of MFCC and formant based features approached 80% for this database.
TL;DR: This paper proposes a general and efficient method to introduce a sufficiently large lower bound for TF normalization which can be shown analytically to fix or alleviate the problem of very long documents being overly penalized.
Abstract: In this paper, we reveal a common deficiency of the current retrieval models: the component of term frequency (TF) normalization by document length is not lower-bounded properly; as a result, very long documents tend to be overly penalized. In order to analytically diagnose this problem, we propose two desirable formal constraints to capture the heuristic of lower-bounding TF, and use constraint analysis to examine several representative retrieval functions. Analysis results show that all these retrieval functions can only satisfy the constraints for a certain range of parameter values and/or for a particular set of query terms. Empirical results further show that the retrieval performance tends to be poor when the parameter is out of the range or the query term is not in the particular set. To solve this common problem, we propose a general and efficient method to introduce a sufficiently large lower bound for TF normalization which can be shown analytically to fix or alleviate the problem. Our experimental results demonstrate that the proposed method, incurring almost no additional computational cost, can be applied to state-of-the-art retrieval functions, such as Okapi BM25, language models, and the divergence from randomness approach, to significantly improve the average precision, especially for verbose queries.
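The deficiency described above can be seen in the standard Okapi BM25 term-frequency component, which approaches zero as document length grows. The sketch below shows that component and one possible way to lower-bound it by adding a constant for matched terms. The additive form, the name `delta`, and all parameter defaults are assumptions for illustration; the abstract does not spell out the exact formula.

```python
def bm25_tf(tf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Standard BM25 term-frequency component with length normalization."""
    if tf == 0:
        return 0.0
    norm = 1.0 - b + b * doc_len / avg_doc_len
    return (k1 + 1.0) * tf / (k1 * norm + tf)


def bm25_tf_lower_bounded(tf, doc_len, avg_doc_len, k1=1.2, b=0.75, delta=1.0):
    """Same component, shifted by `delta` whenever the term occurs at all.

    `delta` is a hypothetical lower-bound parameter: a matching term then
    always scores at least `delta` above a non-matching one.
    """
    if tf == 0:
        return 0.0
    return bm25_tf(tf, doc_len, avg_doc_len, k1, b) + delta


# One occurrence in a document 10,000 times longer than average.
plain = bm25_tf(1, 1_000_000, 100)
bounded = bm25_tf_lower_bounded(1, 1_000_000, 100)
```

Without the bound, a single match in a very long document is scored almost the same as no match at all; with it, the gap between presence and absence never falls below `delta`.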
TL;DR: This work measured the expression of 20 candidate reference genes and 7 target genes in 15 Drosophila head cDNA samples using RT-qPCR to establish a method for determination of the most stable normalizing factor (NF) across samples for robust data normalization.
Abstract: Reverse transcription and real-time PCR (RT-qPCR) has been widely used for rapid quantification of relative gene expression. To offset technical confounding variations, stably-expressed internal reference genes are measured simultaneously along with target genes for data normalization. Statistical methods have been developed for reference validation; however, normalization of RT-qPCR data still remains arbitrary due to pre-experimental determination of particular reference genes. To establish a method for determination of the most stable normalizing factor (NF) across samples for robust data normalization, we measured the expression of 20 candidate reference genes and 7 target genes in 15 Drosophila head cDNA samples using RT-qPCR. The 20 reference genes exhibit sample-specific variation in their expression stability. Unexpectedly, the NF variation across samples does not exhibit a continuous decrease with pairwise inclusion of more reference genes, suggesting that either too few or too many reference genes may be detrimental to the robustness of data normalization. The optimal number of reference genes predicted by the minimal and most stable NF variation differs greatly, from 1 to more than 10, based on particular sample sets. We also found that GstD1, InR and Hsp70 expression exhibits an age-dependent increase in fly heads; however, their relative expression levels are significantly affected by NF using different numbers of reference genes. Because they are highly dependent on the actual data, RT-qPCR reference genes thus have to be validated and selected at the post-experimental data analysis stage rather than by pre-experimental determination.
TL;DR: This work extends the circle fitting method of Rangarajan and Kanatani (2009) to accommodate ellipse fitting and relies on algebraic distance minimization with a carefully chosen scale normalization to derive an estimator far superior to the standard LS and slightly better than the Taubin estimator.
TL;DR: The NORMA-Gene algorithm is presented, which is based on a data-driven normalization and is useful for data-sets comprising as few as five target genes, allowing researchers to focus their efforts on studying target genes of biological relevance.
Abstract: Normalization of target gene expression, measured by real-time quantitative PCR (qPCR), is a requirement for reducing experimental bias and thereby improving data quality. The currently used normalization approach is based on using one or more reference genes. Yet, this approach extends the experimental work load and suffers from assumptions that may be difficult to meet and to validate. We developed a data driven normalization algorithm (NORMA-Gene). An analysis of the performance of NORMA-Gene compared to reference gene normalization on artificially generated data-sets showed that the NORMA-Gene normalization yielded more precise results under a large range of parameters tested. Furthermore, when tested on three very different real qPCR data-sets, NORMA-Gene was shown to be best at reducing variance due to experimental bias in all three data-sets compared to normalization based on the use of reference gene(s). Here we present the NORMA-Gene algorithm that is applicable to all biological and biomedical qPCR studies, especially those that are based on a limited number of assayed genes. The method is based on a data-driven normalization and is useful for data-sets comprising as few as five target genes. NORMA-Gene does not require the identification and validation of reference genes, allowing researchers to focus their efforts on studying target genes of biological relevance.
TL;DR: Genome-wide methylation data from Infinium Methylation BeadChip can be susceptible to batch effects with profound impacts on downstream analyses and conclusions, and EB correction along with normalization is recommended for effective batch effect removal.
Abstract: Genome-wide methylation profiling has led to more comprehensive insights into gene regulation mechanisms and potential therapeutic targets. Illumina Human Methylation BeadChip is one of the most commonly used genome-wide methylation platforms. Similar to other microarray experiments, methylation data is susceptible to various technical artifacts, particularly batch effects. To date, little attention has been given to issues related to normalization and batch effect correction for this kind of data. We evaluated three common normalization approaches and investigated their performance in batch effect removal using three datasets with different degrees of batch effects generated from HumanMethylation27 platform: quantile normalization at average β value (QNβ); two step quantile normalization at probe signals implemented in "lumi" package of R (lumi); and quantile normalization of A and B signal separately (ABnorm). Subsequent Empirical Bayes (EB) batch adjustment was also evaluated. Each normalization could remove a portion of batch effects and their effectiveness differed depending on the severity of batch effects in a dataset. For the dataset with minor batch effects (Dataset 1), normalization alone appeared adequate and "lumi" showed the best performance. However, all methods left substantial batch effects intact in the datasets with obvious batch effects and further correction was necessary. Without any correction, 50 and 66 percent of CpGs were associated with batch effects in Dataset 2 and 3, respectively. After QNβ, lumi or ABnorm, the number of CpGs associated with batch effects were reduced to 24, 32, and 26 percent for Dataset 2; and 37, 46, and 35 percent for Dataset 3, respectively. Additional EB correction effectively removed such remaining non-biological effects. More importantly, the two-step procedure almost tripled the numbers of CpGs associated with the outcome of interest for the two datasets. 
Genome-wide methylation data from Infinium Methylation BeadChip can be susceptible to batch effects with profound impacts on downstream analyses and conclusions. Normalization can reduce part but not all batch effects. EB correction along with normalization is recommended for effective batch effect removal.
TL;DR: New approaches for data normalization that allow a more accurate assessment of differential abundance by reducing the covariance between individual features implicitly introduced by the traditionally used ratio-based normalization are described.
Abstract: Metagenomic studies were originally focused on exploratory/validation projects but are rapidly being applied in a clinical setting. In this setting, researchers are interested in finding characteristics of the microbiome that correlate with the clinical status of the corresponding sample. Comparatively few computational/statistical tools have been developed that can assist in this process. Rather, most developments in the metagenomics community have focused on methods that compare samples as a whole. Specifically, the focus has been on developing robust methods for determining the level of similarity or difference between samples, rather than on identifying the specific characteristics that distinguish different samples from each other. Metastats [1] was the first statistical method developed specifically to address the questions asked in clinical studies. Metastats allows a comparison of metagenomic samples (represented as counts of individual features such as organisms, genes and functional groups) from two treatment populations (for example, healthy versus disease) and identifies those features that statistically distinguish the two populations.
Here, we present major improvements to the Metastats software and the underlying statistical methods. First, we describe new approaches for data normalization that allow a more accurate assessment of differential abundance by reducing the covariance between individual features implicitly introduced by the traditionally used ratio-based normalization. These normalization techniques are also of interest for time-series analyses or in the estimation of microbial networks. A second extension of Metastats is a mixed-model zero-inflated Gaussian distribution that allows Metastats to account for a common characteristic of metagenomic data: the presence of many features with zero counts owing to undersampling of the community. The number of ‘missing features’ (zero counts) correlates with the amount of sequencing performed, thereby biasing abundance measurements and the differential abundance statistics derived from them.
Using simulated and real data, we show that these methods significantly improve the accuracy of Metastats. We also describe the addition of several new statistical tests to our code (including presence/absence and the corresponding odds ratio, and penetrance calculations) that improve the usability of our software in clinical practice.
TL;DR: Several subsampling-based normalization strategies were applied to different high-throughput sequencing data sets originating from human and murine gut environments, and their effects on the data sets' characteristics and normalization efficiencies, as measured by several β-diversity metrics, were compared.
Abstract: Several subsampling-based normalization strategies were applied to different high-throughput sequencing data sets originating from human and murine gut environments. Their effects on the data sets' characteristics and normalization efficiencies, as measured by several β-diversity metrics, were compared. For both data sets, subsampling to the median rather than the minimum number appeared to improve the analysis.
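Subsampling to a common depth can be sketched as drawing reads without replacement from each sample's count vector. The abstract does not describe how samples already below the target depth are treated, so the choice to leave them unchanged is an assumption here, as are the function name, the sample names, and the counts.

```python
import random
from statistics import median


def subsample_counts(counts, depth, rng):
    """Draw `depth` reads without replacement from per-taxon read counts.

    Samples with no more than `depth` reads are returned unchanged
    (an assumption; other policies, such as discarding them, are possible).
    """
    if sum(counts) <= depth:
        return list(counts)
    # Expand counts into one entry per read, labelled by taxon index.
    reads = [taxon for taxon, n in enumerate(counts) for _ in range(n)]
    picked = rng.sample(reads, depth)
    result = [0] * len(counts)
    for taxon in picked:
        result[taxon] += 1
    return result


# Hypothetical read counts for three samples and three taxa.
samples = {"a": [50, 30, 20], "b": [500, 300, 200], "c": [5, 3, 2]}
rng = random.Random(0)  # fixed seed so the draw is repeatable
target = int(median(sum(c) for c in samples.values()))
normalized = {name: subsample_counts(c, target, rng) for name, c in samples.items()}
```

Using the minimum depth instead would reduce every sample to the 10 reads of sample "c", which illustrates how much data a minimum-based target can discard.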
TL;DR: Of the four successful methods, XPN generally shows the highest inter-platform concordance when treatment groups are equally sized, while DWD is most robust to differently sized treatment groups and consistently shows the smallest loss in gene detection.
Abstract: Simultaneous measurement of gene expression on a genomic scale can be accomplished using microarray technology or by sequencing based methods. Researchers who perform high throughput gene expression assays often deposit their data in public databases, but heterogeneity of measurement platforms leads to challenges for the combination and comparison of data sets. Researchers wishing to perform cross platform normalization face two major obstacles. First, a choice must be made about which method or methods to employ. Nine are currently available, and no rigorous comparison exists. Second, software for the selected method must be obtained and incorporated into a data analysis workflow.
TL;DR: This article is a reply to the article "Scopus's Source Normalized Impact per Paper (SNIP) versus a Journal Impact Factor based on Fractional Counting of Citations", published by Loet Leydesdorff and Tobias Opthof (arXiv:1004.3580v2 [cs.DL]).
Abstract: This paper is a reply to the article "Scopus's Source Normalized Impact per Paper (SNIP) versus a Journal Impact Factor based on Fractional Counting of Citations", published by Loet Leydesdorff and Tobias Opthof (arXiv:1004.3580v2 [cs.DL]). It clarifies the relationship between SNIP and Elsevier's Scopus. Since Leydesdorff and Opthof's description of SNIP is not complete, it indicates four key differences between SNIP and the indicator proposed by the two authors, and argues why the former is more valid than the latter. Nevertheless, the idea of fractional citation counting deserves further exploration. The paper discusses difficulties that arise if one attempts to apply this principle at the level of individual (citing) papers.
TL;DR: A novel approach is presented to evaluate normalization strategies, which includes the peptide selection component associated with the derivation of normalization values, which improves the structure of the data without introducing bias into the normalized peak intensities.
Abstract: Quantification of LC-MS peak intensities assigned during peptide identification in a typical comparative proteomics experiment will deviate from run-to-run of the instrument due to both technical and biological variation. Thus, normalization of peak intensities across an LC-MS proteomics dataset is a fundamental step in pre-processing. However, the downstream analysis of LC-MS proteomics data can be dramatically affected by the normalization method selected. Current normalization procedures for LC-MS proteomics data are presented in the context of normalization values derived from subsets of the full collection of identified peptides. The distribution of these normalization values is unknown a priori. If they are not independent from the biological factors associated with the experiment the normalization process can introduce bias into the data, possibly affecting downstream statistical biomarker discovery. We present a novel approach to evaluate normalization strategies, which includes the peptide selection component associated with the derivation of normalization values. Our approach evaluates the effect of normalization on the between-group variance structure in order to identify the most appropriate normalization methods that improve the structure of the data without introducing bias into the normalized peak intensities.
TL;DR: In this article, the TREES-3 project has processed more than 12,000 Landsat TM and ETM+ data subsets systematically distributed over the tropics, and the results show that the haze correction algorithm has improved the visual appearance of the image and significantly corrected the digital numbers for the red band.
Abstract: In support of the Remote Sensing Survey of the global Forest Resource Assessment 2010, the TREES-3 project has processed more than 12,000 Landsat TM and ETM+ data subsets systematically distributed over the tropics. The project aims at deriving area estimates of tropical forest cover change for the periods 1990–2000–2005. The paper presents the pre-processing steps applied in an operational and robust manner to this large amount of multi-date and multi-scene imagery: conversion to top-of-atmosphere reflectance, cloud and cloud shadow detection, haze correction and image radiometric normalization. The results show that the haze correction algorithm has improved the visual appearance of the image and significantly corrected the digital numbers for Landsat visible bands, especially the red band. The impact of the normalization procedures (forest normalization and relative normalization) was assessed on 210 image pairs: in all cases the correlation between the spectral values of the same land cover in both images was improved. The developed automatic pre-processing chain provided a consistent multi-temporal data set across the tropics that will constitute the basis for an automatic object-based supervised classification.
TL;DR: Using publication and citation data of seven Korean research universities, the advantages and the differences in the rankings are demonstrated, the possible statistics are explained, and ways to visualize the differences in (citing) audiences in terms of a network are suggested.
TL;DR: It is demonstrated that piecewise temporal alignment techniques outperform other commonly used alignment methods (normalization to percent gait cycle, dynamic time warping, and derivative dynamic time warping) in typical biomechanical and clinical alignment tasks.
TL;DR: The advantage of the proposed methodology, and the efficiency and accurate localization of extracted features are demonstrated using LiDAR data of two different areas and comparing both extractions with field surveyed networks.
Abstract: A statistical approach to LiDAR derived topographic attributes for the automatic extraction of channel network and for the choice of the scale to apply for parameter evaluation is presented in this paper. The basis of this approach is to use distribution analysis and statistical descriptors to identify channels where terrain geometry denotes significant convergences. Two case study areas with different morphology and degree of organization are used with their 1 m LiDAR Digital Terrain Models (DTMs). Topographic attribute maps (curvature and openness) for various window sizes are derived from the DTMs in order to detect surface convergences. A statistical analysis on value distributions considering each window size is carried out for the choice of the optimum kernel. We propose a three-step method to extract the network based on (a) the normalization and overlapping of openness and minimum curvature to highlight the more likely surface convergences, (b) a weighting of the upslope area according to these normalized maps to identify drainage flow paths and flow accumulation consistent with terrain geometry, and (c) the standard score normalization of the weighted upslope area and the use of standard score values as a non-subjective threshold for channel network identification. As a final step for optimal definition and representation of the whole network, a noise-filtering and connection procedure is applied. The advantage of the proposed methodology, and the efficiency and accurate localization of extracted features are demonstrated using LiDAR data of two different areas and comparing both extractions with field surveyed networks.
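The last of the three steps above, standard score normalization followed by thresholding, can be sketched on a flat list of cell values. The threshold of 1.0, the variable names, and the toy values are assumptions; the abstract does not state which standard score is used as the cut-off, and a real DTM would be a 2-D grid.

```python
from statistics import mean, pstdev


def standard_scores(values):
    """Convert values to z-scores: (value - mean) / standard deviation."""
    mu = mean(values)
    sd = pstdev(values)  # population standard deviation
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


# Hypothetical weighted upslope area for eight grid cells.
weighted_area = [1.0, 2.0, 1.5, 2.5, 40.0, 1.0, 2.0, 35.0]
z = standard_scores(weighted_area)

threshold = 1.0  # hypothetical cut-off, not a value from the abstract
channel_cells = [i for i, score in enumerate(z) if score > threshold]
```

Because the cut-off is expressed in standard deviations rather than in raw area units, the same rule can be applied to areas with different value ranges.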
TL;DR: A novel suite of informatics tools for the quantitative analysis of NMR metabolomic profile data is introduced, embedded into a modular and statistically sound framework that is implemented as an R package called "speaq" ("spectrum alignment and quantitation").
Abstract: Nuclear magnetic resonance spectroscopy (NMR) is a powerful technique to reveal and compare quantitative metabolic profiles of biological tissues. However, chemical and physical sample variations make the analysis of the data challenging, and typically require the application of a number of preprocessing steps prior to data interpretation. For example, noise reduction, normalization, baseline correction, peak picking, spectrum alignment and statistical analysis are indispensable components in any NMR analysis pipeline. We introduce a novel suite of informatics tools for the quantitative analysis of NMR metabolomic profile data. The core of the processing cascade is a novel peak alignment algorithm, called hierarchical Cluster-based Peak Alignment (CluPA). The algorithm aligns a target spectrum to the reference spectrum in a top-down fashion by building a hierarchical cluster tree from peak lists of reference and target spectra and then dividing the spectra into smaller segments based on the most distant clusters of the tree. To reduce the computational time to estimate the spectral misalignment, the method makes use of Fast Fourier Transformation (FFT) cross-correlation. Since the method returns a high-quality alignment, we can propose a simple methodology to study the variability of the NMR spectra. For each aligned NMR data point the ratio of the between-group and within-group sum of squares (BW-ratio) is calculated to quantify the difference in variability between and within predefined groups of NMR spectra. This differential analysis is related to the calculation of the F-statistic or a one-way ANOVA, but without distributional assumptions. Statistical inference based on the BW-ratio is achieved by bootstrapping the null distribution from the experimental data. The workflow performance was evaluated using a previously published dataset. 
Correlation maps, spectral and grey scale plots show clear improvements in comparison to other methods, and the down-to-earth quantitative analysis works well for the CluPA-aligned spectra. The whole workflow is embedded into a modular and statistically sound framework that is implemented as an R package called "speaq" ("spectrum alignment and quantitation"), which is freely available from http://code.google.com/p/speaq/.
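Two steps of the workflow described above lend themselves to a short illustration: estimating the shift between a target and a reference segment by FFT cross-correlation, and computing the BW-ratio per data point. The following is a minimal Python sketch, not the speaq implementation; the function names (`fft_shift`, `bw_ratio`, `null_bw`) are hypothetical, and approximating the null distribution by shuffling group labels is an assumption about the resampling scheme.

```python
import numpy as np

def fft_shift(reference, target):
    """Lag (in data points) by which `target` should be rolled to best match `reference`."""
    reference = np.asarray(reference, dtype=float)
    target = np.asarray(target, dtype=float)
    n = len(reference)
    size = 2 * n  # zero-pad so the circular correlation does not wrap around
    spectrum = np.fft.rfft(reference, size) * np.conj(np.fft.rfft(target, size))
    corr = np.fft.irfft(spectrum, size)  # corr[k] = sum_m target[m] * reference[m + k]
    lag = int(np.argmax(corr))
    return lag - size if lag >= n else lag  # map upper half of the array to negative lags

def bw_ratio(X, labels):
    """Between-group / within-group sum of squares for each column of X (spectra in rows)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    grand_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for group in np.unique(labels):
        Xg = X[labels == group]
        group_mean = Xg.mean(axis=0)
        between += Xg.shape[0] * (group_mean - grand_mean) ** 2
        within += ((Xg - group_mean) ** 2).sum(axis=0)
    # A zero within-group sum of squares yields inf or nan instead of raising.
    with np.errstate(divide="ignore", invalid="ignore"):
        return between / within

def null_bw(X, labels, n_resamples=200, seed=0):
    """Null BW-ratios obtained by shuffling group labels (assumed resampling scheme)."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    return np.stack([bw_ratio(X, rng.permutation(labels)) for _ in range(n_resamples)])
```

A data point could then be flagged when its observed BW-ratio exceeds a high quantile of the corresponding column of `null_bw`; the choice of quantile is left open here.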
TL;DR: The correlation between spectral counts (SpCs) of the same proteins across different data sets was investigated; TSpC and NSAF normalization yielded almost ideal slopes of unity for normalized SpC versus average normalized SpC, while NSP did not effectively correct the unnormalized data.
TL;DR: In this paper, the Taylor expansion was applied near the standard plasma condition to obtain the standard state value of the characteristic line intensity from theory, and the results showed that measurement precision and accuracy can be greatly improved by the application of this normalization method in measuring the Cu concentration for 29 brass alloy samples.
Abstract: Relatively high uncertainty (or low repeatability) is one of the main bottlenecks for the wide application of LIBS quantitative measurements. The change of plasma temperature and electron number density from pulse to pulse weakens the correlation between the ablation mass and the total or partial spectral area for the same sample, making the commonly applied normalization method not effective enough for uncertainty reduction. In the present work, it was assumed that a standard state exists for samples with a similar matrix, characterized by a standard plasma temperature, electron number density, and total number density of the element of interest. Therefore, Taylor expansion can be applied near the standard plasma condition to obtain the standard state value of the characteristic line intensity from theory. The temperature variation was taken to be proportional to the variation of the logarithm of the ratio of two spectral line intensities of the element of interest, the variation of electron number density was taken to be proportional to the variation of the full width at half maximum (FWHM), and the variation of total number density was taken to be proportional to the variation of the sum of the multiple spectral line intensities of the measured element. Based on these assumptions, the calibration model was established. The results show that measurement precision and accuracy can be greatly improved by the application of this normalization method in measuring the Cu concentration for 29 brass alloy samples. The average relative standard deviation (RSD), the coefficient of determination (R^2), the root mean square error of prediction (RMSEP), and the average value of the maximum relative error were 2.92%, 0.99, 1.46%, and 8.42%, respectively, while the values for normalization with the whole spectrum area were 8.61%, 0.95, 3.28%, and 29.19%, respectively, showing significant improvement.
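The reasoning in this abstract can be written out as a first-order expansion. The notation below is assumed for illustration and is not taken from the paper: I is the measured characteristic line intensity, n_s, T and n_e are the total number density, plasma temperature and electron number density, the subscript 0 marks the standard state, I_1 and I_2 are two lines of the element of interest, and k_1, k_2, k_3 are lumped coefficients that would be fitted during calibration.

```latex
\begin{align}
I(n_s, T, n_e) &\approx I(n_{s0}, T_0, n_{e0})
  + \left.\frac{\partial I}{\partial n_s}\right|_0 \Delta n_s
  + \left.\frac{\partial I}{\partial T}\right|_0 \Delta T
  + \left.\frac{\partial I}{\partial n_e}\right|_0 \Delta n_e, \\
\Delta n_s &\propto \Delta\Big(\sum_i I_i\Big), \qquad
\Delta T \propto \Delta \ln\frac{I_2}{I_1}, \qquad
\Delta n_e \propto \Delta\,\mathrm{FWHM}, \\
I_0 &\approx I
  - k_1\, \Delta\Big(\sum_i I_i\Big)
  - k_2\, \Delta \ln\frac{I_2}{I_1}
  - k_3\, \Delta\,\mathrm{FWHM}.
\end{align}
```

Under these assumptions, the standard-state intensity I_0 = I(n_{s0}, T_0, n_{e0}) is the quantity that would be regressed against concentration, with each Δ measured relative to the standard-state value of the corresponding observable.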
TL;DR: In this paper, the authors propose a new parameter estimation technique that does not require computing an intractable normalization factor or sampling from the equilibrium distribution of the model, which is achieved by establishing dynamics that would transform the observed data distribution into the model distribution, and then setting as the objective the minimization of the KL divergence between the data distribution and the distribution produced by running the dynamics for an infinitesimal time.
Abstract: Fitting probabilistic models to data is often difficult, due to the general intractability of the partition function and its derivatives. Here we propose a new parameter estimation technique that does not require computing an intractable normalization factor or sampling from the equilibrium distribution of the model. This is achieved by establishing dynamics that would transform the observed data distribution into the model distribution, and then setting as the objective the minimization of the KL divergence between the data distribution and the distribution produced by running the dynamics for an infinitesimal time. Score matching, minimum velocity learning, and certain forms of contrastive divergence are shown to be special cases of this learning technique. We demonstrate parameter estimation in Ising models, deep belief networks and an independent component analysis model of natural scenes. In the Ising model case, current state of the art techniques are outperformed by at least an order of magnitude in learning time, with lower error in recovered coupling parameters.
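For the Ising case, the objective described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: spins are taken as ±1, the energy is E(x) = -½ xᵀJx with symmetric, zero-diagonal J, the dynamics connect each data state only to its single-spin-flip neighbours, and the constant infinitesimal-time factor is dropped. The function names are hypothetical.

```python
import numpy as np

def ising_energy(J, X):
    """E(x) = -0.5 * x^T J x for each row x of X (spins in {-1, +1})."""
    X = np.asarray(X, dtype=float)
    J = np.asarray(J, dtype=float)
    return -0.5 * np.einsum("ni,ij,nj->n", X, J, X)

def mpf_objective(J, X):
    """Average over data states of sum_i exp(0.5 * [E(x) - E(x with spin i flipped)])."""
    X = np.asarray(X, dtype=float)
    J = np.asarray(J, dtype=float)
    J = 0.5 * (J + J.T)            # enforce symmetry
    J = J - np.diag(np.diag(J))    # remove self-couplings
    local_field = X @ J            # (J x)_i for every sample
    # Flipping spin i changes the energy by 2 * x_i * (J x)_i, so
    # 0.5 * [E(x) - E(x_flipped)] = -x_i * (J x)_i.
    return np.exp(-X * local_field).sum(axis=1).mean()
```

No partition function or sampling appears anywhere in the computation: only energy differences between observed states and their one-flip neighbours are needed. Minimizing `mpf_objective` over J with any gradient-based optimizer would give the parameter estimate in this sketch.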
TL;DR: This paper investigates the main ingredients of a disjunctive cut separation procedure, and analyzes their impact on the quality of the root-node bound for a set of instances taken from the MIPLIB library.
Abstract: Disjunctive cuts for Mixed-Integer Linear Programs (MIPs) were introduced by Egon Balas in the late 1970s and have been successfully exploited in practice since the late 1990s. In this paper we investigate the main ingredients of a disjunctive cut separation procedure, and analyze their impact on the quality of the root-node bound for a set of instances taken from the MIPLIB library. We compare alternative normalization conditions, and try to better understand their role. In particular, we point out that constraints that become redundant (because of the disjunction used) can produce over-weak cuts, and analyze this property with respect to the normalization used. Finally, we introduce a new normalization condition and analyze its theoretical properties and computational behavior. Along the way, we make use of a number of small numerical examples to illustrate some basic (and often misinterpreted) disjunctive programming features.
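To make the role of the normalization concrete, the cut-generating linear program can be sketched as follows. The notation is assumed here for illustration: the linear relaxation is written as Ax ≥ b (variable bounds included in A), x* is the fractional point to separate, and the disjunction is the split πx ≤ π_0 or πx ≥ π_0 + 1. The two normalization conditions shown are common textbook choices and are not necessarily the ones compared, or the new one introduced, in the paper.

```latex
\begin{align}
\min\ & \alpha x^* - \beta \\
\text{s.t.}\ & \alpha = uA - u_0 \pi, \qquad \beta = ub - u_0 \pi_0, \\
& \alpha = vA + v_0 \pi, \qquad \beta = vb + v_0 (\pi_0 + 1), \\
& u,\ v,\ u_0,\ v_0 \ge 0, \\
\text{normalization:}\ & \sum_i u_i + \sum_i v_i + u_0 + v_0 = 1
  \quad \text{or} \quad u_0 + v_0 = 1.
\end{align}
```

Without a normalization the feasible set is a cone, so any violated cut αx ≥ β could be scaled to an arbitrarily negative objective value; the normalization truncates the cone and thereby determines which violated cut is returned.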
TL;DR: The rationales for normalization are reviewed and it is argued that in well-conducted psychiatric gene expression studies using human brain tissue, it is reducing intersubject variability rather than experimental error that is the major benefit of normalization.