TL;DR: The normalization strategy presented here is a prerequisite for accurate RT-PCR expression profiling, which opens up the possibility of studying the biological relevance of small expression differences.
Abstract: Gene-expression analysis is increasingly important in biological research, with real-time reverse transcription PCR (RT-PCR) becoming the method of choice for high-throughput and accurate expression profiling of selected genes. Given the increased sensitivity, reproducibility and large dynamic range of this methodology, the requirements for a proper internal control gene for normalization have become increasingly stringent. Although housekeeping gene expression has been reported to vary considerably, no systematic survey has properly determined the errors related to the common practice of using only one control gene, nor presented an adequate way of working around this problem. We outline a robust and innovative strategy to identify the most stably expressed control genes in a given set of tissues, and to determine the minimum number of genes required to calculate a reliable normalization factor. We have evaluated ten housekeeping genes from different abundance and functional classes in various human tissues, and demonstrated that the conventional use of a single gene for normalization leads to relatively large errors in a significant proportion of samples tested. The geometric mean of multiple carefully selected housekeeping genes was validated as an accurate normalization factor by analyzing publicly available microarray data. The normalization strategy presented here is a prerequisite for accurate RT-PCR expression profiling, which, among other things, opens up the possibility of studying the biological relevance of small expression differences.
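The normalization factor described in this abstract, the geometric mean of several control genes, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the array layout (samples in rows, control genes in columns) and the example values are assumptions, and the selection of the most stable control genes is not shown.

```python
import numpy as np

def normalization_factors(control_expr):
    """Per-sample geometric mean over control genes.

    control_expr: array of shape (n_samples, n_control_genes) holding strictly
    positive relative expression quantities (hypothetical input layout).
    """
    control_expr = np.asarray(control_expr, dtype=float)
    if np.any(control_expr <= 0):
        raise ValueError("expression quantities must be strictly positive")
    # Geometric mean computed as exp(mean(log(x))) for numerical stability.
    return np.exp(np.log(control_expr).mean(axis=1))

def normalize(target_expr, control_expr):
    """Divide each sample's target-gene quantity by its normalization factor."""
    return np.asarray(target_expr, dtype=float) / normalization_factors(control_expr)

# Illustrative values only: two samples, three control genes.
controls = np.array([[1.0, 4.0, 2.0],
                     [2.0, 8.0, 4.0]])
target = np.array([10.0, 20.0])

factors = normalization_factors(controls)
normalized = normalize(target, controls)
```

In this toy input the second sample has twice the amount of every gene, so the normalized target values come out equal. The geometric mean is less sensitive than the arithmetic mean to a single highly abundant control gene.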
TL;DR: This article proposes normalization methods that are based on robust local regression and account for intensity and spatial dependence in dye biases for different types of cDNA microarray experiments.
Abstract: There are many sources of systematic variation in cDNA microarray experiments which affect the measured gene expression levels (e.g. differences in labeling efficiency between the two fluorescent dyes). The term normalization refers to the process of removing such variation. A constant adjustment is often used to force the distribution of the intensity log ratios to have a median of zero for each slide. However, such global normalization approaches are not adequate in situations where dye biases can depend on spot overall intensity and/or spatial location within the array. This article proposes normalization methods that are based on robust local regression and account for intensity and spatial dependence in dye biases for different types of cDNA microarray experiments. The selection of appropriate controls for normalization is discussed and a novel set of controls (microarray sample pool, MSP) is introduced to aid in intensity-dependent normalization. Lastly, to allow for comparisons of expression levels across slides, a robust method based on maximum likelihood estimation is proposed to adjust for scale differences among slides.
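For reference, the constant (global) adjustment that this abstract describes as the common baseline can be sketched in a few lines. The function name and the example intensities are assumptions; the robust local regression and MSP-based methods proposed in the article are not reproduced here.

```python
import numpy as np

def global_median_normalize(red, green):
    """Shift the log2 intensity ratios of one slide so their median is zero."""
    red = np.asarray(red, dtype=float)
    green = np.asarray(green, dtype=float)
    log_ratio = np.log2(red) - np.log2(green)
    # A single constant is subtracted for the whole slide, so any dye bias
    # that depends on intensity or spatial location is left uncorrected.
    return log_ratio - np.median(log_ratio)

# Illustrative values only.
red = np.array([200.0, 400.0, 800.0, 1600.0])
green = np.array([100.0, 100.0, 100.0, 100.0])
m_normalized = global_median_normalize(red, green)
```

Because only one offset is estimated per slide, this approach cannot remove the intensity-dependent or spatially dependent biases that motivate the local regression methods.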
TL;DR: A framework for deriving probabilistic models of Information Retrieval using term-weighting models obtained in the language model approach by measuring the divergence of the actual term distribution from that obtained under a random process is introduced.
Abstract: We introduce and create a framework for deriving probabilistic models of Information Retrieval. The models are nonparametric models of IR obtained in the language model approach. We derive term-weighting models by measuring the divergence of the actual term distribution from that obtained under a random process. Among the random processes we study the binomial distribution and Bose-Einstein statistics. We define two types of term frequency normalization for tuning term weights in the document-query matching process. The first normalization assumes that documents have the same length and measures the information gain with the observed term once it has been accepted as a good descriptor of the observed document. The second normalization is related to the document length and to other statistics. These two normalization methods are applied to the basic models in succession to obtain weighting formulae. Results show that our framework produces different nonparametric models forming baseline alternatives to the standard tf-idf model.
TL;DR: In this article, it was shown that uncorrelated variables acquire spurious correlations when normalized, and a number of realistic scenarios were worked out to show that the correlations between normalized element contents still suffer from the closure effect.
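The closure effect mentioned here can be illustrated with a small simulation. This is a sketch on synthetic data, not data from the article: the log-normal distribution, the sample size and the seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three independent, positive "element contents" (synthetic data).
raw = rng.lognormal(mean=0.0, sigma=0.5, size=(5000, 3))

# Closure: normalize every sample so that its parts sum to 1.
closed = raw / raw.sum(axis=1, keepdims=True)

raw_corr = np.corrcoef(raw[:, 0], raw[:, 1])[0, 1]
closed_corr = np.corrcoef(closed[:, 0], closed[:, 1])[0, 1]
```

The raw columns are essentially uncorrelated, while the normalized columns are clearly negatively correlated. For three identically distributed parts forced to sum to a constant, the population correlation between any two parts is -0.5, purely as a consequence of the constraint.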
TL;DR: In this paper, a system and method for dynamically generating alarm thresholds for performance metrics, and for applying those thresholds to generate alarms is described, where statistical methods are used to generate one or more thresholds for metrics that may not fit a Gaussian or normal distribution, or that may exhibit cyclic behavior or persistent shifts in the values of the metrics.
Abstract: A system and method for dynamically generating alarm thresholds for performance metrics, and for applying those thresholds to generate alarms is described. Statistical methods are used to generate one or more thresholds for metrics that may not fit a Gaussian or normal distribution, or that may exhibit cyclic behavior or persistent shifts in the values of the metrics. The statistical methods used to generate the thresholds may include statistical process control (SPC) methods, normalization methods, and heuristics.
TL;DR: The most important of these recommendations is the use of a two-tiered normalization approach including wet sieving (<63 µm), followed by an additional geochemical co-factor normalization.
Abstract: Rational pollution, or the effectiveness of natural attenuation assessments based upon estimating the degree of contamination, critically depends on a sound normalization that takes into account heterogeneous sedimentary environments. By normalizing the measured contaminant concentration patterns for the sediment characteristics, the inherent variability can be reduced, allowing a more meaningful assessment of both the spatial distributions and the temporal trends. A brief overview and guidance in the methodology available for choosing an appropriate site-specific normalization approach is presented. This is followed by general recommendations with respect to the choice of normalizer and the necessary geochemical and statistical quality assurance methods, with support from the results of recent international intercomparison exercises within the QUASH (Quality Assurance of Sample Handling) programme, as well as discussions within the International Council for the Exploration of the Sea (ICES) working groups. The most important of these recommendations is the use of a two-tiered normalization approach including wet sieving (<63 µm), followed by an additional geochemical co-factor normalization.
TL;DR: Testing a normalization procedure based on comparing gene expression levels to the signals generated from hybridizing genomic DNA concluded that genomic DNA standards offer advantages over conventional RNA normalization procedures and can be adapted for the investigation of microbial genomes.
Abstract: A fundamental problem in DNA microarray analysis is the lack of a common standard to compare the expression levels of different samples. Several normalization protocols have been proposed to overcome variables inherent in this technology. As yet, there are no satisfactory methods to exchange gene expression data among different research groups or to compare gene expression values under different stimulus‐response profiles. We have tested a normalization procedure based on comparing gene expression levels to the signals generated from hybridizing genomic DNA (genomic normalization). This procedure was applied to DNA microarrays of Mycobacterium tuberculosis using RNA extracted from cultures growing to the logarithmic and stationary phases. The applied normalization procedure generated reproducible measurements of expression level for 98% of the putative mycobacterial ORFs, among which 5.2% were significantly changed comparing the logarithmic to stationary growth phase. Additionally, analysis of expression levels of a subset of genes by real time PCR technology revealed an agreement in expression of 90% of the examined genes when genomic DNA normalization was applied instead of 29‐68% agreement when RNA normalization was used to measure the expression levels in the same set of RNA samples. Further examination of microarray expression levels displayed clusters of genes differentially expressed between the logarithmic, early stationary and late stationary growth phases. We conclude that genomic DNA standards offer advantages over conventional RNA normalization procedures and can be adapted for the investigation of microbial genomes.
TL;DR: SNOMAD is a collection of algorithms for the normalization and standardization of gene expression datasets derived from diverse biological and technological sources which correct for bias and variance which are non-uniformly distributed across the range of microarray element signal intensities.
Abstract: SNOMAD is a collection of algorithms for the normalization and standardization of gene expression datasets derived from diverse biological and technological sources. In addition to conventional transformations and visualization tools, SNOMAD includes two non-linear transformations that correct for bias and variance that are non-uniformly distributed across the range of microarray element signal intensities: (1) local mean normalization; and (2) local variance correction (Z-score generation using a locally calculated standard deviation).
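The two transformations named in this abstract can be sketched together as a locally computed Z-score. The sketch below is an assumption-laden stand-in, not SNOMAD's code: it uses a simple moving window over elements sorted by mean log intensity instead of whatever local smoother the tool applies, and the function name, window size and simulated data are all hypothetical.

```python
import numpy as np

def local_z_scores(a, m, window=51):
    """Local mean normalization followed by local variance correction.

    a: mean log intensity per array element; m: log ratio per element.
    Each log ratio is centred on the mean of its intensity neighbours and
    divided by their standard deviation (moving window, an assumption).
    """
    a = np.asarray(a, dtype=float)
    m = np.asarray(m, dtype=float)
    order = np.argsort(a)              # neighbours are defined along intensity
    m_sorted = m[order]
    n = len(m_sorted)
    half = window // 2
    z_sorted = np.empty(n)
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        neighbours = m_sorted[lo:hi]
        z_sorted[i] = (m_sorted[i] - neighbours.mean()) / neighbours.std(ddof=1)
    z = np.empty(n)
    z[order] = z_sorted                # restore the original element order
    return z

# Synthetic data: bias and noise level both depend on intensity.
rng = np.random.default_rng(1)
a = rng.uniform(6.0, 14.0, size=2000)
m = 0.3 * (a - 10.0) + rng.normal(0.0, 0.1 + 0.05 * (14.0 - a))
z = local_z_scores(a, m)
```

After the transformation the scores are roughly centred on zero with unit spread at every intensity, so a single threshold on `z` means the same thing for dim and bright elements. The sketch assumes the neighbours within a window are not all identical, otherwise the local standard deviation would be zero.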
TL;DR: In this paper, the authors propose a method that allows the normalization of traffic data that is simultaneously transferred to a network intrusion detection system (NIDS) and monitored end-systems located in a network, such as a TCP/IP network, in which packets of data such as IP datagrams, are fragmented and reassembled.
Abstract: A method that allows the normalization of traffic data that is simultaneously transferred to a network intrusion detection system (NIDS) and monitored end-systems located in a network, such as a TCP/IP network, in which packets of data, such as IP datagrams, are fragmented and reassembled. Accordingly, the information of received fragments and/or the topology of the network comprising the network intrusion detection system (NIDS) and the monitored end-systems are entered into a normalization table that is dynamically established and maintained. Subsequently, packets of data such as IP datagrams are modified, redirected or discarded if ambiguities are detected when comparing information contained in the normalization table with information contained in the headers of the received data packets.
TL;DR: In this article, the authors measured the shear variance on scales ranging from 0.7′ to 1.4′, with a detection significance greater than 3.8σ, and measured the normalization of the matter power spectrum to be $\sigma_8 = 0.94 \pm 0.14$.
Abstract: Weak lensing by large-scale structure provides a direct measurement of matter fluctuations in the universe. We report a measurement of this "cosmic shear" based on 271 Wide Field Planetary Camera 2 archival images from the Hubble Space Telescope Medium Deep Survey. Our measurement method and treatment of systematic effects were discussed in an earlier paper. We measure the shear variance on scales ranging from 0.7′ to 1.4′, with a detection significance greater than 3.8σ. This allows us to measure the normalization of the matter power spectrum to be $\sigma_8 = (0.94 \pm 0.10 \pm 0.14)(0.3/\Omega_m)^{0.44}(0.21/\Gamma)^{0.15}$, in a ΛCDM universe. The first 1σ error includes statistical errors only, while the latter also includes (Gaussian) cosmic variance and the uncertainty in the galaxy redshift distribution. Our results are consistent with earlier cosmic shear measurements from the ground and from space. We compare our cosmic shear results and those from other groups to the normalization from cluster abundance and galaxy surveys. We find that the combination of four recent cosmic shear measurements is somewhat inconsistent with the recent normalization using these methods and discuss possible explanations for the discrepancy.
TL;DR: The rapid transition to effective normalization at low contrasts suggested cooperativity in the normalization, and a model embodying such a cooperative step provided a good account of the data.
Abstract: Contrast normalization is a process whereby responses of neurons are scaled according to the total amount of contrast in a region of the image near the receptive field of a neuron. This process allows neurons to code for informative scene or object attributes in a manner unaffected by changes in illumination. Evidence for normalization is seen in striate and extrastriate cortex from experiments where multiple stimuli are presented within a single receptive field (RF). Neuronal responses in such experiments are smaller than those predicted by linear summation, revealing the presence of normalization. While the presence of normalization is often clear, its mechanism is less so. To study the mechanism of normalization, we measured the interaction between pairs of brief local stimuli (spatial Gabor functions) within the RFs of cells in the middle temporal (MT or V5) area of monkeys and varied both the location and contrast of the stimuli. We found that responses summed approximately linearly when contrast was low but rapidly became normalized as stimulus contrast increased. The rapid transition to effective normalization at low contrasts suggested cooperativity in the normalization, and a model embodying such a cooperative step provided a good account of our data.
TL;DR: A method of automatically generating training data for Maximum Entropy (ME) modeling of abbreviations and acronyms is demonstrated and it is shown that using ME modeling is a promising technique for abbreviation and acronym normalization.
Abstract: Text normalization is an important aspect of successful information retrieval from medical documents such as clinical notes, radiology reports and discharge summaries. In the medical domain, a significant part of the general problem of text normalization is abbreviation and acronym disambiguation. Numerous abbreviations are used routinely throughout such texts and knowing their meaning is critical to data retrieval from the document. In this paper I will demonstrate a method of automatically generating training data for Maximum Entropy (ME) modeling of abbreviations and acronyms and will show that using ME modeling is a promising technique for abbreviation and acronym normalization. I report on the results of an experiment involving training a number of ME models used to normalize abbreviations and acronyms on a sample of 10,000 rheumatology notes with ~89% accuracy.
TL;DR: The findings suggest that the functional variability is much larger than the anatomical one and that precise alignment of anatomical features has low influence on the resulting intersubject functional maps.
TL;DR: Graphically and numerically, it was noted that normalization tended to modify the continuous relative phase curve configuration, which tended to neglect the nonlinear forces acting on the system since it did not maintain the aspect ratio of the phase plot.
TL;DR: Incorrect versions of Figures 5 and 6 containing normalization errors were accidentally published by Borcherdt (2002) and should be replaced with the figures shown here.
Abstract: Incorrect versions of Figures 5 and 6 containing normalization errors were accidentally published by Borcherdt (2002). They should be replaced with the figures shown here. The text and tabulated regression values published in Borcherdt (2002) …
TL;DR: The tool described in this report was developed in the R statistical language and is freely available on the Internet as part of a larger gene expression analysis package and allows the easy use of the local mean normalization tool, without programming expertise or downloading of additional software.
Abstract: Here we present a methodology for the normalization of element signal intensities to a mean intensity calculated locally across the surface of a DNA microarray. These methods allow the detection and/or correction of spatially systematic artifacts in microarray data. These include artifacts that can be introduced during the robotic printing, hybridization, washing, or imaging of microarrays. Using array element signal intensities alone, this local mean normalization process can correct for such artifacts because they vary across the surface of the array. The local mean normalization can be used for quality control and data correction purposes in the analysis of microarray data. These algorithms assume that array elements are not spatially ordered with regard to sequence or biological function and require that this spatial mapping is identical between the two sets of intensities to be compared. The tool described in this report was developed in the R statistical language and is freely available on the Internet as part of a larger gene expression analysis package. This Web implementation is interactive and user-friendly and allows the easy use of the local mean normalization tool described here, without programming expertise or downloading of additional software.
TL;DR: It is found that robust categorical adjustments outperform the ones based on a precisely defined stochastic model, including some commonly used procedures.
Abstract: Motivation: Existing analyses of microarray data often incorporate an obscure data normalization procedure applied prior to data analysis. For example, ratios of microarray channel intensities are normalized to have a common mean over the set of genes. We made an attempt to understand the meaning of such procedures from the modeling point of view, and to formulate the model assumptions that underlie them. Given a considerable diversity of data adjustment procedures, the question of their performance, comparison and ranking for various microarray experiments was of interest. Results: A two-step statistical procedure is proposed: data transformation (adjustment for slide-specific effect) followed by a statistical test applied to transformed data. Various methods of analysis for differential expression are compared using simulations and real data on colon cancer cell lines. We found that robust categorical adjustments outperform the ones based on a precisely defined stochastic model, including some commonly used procedures. Availability: A program implementing the proposed adjustment and test procedures is available at http://www.hci.
TL;DR: In this article, the authors present a systematic theoretical and experimental investigation on the accuracy of thermal diffusivity α and thermal effusivity e of liquids measured by the photopyroelectric (PPE) method in back-detection configuration (BPPE).
Abstract: We present a systematic theoretical and experimental investigation on the accuracy of thermal diffusivity α and thermal effusivity e of liquids measured by the photopyroelectric (PPE) method in back-detection configuration (BPPE). Special cases corresponding to different cell structures are analyzed in terms of error determination of α and e for water and ethylene glycol. We propose a new normalization procedure allowing for estimation of these parameters with accuracy of 2% on α and 5% on e over extended frequency range. The normalization eliminates the frequency-dependent influence of the transducer impedance and associated electronics, reduces the errors due to coupling fluid between cell components, and reduces the number of temperature-dependent parameters that must be known in order to characterize the sample. Technical solutions for improving the performances are suggested. Another goal of the study was to demonstrate the possibility of the BPPE method to yield small variations of thermal parameter...
TL;DR: The results suggest that when optimal normalization parameters are used, anatomical landmarks in the medial temporal lobes are colocalized to within a standard deviation of about 1 mm. Interestingly, the optimal parameters are those that provide a rather constrained normalization as opposed to those that optimize intensity matching at the expense of rendering the warps "unlikely."
TL;DR: If the global component of the transform domain LMS is also made time-variable, depending on the output error, the speed of convergence can be significantly improved.
Abstract: We introduce a new transform-domain least mean square (LMS) algorithm with variable step size. The existing approaches use different time-variable step-sizes for each filter tap. The step-sizes are time-variable due to the power estimates of each transform coefficient. In our new approach, for each step-size we define a local component that is given by the power normalization, and a global component that is the same for each filter coefficient. We show that if the global component is also made time-variable, depending on the output error, the speed of convergence can be significantly improved.
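The "existing approach" referred to in this abstract, a transform-domain LMS filter whose per-tap step size comes from a running power estimate, can be sketched as below. Several choices are assumptions made only for illustration: the DCT as the transform, the exponential power estimate with forgetting factor `beta`, the parameter values and the test signal. The global step `mu` is held fixed here; the rule by which the paper makes it depend on the output error is not given in the abstract and is not reproduced.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix; row k is the k-th basis vector."""
    k = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    t = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * j + 1) * k / (2 * n))
    t[0, :] /= np.sqrt(2.0)
    return t

def transform_domain_lms(x, d, n_taps=8, mu=0.02, beta=0.95, eps=1e-8):
    """Power-normalized transform-domain LMS with a fixed global step mu."""
    t = dct_matrix(n_taps)
    w = np.zeros(n_taps)               # weights in the transform domain
    power = None                       # running power of each coefficient
    errors = np.zeros(len(x))
    for n in range(n_taps - 1, len(x)):
        window = x[n - n_taps + 1:n + 1][::-1]   # most recent sample first
        u = t @ window
        if power is None:
            power = u ** 2 + eps
        else:
            power = beta * power + (1.0 - beta) * u ** 2
        e = d[n] - w @ u
        # Local component: 1 / power per tap. Global component: mu.
        w = w + mu * e * u / (power + eps)
        errors[n] = e
    return t.T @ w, errors             # equivalent time-domain taps, errors

# Synthetic system identification with a strongly correlated AR(1) input.
rng = np.random.default_rng(0)
noise = rng.standard_normal(4000)
x = np.zeros(4000)
for n in range(1, len(x)):
    x[n] = 0.9 * x[n - 1] + noise[n]
h_true = np.array([0.5, -0.3, 0.2, 0.1, 0.0, -0.1, 0.05, 0.02])
d = np.convolve(x, h_true)[:len(x)]

h_est, errors = transform_domain_lms(x, d)
```

Dividing each tap's update by its own power estimate equalizes the convergence speed across transform coefficients, which is what makes the method attractive for correlated inputs such as the AR(1) signal used here.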
TL;DR: In this article, it is shown that the theoretical basis of the successful Tsallis' generalized exponential distribution shows some worrying properties with the conventional normalization and the escort probability, and that these theoretical difficulties may be avoided by introducing an incomplete normalization from which the generalized distribution can be deduced in a more convincing and consistent way.
Abstract: We comment on some open questions and theoretical peculiarities in Tsallis nonextensive statistical mechanics. It is shown that the theoretical basis of the successful Tsallis' generalized exponential distribution shows some worrying properties with the conventional normalization and the escort probability. These theoretical difficulties may be avoided by introducing a so-called incomplete normalization, from which Tsallis' generalized distribution can be deduced in a more convincing and consistent way.
TL;DR: In this paper, the authors investigated the configurations of the MSSM spectrum in all 175 models and found that only twenty patterns of representations are without an anomalous U(1).
TL;DR: A globalization system for processing data in a multiple-locale or multilingual environment is described in this article, where a range of functionality is provided through various classes and interfaces that can be associated with subsystems running in the environment.
Abstract: A globalization system for processing data in a multiple-locale or multilingual environment. A range of functionality is provided through various classes and interfaces that can be associated with subsystems running in the environment. These interfaces and classes provide for the development of subsystems (i.e., applications, servers, adapters, and so forth) that are independent of knowledge of languages and data formats of a particular locale. A locale associated with various data can be used to dynamically configure information. A normalization capability is also provided for standardizing the representation of data coming into a processing domain from various locales.
TL;DR: In this article, the authors investigated the accuracy of spatial basis function normalization using anatomical landmarks to determine how precisely homologous regions are colocalized and found that when optimal normalization parameters are used, anatomical landmarks in the medial temporal lobes are colocalized to within a standard deviation of about 1 mm.
Abstract: We investigated the accuracy of spatial basis function normalization using anatomical landmarks to determine how precisely homologous regions are colocalized. We examined precision in terms of: (1) the number of nonlinear basis functions used by the normalization procedure; (2) the degree of (Bayesian) regularization; and (3) the effect of substituting different templates and how this interacted with the number of basis functions. The face validity of spatial normalization was assessed as a function of these parameters, using the colocalization of homologous landmarks in a test sample of 20 normally developing children and 5 children with bilateral hippocampal pathology. Our results suggest that when optimal normalization parameters are used, anatomical landmarks in the medial temporal lobes are colocalized to within a standard deviation of about 1 mm. When suboptimal parameters are used this standard deviation can increase up to 3 mm. Interestingly the optimal parameters are those that provide a rather constrained normalization as opposed to those that optimize intensity matching at the expense of rendering the warps "unlikely." The implications of our results, for users of voxel-based morphometry, are discussed.
TL;DR: In this article, it was shown that any analytic vector field which is integrable in the non-Hamiltonian sense admits a local convergent Poincare-Dulac normalization.
Abstract: We show that, to find a Poincare-Dulac normalization for a vector field is the same as to find and linearize a torus action which preserves the vector field. Using this toric characterization and other geometrical arguments, we prove that any local analytic vector field which is integrable in the non-Hamiltonian sense admits a local convergent Poincare-Dulac normalization. These results generalize the main results of our previous paper from the Hamiltonian case to the non-Hamiltonian case. Similar results are presented for the case of isochore vector fields.
TL;DR: A new score normalization technique in Automatic Speaker Verification (ASV), the D-Norm, is proposed; it is based on the use of Kullback-Leibler distances in an ASV context, and its performance is comparable to that of the Z-Norm.
Abstract: In this paper, we propose a new score normalization technique in Automatic Speaker Verification (ASV): the D-Norm. The main advantage of this score normalization is that it needs neither additional speech data nor an external speaker population, as opposed to the state-of-the-art approaches. The D-Norm is based on the use of Kullback-Leibler (KL) distances in an ASV context. In a first step, we estimate the KL distances with a Monte-Carlo method and we experimentally show that they are correlated with the verification scores. In a second step, we use this correlation to implement a score normalization procedure, the D-Norm. We analyse its performance and we compare it to that of a conventional normalization, the Z-Norm. The results show that the performance of the D-Norm is comparable to that of the Z-Norm. We conclude with a discussion of the results and of the applications of this work.
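For context, the conventional Z-Norm used as the comparison point can be sketched as follows, assuming its usual definition: a verification score is standardized with the mean and standard deviation of scores that impostor utterances obtain against the same speaker model. The function name and the numbers are illustrative assumptions, and the D-Norm itself, which replaces these impostor statistics with KL-distance estimates, is not reproduced here.

```python
import numpy as np

def z_norm(raw_score, impostor_scores):
    """Standardize a verification score with impostor score statistics.

    impostor_scores: scores of impostor utterances against the claimed
    speaker's model (this is the extra data that the D-Norm avoids).
    """
    impostor_scores = np.asarray(impostor_scores, dtype=float)
    mu = impostor_scores.mean()
    sigma = impostor_scores.std(ddof=1)
    return (raw_score - mu) / sigma

# Illustrative values only.
impostors = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
normalized_score = z_norm(3.0, impostors)
```

The normalized score is expressed in impostor standard deviations, which lets a single decision threshold be shared across speaker models.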
TL;DR: A nonlinear method for the normalization of cDNA microarray experiments, the neural network normalization (N3) approach, is proposed, which can obtain much better normalization performance on cDNA microarray data than existing approaches.
Abstract: In microarray experiments, there are a variety of systematic errors that affect the measured gene expression levels. Although a number of algorithms have been proposed for the normalization of different types of cDNA microarray data, they have encountered many difficulties due to the complex nonlinear sources of systematic error. In this case, a nonlinear normalization method has great potential to deal with this difficult problem. The paper first proposes a nonlinear method for the normalization of cDNA microarray experiments, the neural network normalization (N3) approach. By utilizing the intrinsic nonlinear processing ability of neural networks, N3 is able to balance the complex nonlinear dependence between the two differently dyed channels in cDNA microarray experiments. In this way, we can obtain much better normalization performance on cDNA microarray data than existing approaches. Several experiments are conducted to illustrate the validity of our proposed methods in detail.
TL;DR: In this article, three-point functions involving ¼-BPS and ½-BPS operators are computed; the analysis in cases (1)-(3) is valid for general N, while (4) is a large-N approximation, and a supergravity discussion of two- and three-point correlators involving the dual threshold bound states is presented.
Abstract: In a recent paper by Ryzhov [1], ¼-BPS chiral primaries were constructed in the fully interacting four-dimensional $\mathcal{N}=4$ Super-Yang-Mills theory with gauge group SU(N). These operators are annihilated by four supercharges, and at order $g^2$ have protected scaling dimension and normalization. Here, we compute three-point functions involving these ¼-BPS operators along with ½-BPS operators. The combinatorics of the problem is rather involved, and we consider the following special cases: (1) correlators ⟨½ ½ BPS⟩ of two ½-BPS primaries with an arbitrary chiral primary; (2) certain classes of ⟨½ ¼ ¼⟩ and ⟨¼ ¼ ¼⟩ three-point functions; (3) three-point functions involving the Δ ≤ 7 operators found in [1]; (4) ⟨½ ¼ ¼⟩ correlators with the special ¼ made of single and double trace operators only. The analysis in cases (1)-(3) is valid for general N, while (4) is a large-N approximation. Order $g^2$ corrections to all three-point functions considered in this paper are found to vanish. In the AdS/CFT correspondence, ¼-BPS chiral primaries are dual to threshold bound states of elementary supergravity excitations. We present a supergravity discussion of two- and three-point correlators involving these bound states.