On the Distribution of the Two-Sample Cramer-von Mises Criterion
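TL;DR: This note studies the distribution of the two-sample Cramer-von Mises criterion $T$ for small sample sizes $N$ and $M$, presents tables for using the criterion at conventional significance levels, and observes that the known limiting distribution is a surprisingly good approximation to the exact distribution even for moderate sample sizes.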
Abstract: The Cramer-von Mises $\omega^2$ criterion for testing that a sample, $x_1, \cdots, x_N$, has been drawn from a specified continuous distribution $F(x)$ is \begin{equation*}\tag{1} \omega^2 = \int^\infty_{-\infty} \lbrack F_N(x) - F(x)\rbrack^2 \, dF(x),\end{equation*} where $F_N(x)$ is the empirical distribution function of the sample; that is, $F_N(x) = k/N$ if exactly $k$ observations are less than or equal to $x$ $(k = 0, 1, \cdots, N)$. If there is a second sample, $y_1, \cdots, y_M$, a test of the hypothesis that the two samples come from the same (unspecified) continuous distribution can be based on the analogue of $N\omega^2$, namely \begin{equation*}\tag{2} T = \lbrack NM/(N + M)\rbrack \int^\infty_{-\infty} \lbrack F_N(x) - G_M(x)\rbrack^2 \, dH_{N+M}(x),\end{equation*} where $G_M(x)$ is the empirical distribution function of the second sample and $H_{N+M}(x)$ is the empirical distribution function of the two samples together [that is, $(N + M)H_{N+M}(x) = NF_N(x) + MG_M(x)$]. The limiting distribution of $N\omega^2$ as $N \rightarrow \infty$ has been tabulated [2], and it has been shown ([3], [4a], and [7]) that $T$ has the same limiting distribution as $N \rightarrow \infty$, $M \rightarrow \infty$, and $N/M \rightarrow \lambda$, where $\lambda$ is any finite positive constant. In this note we consider the distribution of $T$ for small values of $N$ and $M$ and present tables to permit use of the criterion at some conventional significance levels for small values of $N$ and $M$. The limiting distribution seems a surprisingly good approximation to the exact distribution for moderate sample sizes (corresponding to the same feature for $N\omega^2$ [6]). The accuracy of approximation is better than in the case of the two-sample Kolmogorov-Smirnov statistic studied by Hodges [4].
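As a rough illustration of equation (2), the statistic $T$ can be computed directly from the two empirical distribution functions. Because $H_{N+M}$ places mass $1/(N+M)$ on each pooled observation, the integral reduces to a sum over the pooled sample. The sketch below is a minimal Python version under that reading. The function names `two_sample_cvm`, `exact_null_distribution`, and `exact_p_value` are illustrative and do not come from the paper. The abstract does not say how the small-sample tables were computed; the enumeration shown here is only one straightforward possibility, relying on the standard fact that under the null hypothesis (with continuous distributions, hence no ties) every assignment of the pooled ranks to the first sample is equally likely.

```python
from bisect import bisect_right
from itertools import combinations


def two_sample_cvm(x, y):
    """Statistic T of equation (2) for samples x (size N) and y (size M).

    H_{N+M} puts mass 1/(N+M) on each pooled observation, so
    T = N*M/(N+M)^2 * sum over pooled z of (F_N(z) - G_M(z))^2.
    """
    n, m = len(x), len(y)
    xs, ys = sorted(x), sorted(y)
    total = 0.0
    for z in xs + ys:
        f = bisect_right(xs, z) / n  # F_N(z): share of x-values <= z
        g = bisect_right(ys, z) / m  # G_M(z): share of y-values <= z
        total += (f - g) ** 2
    return n * m / (n + m) ** 2 * total


def exact_null_distribution(n, m):
    """Sorted values of T over all C(n+m, n) rank assignments.

    Assumption (not taken from the abstract): brute-force enumeration,
    which is only practical for small n and m.
    """
    ranks = range(1, n + m + 1)
    values = []
    for first in combinations(ranks, n):
        chosen = set(first)
        second = [r for r in ranks if r not in chosen]
        values.append(two_sample_cvm(first, second))
    return sorted(values)


def exact_p_value(x, y, tol=1e-12):
    """Share of rank assignments whose T is at least the observed T."""
    t_obs = two_sample_cvm(x, y)
    null = exact_null_distribution(len(x), len(y))
    return sum(t >= t_obs - tol for t in null) / len(null)
```

For example, with `x = [1, 2]` and `y = [3, 4]` the two samples are completely separated, and only 2 of the 6 possible rank assignments give a value of $T$ that large. With samples this small no conventional significance level can be reached, which is why tables for specific small $N$ and $M$ are useful.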