TL;DR: In this paper, the authors developed a new approach to the problem of testing the existence of a level relationship between a dependent variable and a set of regressors, when it is not known with certainty whether the underlying regressors are trend- or first-difference stationary.
TL;DR: This article shows that correlation is related not to market volatility per se but to the market trend: it increases in bear markets but not in bull markets. The authors also derive the distribution of extreme correlation for a wide class of return distributions using extreme value theory.
Abstract: Testing the hypothesis that international equity market correlation increases in volatile times is a difficult exercise and misleading results have often been reported in the past because of a spurious relationship between correlation and volatility. Using “extreme value theory” to model the multivariate distribution tails, we derive the distribution of extreme correlation for a wide class of return distributions. Empirically, we reject the null hypothesis of multivariate normality for the negative tail, but not for the positive tail. We also find that correlation is not related to market volatility per se but to the market trend. Correlation increases in bear markets, but not in bull markets. International equity market correlation has been widely studied. Previous studies suggest that correlation is larger when focusing on large absolute-value returns, and that this seems more important in bear markets. The conclusion that international correlation is much higher in periods of volatile markets (large absolute returns) has indeed become part of the accepted wisdom among practitioners and the financial press. However, one should exert great care in testing such a proposition. The usual approach is to condition the estimated correlation on the observed (or ex post) realization of market returns. Unfortunately correlation is a complex function of returns and such tests can lead to wrong conclusions, unless the null hypothesis and
TL;DR: The high volume and often contradictory nature of medical research findings stem not only from publication bias but also from the widespread misunderstanding of the nature of statistical significance.
Abstract: The findings of medical research are often met with considerable scepticism, even when they have apparently come from studies with sound methodologies that have been subjected to appropriate statistical analysis. This is perhaps particularly the case with respect to epidemiological findings that suggest that some aspect of everyday life is bad for people. Indeed, one recent popular history, the medical journalist James Le Fanu's The Rise and Fall of Modern Medicine, went so far as to suggest that the solution to medicine's ills would be the closure of all departments of epidemiology [1].
One contributory factor is that the medical literature shows a strong tendency to accentuate the positive; positive outcomes are more likely to be reported than null results [2–4]. By this means alone a host of purely chance findings will be published, as by conventional reasoning examining 20 associations will produce one result that is “significant at P=0.05” by chance alone. If only positive findings are published then they may be mistakenly considered to be of importance rather than being the necessary chance results produced by the application of criteria for meaningfulness based on statistical significance. As many studies contain long questionnaires collecting information on hundreds of variables, and measure a wide range of potential outcomes, several false positive findings are virtually guaranteed. The high volume and often contradictory nature [5] of medical research findings, however, is not only because of publication bias. A more fundamental problem is the widespread misunderstanding of the nature of statistical significance.
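The "one in 20" reasoning above can be checked with a short calculation. This is a minimal sketch that assumes the 20 tests are independent and that every null hypothesis is true; neither assumption is stated in the abstract.

```python
# Assumption: 20 independent tests, all null hypotheses true, threshold P = 0.05.
alpha = 0.05
n_tests = 20

# Expected number of "significant" results produced by chance alone.
expected_false_positives = alpha * n_tests

# Probability that at least one of the 20 tests is "significant" by chance.
prob_at_least_one = 1 - (1 - alpha) ** n_tests

print(f"expected false positives: {expected_false_positives:.2f}")
print(f"P(at least one false positive): {prob_at_least_one:.3f}")
```

Under these assumptions the expected count is one, while the chance of seeing at least one spurious result is roughly 64%.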
#### Summary points
P values, or significance levels, measure the strength of the evidence against the null hypothesis; the smaller the P value, the stronger the evidence against the null hypothesis
An arbitrary division of results, into “significant” or “non-significant” according to the P value, was not the intention of the …
TL;DR: In this article, the authors compare the practice of judging significance by the overlap of two confidence intervals with the standard method of testing significance under the common assumptions of consistency, asymptotic normality, and asymptotic independence of the estimates.
Abstract: To judge whether the difference between two point estimates is statistically significant, data analysts often examine the overlap between the two associated confidence intervals. We compare this technique to the standard method of testing significance under the common assumptions of consistency, asymptotic normality, and asymptotic independence of the estimates. Rejection of the null hypothesis by the method of examining overlap implies rejection by the standard method, whereas failure to reject by the method of examining overlap does not imply failure to reject by the standard method. As a consequence, the method of examining overlap is more conservative (i.e., rejects the null hypothesis less often) than the standard method when the null hypothesis is true, and it mistakenly fails to reject the null hypothesis more frequently than does the standard method when the null hypothesis is false. Although the method of examining overlap is simple and especially convenient when lists or graphs of confidence int...
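The difference between the two methods can be illustrated with a small sketch. It assumes two independent, approximately normal estimates with known standard errors; the function names and the numbers in the usage lines are hypothetical.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% critical value of the standard normal


def intervals_overlap(est1, se1, est2, se2, z=Z_95):
    """True if the two confidence intervals est +/- z*se share any point."""
    lo1, hi1 = est1 - z * se1, est1 + z * se1
    lo2, hi2 = est2 - z * se2, est2 + z * se2
    return lo1 <= hi2 and lo2 <= hi1


def standard_test_rejects(est1, se1, est2, se2, z=Z_95):
    """Standard z-test for the difference of two independent estimates."""
    z_stat = (est1 - est2) / math.sqrt(se1 ** 2 + se2 ** 2)
    return abs(z_stat) > z


# Hypothetical estimates: the intervals overlap, yet the standard test rejects.
overlap = intervals_overlap(0.0, 1.0, 3.2, 1.0)
rejects = standard_test_rejects(0.0, 1.0, 3.2, 1.0)
print(overlap, rejects)
```

The usage lines show a case where the overlap method fails to reject while the standard method rejects, which is the conservative behaviour the abstract describes.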
TL;DR: An information-theoretic paradigm for analysis of ecological data, based on Kullback–Leibler information, that is an extension of likelihood theory and avoids the pitfalls of null hypothesis testing is described.
Abstract: We describe an information-theoretic paradigm for analysis of ecological data, based on Kullback–Leibler information, that is an extension of likelihood theory and avoids the pitfalls of null hypothesis testing. Information-theoretic approaches emphasise a deliberate focus on the a priori science in developing a set of multiple working hypotheses or models. Simple methods then allow these hypotheses (models) to be ranked from best to worst and scaled to reflect a strength of evidence using the likelihood of each model (gi), given the data and the models in the set (i.e. L(gi | data)). In addition, a variance component due to model-selection uncertainty is included in estimates of precision. There are many cases where formal inference can be based on all the models in the a priori set and this multi-model inference represents a powerful, new approach to valid inference. Finally, we strongly recommend inferences based on a priori considerations be carefully separated from those resulting from some form of data dredging. An example is given for questions related to age- and sex-dependent rates of tag loss in elephant seals (Mirounga leonina).
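The ranking and scaling of models by relative likelihood can be sketched as follows. The abstract does not name a specific criterion; this example assumes AIC values and the usual Akaike-weight formula, and the AIC numbers are hypothetical.

```python
import math


def akaike_weights(aic_values):
    """Scale models by relative likelihood exp(-delta/2), normalised to sum to 1.

    Assumption: each entry is the AIC of one model in the a priori set.
    """
    best = min(aic_values)
    deltas = [a - best for a in aic_values]          # differences from the best model
    rel_lik = [math.exp(-0.5 * d) for d in deltas]   # relative likelihood of each model
    total = sum(rel_lik)
    return [r / total for r in rel_lik]


# Hypothetical AIC values for three candidate models.
weights = akaike_weights([100.0, 102.0, 110.0])
print([round(w, 3) for w in weights])
```

The weights order the models from best to worst, and the ratio of two weights gives the strength of evidence for one model relative to the other within the set.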
TL;DR: In this article, it is argued that a sound and natural approach to such tests must rely primarily on the out-of-sample forecasting performance of models relating the original (non-prewhitened) series of interest.
Abstract: This paper is concerned with testing for causation, using the Granger definition, in a bivariate time-series context. It is argued that a sound and natural approach to such tests must rely primarily on the out-of-sample forecasting performance of models relating the original (non-prewhitened) series of interest. A specific technique of this sort is presented and employed to investigate the relation between aggregate advertising and aggregate consumption spending. The null hypothesis that advertising does not cause consumption cannot be rejected, but some evidence suggesting that consumption may cause advertising is presented.
TL;DR: An integrated, alternative inferential confidence interval approach to testing for statistical difference, equivalence, and indeterminacy that is algebraically equivalent to standard NHST procedures and therefore exacts the same evidential standard.
Abstract: Null hypothesis statistical testing (NHST) has been debated extensively but always successfully defended. The technical merits of NHST are not disputed in this article. The widespread misuse of NHST has created a human factors problem that this article intends to ameliorate. This article describes an integrated, alternative inferential confidence interval approach to testing for statistical difference, equivalence, and indeterminacy that is algebraically equivalent to standard NHST procedures and therefore exacts the same evidential standard. The combined numeric and graphic tests of statistical difference, equivalence, and indeterminacy are designed to avoid common interpretive problems associated with NHST procedures. Multiple comparisons, power, sample size, test reliability, effect size, and cause-effect ratio are discussed. A section on the proper interpretation of confidence intervals is followed by a decision rule summary and caveats.
TL;DR: In the newly emerging discipline of macroecology, null models can be used to identify constraining boundaries in bivariate scatterplots of variables such as body size, range size, and population density.
Abstract: Null models are pattern-generating models that deliberately exclude a mechanism of interest, and allow for randomization tests of ecological and biogeographic data. Although they have had a controversial history, null models are widely used as statistical tools by ecologists and biogeographers. Three active research fronts in null model analysis include biodiversity measures, species co-occurrence patterns, and macroecology. In the analysis of biodiversity, ecologists have used random sampling procedures such as rarefaction to adjust for differences in abundance and sampling effort. In the analysis of species co-occurrence and assembly rules, null models have been used to detect the signature of species interactions. However, controversy persists over the details of computer algorithms used for randomizing presence-absence matrices. Finally, in the newly emerging discipline of macroecology, null models can be used to identify constraining boundaries in bivariate scatterplots of variables such as body size, range size, and population density. Null models provide specificity and flexibility in data analysis that is often not possible with conventional statistical tests.
TL;DR: This article used data on over 150 000 Chilean eighth-graders to compare Spanish and mathematics achievement in six types of public and private schools, including voucher schools operated by Catholic and non-religious institutions.
Abstract: In 1980, Chile began financing public and most private schools with vouchers. This paper uses 1997 data on over 150 000 Chilean eighth-graders to compare Spanish and mathematics achievement in six types of public and private schools, including voucher schools operated by Catholic and non-religious institutions. Initial findings suggest that Catholic voucher schools have a small advantage over most public schools, once student and peer attributes are controlled for. There is no important difference in achievement between public and non-religious voucher schools, most of which were created in direct response to the 1980 reforms. In some cases, it appears that non-religious voucher schools produce slightly lower achievement than public schools. Accounting for selection bias reduces any private school advantages (or widens their disadvantages), although these estimates are not sufficiently precise to convincingly reject the null hypothesis of no selection bias.
TL;DR: In this paper, the authors investigated the null hypothesis that the dependence between financial assets can be modeled by the Gaussian copula and found that most pairs of currencies and pairs of major stocks are compatible with this hypothesis.
Abstract: Using one of the key properties of copulas, that they remain invariant under an arbitrary monotonic change of variable, we investigate the null hypothesis that the dependence between financial assets can be modeled by the Gaussian copula. We find that most pairs of currencies and pairs of major stocks are compatible with the Gaussian copula hypothesis, while this hypothesis can be rejected for the dependence between pairs of commodities (metals). Notwithstanding the apparent qualification of the Gaussian copula hypothesis for most of the currencies and the stocks, a non-Gaussian copula, such as the Student's copula, cannot be rejected if it has sufficiently many "degrees of freedom". As a consequence, it may be very dangerous to embrace blindly the Gaussian copula hypothesis, especially when the correlation coefficient between the pair of assets is too high, as the tail dependence neglected by the Gaussian copula can be as large as 0.6, i.e., three out of five extreme events which occur in unison are missed.
TL;DR: This work provides an intuitive overview of how to apply the bootstrap and multiple imputations techniques, referring to existing theoretical literature and various applied examples to illustrate both their possibilities and their pitfalls.
Abstract: When an applied econometrician calculates regression coefficients or other statistics based on a data sample, there is a moment of truth when the statistical precision or reliability of the estimates is evaluated using critical values of relevant probability distributions. If statistical precision is underestimated, producing confidence intervals that are too wide, the researcher may falsely conclude that no useful or reliable evidence has been provided. Since statistically imprecise results often are regarded as noninformative and may never see the light of day, applied econometricians have a strong incentive to obtain tight confidence bands. But overestimation of statistical precision also is a concern, as it potentially causes rejection of a null hypothesis when in fact no statistically reliable evidence has been presented. Such results are misleading and provide poor guidance for future research and perhaps public policy as well. This article describes two techniques that have been developed by statisticians in the last 20 years to enhance the accuracy of estimated confidence bands and critical values: the bootstrap and multiple imputations. These are computationally intensive methods that rely on repeated sampling from empirical data sets and associated estimates. As such, their development and dissemination have been supported by substantial increases in computing power over the same period. These techniques often can provide accurate estimates of statistical precision when standard analytical estimates are biased or, in some cases, unavailable. Both already
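The idea of repeated sampling from an empirical data set can be sketched with a percentile bootstrap. This is a minimal illustration, not the article's procedure; the function name, the number of replications, and the data are assumptions.

```python
import random
import statistics


def bootstrap_percentile_ci(data, stat=statistics.mean, n_boot=2000,
                            level=0.95, seed=0):
    """Percentile bootstrap confidence interval for stat(data).

    Resamples the data with replacement n_boot times and reads the
    interval endpoints off the sorted replicate statistics.
    """
    rng = random.Random(seed)
    n = len(data)
    reps = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    tail = (1 - level) / 2
    lo = reps[int(tail * n_boot)]
    hi = reps[min(n_boot - 1, int((1 - tail) * n_boot))]
    return lo, hi


# Hypothetical sample.
data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7]
lo, hi = bootstrap_percentile_ci(data)
print(lo, hi)
```

The percentile interval is the simplest variant; other bootstrap intervals exist and may behave better in small samples.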
TL;DR: Researchers generalize from a sample to a population, but the process by which the data are selected introduces uncertainty: had the dataset been different, the statistical summaries and the conclusions would also have been different, at least by a little.
Abstract: Researchers who study punishment and social control, like those who study other social phenomena, typically seek to generalize their findings from the data they have to some larger context: in statistical jargon, they generalize from a sample to a population. Generalizations are one important product of empirical inquiry. Of course, the process by which the data are selected introduces uncertainty. Indeed, any given dataset is but one of many that could have been studied. If the dataset had been different, the statistical summaries would have been different, and so would the conclusions, at least by a little.
TL;DR: This work proposes a method which involves recalculating the target sample size by computing the number of further observations required to maintain the probability of rejecting the null hypothesis at the end of the study under the prespecified absolute difference in mean response conditional on the data observed so far.
Abstract: The sample size required to achieve a given power at a prespecified absolute difference in mean response may depend on one or more nuisance parameters, which are usually unknown. Proposed methods for using an internal pilot to recalculate the sample size using estimates of these parameters have been well studied. Most of these methods ignore the fact that data on the parameter of interest from within this internal pilot will contribute towards the value of the final test statistic. We propose a method which involves recalculating the target sample size by computing the number of further observations required to maintain the probability of rejecting the null hypothesis at the end of the study under the prespecified absolute difference in mean response conditional on the data observed so far. We do this within the framework of a two-group error-spending sequential test, modified so as to prevent inflation of the type I error rate.
TL;DR: In this paper, the authors assess the published literature on various strategies such as (1) meta-analysis to combine disparate information from several studies including Bayesian techniques as in the confidence profile method and (2) other alternatives such as assessing therapeutic results in a single treated population (e.g., astronauts) by sequentially measuring whether the intervention is falling above or below a preestablished probability outcome range and meeting predesigned specifications as opposed to incremental improvement.
Abstract: Clinical trials are used to elucidate the most appropriate preventive, diagnostic, or treatment options for individuals with a given medical condition. Perhaps the most essential feature of a clinical trial is that it aims to use results based on a limited sample of research participants to see if the intervention is safe and effective or if it is comparable to a comparison treatment. Sample size is a crucial component of any clinical trial. A trial with a small number of research participants is more prone to variability and carries a considerable risk of failing to demonstrate the effectiveness of a given intervention when one really is present. This may occur in phase I (safety and pharmacologic profiles), II (pilot efficacy evaluation), and III (extensive assessment of safety and efficacy) trials. Although phase I and II studies may have smaller sample sizes, they usually have adequate statistical power, which is the committee's definition of a "large" trial. Sometimes a trial with eight participants may have adequate statistical power, statistical power being the probability of rejecting the null hypothesis when the hypothesis is false. Small Clinical Trials assesses the current methodologies and the appropriate situations for the conduct of clinical trials with small sample sizes. This report assesses the published literature on various strategies such as (1) meta-analysis to combine disparate information from several studies including Bayesian techniques as in the confidence profile method and (2) other alternatives such as assessing therapeutic results in a single treated population (e.g., astronauts) by sequentially measuring whether the intervention is falling above or below a preestablished probability outcome range and meeting predesigned specifications as opposed to incremental improvement.
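The abstract defines statistical power as the probability of rejecting the null hypothesis when it is false. A minimal sketch of that definition for a two-sided one-sample z-test follows; the known standard deviation, the standardized effect sizes, and the function names are illustrative assumptions, not the report's method.

```python
import math


def normal_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def z_test_power(effect_size, n, z_crit=1.959963984540054):
    """Power of a two-sided z-test at the 5% level.

    effect_size is the true mean shift in standard-deviation units
    (assumed known variance); n is the number of participants.
    """
    shift = effect_size * math.sqrt(n)
    return normal_cdf(shift - z_crit) + normal_cdf(-shift - z_crit)


# Hypothetical effect sizes for a trial with eight participants.
power_large_effect = z_test_power(1.5, 8)
power_small_effect = z_test_power(0.3, 8)
print(round(power_large_effect, 3), round(power_small_effect, 3))
```

With eight participants the power is high only when the assumed effect is large relative to the variability, which is consistent with the abstract's remark that a very small trial can sometimes be adequately powered.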
TL;DR: This report presents the development of a rigorous statistical procedure for use with a previously reported graphical method, the P plot, for estimation of the number of "true" null hypotheses in the set, which can then be used to sharpen existing multiple comparison procedures.
TL;DR: In this article, the authors argue that psychology, overtly or covertly, reached wrong conclusions with regard to commonsense beliefs and took a long time to correct its course because researchers erred in overgeneralizing from null findings.
Abstract: When psychologists test a commonsense (CS) hypothesis and obtain no support, they tend to erroneously conclude that the CS belief is wrong. In many such cases it appears, after many years, that the CS hypothesis was valid after all. It is argued that this error of accepting the "theoretical" null hypothesis reflects confusion between the operationalized hypothesis and the theory or generalization that it is designed to test. That is, on the basis of reliable null data one can accept the operationalized null hypothesis (e.g., "A measure of attitude x is not correlated with a measure of behavior y"). In contrast, one cannot generalize from the findings and accept the abstract or theoretical null (e.g., "We know that attitudes do not predict behavior"). The practice of accepting the theoretical null hypothesis hampers research and reduces the trust of the public in psychological research. Many psychologists begin their scientific research by testing a commonsense (CS) belief. Often, though, the CS belief is inconsistent with the research data. After repeated failures to support that CS belief, psychologists appear to generalize the lack of support for their operationalized studies to the abstract or theoretical level. That is, null-hypothesis findings often lead psychologists, overtly or covertly, to suggest that psychology "knows" that the CS belief is wrong. Such data-driven "knowledge" that counters CS beliefs persists in the psychological literature, sometimes for more than 50 years. Eventually, however, many CS beliefs that were once proclaimed by psychology to be invalid are declared valid after all, albeit with some modification of the CS belief. This observation of the rise, fall, and resurrection of several CS beliefs in psychology sparked the motivation to write this article.
We argue that psychology, overtly or covertly, reached wrong conclusions with regard to CS beliefs and took a long time to correct its course because researchers erred in overgeneralizing from null findings. To justify our argument, we first analyze the following constructs: a theory, a generalization, and an operationalized hypothesis. Next, we argue that although CS beliefs are not scientific theories, they are likely to contain a valid kernel, which.
TL;DR: The collapse of null hypothesis significance testing as a statistical paradigm has created liabilities and opportunities in wildlife science, but the principal intellectual instrument of the scientist remains the research hypothesis, not the statistical hypothesis.
Abstract: The collapse of null hypothesis significance testing as a statistical paradigm has created liabilities and opportunities in wildlife science. One liability is that some formalism for statistical hypothesis testing, such as likelihood with information theory, will simply replace null hypothesis significance testing as a rote approach to wildlife science. The principal intellectual instrument of the scientist remains the research hypothesis, not the statistical hypothesis. Accordingly, 1 opportunity arising from a change of statistical paradigms is that the research hypothesis will move to the foreground of wildlife science, the statistical hypothesis to the background. A second opportunity is a broadening of the suite of inferential methods considered orthodox in wildlife science. Realization of these opportunities should help wildlife scientists resist the social tendency to allow tools (means, statistical models) to supplant the search for reliable knowledge (end) as the benchmark of scientific endeavor. Science of the highest order, including virtually all discoveries that humankind extols today, is possible without the statistical hypothesis, but not without the research hypothesis.
TL;DR: This paper examined the asymptotic and small sample properties of model-based and robust tests of the null hypothesis of no randomized treatment effect based on the partial likelihood arising from an arbitrarily misspecified Cox proportional hazards model.
Abstract: We examine the asymptotic and small sample properties of model-based and robust tests of the null hypothesis of no randomized treatment effect based on the partial likelihood arising from an arbitrarily misspecified Cox proportional hazards model. When the distribution of the censoring variable is either conditionally independent of the treatment group given covariates or conditionally independent of covariates given the treatment group, the numerators of the partial likelihood treatment score and Wald tests have asymptotic mean equal to 0 under the null hypothesis, regardless of whether or how the Cox model is misspecified. We show that the model-based variance estimators used in the calculation of the model-based tests are not, in general, consistent under model misspecification, yet using analytic considerations and simulations we show that their true sizes can be as close to the nominal value as tests calculated with robust variance estimators. As a special case, we show that the model-based log-rank test is asymptotically valid. When the Cox model is misspecified and the distribution of censoring depends on both treatment group and covariates, the asymptotic distributions of the resulting partial likelihood treatment score statistic and maximum partial likelihood estimator do not, in general, have a zero mean under the null hypothesis. Here neither the fully model-based tests, including the log-rank test, nor the robust tests will be asymptotically valid, and we show through simulations that the distortion to test size can be substantial.
TL;DR: This work presents what it believes is a successful approach to the testing of Bayesian point null hypotheses on variance component models, based on a simple reparameterization of the model in terms of the total variance and the proportion of the additive genetic variance with respect to it.
Abstract: The testing of Bayesian point null hypotheses on variance component models has proved a difficult task for which no clear and generally accepted method exists. In this work we present what we believe is a successful approach to such a task. It is based on a simple reparameterization of the model in terms of the total variance and the proportion of the additive genetic variance with respect to it, as well as on the explicit inclusion in the prior probability of a discrete component at the origin. The reparameterization was used to bypass an arbitrariness related to the impropriety of uninformative priors on unbounded variables, while the discrete component was necessary to overcome the zero probability assigned to sets of null measure by the usual continuous variable models. The method was tested against computer simulations with appealing results.
TL;DR: In this paper, the authors employed a panel unit root test to test whether the real exchange rates in the panel are mean reverting or not, and found that the null hypothesis of a unit root is rejected for three real exchange rate indices, namely the import-based and trade-weighted multilateral indices and the bilateral indices, while for the export-based indices the null hypothesis is not rejected.
Abstract: The paper tests whether the theory of Purchasing Power Parity holds in a selected sample of twenty African countries. The paper employs a panel unit root test to test whether the real exchange rates in the panel are mean reverting or not. The test employed is the Im et al. (1997) test. Results show that the null of a unit root is rejected for the three real exchange rate indices, namely, the import-based and trade-weighted multilateral indices, and the bilateral indices, while for the export-based indices, the null hypothesis is not rejected. That is, Purchasing Power Parity is confirmed for the import-based and trade-weighted multilateral indices, and the bilateral indices, while it is rejected for the export-based multilateral indices. After performing the demeaning adjustment to account for cross-sectional dependence, our results show that the null hypothesis of a unit root is rejected for the import-based multilateral indices and the bilateral indices, while the null is not rejected for the trade-weighted multilateral indices. Purchasing Power Parity is therefore only confirmed for the import-based multilateral indices and bilateral indices, while it is rejected for the trade-weighted multilateral indices.
TL;DR: The large literature on family-based tests of association and/or linkage is reviewed, concentrating on the underlying principles and on recent methodological developments.
Abstract: The large literature on family-based tests of association and/or linkage is reviewed, concentrating on the underlying principles and on recent methodological developments. We explain the distinction between testing for association and testing for linkage, and give our views on the circumstances in which each is the appropriate null hypothesis.
TL;DR: Uni- and multivariate methods for distinctness and conformity are treated, separately for genetically homogeneous and heterogeneous varieties, as are the relations between morphological, marker and pedigree information.
Abstract: An overview is given of old and new statistical procedures for dealing with marker information in the context of distinctness testing and assessing genetic conformity for essential derivation purposes. Conceptual issues are discussed in relation to statistical methods. It is believed that the most important statistical and conceptual difference between distinctness and conformity testing resides in the wording of null and alternative hypotheses. For distinctness testing, the null hypothesis states no difference between varieties, while the alternative implies the existence of a difference. For conformity testing, the null and alternative hypotheses are non-equivalence and equivalence, respectively. The reversal of null and alternative hypotheses has rather limited statistical consequences when test statistics are distance measures. Characteristically, morphological characters form the preferred traits for assessing distinctness, while molecular markers are chosen for assessing conformity. From a statistical point of view this difference is rather immaterial. Distinctness and conformity are throughout presented as two closely related concepts, whose assessment takes place by highly comparable statistical procedures. Specific topics that are addressed in the paper are first the present positions of UPOV and ASSINSEL. Subsequently, uni- and multivariate methods for distinctness and conformity are treated, separately for genetically homogeneous and heterogeneous varieties. Lastly, the choice of markers is discussed, as are the relations between morphological, marker and pedigree information.
TL;DR: In this paper, a test of the null hypothesis that an observed time series is a realization of a strictly stationary random process is proposed, based on the result that the kth value of the discrete Fourier transform of a sample frame has a zero mean.
Abstract: We develop a test of the null hypothesis that an observed time series is a realization of a strictly stationary random process. Our test is based on the result that the kth value of the discrete Fourier transform of a sample frame has a zero mean under the null hypothesis. The test that we develop will have considerable power against an important form of nonstationarity hitherto not considered in the mainstream econometric time-series literature, that is, where the mean of a time series is periodic with random variation in its periodic structure. The size and power properties of the test are investigated and its applicability to real-world problems is demonstrated by application to three economic data sets.
TL;DR: In this paper, a sequential testing and estimation method for the number of change points in structural-change models is presented, where the null hypothesis that there is no structural change against the alternative of one change is tested.
Abstract: This paper derives a sequential testing and estimation method for the number of change points in structural-change models. In the first step, the parameters are estimated by a one-change model. The null hypothesis that there is no structural change against the alternative of one change is tested. If the null is rejected, then the whole sample is split into two subsamples by using the estimated change point in the previous step as a cutoff point. The same procedure is repeated until the null in each subsample is accepted. We argue that this method can consistently estimate the number, locations and magnitudes of changes. Situations in which the sample splitting method fails are also discussed.
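The sample-splitting procedure described above can be sketched for the simplest case, a shift in the mean. This is a minimal illustration, not the paper's estimator. The sup-type statistic, the `threshold` of 20.0 (an arbitrary illustrative cutoff, not a tabulated critical value), the `min_size` trimming, and the function names are all assumptions introduced here.

```python
import numpy as np

def best_single_split(y, min_size):
    """Fit a one-change mean-shift model; return (statistic, split index)."""
    n = len(y)
    ssr_null = np.sum((y - y.mean()) ** 2)
    best_stat, best_k = 0.0, None
    for k in range(min_size, n - min_size + 1):
        left, right = y[:k], y[k:]
        ssr_alt = (np.sum((left - left.mean()) ** 2)
                   + np.sum((right - right.mean()) ** 2))
        # Residual variance under the one-change model (floored to avoid 0/0).
        sigma2 = max(ssr_alt / (n - 2), 1e-12)
        stat = (ssr_null - ssr_alt) / sigma2
        if stat > best_stat:
            best_stat, best_k = stat, k
    return best_stat, best_k

def sequential_breaks(y, threshold=20.0, min_size=10, offset=0):
    """Test no change vs one change; if rejected, split and recurse."""
    y = np.asarray(y, dtype=float)
    if len(y) < 2 * min_size:
        return []
    stat, k = best_single_split(y, min_size)
    if k is None or stat <= threshold:
        return []  # null of no change is not rejected in this subsample
    left = sequential_breaks(y[:k], threshold, min_size, offset)
    right = sequential_breaks(y[k:], threshold, min_size, offset + k)
    return left + [offset + k] + right

# Hypothetical data: mean 0, then 5, then 0, with changes at 100 and 200.
rng = np.random.default_rng(0)
signal = np.concatenate([np.zeros(100), np.full(100, 5.0), np.zeros(100)])
y = signal + rng.normal(0.0, 0.5, size=signal.size)
breaks = sequential_breaks(y)
print(breaks)
```

The recursion stops in a subsample as soon as the statistic falls below the cutoff, which mirrors the "repeat until the null in each subsample is accepted" step. In practice the cutoff would come from the limiting distribution of the chosen test rather than a fixed number.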
TL;DR: In this paper, the authors suggest an alternative explanation for the mismatch and warn against the use of panel methods for testing for unit roots in macroeconomic time series, because existing panel methods assume that cross-unit cointegrating or long-run relationships are not present.
Abstract: A common finding in the empirical literature on the validity of purchasing power parity (PPP) is that it holds when tested for in panel data, but not in univariate (i.e. country specific) analysis. The usual explanation for this mismatch is that panel tests for unit roots and cointegration are more powerful than their univariate counterparts. In this paper we suggest an alternative explanation for the mismatch. More generally, we warn against the use of panel methods for testing for unit roots in macroeconomic time series. Existing panel methods assume that cross-unit cointegrating or long-run relationships, that tie the units of the panel together, are not present. However, using empirical examples on PPP for a panel of OECD countries, we show that this assumption is very likely to be violated. Simulations of the properties of panel unit root tests in the presence of long-run cross-unit relationships are then presented to demonstrate the serious cost of assuming away such relationships. The empirical size of the tests is substantially higher than the nominal level, so that the null hypothesis of a unit root is rejected very often, even if correct.
TL;DR: The authors present a review of the latest contributions of methodologists of different opinions, for and against, and also set out the guidelines for research within behavioral science recently issued by the A.P.A. Task Force on Statistical Inference (Wilkinson, 1999).
Abstract: The judgment against null hypothesis. Many witnesses and a virtuous sentence. Null hypothesis significance testing has been a source of debate within the scientific community of behavioral researchers for years, since inadequate interpretations have resulted in incorrect use of this procedure. In this paper, we present a review of the latest contributions of methodologists of different opinions, for and against, and we also set out the guidelines for research within behavioral science recently issued by the A.P.A. (American Psychological Association) Task Force on Statistical Inference (Wilkinson, 1999).
TL;DR: A generalized nonparametric test procedure for comparing k types of failure in a competing risks model is proposed in this article, which is based on the test process defined as a set of weighted integrals of the difference between Aalen-estimated cumulative cause-specific hazards and the average of these estimators.
Abstract: In this paper, a generalized nonparametric test procedure for comparing k types of failure in a competing risks model is proposed. The test procedure is based on the test process defined as a set of weighted integrals of the difference between Aalen-estimated cumulative cause-specific hazards and the average of these estimators. Estimation of the distribution function of the testing process under the null hypothesis is usually the main barrier in developing a test procedure. We introduce a vector of symmetrically inputed processes conditional on the competing risks data and show that the two processes have the same asymptotic distribution under the null hypothesis. This result gives us flexibility in selecting types of statistics and weight functions for various alternatives. Several types of tests including Chi-square type and supremum type for testing the null hypothesis H0 versus various alternative hypotheses are discussed in this paper. With an appropriate selection of weight function, we also obtai...
TL;DR: This post hoc analysis of ECASS II data was designed to make the least number of a priori assumptions by a bootstrap-based hypothesis test on a non-parametric test statistic and rejected the null hypothesis.
Abstract: The results of the Second European-Australasian Acute Stroke Study (ECASS II) were negative with respect to the primary endpoint. This post hoc analysis of ECASS II data was designed to make the least number of a priori assumptions. This is accomplished by a bootstrap-based hypothesis test on a non-parametric test statistic. No assumptions are made on shape or variance of population distributions and the method does not suffer from the disadvantages of dichotomization. By reducing the number of a priori assumptions, the possibilities to modify the test result by adjusting the test procedure are minimized. Results: If rt-PA does not improve the outcome (null hypothesis), the probability of observing a difference in the modified Rankin Scale equal to or larger than the one observed in ECASS II is 0.047. We therefore rejected the null hypothesis.
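A bootstrap test on a rank-type statistic, of the general kind this abstract describes, might look like the sketch below. It is not the analysis used in the study. The pairwise sign statistic, the pooled resampling scheme, the one-sided p-value, the function names, and the data are all assumptions for illustration. The scores are synthetic ordinal values on a 0 to 6 scale (lower is better) and have no connection to ECASS II.

```python
import numpy as np

def pairwise_stat(treated, control):
    """Mean sign of (control - treated) over all pairs; > 0 favors treatment."""
    diff = control[:, None] - treated[None, :]
    return np.sign(diff).mean()

def bootstrap_pvalue(treated, control, n_boot=2000, seed=0):
    """One-sided bootstrap p-value under the null of a common distribution."""
    rng = np.random.default_rng(seed)
    treated = np.asarray(treated)
    control = np.asarray(control)
    observed = pairwise_stat(treated, control)
    # Under the null both groups come from one population, so resample
    # both groups with replacement from the pooled scores.
    pooled = np.concatenate([treated, control])
    exceed = 0
    for _ in range(n_boot):
        t_star = rng.choice(pooled, size=len(treated), replace=True)
        c_star = rng.choice(pooled, size=len(control), replace=True)
        if pairwise_stat(t_star, c_star) >= observed:
            exceed += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (exceed + 1) / (n_boot + 1)

# Synthetic ordinal outcome scores (hypothetical, for illustration only).
rng = np.random.default_rng(1)
treated = rng.integers(0, 5, size=60)   # scores 0..4
control = rng.integers(2, 7, size=60)   # scores 2..6
observed, p_value = bootstrap_pvalue(treated, control)
print(f"statistic = {observed:.3f}, p = {p_value:.4f}")
```

Because the statistic uses only the ordering of scores, no assumption is made about the shape or variance of the score distributions, and the ordinal scale is never dichotomized.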
TL;DR: In this article, the authors developed improved statistical tests for situations satisfying the following two nonstandard conditions simultaneously: (a) some nuisance parameters become unidentified under the null hypothesis, and (b) the alternative hypothesis is restricted in the sense that it has inequality constraints/multiparameter one-sided hypotheses.
Abstract: In this article we develop improved statistical tests for situations satisfying the following two nonstandard conditions simultaneously: (a) Some nuisance parameters become unidentified under the null hypothesis, and (b) the alternative hypothesis is restricted in the sense that it has inequality constraints/multiparameter one-sided hypotheses. In the statistical and econometric literature, inference problems under these two nonstandard conditions have been studied separately but not simultaneously. For example, procedures to deal with the nonstandard condition (a) only have been studied by Bera and Ra and by Andrews and Ploberger; surveys of test procedures to deal with (b) only may be found in the work of Robertson, Wright, and Dykstra. A main contribution of this article is that, by pooling the ideas and insights from both these areas of literature, we develop new tests to deal with (a) and (b) simultaneously. Based on the approach that we take, we would conjecture that our tests should perform better ...