About: Data dredging is a research topic. Over its lifetime, 137 publications have appeared on this topic, receiving 8,802 citations. The topic is also known as data fishing or data snooping.
TL;DR: The examples considered in this paper show the tension between the scientific rationale for using meta-regression and the difficult interpretative problems to which such analyses are prone.
Abstract: Appropriate methods for meta-regression applied to a set of clinical trials, and the limitations and pitfalls in interpretation, are insufficiently recognized. Here we summarize recent research focusing on these issues, and consider three published examples of meta-regression in the light of this work. One principal methodological issue is that meta-regression should be weighted to take account of both within-trial variances of treatment effects and the residual between-trial heterogeneity (that is, heterogeneity not explained by the covariates in the regression). This corresponds to random effects meta-regression. The associations derived from meta-regressions are observational, and have a weaker interpretation than the causal relationships derived from randomized comparisons. This applies particularly when averages of patient characteristics in each trial are used as covariates in the regression. Data dredging is the main pitfall in reaching reliable conclusions from meta-regression. It can only be avoided by prespecification of covariates that will be investigated as potential sources of heterogeneity. However, in practice this is not always easy to achieve. The examples considered in this paper show the tension between the scientific rationale for using meta-regression and the difficult interpretative problems to which such analyses are prone. Copyright © 2002 John Wiley & Sons, Ltd.
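The weighting described in this abstract can be sketched in a few lines. The following is a minimal illustration, not the paper's own procedure: it fits a straight-line meta-regression by weighted least squares, giving each trial the weight 1 / (within-trial variance + tau2). The function name, its arguments, and the choice to treat the residual between-trial variance tau2 as a known input are assumptions made for illustration; in practice tau2 has to be estimated from the data.

```python
import numpy as np

def random_effects_meta_regression(effects, within_var, covariate, tau2):
    """Weighted least squares fit of effect_i = b0 + b1 * covariate_i.

    Hypothetical helper for illustration. Each trial is weighted by
    1 / (within-trial variance + tau2), where tau2 is the residual
    between-trial variance (assumed known here; normally estimated).
    """
    y = np.asarray(effects, dtype=float)
    v = np.asarray(within_var, dtype=float)
    x = np.asarray(covariate, dtype=float)

    w = 1.0 / (v + tau2)                        # random-effects weights
    X = np.column_stack([np.ones_like(x), x])   # intercept and slope columns

    XtWX = X.T @ (w[:, None] * X)
    XtWy = X.T @ (w * y)
    cov = np.linalg.inv(XtWX)                   # covariance of the coefficients
    coef = cov @ XtWy
    se = np.sqrt(np.diag(cov))
    return coef, se
```

Setting tau2 to 0 gives the fixed-effect weighting, which ignores residual heterogeneity. Any positive tau2 shrinks every weight and so widens the standard errors of the coefficients.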
TL;DR: It is found that null hypothesis testing is uninformative when no estimates of means or effect size and their precision are given, and an alternative paradigm of data analysis based on Kullback-Leibler information is described.
Abstract: This paper presents a review and critique of statistical null hypothesis testing in ecological studies in general, and wildlife studies in particular, and describes an alternative. Our review of Ecology and the Journal of Wildlife Management found the use of null hypothesis testing to be pervasive. The estimated number of P-values appearing within articles of Ecology exceeded 8,000 in 1991 and has exceeded 3,000 in each year since 1984, whereas the estimated number of P-values in the Journal of Wildlife Management exceeded 8,000 in 1997 and has exceeded 3,000 in each year since 1994. We estimated that 47% (SE = 3.9%) of the P-values in the Journal of Wildlife Management lacked estimates of means or effect sizes or even the sign of the difference in means or other parameters. We find that null hypothesis testing is uninformative when no estimates of means or effect size and their precision are given. Contrary to common dogma, tests of statistical null hypotheses have relatively little utility in science and are not a fundamental aspect of the scientific method. We recommend their use be reduced in favor of more informative approaches. Towards this objective, we describe a relatively new paradigm of data analysis based on Kullback-Leibler information. This paradigm is an extension of likelihood theory and, when used correctly, avoids many of the fundamental limitations and common misuses of null hypothesis testing. Information-theoretic methods focus on providing a strength of evidence for an a priori set of alternative hypotheses, rather than a statistical test of a null hypothesis. This paradigm allows the following types of evidence for the alternative hypotheses: the rank of each hypothesis, expressed as a model; an estimate of the formal likelihood of each model, given the data; a measure of precision that incorporates model selection uncertainty; and simple methods to allow the use of the set of alternative models in making formal inference. We provide an example of the information-theoretic approach using data on the effect of lead on survival in spectacled eider ducks (Somateria fischeri). Regardless of the analysis paradigm used, we strongly recommend inferences based on a priori considerations be clearly separated from those resulting from some form of data dredging.
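One common way to put the ranking and the "likelihood of each model, given the data" into practice is through Akaike's information criterion (AIC) and the Akaike weights derived from it. The abstract does not name AIC, so the sketch below assumes that choice. The function name and the AIC values in the usage lines are hypothetical.

```python
import math

def akaike_weights(aic_values):
    """Turn AIC values for an a priori model set into normalized weights.

    Hypothetical helper for illustration. exp(-delta_i / 2) is the relative
    likelihood of model i given the data, where delta_i is its AIC minus
    the smallest AIC in the set; weights are these values scaled to sum to 1.
    """
    best = min(aic_values)
    deltas = [a - best for a in aic_values]
    rel_lik = [math.exp(-0.5 * d) for d in deltas]
    total = sum(rel_lik)
    return [r / total for r in rel_lik]

# Made-up AIC values for three candidate models, best model first.
weights = akaike_weights([100.0, 102.0, 110.0])
ranking = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
```

The weights only compare the models in the set with one another. Adding models after looking at the data changes every weight, which is one reason the set should be fixed a priori.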
TL;DR: An information-theoretic paradigm for analysis of ecological data, based on Kullback–Leibler information, that is an extension of likelihood theory and avoids the pitfalls of null hypothesis testing is described.
Abstract: We describe an information-theoretic paradigm for analysis of ecological data, based on Kullback–Leibler information, that is an extension of likelihood theory and avoids the pitfalls of null hypothesis testing. Information-theoretic approaches emphasise a deliberate focus on the a priori science in developing a set of multiple working hypotheses or models. Simple methods then allow these hypotheses (models) to be ranked from best to worst and scaled to reflect a strength of evidence using the likelihood of each model (g_i), given the data and the models in the set (i.e. L(g_i | data)). In addition, a variance component due to model-selection uncertainty is included in estimates of precision. There are many cases where formal inference can be based on all the models in the a priori set and this multi-model inference represents a powerful, new approach to valid inference. Finally, we strongly recommend inferences based on a priori considerations be carefully separated from those resulting from some form of data dredging. An example is given for questions related to age- and sex-dependent rates of tag loss in elephant seals (Mirounga leonina).
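The abstract does not give the formula for the variance component due to model-selection uncertainty. One widely cited form, from Burnham and Anderson's work on multi-model inference, adds each model's squared deviation from the model-averaged estimate to its conditional variance. The sketch below assumes that form. The function name and inputs are hypothetical, and the model weights are taken as given.

```python
import math

def model_average(estimates, variances, weights):
    """Model-averaged estimate with an unconditional standard error.

    Hypothetical helper for illustration. estimates[i] and variances[i] are
    the estimate of the same parameter under model i and its conditional
    variance; weights[i] is the model's weight (weights sum to 1).
    """
    theta_bar = sum(w * t for w, t in zip(weights, estimates))
    # The (t - theta_bar) ** 2 term is the model-selection uncertainty.
    se = sum(
        w * math.sqrt(v + (t - theta_bar) ** 2)
        for w, t, v in zip(weights, estimates, variances)
    )
    return theta_bar, se
```

When all models agree on the estimate, the extra term vanishes and the result is the weighted average of the conditional standard errors. Disagreement between models widens it.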
TL;DR: This work addresses the testing of statistical hypotheses.
TL;DR: It is concluded that epidemiological studies with inclusion of all persons in a population followed for decades available relatively fast are important data sources for modern epidemiology, but it is important to acknowledge the data limitations.
Abstract: Studies based on databases, medical records and registers are used extensively today in epidemiological research. Despite the increasing use, no developed methodological literature on use and evaluation of population-based registers is available, even though data collection in register-based studies differs from researcher-collected data, all persons in a population are available and traditional statistical analyses focusing on sampling error as the main source of uncertainty may not be relevant. We present the main strengths and limitations of register-based studies, biases especially important in register-based studies and methods for evaluating completeness and validity of registers. The main strengths are that data already exist and valuable time has passed, complete study populations minimizing selection bias and independently collected data. Main limitations are that necessary information may be unavailable, data collection is not done by the researcher, confounder information is lacking, missing information on data quality, truncation at start of follow-up making it difficult to differentiate between prevalent and incident cases and the risk of data dredging. We conclude that epidemiological studies with inclusion of all persons in a population followed for decades available relatively fast are important data sources for modern epidemiology, but it is important to acknowledge the data limitations.