TL;DR: Using Akaike's information criterion, three examples of statistical data are reanalyzed and yield reasonably definite conclusions; one is concerned with the multiple comparison problem for the means in normal populations.
Abstract: Using Akaike's information criterion, three examples of statistical data are reanalyzed and yield reasonably definite conclusions. One is concerned with the multiple comparison problem for the means in normal populations. The second is concerned with the grouping of the categories in a contingency table. The third is concerned with the multiple comparison problem for the analysis of variance by the logit model in contingency tables. A finite correction of Akaike's information criterion is also proposed.
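The abstract does not state the finite correction itself. As a rough illustration only, the sketch below uses the small-sample form commonly cited for normal models, AICc = AIC + 2k(k+1)/(n - k - 1), and applies it to a toy multiple comparison of means in which candidate models differ in which group means are assumed equal. The function names, the data and the candidate groupings are all hypothetical and are not taken from the paper.

```python
import math

def gaussian_loglik(rss, n):
    # Maximized log-likelihood of a normal model with common variance,
    # using the ML variance estimate rss / n.
    return -0.5 * n * (math.log(2 * math.pi * rss / n) + 1)

def aicc(loglik, k, n):
    # AIC plus the commonly cited small-sample term (an assumption here,
    # not a formula quoted from the abstract).
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    return -2.0 * loglik + 2.0 * k + 2.0 * k * (k + 1) / (n - k - 1)

def score_partition(groups, partition):
    # partition: tuple of tuples of group indices that share one mean.
    n = sum(len(g) for g in groups)
    rss = 0.0
    for block in partition:
        pooled = [x for i in block for x in groups[i]]
        mean = sum(pooled) / len(pooled)
        rss += sum((x - mean) ** 2 for x in pooled)
    k = len(partition) + 1  # one mean per block plus the common variance
    return aicc(gaussian_loglik(rss, n), k, n)

# Hypothetical data: three groups, the first two with similar means.
groups = [[4.1, 3.9, 4.3], [4.0, 4.2, 3.8], [6.1, 5.9, 6.3]]
candidates = [((0, 1, 2),), ((0, 1), (2,)), ((0,), (1,), (2,))]
best = min(candidates, key=lambda p: score_partition(groups, p))
print(best)
```

Selecting the grouping with the smallest criterion value replaces a sequence of pairwise significance tests with a single model comparison, which is the general idea the abstract describes.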
TL;DR: It is shown that stepwise regression allows models containing significant predictors to be obtained from each year's data, but that the selected models, despite their significance, vary substantially between years and suggest patterns that are at odds with those determined by analysing the full, 4-year data set.
Abstract: 1. The biases and shortcomings of stepwise multiple regression are well established within the statistical literature. However, an examination of papers published in 2004 by three leading ecological and behavioural journals suggested that the use of this technique remains widespread: of 65 papers in which a multiple regression approach was used, 57% of studies used a stepwise procedure. 2. The principal drawbacks of stepwise multiple regression include bias in parameter estimation, inconsistencies among model selection algorithms, an inherent (but often overlooked) problem of multiple hypothesis testing, and an inappropriate focus or reliance on a single best model. We discuss each of these issues with examples. 3. We use a worked example of data on yellowhammer distribution collected over 4 years to highlight the pitfalls of stepwise regression. We show that stepwise regression allows models containing significant predictors to be obtained from each year's data. In spite of the significance of the selected models, they vary substantially between years and suggest patterns that are at odds with those determined by analysing the full, 4-year data set. 4. An information theoretic (IT) analysis of the yellowhammer data set illustrates why the varying outcomes of stepwise analyses arise. In particular, the IT approach identifies large numbers of competing models that could describe the data equally well, showing that no one model should be relied upon for inference.
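One way an information-theoretic analysis can reveal many competing models is through Akaike weights, which rescale AIC differences into relative support for each candidate. The sketch below is a minimal illustration under assumptions: the AIC values are invented, the function names are hypothetical, and the cutoff of 2 AIC units is only a common rule of thumb, not a threshold stated in the abstract.

```python
import math

def akaike_weights(aic_values):
    # w_i = exp(-delta_i / 2) / sum_j exp(-delta_j / 2),
    # where delta_i = AIC_i - min(AIC).
    best = min(aic_values)
    rel = [math.exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(rel)
    return [r / total for r in rel]

def competitive_models(aic_values, max_delta=2.0):
    # Indices of models within max_delta AIC units of the best model.
    # The default of 2.0 is a rule of thumb and an assumption here.
    best = min(aic_values)
    return [i for i, a in enumerate(aic_values) if a - best <= max_delta]

# Hypothetical AIC values for four candidate regression models.
aics = [100.0, 100.5, 101.2, 110.0]
for i, w in enumerate(akaike_weights(aics)):
    print(f"model {i}: weight = {w:.3f}")
print("competitive:", competitive_models(aics))
```

When several weights are of similar size, as for the first three hypothetical models here, no single model clearly dominates, which matches the abstract's point that inference should not rest on one selected model.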
TL;DR: It is seen that the optimal number of parameters suggested by both single-sample and two-sample cross-validation indices will depend on sample size.
Abstract: Many different methods have been proposed to construct nonparametric estimates of a smooth regression function, including local polynomial, (convolution) kernel and smoothing spline estimators. Each of these estimators uses a smoothing parameter to control the amount of smoothing performed on a given data set. In this paper an improved version of a criterion based on the Akaike information criterion (AIC), termed AICC, is derived and examined as a way to choose the smoothing parameter. Unlike plug‐in methods, AICC can be used to choose smoothing parameters for any linear smoother, including local quadratic and smoothing spline estimators. The use of AICC avoids the large variability and tendency to undersmooth (compared with the actual minimizer of average squared error) seen when other ‘classical’ approaches (such as generalized cross‐validation (GCV) or the AIC) are used to choose the smoothing parameter. Monte Carlo simulations demonstrate that the AICC‐based smoothing parameter is competitive with a plug‐in method (assuming that one exists) when the plug‐in method works well but also performs well when the plug‐in approach fails or is unavailable.
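The abstract does not give the criterion explicitly. The sketch below assumes the form commonly cited for linear smoothers, AICC = log(sigma^2) + 1 + 2(tr(H) + 1)/(n - tr(H) - 2), where H is the smoother ("hat") matrix and sigma^2 is the mean squared residual. A Nadaraya-Watson smoother with a Gaussian kernel stands in for the local polynomial and spline estimators discussed in the paper; the data, the candidate bandwidths and all function names are hypothetical.

```python
import math

def nw_hat_matrix(x, h):
    # Row i holds the Gaussian-kernel weights that map y to the fit at x[i].
    rows = []
    for xi in x:
        w = [math.exp(-0.5 * ((xi - xj) / h) ** 2) for xj in x]
        s = sum(w)
        rows.append([wj / s for wj in w])
    return rows

def aicc_smoother(y, hat):
    # Assumed form: log(sigma^2) + 1 + 2 (tr(H) + 1) / (n - tr(H) - 2).
    n = len(y)
    tr = sum(hat[i][i] for i in range(n))
    if n - tr - 2 <= 0:
        return math.inf  # too many effective parameters for this n
    fitted = [sum(hij * yj for hij, yj in zip(row, y)) for row in hat]
    sigma2 = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / n
    return math.log(sigma2) + 1 + 2 * (tr + 1) / (n - tr - 2)

def choose_bandwidth(x, y, candidates):
    # Pick the candidate bandwidth with the smallest criterion value.
    return min(candidates, key=lambda h: aicc_smoother(y, nw_hat_matrix(x, h)))

# Hypothetical data: a sine curve plus a deterministic perturbation.
x = [i / 29 for i in range(30)]
y = [math.sin(2 * math.pi * xi) + 0.1 * math.cos(37 * i) for i, xi in enumerate(x)]
bandwidths = [1e-6, 0.05, 0.1, 0.3]
print(choose_bandwidth(x, y, bandwidths))
```

Because the penalty depends on the data only through tr(H), the same function can in principle score any linear smoother once its hat matrix is available, which is the flexibility the abstract contrasts with plug-in methods.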