TL;DR: In this article, a measure based on confidence ellipsoids is developed for judging the contribution of each data point to the determination of the least squares estimate of the parameter vector in full rank linear regression models.
Abstract: A new measure based on confidence ellipsoids is developed for judging the contribution of each data point to the determination of the least squares estimate of the parameter vector in full rank linear regression models. It is shown that the measure combines information from the studentized residuals and the variances of the residuals and predicted values. Two examples are presented.
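The combination described in this abstract can be illustrated with the usual closed form of Cook's distance, D_i = (r_i² / p) · h_i / (1 − h_i). Here r_i is the internally studentized residual and h_i / (1 − h_i) is the ratio of the variance of the ith predicted value to the variance of the ith residual. The following is a minimal NumPy sketch, not the paper's own computation; the function name `cooks_distance` and its arguments are hypothetical.

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's distance for each case of a full-rank least squares fit.

    X is the n-by-p design matrix (include a column of ones for an intercept).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    # Leverages h_i: diagonal of the hat matrix, from a thin QR factorization.
    Q, _ = np.linalg.qr(X)
    h = np.sum(Q**2, axis=1)

    # Ordinary residuals and the residual mean square s^2.
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ beta
    s2 = e @ e / (n - p)

    # Internally studentized residuals: e_i over its estimated standard error.
    r = e / np.sqrt(s2 * (1.0 - h))

    # Var(fitted_i) / Var(residual_i) = h_i / (1 - h_i).
    return (r**2 / p) * (h / (1.0 - h))
```

The same values result from refitting the model with each case deleted and measuring how far the estimate moves in the metric of the confidence ellipsoid, which is the definition the closed form shortcuts.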
TL;DR: In this article, it is shown that in multiple regression the quantity τn[(n − p − 1)/(n − p − τn²)]^(1/2) has a t distribution with n − p − 1 degrees of freedom, where τn is the nth studentized residual.
Abstract: It is shown that in multiple regression the quantity τn[(n − p − 1)/(n − p − τn²)]^(1/2) has a t distribution with n − p − 1 degrees of freedom, where τn is the nth studentized residual. The effects of adding a new data point to a multiple regression problem on the residuals and on the sum of squared residuals are investigated.
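A quantity with exactly this distribution is the externally studentized residual, which can be obtained from the internally studentized residual τ without refitting: t = τ · sqrt((n − p − 1) / (n − p − τ²)). The sketch below computes both for every case. It assumes NumPy and a full-rank design matrix, and the function name `studentized_residuals` is hypothetical.

```python
import numpy as np

def studentized_residuals(X, y):
    """Return (internal, external) studentized residuals for an OLS fit."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    # Leverages from a thin QR factorization of the design matrix.
    Q, _ = np.linalg.qr(X)
    h = np.sum(Q**2, axis=1)

    # Ordinary residuals and the residual mean square.
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ beta
    s2 = e @ e / (n - p)

    # Internally studentized residual: scale estimated from all n cases.
    tau = e / np.sqrt(s2 * (1.0 - h))

    # Monotone transformation to the externally studentized residual,
    # which is t-distributed with n - p - 1 degrees of freedom under
    # the normal linear model.
    t = tau * np.sqrt((n - p - 1) / (n - p - tau**2))
    return tau, t
```

Because the transformation is monotone in τ, both versions rank the cases identically; only the external version can be referred directly to a t table.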
TL;DR: In this article, the authors compare two of the most popular influence measures for linear regression, Cook's (1977) Di and Belsley, Kuh and Welsch's (1980) DFFITS, using the likelihood displacement (Cook and Weisberg 1982) as a unifying concept.
Abstract: The young field of statistical diagnostics has produced an array of competing statistics for measuring the influence of individual cases. Two of the most popular measures for linear regression are Cook's (1977) Di and Belsley, Kuh and Welsch's (1980) DFFITS. Using the likelihood displacement (Cook and Weisberg 1982) as a unifying concept, these two measures are compared.
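To make the comparison concrete, the two measures can be computed side by side from the same ingredients: residuals e_i, leverages h_i, the full-sample variance estimate s², and the case-deleted estimate s(i)². This is a minimal NumPy sketch under the usual full-rank OLS assumptions. It does not implement the likelihood displacement, and the function name `cooks_d_and_dffits` is hypothetical.

```python
import numpy as np

def cooks_d_and_dffits(X, y):
    """Return (Cook's D, DFFITS) for each case of an OLS fit."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    # Leverages, residuals, and the residual mean square.
    Q, _ = np.linalg.qr(X)
    h = np.sum(Q**2, axis=1)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ beta
    s2 = e @ e / (n - p)

    # Variance estimate with case i deleted, obtained without refitting.
    s2_del = ((n - p) * s2 - e**2 / (1.0 - h)) / (n - p - 1)

    # Cook's D scales by the full-sample s^2 and by p.
    cooks_d = e**2 * h / (p * s2 * (1.0 - h) ** 2)

    # DFFITS scales by the case-deleted s_(i) and keeps the sign of e_i.
    dffits = e * np.sqrt(h) / (np.sqrt(s2_del) * (1.0 - h))
    return cooks_d, dffits
```

The two are tied by D_i = DFFITS_i² · s(i)² / (p · s²), so they differ only in the choice of scale estimate and in the factor p.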
TL;DR: Researchers working with large national health surveys such as the NLSCY and the National Population Health Survey are advised to include a detailed influence analysis before any final conclusions are made.
Abstract: This paper highlights the impact of survey weights on model fit in multiple linear regression with specific reference to the National Longitudinal Survey of Children and Youth (NLSCY) and provides recommendations for the treatment of influential observations. Multiple linear regression was used to estimate the association between child and family factors in the preschool years and vocabulary development at school age. Analyses were performed with and without survey weights. The model fit was assessed by examining the distribution of the studentized residuals and the change in the regression coefficients that would occur if an observation were removed. Two summary measures of influence, DFFITS and Cook's D, are reported. The models were refit excluding influential observations. Weighting of the linear model resulted in previously non-influential observations having an undue influence on the estimation of the regression parameters in the weighted model. The influential observations were driven primarily by the size of the survey weight as opposed to unusual values of x and y. Researchers working with large national health surveys such as the NLSCY and the National Population Health Survey (NPHS) are advised to include a detailed influence analysis before any final conclusions are made.
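One way to see how a weight alone can make a case influential is through its leverage in a weighted fit, h_i = w_i · x_i'(X'WX)⁻¹x_i, which increases with w_i even when x_i is unremarkable. The sketch below assumes plain weighted least squares with NumPy. It does not reproduce the paper's survey-based analysis, and the function name `weighted_leverages` is hypothetical.

```python
import numpy as np

def weighted_leverages(X, w):
    """Leverages of a weighted least squares fit with case weights w."""
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)

    # Weighted least squares is OLS on rows scaled by sqrt(w_i).
    Xw = X * np.sqrt(w)[:, None]

    # Diagonal of the hat matrix of the scaled design, via thin QR.
    Q, _ = np.linalg.qr(Xw)
    return np.sum(Q**2, axis=1)
```

Since the leverages always sum to p, raising the weight of one case increases its share and reduces the share left to the other cases, which is consistent with previously unremarkable observations becoming influential once weights are applied.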