TL;DR: In this article, the authors derived the distributions of the least-squares residuals under a variety of specification errors, including omitted variables, incorrect functional form, simultaneous equation problems and heteroskedasticity.
Abstract: The effects on the distribution of least-squares residuals of a series of model mis-specifications are considered. It is shown that for a variety of specification errors the distributions of the least-squares residuals are normal, but with non-zero means. An alternative predictor of the disturbance vector is used in developing four procedures for testing for the presence of specification error. The specification errors considered are omitted variables, incorrect functional form, simultaneous equation problems and heteroskedasticity. The objectives of this paper are two. The first is to derive the distributions of the classical linear least-squares residuals under a variety of specification errors. The errors considered are omitted variables, incorrect functional form, simultaneous equation problems and heteroskedasticity. It is assumed that the disturbance terms are independently and normally distributed. It will be shown that the effect of the specification errors considered above is, with the exception of the error of heteroskedasticity, to yield residuals which, though normally distributed, do not have zero means, so that the distribution of the squared residuals is non-central χ². The second objective is to derive procedures to test for the presence of the specification errors considered in the first part of the paper. The tests are developed by comparing the distribution of residuals under the hypothesis that the specification of the model is correct to the distribution of the residuals yielded under the alternative hypothesis that there is a specification error of one of the types considered in the first part of the paper. As a preliminary step to deriving the test procedures, the classical least-squares residual vector is transformed to a sub-vector which has more desirable properties for testing the null hypothesis that the specification of the model is correct.
Also, under certain assumptions with respect to the alternative hypothesis, it is shown that the mean vector of the residuals can be approximated by a linear sum of vectors q_j,
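The claim that mis-specification leaves the residuals normal but shifts their mean can be checked numerically. The sketch below is an illustration only, not the paper's test procedure. It assumes a hypothetical true model y = 1 + 2x + γz + u with z = x² and u ~ N(0, 1), fits the mis-specified regression of y on an intercept and x, and compares the simulated mean residual vector with the standard result E[e] = γMz, where M = I − X(X′X)⁻¹X′ is the residual-maker matrix of the fitted model. All names and numerical values are assumptions chosen for the example.

```python
import random


def ols_residuals(x, y):
    """Residuals from regressing y on an intercept and a single regressor x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return [yi - intercept - slope * xi for xi, yi in zip(x, y)]


random.seed(0)
n, reps = 20, 2000
gamma = 3.0  # hypothetical coefficient on the omitted regressor z

x = [-1 + 2 * i / (n - 1) for i in range(n)]  # fixed design on [-1, 1]
z = [xi ** 2 for xi in x]                     # omitted variable (wrong functional form)

# Theoretical mean of the residuals: gamma * M z, i.e. gamma times the
# residuals obtained by regressing z on the included regressors.
mean_theory = [gamma * r for r in ols_residuals(x, z)]

# Monte Carlo estimate of the mean residual vector under the mis-specified fit.
sum_resid = [0.0] * n
for _ in range(reps):
    y = [1 + 2 * xi + gamma * zi + random.gauss(0, 1) for xi, zi in zip(x, z)]
    e = ols_residuals(x, y)
    sum_resid = [s + ei for s, ei in zip(sum_resid, e)]
mean_sim = [s / reps for s in sum_resid]

# Largest discrepancy between simulated and theoretical residual means.
max_gap = max(abs(a - b) for a, b in zip(mean_sim, mean_theory))
print("largest |theoretical mean|:", max(abs(m) for m in mean_theory))
print("largest simulation gap:    ", max_gap)
```

With a correctly specified model (γ = 0) the theoretical mean vector is zero, so a systematic pattern in the residual means is the signal that specification tests of this kind look for.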
TL;DR: In this article, a Bayesian approach to estimating structural uncertainty about unknown quantities is presented, which can be applied to forecasting the price of oil and the chance of catastrophic failure of the US space shuttle.
Abstract: In most examples of inference and prediction, the expression of uncertainty about unknown quantities y on the basis of known quantities x is based on a model M that formalizes assumptions about how x and y are related. M will typically have two parts: structural assumptions S, such as the form of the link function and the choice of error distribution in a generalized linear model, and parameters θ whose meaning is specific to a given choice of S. It is common in statistical theory and practice to acknowledge parametric uncertainty about θ given a particular assumed structure S; it is less common to acknowledge structural uncertainty about S itself. A widely used approach involves enlisting the aid of x to specify a plausible single "best" choice S* for S, and then proceeding as if S* were known to be correct. In general this approach fails to assess and propagate structural uncertainty fully and may lead to miscalibrated uncertainty assessments about y given x. When miscalibration occurs it will often result in understatement of inferential or predictive uncertainty about y, leading to inaccurate scientific summaries and overconfident decisions that do not incorporate sufficient hedging against uncertainty. In this paper I discuss a Bayesian approach to solving this problem that has long been available in principle but is only now becoming routinely feasible, by virtue of recent computational advances, and examine its implementation in examples that involve forecasting the price of oil and estimating the chance of catastrophic failure of the US space shuttle.
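One common Bayesian way to propagate structural uncertainty is to average predictions over several candidate structures, weighting each by its posterior probability. Whether this matches the paper's exact formulation is an assumption. The sketch below uses hypothetical posterior weights, predictive means and predictive variances for two structures, and applies the law of total variance to show how conditioning on a single "best" structure can understate predictive uncertainty.

```python
def mixture_predictive(weights, means, variances):
    """Mean and variance of a predictive distribution averaged over structures.

    weights   -- posterior probabilities of each structure (normalized here)
    means     -- predictive mean of y under each structure
    variances -- predictive variance of y under each structure
    """
    total = sum(weights)
    w = [wi / total for wi in weights]
    mean = sum(wi * mi for wi, mi in zip(w, means))
    # Law of total variance: expected within-structure variance
    # plus the variance of the structure-specific means.
    within = sum(wi * vi for wi, vi in zip(w, variances))
    between = sum(wi * (mi - mean) ** 2 for wi, mi in zip(w, means))
    return mean, within + between, within, between


# Hypothetical values for two candidate structures S1 and S2.
weights = [0.6, 0.4]
means = [20.0, 30.0]
variances = [4.0, 9.0]

mean, var, within, between = mixture_predictive(weights, means, variances)
print("averaged predictive mean:    ", mean)
print("averaged predictive variance:", var)
print("  within-structure part: ", within)
print("  between-structure part:", between)
```

In this made-up case, the between-structure term dominates, so reporting only the variance under the single most probable structure would be markedly overconfident.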
TL;DR: In this article, the author discusses the consequences of violating the assumptions of the regression model, procedures for detecting when violations exist, and strategies for dealing with problems when they occur, with many examples from political science, sociology, and economics.
Abstract: Multiple regression analysis is one of the social sciences' most popular procedures. This monograph provides a systematic treatment of many of the major problems encountered in using regression analysis. Because it is likely that one or more of the regression model's assumptions will be violated in a specific empirical analysis, the ability to recognize problems and take appropriate action helps to ensure the proper use of the technique. The author clearly and concisely discusses the consequences of violating the assumptions of the regression model, procedures for detecting when violations exist, and strategies for dealing with problems when they occur. Problems of specification error, measurement error, multicollinearity, nonlinearity and nonadditivity, and heteroscedasticity and autocorrelation are discussed, with many examples from political science, sociology, and economics. Because many applications of regression in the social sciences involve analysis of cases randomly drawn from a larger population, some of the major examples are constructed to show more clearly the properties of regression estimates derived from samples. The concepts of bias and efficiency in statistical estimation receive particular attention.
TL;DR: This is the first published exposition of current econometric methods for the study of duration data, including both structural and reduced form models and models with and without neglected heterogeneity.
Abstract: This book presents statistical methods for analysis of the duration of events. The primary focus is on models for single-spell data, events in which individual agents are observed for a single duration. Some attention is also given to multiple-spell data. The first part of the book covers model specification, including both structural and reduced form models and models with and without neglected heterogeneity. The book next deals with likelihood-based inference about such models, with sections on full and semiparametric specification. A final section treats graphical and numerical methods of specification testing. This is the first published exposition of current econometric methods for the study of duration data.
TL;DR: In this paper, applied regression is presented as a way for social scientists who are not specialists in quantitative techniques to arrive at clear verbal explanations of their numerical results, with discussion of more specialized subjects: analysis of residuals, interaction effects, specification error, multicollinearity, standardized coefficients, and dummy variables.
Abstract: Applied regression allows social scientists who are not specialists in quantitative techniques to arrive at clear verbal explanations of their numerical results. The text also provides a lucid discussion of more specialized subjects: analysis of residuals, interaction effects, specification error, multicollinearity, standardized coefficients, and dummy variables.