TL;DR: The work covers continuous, binary, ordinal, nominal, limited, and count outcomes, along with testing and fit.
Abstract: Introduction; Continuous Outcomes; Binary Outcomes; Testing and Fit; Ordinal Outcomes; Nominal Outcomes; Limited Outcomes; Count Outcomes; Conclusions.
TL;DR: In this paper, the least squares estimation of a change point in multiple regressions is studied, and the analytical density function and the cumulative distribution function for the general skewed distribution are derived.
Abstract: This paper studies the least squares estimation of a change point in multiple regressions. Consistency, rate of convergence, and asymptotic distributions are obtained. The model allows for lagged dependent variables and trending regressors. The error process can be dependent and heteroskedastic. For nonstationary regressors or disturbances, the asymptotic distribution is shown to be skewed. The analytical density function and the cumulative distribution function for the general skewed distribution are derived. The analysis applies to both pure and partial changes. The method is used to analyze the response of market interest rates to discount rate changes.
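The least squares idea behind change point estimation can be illustrated with a brute-force search: for each candidate break date, fit separate regressions before and after it and keep the date with the smallest total sum of squared residuals. The sketch below is a minimal illustration under simplifying assumptions (a single break, all coefficients changing, independent homoskedastic errors); the function name `estimate_break`, the trimming fraction, and the simulated data are hypothetical and not taken from the paper, and none of the paper's asymptotic results are reproduced.

```python
import numpy as np

def estimate_break(y, X, trim=0.15):
    """Grid-search the break date that minimizes the total sum of squared residuals."""
    n = len(y)
    # Keep a minimum share of observations in each regime (assumed trimming rule).
    lo, hi = int(trim * n), int((1 - trim) * n)
    best_k, best_ssr = None, np.inf
    for k in range(lo, hi):
        ssr = 0.0
        # Fit one regression before the candidate break and one after it.
        for seg in (slice(0, k), slice(k, n)):
            beta, *_ = np.linalg.lstsq(X[seg], y[seg], rcond=None)
            resid = y[seg] - X[seg] @ beta
            ssr += float(resid @ resid)
        if ssr < best_ssr:
            best_k, best_ssr = k, ssr
    return best_k, best_ssr

# Simulated example: intercept and slope both change at observation 120.
rng = np.random.default_rng(0)
n, true_break = 200, 120
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
beta_before, beta_after = np.array([1.0, 0.5]), np.array([3.0, -1.0])
mean = np.where(np.arange(n) < true_break, X @ beta_before, X @ beta_after)
y = mean + 0.3 * rng.normal(size=n)

k_hat, ssr_hat = estimate_break(y, X)
print("estimated break:", k_hat, "true break:", true_break)
```

A partial change, in which only some coefficients shift, would require a restricted fit across the two regimes rather than two fully separate regressions.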
TL;DR: The most popular explanatory variables have been income, relative tourism prices, and transportation costs; the studies discussed were published in the 1980s, used annual data, and were based on estimation of log-linear single-equation models.
TL;DR: In this paper, a conceptual model that provides both a traditional ownership-focused internalization perspective on those issues and an integrated approach combining a broader transaction cost interpretation of control with a resource input-based bargaining power model is proposed.
Abstract: The authors examine the meaning of control in international joint ventures (IJVs) and the relationships of potential means of control in such organizations to the performance satisfaction of the foreign partner. They propose a conceptual model that provides both a traditional ownership-focused internalization perspective on those issues and an integrated approach combining a broader transaction cost interpretation of control with a resource input-based bargaining power model. A set of simultaneous structural equations with endogenous explanatory variables provides multiple possible paths from various resource and power inputs through different means of control to perceived performance satisfaction. In such a model, intermediate variables act both as dependent and independent variables; thus the complex theoretical interactions of the variables are modeled more comprehensively and realistically than in single-equation models. To test the model and compare the theoretical relationships, the authors used dat...
TL;DR: This thesis considers the problem of finding the global optimum of the response with few function evaluations and presents strategies to conduct the optimization in stages and for optimization subject to constraints on additional response variables.
Abstract: A complex mathematical model that produces output values from input values is now commonly called a computer model. This thesis considers the problem of finding the global optimum of the response with few function evaluations. A small number of function evaluations is desirable since the computer model is often expensive (time consuming) to evaluate.
The function to be optimized is modeled as a stochastic process from initial function evaluations. Points are sampled sequentially according to a criterion that combines promising prediction values with prediction uncertainty. Some graphical tools are given that allow early assessment about whether the modeling strategy will work well. The approach is generalized by introducing a parameter that controls how global versus local the search strategy is. Strategies to conduct the optimization in stages and for optimization subject to constraints on additional response variables are presented.
Special consideration is given to the stopping criterion of the global optimization algorithm. The problem of achieving a tolerance on the global minimum can be represented by determining whether the first order statistic of N dependent variables is greater than a certain value. An algorithm is developed that quickly determines bounds on the probability of this event.
A strategy to explore high-dimensional data informally through effect plots is presented. The interpretation of the plots is guided by pointwise standard errors of the effects which are developed. When used in the context of global optimization, the graphical analysis sheds light on the number and location of local optima.
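The sequential sampling criterion described in this abstract trades off a promising predicted value against prediction uncertainty. The abstract does not name the criterion, so the sketch below assumes one widely used choice, expected improvement for minimization, evaluated from a predictive mean and standard deviation at each candidate point. The candidate values and the function name `expected_improvement` are hypothetical.

```python
import math

def expected_improvement(mu, sigma, best_so_far):
    """Expected improvement (minimization) for a normal predictive distribution."""
    if sigma <= 0.0:
        # No predictive uncertainty: the improvement is known exactly.
        return max(best_so_far - mu, 0.0)
    z = (best_so_far - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    # First term rewards a low predicted mean, second term rewards uncertainty.
    return (best_so_far - mu) * cdf + sigma * pdf

# Hypothetical candidates: (predicted mean, predicted standard deviation).
candidates = {"a": (0.9, 0.05), "b": (1.2, 0.60), "c": (1.0, 0.00)}
best = 1.0  # best (lowest) function value observed so far

scores = {name: expected_improvement(mu, sd, best) for name, (mu, sd) in candidates.items()}
next_point = max(scores, key=scores.get)
print(scores, next_point)
```

In this example the uncertain candidate "b" scores highest even though its predicted mean is worse than the current best, which is the global-search behaviour such a criterion is meant to encourage.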
TL;DR: A number of techniques for dealing with multicollinearity are discussed and demonstrated using a dataset from a recent study of risk factors for pneumonia in swine.
TL;DR: This paper examined the relationship between interaction and quadratic terms in regression models in which the independent variables are correlated and showed that if the appropriate product terms are not added to the equation, then the estimated model may indicate concave (convex) relationships between independent variables and the dependent variable, whereas the true relationship is convex (concave).
Abstract: This article examines the relationships between interaction (product) terms and curvilinear (quadratic) terms in regression models in which the independent variables are correlated. The author uses 2 substantive examples to demonstrate the following outcomes: (a) If the appropriate quadratic terms are not added to the estimated model, then the observed interaction may indicate a synergistic (offsetting) relationship between the independent variables, whereas the true relationship is, in fact, offsetting (synergistic). (b) If the appropriate product terms are not added to the equation, then the estimated model may indicate concave (convex) relationships between the independent variables and the dependent variable, whereas the true relationship is, in fact, convex (concave). (c) If the appropriate product and quadratic terms are not examined simultaneously, then the observed interactive or curvilinear relationships may be nonsignificant when such relationships exist. The implications of these results for the examination of interaction and quadratic effects in multiple regression analysis are discussed.
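A small simulation can make the confounding between product and quadratic terms concrete. The sketch below is an illustration in the spirit of the abstract rather than a reproduction of the author's two substantive examples: the data-generating model, the correlation of 0.8, and the sample size are all assumptions. The true model contains curvature but no interaction, yet a fitted equation that omits the quadratic terms reports a sizeable product-term coefficient.

```python
import numpy as np

rng = np.random.default_rng(1)
n, rho = 20000, 0.8
cov = np.array([[1.0, rho], [rho, 1.0]])
x1, x2 = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

# Assumed true model: quadratic effects only, no interaction.
y = x1**2 + x2**2 + rng.normal(size=n)

def ols(columns, y):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    X = np.column_stack([np.ones(len(y)), *columns])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Model A omits the quadratic terms; model B includes product and quadratic terms.
b_product_only = ols([x1, x2, x1 * x2], y)
b_full = ols([x1, x2, x1 * x2, x1**2, x2**2], y)

spurious_interaction = b_product_only[3]  # coefficient on x1*x2 in model A
interaction_full = b_full[3]              # coefficient on x1*x2 in model B
print(spurious_interaction, interaction_full)
```

Because the predictors are correlated, the product term is correlated with both squared terms and absorbs their effect when they are left out; examining both sets of terms together removes the artifact.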
TL;DR: In this paper, a multilevel perspective on item response modeling is presented, where the item response model is cast as a within-student model and the student population distribution as a between-student model.
Abstract: In this article we show how certain analytic problems that arise when one attempts to use latent variables as outcomes in regression analyses can be addressed by taking a multilevel perspective on item response modeling. Under a multilevel, or hierarchical, perspective we cast the item response model as a within-student model and the student population distribution as a between-student model. Taking this perspective leads naturally to an extension of the student population model to include a range of student-level variables, and it invites the possibility of further extending the models to additional levels so that multilevel models can be applied with latent outcome variables. In the two-level case, the model that we employ is formally equivalent to the plausible value procedures that are used as part of the National Assessment of Educational Progress (NAEP), but we present the method for a different class of measurement models, and we use a simultaneous estimation method rather than two-step estimation. In our application of the models to the appropriate treatment of measurement error in the dependent variable of a between-student regression, we also illustrate the adequacy of some approximate procedures that are used in NAEP.
TL;DR: The authors consider estimation of multiplicative, unobserved components panel data models without imposing a strict exogeneity assumption on the conditioning variables, and propose method of moments estimators suited to nonnegative explained variables, including count variables, continuously distributed nonnegative outcomes, and even binary variables.
Abstract: This paper considers estimation of multiplicative, unobserved components panel data models without imposing a strict exogeneity assumption on the conditioning variables. The method of moments estimators proposed have significant robustness properties. They require only a conditional mean assumption and apply to models with lagged dependent variables and to finite distributed lag models with arbitrary feedback from the explained to future values of the explanatory variables. The model is particularly suited to nonnegative explained variables, including count variables, continuously distributed nonnegative outcomes, and even binary variables. The general model can also be applied to certain nonlinear Euler equations.
TL;DR: A number of alternative ways to deal with the problem of variable selection, how to use model misspecification tests, and approaches to predictive neural modeling which are more in tune with the requirements for modeling financial data series are described.
Abstract: Neural networks have shown considerable successes in modeling financial data series. However, a major weakness of neural modeling is the lack of established procedures for performing tests for misspecified models, and tests of statistical significance for the various parameters that have been estimated. This is a serious disadvantage in applications where there is a strong culture for testing not only the predictive power of a model or the sensitivity of the dependent variable to changes in the inputs but also the statistical significance of the finding at a specified level of confidence. Rarely is this more important than in the case of financial engineering, where the data generating processes are dominantly stochastic and only partially deterministic. Partly a tutorial, partly a review, this paper describes a collection of typical applications in options pricing, cointegration, the term structure of interest rates and models of investor behavior which highlight these weaknesses and propose and evaluate a number of solutions. We describe a number of alternative ways to deal with the problem of variable selection, show how to use model misspecification tests, we deploy a novel way based on cointegration to deal with the problem of nonstationarity, and generally describe approaches to predictive neural modeling which are more in tune with the requirements for modeling financial data series.
TL;DR: For the sum $S = \sum X_i$ of a sequence $(X_i)$ of independent symmetric (or nonnegative) random variables, this article gives lower and upper estimates of the moments of $S$. The estimates are exact, up to some universal constants, and extend the previous results for particular types of variables.
Abstract: For the sum $S = \sum X_i$ of a sequence $(X_i)$ of independent symmetric (or nonnegative) random variables, we give lower and upper estimates of moments of $S$. The estimates are exact, up to some universal constants, and extend the previous results for particular types of variables $X_i$.
TL;DR: Geographically Weighted Regression (GWR) as discussed by the authors is a statistical technique which can be used both to account for and to examine the presence of spatial non-stationarity in relationships.
Abstract: A frequent aim of data analysis is to identify relationships between pairs of variables, often after negating the effects of other variables. By far the most common type of analysis used to achieve this aim is that of regression in which relationships between one or more independent variables and a single dependent variable are estimated. In spatial analysis, the data are drawn from geographical units and a single regression equation is estimated. This has the effect of producing ‘average’ or ‘global’ parameter estimates which are assumed to apply equally over the whole region. That is, the relationships being measured are assumed to be stationary over space. Relationships which are not stationary, and which are said to exhibit spatial nonstationarity, create problems for the interpretation of parameter estimates from a regression model. It is the intention of this paper to describe a statistical technique, which we refer to as Geographically Weighted Regression (GWR), which can be used both to account for and to examine the presence of spatial non-stationarity in relationships.
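The contrast between a single global regression and locally varying parameter estimates can be sketched with weighted least squares, where observations near a target location receive more weight. The code below is a minimal illustration, not the authors' implementation: it assumes a one-dimensional location coordinate, a Gaussian distance-decay kernel, and a fixed bandwidth, and the names `gwr_fit`, `u0`, and `bandwidth` are hypothetical.

```python
import numpy as np

def gwr_fit(u, X, y, u0, bandwidth):
    """Weighted least squares at location u0 with Gaussian distance-decay weights."""
    w = np.exp(-0.5 * ((u - u0) / bandwidth) ** 2)
    sw = np.sqrt(w)
    # Scaling rows by sqrt(w) turns ordinary least squares into weighted least squares.
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

# Simulated data in which the slope on x drifts from 1 to 3 across the region.
rng = np.random.default_rng(2)
n = 500
u = rng.uniform(0.0, 1.0, size=n)  # assumed one-dimensional location
x = rng.normal(size=n)
y = 1.0 + (1.0 + 2.0 * u) * x + 0.1 * rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

global_beta, *_ = np.linalg.lstsq(X, y, rcond=None)     # one "average" estimate
west = gwr_fit(u, X, y, u0=0.1, bandwidth=0.1)          # local estimate near u = 0.1
east = gwr_fit(u, X, y, u0=0.9, bandwidth=0.1)          # local estimate near u = 0.9
print(global_beta[1], west[1], east[1])
```

The global slope sits between the two local slopes, which is the "average" behaviour the abstract describes; mapping the local estimates over many target locations is what exposes the non-stationarity.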
TL;DR: It is argued that a simple contiguity matrix provides a unified approach that works with cross-sectional continuous linear relationships, as well as with binary and censored dependent variable problems and autoregressive time series relationships.
Abstract: Practitioners of regional science often are engaged in statistical analysis of regional data samples collected with reference to points in space. Examples are cross-sectional observations on county-level income, employment or payroll, cross-sectional observations from a group of neighboring states in a region, and firm-level employment or payroll where we know the firm address or an approximate location based on a postal code. Ignoring the spatial configuration of sample observations in regression analysis has been found to produce residuals that vary systematically over space, a phenomenon known as spatial autocorrelation. This paper illustrates how to incorporate spatial information in regression relationships that exhibit spatial auto-correlation. I argue that a simple contiguity matrix provides a unified approach that works with cross-sectional continuous linear relationships, as well as with binary and censored dependent variable problems and autoregressive time series relationships.
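A contiguity matrix records which observations are neighbours, and it can be used to check whether values (for example regression residuals) vary systematically over space. The sketch below builds a row-standardized contiguity matrix from a neighbour list and computes Moran's I as a summary of spatial autocorrelation. Moran's I is an assumption chosen for illustration, since the abstract does not name a statistic, and the six-unit chain of regions is hypothetical.

```python
def row_standardized_contiguity(neighbors):
    """Build a contiguity matrix whose rows sum to one from a neighbour list."""
    n = len(neighbors)
    W = [[0.0] * n for _ in range(n)]
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            W[i][j] = 1.0 / len(nbrs)
    return W

def morans_i(values, W):
    """Moran's I: positive when neighbours are alike, negative when they differ."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    # Spatial lag: the weighted average of each unit's neighbours.
    lag = [sum(W[i][j] * z[j] for j in range(n)) for i in range(n)]
    s0 = sum(sum(row) for row in W)
    cross = sum(zi * li for zi, li in zip(z, lag))
    return (n / s0) * cross / sum(zi * zi for zi in z)

# Hypothetical chain of six regions, each bordering the next.
chain = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
W = row_standardized_contiguity(chain)

smooth = morans_i([1, 2, 3, 4, 5, 6], W)          # neighbours have similar values
alternating = morans_i([1, -1, 1, -1, 1, -1], W)  # neighbours have opposite values
print(smooth, alternating)
```

The same matrix `W` is what a spatial lag or spatial error term would be built from; only the descriptive step is shown here.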
TL;DR: In this article, a simple method of averaging proper variables which have similar factor structures in a confirmatory factor model is proposed, and conditions on the relative errors of the measured variables are given that verify when a model based on averaged variables can give better estimators and tests than one based on omitted variables.
Abstract: The normal theory maximum likelihood and asymptotically distribution free methods are commonly used in covariance structure practice. When the number of observed variables is too large, neither method may give reliable inference due to bad condition numbers or unstable solutions. The main existing solution to the problem of high dimension is to build a model based on marginal variables. This practice is inefficient because the omitted variables may still contain valuable information regarding the structural model. In this paper, we propose a simple method of averaging proper variables which have similar factor structures in a confirmatory factor model. The effects of averaging variables on estimators and tests are investigated. Conditions on the relative errors of the measured variables are given that verify when a model based on averaged variables can give better estimators and tests than one based on omitted variables. Our method is compared to the method of variable selection based on mean square error of predicted factor scores. Some aspects related to averaging, such as improving the normality of observed variables, are also discussed.
TL;DR: In this article, a nonparametric approach for estimating optimal transformations of petrophysical data to obtain the maximum correlation between observed variables is proposed, which does not require a priori assumptions of a functional form and the optimal transformations are derived solely based on the data set.
Abstract: Conventional multiple regression for permeability estimation from well logs requires a functional relationship to be presumed. Due to the inexact nature of the relationship between petrophysical variables, it is not always possible to identify the underlying functional form between dependent and independent variables in advance. When large variations in petrological properties are exhibited, parametric regression often fails or leads to unstable and erroneous results, especially for multivariate cases. In this paper we describe a nonparametric approach for estimating optimal transformations of petrophysical data to obtain the maximum correlation between observed variables. The approach does not require a priori assumptions of a functional form and the optimal transformations are derived solely based on the data set. An iterative procedure involving the alternating conditional expectation (ACE) forms the basis of our approach. The power of ACE is illustrated using synthetic as well as field examples. The results clearly demonstrate improved permeability estimation by ACE compared to conventional parametric regression methods.
Introduction: A critical aspect of reservoir description involves estimating permeability in uncored wells based on well logs and other known petrophysical attributes. A common approach is to develop a permeability-porosity relationship by regressing on data from cored wells and then to predict permeability in uncored wells from well logs [1,2]. Multiple regression is used when large variations in petrological properties exist (e.g., a wide range in grain sizes, high degree of cementation, diagenetic alteration, etc.) and a simple permeability-porosity relationship no longer holds good. However, there are several limitations to such an approach. Many of these arise from the inexact nature of the relationship between petrophysical variables and a priori assumptions regarding functional forms used to model the data, all leading to biased estimates. When prediction of permeability extremes is a major concern, the high and low values are enhanced through a weighting scheme in the regression. Besides being subjective in nature, such weighting can cause the prediction to become unstable, which leads to erroneous results. Most importantly, conventional regression assumes independent variables to be free of error, which is highly optimistic for geologic and petrophysical data. Jensen and Lake [2] introduced power transformations for optimization of regression-based permeability-porosity predictions. The underlying theory is that if the joint probability distribution function (jpdf) of two variables is binormal, the relationship will be linear [3]. Several methods exist to estimate the exponents for power transformation. One method, described by Emerson and Stoto [4] and adopted by Jensen and Lake [2], is based on symmetrizing the probability distribution function (pdf). Another method is a trial-and-error approach based on a normal probability plot of the data. By power transforming permeability and porosity separately, the authors are able to improve permeability-porosity correlations. However, using a trial-and-error method for selecting exponents for power transformation is time consuming, and symmetrizing the pdf does not necessarily guarantee a binormal distribution of transformed variables. In addition, there are no indications as to whether power transformations will work for multivariate cases.
TL;DR: In this chapter, the author argues that the conflict between efficiency and positive breakdown point of estimators appears only in artificial situations, but shows that there is a conflict between high breakdown point designs and highly efficient designs.
Abstract: While it is known that for robustness concepts based on the influence function (or on shrinking contamination neighbourhoods) high robustness and high efficiency can be combined by constrained problems (see in particular Hampel et al. (1986) and Chapter 7), there is a recent discussion about whether there is a conflict between efficiency and high breakdown point. Morgenthaler (1991), Stefanski (1991) and Coakley et al. (1994) showed that estimators with positive breakdown point have very low efficiencies compared with the least squares estimator. Davies (1993, 1994) therefore proposed mainly robustness properties, and no efficiency property, as desirable properties of estimators. But as Rousseeuw (1994) argued, this depends on the assumption of outliers in the independent variables (x-variables, experimental conditions) and the assumption of equal variances at all independent variables and in particular at leverage points. Hence, the conflict appears only in artificial situations. If the independent variables are random with possible outliers, then a better model will be a multivariate model, and in such a model there will be no conflict between efficiency and positive breakdown (see He (1994)). For fixed independent variables, as appear in planned experiments, there is also no conflict, as Morgenthaler (1994) noticed. But there is a conflict between high breakdown point designs and highly efficient designs, which is shown in this chapter. First, based on the results of Section 4.3, trimmed weighted Lp estimators and corresponding designs that maximize the breakdown point are derived in Section 9.1.
TL;DR: In this paper, three different discriminant techniques are applied and compared to analyze a complex data set of credit risks: logistic discriminant analysis with a simple mean effects model, classification tree analysis, and a feedforward network with one hidden layer consisting of three units.
Abstract: Three different discriminant techniques are applied and compared to analyze a complex data set of credit risks. A large sample is split into a training, a validation, and a test sample. The dependent variable is whether a loan is paid back without problems or not. Predictor variables are sex, job duration, age, car ownership, telephone ownership, and marital status. The statistical techniques are logistic discriminant analysis with a simple mean effects model, classification tree analysis, and a feedforward network with one hidden layer consisting of three units. It turns out that, in the given test sample, the predictive power is about equal for all techniques, with the logistic discrimination as the best technique. However, the feedforward network produces different classification rules from the logistic discrimination and the classification tree analysis. Therefore, an additional coupling procedure for forecasts is applied to produce a combined forecast. However, this forecast turns out to be slightly worse than the logit model.
TL;DR: This paper describes both how a continuous variable may be dichotomised by searching for a maximum value of zeta, and how a heuristic extension of this method can partition a continuous variable into more than two categories.
Abstract: This paper introduces a new technique for discretization of continuous variables based on zeta, a measure of strength of association between nominal variables developed for this purpose. Zeta is defined as the maximal accuracy achievable if each value of an independent variable must predict a different value of a dependent variable. We describe both how a continuous variable may be dichotomised by searching for a maximum value of zeta, and how a heuristic extension of this method can partition a continuous variable into more than two categories. Experimental comparisons with other published methods show that zeta-discretization runs considerably faster than other techniques without any loss of accuracy.
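The definition of zeta given in this abstract can be turned into a short dichotomisation sketch for the two-class case. For a candidate cut point, the two sides of the cut must predict different classes, so there are only two possible assignments; zeta for that cut is the better of the two accuracies, and the cut with the largest zeta is chosen. The exhaustive search over midpoints, the function names, and the toy data are assumptions for illustration; the paper's own search procedure and its extension to more than two categories are not reproduced.

```python
def zeta_for_cut(xs, ys, cut, classes):
    """Best accuracy when the two sides of `cut` must predict different classes."""
    a, b = classes
    n = len(xs)
    # Assignment 1: x <= cut predicts class a, x > cut predicts class b.
    correct = sum((x <= cut and y == a) or (x > cut and y == b) for x, y in zip(xs, ys))
    # Assignment 2 is the mirror image, so it gets exactly the other cases right.
    return max(correct, n - correct) / n

def dichotomise(xs, ys):
    """Return (zeta, cut) for the midpoint cut that maximizes zeta (two classes only)."""
    classes = sorted(set(ys))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    points = sorted(set(xs))
    cuts = [(lo + hi) / 2 for lo, hi in zip(points, points[1:])]
    # Ties in zeta are broken in favour of the larger cut point.
    return max((zeta_for_cut(xs, ys, c, classes), c) for c in cuts)

# Toy data: the class switches from "n" to "y" around x = 5.5, with one noisy label.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = ["n", "y", "n", "n", "n", "y", "y", "y", "y", "y"]

zeta, cut = dichotomise(xs, ys)
print(zeta, cut)
```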
TL;DR: In this paper, a new measure of goodness of fit for equations with dichotomous dependent variables is proposed, which can be interpreted intuitively in a similar way to R2 in the linear regression context.
Abstract: The econometrics literature contains many alternative measures of goodness of fit, roughly analogous to R2, for use with equations with dichotomous dependent variables. There is, however, no consensus as to the measures' relative merits or about which ones should be reported in empirical work. This paper proposes a new measure that possesses several useful properties that the other measures lack. The new measure may be interpreted intuitively in a similar way to R2 in the linear regression context.
TL;DR: This paper is a concise overview of what has been studied and how: the systems, independent and intervening variables manipulated or measured, and experimental procedures employed.
Abstract: By early 1996, approximately 140 different controlled experiments had been published in 164 articles in refereed journals or conference proceedings, which examined processes and outcomes in computer-supported group decision making. This paper is a concise overview of what has been studied and how: the systems, independent and intervening variables manipulated or measured, and experimental procedures employed. A subsequent paper will examine the dependent variables and findings of the experiments. The purpose is not only to provide a comprehensive summary of past research, but also to critically assess what has been studied little or inadequately, in order to inform design choices for future experiments.
TL;DR: The Monte Carlo simulations confirmed a previously undocumented "rule of thumb" stating that when the EPV is less than 10-20, the algebraic models used in logistic regression and proportional hazards regression may be unreliable, leading to imprecise or spurious results.
Abstract: Background Monte Carlo methods use "simulated" analyses with random numbers for solving problems, particularly those that defy solutions using mathematical theory alone. Research using Monte Carlo simulations is very popular in many branches of science and is sometimes done in clinical investigation. The origins and basic strategy of the technique, however, may not be well known to clinical researchers. The purpose of this paper is to describe the history and general principles of Monte Carlo methods and to demonstrate how Monte Carlo simulations were recently applied to examine a phenomenon in multivariable statistical analysis called the number of outcome events per independent variable (EPV). For example, in a cohort of 200 people, with 50 deaths and 5 independent (predictor) variables, EPV = 50/5 = 10. Methods The "real-world" data came from a clinical trial of 673 patients in which 7 variables were cogent predictors of 252 deaths, so that EPV = 252/7 = 36. For the Monte Carlo simulations, special models were used while allowing simulations of proportional hazards and logistic regression to maintain the basic relationship of variables and the same size of the original population, at EPV values of 2, 5, 10, 15, 20, and 25. Results The Monte Carlo simulations confirmed a previously undocumented "rule of thumb" stating that when the EPV is less than 10-20, the algebraic models used in logistic regression and proportional hazards regression may be unreliable, leading to imprecise or spurious results. Conclusion Monte Carlo techniques offer attractive methods for clinical investigators to use in solving problems that are not amenable to customary mathematical approaches.
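The events-per-variable arithmetic in this abstract is simple enough to check in code. The sketch below reproduces the two worked values from the text (50 deaths with 5 predictors, and 252 deaths with 7 predictors) and flags values below 10, the lower end of the "10-20" rule of thumb quoted above. The function name and the choice of 10 as the cutoff are assumptions for illustration.

```python
def events_per_variable(n_events, n_predictors):
    """Number of outcome events per independent (predictor) variable."""
    if n_predictors <= 0:
        raise ValueError("n_predictors must be positive")
    return n_events / n_predictors

def may_be_unreliable(epv, threshold=10):
    """Flag EPV values below an assumed cutoff taken from the quoted rule of thumb."""
    return epv < threshold

cohort_epv = events_per_variable(50, 5)   # cohort example from the abstract
trial_epv = events_per_variable(252, 7)   # clinical trial example from the abstract

for label, epv in [("cohort", cohort_epv), ("trial", trial_epv), ("simulated", 5.0)]:
    print(label, epv, "flagged" if may_be_unreliable(epv) else "ok")
```

Note that the numerator is the number of outcome events, not the cohort size: the 200-person cohort has an EPV of 10, not 40.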
TL;DR: In the standard simple regression model with measurement errors, sample moments up to the fourth allow testing whether the parameters are identified by having independent variables that are not normally distributed; if identified, the parameters can be estimated from these moments consistently and asymptotically efficiently by minimum chi-squared.
Abstract: Using sample moments up to the fourth in the standard simple regression model with measurement errors allows testing whether the parameters are identified by having independent variables that are not normally distributed. These moments can be used to estimate the parameters, if identified, consistently and asymptotically efficiently by minimum chi-squared. Available instruments may be used to get more efficient, consistent estimates. A Monte Carlo study indicates that the approach offers a feasible, practical approach to handling errors in variables. Inference may sometimes be improved by using bootstrapped estimates.
TL;DR: In this article, the effect of cumulating errors is considered by conducting a simulation study using models estimated from thinning experiments carried out in Finland, and the aim of the study is to estimate the prediction bias and the precision of long-term growth projections due to sequential use of growth models.
TL;DR: In this article, three independent variables are used: district magnitude, electoral thresholds, and the combined measurement of these variables, namely the effective threshold, to explain differences in disproportionality.
TL;DR: In this article, the authors consider a regression in which the dependent variable is integrated of order zero, I(0), while the explanatory variables are integrated of order one, I(1).
Abstract: When variables included in an OLS regression are stationary, conventional statistical measures such as t-statistics and R2's – in addition to a priori information from economic theory – are the standard indicators used to assess the performance of the hypothesized model. However, if the variables under consideration are non-stationary, such conventional measures no longer have the usual interpretation. With recent developments in time-series analysis, namely cointegration, researchers are able to deal with models containing non-stationary variables effectively. A standard cointegration model, however, requires all variables included in the regression to be of the same order of integration. In this paper we consider a regression in which the dependent variable is integrated of order zero, I(0), while the explanatory variables are integrated of order one, I(1). Conventional statistical measures are inapplicable because the regressors are not stationary. On the other hand, cointegration statistics are inapp...
TL;DR: Often one is interested in comparing two or more groups of times-to-event. If the groups are similar, except for the treatment under study, then the nonparametric methods of Chapter 7 may be used directly.
Abstract: Often one is interested in comparing two or more groups of times-to-event. If the groups are similar, except for the treatment under study, then the nonparametric methods of Chapter 7 may be used directly. More often than not, the subjects in the groups have some additional characteristics that may affect their outcome. For example, subjects may have demographic variables recorded, such as age, gender, socioeconomic status, or education; behavioral variables, such as dietary habits, smoking history, physical activity level, or alcohol consumption; or physiological variables, such as blood pressure, blood glucose levels, hemoglobin levels, or heart rate. Such variables may be used as covariates (explanatory variables, confounders, risk factors, independent variables) in explaining the response (dependent) variable. After adjustment for these potential explanatory variables, the comparison of survival times between groups should be less biased and more precise than a simple comparison.
TL;DR: In this article, the authors describe the process whereby multivariate interdisciplinary measures of potential to perform are integrated with performance measures to develop models of retail performance for bank branches, using the key business drivers of a major trading bank as dependent variables.
Abstract: Details the process whereby multivariate interdisciplinary measures of potential to perform are integrated with performance measures to develop models of retail performance for bank branches. The predictive models use the key business drivers of a major trading bank as dependent variables. Independent variables explaining business drivers are the theorized potential variables that measure the capacity to generate retail business. The models allow a comparison between the predicted and actual levels of key business drivers, thus measuring unrealized performance. Findings can assist decision making during restructuring, branch closures or downsizing. The variables presented should be regarded as examples rather than universally accepted measures of branch performance.
TL;DR: Five variables consistently exhibit a moderate positive association with reference accuracy, and the establishment of standard guidelines for reporting findings and an increase in using repeated measures would greatly facilitate making comparisons across studies.
Abstract: Meta-analysis can be used to synthesize the large volume of data describing numerous independent variables and their correlations with reference accuracy. The questions guiding this study are, How often have the same variables been examined across different studies? To what extent do the observed correlations agree or differ? and Can the results of multiple studies be combined to obtain a more accurate estimate of the strength of association for a given variable with reference accuracy? Consistent findings across studies would suggest that reference accuracy has some relationship with the independent variable in question. The ability to perform meta-analysis is limited because few studies use the same operational definitions for variables and rarely provide enough descriptive statistics. Out of seven eligible studies, only twelve comparisons can be made. Five variables (library expenditures, volumes added, fluctuation in the collection, size of the service population, and hours of operation) consistently ...
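Illustrative note: the abstract above asks whether correlations from several studies can be combined into one estimate. One common way to do this, not necessarily the procedure used in the study, is a fixed-effect pooling of correlations via Fisher's z transform. The sketch below assumes each study reports a correlation `r` and a sample size `n`; the function name `pooled_correlation` and any input values are hypothetical and are not taken from the paper.

```python
import math

def pooled_correlation(studies):
    """Fixed-effect pooled correlation using Fisher's z transform.

    studies: list of (r, n) pairs with -1 < r < 1 and n > 3.
    Returns the pooled r and an approximate 95% confidence interval.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for r, n in studies:
        z = math.atanh(r)   # Fisher z transform of the correlation
        w = n - 3           # inverse of the variance of z, 1 / (n - 3)
        weighted_sum += w * z
        total_weight += w
    z_bar = weighted_sum / total_weight
    se = 1.0 / math.sqrt(total_weight)
    lower = math.tanh(z_bar - 1.96 * se)  # back-transform to the r scale
    upper = math.tanh(z_bar + 1.96 * se)
    return math.tanh(z_bar), (lower, upper)
```

Larger studies receive more weight, so the pooled estimate is pulled toward the correlations measured on bigger samples. A random-effects model would be the usual alternative when the studies use different operational definitions, as the abstract notes they often do.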
TL;DR: Log-linear modelling was used here to evaluate the effect of four variables (training set size, waveband combination, classification algorithm and testing set size) on classification accuracy.
Abstract: The accuracy of an image classification is a function of a range of variables. To select an appropriate classification approach for a particular set of data the analyst must be aware of the significant variables which may affect classification accuracy and the nature of their effect. The effect of a variable, or small number of variables, on classification accuracy may be evaluated by straightforward comparison of classification accuracies. However, for the evaluation of the simultaneous effect of a large number of variables such an approach may be impractical. In such circumstances log-linear modelling may be used to identify the significant variables affecting classification accuracy and the nature of the effect of the significant variables elucidated from further analysis. Log-linear modelling was used here to evaluate the effect of four variables (training set size, waveband combination, classification algorithm and testing set size) on classification accuracy. Since the analyst usually has m...
TL;DR: Hierarchical tree-based regression (HTBR) may provide a better model for forecasting continuous response variables in transportation applications when the shortcomings of OLS regression are present.
Abstract: Given the continual need for transportation professionals to forecast trends and the increasing availability of sophisticated and improved modeling methods in new and improved software packages, new methods should be explored to determine whether they can replace or supplement more classical statistical methods. One commonly used classical statistical technique for relating a continuous dependent variable with one or more independent variables (continuous or discrete) is ordinary least squares (OLS) regression. This method is routinely applied in transportation to forecast such things as energy use, trip attractions, trip productions, automobile emissions, and growth in vehicle miles traveled (VMT). Despite its widespread use and tremendous utility, however, OLS regression has limitations. It does not deal well with multicollinear independent variables, interactions between independent variables must be specified, the functional relationship between dependent and independent variables must be known (or ap...