TL;DR: It is argued that the arcsine transform should not be used for either binomial or non-binomial data, and the logit transformation is proposed as an alternative approach to address these issues.
Abstract: The arcsine square root transformation has long been standard procedure when analyzing proportional data in ecology, with applications in data sets containing binomial and non-binomial response variables. Here, we argue that the arcsine transform should not be used in either circumstance. For binomial data, logistic regression has greater interpretability and higher power than analyses of transformed data. However, it is important to check the data for additional unexplained variation, i.e., overdispersion, and to account for it via the inclusion of random effects in the model if found. For non-binomial data, the arcsine transform is undesirable on the grounds of interpretability, and because it can produce nonsensical predictions. The logit transformation is proposed as an alternative approach to address these issues. Examples are presented in both cases to illustrate these advantages, comparing various methods of analyzing proportions including untransformed, arcsine- and logit-transformed linear models and logistic regression (with or without random effects). Simulations demonstrate that logistic regression usually provides a gain in power over other methods.
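A minimal sketch of the two transformations discussed above may help make the "nonsensical predictions" point concrete. The function names are illustrative assumptions, not taken from the paper. The arcsine square root scale is bounded to [0, π/2], so a linear model fitted on that scale can predict a value with no valid back-transform, whereas any real number on the logit scale maps back into (0, 1).

```python
import math

def arcsine_sqrt(p):
    """Arcsine square root transform of a proportion p in [0, 1]."""
    return math.asin(math.sqrt(p))

def inv_arcsine_sqrt(z):
    """Back-transform; only defined for z in [0, pi/2]."""
    if not 0.0 <= z <= math.pi / 2:
        raise ValueError("value lies outside the range of the arcsine transform")
    return math.sin(z) ** 2

def logit(p):
    """Logit transform; undefined at exactly 0 or 1 (not handled in this sketch)."""
    return math.log(p / (1.0 - p))

def inv_logit(x):
    """Back-transform from the logit scale; always lands strictly inside (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# A hypothetical linear prediction of 1.8 on each transformed scale:
prediction = 1.8
print(inv_logit(prediction))          # a valid proportion
try:
    inv_arcsine_sqrt(prediction)      # 1.8 > pi/2, so no proportion corresponds to it
except ValueError as err:
    print("arcsine scale:", err)
```

The sketch only illustrates the transformed linear model case for non-binomial proportions; for binomial counts the abstract recommends logistic regression instead.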
TL;DR: Full model tests and P value adjustments can be used as a guide to how frequently type I errors arise by sampling variation alone, and favour the presentation of full models, since they best reflect the range of predictors investigated and ensure a balanced representation also of non-significant results.
Abstract: Fitting generalised linear models (GLMs) with more than one predictor has become the standard method of analysis in evolutionary and behavioural research. Often, GLMs are used for exploratory data analysis, where one starts with a complex full model including interaction terms and then simplifies by removing non-significant terms. While this approach can be useful, it is problematic if significant effects are interpreted as if they arose from a single a priori hypothesis test. This is because model selection involves cryptic multiple hypothesis testing, a fact that has only rarely been acknowledged or quantified. We show that the probability of finding at least one ‘significant’ effect is high, even if all null hypotheses are true (e.g. 40% when starting with four predictors and their two-way interactions). This probability is close to theoretical expectations when the sample size (N) is large relative to the number of predictors including interactions (k). In contrast, type I error rates strongly exceed even those expectations when model simplification is applied to models that are over-fitted before simplification (low N/k ratio). The increase in false-positive results arises primarily from an overestimation of effect sizes among significant predictors, leading to upward-biased effect sizes that often cannot be reproduced in follow-up studies (‘the winner's curse’). Despite having their own problems, full model tests and P value adjustments can be used as a guide to how frequently type I errors arise by sampling variation alone. We favour the presentation of full models, since they best reflect the range of predictors investigated and ensure a balanced representation also of non-significant results.
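The 40% figure quoted above can be reproduced with a back-of-the-envelope calculation, sketched here under two assumptions that are not stated in the abstract: each term is tested at a nominal level of 0.05, and the tests are treated as independent. Four predictors plus their six two-way interactions give ten terms.

```python
from math import comb

def prob_at_least_one_false_positive(n_predictors, alpha=0.05):
    """P(at least one 'significant' term) when all null hypotheses are true,
    assuming independent tests on main effects and two-way interactions."""
    n_terms = n_predictors + comb(n_predictors, 2)
    return 1.0 - (1.0 - alpha) ** n_terms

p = prob_at_least_one_false_positive(4)
print(round(p, 3))  # 0.401
```

This is the "theoretical expectation" baseline only; the abstract reports that over-fitted models (low N/k) exceed it.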
TL;DR: This work develops a novel estimation and uniformly valid inference method for the treatment effect in this setting, called the "post-double-selection" method, which resolves the problem of uniform inference after model selection for a large, interesting class of models.
Abstract: We propose robust methods for inference on the effect of a treatment variable on a scalar outcome in the presence of very many controls. Our setting is a partially linear model with possibly non-Gaussian and heteroscedastic disturbances. Our analysis allows the number of controls to be much larger than the sample size. To make informative inference feasible, we require the model to be approximately sparse; that is, we require that the effect of confounding factors can be controlled for up to a small approximation error by conditioning on a relatively small number of controls whose identities are unknown. The latter condition makes it possible to estimate the treatment effect by selecting approximately the right set of controls. We develop a novel estimation and uniformly valid inference method for the treatment effect in this setting, called the "post-double-selection" method. Our results apply to Lasso-type methods used for covariate selection as well as to any other model selection method that is able to find a sparse model with good approximation properties.
The main attractive feature of our method is that it allows for imperfect selection of the controls and provides confidence intervals that are valid uniformly across a large class of models. In contrast, standard post-model selection estimators fail to provide uniform inference even in simple cases with a small, fixed number of controls. Thus our method resolves the problem of uniform inference after model selection for a large, interesting class of models. We illustrate the use of the developed methods with numerical simulations and an application to the effect of abortion on crime rates.
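The following is a rough sketch of the post-double-selection idea as it is commonly described: select controls that predict the outcome, select controls that predict the treatment, then run ordinary least squares of the outcome on the treatment plus the union of both selected sets. Everything here is an assumption made for illustration: the plain coordinate-descent lasso, the fixed penalty levels `lam_y` and `lam_d`, and the simulated data. The paper's data-driven penalty choices and heteroscedasticity-robust inference are not reproduced.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate-descent lasso for (1/(2n))*||y - X b||^2 + lam*||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float).copy()              # running residual y - X b
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]             # take column j out of the fit
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= X[:, j] * b[j]             # put the updated column j back
    return b

def post_double_selection(y, d, X, lam_y, lam_d):
    """Return the treatment coefficient and the indices of the controls kept."""
    keep_y = np.flatnonzero(lasso_cd(X, y, lam_y))   # controls predicting the outcome
    keep_d = np.flatnonzero(lasso_cd(X, d, lam_d))   # controls predicting the treatment
    keep = np.union1d(keep_y, keep_d)
    Z = np.column_stack([d, np.ones(len(y)), X[:, keep]])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)     # OLS on treatment + union
    return coef[0], keep

# Simulated example: control 0 confounds the treatment d and the outcome y.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.normal(size=(n, p))
d = X[:, 0] + rng.normal(size=n)
y = 1.0 * d + 2.0 * X[:, 0] + rng.normal(size=n)     # true treatment effect is 1.0

alpha_hat, keep = post_double_selection(y, d, X, lam_y=0.3, lam_d=0.3)
naive = (d @ y) / (d @ d)                            # regression of y on d alone
print(alpha_hat, naive, keep)
```

Selecting on the treatment equation as well as the outcome equation is what protects against dropping a confounder that is only weakly related to the outcome.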
TL;DR: Empirical results with three well-known real data sets indicate that the proposed model can be an effective way to improve on the forecasting accuracy achieved by traditional hybrid models and by either of the component models used separately.
Abstract: Improving forecasting accuracy, especially for time series, is an important yet often difficult task facing decision makers in many areas. Both theoretical and empirical findings have indicated that integrating different models can be an effective way of improving upon their predictive performance, especially when the models in combination are quite different. Artificial neural networks (ANNs) are flexible computing frameworks and universal approximators that can be applied to a wide range of forecasting problems with a high degree of accuracy. However, using ANNs to model linear problems has yielded mixed results; hence, it is not wise to apply ANNs blindly to any type of data. Autoregressive integrated moving average (ARIMA) models are among the most popular linear models in time series forecasting and have been widely applied to construct more accurate hybrid models during the past decade. Although hybrid techniques, which decompose a time series into its linear and nonlinear components, have recently been shown to be successful for single models, these models have some disadvantages. In this paper, a novel hybridization of artificial neural networks and an ARIMA model is proposed in order to overcome the mentioned limitation of ANNs and yield a more general and more accurate forecasting model than traditional hybrid ARIMA-ANN models. In our proposed model, the unique advantages of ARIMA models in linear modeling are used to identify and magnify the existing linear structure in the data, and then a neural network is used to determine a model that captures the underlying data generating process and predicts using the preprocessed data. Empirical results with three well-known real data sets indicate that the proposed model can be an effective way to improve on the forecasting accuracy achieved by traditional hybrid models and by either of the component models used separately.
TL;DR: Two very simple but effective shrinkage methods and an extension of the nonnegative garrote estimator are introduced, which avoid having to use nonparametric testing methods for which there is no general reliable distributional theory.
TL;DR: This DSU series of Technical Support Documents (TSDs) is intended to complement the Methods Guide by providing detailed information on how to implement specific methods, with clear recommendations on implementation and reporting standards where it is appropriate to do so.
Abstract: This paper sets out a generalised linear model (GLM) framework for the synthesis of data from randomised controlled trials (RCTs). We describe a common model taking the form of a linear regression for both fixed and random effects synthesis, that can be implemented with Normal, Binomial, Poisson, and Multinomial data. The familiar logistic model for meta-analysis with Binomial data is a GLM with a logit link function, which is appropriate for probability outcomes. The same linear regression framework can be applied to continuous outcomes, rate models, competing risks, or ordered category outcomes, by using other link functions, such as identity, log, complementary log-log, and probit link functions. The common core model for the linear predictor can be applied to pair-wise meta-analysis, indirect comparisons, synthesis of multi-arm trials, and mixed treatment comparisons, also known as network meta-analysis, without distinction. We take a Bayesian approach to estimation and provide WinBUGS program code for a Bayesian analysis using Markov chain Monte Carlo (MCMC) simulation. An advantage of this approach is that it is straightforward to extend to shared parameter models where different RCTs report outcomes in different formats but from a common underlying model. Use of the GLM framework allows us to present a unified account of how models can be compared using the Deviance Information Criterion (DIC), and how goodness of fit can be assessed using the residual deviance. WinBUGS code for model critique is provided. Our approach is illustrated through a range of worked examples for the commonly encountered evidence formats, including shared parameter models. We give suggestions on computational issues that sometimes arise in MCMC evidence synthesis, and comment briefly on alternative software.
TL;DR: In this article, the authors consider the problem of estimating a sparse linear regression vector β* under a Gaussian noise model, for the purpose of both prediction and model selection, and establish oracle inequalities for the prediction and l2 estimation errors of this estimator.
Abstract: We consider the problem of estimating a sparse linear regression vector β* under a Gaussian noise model, for the purpose of both prediction and model selection. We assume that prior knowledge is available on the sparsity pattern, namely that the set of variables is partitioned into prescribed groups, only a few of which are relevant in the estimation process. This group sparsity assumption suggests considering the Group Lasso method as a means to estimate β*. We establish oracle inequalities for the prediction and l2 estimation errors of this estimator. These bounds hold under a restricted eigenvalue condition on the design matrix. Under a stronger condition, we derive bounds for the estimation error for mixed (2, p)-norms with 1 ≤ p ≤ ∞. When p=∞, this result implies that a thresholded version of the Group Lasso estimator selects the sparsity pattern of β* with high probability. Next, we prove that the rate of convergence of our upper bounds is optimal in a minimax sense, up to a logarithmic factor, for all estimators over a class of group sparse vectors. Furthermore, we establish lower bounds for the prediction and l2 estimation errors of the usual Lasso estimator. Using this result, we demonstrate that the Group Lasso can achieve an improvement in the prediction and estimation errors as compared to the Lasso. An important application of our results is provided by the problem of estimating multiple regression equations simultaneously or multi-task learning. In this case, we obtain refinements of the results in [In Proc. of the 22nd Annual Conference on Learning Theory (COLT) (2009)], which allow us to establish a quantitative advantage of the Group Lasso over the usual Lasso in the multi-task setting. Finally, within the same setting, we show how our results can be extended to more general noise distributions, of which we only require the fourth moment to be finite. To obtain this extension, we establish a new maximal moment inequality, which may be of independent interest.
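For orientation, the Group Lasso estimator and the mixed (2, p)-norm referred to above can be written in a standard form as follows. The notation is an assumption for illustration (n observations, K coefficients, M prescribed groups G_1, ..., G_M, group-wise tuning parameters λ_j), and the normalization of the loss may differ from the one used in the paper.

```latex
\hat{\beta} \in \arg\min_{\beta \in \mathbb{R}^{K}}
  \left\{ \frac{1}{n} \lVert y - X\beta \rVert_2^2
        + 2 \sum_{j=1}^{M} \lambda_j \lVert \beta^{G_j} \rVert_2 \right\},
\qquad
\lVert \beta \rVert_{2,p}
  = \Bigl( \sum_{j=1}^{M} \lVert \beta^{G_j} \rVert_2^{\,p} \Bigr)^{1/p},
\qquad
\lVert \beta \rVert_{2,\infty} = \max_{1 \le j \le M} \lVert \beta^{G_j} \rVert_2 .
```

When every group contains a single variable, the penalty reduces to the usual Lasso penalty, which is the comparison the abstract draws.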
TL;DR: This article presents several models that allow for the commensurability of the information in the historical and current data to determine how much historical information is used in hierarchical Bayesian methods for incorporating historical data that are adaptively robust to prior information that reveals itself to be inconsistent with the accumulating experimental data.
Abstract: Bayesian clinical trial designs offer the possibility of a substantially reduced sample size, increased statistical power, and reductions in cost and ethical hazard. However when prior and current information conflict, Bayesian methods can lead to higher than expected type I error, as well as the possibility of a costlier and lengthier trial. This motivates an investigation of the feasibility of hierarchical Bayesian methods for incorporating historical data that are adaptively robust to prior information that reveals itself to be inconsistent with the accumulating experimental data. In this article, we present several models that allow for the commensurability of the information in the historical and current data to determine how much historical information is used. A primary tool is elaborating the traditional power prior approach based upon a measure of commensurability for Gaussian data. We compare the frequentist performance of several methods using simulations, and close with an example of a colon cancer trial that illustrates a linear models extension of our adaptive borrowing approach. Our proposed methods produce more precise estimates of the model parameters, in particular, conferring statistical significance to the observed reduction in tumor size for the experimental regimen as compared to the control regimen.
TL;DR: In this paper, the authors propose scaled sparse linear regression, which jointly estimates the regression coefficients and the noise level in a linear model; for the scaled lasso, the algorithm is a gradient descent in a convex minimization of a penalized joint loss function.
Abstract: Scaled sparse linear regression jointly estimates the regression coefficients and noise level in a linear model. It chooses an equilibrium with a sparse regression method by iteratively estimating the noise level via the mean residual square and scaling the penalty in proportion to the estimated noise level. The iterative algorithm costs little beyond the computation of a path or grid of the sparse regression estimator for penalty levels above a proper threshold. For the scaled lasso, the algorithm is a gradient descent in a convex minimization of a penalized joint loss function for the regression coefficients and noise level. Under mild regularity conditions, we prove that the scaled lasso simultaneously yields an estimator for the noise level and an estimated coefficient vector satisfying certain oracle inequalities for prediction, the estimation of the noise level and the regression coefficients. These inequalities provide sufficient conditions for the consistency and asymptotic normality of the noise level estimator, including certain cases where the number of variables is of greater order than the sample size. Parallel results are provided for the least squares estimation after model selection by the scaled lasso. Numerical results demonstrate the superior performance of the proposed methods over an earlier proposal of joint convex minimization.
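The alternating scheme described above (estimate the noise level from the mean residual square, rescale the penalty in proportion, refit) can be sketched roughly as follows. This is a toy illustration, not the authors' implementation: the coordinate-descent lasso, the base penalty `lam0 = sqrt(2 log p / n)`, the stopping rule, and the simulated data are all assumptions.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=50):
    """Coordinate-descent lasso for (1/(2n))*||y - X b||^2 + lam*||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float).copy()              # running residual y - X b
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= X[:, j] * b[j]
    return b

def scaled_lasso(X, y, lam0, n_outer=30, tol=1e-4):
    """Alternate between a lasso fit at penalty sigma*lam0 and a noise-level update."""
    sigma = np.std(y)                       # crude starting value
    b = np.zeros(X.shape[1])
    for _ in range(n_outer):
        b = lasso_cd(X, y, sigma * lam0)    # penalty scaled by current noise estimate
        sigma_new = np.sqrt(np.mean((y - X @ b) ** 2))   # root mean residual square
        converged = abs(sigma_new - sigma) < tol
        sigma = sigma_new
        if converged:
            break
    return b, sigma

# Simulated example with three nonzero coefficients and true noise level 1.
rng = np.random.default_rng(1)
n, p = 200, 50
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -2.0, 1.5]
y = X @ beta + rng.normal(size=n)

lam0 = np.sqrt(2 * np.log(p) / n)
b_hat, sigma_hat = scaled_lasso(X, y, lam0)
print(sigma_hat, np.flatnonzero(b_hat))
```

Because the penalty is tied to the estimated noise level, the base penalty `lam0` does not need to be tuned to the unknown noise scale.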
TL;DR: Fast fitting methods for generalized functional linear models are developed, motivated by a diffusion tensor imaging (DTI) study of white-matter demyelination that analyzes differences between various cerebral white-matter tract property measurements of multiple sclerosis patients and controls.
Abstract: We develop fast fitting methods for generalized functional linear models. The functional predictor is projected onto a large number of smooth eigenvectors and the coefficient function is estimated using penalized spline regression; confidence intervals based on the mixed model framework are obtained. Our method can be applied to many functional data designs including functions measured with and without error, sparsely or densely sampled. The methods also extend to the case of multiple functional predictors or functional predictors with a natural multilevel structure. The approach can be implemented using standard mixed effects software and is computationally fast. The methodology is motivated by a study of white-matter demyelination via diffusion tensor imaging (DTI). The aim of this study is to analyze differences between various cerebral white-matter tract property measurements of multiple sclerosis (MS) patients and controls. While the statistical developments proposed here were motivated by the DTI st...
TL;DR: In this article, a Bayesian nonparametric approach utilizes a hierarchical Dirichlet process prior to learn an unknown number of persistent, smooth dynamical modes, and additionally employs automatic relevance determination to infer a sparse set of dynamic dependencies, allowing one to learn SLDS with varying state dimension or switching VAR processes with varying autoregressive order.
Abstract: Many complex dynamical phenomena can be effectively modeled by a system that switches among a set of conditionally linear dynamical modes. We consider two such models: the switching linear dynamical system (SLDS) and the switching vector autoregressive (VAR) process. Our Bayesian nonparametric approach utilizes a hierarchical Dirichlet process prior to learn an unknown number of persistent, smooth dynamical modes. We additionally employ automatic relevance determination to infer a sparse set of dynamic dependencies allowing us to learn SLDS with varying state dimension or switching VAR processes with varying autoregressive order. We develop a sampling algorithm that combines a truncated approximation to the Dirichlet process with efficient joint sampling of the mode and state sequences. The utility and flexibility of our model are demonstrated on synthetic data, sequences of dancing honey bees, the IBOVESPA stock index and a maneuvering target tracking application.
TL;DR: It is found that the LN cascade provides accurate estimates of the firing rates of spiking neurons in most of parameter space, and an adaptive timescale rate model is introduced in which the timescale of the linear filter depends on the instantaneous firing rate.
Abstract: Neurons transform time-varying inputs into action potentials emitted stochastically at a time-dependent rate. The mapping from current input to output firing rate is often represented with the help of phenomenological models such as the linear-nonlinear (LN) cascade, in which the output firing rate is estimated by applying to the input successively a linear temporal filter and a static non-linear transformation. These simplified models leave out the biophysical details of action potential generation. It is not a priori clear to what extent the input-output mapping of biophysically more realistic, spiking neuron models can be reduced to a simple linear-nonlinear cascade. Here we investigate this question for the leaky integrate-and-fire (LIF), exponential integrate-and-fire (EIF) and conductance-based Wang-Buzsaki models in the presence of background synaptic activity. We exploit available analytic results for these models to determine the corresponding linear filter and static non-linearity in a parameter-free form. We show that the obtained functions are identical to the linear filter and static non-linearity determined using standard reverse correlation analysis. We then quantitatively compare the output of the corresponding linear-nonlinear cascade with numerical simulations of spiking neurons, systematically varying the parameters of input signal and background noise. We find that the LN cascade provides accurate estimates of the firing rates of spiking neurons in most of parameter space. For the EIF and Wang-Buzsaki models, we show that the LN cascade can be reduced to a firing rate model, the timescale of which we determine analytically. Finally we introduce an adaptive timescale rate model in which the timescale of the linear filter depends on the instantaneous firing rate. This model leads to highly accurate estimates of instantaneous firing rates.
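The structure of an LN cascade (a linear temporal filter followed by a static non-linearity) can be illustrated with a small sketch. The exponential filter, the rectified-linear non-linearity, and all parameter names and values below are assumptions chosen for simplicity; the paper derives its filter and non-linearity analytically from the spiking models, which this sketch does not attempt.

```python
import math

def ln_cascade(stimulus, dt, tau, gain, threshold):
    """Firing-rate estimate from a causal exponential filter (linear stage)
    followed by a rectified-linear static non-linearity."""
    n_taps = int(5 * tau / dt)                       # truncate the filter at 5 tau
    kernel = [math.exp(-k * dt / tau) * dt / tau for k in range(n_taps)]
    rates = []
    for t in range(len(stimulus)):
        # Linear stage: weighted sum over the recent input history.
        filtered = sum(kernel[k] * stimulus[t - k]
                       for k in range(min(t + 1, n_taps)))
        # Static non-linear stage: rates cannot be negative.
        rates.append(max(0.0, gain * (filtered - threshold)))
    return rates

# Step input: the estimated rate rises smoothly on the timescale tau.
stimulus = [1.0] * 200
rates = ln_cascade(stimulus, dt=1.0, tau=10.0, gain=20.0, threshold=0.5)
print(rates[0], rates[20], rates[-1])
```

In this fixed-filter form `tau` is constant; the adaptive timescale model mentioned at the end of the abstract would instead let the filter timescale depend on the instantaneous rate.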
TL;DR: In this paper, the authors explore what can be learned when the function of interest is identified through an instrumental variable but is not assumed to be known up to finitely many parameters.
Abstract: Instrumental variables are widely used in applied econometrics to achieve identification and carry out estimation and inference in models that contain endogenous explanatory variables. In most applications, the function of interest (e.g., an Engel curve or demand function) is assumed to be known up to finitely many parameters (e.g., a linear model), and instrumental variables are used to identify and estimate these parameters. However, linear and other finite-dimensional parametric models make strong assumptions about the population being modeled that are rarely if ever justified by economic theory or other a priori reasoning and can lead to seriously erroneous conclusions if they are incorrect. This paper explores what can be learned when the function of interest is identified through an instrumental variable but is not assumed to be known up to finitely many parameters. The paper explains the differences between parametric and nonparametric estimators that are important for applied research, describes an easily implemented nonparametric instrumental variables estimator, and presents empirical examples in which nonparametric methods lead to substantive conclusions that are quite different from those obtained using standard, parametric estimators.
TL;DR: Positive association between the number of transactions and the volatility process of a certain stock is discovered and it is proved that the maximum likelihood estimator of the vector of unknown parameters is asymptotically normal with a covariance matrix that can be consistently estimated.
TL;DR: The classic regression-based estimator of counterfactual means studied by Oaxaca and Blinder is shown to constitute a propensity score reweighting estimator based upon a linear model for the conditional odds of being treated.
Abstract: The classic regression based estimator of counterfactual means studied by Ronald Oaxaca (1973) and Alan Blinder (1973) is shown to constitute a propensity score reweighting estimator based upon a linear model for the conditional odds of being treated.
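The algebra behind the reweighting interpretation can be checked numerically. The sketch below, on simulated data with hypothetical variable names, shows only the underlying identity: the regression-based counterfactual mean (the treated group's average covariates multiplied by the control-group OLS coefficients) equals a weighted sum of control outcomes, with weights that are linear in the covariates. Interpreting those weights as estimated conditional odds of treatment is the paper's result and is not derived here.

```python
import numpy as np

rng = np.random.default_rng(0)
n_c, n_t = 300, 200

# Design matrices with an intercept; treated units have shifted covariates.
Xc = np.column_stack([np.ones(n_c), rng.normal(0.0, 1.0, n_c)])
Xt = np.column_stack([np.ones(n_t), rng.normal(0.5, 1.0, n_t)])
yc = 1.0 + 2.0 * Xc[:, 1] + rng.normal(size=n_c)    # control outcomes

# Regression form: fit OLS on controls, predict at the treated covariate mean.
beta_c = np.linalg.solve(Xc.T @ Xc, Xc.T @ yc)
xbar_t = Xt.mean(axis=0)
mu_regression = xbar_t @ beta_c

# Reweighting form: the same number as a weighted sum of control outcomes.
w = Xc @ np.linalg.solve(Xc.T @ Xc, xbar_t)         # one weight per control unit
mu_reweighting = w @ yc

print(mu_regression, mu_reweighting, w.sum())
```

Because the design includes an intercept, the implied weights sum to one, which is what one would expect of a reweighting estimator of a mean.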
TL;DR: A variational Bayesian inference algorithm which can be widely applied to sparse linear models and is based on the spike and slab prior, which is the golden standard for sparse inference is introduced.
Abstract: We introduce a variational Bayesian inference algorithm which can be widely applied to sparse linear models. The algorithm is based on the spike and slab prior which, from a Bayesian perspective, is the golden standard for sparse inference. We apply the method to a general multi-task and multiple kernel learning model in which a common set of Gaussian process functions is linearly combined with task-specific sparse weights, thus inducing relation between tasks. This model unifies several sparse linear models, such as generalized linear models, sparse factor analysis and matrix factorization with missing values, so that the variational algorithm can be applied to all these cases. We demonstrate our approach in multi-output Gaussian process regression, multi-class classification, image processing applications and collaborative filtering.
TL;DR: Practical relevance of nonlinear methods trying to improve over linear correlation might be limited by the fact that the data are indeed almost Gaussian, and this framework for testing and estimating the deviation from Gaussianity is presented.
TL;DR: An improved LDA framework is proposed, the local LDA (LLDA), which can perform well without needing to satisfy the above two assumptions, and can effectively capture the local structure of samples.
Abstract: The linear discriminant analysis (LDA) is a very popular linear feature extraction approach. The algorithms of LDA usually perform well under the following two assumptions. The first assumption is that the global data structure is consistent with the local data structure. The second assumption is that the input data classes are Gaussian distributions. However, in real-world applications, these assumptions are not always satisfied. In this paper, we propose an improved LDA framework, the local LDA (LLDA), which can perform well without needing to satisfy the above two assumptions. Our LLDA framework can effectively capture the local structure of samples. According to different types of local data structure, our LLDA framework incorporates several different forms of linear feature extraction approaches, such as the classical LDA and principal component analysis. The proposed framework includes two LLDA algorithms: a vector-based LLDA algorithm and a matrix-based LLDA (MLLDA) algorithm. MLLDA is directly applicable to image recognition, such as face recognition. Our algorithms need to train only a small portion of the whole training set before testing a sample. They are suitable for learning large-scale databases especially when the input data dimensions are very high and can achieve high classification accuracy. Extensive experiments show that the proposed algorithms can obtain good classification results.
TL;DR: In this article, the random lasso method for variable selection in linear models is proposed, which consists of two major steps. In step 1, the lasso is applied to many bootstrap samples, each using a set of randomly selected covariates, and a measure of importance is yielded from this step for each covariate. In step 2, a similar procedure to the first step is implemented, with the exception that for each bootstrap sample, a subset of covariates is randomly selected with unequal selection probabilities determined by the covariates' importance.
Abstract: We propose a computationally intensive method, the random lasso method, for variable selection in linear models. The method consists of two major steps. In step 1, the lasso method is applied to many bootstrap samples, each using a set of randomly selected covariates. A measure of importance is yielded from this step for each covariate. In step 2, a similar procedure to the first step is implemented with the exception that for each bootstrap sample, a subset of covariates is randomly selected with unequal selection probabilities determined by the covariates' importance. Adaptive lasso may be used in the second step with weights determined by the importance measures. The final set of covariates and their coefficients are determined by averaging bootstrap results obtained from step 2. The proposed method alleviates some of the limitations of lasso, elastic-net and related methods noted especially in the context of microarray data analysis: it tends to remove highly correlated variables altogether or select them all, and maintains maximal flexibility in estimating their coefficients, particularly with different signs; the number of selected variables is no longer limited by the sample size; and the resulting prediction accuracy is competitive or superior compared to the alternatives. We illustrate the proposed method by extensive simulation studies. The proposed method is also applied to a Glioblastoma microarray data analysis.
TL;DR: In this article, the authors show that under moderate sparsity levels, that is, 0 ≤ α ≤ 1/2, the analysis of variance (ANOVA) is essentially optimal under some conditions on the design.
Abstract: Testing for the significance of a subset of regression coefficients in a linear model, a staple of statistical analysis, goes back at least to the work of Fisher who introduced the analysis of variance (ANOVA). We study this problem under the assumption that the coefficient vector is sparse, a common situation in modern high-dimensional settings. Suppose we have p covariates and that under the alternative, the response only depends upon the order of p^(1−α) of those, 0 ≤ α ≤ 1. Under moderate sparsity levels, that is, 0 ≤ α ≤ 1/2, we show that ANOVA is essentially optimal under some conditions on the design. This is no longer the case under strong sparsity constraints, that is, α > 1/2. In such settings, a multiple comparison procedure is often preferred and we establish its optimality when α ≥ 3/4. However, these two very popular methods are suboptimal, and sometimes powerless, under moderately strong sparsity where 1/2 < α < 3/4. [...] α > 1/2. This optimality property is true for a variety of designs, including the classical (balanced) multi-way designs and more modern “p > n” designs arising in genetics and signal processing. In addition to the standard fixed effects model, we establish similar results for a random effects model where the nonzero coefficients of the regression vector are normally distributed.
TL;DR: In this paper, a series of model-predictive control (MPC) techniques have been explored for optimizing control sequences for window operation in mixed-mode (MM) buildings using EnergyPlus, and results for a simplified MM office building have been presented.
TL;DR: In this paper, a neural network-based nonlinear autoregressive model with external inputs (NNARX) was developed to predict the thermal behavior of an open office in a modern building.
TL;DR: Under certain regularity conditions, it is shown that the LAND estimator is able to identify the underlying true model structure correctly and at the same time estimate the multivariate regression function consistently.
Abstract: Partially linear models provide a useful class of tools for modeling complex data by naturally incorporating a combination of linear and nonlinear effects within one framework. One key question in partially linear models is the choice of model structure, that is, how to decide which covariates are linear and which are nonlinear. This is a fundamental, yet largely unsolved problem for partially linear models. In practice, one often assumes that the model structure is given or known and then makes estimation and inference based on that structure. Alternatively, there are two methods in common use for tackling the problem: hypotheses testing and visual screening based on the marginal fits. Both methods are quite useful in practice but have their drawbacks. First, it is difficult to construct a powerful procedure for testing multiple hypotheses of linear against nonlinear fits. Second, the screening procedure based on the scatterplots of individual covariate fits may provide an educated guess on the regressio...
TL;DR: This paper derives conditions under which the Lasso estimator for the autoregressive coefficients is model selection consistent, estimation consistent and prediction consistent and derives theoretical results establishing various types of consistency.
TL;DR: In this article, a new technique is introduced for estimating the link function nonparametrically, and an approach to multi-index modeling is proposed using adaptively defined linear projections of functional data.
Abstract: Fully nonparametric methods for regression from functional data have poor accuracy from a statistical viewpoint, reflecting the fact that their convergence rates are slower than nonparametric rates for the estimation of high-dimensional functions. This difficulty has led to an emphasis on the so-called functional linear model, which is much more flexible than common linear models in finite dimension, but nevertheless imposes structural constraints on the relationship between predictors and responses. Recent advances have extended the linear approach by using it in conjunction with link functions, and by considering multiple indices, but the flexibility of this technique is still limited. For example, the link may be modeled parametrically or on a grid only, or may be constrained by an assumption such as monotonicity; multiple indices have been modeled by making finite-dimensional assumptions. In this paper we introduce a new technique for estimating the link function nonparametrically, and we suggest an approach to multi-index modeling using adaptively defined linear projections of functional data. We show that our methods enable prediction with polynomial convergence rates. The finite sample performance of our methods is studied in simulations, and is illustrated by an application to a functional regression problem.
TL;DR: The forecasting model with which the author participated in the NN5 forecasting competition is introduced; it utilizes the concept of forecast combination, which has proven to be an effective methodology in the forecasting literature.
TL;DR: In this article, a model-based supervisory and optimal control strategy for central chiller plants is presented to enhance their energy efficiency and control performance. The optimal strategy is formulated using simplified models of major components and the genetic algorithm (GA).
TL;DR: The results show that the traditional recursive partial least squares algorithm struggles to deliver accurate predictions, and by exploiting the two-level adaptation scheme, the proposed algorithm delivers more accurate results.
TL;DR: It is shown that, although the predicted values can vary with the assumed distribution, the prediction accuracy is little affected for mild-to-moderate violations of the assumptions, and standard approaches, readily available in statistical software, will often suffice.
Abstract: Statistical models that include random effects are commonly used to analyze longitudinal and correlated data, often with the assumption that the random effects follow a Gaussian distribution. Via theoretical and numerical calculations and simulation, we investigate the impact of misspecification of this distribution on both how well the predicted values recover the true underlying distribution and the accuracy of prediction of the realized values of the random effects. We show that, although the predicted values can vary with the assumed distribution, the prediction accuracy, as measured by mean square error, is little affected for mild-to-moderate violations of the assumptions. Thus, standard approaches, readily available in statistical software, will often suffice. The results are illustrated using data from the Heart and Estrogen/Progestin Replacement Study using models to predict future blood pressure values.