TL;DR: In this article, a non-parametric method for multivariate analysis of variance, based on sums of squared distances, is proposed as an alternative to the traditional multivariate analogues, whose assumptions are too stringent for most ecological multivariate data sets.
Abstract: Hypothesis-testing methods for multivariate data are needed to make rigorous probability statements about the effects of factors and their interactions in experiments. Analysis of variance is particularly powerful for the analysis of univariate data. The traditional multivariate analogues, however, are too stringent in their assumptions for most ecological multivariate data sets. Non-parametric methods, based on permutation tests, are preferable. This paper describes a new non-parametric method for multivariate analysis of variance, after McArdle and Anderson (in press). It is given here, with several applications in ecology, to provide an alternative and perhaps more intuitive formulation for ANOVA (based on sums of squared distances) to complement the description provided by McArdle and Anderson (in press) for the analysis of any linear model. It is an improvement on previous non-parametric methods because it allows a direct additive partitioning of variation for complex models. It does this while maintaining the flexibility and lack of formal assumptions of other non-parametric methods. The test statistic is a multivariate analogue to Fisher's F-ratio and is calculated directly from any symmetric distance or dissimilarity matrix. P-values are then obtained using permutations. Some examples of the method are given for tests involving several factors, including factorial and hierarchical (nested) designs and tests of interactions.
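Illustration: the abstract describes an F-ratio analogue computed from sums of squared distances, with P-values from permutations. A minimal sketch of the one-way case follows. The function names (`pseudo_f`, `permutation_test`) and the choice of 999 permutations are assumptions made for this example, not taken from the paper, and the multi-factor designs mentioned in the abstract are not covered.

```python
import random


def pseudo_f(dist, labels):
    """One-way pseudo-F from a symmetric distance matrix (list of lists)."""
    n = len(labels)
    groups = sorted(set(labels))
    a = len(groups)
    # Total sum of squares: squared distances over all pairs, divided by N.
    ss_total = sum(dist[i][j] ** 2
                   for i in range(n) for j in range(i + 1, n)) / n
    # Within-group sum of squares: same idea, within each group separately.
    ss_within = 0.0
    for g in groups:
        idx = [i for i in range(n) if labels[i] == g]
        ss_within += sum(dist[i][j] ** 2
                         for k, i in enumerate(idx)
                         for j in idx[k + 1:]) / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permutation_test(dist, labels, n_perm=999, seed=0):
    """Return (observed pseudo-F, permutation P-value) by shuffling labels."""
    rng = random.Random(seed)
    f_obs = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= f_obs:
            hits += 1
    return f_obs, hits / (n_perm + 1)
```

For a single variable with Euclidean distances this reduces to the classical F-ratio: the values 1, 2, 3 in one group and 5, 6, 7 in another give a pseudo-F of 24, the same as a univariate one-way ANOVA.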
TL;DR: This book covers regression modeling strategies, including multivariable modeling, missing data, resampling and validation, logistic and ordinal regression, nonparametric transformations of X and Y, and survival analysis, with case studies throughout.
Abstract: Introduction * General Aspects of Fitting Regression Models * Missing Data * Multivariable Modeling Strategies * Resampling, Validating, Describing, and Simplifying the Model * S-PLUS Software * Case Study in Least Squares Fitting and Interpretation of a Linear Model * Case Study in Imputation and Data Reduction * Overview of Maximum Likelihood Estimation * Binary Logistic Regression * Logistic Model Case Study 1: Predicting Cause of Death * Logistic Model Case Study 2: Survival of Titanic Passengers * Ordinal Logistic Regression * Case Study in Ordinal Regression, Data Reduction, and Penalization * Models Using Nonparametric Transformations of X and Y * Introduction to Survival Analysis * Parametric Survival Models * Case Study in Parametric Survival Modeling and Model Approximation * Cox Proportional Hazards Regression Model * Case Study in Cox Regression
TL;DR: This paper shows that the variability in ecological data can be partitioned according to a complex design or model directly from the distance matrix, without the eigenanalysis and negative-eigenvalue correction used in distance-based redundancy analysis (db-RDA), even when the distance measure is semimetric.
Abstract: Nonparametric multivariate analysis of ecological data using permutation tests has two main challenges: (1) to partition the variability in the data according to a complex design or model, as is often required in ecological experiments, and (2) to base the analysis on a multivariate distance measure (such as the semimetric Bray-Curtis measure) that is reasonable for ecological data sets. Previous nonparametric methods have succeeded in one or other of these areas, but not in both. A recent contribution to Ecological Monographs by Legendre and Anderson, called distance-based redundancy analysis (db-RDA), does achieve both. It does this by calculating principal coordinates and subsequently correcting for negative eigenvalues, if they are present, by adding a constant to squared distances. We show here that such a correction is not necessary. Partitioning can be achieved directly from the distance matrix itself, with no corrections and no eigenanalysis, even if the distance measure used is semimetric. An ecological example is given to show the differences in these statistical methods. Empirical simulations, based on parameters estimated from real ecological species abundance data, showed that db-RDA done on multifactorial designs (using the correction) does not have type 1 error consistent with the significance level chosen for the analysis (i.e., does not provide an exact test), whereas the direct method described and advocated here does.
TL;DR: Estimation is improved by using nonlinear spatial filtering to smooth the estimated autocorrelation, but only within tissue type, which reduces bias to close to zero at probability levels as low as 1 × 10^(-5).
TL;DR: This book covers fixed effects, random effects, and mixed models, including linear models, generalized linear models, linear and generalized linear mixed models, models for longitudinal data, marginal models, prediction of random effects, and computing.
Abstract: Preface. Preface to the First Edition. 1. Introduction. 1.1 Models. 1.2 Factors, Levels, Cells, Effects And Data. 1.3 Fixed Effects Models. 1.4 Random Effects Models. 1.5 Linear Mixed Models (LMMs). 1.6 Fixed Or Random? 1.7 Inference. 1.8 Computer Software. 1.9 Exercises. 2. One-Way Classifications. 2.1 Normality And Fixed Effects. 2.2 Normality, Random Effects And MLE. 2.3 Normality, Random Effects And REML. 2.4 More On Random Effects And Normality. 2.5 Binary Data: Fixed Effects. 2.6 Binary Data: Random Effects. 2.7 Computing. 2.8 Exercises. 3. Single-Predictor Regression. 3.1 Introduction. 3.2 Normality: Simple Linear Regression. 3.3 Normality: A Nonlinear Model. 3.4 Transforming Versus Linking. 3.5 Random Intercepts: Balanced Data. 3.6 Random Intercepts: Unbalanced Data. 3.7 Bernoulli - Logistic Regression. 3.8 Bernoulli - Logistic With Random Intercepts. 3.9 Exercises. 4. Linear Models (LMs). 4.1 A General Model. 4.2 A Linear Model For Fixed Effects. 4.3 MLE Under Normality. 4.4 Sufficient Statistics. 4.5 Many Apparent Estimators. 4.6 Estimable Functions. 4.7 A Numerical Example. 4.8 Estimating Residual Variance. 4.9 Comments On The 1- And 2-Way Classifications. 4.10 Testing Linear Hypotheses. 4.11 T-Tests And Confidence Intervals. 4.12 Unique Estimation Using Restrictions. 4.13 Exercises. 5. Generalized Linear Models (GLMs). 5.1 Introduction. 5.2 Structure Of The Model. 5.3 Transforming Versus Linking. 5.4 Estimation By Maximum Likelihood. 5.5 Tests Of Hypotheses. 5.6 Maximum Quasi-Likelihood. 5.7 Exercises. 6. Linear Mixed Models (LMMs). 6.1 A General Model. 6.2 Attributing Structure To VAR(y). 6.3 Estimating Fixed Effects For V Known. 6.4 Estimating Fixed Effects For V Unknown. 6.5 Predicting Random Effects For V Known. 6.6 Predicting Random Effects For V Unknown. 6.7 Anova Estimation Of Variance Components. 6.8 Maximum Likelihood (ML) Estimation. 6.9 Restricted Maximum Likelihood (REML). 6.10 Notes And Extensions. 6.11 Appendix For Chapter 6.
6.12 Exercises. 7. Generalized Linear Mixed Models. 7.1 Introduction. 7.2 Structure Of The Model. 7.3 Consequences Of Having Random Effects. 7.4 Estimation By Maximum Likelihood. 7.5 Other Methods Of Estimation. 7.6 Tests Of Hypotheses. 7.7 Illustration: Chestnut Leaf Blight. 7.8 Exercises. 8. Models for Longitudinal data. 8.1 Introduction. 8.2 A Model For Balanced Data. 8.3 A Mixed Model Approach. 8.4 Random Intercept And Slope Models. 8.5 Predicting Random Effects. 8.6 Estimating Parameters. 8.7 Unbalanced Data. 8.8 Models For Non-Normal Responses. 8.9 A Summary Of Results. 8.10 Appendix. 8.11 Exercises. 9. Marginal Models. 9.1 Introduction. 9.2 Examples Of Marginal Regression Models. 9.3 Generalized Estimating Equations. 9.4 Contrasting Marginal And Conditional Models. 9.5 Exercises. 10. Multivariate Models. 10.1 Introduction. 10.2 Multivariate Normal Outcomes. 10.3 Non-Normally Distributed Outcomes. 10.4 Correlated Random Effects. 10.5 Likelihood Based Analysis. 10.6 Example: Osteoarthritis Initiative. 10.7 Notes And Extensions. 10.8 Exercises. 11. Nonlinear Models. 11.1 Introduction. 11.2 Example: Corn Photosynthesis. 11.3 Pharmacokinetic Models. 11.4 Computations For Nonlinear Mixed Models. 11.5 Exercises. 12. Departures From Assumptions. 12.1 Introduction. 12.2 Misspecifications Of Conditional Model For Response. 12.3 Misspecifications Of Random Effects Distribution. 12.4 Methods To Diagnose And Correct For Misspecifications. 12.5 Exercises. 13. Prediction. 13.1 Introduction. 13.2 Best Prediction (BP). 13.3 Best Linear Prediction (BLP). 13.4 Linear Mixed Model Prediction (BLUP). 13.5 Required Assumptions. 13.6 Estimated Best Prediction. 13.7 Henderson's Mixed Model Equations. 13.8 Appendix. 13.9 Exercises. 14. Computing. 14.1 Introduction. 14.2 Computing Ml Estimates For LMMs. 14.3 Computing Ml Estimates For GLMMs. 14.4 Penalized Quasi-Likelihood And Laplace. 14.5 Exercises. Appendix M: Some Matrix Results. M.1 Vectors And Matrices Of Ones. 
M.2 Kronecker (Or Direct) Products. M.3 A Matrix Notation. M.4 Generalized Inverses. M.5 Differential Calculus. Appendix S: Some Statistical Results. S.1 Moments. S.2 Normal Distributions. S.3 Exponential Families. S.4 Maximum Likelihood. S.5 Likelihood Ratio Tests. S.6 MLE Under Normality. References. Index.
TL;DR: This work proposes a modification to AIC, where the likelihood is replaced by the quasi-likelihood and a proper adjustment is made for the penalty term.
Abstract: Correlated response data are common in biomedical studies. Regression analysis based on the generalized estimating equations (GEE) is an increasingly important method for such data. However, there seem to be few model-selection criteria available in GEE. The well-known Akaike Information Criterion (AIC) cannot be directly applied since AIC is based on maximum likelihood estimation while GEE is nonlikelihood based. We propose a modification to AIC, where the likelihood is replaced by the quasi-likelihood and a proper adjustment is made for the penalty term. Its performance is investigated through simulation studies. For illustration, the method is applied to a real data set.
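Illustration: the abstract describes the criterion only in words (likelihood replaced by quasi-likelihood, penalty term adjusted). The form below is how the criterion is commonly stated elsewhere; the symbols are assumptions for this note and do not appear in the abstract. Here $Q$ is the quasi-likelihood evaluated under the independence working correlation $I$ at the GEE estimate $\hat\beta(R)$ obtained with working correlation $R$, $\hat\Omega_I$ is the inverse of the model-based covariance estimate under independence, and $\hat V_R$ is the robust (sandwich) covariance estimate.

```latex
\[
  \mathrm{QIC}(R) \;=\; -2\,Q\bigl(\hat\beta(R);\, I\bigr)
  \;+\; 2\,\operatorname{trace}\bigl(\hat\Omega_I\,\hat V_R\bigr)
\]
```

Read this way, the trace term plays the role of the parameter count in AIC: when the model-based and robust covariance estimates agree, the trace is approximately the number of regression parameters $p$, and the penalty reduces to $2p$.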
TL;DR: The authors present a fully constrained least squares (FCLS) linear spectral mixture analysis method for material quantification; because no closed form can be derived for this method, an efficient algorithm is developed to yield optimal solutions.
Abstract: Linear spectral mixture analysis (LSMA) is a widely used technique in remote sensing to estimate abundance fractions of materials present in an image pixel. In order for an LSMA-based estimator to produce accurate amounts of material abundance, it generally requires two constraints imposed on the linear mixture model used in LSMA, which are the abundance sum-to-one constraint and the abundance nonnegativity constraint. The first constraint requires the sum of the abundance fractions of materials present in an image pixel to be one and the second imposes a constraint that these abundance fractions be nonnegative. While the first constraint is easy to deal with, the second constraint is difficult to implement since it results in a set of inequalities and can only be solved by numerical methods. Consequently, most LSMA-based methods are unconstrained and produce solutions that do not necessarily reflect the true abundance fractions of materials. In this case, they can only be used for the purposes of material detection, discrimination, and classification, but not for material quantification. The authors present a fully constrained least squares (FCLS) linear spectral mixture analysis method for material quantification. Since no closed form can be derived for this method, an efficient algorithm is developed to yield optimal solutions. In order to further apply the designed algorithm to unknown image scenes, an unsupervised least squares error (LSE)-based method is also proposed to extend the FCLS method in an unsupervised manner.
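Illustration: the two constraints in the abstract (abundances sum to one and are nonnegative) define a least squares problem over the probability simplex. The sketch below solves that problem with projected gradient descent. This is a generic optimizer substituted here for illustration, not the authors' FCLS algorithm, and the names `project_to_simplex` and `unmix`, the step size, and the iteration count are all assumptions.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {a : a >= 0, sum(a) = 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        t = (cumulative - 1.0) / k
        if u_k - t > 0:
            theta = t  # keep the threshold from the last k that qualifies
    return [max(x - theta, 0.0) for x in v]


def unmix(endmembers, pixel, n_iter=5000):
    """Estimate abundance fractions for one pixel.

    endmembers: list of p material spectra, each a list with one value per band.
    pixel: observed spectrum (one value per band).
    Minimizes 0.5 * ||sum_k a_k * endmembers[k] - pixel||^2 subject to
    a_k >= 0 and sum(a) = 1.
    """
    p = len(endmembers)
    bands = len(pixel)
    # The squared Frobenius norm bounds the largest eigenvalue of M^T M,
    # so its reciprocal is a safe (if conservative) step size.
    step = 1.0 / sum(x * x for e in endmembers for x in e)
    a = [1.0 / p] * p  # start from equal abundances
    for _ in range(n_iter):
        resid = [sum(a[k] * endmembers[k][b] for k in range(p)) - pixel[b]
                 for b in range(bands)]
        grad = [sum(endmembers[k][b] * resid[b] for b in range(bands))
                for k in range(p)]
        a = project_to_simplex([a[k] - step * grad[k] for k in range(p)])
    return a
```

Every iterate satisfies both constraints by construction, so the returned fractions can be read as quantities rather than only as detection scores. The trade-off is speed: a dedicated algorithm such as the one the abstract refers to would be expected to converge in far fewer steps.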
TL;DR: This paper provides a summary of recent empirical and theoretical results concerning available methods and gives recommendations for their use in univariate and multivariate applications.
Abstract: The most appropriate strategy to be used to create a permutation distribution for tests of individual terms in complex experimental designs is currently unclear. There are often many possibilities, including restricted permutation or permutation of some form of residuals. This paper provides a summary of recent empirical and theoretical results concerning available methods and gives recommendations for their use in univariate and multivariate applications. The focus of the paper is on complex designs in analysis of variance and multiple regression (i.e., linear models). The assumption of exchangeability required for a permutation test is assured by random allocation of treatments to units in experimental work. For observational data, exchangeability is tantamount to the assumption of independent and identically distributed errors under a null hypothesis. For partial regression, the method of permutation of residuals under a reduced model has been shown to provide the best test. For analysis of variance, o...
TL;DR: In this paper, the authors propose a partially noninformative prior structure related to a Natural Conjugate g-prior specification, where the amount of subjective information requested from the user is limited to the choice of a single scalar hyperparameter g0j.
TL;DR: This article reviews the principle of minimum description length (MDL) for problems of model selection, and illustrates the MDL principle by considering problems in regression, nonparametric curve estimation, cluster analysis, and time series analysis.
Abstract: This article reviews the principle of minimum description length (MDL) for problems of model selection. By viewing statistical modeling as a means of generating descriptions of observed data, the MDL framework discriminates between competing models based on the complexity of each description. This approach began with Kolmogorov's theory of algorithmic complexity, matured in the literature on information theory, and has recently received renewed attention within the statistics community. Here we review both the practical and the theoretical aspects of MDL as a tool for model selection, emphasizing the rich connections between information theory and statistics. At the boundary between these two disciplines we find many interesting interpretations of popular frequentist and Bayesian procedures. As we show, MDL provides an objective umbrella under which rather disparate approaches to statistical modeling can coexist and be compared. We illustrate the MDL principle by considering problems in regression, nonpar...
TL;DR: This book surveys statistical models, from the classical linear model through nonlinear, generalized linear, and mixed models for clustered data, to models for spatial data, with supporting material on inference, data structures, and linear algebra.
Abstract: Statistical Models Mathematical and Statistical Models Functional Aspects of Models The Inferential Steps: Estimation and Testing t-Tests in Terms of Statistical Models Embedding Hypotheses Hypothesis and Significance Testing: Interpretation of the p-Value Classes of Statistical Models Data Structures Introduction Classification by Response Type Classification by Study Type Clustered Data Autocorrelated Data From Independent to Spatial Data: A Progression of Clustering Linear Algebra Tools Introduction Matrices and Vectors Basic Matrix Operations Matrix Inversion: Regular and Generalized Inverse Mean, Variance, and Covariance of Random Vectors The Trace and Expectation of Quadratic Forms The Multivariate Gaussian Distribution Matrix and Vector Differentiation Using Matrix Algebra to Specify Models The Classical Linear Model: Least Squares and Alternatives Introduction Least Squares Estimation and Partitioning of Variation Factorial Classification Diagnosing Regression Models Diagnosing Classification Models Robust Estimation Nonparametric Regression Nonlinear Models Introduction Models as Laws or Tools Linear Polynomials Approximate Nonlinear Models Fitting a Nonlinear Model to Data Hypothesis Tests and Confidence Intervals Transformations Parameterization of Nonlinear Models Applications Generalized Linear Models Introduction Components of a Generalized Linear Model Grouped and Ungrouped Data Parameter Estimation and Inference Modeling an Ordinal Response Overdispersion Applications Linear Mixed Models for Clustered Data Introduction The Laird-Ware Model Choosing the Inference Space Estimation and Inference Correlations in Mixed Models Applications Nonlinear Models for Clustered Data Introduction Nonlinear and Generalized Linear Mixed Models Towards an Approximate Objective Function Applications Statistical Models for Spatial Data Changing the Mindset Semivariogram Analysis and Estimation The Spatial Model Spatial Prediction and the Kriging Paradigm
Spatial Regression and Classification Models Autoregressive Models for Lattice Data Analyzing Mapped Spatial Point Patterns Applications Bibliography
TL;DR: In this article, the authors provide a review of linear panel data models with predetermined variables and compare the identification from moment conditions in each case, and the implications of alternative feedback schemes for the time series properties of the errors.
Abstract: This chapter focuses on two of the developments in panel data econometrics since the Handbook chapter by Chamberlain (1984). The first objective of this chapter is to provide a review of linear panel data models with predetermined variables. We discuss the implications of assuming that explanatory variables are predetermined as opposed to strictly exogenous in dynamic structural equations with unobserved heterogeneity. We compare the identification from moment conditions in each case, and the implications of alternative feedback schemes for the time series properties of the errors. We next consider autoregressive error component models under various auxiliary assumptions. There is a trade-off between robustness and efficiency since assumptions of stationary initial conditions or time series homoskedasticity can be very informative, but estimators are not robust to their violation. We also discuss the identification problems that arise in models with predetermined variables and multiple effects. Concerning inference in linear models with predetermined variables, we discuss the form of optimal instruments, and the sampling properties of GMM and LIML-analogue estimators drawing on Monte Carlo results and asymptotic approximations. A number of identification results for limited dependent variable models with fixed effects and strictly exogenous variables are available in the literature, as well as some results on consistent and asymptotically normal estimation of such models. There are also some results available for models of this type including lags of the dependent variable, although even less is known for nonlinear dynamic models. Reviewing the recent work on discrete choice and selectivity models with fixed effects is the second objective of this chapter. A feature of parametric limited dependent variable models is their fragility to auxiliary distributional assumptions. 
This situation prompted the development of a large literature dealing with semiparametric alternatives (reviewed in the chapter by Powell, 1994). The work that we review in the second part of the chapter is thus at the intersection of the panel data literature and that on cross-sectional semiparametric limited dependent variable models.
TL;DR: This work derives sufficient conditions for the stability of moving horizon state estimation with linear models subject to constraints on the estimate, and discusses smoothing strategies for moving horizon estimation.
TL;DR: In this paper, the authors compare the distributions of the test statistics under various permutation methods and show that the partial correlations under permutation are asymptotically jointly normal with means 0 and variances 1.
Abstract: Summary Several approximate permutation tests have been proposed for tests of partial regression coefficients in a linear model based on sample partial correlations. This paper begins with an explanation and notation for an exact test. It then compares the distributions of the test statistics under the various permutation methods proposed, and shows that the partial correlations under permutation are asymptotically jointly normal with means 0 and variances 1. The method of Freedman & Lane (1983) is found to have asymptotic correlation 1 with the exact test, and the other methods are found to have smaller correlations with this test. Under local alternatives the critical values of all the approximate permutation tests converge to the same constant, so they all have the same asymptotic power. Simulations demonstrate these theoretical results.
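Illustration: the Freedman & Lane (1983) method named in the abstract permutes residuals from the reduced model (the model without the term under test). The sketch below handles the simplest case, one predictor of interest `x` and one covariate `z`, and uses the squared partial correlation as the test statistic. The function names, the single-covariate restriction, and the default of 999 permutations are assumptions for this example.

```python
import random


def residualize(v, z):
    """Least-squares regression of v on z with intercept.

    Returns (fitted values, residuals).
    """
    n = len(v)
    mean_z = sum(z) / n
    mean_v = sum(v) / n
    slope = (sum((zi - mean_z) * (vi - mean_v) for zi, vi in zip(z, v))
             / sum((zi - mean_z) ** 2 for zi in z))
    fitted = [mean_v + slope * (zi - mean_z) for zi in z]
    return fitted, [vi - fi for vi, fi in zip(v, fitted)]


def partial_r2(y, x, z):
    """Squared partial correlation between y and x, given z."""
    _, res_y = residualize(y, z)
    _, res_x = residualize(x, z)
    sxy = sum(a * b for a, b in zip(res_x, res_y))
    return sxy * sxy / (sum(a * a for a in res_x) * sum(b * b for b in res_y))


def freedman_lane(y, x, z, n_perm=999, seed=0):
    """Permutation test of the partial effect of x on y, given z."""
    rng = random.Random(seed)
    observed = partial_r2(y, x, z)
    # Reduced model: y regressed on z only.
    fitted, resid = residualize(y, z)
    hits = 1  # the observed data count as one arrangement
    for _ in range(n_perm):
        shuffled = resid[:]
        rng.shuffle(shuffled)
        # Rebuild a response under the null by adding permuted residuals back.
        y_star = [f + r for f, r in zip(fitted, shuffled)]
        if partial_r2(y_star, x, z) >= observed:
            hits += 1
    return observed, hits / (n_perm + 1)
```

Because only the reduced-model residuals are shuffled, the relationship between `y` and `z` is preserved in every permuted data set, which is what distinguishes this scheme from permuting the raw responses.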
TL;DR: In this article, the authors consider the nonparametric and semiparametric methods for estimating regression models with continuous endogenous regressors and identify the "average structural function" as a parameter of central interest.
Abstract: This paper considers the nonparametric and semiparametric methods for estimating regression models with continuous endogenous regressors. We list a number of different generalizations of the linear structural equation model, and discuss how two common estimation approaches for linear equations (the "instrumental variables" and "control function" approaches) may be extended to nonparametric generalizations of the linear model and to their semiparametric variants. We consider the identification and estimation of the "Average Structural Function" and argue that this is a parameter of central interest in the analysis of semiparametric and nonparametric models with endogenous regressors. We consider a particular semiparametric model, the binary response model with linear index function and nonparametric error distribution, and describe in detail how estimation of the parameters of interest can be constructed using the "control function" approach. This estimator is applied to estimating the relation of labor force participation to nonlabor income, viewed as an endogenous regressor.
TL;DR: In this article, the authors extend the two-part regression approach to longitudinal settings by introducing random coefficients into both the logistic and the linear stages, and obtain maximum likelihood estimates for the fixed coefficients and variance components by an approximate Fisher scoring procedure based on high-order Laplace approximations.
Abstract: A semicontinuous variable has a portion of responses equal to a single value (typically 0) and a continuous, often skewed, distribution among the remaining values. In cross-sectional analyses, variables of this type may be described by a pair of regression models; for example, a logistic model for the probability of nonzero response and a conditional linear model for the mean response given that it is nonzero. We extend this two-part regression approach to longitudinal settings by introducing random coefficients into both the logistic and the linear stages. Fitting a two-part random-effects model poses computational challenges similar to those found with generalized linear mixed models. We obtain maximum likelihood estimates for the fixed coefficients and variance components by an approximate Fisher scoring procedure based on high-order Laplace approximations. To illustrate, we apply the technique to data from the Adolescent Alcohol Prevention Trial, examining reported recent alcohol use for students in g...
TL;DR: This book introduces generalized linear models, covering linear and nonlinear regression, logistic and Poisson regression, generalized estimating equations, random effects, and designed experiments, with R used for the analyses.
Abstract: Preface. 1. Introduction to Generalized Linear Models. 1.1 Linear Models. 1.2 Nonlinear Models. 1.3 The Generalized Linear Model. 2. Linear Regression Models. 2.1 The Linear Regression Model and Its Application. 2.2 Multiple Regression Models. 2.3 Parameter Estimation Using Maximum Likelihood. 2.4 Model Adequacy Checking. 2.5 Using R to Perform Linear Regression Analysis. 2.6 Parameter Estimation by Weighted Least Squares. 2.7 Designs for Regression Models. 3. Nonlinear Regression Models. 3.1 Linear and Nonlinear Regression Models. 3.2 Transforming to a Linear Model. 3.3 Parameter Estimation in a Nonlinear System. 3.4 Statistical Inference in Nonlinear Regression. 3.5 Weighted Nonlinear Regression. 3.6 Examples of Nonlinear Regression Models. 3.7 Designs for Nonlinear Regression Models. 4. Logistic and Poisson Regression Models. 4.1 Regression Models Where the Variance Is a Function of the Mean. 4.2 Logistic Regression Models. 4.3 Poisson Regression. 4.4 Overdispersion in Logistic and Poisson Regression. 5. The Generalized Linear Model. 5.1 The Exponential Family of Distributions. 5.2 Formal Structure for the Class of Generalized Linear Models. 5.3 Likelihood Equations for Generalized Linear Models. 5.4 Quasi-Likelihood. 5.5 Other Important Distributions for Generalized Linear Models. 5.6 A Class of Link Functions: The Power Function. 5.7 Inference and Residual Analysis for Generalized Linear Models. 5.8 Examples with the Gamma Distribution. 5.9 Using R to Perform GLM Analysis. 5.10 GLM and Data Transformation. 5.11 Modeling Both a Process Mean and Process Variance Using GLM. 5.12 Quality of Asymptotic Results and Related Issues. 6. Generalized Estimating Equations. 6.1 Data Layout for Longitudinal Studies. 6.2 Impact of the Correlation Matrix R. 6.3 Iterative Procedure in the Normal Case, Identity Link. 6.4 Generalized Estimating Equations for More Generalized Linear Models. 6.5 Examples. 6.6 Summary. 7. Random Effects in Generalized Linear Models.
7.1 Linear Mixed Effects Models. 7.2 Generalized Linear Mixed Models. 7.3 Generalized Linear Mixed Models Using Bayesian. 8. Designed Experiments and the Generalized Linear Model. 8.1 Introduction. 8.2 Experimental Designs for Generalized Linear Models. 8.3 GLM Analysis of Screening Experiments. Appendix A.1 Background on Basic Test Statistics. Appendix A.2 Background from the Theory of Linear Models. Appendix A.3 The Gauss-Markov Theorem, Var(ε) = σ²I. Appendix A.4 The Relationship Between Maximum Likelihood Estimation of the Logistic Regression Model and Weighted Least Squares. Appendix A.5 Computational Details for GLMs for a Canonical Link. Appendix A.6 Computational Details for GLMs for a Noncanonical Link. References. Index.
TL;DR: The residual index is an ad hoc sequential procedure with no statistical justification, unlike the well-known ancova, and it is suggested that a t-test or an anova of the residuals should never be used in place of an ancova to study condition or any other variable.
Abstract: Summary
1. An analysis of variance (anova) or other linear models of the residuals of a simple linear regression is being increasingly used in ecology to compare two or more groups. Such a procedure (hereafter, ‘residual index’) was used in 8% and 2% of the papers published during 1999 in the Journal of Animal Ecology and in Ecology, respectively, and has been recently recommended for studying condition.
2. Although the residual index is similar to an analysis of covariance (ancova), it is not identical and is incorrect for at least four reasons:
(i) the regression coefficient used by the residual index differs from the one used in ancova and is not the least-squares estimator of the model.
(ii) in contrast to the ancova, the error d.f. in the residual index are overestimated because of the estimation of the regression coefficient.
(iii) the residual index also assumes the homogeneity of regression coefficients (parallelism assumption), which should be tested with a special ancova design.
(iv) even if the assumptions of the linear model hold for the original variables, they will not hold for the residuals.
3. More importantly, the residual index is an ad hoc sequential procedure with no statistical justification, unlike the well-known ancova. For these reasons, I suggest that a t-test or an anova of the residuals should never be used in place of an ancova to study condition or any other variable.
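Illustration: reason (i) above can be seen on a small made-up data set with two groups. The sketch compares the group difference estimated from residuals of a single pooled regression (the residual index) with the ancova-adjusted difference based on the common within-group slope. The data and function names are invented for this example and are not from the paper.

```python
def mean(values):
    return sum(values) / len(values)


def pooled_slope(x, y):
    """Slope of one regression of y on x that ignores group membership."""
    mx, my = mean(x), mean(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))


def within_group_slope(x, y, groups):
    """Common slope estimated from deviations around each group's own means."""
    sxy = sxx = 0.0
    for g in set(groups):
        xs = [a for a, lab in zip(x, groups) if lab == g]
        ys = [b for b, lab in zip(y, groups) if lab == g]
        mx, my = mean(xs), mean(ys)
        sxy += sum((a - mx) * (b - my) for a, b in zip(xs, ys))
        sxx += sum((a - mx) ** 2 for a in xs)
    return sxy / sxx


def adjusted_difference(x, y, groups, slope, first, second):
    """Difference in mean y (second minus first) after removing slope * x."""
    def group_mean(values, g):
        return mean([v for v, lab in zip(values, groups) if lab == g])
    return ((group_mean(y, second) - group_mean(y, first))
            - slope * (group_mean(x, second) - group_mean(x, first)))


# Two groups with the same within-group slope (1) and intercepts 0 and 2.
x = [1, 2, 3, 4, 5, 6]
y = [1, 2, 3, 6, 7, 8]
groups = ["A", "A", "A", "B", "B", "B"]

# Residual index: difference in mean residuals from the pooled regression.
residual_index_diff = adjusted_difference(
    x, y, groups, pooled_slope(x, y), "A", "B")
# Ancova: difference adjusted with the common within-group slope.
ancova_diff = adjusted_difference(
    x, y, groups, within_group_slope(x, y, groups), "A", "B")
```

With these data the ancova-adjusted difference is 2, matching the intercept gap built into the example, while the residual index gives about 0.46, because the pooled slope (about 1.51) absorbs part of the group effect. Reason (ii) would add that an anova of the residuals uses 4 error d.f. here where ancova uses 3.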
TL;DR: A method of analyzing collections of related curves in which the individual curves are modeled as spline functions with random coefficients, which produces a low-rank, low-frequency approximation to the covariance structure, which can be estimated naturally by the EM algorithm.
Abstract: We propose a method of analyzing collections of related curves in which the individual curves are modeled as spline functions with random coefficients. The method is applicable when the individual curves are sampled at variable and irregularly spaced points. This produces a low-rank, low-frequency approximation to the covariance structure, which can be estimated naturally by the EM algorithm. Smooth curves for individual trajectories are constructed as best linear unbiased predictor (BLUP) estimates, combining data from that individual and the entire collection. This framework leads naturally to methods for examining the effects of covariates on the shapes of the curves. We use model selection techniques--Akaike information criterion (AIC), Bayesian information criterion (BIC), and cross-validation--to select the number of breakpoints for the spline approximation. We believe that the methodology we propose provides a simple, flexible, and computationally efficient means of functional data analysis.
TL;DR: In this article, a hierarchical generalised linear model (GLM) is developed as a synthesis of generalized linear models, mixed linear models and structured dispersions, and a restricted maximum likelihood method for the estimation of dispersion is extended to a wider class of models.
Abstract: SUMMARY Hierarchical generalised linear models are developed as a synthesis of generalised linear models, mixed linear models and structured dispersions. We generalise the restricted maximum likelihood method for the estimation of dispersion to the wider class and show how the joint fitting of models for mean and dispersion can be expressed by two interconnected generalised linear models. The method allows models with (i) any combination of a generalised linear model distribution for the response with any conjugate distribution for the random effects, (ii) structured dispersion components, (iii) different link and variance functions for the fixed and random effects, and (iv) the use of quasilikelihoods in place of likelihoods for either or both of the mean and dispersion models. Inferences can be made by applying standard procedures, in particular those for model checking, to components of either generalised linear model. We also show by numerical studies that the new method gives an efficient estimation procedure for a substantial class of models of practical importance. Likelihood-type inference is extended to this wide class of models in a unified way.
TL;DR: In this article, a unified model-assisted framework has been attempted using a proposed model-calibration technique, which can handle any linear or nonlinear working models and reduce to the conventional calibration estimators of Deville and Sarndal and/or the generalized regression estimators in the linear model case.
Abstract: Suppose that the finite population consists of N identifiable units. Associated with the ith unit are the study variable, yi, and a vector of auxiliary variables, xi. The values x1, x2,…, xN are known for the entire population (i.e., complete) but yi is known only if the ith unit is selected in the sample. One of the fundamental questions is how to effectively use the complete auxiliary information at the estimation stage. In this article, a unified model-assisted framework has been attempted using a proposed model-calibration technique. The proposed model-calibration estimators can handle any linear or nonlinear working models and reduce to the conventional calibration estimators of Deville and Sarndal and/or the generalized regression estimators in the linear model case. The pseudoempirical maximum likelihood estimator of Chen and Sitter, when used in this setting, gives an estimator that is asymptotically equivalent to the model-calibration estimator but with positive weights. Some existing estimators ...
TL;DR: The two main approaches to predictive statistical modeling, content-based and collaborative, are reviewed, and the main techniques used to develop predictive statistical models are discussed.
Abstract: The limitations of traditional knowledge representation methods for modeling complex human behaviour led to the investigation of statistical models. Predictive statistical models enable the anticipation of certain aspects of human behaviour, such as goals, actions and preferences. In this paper, we motivate the development of these models in the context of the user modeling enterprise. We then review the two main approaches to predictive statistical modeling, content-based and collaborative, and discuss the main techniques used to develop predictive statistical models. We also consider the evaluation requirements of these models in the user modeling context, and propose topics for future research.
TL;DR: In this paper, different specifications of conditional expectations are compared with nonparametric techniques that make no assumptions about the distribution of the data, and the conditional mean and variance of the NYSE market return are examined.
TL;DR: It is shown that conditioned on an individual hyper-parameter, the marginal likelihood has a unique maximum which is computable in closed form, and it is further shown that if a derived 'sparsity criterion' is satisfied, this maximum is exactly equivalent to 'pruning' the corresponding parameter from the model.
Abstract: The recent introduction of the 'relevance vector machine' has effectively demonstrated how sparsity may be obtained in generalised linear models within a Bayesian framework. Using a particular form of Gaussian parameter prior, 'learning' is the maximisation, with respect to hyperparameters, of the marginal likelihood of the data. This paper studies the properties of that objective function, and demonstrates that conditioned on an individual hyper-parameter, the marginal likelihood has a unique maximum which is computable in closed form. It is further shown that if a derived 'sparsity criterion' is satisfied, this maximum is exactly equivalent to 'pruning' the corresponding parameter from the model.
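The closed-form maximum and the pruning rule described in this abstract can be sketched as follows. The abstract does not state the formula; the sketch assumes the form commonly used for this result, in which the part of the log marginal likelihood that depends on a single hyperparameter alpha is 0.5 * (log(alpha) - log(alpha + s) + q^2 / (alpha + s)), with s a "sparsity factor" and q a "quality factor". The function names and the numbers are illustrative.

```python
import math


def log_term(alpha, s, q):
    """Assumed single-hyperparameter part of the log marginal likelihood."""
    return 0.5 * (math.log(alpha) - math.log(alpha + s) + q * q / (alpha + s))


def optimal_alpha(s, q):
    """Maximiser of log_term over alpha > 0.

    Setting the derivative to zero gives alpha = s^2 / (q^2 - s), which is
    positive only when q^2 > s. Otherwise the term increases monotonically
    in alpha, so the maximum is at alpha = infinity and the corresponding
    parameter is pruned from the model.
    """
    if q * q > s:
        return s * s / (q * q - s)
    return math.inf


kept = optimal_alpha(s=2.0, q=3.0)    # finite: the basis function stays
pruned = optimal_alpha(s=2.0, q=1.0)  # infinite: the basis function is pruned
```

An infinite alpha corresponds to a zero-mean prior with zero variance on the weight, which is why it is equivalent to removing the parameter.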
TL;DR: It is demonstrated that standard information criteria may be used to choose the tuning parameter and detect departures from normality, and the approach is illustrated via simulation and using longitudinal data from the Framingham study.
Abstract: Normality of random effects is a routine assumption for the linear mixed model, but it may be unrealistic, obscuring important features of among-individual variation. We relax this assumption by approximating the random effects density by the seminonparametric (SNP) representation of Gallant and Nychka (1987, Econometrica 55, 363-390), which includes normality as a special case and provides flexibility in capturing a broad range of nonnormal behavior, controlled by a user-chosen tuning parameter. An advantage is that the marginal likelihood may be expressed in closed form, so inference may be carried out using standard optimization techniques. We demonstrate that standard information criteria may be used to choose the tuning parameter and detect departures from normality, and we illustrate the approach via simulation and using longitudinal data from the Framingham study.
TL;DR: In this article, the impact of model violations on the estimate of a regression coefficient in a generalised linear mixed model is investigated, and the authors evaluate the asymptotic relative bias that results from incorrect assumptions regarding the random effects.
Abstract: SUMMARY We investigate the impact of model violations on the estimate of a regression coefficient in a generalised linear mixed model. Specifically, we evaluate the asymptotic relative bias that results from incorrect assumptions regarding the random effects. We compare the impact of model violation for two parameterisations of the regression model. Substantial bias in the conditionally specified regression point estimators can result from using a simple random intercepts model when either the random effects distribution depends on measured covariates or there are autoregressive random effects. A marginally specified regression structure that is estimated using maximum likelihood is much less susceptible to bias resulting from random effects model misspecification.
TL;DR: In this paper, the authors describe strategies for solving large nonlinear water resources management models. The strategies combine GAs with linear programming by identifying a set of complicating variables in the model which, when fixed, render the problem linear in the remaining variables.
TL;DR: In this article, the authors consider models for panel data in which the individual effects vary over time; the temporal pattern of variation is arbitrary, but it is the same for all individuals.
TL;DR: This work reviews the developments following introduction of the connectivity indices as molecular descriptors in multiple linear regression analysis (MLRA) for structure-property-activity studies and discusses the results obtained with applications of the variable connectivity index.
Abstract: We review the developments following introduction of the connectivity indices as molecular descriptors in multiple linear regression analysis (MLRA) for structure-property-activity studies. We end the review with discussion of results obtained with applications of the variable connectivity index. A comparison is made between some results obtained with the traditional topological indices and the variable connectivity index.
TL;DR: A new method for evaluating the Jacobian term based on the characteristic polynomial of the spatial weights matrix W is outlined. The method approaches linear computational complexity, which makes it the fastest direct method currently available, especially for very large data sets.
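To make the link between the Jacobian term and the characteristic polynomial concrete, the sketch below assumes the usual spatial autoregressive form of the Jacobian, log det(I - rho * W), and uses the identity det(I - rho * W) = rho^n * p(1 / rho), where p(x) = det(x * I - W). The coefficients of p are obtained here with the textbook Faddeev-LeVerrier recursion, which is only suitable for small matrices; it is not the algorithm proposed in the paper, and all names are illustrative.

```python
import math


def char_poly_coeffs(W):
    """Coefficients [c_0, ..., c_n] of p(x) = det(x*I - W), with c_n = 1.

    Uses the Faddeev-LeVerrier recursion on plain nested lists.
    """
    n = len(W)
    coeffs = [0.0] * (n + 1)
    coeffs[n] = 1.0
    # M_1 is the identity matrix
    M = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(1, n + 1):
        # WM = W @ M_k
        WM = [[sum(W[i][t] * M[t][j] for t in range(n)) for j in range(n)]
              for i in range(n)]
        c = -sum(WM[i][i] for i in range(n)) / k
        coeffs[n - k] = c
        # M_{k+1} = W @ M_k + c_{n-k} * I
        M = [[WM[i][j] + (c if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    return coeffs


def log_jacobian(coeffs, rho):
    """log det(I - rho*W) from the coefficients; assumes the determinant is positive."""
    n = len(coeffs) - 1
    det = sum(c * rho ** (n - k) for k, c in enumerate(coeffs))
    return math.log(det)


# Two mutually neighbouring regions: p(x) = x^2 - 1, so det(I - rho*W) = 1 - rho^2.
W = [[0.0, 1.0], [1.0, 0.0]]
coeffs = char_poly_coeffs(W)
value = log_jacobian(coeffs, rho=0.5)
```

The coefficients depend only on W, so they are computed once; each new value of rho then costs a single polynomial evaluation.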