TL;DR: This paper describes simultaneous inference procedures in general parametric models, where the experimental questions are specified through a linear combination of elemental model parameters, and extends the canonical theory of multiple comparison procedures in ANOVA models to linear regression problems, generalized linear models, linear mixed effects models, the Cox model, robust linear models, etc.
Abstract: Simultaneous inference is a common problem in many areas of application. If multiple null hypotheses are tested simultaneously, the probability of rejecting erroneously at least one of them increases beyond the pre-specified significance level. Simultaneous inference procedures have to be used which adjust for multiplicity and thus control the overall type I error rate. In this paper we describe simultaneous inference procedures in general parametric models, where the experimental questions are specified through a linear combination of elemental model parameters. The framework described here is quite general and extends the canonical theory of multiple comparison procedures in ANOVA models to linear regression problems, generalized linear models, linear mixed effects models, the Cox model, robust linear models, etc. Several examples using a variety of different statistical models illustrate the breadth
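To make the multiplicity problem described in this abstract concrete, the following minimal sketch computes the probability of at least one false rejection when several true null hypotheses are each tested at the same level. It assumes independent tests, which is an illustrative assumption and not something the abstract states, and it uses the Bonferroni correction as a simple stand-in rather than the procedures the paper develops. The function names are hypothetical.

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """Probability of at least one false rejection among m true nulls.

    Assumes the m tests are independent (illustrative assumption).
    """
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_level(alpha: float, m: int) -> float:
    """Per-test level that keeps the familywise error rate at or below alpha."""
    return alpha / m


if __name__ == "__main__":
    alpha = 0.05
    for m in (1, 5, 20):
        unadjusted = familywise_error_rate(alpha, m)
        adjusted = familywise_error_rate(bonferroni_level(alpha, m), m)
        print(f"m={m:2d}  unadjusted={unadjusted:.4f}  bonferroni={adjusted:.4f}")
```

With 20 independent tests at level 0.05, the unadjusted probability of at least one false rejection is well above the nominal level, while the Bonferroni-adjusted rate stays at or below it.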
TL;DR: It is shown that, under a sparsity scenario, the Lasso estimator and the Dantzig selector exhibit similar behavior. For both methods, parallel oracle inequalities are derived for the prediction risk in the general nonparametric regression model, together with bounds on the $\ell_p$ estimation loss in the linear model.
Abstract: We exhibit an approximate equivalence between the Lasso estimator and Dantzig selector. For both methods we derive parallel oracle inequalities for the prediction risk in the general nonparametric regression model, as well as bounds on the $\ell_p$ estimation loss for $1\le p\le 2$ in the linear model when the number of variables can be much larger than the sample size.
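TL;DR: This text treats regression analysis for the social sciences, covering data exploration and transformation, linear models and least squares, linear-model diagnostics, generalized linear models, extensions such as nonlinear, nonparametric and robust regression, and mixed-effects models.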
Abstract: Preface About the Author 1. Statistical Models and Social Science 1.1 Statistical Models and Social Reality 1.2 Observation and Experiment 1.3 Populations and Samples I. DATA CRAFT 2. What Is Regression Analysis? 2.1 Preliminaries 2.2 Naive Nonparametric Regression 2.3 Local Averaging 3. Examining Data 3.1 Univariate Displays 3.2 Plotting Bivariate Data 3.3 Plotting Multivariate Data 4. Transforming Data 4.1 The Family of Powers and Roots 4.2 Transforming Skewness 4.3 Transforming Nonlinearity 4.4 Transforming Nonconstant Spread 4.5 Transforming Proportions 4.6 Estimating Transformations as Parameters* II. LINEAR MODELS AND LEAST SQUARES 5. Linear Least-Squares Regression 5.1 Simple Regression 5.2 Multiple Regression 6. Statistical Inference for Regression 6.1 Simple Regression 6.2 Multiple Regression 6.3 Empirical Versus Structural Relations 6.4 Measurement Error in Explanatory Variables* 7. Dummy-Variable Regression 7.1 A Dichotomous Factor 7.2 Polytomous Factors 7.3 Modeling Interactions 8. Analysis of Variance 8.1 One-Way Analysis of Variance 8.2 Two-Way Analysis of Variance 8.3 Higher-Way Analysis of Variance 8.4 Analysis of Covariance 8.5 Linear Contrasts of Means 9. Statistical Theory for Linear Models* 9.1 Linear Models in Matrix Form 9.2 Least-Squares Fit 9.3 Properties of the Least-Squares Estimator 9.4 Statistical Inference for Linear Models 9.5 Multivariate Linear Models 9.6 Random Regressors 9.7 Specification Error 9.8 Instrumental Variables and Two-Stage Least Squares 10. The Vector Geometry of Linear Models* 10.1 Simple Regression 10.2 Multiple Regression 10.3 Estimating the Error Variance 10.4 Analysis-of-Variance Models III. LINEAR-MODEL DIAGNOSTICS 11. Unusual and Influential Data 11.1 Outliers, Leverage, and Influence 11.2 Assessing Leverage: Hat-Values 11.3 Detecting Outliers: Studentized Residuals 11.4 Measuring Influence 11.5 Numerical Cutoffs for Diagnostic Statistics 11.6 Joint Influence 11.7 Should Unusual Data Be Discarded? 
11.8 Some Statistical Details* 12. Non-Normality, Nonconstant Error Variance, Nonlinearity 12.1 Non-Normally Distributed Errors 12.2 Nonconstant Error Variance 12.3 Nonlinearity 12.4 Discrete Data 12.5 Maximum-Likelihood Methods* 12.6 Structural Dimension 13. Collinearity and Its Purported Remedies 13.1 Detecting Collinearity 13.2 Coping With Collinearity: No Quick Fix IV. GENERALIZED LINEAR MODELS 14. Logit and Probit Models for Categorical Response Variables 14.1 Models for Dichotomous Data 14.2 Models for Polytomous Data 14.3 Discrete Explanatory Variables and Contingency Tables 15. Generalized Linear Models 15.1 The Structure of Generalized Linear Models 15.2 Generalized Linear Models for Counts 15.3 Statistical Theory for Generalized Linear Models* 15.4 Diagnostics for Generalized Linear Models 15.5 Analyzing Data From Complex Sample Surveys V. EXTENDING LINEAR AND GENERALIZED LINEAR MODELS 16. Time-Series Regression and Generalized Least Squares* 16.1 Generalized Least-Squares Estimation 16.2 Serially Correlated Errors 16.3 GLS Estimation With Autocorrelated Errors 16.4 Correcting OLS Inference for Autocorrelated Errors 16.5 Diagnosing Serially Correlated Errors 16.6 Concluding Remarks 17. Nonlinear Regression 17.1 Polynomial Regression 17.2 Piece-wise Polynomials and Regression Splines 17.3 Transformable Nonlinearity 17.4 Nonlinear Least Squares* 18. Nonparametric Regression 18.1 Nonparametric Simple Regression: Scatterplot Smoothing 18.2 Nonparametric Multiple Regression 18.3 Generalized Nonparametric Regression 19. Robust Regression* 19.1 M Estimation 19.2 Bounded-Influence Regression 19.3 Quantile Regression 19.4 Robust Estimation of Generalized Linear Models 19.5 Concluding Remarks 20. Missing Data in Regression Models 20.1 Missing Data Basics 20.2 Traditional Approaches to Missing Data 20.3 Maximum-Likelihood Estimation for Data Missing at Random* 20.4 Bayesian Multiple Imputation 20.5 Selection Bias and Censoring 21.
Bootstrapping Regression Models 21.1 Bootstrapping Basics 21.2 Bootstrap Confidence Intervals 21.3 Bootstrapping Regression Models 21.4 Bootstrap Hypothesis Tests* 21.5 Bootstrapping Complex Sampling Designs 21.6 Concluding Remarks 22. Model Selection, Averaging, and Validation 22.1 Model Selection 22.2 Model Averaging* 22.3 Model Validation VI. MIXED-EFFECT MODELS 23. Linear Mixed-Effects Models for Hierarchical and Longitudinal Data 23.1 Hierarchical and Longitudinal Data 23.2 The Linear Mixed-Effects Model 23.3 Modeling Hierarchical Data 23.4 Modeling Longitudinal Data 23.5 Wald Tests for Fixed Effects 23.6 Likelihood-Ratio Tests of Variance and Covariance Components 23.7 Centering Explanatory Variables, Contextual Effects, and Fixed-Effects Models 23.8 BLUPs 23.9 Statistical Details* 24. Generalized Linear and Nonlinear Mixed-Effects Models 24.1 Generalized Linear Mixed Models 24.2 Nonlinear Mixed Models Appendix A References Author Index Subject Index Data Set Index
TL;DR: The group lasso is extended to logistic regression, and an efficient algorithm is presented that is especially suitable for high-dimensional problems and can also be applied to generalized linear models to solve the corresponding convex optimization problem.
Abstract: Summary. The group lasso is an extension of the lasso to do variable selection on (predefined) groups of variables in linear regression models. The estimates have the attractive property of being invariant under groupwise orthogonal reparameterizations. We extend the group lasso to logistic regression models and present an efficient algorithm, that is especially suitable for high dimensional problems, which can also be applied to generalized linear models to solve the corresponding convex optimization problem. The group lasso estimator for logistic regression is shown to be statistically consistent even if the number of predictors is much larger than sample size but with sparse true underlying structure. We further use a two-stage procedure which aims for sparser models than the group lasso, leading to improved prediction performance for some cases. Moreover, owing to the two-stage nature, the estimates can be constructed to be hierarchical. The methods are used on simulated and real data sets about splice site detection in DNA sequences.
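The group-level selection described in this abstract can be illustrated with the standard block soft-thresholding operator, which is the proximal map of a scaled Euclidean norm and sets a whole group of coefficients to zero at once. This is a minimal sketch of that generic building block, not the algorithm proposed in the paper; the function name and the threshold parameter `t` are assumptions introduced here.

```python
import math
from typing import List, Sequence


def group_soft_threshold(v: Sequence[float], t: float) -> List[float]:
    """Block soft-thresholding: shrink the vector v toward zero by t in Euclidean norm.

    Returns the zero vector when ||v|| <= t, so the whole group is dropped together.
    Assumes t >= 0.
    """
    if t < 0:
        raise ValueError("threshold t must be non-negative")
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= t:
        # The entire group is removed from the model.
        return [0.0] * len(v)
    scale = 1.0 - t / norm
    return [scale * x for x in v]


if __name__ == "__main__":
    print(group_soft_threshold([3.0, 4.0], 1.0))  # group kept, shrunk toward zero
    print(group_soft_threshold([0.3, 0.4], 1.0))  # group set to zero as a block
```

Because the shrinkage depends only on the norm of the group, rotating the coordinates within a group does not change which groups are kept, which matches the invariance under groupwise orthogonal reparameterizations mentioned in the abstract.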
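TL;DR: This text covers the nature and uses of forecasts, the statistical background for forecasting, regression, exponential smoothing, ARIMA models, and transfer function and intervention models, and closes with comments on the practical implementation and use of statistical forecasting techniques.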
Abstract: 1. Introduction to Forecasting. 1.1 The Nature and Uses of Forecasts. 1.2 Some Examples of Time Series. 1.3 The Forecasting Process. 1.4 Resources for Forecasting. 2. Statistics Background for Forecasting. 2.1 Introduction. 2.2 Graphical Displays. 2.3 Numerical Description of Time Series Data. 2.4 Use of Data Transformations and Adjustments. 2.5 General Approach to Time Series Analysis and Forecasting. 2.6 Evaluating and Monitoring Forecasting Model Performance. 3. Regression Analysis and Forecasting. 3.1 Introduction. 3.2 Least Squares Estimation in Linear Regression Models. 3.3 Statistical Inference in Linear Regression. 3.4 Prediction of New Observations. 3.5 Model Adequacy Checking. 3.6 Variable Selection Methods in Regression. 3.7 Generalized and Weighted Least Squares. 3.8 Regression Models for General Time Series Data. 4. Exponential Smoothing Methods. 4.1 Introduction. 4.2 First-Order Exponential Smoothing. 4.3 Modeling Time Series Data. 4.4 Second-Order Exponential Smoothing. 4.5 Higher-Order Exponential Smoothing. 4.6 Forecasting. 4.7 Exponential Smoothing for Seasonal Data. 4.8 Exponential Smoothers and ARIMA Models. 5. Autoregressive Integrated Moving Average (ARIMA) Models. 5.1 Introduction. 5.2 Linear Models for Stationary Time Series. 5.3 Finite Order Moving Average (MA) Processes. 5.4 Finite Order Autoregressive Processes. 5.5 Mixed Autoregressive-Moving Average (ARMA) Processes. 5.6 Non-stationary Processes. 5.7 Time Series Model Building. 5.8 Forecasting ARIMA Processes. 5.9 Seasonal Processes. 5.10 Final Comments. 6. Transfer Function and Intervention Models. 6.1 Introduction. 6.2 Transfer Function Models. 6.3 Transfer Function-Noise Models. 6.4 Cross Correlation Function. 6.5 Model Specification. 6.6 Forecasting with Transfer Function-Noise Models. 6.7 Intervention Analysis. 7. Survey of Other Forecasting Methods. 7.1 Multivariate Time Series Models and Forecasting. 7.2 State Space Models. 7.3 ARCH and GARCH Models.
7.4 Direct Forecasting of Percentiles. 7.5 Combining Forecasts to Improve Prediction Performance. 7.6 Aggregation and Disaggregation of Forecasts. 7.7 Neural Networks and Forecasting. 7.8 Some Comments on Practical Implementation and Use of Statistical Forecasting Techniques. Bibliography. Appendix. Appendix A Statistical Tables. Table A.1 Cumulative Normal Distribution. Table A.2 Percentage Points of the Chi-Square Distribution. Table A.3 Percentage Points of the t Distribution. Table A.4 Percentage Points of the F Distribution. Table A.5 Critical Values of the Durbin-Watson Statistic. Appendix B Data Sets for Exercises. Table B.1 Market Yield on U.S. Treasury Securities at 10-year Constant Maturity. Table B.2 Pharmaceutical Product Sales. Table B.3 Chemical Process Viscosity. Table B.4 U.S. Production of Blue and Gorgonzola Cheeses. Table B.5 U.S. Beverage Manufacturer Product Shipments, Unadjusted. Table B.6 Global Mean Surface Air Temperature Anomaly and Global CO2 Concentration. Table B.7 Whole Foods Market Stock Price, Daily Closing Adjusted for Splits. Table B.8 Unemployment Rate - Full-Time Labor Force, Not Seasonally Adjusted. Table B.9 International Sunspot Numbers. Table B.10 United Kingdom Airline Miles Flown. Table B.11 Champagne Sales. Table B.12 Chemical Process Yield, with Operating Temperature (Uncontrolled). Table B.13 U.S. Production of Ice Cream and Frozen Yogurt. Table B.14 Atmospheric CO2 Concentrations at Mauna Loa Observatory. Table B.15 U.S. National Violent Crime Rate. Table B.16 U.S. Gross Domestic Product. Table B.17 U.S. Total Energy Consumption. Table B.18 U.S. Coal Production. Table B.19 Arizona Drowning Rate, Children 1-4 Years Old. Table B.20 U.S. Internal Revenue Tax Refunds. Index.
TL;DR: The authors revisited the effects of spending on student performance using data from the state of Michigan and found that spending has nontrivial and statistically significant effects, although the diminishing effect is not especially pronounced.
TL;DR: In this work, the authors introduce linear innovations state space models and non-linear and heteroscedastic innovations state space models, and cover further topics such as models for count data along with applications including inventory control.
Abstract: I. Introduction: Basic concepts.- Getting started. II. Essentials: Linear innovations state space models.- Non-linear and heteroscedastic innovations state space models.- Estimation of innovations state space models.- Prediction distributions and intervals.- Selection of models. III. Further topics: Normalizing seasonal components.- Models with regressor variables.- Some properties of linear models.- Reduced forms and relationships with ARIMA models.- Linear innovations state space models with random seed states.- Conventional state space models.- Time series with multiple seasonal patterns.- Non-linear models for positive data.- Models for count data.- Vector exponential smoothing. IV. Applications: Inventory control application.- Conditional heteroscedasticity and finance applications.- Economic applications: the Beveridge-Nelson decomposition.
TL;DR: In this article, a linear combination of simple rules derived from the data is used for general regression and classification models, where each rule consists of a conjunction of a small number of simple statements concerning the values of individual input variables.
Abstract: General regression and classification models are constructed as linear combinations of simple rules derived from the data. Each rule consists of a conjunction of a small number of simple statements concerning the values of individual input variables. These rule ensembles are shown to produce predictive accuracy comparable to the best methods. However, their principal advantage lies in interpretation. Because of its simple form, each rule is easy to understand, as is its influence on individual predictions, selected subsets of predictions, or globally over the entire space of joint input variable values. Similarly, the degree of relevance of the respective input variables can be assessed globally, locally in different regions of the input space, or at individual prediction points. Techniques are presented for automatically identifying those variables that are involved in interactions with other variables, the strength and degree of those interactions, as well as the identities of the other variables with which they interact. Graphical representations are used to visualize both main and interaction effects.
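The structure described in this abstract, a prediction formed as a linear combination of rules where each rule is a conjunction of simple statements about individual inputs, can be sketched as follows. The rules, variable names (`age`, `income`), intercept and weights are invented purely for illustration; the abstract does not specify them, and this sketch does not show how rules are derived from data or how their coefficients are fitted.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]
Condition = Callable[[Row], bool]
Rule = Callable[[Row], int]


def make_rule(conditions: List[Condition]) -> Rule:
    """A rule is 1 when every simple statement holds for the input, else 0."""
    def rule(x: Row) -> int:
        return int(all(cond(x) for cond in conditions))
    return rule


def predict(x: Row, intercept: float, weighted_rules: List[Tuple[float, Rule]]) -> float:
    """Prediction as an intercept plus a weighted sum of rule indicators."""
    return intercept + sum(weight * rule(x) for weight, rule in weighted_rules)


# Hypothetical rules and weights, chosen only to make the example concrete.
rule_a = make_rule([lambda x: x["age"] >= 30, lambda x: x["income"] < 50.0])
rule_b = make_rule([lambda x: x["income"] >= 50.0])
ensemble = [(1.5, rule_a), (-0.5, rule_b)]

if __name__ == "__main__":
    print(predict({"age": 40, "income": 30.0}, 2.0, ensemble))
    print(predict({"age": 20, "income": 60.0}, 2.0, ensemble))
```

Each rule's contribution to a given prediction is just its weight when the rule fires and zero otherwise, which is what makes the influence of individual rules easy to read off.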
TL;DR: The LBA model successfully accommodates empirical phenomena from binary and multiple choice tasks that have proven difficult for other theoretical accounts, and is encouraging in a field beset by the tradeoff between complexity and completeness.
TL;DR: This work marginalizes out the model parameters in closed form by using Gaussian process priors for both the dynamical and the observation mappings, which results in a nonparametric model for dynamical systems that accounts for uncertainty in the model.
Abstract: We introduce Gaussian process dynamical models (GPDMs) for nonlinear time series analysis, with applications to learning models of human pose and motion from high-dimensional motion capture data. A GPDM is a latent variable model. It comprises a low-dimensional latent space with associated dynamics, as well as a map from the latent space to an observation space. We marginalize out the model parameters in closed form by using Gaussian process priors for both the dynamical and the observation mappings. This results in a nonparametric model for dynamical systems that accounts for uncertainty in the model. We demonstrate the approach and compare four learning algorithms on human motion capture data, in which each pose is 50-dimensional. Despite the use of small data sets, the GPDM learns an effective representation of the nonlinear dynamics in these spaces.
TL;DR: A general model is described that subsumes many parametric models for continuous data. All of its special cases can be inverted using exactly the same scheme, namely dynamic expectation maximization, and this inversion can be formulated as a simple neural network that may provide a useful metaphor for inference and learning in the brain.
Abstract: This paper describes a general model that subsumes many parametric models for continuous data. The model comprises hidden layers of state-space or dynamic causal models, arranged so that the output of one provides input to another. The ensuing hierarchy furnishes a model for many types of data, of arbitrary complexity. Special cases range from the general linear model for static data to generalised convolution models, with system noise, for nonlinear time-series analysis. Crucially, all of these models can be inverted using exactly the same scheme, namely, dynamic expectation maximization. This means that a single model and optimisation scheme can be used to invert a wide range of models. We present the model and a brief review of its inversion to disclose the relationships among, apparently, diverse generative models of empirical data. We then show that this inversion can be formulated as a simple neural network and may provide a useful metaphor for inference and learning in the brain.
TL;DR: A nonasymptotic oracle inequality is proved for the empirical risk minimizer with Lasso penalty for high-dimensional generalized linear models with Lipschitz loss functions, and the penalty is based on the coefficients in the linear predictor, after normalization with the empirical norm.
Abstract: We consider high-dimensional generalized linear models with Lipschitz loss functions, and prove a nonasymptotic oracle inequality for the empirical risk minimizer with Lasso penalty. The penalty is based on the coefficients in the linear predictor, after normalization with the empirical norm. The examples include logistic regression, density estimation and classification with hinge loss. Least squares regression is also discussed.
TL;DR: Bayesian regularized artificial neural networks (BRANNs) are more robust than standard back-propagation nets and can reduce or eliminate the need for lengthy cross-validation.
Abstract: Bayesian regularized artificial neural networks (BRANNs) are more robust than standard back-propagation nets and can reduce or eliminate the need for lengthy cross-validation. Bayesian regularization is a mathematical process that converts a nonlinear regression into a "well-posed" statistical problem in the manner of a ridge regression. The advantage of BRANNs is that the models are robust and the validation process, which scales as $O(N^2)$ in normal regression methods, such as back propagation, is unnecessary. These networks provide solutions to a number of problems that arise in QSAR modeling, such as choice of model, robustness of model, choice of validation set, size of validation effort, and optimization of network architecture. They are difficult to overtrain, since evidence procedures provide an objective Bayesian criterion for stopping training. They are also difficult to overfit, because the BRANN calculates and trains on a number of effective network parameters or weights, effectively turning off those that are not relevant. This effective number is usually considerably smaller than the number of weights in a standard fully connected back-propagation neural net. Automatic relevance determination (ARD) of the input variables can be used with BRANNs, and this allows the network to "estimate" the importance of each input. The ARD method ensures that irrelevant or highly correlated indices used in the modeling are neglected as well as showing which are the most important variables for modeling the activity data. This chapter outlines the equations that define the BRANN method plus a flowchart for producing a BRANN-QSAR model. Some results of the use of BRANNs on a number of data sets are illustrated and compared with other linear and nonlinear models.
TL;DR: This work covers linear mixed models (Parts I and II) and generalized linear mixed models (Parts I and II).
Abstract: Linear Mixed Models: Part I. Linear Mixed Models: Part II. Generalized Linear Mixed Models: Part I. Generalized Linear Mixed Models: Part II.
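TL;DR: This article shows that, in nonlinear difference-in-differences models such as probit, logit or tobit, the cross difference is not equal to the treatment effect, which is the parameter of interest. The treatment effect is instead the incremental effect of the coefficient of the interaction term, and its sign equals the sign of that coefficient.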
Abstract: I demonstrate that Ai and Norton's (2003) point about cross differences is not relevant for the estimation of the treatment effect in nonlinear difference-in-differences models such as probit, logit or tobit, because the cross difference is not equal to the treatment effect, which is the parameter of interest. In a nonlinear difference-in-differences model, the treatment effect is the cross difference of the conditional expectation of the observed outcome minus the cross difference of the conditional expectation of the potential outcome without treatment. Unlike in the linear model, the latter cross difference is not zero in the nonlinear model. It follows that the sign of the treatment effect in a nonlinear difference-in-differences model with a strictly monotonic transformation function is equal to the sign of the coefficient of the interaction term of the time and treatment group indicators. The treatment effect is simply the incremental effect of the coefficient of the interaction term.
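The statement that the treatment effect is the incremental effect of the interaction coefficient can be illustrated numerically. The sketch below assumes a probit specification whose index contains only an intercept, a time indicator, a treatment-group indicator and their interaction, with no further covariates; these simplifications and all coefficient values and names are illustrative assumptions, not taken from the article. The treatment effect for the treated group in the post-treatment period is then the transformation function evaluated with the interaction coefficient minus the same function evaluated without it.

```python
import math
from typing import Callable


def std_normal_cdf(z: float) -> float:
    """Standard normal distribution function (the probit transformation)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def did_treatment_effect(
    b0: float,
    b_time: float,
    b_group: float,
    b_interaction: float,
    transform: Callable[[float], float] = std_normal_cdf,
) -> float:
    """Incremental effect of the interaction coefficient on the expected outcome.

    Evaluated for the treated group in the post-treatment period.
    """
    index_without_treatment = b0 + b_time + b_group
    return transform(index_without_treatment + b_interaction) - transform(index_without_treatment)


if __name__ == "__main__":
    # Hypothetical coefficient values.
    for b_int in (-0.4, 0.0, 0.4):
        effect = did_treatment_effect(-0.2, 0.1, 0.3, b_int)
        print(f"interaction={b_int:+.1f}  treatment effect={effect:+.4f}")
```

Because the transformation function is strictly monotonic, the computed effect is positive, zero or negative exactly when the interaction coefficient is, which is the sign result stated in the abstract.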
TL;DR: This practical, rigorous book treats GLMs, covers all standard exponential family distributions, extends the methodology to correlated data structures, and discusses recent developments which go beyond the GLM.
Abstract: Preface 1. Insurance data 2. Response distributions 3. Exponential family responses and estimation 4. Linear modeling 5. Generalized linear models 6. Models for count data 7. Categorical responses 8. Continuous responses 9. Correlated data 10. Extensions to the Generalized linear model Appendix 1. Computer code and output Bibliography Index.
TL;DR: It is shown that the statistical specification of reproducing kernel Hilbert spaces regression admits a standard mixed-effects linear model representation, with smoothing parameters treated as variance components.
Abstract: Reproducing kernel Hilbert spaces regression procedures for prediction of total genetic value for quantitative traits, which make use of phenotypic and genomic data simultaneously, are discussed from a theoretical perspective. It is argued that a nonparametric treatment may be needed for capturing the multiple and complex interactions potentially arising in whole-genome models, i.e., those based on thousands of single-nucleotide polymorphism (SNP) markers. After a review of reproducing kernel Hilbert spaces regression, it is shown that the statistical specification admits a standard mixed-effects linear model representation, with smoothing parameters treated as variance components. Models for capturing different forms of interaction, e.g., chromosome-specific, are presented. Implementations can be carried out using software for likelihood-based or Bayesian inference.
TL;DR: In this paper, the authors studied the asymptotic properties of bridge estimators in sparse, high-dimensional, linear regression models when the number of covariates may increase to infinity with the sample size.
Abstract: We study the asymptotic properties of bridge estimators in sparse, high-dimensional, linear regression models when the number of covariates may increase to infinity with the sample size. We are particularly interested in the use of bridge estimators to distinguish between covariates whose coefficients are zero and covariates whose coefficients are nonzero. We show that under appropriate conditions, bridge estimators correctly select covariates with nonzero coefficients with probability converging to one and that the estimators of nonzero coefficients have the same asymptotic distribution that they would have if the zero coefficients were known in advance. Thus, bridge estimators have an oracle property in the sense of Fan and Li [J. Amer. Statist. Assoc. 96 (2001) 1348-1360] and Fan and Peng [Ann. Statist. 32 (2004) 928-961]. In general, the oracle property holds only if the number of covariates is smaller than the sample size. However, under a partial orthogonality condition in which the covariates of the zero coefficients are uncorrelated or weakly correlated with the covariates of nonzero coefficients, we show that marginal bridge estimators can correctly distinguish between covariates with nonzero and zero coefficients with probability converging to one even when the number of covariates is greater than the sample size.
TL;DR: The optimal linear estimators including filter, predictor and smoother are developed via an innovation analysis approach based on a packet dropout model and computed recursively in terms of a Riccati difference equation of dimension equal to the order of the system state plus that of the measurement output.
TL;DR: This paper proposes a class of variable selection procedures for semiparametric regression models using nonconcave penalized likelihood, and investigates the asymptotic behavior of the proposed test and demonstrates its limiting null distribution follows a chi-squared distribution, which is independent of the nuisance parameters.
Abstract: In this paper, we are concerned with how to select significant variables in semiparametric modeling. Variable selection for semiparametric regression models consists of two components: model selection for nonparametric components and select significant variables for parametric portion. Thus, it is much more challenging than that for parametric models such as linear models and generalized linear models because traditional variable selection procedures including stepwise regression and the best subset selection require model selection to nonparametric components for each submodel. This leads to very heavy computational burden. In this paper, we propose a class of variable selection procedures for semiparametric regression models using nonconcave penalized likelihood. The newly proposed procedures are distinguished from the traditional ones in that they delete insignificant variables and estimate the coefficients of significant variables simultaneously. This allows us to establish the sampling properties of the resulting estimate. We first establish the rate of convergence of the resulting estimate. With proper choices of penalty functions and regularization parameters, we then establish the asymptotic normality of the resulting estimate, and further demonstrate that the proposed procedures perform as well as an oracle procedure. Semiparametric generalized likelihood ratio test is proposed to select significant variables in the nonparametric component. We investigate the asymptotic behavior of the proposed test and demonstrate its limiting null distribution follows a chi-squared distribution, which is independent of the nuisance parameters. Extensive Monte Carlo simulation studies are conducted to examine the finite sample performance of the proposed variable selection procedures.
TL;DR: In this paper, the LS-VCE method is described for three classes of weight matrices: a general weight matrix, a weight matrix derived from the class of elliptically contoured distributions.
Abstract: Least-squares variance component estimation (LS-VCE) is a simple, flexible and attractive method for the estimation of unknown variance and covariance components. LS-VCE is simple because it is based on the well-known principle of LS; it is flexible because it works with a user-defined weight matrix; and it is attractive because it allows one to directly apply the existing body of knowledge of LS theory. In this contribution, we present the LS-VCE method for different scenarios and explore its various properties. The method is described for three classes of weight matrices: a general weight matrix, a weight matrix from the unit weight matrix class; and a weight matrix derived from the class of elliptically contoured distributions. We also compare the LS-VCE method with some of the existing VCE methods. Some of them are shown to be special cases of LS-VCE. We also show how the existing body of knowledge of LS theory can be used to one’s advantage for studying various aspects of VCE, such as the precision and estimability of VCE, the use of a-priori variance component information, and the problem of nonlinear VCE. Finally, we show how the mean and the variance of the fixed effect estimator of the linear model are affected by the results of LS-VCE. Various examples are given to illustrate the theory.
TL;DR: In this article, the authors propose to replace the linearity assumption by an additive structure, leading to a more widely applicable and much more flexible framework for functional regression models, which is suitable for both scalar and functional responses.
Abstract: In commonly used functional regression models, the regression of a scalar or functional response on the functional predictor is assumed to be linear. This means that the response is a linear function of the functional principal component scores of the predictor process. We relax the linearity assumption and propose to replace it by an additive structure, leading to a more widely applicable and much more flexible framework for functional regression models. The proposed functional additive regression models are suitable for both scalar and functional responses. The regularization needed for effective estimation of the regression parameter function is implemented through a projection on the eigenbasis of the covariance operator of the functional components in the model. The use of functional principal components in an additive rather than linear way leads to substantial broadening of the scope of functional regression models and emerges as a natural approach, because the uncorrelatedness of the functional pr...
TL;DR: It is shown that, with non-Gaussian data, causal inference is possible even in the presence of hidden variables (unobserved confounders), even when the existence of such variables is unknown a priori.
TL;DR: A new approach to fitting a linear regression model to symbolic interval data based on the estimation of the average behaviour of both the root mean square error and the square of the correlation coefficient in the framework of a Monte Carlo experiment is introduced.
TL;DR: The authors established estimation and model selection consistency, prediction and estimation bounds and persistence for the group-lasso estimator and model selector proposed by Yuan and Lin (2006) for least squares problems when the covariates have a natural grouping structure.
Abstract: We establish estimation and model selection consistency, prediction and estimation bounds and persistence for the group-lasso estimator and model selector proposed by Yuan and Lin (2006) for least squares problems when the covariates have a natural grouping structure. We consider the case of a fixed-dimensional parameter space with increasing sample size and the double asymptotic scenario where the model complexity changes with the sample size.
TL;DR: In this article, the identification and estimation of panel data models from repeated cross-sectional surveys is discussed, with particular attention to linear models with fixed individual effects, models containing lagged dependent variables, and discrete choice models.
Abstract: In many countries there is a lack of genuine panel data where specific individuals or firms are followed over time. However, repeated cross-sectional surveys may be available, where a random sample is taken from the population at consecutive points in time. In this paper we discuss the identification and estimation of panel data models from repeated cross sections. In particular, attention will be paid to linear models with fixed individual effects, to models containing lagged dependent variables and to discrete choice models.
TL;DR: A new procedure is presented for predicting time series using paradigms such as fuzzy systems, neural networks and evolutionary algorithms, so that the linear model can be identified automatically, without the participation of a human expert.
TL;DR: An analysis quantifying the contribution of uncertainty in each step during the model-building sequence to variation in model validity and climate change projection uncertainty found that model type and data quality dominated this analysis.
Abstract: Sophisticated statistical analyses are common in ecological research, particularly in species distribution modeling. The effects of sometimes arbitrary decisions during the modeling procedure on the final outcome are difficult to assess, and to date are largely unexplored. We conducted an analysis quantifying the contribution of uncertainty in each step during the model-building sequence to variation in model validity and climate change projection uncertainty. Our study system was the distribution of the Great Grey Shrike in the German federal state of Saxony. For each of four steps (data quality, collinearity method, model type, and variable selection), we ran three different options in a factorial experiment, leading to 81 different model approaches. Each was subjected to a fivefold cross-validation, measuring area under curve (AUC) to assess model quality. Next, we used three climate change scenarios times three precipitation realizations to project future distributions from each model, yielding 729 projections. Again, we analyzed which step introduced most variability (the four model-building steps plus the two scenario steps) into predicted species prevalences by the year 2050. Predicted prevalences ranged from a factor of 0.2 to a factor of 10 of present prevalence, with the majority of predictions between 1.1 and 4.2 (inter-quartile range). We found that model type and data quality dominated this analysis. In particular, artificial neural networks yielded low cross-validation robustness and gave very conservative climate change predictions. Generalized linear and additive models were very similar in quality and predictions, and superior to neural networks. Variations in scenarios and realizations had very little effect, due to the small spatial extent of the study region and its relatively small range of climatic conditions. We conclude that, for climate projections, model type and data quality were the most influential factors. 
Since comparison of model types has received good coverage in the ecological literature, effects of data quality should now come under more scrutiny.
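The counts in the abstract above follow directly from the factorial design: four model-building steps with three options each give 3^4 = 81 model approaches, and crossing these with three climate scenarios and three precipitation realizations gives 81 × 9 = 729 projections. The sketch below only enumerates that design. The option, scenario, and realization labels are hypothetical placeholders, since the abstract reports the counts but not the names.

```python
from itertools import product

# Four model-building steps, three options each.
# Option labels are hypothetical; the abstract gives only the counts.
steps = {
    "data_quality": ["option_1", "option_2", "option_3"],
    "collinearity_method": ["option_1", "option_2", "option_3"],
    "model_type": ["option_1", "option_2", "option_3"],
    "variable_selection": ["option_1", "option_2", "option_3"],
}

# Full factorial crossing of the four steps
model_approaches = list(product(*steps.values()))

# Hypothetical labels for the two scenario steps
climate_scenarios = ["scenario_1", "scenario_2", "scenario_3"]
precipitation_realizations = ["realization_1", "realization_2", "realization_3"]

# Every model approach is projected under every scenario/realization pair
projections = list(
    product(model_approaches, climate_scenarios, precipitation_realizations)
)

print(len(model_approaches))  # 81
print(len(projections))       # 729
```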
TL;DR: In this article, a regression-rules model is proposed for the prediction of soil properties from diffuse infrared reflectance spectra, in which each rule is a linear model of the predictors.
TL;DR: A unified framework for generalized LDA is proposed, which elucidates the properties of various algorithms and their relationships, and shows that the matrix computations involved in LDA-based algorithms can be simplified so that the cross-validation procedure for model selection can be performed efficiently.
Abstract: High-dimensional data are common in many domains, and dimensionality reduction is key to coping with the curse of dimensionality. Linear discriminant analysis (LDA) is a well-known method for supervised dimensionality reduction. When dealing with high-dimensional and low sample size data, classical LDA suffers from the singularity problem. Over the years, many algorithms have been developed to overcome this problem, and they have been applied successfully in various applications. However, there is a lack of a systematic study of the commonalities and differences of these algorithms, as well as their intrinsic relationships. In this paper, a unified framework for generalized LDA is proposed, which elucidates the properties of various algorithms and their relationships. Based on the proposed framework, we show that the matrix computations involved in LDA-based algorithms can be simplified so that the cross-validation procedure for model selection can be performed efficiently. We conduct extensive experiments using a collection of high-dimensional data sets, including text documents, face images, gene expression data, and gene expression pattern images, to evaluate the proposed theories and algorithms.