TL;DR: This work uses classification and regression trees to analyze survey data from the Australian central Great Barrier Reef, comprising abundances of soft coral taxa and physical and spatial environmental information, and shows how linear models fail to find patterns uncovered by the trees.
Abstract: Classification and regression trees are ideally suited for the analysis of complex ecological data. For such data, we require flexible and robust analytical methods, which can deal with nonlinear relationships, high-order interactions, and missing values. Despite such difficulties, the methods should be simple to understand and give easily interpretable results. Trees explain variation of a single response variable by repeatedly splitting the data into more homogeneous groups, using combinations of explanatory variables that may be categorical and/or numeric. Each group is characterized by a typical value of the response variable, the number of observations in the group, and the values of the explanatory variables that define it. The tree is represented graphically, and this aids exploration and understanding. Trees can be used for interactive exploration and for description and prediction of patterns and processes. Advantages of trees include: (1) the flexibility to handle a broad range of response types, including numeric, categorical, ratings, and survival data; (2) invariance to monotonic transformations of the explanatory variables; (3) ease and robustness of construction; (4) ease of interpretation; and (5) the ability to handle missing values in both response and explanatory variables. Thus, trees complement or represent an alternative to many traditional statistical techniques, including multiple regression, analysis of variance, logistic regression, log-linear models, linear discriminant analysis, and survival models. We use classification and regression trees to analyze survey data from the Australian central Great Barrier Reef, comprising abundances of soft coral taxa (Cnidaria: Octocorallia) and physical and spatial environmental information.
Regression tree analyses showed that dense aggregations, typically formed by three taxa, were restricted to distinct habitat types, each of which was defined by combinations of 3-4 environmental variables. The habitat definitions were consistent with known experimental findings on the nutrition of these taxa. When used separately, physical and spatial variables were similarly strong predictors of abundances and lost little in comparison with their joint use. The spatial variables are thus effective surrogates for the physical variables in this extensive reef complex, where information on the physical environment is often not available. Finally, we compare the use of regression trees and linear models for the analysis of these data and show how linear models fail to find patterns uncovered by the trees.
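The splitting step described in the abstract above ("repeatedly splitting the data into more homogeneous groups") can be illustrated with a minimal sketch. It searches for the single threshold on one numeric explanatory variable that minimises the within-group sum of squares of the response. The function names and the depth/abundance numbers are hypothetical and are not taken from the paper; a full tree would apply the same search recursively and over all explanatory variables.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, total_sse) for the single binary split on x
    that minimises the within-group sum of squares of y."""
    pairs = sorted(zip(x, y))
    best = (None, sse(y))  # baseline: no split at all
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate tied x values
        left = [p[1] for p in pairs[:i]]
        right = [p[1] for p in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = ((lo + hi) / 2.0, total)
    return best


# Hypothetical data: abundance jumps once depth exceeds about 5 m.
depth = [1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0]
abundance = [0.0, 1.0, 0.0, 1.0, 10.0, 11.0, 10.0, 11.0]
threshold, within_sse = best_split(depth, abundance)
print(threshold, within_sse)
```

A step change like this is captured by one split, whereas a straight-line fit would spread the jump across the whole depth range, which is one intuition for why linear models can miss patterns that trees recover.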
TL;DR: Characteristics of Time Series * Time Series Regression and ARIMA Models * Dynamic Linear Models and Kalman Filtering * Spectral Analysis and Its Applications.
Abstract: Characteristics of Time Series * Time Series Regression and ARIMA Models * Dynamic Linear Models and Kalman Filtering * Spectral Analysis and Its Applications.
TL;DR: In this article, the authors discuss the role of theory and experimental design in multivariate analysis and mathematical modeling, and propose an approach for the use of multiple regression analysis in mathematical models.
Abstract: Introduction. H.E.A. Tinsley and S.D. Brown, Multivariate Statistics and Mathematical Modeling. J. Hetherington, Role of Theory and Experimental Design in Multivariate Analysis and Mathematical Modeling. R.V. Dawis, Scale Construction and Psychometric Considerations. H.E.A. Tinsley and D.J. Weiss, Interrater Reliability and Agreement. M. Hallahan and R. Rosenthal, Interpreting and Reporting Results. A. Venter and S.E. Maxwell, Issues in the Use and Application of Multiple Regression Analysis. C.J. Huberty and M.D. Petoskey, Multivariate Analysis of Variance and Covariance. M.T. Brown and L.R. Wicker, Discriminant Analysis. R.M. Thorndike, Canonical Correlation Analysis. R. Cudeck, Exploratory Factor Analysis. P.A. Gore, Jr., Cluster Analysis. M.L. Davison and S.G. Sireci, Multidimensional Scaling. M.M. Mark, C.S. Reichardt, and L.J. Sanna, Time-Series Designs and Analyses. P.B. Imrey, Poisson Regression, Logistic Regression, and Loglinear Models for Random Counts. L.F. Dilalla, Structural Equation Modeling: Uses and Issues. R.H. Hoyle, Confirmatory Factor Analysis. B.J. Becker, Multivariate Meta-analysis. G.A. Marcoulides, Generalizability Theory. R.K. Hambleton, F. Robin, and D. Xing, Item Response Models for the Analysis of Educational and Psychological Test Data. L. Dumenci, Multitrait-Multimethod Analysis. I.G.G. Kreft, Using Random Coefficient Linear Models for the Analysis of Hierarchically Nested Data. T.J.G. Tracey, Analysis of Circumplex Models. J.B. Willett and M.K. Keiley, Using Covariance Structure Analysis to Model Change over Time. Author Index. Subject Index.
TL;DR: This paper formulates a class of models for the joint behaviour of a sequence of longitudinal measurements and an associated sequence of event times, including single-event survival data, using results from a clinical trial into the treatment of schizophrenia.
Abstract: This paper formulates a class of models for the joint behaviour of a sequence of longitudinal measurements and an associated sequence of event times, including single-event survival data. This class includes and extends a number of specific models which have been proposed recently, and, in the absence of association, reduces to separate models for the measurements and events based, respectively, on a normal linear model with correlated errors and a semi-parametric proportional hazards or intensity model with frailty. Special cases of the model class are discussed in detail and an estimation procedure which allows the two components to be linked through a latent stochastic process is described. Methods are illustrated using results from a clinical trial into the treatment of schizophrenia.
TL;DR: In this article, the covariance structure of repeated measures data is modelled with the MIXED procedure of the SAS® System, and an example illustrates how the choice of covariance structure affects tests and estimates of fixed effects.
Abstract: The term 'repeated measures' refers to data with multiple observations on the same sampling unit. In most cases, the multiple observations are taken over time, but they could be over space. It is usually plausible to assume that observations on the same unit are correlated. Hence, statistical analysis of repeated measures data must address the issue of covariation between measures on the same unit. Until recently, analysis techniques available in computer software only offered the user limited and inadequate choices. One choice was to ignore covariance structure and make invalid assumptions. Another was to avoid the covariance structure issue by analysing transformed data or making adjustments to otherwise inadequate analyses. Ignoring covariance structure may result in erroneous inference, and avoiding it may result in inefficient inference. Recently available mixed model methodology permits the covariance structure to be incorporated into the statistical model. The MIXED procedure of the SAS® System provides a rich selection of covariance structures through the RANDOM and REPEATED statements. Modelling the covariance structure is a major hurdle in the use of PROC MIXED. However, once the covariance structure is modelled, inference about fixed effects proceeds essentially as when using PROC GLM. An example from the pharmaceutical industry is used to illustrate how to choose a covariance structure. The example also illustrates the effects of choice of covariance structure on tests and estimates of fixed effects. In many situations, estimates of linear combinations are invariant with respect to covariance structure, yet standard errors of the estimates may still depend on the covariance structure.
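To make "covariance structure" concrete, the sketch below builds two commonly used within-unit covariance matrices for repeated measures: compound symmetry (equal correlation between any two occasions) and first-order autoregressive, AR(1) (correlation decaying with the time lag). These two are chosen here only for illustration; the abstract does not say which structures the article's example compares, and the function names and parameter values are hypothetical.

```python
def compound_symmetry(n_times, variance, rho):
    """Covariance matrix with equal variance and one shared correlation rho."""
    return [
        [variance if i == j else variance * rho for j in range(n_times)]
        for i in range(n_times)
    ]


def ar1(n_times, variance, rho):
    """Covariance matrix whose correlation decays as rho ** |lag|."""
    return [
        [variance * rho ** abs(i - j) for j in range(n_times)]
        for i in range(n_times)
    ]


# Hypothetical values: 4 measurement occasions, variance 2.0, rho 0.5.
cs_matrix = compound_symmetry(4, 2.0, 0.5)
ar1_matrix = ar1(4, 2.0, 0.5)
for row_cs, row_ar in zip(cs_matrix, ar1_matrix):
    print(row_cs, row_ar)
```

The two matrices agree on the diagonal and at lag 1 but diverge at longer lags, which is the kind of difference that can change standard errors of fixed-effect estimates even when the estimates themselves do not move.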
TL;DR: The emphasis of this monograph is on methodologies rather than on the theory, with a particular focus on applications of partially linear regression techniques to various statistical problems, including least squares regression, asymptotically efficient estimation, bootstrap resampling, censored data analysis and nonlinear and nonparametric time series models.
Abstract: In the last ten years, there has been increasing interest and activity in the general area of partially linear regression smoothing in statistics. Many methods and techniques have been proposed and studied. This monograph hopes to bring an up-to-date presentation of the state of the art of partially linear regression techniques. The emphasis of this monograph is on methodologies rather than on the theory, with a particular focus on applications of partially linear regression techniques to various statistical problems. These problems include least squares regression, asymptotically efficient estimation, bootstrap resampling, censored data analysis, linear measurement error models, nonlinear measurement models, nonlinear and nonparametric time series models.
We hope that this monograph will serve as a useful reference for theoretical and applied statisticians and to graduate students and others who are interested in the area of partially linear regression. While advanced mathematical ideas have been valuable in some of the theoretical development, the methodological power of partially linear regression can be demonstrated and discussed without advanced mathematics.
This monograph can be divided into three parts: part one–Chapter 1 through Chapter 4; part two–Chapter 5; and part three–Chapter 6. In the first part, we discuss various estimators for partially linear regression models, establish theoretical results for the estimators, propose estimation procedures, and implement the proposed estimation procedures through real and simulated examples.
The second part is of more theoretical interest. In this part, we construct several adaptive and efficient estimates for the parametric component. We show that the LS estimator of the parametric component can be modified to have both Bahadur asymptotic efficiency and second order asymptotic efficiency. In the third part, we consider partially linear time series models. First, we propose a test procedure to determine whether a partially linear model can be used to fit a given set of data. Asymptotic test criteria and power investigations are presented. Second, we propose a Cross-Validation (CV) based criterion to select the optimum linear subset from a partially linear regression and establish a CV selection criterion for the bandwidth involved in the nonparametric kernel estimation. The CV selection criterion can be applied to the case where the observations fitted by the partially linear model (1.1.1) are independent and identically distributed (i.i.d.). Due to this reason, we have not provided a separate chapter to discuss the selection problem for the i.i.d. case. Third, we provide recent developments in nonparametric and semiparametric time series regression.
This work of the authors was supported partially by the Sonderforschungsbereich 373 “Quantifikation und Simulation Ökonomischer Prozesse”. The second author was also supported by the National Natural Science Foundation of China and an Alexander von Humboldt Fellowship at the Humboldt University, while the third author was also supported by the Australian Research Council. The second and third authors would like to thank their teachers: Professors Raymond Carroll, Guijing Chen, Xiru Chen, Ping Cheng and Lincheng Zhao for their valuable inspiration on the two authors’ research efforts. We would like to express our sincere thanks to our colleagues and collaborators for many helpful discussions and stimulating collaborations, in particular, Vo Anh, Shengyan Hong, Enno Mammen, Howell Tong, Axel Werwatz and Rodney Wolff. For various ways in which they helped us, we would like to thank Adrian Baddeley, Rong Chen, Anthony Pettitt, Maxwell King, Michael Schimek, George Seber, Alastair Scott, Naisyin Wang, Qiwei Yao, Lijian Yang and Lixing Zhu.
The authors are grateful to everyone who has encouraged and supported us to finish this undertaking. Any remaining errors are ours.
TL;DR: A new statistical model for time series that iteratively segments data into regimes with approximately linear dynamics and learns the parameters of each of these linear regimes is introduced and the results suggest that variational approximations are a viable method for inference and learning in switching state-space models.
Abstract: We introduce a new statistical model for time series that iteratively segments data into regimes with approximately linear dynamics and learns the parameters of each of these linear regimes. This model combines and generalizes two of the most widely used stochastic time-series models— hidden Markov models and linear dynamical systems—and is closely related to models that are widely used in the control and econometrics literatures. It can also be derived by extending the mixture of experts neural network (Jacobs, Jordan, Nowlan, & Hinton, 1991) to its fully dynamical version, in which both expert and gating networks are recurrent. Inferring the posterior probabilities of the hidden states of this model is computationally intractable, and therefore the exact expectation maximization (EM) algorithm cannot be applied. However, we present a variational approximation that maximizes a lower bound on the log-likelihood and makes use of both the forward and backward recursions for hidden Markov models and the Kalman filter recursions for linear dynamical systems. We tested the algorithm on artificial data sets and a natural data set of respiration force from a patient with sleep apnea. The results suggest that variational approximations are a viable method for inference and learning in switching state-space models.
TL;DR: In this article, the standard error formulas for estimated coefficients are derived and empirically tested, and a goodness-of-fit test technique based on a nonparametric maximum likelihood ratio type of test is also proposed to detect whether certain coefficient functions in a varying-coefficient model are constant or whether any covariates are statistically significant in the model.
Abstract: This article deals with statistical inferences based on the varying-coefficient models proposed by Hastie and Tibshirani. Local polynomial regression techniques are used to estimate coefficient functions, and the asymptotic normality of the resulting estimators is established. The standard error formulas for estimated coefficients are derived and are empirically tested. A goodness-of-fit test technique, based on a nonparametric maximum likelihood ratio type of test, is also proposed to detect whether certain coefficient functions in a varying-coefficient model are constant or whether any covariates are statistically significant in the model. The null distribution of the test is estimated by a conditional bootstrap method. Our estimation techniques involve solving hundreds of local likelihood equations. To reduce the computational burden, a one-step Newton-Raphson estimator is proposed and implemented. The resulting one-step procedure is shown to save computational cost on an order of tens with no...
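As a rough illustration of estimating a coefficient function by local polynomial regression, the sketch below fits a local linear approximation to a(u) in the simplest varying-coefficient model y = a(u) x + noise, using kernel-weighted least squares around a target point u0. The single-covariate model, the Gaussian kernel, the bandwidth, and the synthetic data are all assumptions made for this example; the article's estimators, standard errors, and one-step Newton-Raphson procedure are not reproduced here.

```python
import math


def local_linear_coefficient(u, x, y, u0, bandwidth):
    """Local linear estimate of a(u0) in the model y = a(u) * x + noise.

    Near u0, a(u) is approximated by a0 + a1 * (u - u0); the pair (a0, a1)
    minimises a kernel-weighted sum of squared residuals, and a0 is returned.
    """
    s11 = s12 = s22 = t1 = t2 = 0.0
    for ui, xi, yi in zip(u, x, y):
        d = ui - u0
        w = math.exp(-0.5 * (d / bandwidth) ** 2)  # Gaussian kernel weight
        z1, z2 = xi, xi * d  # regressors for the level and slope of a(.)
        s11 += w * z1 * z1
        s12 += w * z1 * z2
        s22 += w * z2 * z2
        t1 += w * z1 * yi
        t2 += w * z2 * yi
    det = s11 * s22 - s12 * s12
    # Cramer's rule for the 2x2 weighted normal equations.
    return (t1 * s22 - t2 * s12) / det


# Synthetic, noise-free data with a(u) = 1 + 2u (hypothetical).
u = [i / 20.0 for i in range(21)]
x = [1.0 + (i % 3) for i in range(21)]
y = [(1.0 + 2.0 * ui) * xi for ui, xi in zip(u, x)]
a_hat = local_linear_coefficient(u, x, y, u0=0.5, bandwidth=0.2)
print(a_hat)
```

Because the true coefficient function in this toy example is itself linear in u and there is no noise, the local linear fit recovers a(0.5) = 2 up to rounding; with real data the estimate would depend on the bandwidth.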
TL;DR: In this paper, a simple and powerful two-step alternative to smoothing spline and kernel methods, implemented via local polynomial smoothing, is proposed for estimating the coefficient functions of functional linear models in longitudinal data analysis.
Abstract: Functional linear models are useful in longitudinal data analysis. They include many classical and recently proposed statistical models for longitudinal data and other functional data. Recently, smoothing spline and kernel methods have been proposed for estimating their coefficient functions nonparametrically but these methods are either intensive in computation or inefficient in performance. To overcome these drawbacks, in this paper, a simple and powerful two-step alternative is proposed. In particular, the implementation of the proposed approach via local polynomial smoothing is discussed. Methods for estimating standard deviations of estimated coefficient functions are also proposed. Some asymptotic results for the local polynomial estimators are established. Two longitudinal data sets, one of which involves time-dependent covariates, are used to demonstrate the approach proposed. Simulation studies show that our two-step approach improves the kernel method proposed by Hoover and co-workers in several aspects such as accuracy, computational time and visual appeal of the estimators.
TL;DR: In this article, the authors investigate the consequences of relaxing both the linearity and the additivity assumptions for the interpretation of linear instrumental variables estimators; without these assumptions, the standard linear instrumental variables estimator identifies a weighted average of the derivative of the behavioral relationship of interest.
Abstract: In markets where prices are determined by the intersection of supply and demand curves, standard identification results require the presence of instruments that shift one curve but not the other. These results are typically presented in the context of linear models with fixed coefficients and additive residuals. The first contribution of this paper is an investigation of the consequences of relaxing both the linearity and the additivity assumption for the interpretation of linear instrumental variables estimators. Without these assumptions, the standard linear instrumental variables estimator identifies a weighted average of the derivative of the behavioural relationship of interest. A second contribution is the formulation of critical identifying assumptions in terms of demand and supply at different prices and instruments, rather than in terms of functional-form specific residuals. Our approach to the simultaneous equations problem and the average-derivative interpretation of instrumental variables estimates is illustrated by estimating the demand for fresh whiting at the Fulton fish market. Strong and credible instruments for identification of this demand function are available in the form of weather conditions at sea.
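For orientation, the standard linear instrumental variables estimator with one instrument can be written as a ratio of sample covariances, cov(z, y) / cov(z, x). The sketch below computes it on a tiny synthetic data set in which the true slope is constant, so it covers only the textbook linear case with fixed coefficients, not the paper's nonlinear, non-additive setting. All variable names and numbers are hypothetical.

```python
def covariance_sum(a, b):
    """Sum of cross-products of deviations from the means (unnormalised)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))


def iv_slope(z, x, y):
    """Single-instrument IV estimate of the slope of y on x."""
    return covariance_sum(z, y) / covariance_sum(z, x)


def ols_slope(x, y):
    """Ordinary least squares slope of y on x, for comparison."""
    return covariance_sum(x, y) / covariance_sum(x, x)


# Synthetic example: z is the instrument (say, a weather indicator),
# e is an unobserved shock that moves both price x and quantity y.
z = [0.0, 0.0, 1.0, 1.0]
e = [1.0, -1.0, 1.0, -1.0]
x = [zi + ei for zi, ei in zip(z, e)]               # price
y = [-2.0 * xi + 3.0 * ei for xi, ei in zip(x, e)]  # true slope is -2
iv_estimate = iv_slope(z, x, y)
ols_estimate = ols_slope(x, y)
print(iv_estimate, ols_estimate)
```

In this construction the instrument is uncorrelated with the shock in the sample, so the IV ratio recovers the slope of -2, while the least squares slope is pulled to a positive value by the shock.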
TL;DR: This paper evaluates different methods of projection regression and derives a nonlinear function approximator based on them, which is the first truly incremental spatially localized learning method to combine all these properties.
Abstract: Locally weighted projection regression is a new algorithm that achieves nonlinear function approximation in high dimensional spaces with redundant and irrelevant input dimensions. At its core, it uses locally linear models, spanned by a small number of univariate regressions in selected directions in input space. This paper evaluates different methods of projection regression and derives a nonlinear function approximator based on them. This nonparametric local learning system i) learns rapidly with second order learning methods based on incremental training, ii) uses statistically sound stochastic cross validation to learn, iii) adjusts its weighting kernels based on local information only, iv) has a computational complexity that is linear in the number of inputs, and v) can deal with a large number of - possibly redundant - inputs, as shown in evaluations with up to 50-dimensional data sets. To our knowledge, this is the first truly incremental spatially localized learning method to combine all these properties.
TL;DR: In this article, hierarchical regression models are proposed for combining estimates of the pollution-mortality relationship across cities, and the results are found to be largely insensitive to the specific choice of vague but proper prior distribution.
Abstract: Reports over the last decade of association between levels of particles in outdoor air and daily mortality counts have raised concern that air pollution shortens life, even at concentrations within current regulatory limits. Criticisms of these reports have focused on the statistical techniques that are used to estimate the pollution-mortality relationship and the inconsistency in findings between cities. We have developed analytical methods that address these concerns and combine evidence from multiple locations to gain a unified analysis of the data. The paper presents log-linear regression analyses of daily time series data from the largest 20 US cities and introduces hierarchical regression models for combining estimates of the pollution-mortality relationship across cities. We illustrate this method by focusing on mortality effects of PM10 (particulate matter less than 10 μm in aerodynamic diameter) and by performing univariate and bivariate analyses with PM10 and ozone (O3) level. In the first stage of the hierarchical model, we estimate the relative mortality rate associated with PM10 for each of the 20 cities by using semiparametric log-linear models. The second stage of the model describes between-city variation in the true relative rates as a function of selected city-specific covariates. We also fit two variations of a spatial model with the goal of exploring the spatial correlation of the pollutant-specific coefficients among cities. Finally, to explore the results of considering the two pollutants jointly, we fit and compare univariate and bivariate models. All posterior distributions from the second stage are estimated by using Markov chain Monte Carlo techniques. In univariate analyses using concurrent day pollution values to predict mortality, we find that an increase of 10 μg/m3 in PM10 on average in the USA is associated with a 0.48% increase in mortality (95% interval: 0.05, 0.92).
With adjustment for the O3 level, the PM10 coefficient is slightly higher. The results are largely insensitive to the specific choice of vague but proper prior distribution. The models and estimation methods are general and can be used for any number of locations and pollutant measurements and have potential applications to other environmental agents.
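The reported effect size can be related to a log-linear regression coefficient with a little arithmetic. In a log-linear model, a coefficient beta per unit of pollutant corresponds to a percentage change in the mortality rate of 100 (exp(beta × increment) − 1) for a given increment. The sketch below assumes the 0.48% figure was obtained in this standard way, which the abstract does not state explicitly; the function names are hypothetical.

```python
import math


def percent_increase(beta, increment):
    """Percentage change in the rate for a given pollutant increment,
    under a log-linear model with coefficient beta per unit."""
    return 100.0 * (math.exp(beta * increment) - 1.0)


def coefficient_from_percent(percent, increment):
    """Inverse of percent_increase: the per-unit log-linear coefficient."""
    return math.log(1.0 + percent / 100.0) / increment


# The abstract reports 0.48% per 10 units of PM10 (95% interval 0.05 to 0.92).
beta_pm10 = coefficient_from_percent(0.48, 10.0)
beta_low = coefficient_from_percent(0.05, 10.0)
beta_high = coefficient_from_percent(0.92, 10.0)
round_trip = percent_increase(beta_pm10, 10.0)
print(beta_pm10, beta_low, beta_high, round_trip)
```

For effects this small the percentage change is close to 100 × beta × increment, so the implied coefficient is roughly 0.00048 per unit.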
TL;DR: In this article, a gray-box identification approach to three classes of block-oriented models, namely Hammerstein models, Wiener models, and feedback block-oriented models with output multiplicities, is presented.
TL;DR: A unified maximum likelihood method for estimating the parameters of the generalized latent trait model will be presented and in addition the scoring of individuals on the latent dimensions is discussed.
Abstract: In this paper we discuss a general model framework within which manifest variables with different distributions in the exponential family can be analyzed with a latent trait model. A unified maximum likelihood method for estimating the parameters of the generalized latent trait model will be presented. We discuss in addition the scoring of individuals on the latent dimensions. The general framework presented allows, not only the analysis of manifest variables all of one type but also the simultaneous analysis of a collection of variables with different distributions. The approach used analyzes the data as they are by making assumptions about the distribution of the manifest variables directly.
TL;DR: In this paper, the authors consider M-estimators of general parametric models and show that the component-wise asymptotic normality of the estimate remains valid if the dimension of the parameter space grows more slowly than some root of the sample size.
TL;DR: The model choice and the interpretation of the parameters are discussed, as well as the use of the identifiability concept for fixed partition models, and the concept is generalized to "partial identifiability".
Abstract: The model choice and the interpretation of the parameters are discussed as well as the use of the identifiability concept for fixed partition models. The concept is generalized to "partial identifiability".
TL;DR: A novel, simple characterization of linearly dependent processes, called observable operator models, is provided, which leads to a constructive learning algorithm for the identification of linearly dependent processes.
Abstract: A widely used class of models for stochastic systems is hidden Markov models. Systems that can be modeled by hidden Markov models are a proper subclass of linearly dependent processes, a class of s...
TL;DR: In this article, the authors propose a nonlinear model with a feedback law that describes how the environmental condition at any particular time depends on the population size and composition at that time.
Abstract: This paper is as much about a certain modelling methodology, as it is about the constructive definition of future population states from a description of individual behaviour and an initial population state. The key idea is to build a nonlinear model in two steps, by explicitly introducing the environmental condition via the requirement that individuals are independent from one another (and hence equations are linear) when this condition is given (prescribed) as a function of time. A linear physiologically structured population model is defined by two rules, one for reproduction and one for development and survival, both depending on the initial individual state and the prevailing environmental condition. In Part I we showed how one can constructively define future population state operators from these two ingredients. A nonlinear model is a linear model together with a feedback law that describes how the environmental condition at any particular time depends on the population size and composition at that time. When applied to the solution of the linear problem, the feedback law yields a fixed point problem. This we solve constructively by means of the contraction mapping principle, for any given initial population state. Using subsequently this fixed point as input in the linear population model, we obtain a population semiflow. We then say that we solved the nonlinear problem. The paper is organized in a top-down spirit: We describe a general abstract setting first and then specialise, while becoming more technical. The results are not restricted to a single population but also cover the interaction (including predation) of several structured (and unstructured) populations.
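The constructive use of the contraction mapping principle can be illustrated on a scalar toy problem: if a map shrinks distances by a fixed factor below one, repeated application from any starting point converges to its unique fixed point. The sketch below is only this generic iteration; the map `toy_feedback` is a hypothetical stand-in and is not the feedback operator of the population model, which acts on functions of time rather than on numbers.

```python
def fixed_point(mapping, start, tol=1e-12, max_iter=1000):
    """Iterate x <- mapping(x) until successive values differ by less than tol."""
    x = start
    for _ in range(max_iter):
        x_next = mapping(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("fixed-point iteration did not converge")


def toy_feedback(x):
    """Hypothetical contraction with factor 0.5; its fixed point is 2."""
    return 0.5 * x + 1.0


solution = fixed_point(toy_feedback, start=0.0)
other_start = fixed_point(toy_feedback, start=100.0)
print(solution, other_start)
```

Both starting values lead to the same limit, which mirrors the abstract's statement that the fixed point problem is solved for any given initial population state.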
TL;DR: An algorithm to estimate the parameters of a linear model in the presence of heteroscedastic noise, i.e., each data point having a different covariance matrix, achieves the accuracy of nonlinear optimization techniques at much less computational cost.
Abstract: We present an algorithm to estimate the parameters of a linear model in the presence of heteroscedastic noise, i.e., each data point having a different covariance matrix. The algorithm is motivated by the recovery of bilinear forms, one of the fundamental problems in computer vision which appears whenever the epipolar constraint is imposed, or a conic is fit to noisy data points. We employ the errors-in-variables (EIV) model and show why already at moderate noise levels most available methods fail to provide a satisfactory solution. The improved behavior of the new algorithm is due to two factors: taking into account the heteroscedastic nature of the errors arising from the linearization of the bilinear form, and the use of generalized singular value decomposition (GSVD) in the computations. The performance of the algorithm is compared with several methods proposed in the literature for ellipse fitting and estimation of the fundamental matrix. It is shown that the algorithm achieves the accuracy of nonlinear optimization techniques at much less computational cost.
TL;DR: The regulatory interactions between genes are modeled by a linear genetic network estimated from gene expression data; grouping genes with similar expression profiles reduces the number of signals and imposes a structure that is supported by the fact that biological genetic networks are thought to be redundant and sparsely connected.
Abstract: Currently, several different types of models are stud... In this paper, the regulatory interactions between genes are modeled by a linear genetic network that is estimated from gene expression data. The inference of such a genetic network is hampered by the dimensionality problem. This problem is inherent in all gene expression data since the number of genes by far exceeds the number of measured time points. Consequently, there are infinitely many solutions that fit the data set perfectly. In this paper, this problem is tackled by combining genes with similar expression profiles in a single prototypical 'gene'. Instead of modeling the genes individually, the relations between prototypical genes are modeled. In this way, genes that cannot be distinguished based on their expression profiles are grouped together and their common control action is modeled instead. This process reduces the number of signals and imposes a structure on the model that is supported by the fact that biological genetic networks are thought to be redundant and sparsely connected. In essence, the ambiguity in model solutions is represented explicitly by providing a generalized model that expresses the basic regulatory interactions between groups of similarly expressed genes. The modeling approach is illustrated on artificial as well as real data.
TL;DR: In this article, the authors explore the class of varying-coefficient linear models in which the index is unknown and is estimated as a linear combination of regression and/or other variables.
Abstract: Varying-coefficient linear models arise from multivariate nonparametric regression, nonlinear time series modelling and forecasting, functional data analysis, longitudinal data analysis, and others. It has been a common practice to assume that the varying coefficients are functions of a given variable which is often called an index. A frequently asked question is which variable should be used as the index. In this paper, we explore the class of the varying-coefficient linear models in which the index is unknown and is estimated as a linear combination of regression and/or other variables. This will enlarge the modelling capacity substantially. We search for the index such that the derived varying-coefficient model provides the best approximation to the underlying unknown multi-dimensional regression function in the least squares sense. The search is implemented through the newly proposed hybrid backfitting algorithm. The core of the algorithm is the alternating iteration between estimating the index through a one-step scheme and estimating coefficient functions through a one-dimensional local linear smoothing. The generalised cross-validation method for choosing bandwidth is efficiently incorporated into the algorithm. The locally significant variables are selected in terms of the combined use of t-statistic and Akaike information criterion. We further extend the algorithm for the models with two indices. Simulation shows that the proposed methodology has appreciable flexibility to model complex multivariate nonlinear structure and is practically feasible with average modern computers. The methods are further illustrated through the Canadian mink-muskrat data in 1925-1994 and the pound/dollar exchange rates in 1974-1983.
TL;DR: An algorithm is provided for fitting flexible TRMs that is relatively easy to program with the generalized additive model software in S-PLUS; the case study finds evidence of residual autocorrelation, but when the TRM is relaxed to allow for a nonparametric smooth trend, the autocorrelation disappears.
Abstract: Environmental epidemiologists often encounter time series data in the form of discrete or other nonnormal outcomes; for example, in modeling the relationship between air pollution and hospital admissions or mortality rates. We present a case study examining the association between pollen counts and meteorologic covariates. Although such time series data are inadequately described by standard methods for Gaussian time series, they are often autocorrelated, and warrant an analysis beyond those provided by ordinary generalized linear models (GLMs). Transitional regression models (TRMs), signifying nonlinear regression models expressed in terms of conditional means and variances given past observations, provide a unifying framework for two mainstream approaches to extending the GLM for autocorrelated data. The first approach models current outcomes with a GLM that incorporates past outcomes as covariates, whereas the second models individual outcomes with marginal GLMs and then couples the error term...
TL;DR: In this paper, an approximation to the residual of the model is proposed to deal with the nonparametric part so that Owen's (1990) empirical likelihood approach can be applied, and the empirical log-likelihood ratio statistic is shown to be asymptotically chi-squared distributed.
TL;DR: On a difficult task, which consists in forecasting the tendency of the Bel 20 stock market index, this method improves the results compared both to linear models and to non-linear ones where the non-linear compression is not used.
Abstract: In this paper we develop a method to predict time series with non-linear tools. The specificity of the method is to use as much information as possible as input to the model (many past values of the series, many exogenous variables), to compress this information (by a non-linear method) in order to obtain a state vector of limited size, facilitating the subsequent regression and the generalization ability of the forecasting algorithm, and to fit a non-linear regressor (here an RBF neural network) on the reduced vectors. We show that this method is able to find non-linear relationships in artificial and real-world financial series. On a difficult task, which consists in forecasting the tendency of the Bel 20 stock market index, we show that this method improves the results compared both to linear models and to non-linear ones where the non-linear compression is not used.
TL;DR: In this article, a series of seasonally varying linear Markov models are constructed in a reduced multivariate empirical orthogonal function (MEOF) space of observed sea surface temperature, surface wind stress, and sea level analysis.
Abstract: A series of seasonally varying linear Markov models are constructed in a reduced multivariate empirical orthogonal function (MEOF) space of observed sea surface temperature, surface wind stress, and sea level analysis. The Markov models are trained in the 1980–95 period and are verified in the 1964–79 period. It is found that the Markov models that include seasonality fit to the data better in the training period and have a substantially higher skill in the independent period than the models without seasonality. The authors conclude that seasonality is an important component of ENSO and should be included in Markov models. This conclusion is consistent with that of statistical models that take seasonality into account using different methods. The impact of each variable on the prediction skill of Markov models is investigated by varying the weightings among the three variables in the MEOF space. For the training period the Markov models that include sea level information fit the data better than ...
TL;DR: In this paper, the problem of estimating an additive partially linear model using general series estimation methods, with polynomials and splines as two leading cases, is considered, and it is shown that the finite-dimensional parameter is identified under weak conditions.
Abstract: I consider the problem of estimating an additive partially linear model using general series estimation methods, with polynomials and splines as two leading cases. I show that the finite-dimensional parameter is identified under weak conditions. I establish the root-n-normality result for the finite-dimensional parameter in the linear part of the model and show that it is asymptotically more efficient than a semiparametric estimator that ignores the additive structure. When the error is conditionally homoskedastic, my finite-dimensional parameter estimator reaches the semiparametric efficiency bound. Efficient estimation when the error is conditionally heteroskedastic is also discussed.
TL;DR: In this article, the kernel number per plant (KNP) was calculated for maize (Zea mays L.) using linear models based on intercepted photosynthetically active radiation per plant.
Abstract: When water and nutrients are not limiting growth, kernel number per plant (KNP) can be calculated for maize (Zea mays L.) using linear models based on intercepted photosynthetically active radiation per plant (IPARP). These models do not include the concept of a threshold IPARP for KNP. Our objective was to study the response of KNP to IPARP in order to improve current models. Published information from several field experiments, performed with two maize hybrids in the temperate region of Argentina, was used for the analysis. Both linear and curvilinear (inverse linear and exponential) models based on IPARP explained more than 75% of the variability in KNP and kernel number per apical ear (KNA). Curvilinear models gave a better fit at low ( 1. The relationship between KNP and IPARP did not improve when IPARP was expressed on a thermal time basis (photothermal quotient), probably because air temperature did not vary much among the situations explored in this work.
TL;DR: The dual Kalman filtering method is developed as a method for minimizing a variety of dual estimation cost functions, and is shown to be an effective general method for estimating the signal, model parameters, and noise variances in both on-line and off-line environments.
Abstract: Numerous applications require either the estimation or prediction of a noisy time-series. Examples include speech enhancement, economic forecasting, and geophysical modeling. A noisy time-series can be described in terms of a probabilistic model, which accounts for both the deterministic and stochastic components of the dynamics. Such a model can be used with a Kalman filter (or extended Kalman filter) to estimate and predict the time-series from noisy measurements. When the model is unknown, it must be estimated as well; dual estimation refers to the problem of estimating both the time-series, and its underlying probabilistic model, from noisy data. The majority of dual estimation techniques in the literature are for signals described by linear models, and many are restricted to off-line application domains. Using a probabilistic approach to dual estimation, this work unifies many of the approaches in the literature within a common theoretical and algorithmic framework, and extends their capabilities to include sequential dual estimation of both linear and nonlinear signals. The dual Kalman filtering method is developed as a method for minimizing a variety of dual estimation cost functions, and is shown to be an effective general method for estimating the signal, model parameters, and noise variances in both on-line and off-line environments.
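As a rough illustration of the Kalman filtering step referred to in the abstract above, the sketch below filters a scalar AR(1) signal observed in noise. It covers only the state-estimation half of dual estimation: the model parameter `phi` and both noise variances are assumed known, whereas the work described also estimates them. The function name `kalman_filter_ar1`, its parameters, and the AR(1) model itself are hypothetical choices for this example, not taken from the source.

```python
from typing import List, Sequence, Tuple


def kalman_filter_ar1(observations: Sequence[float], phi: float,
                      process_var: float, obs_var: float,
                      x0: float = 0.0, p0: float = 1.0
                      ) -> List[Tuple[float, float]]:
    """Scalar Kalman filter for the assumed model
        x_t = phi * x_{t-1} + w_t,   w_t ~ N(0, process_var)
        y_t = x_t + v_t,             v_t ~ N(0, obs_var)
    Returns a (state estimate, error variance) pair for each observation.
    """
    x_est, p_est = x0, p0
    estimates: List[Tuple[float, float]] = []
    for y in observations:
        # Predict: propagate the previous estimate through the AR(1) model.
        x_pred = phi * x_est
        p_pred = phi * phi * p_est + process_var
        # Update: blend the prediction with the new noisy measurement.
        gain = p_pred / (p_pred + obs_var)
        x_est = x_pred + gain * (y - x_pred)
        p_est = (1.0 - gain) * p_pred
        estimates.append((x_est, p_est))
    return estimates
```

With `obs_var = 0` the gain is 1 and the filter simply returns each measurement; with `process_var = 0` and `phi = 1` it reduces to a running average of a constant signal. A dual scheme would run a second estimator over `phi` and the variances alongside this state filter.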
TL;DR: In this paper, a GMM method of estimating linear dynamic models from a time series of independent cross-sections is presented, which involves subjecting the model to a quasi-differencing transformation across pairs of individuals that belong to the same group.
TL;DR: In this paper, robust estimation in linear models of the form $y_i = x_{1i}\beta_1 + x_{2i}\beta_2 + e_i$ ($i = 1, \ldots, n$), in which the $x_{1i}$ are fixed 0-1 vectors and the $x_{2i}$ are continuous random variables which may contain leverage points, is studied.