TL;DR: In this paper, the authors present a general approach that accommodates most forms of experimental layout and ensuing analysis (designed experiments with fixed effects for factors, covariates and interaction of factors).
Abstract: Statistical parametric maps are spatially extended statistical processes that are used to test hypotheses about regionally specific effects in neuroimaging data. The most established sorts of statistical parametric maps (e.g., Friston et al. (1991): J Cereb Blood Flow Metab 11:690-699; Worsley et al. (1992): J Cereb Blood Flow Metab 12:900-918) are based on linear models, for example ANCOVA, correlation coefficients and t tests. In the sense that these examples are all special cases of the general linear model it should be possible to implement them (and many others) within a unified framework. We present here a general approach that accommodates most forms of experimental layout and ensuing analysis (designed experiments with fixed effects for factors, covariates and interaction of factors). This approach brings together two well established bodies of theory (the general linear model and the theory of Gaussian fields) to provide a complete and simple framework for the analysis of imaging data. The importance of this framework is twofold: (i) Conceptual and mathematical simplicity, in that the same small number of operational equations is used irrespective of the complexity of the experiment or nature of the statistical model and (ii) the generality of the framework provides for great latitude in experimental design and analysis.
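The idea of fitting the same linear model at every voxel and collecting the resulting statistics into a map can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the simplest design (an intercept plus one condition indicator), the function name `slope_t_statistic` and all data values are hypothetical, and the Gaussian field step used to threshold the map is not shown.

```python
import math

def slope_t_statistic(y, x):
    """t statistic for the slope in y = b0 + b1*x + error (ordinary least squares)."""
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (n - 2)  # residual variance on n - 2 degrees of freedom
    return b1 / math.sqrt(sigma2 / sxx)

# Hypothetical data: 6 scans (0 = rest, 1 = task) observed at 3 voxels.
design = [0, 0, 0, 1, 1, 1]
voxels = [
    [10.0, 11.0, 9.0, 14.0, 15.0, 13.0],  # signal rises with the task
    [10.0, 11.0, 9.0, 10.0, 11.0, 9.0],   # no task effect
    [14.0, 15.0, 13.0, 10.0, 11.0, 9.0],  # signal falls with the task
]

# The "map" is one statistic per voxel, all computed with the same equations.
t_map = [slope_t_statistic(v, design) for v in voxels]
print(t_map)
```

With an indicator covariate the slope t statistic coincides with the pooled two-sample t test, which is one way to see the abstract's point that t tests are special cases of the general linear model.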
TL;DR: In this article, the authors extended Hamilton's Markov-switching model to a general state-space model and proposed a filtering and smoothing algorithm to estimate a broad class of models.
TL;DR: Modelling and Analysis of Cross-sectional Data: A Review of Univariate Generalized Linear Models and Models for Multicategorical Responses and Semi- and Nonparametric Approaches to Regression Analysis.
Abstract: Introduction * Modelling and Analysis of Cross-sectional Data: A Review of Univariate Generalized Linear Models * Models for Multicategorical Responses: Multivariate Extensions of Generalized Linear Models * Selecting and Checking Models * Semi- and Nonparametric Approaches to Regression Analysis * Fixed Parameter Models for Time Series and Longitudinal Data * Random Effects Models * State Space and Hidden Markov Models * Survival Models
TL;DR: This article discusses the asymptotic behavior of likelihood ratio tests for nonzero variance components in the longitudinal mixed effects linear model described by Laird and Ware (1982, Biometrics 38, 963-974).
Abstract: This article discusses the asymptotic behavior of likelihood ratio tests for nonzero variance components in the longitudinal mixed effects linear model described by Laird and Ware (1982, Biometrics 38, 963-974). Our discussion of the large-sample behavior of likelihood ratio tests for nonzero variance components is based on the results for nonstandard testing situations by Self and Liang (1987, Journal of the American Statistical Association 82, 605-610).
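The nonstandard situation arises because a variance of zero lies on the boundary of the parameter space. A minimal sketch of the practical consequence follows, assuming the simplest case (testing one variance component against zero, with no other random effects), for which the Self and Liang results give a 50:50 mixture of a point mass at zero and a chi-square with 1 degree of freedom. The abstract itself does not state this distribution, and the function names and the statistic 2.71 are hypothetical.

```python
import math

def chi2_sf_1df(x):
    """P(chi-square with 1 df > x), using the normal tail: erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0)) if x > 0 else 1.0

def lrt_pvalue_single_variance(lrt_stat):
    """p-value under a 50:50 mixture of a point mass at 0 and chi-square(1)."""
    if lrt_stat <= 0:
        return 1.0
    return 0.5 * chi2_sf_1df(lrt_stat)

lrt_stat = 2.71  # hypothetical likelihood ratio statistic
naive = chi2_sf_1df(lrt_stat)                   # ignores the boundary
mixture = lrt_pvalue_single_variance(lrt_stat)  # accounts for the boundary
print(naive, mixture)
```

Under this assumption the naive chi-square(1) p-value is twice the mixture p-value, so the naive test is conservative.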
TL;DR: A subclass of dynamic linear models with unknown hyperparameters called d-inverse-gamma models is defined and it is proved that the regularity conditions for convergence hold.
Abstract: We define a subclass of dynamic linear models with unknown hyperparameters called d-inverse-gamma models. We then approximate the marginal p.d.f.s of the hyperparameter and the state vector by the data augmentation algorithm of Tanner/Wong. We prove that the regularity conditions for convergence hold. A sampling based scheme for practical implementation is discussed. Finally, we illustrate how to obtain an iterative importance sampling estimate of the model likelihood. (author's abstract)
TL;DR: A semiparametric model for longitudinal data which is illustrated by its application to data on the time evolution of CD4 cell numbers in HIV seroconverters, finding that the onset of HIV infection is associated with a sudden drop in CD4 cells followed by a longer-term slower decay.
Abstract: The paper describes a semiparametric model for longitudinal data which is illustrated by its application to data on the time evolution of CD4 cell numbers in HIV seroconverters. The essential ingredients of the model are a parametric linear model for covariate adjustment, a nonparametric estimation of a smooth time trend, serial correlation between measurements on an individual subject, and random measurement error. A back-fitting algorithm is used in conjunction with a cross-validation prescription to fit the model. A notable feature in the application is that the onset of HIV infection is associated with a sudden drop in CD4 cells followed by a longer-term slower decay. The model is also used to estimate an individual's curve by combining his data with the population curve. Shrinkage toward the population mean trajectory is controlled in a natural way by the estimated covariance structure of the data.
TL;DR: In this article, the power properties of the LM type tests in small samples are compared to those of other tests like the CUSUM and Fluctuation Test by simulation and found very satisfactory.
TL;DR: This paper shows that linear models can provide accurate forecasts provided that the parameters involved are estimated adaptively, with a focus on forecasting long-memory time series.
Abstract: This paper considers some recent developments in non-linear and linear time series analysis. It consists of two main components. The first emphasizes the advances in non-linear modelling and in Bayesian inference via the Gibbs sampler. Advantages and the usefulness of these advances are illustrated by real examples. The second component is concerned with adaptive forecasting. This shows that linear models can provide accurate forecasts provided that the parameters involved are estimated adaptively. In particular, we focus on forecasting long-memory time series. Again, a real example is used to illustrate the results.
TL;DR: In this article, two approaches to estimating sub-pixel land cover composition are investigated, a linear mixture model and a regression model based on fuzzy membership functions, and significant correlation coefficients, all > 0·7, between the actual and predicted proportion of a land cover type within a pixel were obtained.
Abstract: Mixed pixels occur commonly in remotely-sensed imagery, especially those with a coarse spatial resolution. They are a problem in land-cover mapping applications since image classification routines assume ‘pure’ or homogeneous pixels. By unmixing a pixel into its component parts it is possible to enable, inter alia, more accurate estimation of the areal extent of different land cover classes. In this paper two approaches to estimating sub-pixel land cover composition are investigated. One is a linear mixture model; the other is a regression model based on fuzzy membership functions. For both approaches, significant correlation coefficients, all >0·7, between the actual and predicted proportion of a land cover type within a pixel were obtained. Additionally, a case study is presented in which the accuracy of the estimation of tropical forest extent is increased significantly through the use of sub-pixel estimates of land-cover composition rather than a conventional image classification.
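A linear mixture model treats a pixel's spectrum as a proportion-weighted sum of pure class spectra. The sketch below assumes only two classes whose proportions sum to one, so the least-squares proportion has a closed form. The function name, class names and reflectance values are hypothetical and are not taken from the paper.

```python
def unmix_two_classes(pixel, endmember_a, endmember_b):
    """Least-squares fraction f of class A in pixel = f*A + (1 - f)*B, clipped to [0, 1]."""
    diff = [a - b for a, b in zip(endmember_a, endmember_b)]
    numerator = sum((p - b) * d for p, b, d in zip(pixel, endmember_b, diff))
    denominator = sum(d * d for d in diff)
    fraction = numerator / denominator
    return min(1.0, max(0.0, fraction))

# Hypothetical pure-class spectra in three bands.
forest = [30.0, 20.0, 90.0]
pasture = [60.0, 50.0, 70.0]

# Synthetic mixed pixel: 25% forest, 75% pasture.
mixed = [0.25 * f + 0.75 * p for f, p in zip(forest, pasture)]
forest_fraction = unmix_two_classes(mixed, forest, pasture)
print(forest_fraction)
```

With more than two classes the same idea becomes a constrained least-squares problem over all class proportions.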
TL;DR: A new class of fuzzy linear regression models based on Tanaka's approach, in which all training data influence the estimated interval and the fuzzy regression equation can be adapted to new data.
TL;DR: In this article, the authors provide an overview of asymptotic results available for parametric estimators in dynamic models, including multivariate least squares estimation of a dynamic conditional mean, quasi-maximum likelihood estimation, and generalized method of moments estimation of orthogonality conditions.
Abstract: This chapter provides an overview of asymptotic results available for parametric estimators in dynamic models. Three cases are treated: stationary (or essentially stationary) weakly dependent data, weakly dependent data containing deterministic trends, and nonergodic data (or data with stochastic trends). Estimation of asymptotic covariance matrices and computation of the major test statistics are covered. Examples include multivariate least squares estimation of a dynamic conditional mean, quasi-maximum likelihood estimation of a jointly parameterized conditional mean and conditional variance, and generalized method of moments estimation of orthogonality conditions. Some results for linear models with integrated variables are provided, as are some abstract limiting distribution results for nonlinear models with trending data.
TL;DR: An efficient and straightforward procedure is described for specifying and estimating parameters of general mixed models which contain both hierarchical and crossed random factors.
Abstract: An efficient and straightforward procedure is described for specifying and estimating parameters of general mixed models which contain both hierarchical and crossed random factors. This is done using a model formulated for purely hierarchically structured data and generalizes the results of Raudenbush (1993). The exposition is for the continuous response linear model with natural extensions to generalized linear, nonlinear, and multivariate models.
TL;DR: The conclusions are: 1) the Gibbs sampler converged to the true posterior distributions, as suggested by CASE I; 2) it provides a richer description of uncertainty about genetic
Abstract: Summary - The Gibbs sampling is a Monte-Carlo procedure for generating random samples from joint distributions through sampling from and updating conditional distributions. Inferences about unknown parameters are made by: 1) computing directly summary statistics from the samples; or 2) estimating the marginal density of an unknown, and then obtaining summary statistics from the density. All conditional distributions needed to implement the Gibbs sampling in a univariate Gaussian mixed linear model are presented in scalar algebra, so no matrix inversion is needed in the computations. For location parameters, all conditional distributions are univariate normal, whereas those for variance components are scaled inverted chi-squares. The procedure was applied to solve a Gaussian animal model for litter size in the Gamito strain of Iberian pigs. Data were 1 213 records from 426 dams. The model had farrowing season (72 levels) and parity (4) as fixed effects; breeding values (597), permanent environmental effects (426) and residuals were random. In CASE I, variances were assumed known, with REML (restricted maximum likelihood) estimates used as true parameter values. Here, means and variances of the posterior distributions of all effects were obtained, by inversion, from the mixed model equations. These exact solutions were used to check the Monte-Carlo estimates given by Gibbs, using 120 000 samples. Linear regression slopes of true posterior means on Gibbs means were almost exactly 1 for fixed, additive genetic and permanent environmental effects. Regression slopes of true posterior variances on Gibbs variances were 1.00, 1.01 and 0.96, respectively. In CASE II, variances were treated as unknown, with a flat prior assigned to these. Posterior densities of selected location parameters, variance components, heritability and repeatability were estimated. 
Marginal posterior distributions of dispersion parameters were skewed, save the residual variance; the means, modes and medians of these distributions differed from the REML estimates, as expected from theory. The conclusions are: 1) the Gibbs sampler converged to the true posterior distributions, as suggested by CASE I; 2) it provides a richer description of uncertainty about genetic
TL;DR: A textbook on statistical methods for designed experiments, covering estimation and hypothesis testing, blocking, factorial and split plot designs, linear and nonlinear regression, the analysis of proportions and frequency data, multivariate methods, choice of experimental design, and sampling.
Abstract: INTRODUCTION The Need for Statistics Types of Data The Use of Computers in Statistics PROBABILITY AND DISTRIBUTIONS Probability Populations and Samples Means and Variances The Normal Distribution Sampling Distributions ESTIMATION AND HYPOTHESIS TESTING Estimation of the Population Mean Testing Hypotheses about the Population Mean Population Variance Unknown Comparison of Samples A Pooled Estimate of Variance A SIMPLE EXPERIMENT Randomization and Replication Analysis of a Completely Randomized Design with Two Treatments A Completely Randomized Design with Several Treatments Testing Overall Variation Between the Treatments CONTROL OF RANDOM VARIATION BY BLOCKING Local Control of Variation Analysis of a Randomized Block Design Meaning of the Error Mean Square Latin Square Designs Multiple Latin Squares Design The Benefit of Blocking and the Use of Natural Blocks PARTICULAR QUESTIONS ABOUT TREATMENTS Treatment Structure Treatment Contrasts Factorial Treatment Structure Main Effects and Interactions Analysis of Variance for a Two-Factor Experiment Partial Factorial Structure Comparing Treatment Means - Are Multiple Comparison Methods Helpful? 
MORE ON FACTORIAL TREATMENT STRUCTURE More than Two Factors Factors with Two Levels The Double Benefit of Factorial Structure Many Factors and Small Blocks The Analysis of Confounded Experiments Split Plot Experiments Analysis of a Split Plot Experiment Experiments Repeated at Different Sites THE ASSUMPTIONS BEHIND THE ANALYSIS Our Assumptions Normality Variance Homogeneity Additivity Transformations of Data for Theoretical Reasons A More General Form of Analysis Empirical Detection of the Failure of Assumptions and Selection of Appropriate Transformations Practice and Presentation STUDYING LINEAR RELATIONSHIPS Linear Regression Assessing the Regression Line Inferences about the Slope of a Line Prediction Using a Regression Line Correlation Testing Whether the Regression is Linear Regression Analysis Using Computer Packages MORE COMPLEX RELATIONSHIPS Making the Crooked Straight Two Independent Variables Testing the Components of a Multiple Relationship Multiple Regression Possible Problems in Computer Multiple Regression LINEAR MODELS The Use of Models Models for Factors and Variables Comparison of Regressions Fitting Parallel Lines Covariance Analysis Regression in the Analysis of Treatment Variation NONLINEAR MODELS Advantages of Linear and Nonlinear Models Fitting Nonlinear Models to Data Inferences about Nonlinear Parameters Exponential Models Inverse Polynomial Models Logistic Models for Growth Curves THE ANALYSIS OF PROPORTIONS Data in the Form of Frequencies The 2 × 2 Contingency Table More than Two Situations or More than Two Outcomes General Contingency Tables Estimation of Proportions Sample Sizes for Estimating Proportions MODELS AND DISTRIBUTIONS FOR FREQUENCY DATA Models for Frequency Data Testing the Agreement of Frequency Data with Simple Models Investigating More Complex Models The Binomial Distribution The Poisson Distribution Generalized Models for Analyzing Experimental Data Log-Linear Models Logit Analysis of Response Data MAKING AND ANALYZING 
SEVERAL EXPERIMENTAL MEASUREMENTS Different Measurements on the Same Units Interdependence of Different Variables Repeated Measurements Joint (Bivariate) Analysis Indices of Combined Yield Investigating Relationships with Experimental Data ANALYZING AND SUMMARIZING MANY MEASUREMENTS Introduction to Multivariate Data Principal Component Analysis Covariance or Correlation Matrix Cluster Analysis Similarity and Dissimilarity Measures Hierarchical Clustering Comparison of PCA and Cluster Analysis CHOOSING THE MOST APPROPRIATE EXPERIMENTAL DESIGN The Components of Design Units and Treatments Replication and Precision Different Levels of Variation and Within-Unit Replication Variance Components and Split Plot Designs Randomization Managing with Limited Resources Factors with Quantitative Levels Screening and Selection On-Farm Experiments SAMPLING FINITE POPULATIONS Experiments and Sample Surveys Simple Random Sampling Stratified Random Sampling Cluster Sampling, Multistage Sampling and Sampling Proportional to Size Ratio and Regression Estimates REFERENCES APPENDIX INDEX
TL;DR: Three false steps are identified and discussed: they concern constraints on parameters, neglect of marginality constraints, and confusion between non-centrality parameters and corresponding hypotheses.
Abstract: Inference from the fitting of linear models is basic to statistical practice, but the development of strategies for analysis has been hindered by unnecessary complexities in the descriptions of such models. Three false steps are identified and discussed: they concern constraints on parameters, neglect of marginality constraints, and confusion between non-centrality parameters and corresponding hypotheses. Useful primitive statistical steps are discussed, and the need for strategies, rather than tactics, of analysis stressed. The implications for the development of good, fully interactive, computing software are set out, and illustrated with examples.
TL;DR: The purpose of this paper is to review and examine some of the approaches to fuzzy linear regression, to discuss their strengths and weaknesses relative to each other, and to suggest possible improvements.
TL;DR: An EM-based algorithm in which the M-step is computationally straightforward principal components analysis (PCA), and incorporating tangent-plane information about expected local deformations only requires adding tangent vectors into the sample covariance matrices for the PCA, and it demonstrably improves performance.
Abstract: We construct a mixture of locally linear generative models of a collection of pixel-based images of digits, and use them for recognition. Different models of a given digit are used to capture different styles of writing, and new images are classified by evaluating their log-likelihoods under each model. We use an EM-based algorithm in which the M-step is computationally straightforward principal components analysis (PCA). Incorporating tangent-plane information [12] about expected local deformations only requires adding tangent vectors into the sample covariance matrices for the PCA, and it demonstrably improves performance.
TL;DR: In this paper, a linear model of timing and error-corrections was constructed that aims at an explanation of the mechanisms underlying a subject's performance in an experimental paradigm, in which the task is to synchronize a sequence of motor acts to the sequence of stimuli.
Abstract: In Part I (Mates 1994), a linear model of timing and error-corrections was constructed that aims at an explanation of the mechanisms underlying a subject's performance in an experimental paradigm, in which the task is to synchronize a sequence of motor acts to a sequence of stimuli. The model consists of two error-corrective mechanisms: (1) corrections of period (inverted frequency) of the sequence of responses; (2) corrections of phase shift of that sequence (synchronization error). In this paper, the influence of the physiologically justifiable model variables and of initial conditions on the steady-state response sequence as well as the stability of performance of the model are analyzed. The model is stable for error-correction gains in the range from 0 to 2. Comparison with known empirical data supports the assumption that reasonable values are less than 1. Furthermore, an alternative to the basic linear model is introduced in which the possible character of the process of subjective acquisition of the synchronization error is discussed. On the basis of findings from other experimental paradigms (fusion and order threshold) it can be assumed that the subjective estimate is a nonlinear function of the difference between the temporal central availability of internal representations of the stimulus and response-feedback events. Some other known synchronization data are simulated by the nonlinear modification of the model in this paper. A good fit of the simulation results achieved further justifies the model structure proposed. Finally, the possible effect of the subjective synchronization-error estimation on empirical data is discussed.
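The stated stability range can be illustrated with a deliberately simplified recursion. The sketch assumes a first-order phase correction only, e[n+1] = (1 - gain) * e[n], which converges exactly when |1 - gain| < 1, that is for gains strictly between 0 and 2. It omits the period-correction mechanism and the nonlinear subjective-error estimate described in the abstract, and the function names and initial error are hypothetical.

```python
def simulate_sync_error(gain, initial_error=50.0, steps=40):
    """Iterate e[n+1] = (1 - gain) * e[n]: a simplified phase-correction rule."""
    errors = [initial_error]
    for _ in range(steps):
        errors.append((1.0 - gain) * errors[-1])
    return errors

def is_stable(gain, tolerance=1e-3):
    """True if the error has shrunk to a small fraction of its initial size."""
    errors = simulate_sync_error(gain)
    return abs(errors[-1]) < tolerance * abs(errors[0])

for gain in (-0.1, 0.5, 1.5, 2.5):
    print(gain, is_stable(gain))
```

Gains between 1 and 2 still converge but overshoot on every step, so the error alternates in sign while it decays.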
TL;DR: In this paper, a model reduction method for large-scale power systems is presented, which searches for the optimal subset of the high-order model that best represents the power system.
Abstract: Eigenanalysis and signal analysis techniques of deriving representations of power system oscillatory dynamics result in very high-order linear models. In order to apply many modern control design methods, the models must be reduced to a more manageable order while preserving essential characteristics. Presented in this paper is a model reduction method well suited for large-scale power systems. The method searches for the optimal subset of the high-order model that best represents the system. An Akaike information criterion is used to define the optimal reduced model. The method is first presented, and then examples of applying it to Prony analysis and eigenanalysis models of power systems are given.
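Selecting a reduced model with an Akaike information criterion means trading goodness of fit against the number of parameters. The sketch below assumes the common least-squares form AIC = n ln(RSS/n) + 2k, which the abstract does not specify. The residual sums of squares, sample count and candidate orders are hypothetical, and the search over subsets of the high-order model is not shown.

```python
import math

def aic_least_squares(rss, n_samples, n_params):
    """AIC for a least-squares fit with Gaussian errors, up to an additive constant."""
    return n_samples * math.log(rss / n_samples) + 2 * n_params

# Hypothetical residual sums of squares for reduced models of increasing order.
n_samples = 200
rss_by_order = {2: 40.0, 4: 12.0, 6: 11.5, 8: 11.4}

scores = {order: aic_least_squares(rss, n_samples, order)
          for order, rss in rss_by_order.items()}
best_order = min(scores, key=scores.get)
print(best_order, scores)
```

In this made-up example the fit improves only marginally beyond order 6, so the penalty term makes the order-6 model the minimum-AIC choice.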
TL;DR: In this paper, a Bayesian analysis of a threshold model with multiple ordered categories is presented, where marginalization is achieved by means of the Gibbs sampler, and it is shown that use of data augmentation leads to conditional posterior distributions which are easy to sample from.
Abstract: Summary - A Bayesian analysis of a threshold model with multiple ordered categories is presented. Marginalizations are achieved by means of the Gibbs sampler. It is shown that use of data augmentation leads to conditional posterior distributions which are easy to sample from. The conditional posterior distributions of thresholds and liabilities are independent uniforms and independent truncated normals, respectively. The remaining parameters of the model have conditional posterior distributions which are identical to those in the Gaussian linear model. The methodology is illustrated using a sire model, with an analysis of hip dysplasia in dogs, and the results are compared with those obtained in a previous study, based on approximate maximum likelihood. Two independent Gibbs chains of length 620 000 each were run, and the Monte-Carlo sampling error of moments of posterior densities was assessed using time series methods. Differences between results obtained from both chains were within the range of the Monte-Carlo sampling error. With the exception of the sire variance and heritability, marginal posterior distributions seemed normal. Hence inferences using the present method were in good agreement with those based on approximate maximum likelihood. Threshold estimates were strongly autocorrelated in the Gibbs sequence, but this can be alleviated using an alternative parameterization.
TL;DR: In this paper, seven established models relating speed to density for vehicular flow were tested against a set of pedestrian data and the performance of each model was described by the results of statistical tests and by visual examination of the flow-density-speed curves.
Abstract: Understanding the relationships among pedestrian speed, flow, and density is essential for improving the design and operation of pedestrian facilities. Seven established models relating speed to density for vehicular flow were tested against a set of pedestrian data. The seven models were Greenshields (single-regime linear), May's bell-shaped curve, Underwood's transposed exponential curve, Greenberg's modified exponential curve, Edie's discontinuous exponential form, two-regime linear, and three-regime linear. The evaluation procedure closely follows that developed by Drake, Schofer, and May in 1967. The study site was near the entrance to a pedestrian tunnel that caused a single, extensive queue. The walkway portion closest to the tunnel had a capacity equal to or slightly greater than the tunnel. Pedestrian demand at the location increased from near zero to over capacity and then returned to near zero. Flow parameters were derived from videotape. The performance of each model is described both by the results of statistical tests and by visual examination of the flow-density-speed curves. The three-regime linear model was not found to be statistically significant. Of the three one-regime models, the bell-shaped was judged to be superior to the Greenshields and Underwood models because of its better predictions of optimum density and optimum speed. Of the three two-regime linear models, the Edie was judged best on the basis of statistical tests and predictions of flow parameters. Since two distinct regimes were found, the Edie model was deemed to be the best model for this data set.
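The single-regime linear (Greenshields) model is the simplest of the seven: speed falls linearly from a free-flow value at zero density to zero at jam density. Since flow is density times speed, the model implies a parabolic flow-density curve whose peak sits at half the jam density. The sketch below uses hypothetical parameter values, not the values fitted in the study.

```python
def greenshields_speed(density, free_speed, jam_density):
    """Speed as a linear function of density: v = v_f * (1 - k / k_j)."""
    return free_speed * (1.0 - density / jam_density)

def flow(density, free_speed, jam_density):
    """Flow is density times speed."""
    return density * greenshields_speed(density, free_speed, jam_density)

# Hypothetical parameters: free-flow speed in m/s, jam density in pedestrians/m^2.
free_speed = 1.4
jam_density = 4.0

# For the linear model, flow is maximized at half the jam density.
optimum_density = jam_density / 2.0
optimum_speed = greenshields_speed(optimum_density, free_speed, jam_density)
capacity = flow(optimum_density, free_speed, jam_density)
print(optimum_density, optimum_speed, capacity)
```

Optimum density and optimum speed are the quantities the abstract uses to compare the one-regime models.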
TL;DR: Thomas J. Archdeacon provides historians with a practical introduction to correlation and regression analysis, introducing statistical techniques that are useful to historians and illustrating them with practical examples from scholarly works.
Abstract: In "Correlation and Regression Analysis" Thomas J. Archdeacon provides historians with a practical introduction to the use of correlation and regression analysis. The book concentrates on the kinds of analysis that form the broad range of methods used in the social sciences. It should enable historians to understand and to evaluate critically the quantitative analyses that they are likely to encounter in journal literature and monographs reporting research findings in the social sciences. Without attempting to be a text in basic statistics, the book provides enough background information to allow readers to grasp the mathematical essentials of correlation and regression. Correlation analysis refers to the measurement of association between or among variables, and regression analysis focuses primarily on the use of linear models to predict changes in the value taken by one variable in terms of changes in the values of a set of explanatory variables. The book also discusses diagnostic methods for identifying shortcomings in regression models, the use of regression to analyse causation, and the application of regression and related procedures to the study of problems containing categorical as well as numerical data. Archdeacon asserts that knowing how statistical procedures are computed can clarify the theoretical structures underlying them and is essential for recognising the conditions under which their use is appropriate. The book does not shy away from the mathematics of statistical analysis, but Archdeacon presents concepts carefully and explains the operation of equations step by step. Unlike many works in the field, the book does not assume that readers have mathematical training beyond basic algebra and geometry. In the hope of promoting the role of quantitative analysis in his discipline, Archdeacon discusses the theory and methods behind the most important interpretive paradigm for quantitative research in the social sciences. 
"Correlation and Regression Analysis" introduces statistical techniques that are useful to historians and enhances the presentation of them with practical examples from scholarly works.
TL;DR: In this paper, a feedback controller design based on a static PCA/PCR model is developed and demonstrated on a binary distillation column, and PLS is used to develop a regression estimator from multiple tray temperature measurements and a manipulated variable to estimate and control distillate composition.
Abstract: Novel ways of using multivariate statistical methods to develop process models for on-line monitoring and control are proposed. On a binary distillation column, PLS is used to develop a regression estimation using multiple tray temperature measurements and a manipulated variable to estimate and control distillate composition. Additionally, a feedback controller design based on a static PCA/PCR model is developed and demonstrated on the binary column. This controller's performance is compared with a PI controller for disturbance rejection and setpoint tracking. On a real-world chemical process, it is shown how both PLS and PCA are necessary to model normal plant operations. These models permit real-time monitoring and detection in a reduced subspace defined by the statistical independent variations in the data. Techniques for real-time monitoring and fault detection are demonstrated.
TL;DR: Methods for data with exponential family distributions are presented with the Gaussian distribution as a special case and attention is given to interpretation of fixed effects and the correlation structures implied by RCR models.
Abstract: We review random coefficient regression (RCR) models and methods for fitting these models from an applications perspective. Methods for data with exponential family distributions are presented with the Gaussian distribution as a special case. Attention is given to interpretation of fixed effects and the correlation structures implied by RCR models. Estimation methods are presented with computational approaches. Problems associated with testing fixed effects include accurate variance estimation and robustness to misspecification of the covariance structure. Methods for model selection and assessment are presented. An example is used to demonstrate recommended approaches.
TL;DR: A survey of experiments in the field of experimental design can be found in this article, where the authors discuss the nature and role of theory in science, and three principles of Experimental Design are discussed.
Abstract: 1. The Processes of Science. 1.1 Introduction. 1.2 Development of Theory. 1.3 The Nature and Role of Theory in Science. 1.4 Varieties of Theory. 1.5 The Problem of General Science. 1.6 Causality. 1.7 The Upshot. 1.8 What Is An Experiment? 1.9 Statistical Inference. 2. Principles of Experimental Design. 2.1 Confirmatory and Exploratory Experiments. 2.2 Steps of Designed Investigations. 2.3 The Linear Model. 2.4 Illustrating Individual Steps: Study 1. 2.5 Three Principles of Experimental Design. 2.6 The Statistical Triangle and Study 2. 2.7 Planning the Experiment. 2.8 Cooperation between Scientist and Statistician. 2.9 General Principle of Inference. 2.10 Other Considerations for Experimental Designs. 3. Survey of Designs and Analyses. 3.1 Introduction. 3.2 Error-Control Designs. 3.3 Treatment Designs. 3.4 Combining Ideas. 3.5 Sampling Designs. 3.6 Analysis and Statistical Software. 3.7 Summary. 4. Linear Model Theory. 4.1 Introduction. 4.2 Representation of Linear Models. 4.3 Functional and Classificatory Linear Models. 4.4 The Fitting Of Y ≐ Xβ. 4.5 The Moore-Penrose Generalized Inverse. 4.6 The Conditioned Linear Model. 4.7 The Two-Part Linear Model. 4.8 A Special Case of a Partitioned Model. 4.9 Three-Part Models. 4.10 The Two-Way Classification Without Interaction. 4.11 The K-Part Linear Model. 4.12 Balanced Classificatory Structures. 4.13 Unbalanced Data Structures. 4.14 Analysis of Covariance Model. 4.15 From Data Analysis to Statistical Inference. 4.16 The Simple Normal Stochastic Linear Model. 4.17 Distribution Theory with GMNLM. 4.18 Mixed Models. 5. Randomization. 5.1 Introduction. 5.2 The Tea Tasting Lady. 5.3 A Triangular Experiment. 5.4 The Simple Arithmetical Experiment. 5.5 Randomization Ideas for Intervention Experiments. 5.6 The General Idea of the Experiment Randomization Test. 5.7 Introduction to Subsequent. 6. The Completely Randomized Design. 6.1 Introduction and Definition. 6.2 The Randomization Process. 
6.3 The Derived Linear Model. 6.4 Analysis Of Variance. 6.5 Statistical Tests. 6.6 Approximating the Randomization Test. 6.7 CRD with Unequal Numbers of Replications. 6.8 Number of Replications. 6.9 Subsampling In A CRD. 6.10 Transformations. 6.11 Examples Using SAS. 7. Comparisons of Treatments. 7.1 Introduction. 7.2 Comparisons for Qualitative Treatments. 7.3 Orthogonality and Orthogonal Comparisons. 7.4 Comparisons for Quantitative Treatments. 7.5 Multiple Comparison Procedures. 7.6 Grouping Treatments. 7.7 Examples Using SAS. 8. Use of Supplementary Information. 8.1 Introduction. 8.2 Motivation of the Procedure. 8.3 Analysis of Covariance Procedure. 8.4 Treatment Comparisons. 8.5 Violation of Assumptions. 8.6 Analysis of Covariance with Subsampling. 8.7 The Case of Several Covariates. 8.8 Examples Using SAS. 9. Randomized Block Designs. 9.1 Introduction. 9.2 Randomized Complete Block Design. 9.3 Relative Efficiency of the RCBD. 9.4 Analysis of Covariance. 9.5 Missing Observations. 9.6 Nonadditivity in the RCBD. 9.7 The Generalized Randomized Block Design. 9.8 Incomplete Block Designs. 9.9 Systematic Block Designs. 9.10 Examples Using SAS. 10. Latin Square Type Designs. 10.1 Introduction and Motivation. 10.2 Latin Square Design. 10.3 Replicated Latin Squares. 10.4 Latin Rectangles. 10.5 Incomplete Latin Squares. 10.6 Orthogonal Latin Squares. 10.7 Change-Over Designs. 10.8 Examples Using SAS. 11. Factorial Experiments: Basic Ideas. 11.1 Introduction. 11.2 Inferences from Factorial Experiments. 11.3 Experiments with Factors at Two Levels. 11.4 The Interpretation of Effects and Interactions. 11.5 Interactions: A Case Study. 11.6 2^n Factorials in Incomplete Blocks. 11.7 Fractions of 2^n Factorials. 11.8 Orthogonal Main Effect Plans for 2^n Factorials. 11.9 Experiments with Factors at Three Levels. 11.10 Experiments with Factors at Two and Three Levels. 11.11 Examples Using SAS. 12. Response Surface Designs. 12.1 Introduction. 12.2 Formulation of the Problem. 
12.3 First-Order Models and Designs. 12.4 Second-Order Models and Designs. 12.5 Integrated Mean Squared Error Designs. 12.6 Searching For an Optimum. 12.7 Experiments with Mixtures. 12.8 Examples Using SAS. 13. Split-Plot Type Designs. 13.1 Introduction. 13.2 The Simple Split-Plot Design. 13.3 Relative Efficiency of Split-Plot Design. 13.4 Other Forms of Split-Plot Designs. 13.5 Split-Block Design. 13.6 The Split-Split-Plot Design. 13.7 Examples Using SAS. 14. Designs with Repeated Measures. 14.1 Introduction. 14.2 Methods for Analyzing Repeated Measures Data. 14.3 Examples Using SAS. 14.4 Exercises.
TL;DR: This thesis investigates the use of the Backpropagation neural model for time-series forecasting using a Neural Forecasting System (NFS) and develops a new method to enhance input representations to a neural network, referred to as model sNx.
Abstract: Neural networks demonstrate great potential for discovering non-linear relationships
in time-series and extrapolating from them. Results of forecasting using financial data are
particularly good [LapFar87, Schone90, ChaMeh92]. In contrast, traditional statistical
methods are restrictive as they try to express these non-linear relationships as linear models.
This thesis investigates the use of the Backpropagation neural model for time-series
forecasting. In general, neural forecasting research [Hinton87] can be approached in three
ways: research into the weight space, into the physical representation of inputs, and into the
learning algorithms. A new method to enhance input representations to a neural network,
referred to as model sNx, has been developed. It has been studied alongside a traditional
method in model N. The two methods reduce the unprocessed network inputs to a value
between 0 and 1. Unlike the method in model N, the variants of model sNx, sN1 and sN2,
accentuate the contracted input value by different magnitudes. This different approach to
data reduction exploits the characteristics of neural extrapolation to achieve better forecasts.
The feasibility of the principle of model sNx has been shown in forecasting the direction of
the FTSE-100 Index.
The experimental strategy involved optimisation procedures using one data set and
the application of the optimal network from each model to make forecasts on different data
sets with similar and dissimilar patterns to the first.
A Neural Forecasting System (NFS) has been developed as a vehicle for the research.
The NFS offers historical and live simulations, and supports: a data alignment facility for
standardising data files with non-uniform sampling times and volumes, and merging them
into a spreadsheet; a parameter specification table for specifications of neural and system
control parameter values; a pattern specification language for specification of input pattern
formation using one or more time-series, and loading to a configured network; a snapshot
facility for re-construction of a partially trained network to continue or extend a training
session, or re-construction of a trained network to forecast for live tests; and a log facility for
recording experimental results.
Using the NFS, specific pattern features selected from major market trends have been
investigated [Pring80]: triple-top ('three peaks'), double-top ('two peaks'), narrow
band ('modulating'), bull ('rising') and recovery ('U-turn'). Initially, the triple-top pattern
was used in the N model to select between the logarithmic or linear data form for presenting
raw input data. The selected linear method was then used in models sN1, sN2 and N for
network optimisations. The experiments used networks with different combinations of
input nodes (I), hidden nodes (H), and tolerance values. Selections were made for: the best
method, by value, direction, or value and direction, for measuring prediction accuracy; the best configuration function, H = Iφ, with φ equal to 0.9, 2 or 3; and the better of sN1 and
sN2. The evaluation parameters included the prediction accuracy (%), the
weighted return (%), the Relative Threshold Prediction Index (RTPI) indicator, and the forecast
error margins. The RTPI was developed to filter out networks forecasting above a minimum
prediction accuracy with a credit in the weighted return (%). Two optimal networks, one
representing model sNx and one representing model N, were selected and then tested on the double-top, narrow
band, bull and recovery patterns.
This thesis made the following research contributions.
• A new method in model sNx capable of more consistent and accurate predictions.
• The new RTPI neural forecasting indicator.
• A method to forecast during the consolidation ('non-diversifying') trend, which most
traditional methods handle poorly.
• A set of improvements for more effective neural forecasting systems.
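The abstract above mentions two ideas that a short example can make concrete: reducing raw inputs to a value between 0 and 1, and measuring prediction accuracy by direction. The sketch below is only an illustration under stated assumptions. It uses plain min-max scaling as a stand-in for the traditional reduction in model N (the abstract does not give the formula, and the accentuation step of model sNx is not specified, so it is not attempted here). The function names and the sample prices are hypothetical.

```python
def min_max_scale(series):
    """Linearly rescale a sequence so that its values lie in [0, 1].

    Assumption: a plain min-max reduction; the thesis may use a different one.
    """
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series]
    return [(v - lo) / (hi - lo) for v in series]


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def direction_accuracy(actual, forecast):
    """Percentage of steps where the forecast moves in the same direction as the data.

    forecast[t] is the value predicted for time t; both moves are measured
    from the previous actual value actual[t - 1].
    """
    steps = len(actual) - 1
    hits = 0
    for t in range(1, len(actual)):
        actual_move = sign(actual[t] - actual[t - 1])
        forecast_move = sign(forecast[t] - actual[t - 1])
        hits += actual_move == forecast_move
    return 100.0 * hits / steps


# Hypothetical index values and forecasts, for illustration only.
prices = [100.0, 102.0, 101.0, 105.0, 110.0]
forecasts = [100.0, 101.0, 103.0, 104.0, 106.0]

scaled = min_max_scale(prices)
accuracy = direction_accuracy(prices, forecasts)
print(scaled, accuracy)
```

Measuring by direction ignores how far the forecast is from the realised value, which is why the abstract treats "by value", "by direction" and "by value and direction" as separate options.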
TL;DR: In this paper, a test based on residual partial autocorrelations is proposed which is particularly powerful in cases when the fitted model underestimates the order of the moving average component.
Abstract: SUMMARY This note proposes a test of goodness of fit for time series models based on the sum of the squared residual partial autocorrelations. The test statistic is asymptotically $\chi^2$. Its small-sample performance is studied through a Monte Carlo experiment. It appears sensitive to erroneous specifications, especially when the fitted model understates the order of the moving average component. Residual analysis is a fundamental step in building empirical time series models. When checking the adequacy of the model one usually tests for the absence of residual autocorrelation. There are a number of tests designed for this purpose both in the time and frequency domains; see, for instance, Quenouille (1947, 1949), Bartlett (1954), Box & Pierce (1970), Ljung & Box (1978), Ansley & Newbold (1979), Godfrey (1979). In this paper a test based on residual partial autocorrelations is proposed which is particularly powerful in cases when the fitted model underestimates the order of the moving average component. Let $X_t$ be a zero-mean process generated by an ARMA model $\phi(B)X_t = \theta(B)a_t$, where $B$ is the backshift operator, $\phi(B)$ is a polynomial of order $p$, $\theta(B)$ is a polynomial of order $q$ and $a_t$ is a white noise. Let $\hat{a}_1, \ldots, \hat{a}_n$ be the residuals obtained after estimating the model. A very popular goodness-of-fit test, proposed by Box & Pierce (1970) and improved by Ljung & Box (1978), is based on the statistic m
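As a rough illustration of a test built on the sum of squared residual partial autocorrelations, the sketch below computes sample partial autocorrelations with the Durbin-Levinson recursion and combines them into a portmanteau-style statistic. The weighting n(n+2)/(n-k), borrowed from the Ljung-Box form, and the degrees of freedom m - p - q are assumptions; the truncated abstract confirms neither. All function names are hypothetical.

```python
import random


def autocorrelations(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag of a series, after removing the mean."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    c0 = sum(v * v for v in d)
    return [sum(d[t] * d[t - k] for t in range(k, n)) / c0
            for k in range(1, max_lag + 1)]


def partial_autocorrelations(r):
    """Durbin-Levinson recursion; r[k - 1] is the lag-k autocorrelation."""
    pacf = []
    phi_prev = []  # coefficients of the order-(k-1) autoregression
    for k in range(1, len(r) + 1):
        num = r[k - 1] - sum(phi_prev[j] * r[k - 2 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j] for j in range(k - 1))
        phi_kk = num / den
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                    for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


def pacf_portmanteau(residuals, m, p, q):
    """Return (statistic, degrees of freedom) from the first m residual
    partial autocorrelations.

    Assumption: Ljung-Box-style weights n(n+2)/(n-k) and m - p - q degrees
    of freedom for the chi-squared reference distribution.
    """
    n = len(residuals)
    pacf = partial_autocorrelations(autocorrelations(residuals, m))
    stat = n * (n + 2) * sum(pi_k ** 2 / (n - k)
                             for k, pi_k in enumerate(pacf, start=1))
    return stat, m - p - q


# Simulated white-noise "residuals", standing in for those of a fitted ARMA(1, 1).
random.seed(0)
residuals = [random.gauss(0.0, 1.0) for _ in range(200)]
stat, df = pacf_portmanteau(residuals, m=10, p=1, q=1)
print(stat, df)
```

The statistic would be compared with a chi-squared quantile on `df` degrees of freedom; that lookup is left out to keep the sketch free of third-party dependencies.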
TL;DR: In this paper, the authors give an overview of some of the key issues in empirical nonlinear modeling for chemical process applications. But their focus is on specific subclasses of nonlinear models that have analytically useful structural characteristics and comparisons will be made both between these classes and with the more familiar linear models.
TL;DR: This paper capitalizes on a necessary condition characterizing the LTS fit to develop a probabilistic ‘feasible solution’ algorithm that takes random starting trial solutions and refines each to the local optimum satisfying this necessary condition.
TL;DR: In this article, the authors evaluated several mathematical models to determine which best describes the relationship between the activity time or cost and the cycle number and found that the cubic models that best describe completed activities are poor predictors of future performance.
Abstract: Many repetitive construction field operations exhibit a learning curve, over which the time or cost per cycle decreases as the cycle number increases. This paper evaluates several mathematical models to determine which best describes the relationship between the activity time or cost and the cycle number. For completed activities, cubic learning curve models are found to provide the most reliable statistical fit, and linear models provide the least reliable fit. The real potential value of learning curves is their ability to predict the time or cost needed to perform future activities. This paper presents a methodology for predicting future activity time or cost based on completed activity data. The best predictors of future performance are found to be linear models. The cubic models that best describe completed activities are poor predictors of future performance.
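The contrast drawn above, between how well a model fits completed cycles and how well it predicts future ones, can be sketched as follows. This is a minimal illustration that assumes the "linear" and "cubic" models are polynomials of degree 1 and 3 in the cycle number; the paper may define them on transformed axes. The cycle times are invented for the example and are not data from the paper.

```python
import numpy as np


def fit_and_forecast(cycles, times, degree, n_train):
    """Fit a polynomial of the given degree to the first n_train cycles.

    Returns (in-sample SSE, out-of-sample SSE), where the out-of-sample part
    covers the cycles that were not used in the fit.
    """
    x = np.asarray(cycles, dtype=float)
    y = np.asarray(times, dtype=float)
    coeffs = np.polyfit(x[:n_train], y[:n_train], degree)
    resid = y - np.polyval(coeffs, x)
    return float(np.sum(resid[:n_train] ** 2)), float(np.sum(resid[n_train:] ** 2))


# Hypothetical time per cycle (e.g. hours) for 12 repetitions of an activity.
cycles = list(range(1, 13))
times = [10.0, 8.3, 7.1, 6.8, 6.1, 5.9, 5.5, 5.4, 5.1, 5.0, 4.8, 4.7]

# Treat the first 8 cycles as completed work and the last 4 as future work.
linear_fit, linear_forecast = fit_and_forecast(cycles, times, degree=1, n_train=8)
cubic_fit, cubic_forecast = fit_and_forecast(cycles, times, degree=3, n_train=8)

print("linear: fit SSE", linear_fit, "forecast SSE", linear_forecast)
print("cubic:  fit SSE", cubic_fit, "forecast SSE", cubic_forecast)
```

Because the cubic model contains the linear one as a special case, its in-sample error can never be larger; whether it extrapolates worse, as the abstract reports for real projects, depends on the data.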