TL;DR: A modified General Linear Model (GLM) with Expectation-Maximization (EM) algorithm, EMSEV, distinguishes biological variance from noise, outperforming traditional GLM, with promising applications in biological science and statistical inference, despite deviations in noise estimation at similar variance levels.
Abstract: The general linear model (GLM) has been widely used in research, where the error term has been treated as noise. However, compelling evidence suggests that in biological systems, the target variables may possess their own innate variances. A modified GLM was proposed to explicitly model biological variance and nonbiological noise. An expectation-maximization (EM) scheme, termed EMSEV (EM for separating variances), was used to distinguish biological variance from noise. The performance of EMSEV was evaluated by varying noise levels, dimensions of the design matrix, and covariance structures of the target variables. The deviation between EMSEV outputs and the predefined distribution parameters increased with noise level. With a proper initial guess, when the noise magnitude and the variance of the target variables were similar, there were deviations of 3% and 10%-16% in the estimated mean and covariance of the target variables, respectively, along with a 1.7% deviation in noise estimation. EMSEV appears promising for distinguishing signal variance from noise in biological systems. The potential applications and implications in biological science and statistical inference are discussed.
TL;DR: This study compares model selection and model averaging for nested linear models, showing that model averaging can significantly improve estimation risk under certain conditions, particularly with heteroscedastic and autocorrelated errors and sparse coefficients.
Abstract: Model selection (MS) and model averaging (MA) are two popular approaches when many candidate models exist. Theoretically, the estimation risk of an oracle MA is not larger than that of an oracle MS because the former is more flexible, but a foundational issue is this: Does MA offer a substantial improvement over MS? Recently, seminal work by Peng and Yang (2022) has answered this question under nested models with linear orthonormal series expansion. In the current paper, we further respond to this question under linear nested regression models. A more general nested framework, heteroscedastic and autocorrelated random errors, and sparse coefficients are allowed in the current paper, giving a scenario that is more common in practice. A remarkable implication is that MS can be significantly improved by MA under certain conditions. In addition, we further compare MA techniques with different weight sets. Simulation studies illustrate the theoretical findings in a variety of settings.
TL;DR: Researchers develop an equivalent linear mapping for large language models, revealing low-dimensional semantic structures in next-token predictions, and demonstrate that LLMs operate in extremely low-dimensional subspaces, enabling interpretable semantic concept decoding.
Abstract: Despite significant progress in transformer interpretability, an understanding of the computational mechanisms of large language models (LLMs) remains a fundamental challenge. Many approaches interpret a network's hidden representations but remain agnostic about how those representations are generated. We address this by mapping LLM inference for a given input sequence to an equivalent and interpretable linear system which reconstructs the predicted output embedding with relative error below $10^{-13}$ at double floating-point precision, requiring no additional model training. We exploit a property of transformers wherein every operation (gated activations, attention, and normalization) can be expressed as $A(x) \cdot x$, where $A(x)$ represents an input-dependent linear transform and $x$ preserves the linear pathway. To expose this linear structure, we strategically detach components of the gradient computation with respect to an input sequence, freezing the $A(x)$ terms at their values computed during inference, such that the Jacobian yields an equivalent linear mapping. This detached Jacobian of the model reconstructs the output with one linear operator per input token, which is shown for Qwen 3, Gemma 3 and Llama 3, up to Qwen 3 14B. These linear representations demonstrate that LLMs operate in extremely low-dimensional subspaces where the singular vectors can be decoded to interpretable semantic concepts. The computation for each intermediate output also has a linear equivalent, and we examine how the linear representations of individual layers and their attention and multilayer perceptron modules build predictions, and use these as steering operators to insert semantic concepts into unrelated text. Despite their global nonlinearity, LLMs can be interpreted through equivalent linear representations that reveal low-dimensional semantic structures in the next-token prediction process.
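The idea of freezing the input-dependent term $A(x)$ so that a nonlinear block becomes an exact linear map at a given input can be illustrated on a single gated MLP. The following is a minimal NumPy sketch, not the authors' implementation: the layer shape, the weight names (`W_gate`, `W_up`, `W_down`), and the choice to freeze the SiLU gate are assumptions made for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_mlp(x, W_gate, W_up, W_down):
    """Gated MLP: W_down @ (silu(W_gate @ x) * (W_up @ x))."""
    g = W_gate @ x
    return W_down @ (g * sigmoid(g) * (W_up @ x))

def detached_linear_map(x, W_gate, W_up, W_down):
    """Freeze the gate at its value for this x; what remains is a matrix A(x)."""
    g = W_gate @ x
    gate = g * sigmoid(g)                    # treated as a constant ("detached")
    return W_down @ (gate[:, None] * W_up)   # shape (d_out, d_in)

# Hypothetical small layer with random weights.
rng = np.random.default_rng(0)
d, h = 4, 8
W_gate = rng.normal(size=(h, d))
W_up = rng.normal(size=(h, d))
W_down = rng.normal(size=(d, h))
x = rng.normal(size=d)

A = detached_linear_map(x, W_gate, W_up, W_down)
y = gated_mlp(x, W_gate, W_up, W_down)
rel_err = np.linalg.norm(A @ x - y) / np.linalg.norm(y)
print(f"relative reconstruction error: {rel_err:.2e}")
```

The matrix `A` is valid only at the input `x` it was computed for; a different input yields a different matrix, which is why the abstract describes one linear operator per input token.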
TL;DR: This comprehensive guide to linear regression covers its conceptual roots, practical implementation, and real-world applications, spanning from basic concepts to advanced topics, including deep learning, to equip readers with skills for diverse analytical scenarios.
Abstract: In the rapidly evolving field of data science and machine learning, Linear Regression remains one of the most foundational and widely applied statistical modeling techniques. Despite the emergence of advanced algorithms and deep learning architectures, linear regression continues to serve as the first step in understanding relationships among variables, making predictions, and drawing meaningful insights from data. This book is a comprehensive guide to linear regression, from its conceptual roots to practical implementation and real-world applications. The content has been meticulously structured to support both beginners and intermediate learners in gaining a deep understanding of linear regression. Chapter 1 introduces the basic concept of linear regression, tracing its historical development and emphasizing its relevance across various real-life domains such as economics, healthcare, and social sciences. Chapter 2 delves into Simple Linear Regression, explaining the mathematical formulation, the least squares approach, and essential assumptions underlying the model. Chapter 3 expands the discussion to Multiple Linear Regression, enabling readers to understand how models evolve when multiple predictors are introduced. Key concepts such as multicollinearity and model evaluation are covered to build a more robust analytical mindset. Chapter 4 provides the theoretical underpinnings of linear regression, including linear algebraic formulations, matrix operations, and solution techniques such as the normal equation and an introduction to gradient descent. In the practical section, Chapter 5 focuses on implementation, guiding the reader through real coding exercises using Python libraries such as NumPy and Scikit-learn. From preprocessing data to evaluating models and visualizing predictions, this chapter translates theory into hands-on learning.
This is followed by Chapter 6, which presents a case study on house price prediction, demonstrating how the principles learned can be applied to a real-world dataset to build a predictive model. Chapter 7 offers a balanced view of the advantages and limitations of linear regression, helping readers critically assess when and how to use this technique effectively. Finally, Chapter 8 concludes the book by summarizing key insights and discussing the transition from linear to non-linear models and modern techniques such as deep learning, offering a bridge to more advanced topics in machine learning. This book is designed not only to explain linear regression but also to inspire critical thinking about model selection, performance evaluation, and the broader implications of statistical modeling. Whether you are a student, researcher, data analyst, or practitioner, the journey through these chapters will enhance your understanding and equip you with the skills to apply linear regression confidently in diverse analytical scenarios.
TL;DR: This study compares linear and ensemble models for in-hospital length of stay prediction, employing competing risk analysis and evaluating four models, with Random Survival Forest using Gray's test split outperforming clinical early warning scores NEWS and MEWS.
Abstract: Length of Stay (LoS) for in-hospital patients is a relevant indicator of efficiency in healthcare. Moreover, it is often related to the occurrence of hospital-acquired complications. In this work, we aim to explore time-to-event analysis for modelling LoS. We employed competing risk (CR) models, as we considered two mutually exclusive outcomes: favorable discharge and deterioration. The explanatory variables included the patient's sex, age, and longitudinal vital signs collected from a dataset comprising [Formula: see text] admissions. To address sparse measurements, we transformed longitudinal vital signs into cross-sectional statistics. Our approach involves data pre-processing, imputation of missing data, and variable selection. We proposed four types of CR models: Cause-specific Cox, Sub-distribution hazard, and two variants of Random Survival Forests, using either the generalised Log-Rank test (cause-specific hazard estimates) or Gray's test (cumulative incidence estimates) as the node splitting rule. Performance of the LoS CR models was evaluated over a time frame from 2 to 15 days. Additionally, we considered as baselines two well-established clinical early warning scores: the National Early Warning Score (NEWS) and the Modified Early Warning Score (MEWS). The best model was Random Survival Forest using Gray's test split, with Integrated Brier Score[×100] of 0.386, C-Index above 99%, and Brier Score below 0.006, along the entire time frame. Employing cross-sectional statistics derived from vital signs, along with rigorous data pre-processing, modelled LoS more accurately than NEWS and MEWS.
Abstract: Predictors include month (June – December), and sex. The outcome variables include proportion of moving, feeding, grooming, and resting scans.
TL;DR: This paper develops a global kernel estimator for partially linear varying coefficient additive hazards models, leveraging non-varying nuisance parameters, and establishes consistency and asymptotic normality, outperforming local methods in simulations and a cancer genomic study.
Abstract: We study kernel-based estimation methods for partially linear varying coefficient additive hazards models, where the effects of one type of covariates can be modified by another. Existing kernel estimation methods for varying coefficient models often use a “local” approach, where only a small local neighborhood of subjects are used for estimating the varying coefficient functions. Such a local approach, however, is generally inefficient as information about some non-varying nuisance parameter from subjects outside the neighborhood is discarded. In this paper, we develop a “global” kernel estimator that simultaneously estimates the varying coefficients over the entire domains of the functions, leveraging the non-varying nature of the nuisance parameter. We establish the consistency and asymptotic normality of the proposed estimators. The theoretical developments are substantially more challenging than those of the local methods, as the dimension of the global estimator increases with the sample size. We conduct extensive simulation studies to demonstrate the feasibility and superior performance of the proposed methods compared with existing local methods and provide an application to a motivating cancer genomic study.
TL;DR: This study investigates the linear quadratic model in radiation biology, employing advanced analytical techniques to derive exact solutions for wave profiles, revealing the model's nonlinear dynamics and enhancing understanding of cancer progression and treatment optimization.
Abstract: This study investigates the dynamic behavior of the linear quadratic model (LQM), a fundamental framework in radiation biology that describes cellular response to radiation, particularly in the context of DNA damage and cancer progression. The LQM was originally developed to quantify radiation-induced cell death and repair mechanisms, with a focus on double-stranded DNA breaks, the most critical type of radiation damage. Despite advances in tracking tumor cell dissemination, the mechanisms underlying cancer invasion remain poorly understood. Mathematical modeling, particularly through partial differential equations, has become an essential tool for simulating tumor growth and optimizing therapeutic strategies, bridging the gap between theoretical biology and clinical applications. In this work, we employ advanced analytical techniques, including the generalized Arnous method, modified F-expansion method, and generalized exponential rational function approaches to solve the model for the first time. By transforming the governing PDE into an ordinary differential equation using β-derivative and wave transformations, we derive exact solutions in the form of dark, bright, singular, mixed, complex, and combined soliton waves. These solutions, visualized through 2D and 3D plots, reveal the system's behavior under varying parameters, demonstrating the computational power and effectiveness of the applied methods. The results not only validate the proposed techniques but also enhance our understanding of the model's nonlinear dynamics. The novel findings presented here are expected to advance future research in radiation biology and cancer treatment optimization.
Abstract: Recent research demonstrates that linear models achieve forecasting performance competitive with complex architectures, yet methodologies for enhancing linear models remain underexplored. Motivated by the hypothesis that distinct time series instances may follow heterogeneous linear mappings, we propose the Classification Auxiliary Trend-Seasonal Decoupling Linear Model CATS-Linear, employing Classification Auxiliary Channel-Independence (CACI). CACI dynamically routes instances to dedicated predictors via classification, enabling supervised channel design. We further analyze the theoretical expected risks of different channel settings. Additionally, we redesign the trend-seasonal decomposition architecture by adding a decoupling -- linear mapping -- recoupling framework for trend components and complex-domain linear projections for seasonal components. Extensive experiments validate that CATS-Linear with fixed hyperparameters achieves state-of-the-art accuracy comparable to hyperparameter-tuned baselines while delivering SOTA accuracy against fixed-hyperparameter counterparts.
Abstract: This study explores data-driven approaches for modeling industrial processes by employing linear and nonlinear techniques to predict output variables based on available input measurements. Linear regression-based techniques are compared with nonlinear machine learning models to evaluate their predictive capabilities. The analysis considers models trained on high-accuracy & low-frequency laboratory data alongside models leveraging low-accuracy & high-frequency sensor measurements. A hybrid methodology enhances predictive performance by integrating additional process information in the training process. Our findings show that this hybrid approach reduces the RMSE from 0.74 to 0.38 compared to models that rely solely on sensor measurements.
TL;DR: This study explores the suitability of piecewise-linear dynamical system models for cognitive neural dynamics, demonstrating their potential for modeling brain activity, particularly in controlled settings, and outperforming linear models in predicting future states.
Abstract: Dynamical system models have proven useful for decoding the current brain state from neural activity. So far, neuroscience has largely relied on either linear models or non-linear models based on artificial neural networks (ANNs). Piecewise linear approximations of non-linear dynamics have proven useful in other technical applications. Moreover, such explicit models provide a clear advantage over ANN-based models when the dynamical system is not only supposed to be observed, but also controlled, in particular when a controller with guarantees is needed. Here we explore whether piecewise-linear dynamical system models (recurrent Switching Linear Dynamical System or rSLDS models) could be useful for modeling brain dynamics, in particular in the context of cognitive tasks. These models have the advantage that they can be estimated not only from continuous observations like field potentials or smoothed firing rates, but also from sparser single-unit spiking data. We first generate artificial neural data based on a non-linear computational model of perceptual decision-making and demonstrate that piecewise-linear dynamics can be successfully recovered from these observations. We then demonstrate that the piecewise-linear model outperforms a linear model in terms of predicting future states of the system and associated neural activity. Finally, we apply our approach to a publicly available dataset recorded from monkeys performing perceptual decisions. Much to our surprise, the piecewise-linear model did not provide a significant advantage over a linear model for these particular data, although linear models that were estimated from different trial epochs showed qualitatively different dynamics. In summary, we present a dynamical system modeling approach that could prove useful in situations, where the brain state needs to be controlled in a closed-loop fashion, for example, in new neuromodulation applications for treating cognitive deficits. 
Future work will have to show under what conditions the brain dynamics are sufficiently non-linear to warrant the use of a piecewise-linear model over a linear one.
TL;DR: This study models Nigeria's economic growth using robust principal component regression, addressing multicollinearity and outliers, and finds that internal and external debt, interest rate, exchange rate, and economic openness significantly influence growth, with M-estimation providing the most reliable predictions.
Abstract: This study was conducted to model, estimate, and predict Nigeria’s economic growth (RGDP) by examining the influence of key macroeconomic drivers: internal debt (INDT), external debt (EXDT), interest rate (RINR), exchange rate (REXR), and the degree of economic openness (OPEN). Preliminary exploratory and diagnostic analyses revealed significant challenges to classical linear regression assumptions, particularly the presence of multicollinearity and outliers. To address these issues, robust principal component regression (PCR) estimation methods were employed. Principal component analysis (PCA) extracted two uncorrelated predictors (PC1 and PC2), which captured the joint variability of the original determinants while addressing collinearity. Subsequently, robust estimation techniques—namely M-estimation, S-estimation, and MM-estimation—were used to generate efficient estimated parameters. A comparative evaluation based on root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and Theil’s inequality coefficient established that the M-estimation method outperformed its alternatives, providing the most stable and reliable predictions of RGDP. Empirical findings revealed that both PC1 and PC2 had positive and statistically significant influences on RGDP, with contributions of 35.39% and 22.15%, respectively. These results highlight the importance of robust PCR in addressing econometric anomalies and offer valuable policy insights into how structural shocks—such as exchange rate volatility, oil price fluctuations, and COVID-19 disruptions—affected Nigeria’s economic performance.
TL;DR: This paper proposes a robust penalized estimator for high-dimensional generalized linear models, providing consistency and asymptotic normality under suitable assumptions, and evaluates its performance through Monte Carlo simulations and an empirical application.
Abstract: Robust estimators for generalized linear models (GLMs) are not easy to develop due to the nature of the distributions involved. Recently, there has been growing interest in robust estimation methods, particularly in contexts involving a potentially large number of explanatory variables. Transformed M-estimators (MT-estimators) provide a natural extension of M-estimation techniques to the GLM framework, offering robust methodologies. We propose a penalized variant of MT-estimators to address high-dimensional data scenarios. Under suitable assumptions, we demonstrate the consistency and asymptotic normality of this novel class of estimators. Our theoretical development focuses on redescending $\rho$-functions and penalization functions that satisfy specific regularity conditions. We present an iterative re-weighted least-squares algorithm, together with a deterministic initialization procedure, which is crucial since the estimating equations may have multiple solutions. We evaluate the finite-sample performance of this method for the Poisson distribution and well-known penalization functions through Monte Carlo simulations that consider various types of contamination, as well as an empirical application using a real dataset.
TL;DR: This study examines the consistency of heritability estimation from summary statistics in high-dimensional linear models, specifically LDSC regression and GWAS heritability, under various conditions and modifications, including weighting and standardization, and population stratification.
Abstract: In Genome-Wide Association Studies (GWAS), heritability is defined as the fraction of variance of an outcome explained by a large number of genetic predictors in a high-dimensional polygenic linear model. This work studies the asymptotic properties of the most common estimator of heritability from summary statistics called linkage disequilibrium score (LDSC) regression, together with a simpler and closely related estimator called GWAS heritability (GWASH). These estimators are analyzed in their basic versions and under various modifications used in practice including weighting and standardization. We show that, with some variations, two conditions which we call weak dependence (WD) and bounded-kurtosis effects (BKE) are sufficient for consistency of both the basic LDSC with fixed intercept and GWASH estimators, for both Gaussian and non-Gaussian predictors. For Gaussian predictors it is shown that these conditions are also necessary for consistency of GWASH (with truncation) and simulations suggest that necessity holds too when the predictors are non-Gaussian. We also show that, with properly truncated weights, weighting does not change the consistency results, but standardization of the predictors and outcome, as done in practice, introduces bias in both LDSC and GWASH if the two essential conditions are violated. Finally, we show that, when population stratification is present, all the estimators considered are biased, and the bias is not remedied by using the LDSC regression estimator with free intercept, as originally suggested by the authors of that estimator.
Abstract: This study investigates the credibility of land surface temperature (LST) data retrieved from MODIS (Moderate Resolution Imaging Spectroradiometer) satellite images by comparing them with data from the ground-based stations of the Intercantonal Measurement and Information System (IMIS). The study covers the Swiss Alps for the period between 2000 and 2023 and includes four MODIS observation times (i.e., MOD21A1D, MOD21A1N, MYD21A1D, and MYD21A1N). The comparative analysis is based mainly on a Harmonic Regression Model, which combines harmonic and linear regressions and enables calculating the trends for both data sources. Further analytical approaches were applied to support the comparison and data analysis. Five research questions were identified to guide the study. To compare the actual measurements, plots were created to determine the median absolute deviation ("MAD-1"), R-squared, slope, and mean deviation values. The data means were compared using the mean absolute difference ("MAD-2") and Pearson’s correlation values; the two datasets were found to be compatible, particularly at nighttime. Trends were compared at each specific hour of the day during the four MODIS observation times using the MAD-2 and Pearson’s correlation values. The means of the trend data of both datasets were also compared using the mean absolute error (MAE) and standard deviation values. To determine the most representative observation time, IMIS trend data at each of the MODIS observation times were compared with the overall trends using MAD-2 and Pearson’s correlation values. This revealed that the MOD21A1D observation time best represents the trends.
To investigate factors that cause changes in the data, changes in IMIS data were compared with the elevation and aspect of the ground stations, while MODIS data were compared with the view angle of the satellites’ sensors. Elevation does not show any noticeable effect on IMIS data, except for limited LST trend means, which are low (< 0.05) at altitudes above 2000 m. The same holds for aspect, for which no relationship with IMIS trends was found. An effect of the view angle on MODIS measurements was noticed, but it differs between observation times. In addition, Landsat 5, 7, and 8 observation times were compared with MODIS observation times in terms of representativeness; Landsat images are not acquired at nighttime, which is a limitation affecting their accuracy. This study applied a comprehensive analytical approach that facilitates understanding the relation between MODIS LST and IMIS data and trends. It supports adopting MODIS data for calculating LST, which is significant for future research on hydroclimate analysis, notably because MODIS is a daily source of LST data.
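A harmonic regression that combines a linear trend with a seasonal cycle, as described in the abstract above, can be sketched with ordinary least squares. The snippet below is only an illustration: it assumes a single annual harmonic, $y = a + b\,t + c\sin(2\pi t) + d\cos(2\pi t)$ with $t$ in years, and fits synthetic values, not MODIS or IMIS data. The function name `fit_harmonic_trend` is hypothetical.

```python
import numpy as np

def fit_harmonic_trend(t_years, y):
    """Least-squares fit of y = a + b*t + c*sin(2*pi*t) + d*cos(2*pi*t)."""
    X = np.column_stack([
        np.ones_like(t_years),          # intercept a
        t_years,                        # linear trend b (units per year)
        np.sin(2 * np.pi * t_years),    # annual cycle, sine part c
        np.cos(2 * np.pi * t_years),    # annual cycle, cosine part d
    ])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Synthetic monthly series over 24 years (made-up coefficients, no noise).
t = np.arange(0, 24, 1 / 12)
y_syn = 5.0 + 0.04 * t + 8.0 * np.sin(2 * np.pi * t) - 3.0 * np.cos(2 * np.pi * t)

a, b, c, d = fit_harmonic_trend(t, y_syn)
print(f"trend: {b:.4f} per year, seasonal amplitude: {np.hypot(c, d):.3f}")
```

The coefficient `b` is the quantity that would be compared between the two data sources when the abstract speaks of comparing trends.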
Abstract: This paper investigates the degradation of linear regression and correlation models under distributional shifts, a prevalent issue in real-world data applications. Distributional shift, particularly covariate shift, occurs when the input data distribution changes between training and deployment, violating the standard assumption of independent and identically distributed data. This violation can lead to unreliable predictions and inaccurate measures of linear dependence. We propose a comprehensive framework for calibrating these models to maintain their precision and robustness. The methodology involves three core components: (1) a shift detection module using statistical distance measures and discriminative classifiers to identify the presence and severity of covariate shift; (2) an importance-weighting scheme for recalibrating the regression model by re-weighting the training samples to better reflect the target distribution; and (3) a covariate-adjusted correlation technique that recalibrates the Pearson correlation coefficient by accounting for the distorting effects of the distributional shift. We demonstrate through theoretical exposition and simulated experiments that standard models fail significantly when faced with even moderate shifts, whereas our proposed calibration techniques effectively restore predictive accuracy and the fidelity of correlation analysis. The findings underscore the critical need for explicit calibration mechanisms when deploying linear models in non-stationary environments, ensuring that statistical inferences remain valid and reliable over time.
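The importance-weighting component (2) can be sketched as weighted least squares in which each training sample is weighted by the density ratio $w(x) = p_{\text{target}}(x) / p_{\text{train}}(x)$. This is a minimal illustration under strong assumptions that are not taken from the paper: both input distributions are Gaussian with known means and a shared variance, so the ratio is available in closed form, and all data are simulated. In practice the ratio would have to be estimated.

```python
import numpy as np

def gaussian_density_ratio(x, mu_train, mu_target, sigma=1.0):
    """w(x) = p_target(x) / p_train(x) for two Gaussians with a shared sigma."""
    return np.exp(((x - mu_train) ** 2 - (x - mu_target) ** 2) / (2 * sigma ** 2))

def weighted_least_squares(X, y, w):
    """Minimize sum_i w_i * (y_i - x_i @ beta)^2 via the weighted normal equations."""
    Xw = X * w[:, None]
    return np.linalg.solve(X.T @ Xw, Xw.T @ y)

# Simulated covariate shift: training inputs ~ N(0, 1), target inputs ~ N(1.5, 1).
rng = np.random.default_rng(0)
x_train = rng.normal(0.0, 1.0, size=2000)
y_train = x_train ** 2 + rng.normal(0.0, 0.1, size=2000)  # linear model is misspecified
X_train = np.column_stack([np.ones_like(x_train), x_train])

w = gaussian_density_ratio(x_train, mu_train=0.0, mu_target=1.5)
beta_ols = weighted_least_squares(X_train, y_train, np.ones_like(w))  # unweighted fit
beta_iw = weighted_least_squares(X_train, y_train, w)                 # importance-weighted fit
print("unweighted [intercept, slope]:", beta_ols)
print("importance-weighted [intercept, slope]:", beta_iw)
```

Because the true relationship is quadratic, the best linear fit depends on where the inputs concentrate; the weighted fit targets the line that is best under the shifted distribution, at the cost of a smaller effective sample size.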
Abstract: Traditional Analysis of Variance (ANOVA) faces significant challenges in high-dimensional settings where the number of factors and their interactions exceeds the number of observations. This paper introduces a framework for analyzing factorial designs using regularized linear models, specifically leveraging the Least Absolute Shrinkage and Selection Operator (LASSO) to enforce sparsity. By representing the full factorial model, including main effects and multi-way interactions, as a linear model, we can apply L1 regularization to simultaneously perform variable selection and parameter estimation. This approach is predicated on the sparsity-of-effects principle, which posits that only a small fraction of potential effects are active. The proposed method effectively identifies significant main effects and interactions from a vast pool of potential candidates, thus providing a tractable solution to the curse of dimensionality in experimental design. We discuss the formulation of the design matrix for a full factorial model, the application of the LASSO penalty, and strategies for interpreting the results in the context of ANOVA. The methodology is particularly suited for screening experiments in fields like genomics, manufacturing, and computer science, where identifying the vital few factors from the trivial many is paramount. We demonstrate through a detailed simulation that the regularized approach can recover the true sparse structure of effects with high probability, offering a powerful alternative to classical techniques.
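Representing a factorial model as a linear model and applying an L1 penalty can be sketched on a small example. The code below builds a ±1-coded $2^k$ full factorial design matrix with main effects and two-way interactions, then fits a LASSO by cyclic coordinate descent. It is an illustrative sketch, not the paper's procedure: the response is a made-up, noise-free function, the helper names are hypothetical, and this small design is orthogonal, unlike the high-dimensional regime the abstract targets, where there are more candidate effects than observations.

```python
import itertools
import numpy as np

def full_factorial_design(k, max_order=2):
    """±1-coded 2^k full factorial: main effects and interactions up to max_order."""
    runs = np.array(list(itertools.product([-1.0, 1.0], repeat=k)))
    names, cols = [], []
    for order in range(1, max_order + 1):
        for idx in itertools.combinations(range(k), order):
            names.append("*".join(f"x{i}" for i in idx))
            cols.append(np.prod(runs[:, list(idx)], axis=1))
    return np.column_stack(cols), names

def soft_threshold(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()                          # residual y - X @ b
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]           # partial residual without effect j
            rho = X[:, j] @ r / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            r -= X[:, j] * b[j]
    return b

X, names = full_factorial_design(k=4, max_order=2)
# Made-up sparse truth: one main effect and one two-way interaction.
y = 3.0 * X[:, names.index("x0")] - 2.0 * X[:, names.index("x0*x1")]

beta = lasso_coordinate_descent(X, y, lam=0.5)
active = [name for name, coef in zip(names, beta) if coef != 0.0]
print(active, beta[beta != 0.0])
```

The nonzero estimates are shrunk toward zero by the penalty `lam`, which is why a refit on the selected effects is a common follow-up when effect sizes, not only the active set, are of interest.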
TL;DR: This paper introduces a partially linear quantile regression model with monotonic constraints, proposing two novel estimation methods: coordinate descent and profile likelihood, which simplify the estimation process and outperform traditional approaches in estimating nonparametric components.
Abstract: The paper brings forward the partially linear quantile regression model by incorporating monotonic constraints, which are common in real-world relationships between variables. It introduces two novel parameter estimation methods, that is, the coordinate descent method and the profile likelihood method, which eliminate the extensive tuning and simplify the estimation process. Theoretical analysis confirms the estimator’s consistency and a convergence rate of $n^{-1/3}$. Numerical simulations and case studies demonstrate the superiority of these methods over traditional approaches, particularly in estimating the nonparametric components of the model, highlighting their potential for practical use in various fields.
TL;DR: A hybrid predictive model combining random forest and multiple linear regression is constructed to analyze and predict multidimensional features, enhancing stability and generalization ability through multi-level model collaboration and data processing optimization.
Abstract: In this study, a prediction framework integrating random forest and multiple linear regression is constructed to focus on the quantitative analysis and prediction of multidimensional features. Firstly, a random forest decision tree is constructed based on the Gini index, and the classification and regression tasks are achieved by ranking the importance of features, completing the model parameter setting and error validation, so as to achieve the modelling and prediction of the non-linear relationship of multi-dimensional features. In addition, the study introduces independent variables such as dichotomous variables, occupancy category indicators and capacity values, fits a multiple linear regression model using the least squares method, and tests for multicollinearity using variance inflation factor (VIF) tests to quantify the extent to which specific factors influence the results. The framework enhances the stability and generalisation ability of multivariate system modelling through multi-level model collaboration and data processing optimisation, and provides a scalable technical paradigm for related fields.
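The collinearity check mentioned in the abstract is commonly done with the variance inflation factor, $\mathrm{VIF}_j = 1 / (1 - R_j^2)$, where $R_j^2$ comes from regressing predictor $j$ on the remaining predictors. The following is a minimal sketch on made-up numbers; the function name and the example matrices are hypothetical, not the study's variables.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), regressing column j on the other columns plus an intercept."""
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        centered = target - target.mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs

# Two orthogonal predictors: no inflation expected.
X_orth = np.array([[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]])
vif_orth = variance_inflation_factors(X_orth)

# Two nearly identical predictors: strong inflation expected.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = x1 + np.array([0.1, -0.1, 0.1, -0.1, 0.1, -0.1])
vif_coll = variance_inflation_factors(np.column_stack([x1, x2]))

print("orthogonal:", vif_orth)
print("collinear:", vif_coll)
```

A frequently quoted rule of thumb treats values above roughly 10 as a sign of problematic collinearity; the abstract does not state which threshold was used.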
TL;DR: This study evaluates predictive models for tomato production and cultivation area in Himachal Pradesh, finding cubic and quadratic models best fit area and production, respectively, with 4.60% and 5.90% annual growth rates from 1995-2023.
Abstract: Analyzing the trend in area and production of tomatoes over time is important for understanding past behavior and for future planning. Tomato cultivation is highly sensitive to seasonal fluctuations and climatic factors. Therefore, to understand past and future patterns of tomato cultivation area and production, statistical growth models were applied. Different growth models, viz. linear, quadratic, cubic, compound, and power, were fitted to the area and production of tomatoes in Himachal Pradesh for the study period 1995-2023. The cubic and quadratic models were found to be the best fit for area and production, respectively. The highest value of CDVI for area is 5.40, which indicates a higher level of instability, in which the variable is more erratic and has less area over time. Using the compound model, the annual growth rate over the study period is 4.60 percent for tomato area and 5.90 percent for tomato production. The best-fit statistical models can be used to predict future values with greater accuracy.
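An annual growth rate from a compound model is usually obtained by fitting $Y_t = a\,b^{t}$ on the log scale, $\ln Y_t = \ln a + t \ln b$, and reporting $(b - 1) \times 100$ percent. The sketch below assumes that standard form and uses a synthetic series constructed to grow at exactly 4.6 percent per year; it does not use the study's data, and the function name is hypothetical.

```python
import numpy as np

def compound_growth_rate(years, values):
    """Fit ln(Y) = ln(a) + t*ln(b) by least squares; return (b - 1) * 100 in percent."""
    t = np.asarray(years, dtype=float) - years[0]
    slope, _intercept = np.polyfit(t, np.log(values), 1)
    return (np.exp(slope) - 1.0) * 100.0

# Synthetic series (not the study's data): starts at 10.0 and grows 4.6% per year.
years = np.arange(1995, 2024)
values = 10.0 * 1.046 ** (years - 1995)

rate = compound_growth_rate(years, values)
print(f"compound annual growth rate: {rate:.2f}%")
```

On real data the points do not lie exactly on the fitted curve, so the estimated rate summarizes the average growth over the period and should be read alongside a fit statistic such as R-squared.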