TL;DR: This paper presents a covariance matrix estimator that is consistent even when the disturbances of a linear regression model are heteroskedastic, and that does not rely on a (possibly incorrect) specific formal model of the structure of the heteroskedasticity.
Abstract: This paper presents a parameter covariance matrix estimator which is consistent even when the disturbances of a linear regression model are heteroskedastic. This estimator does not depend on a formal model of the structure of the heteroskedasticity. By comparing the elements of the new estimator to those of the usual covariance estimator, one obtains a direct test for heteroskedasticity, since in the absence of heteroskedasticity, the two estimators will be approximately equal, but will generally diverge otherwise. The test has an appealing least squares interpretation. IT IS WELL KNOWN that the presence of heteroskedasticity in the disturbances of an otherwise properly specified linear model leads to consistent but inefficient parameter estimates and inconsistent covariance matrix estimates. As a result, faulty inferences will be drawn when testing statistical hypotheses in the presence of heteroskedasticity. If the investigator has a formal model of the process generating the differing variances, these difficulties are easily eliminated by performing an appropriate linear transformation on the data, based on this model. However, even when such a model is available, it may be incorrect. Often, several models are considered (e.g., Griliches [10]), but still without the certain knowledge that any of them is correct. In this situation one can test each of the alternative transformed models for remaining heteroskedasticity (using any of several available tests), and eliminate those which fail. But what is one to do if all fail the heteroskedasticity test? Although the investigator will have a fairly good idea of the parameter values of the linear model, there remains a considerable difficulty in assessing the precision of the parameter estimates and testing hypotheses due to the possible inconsistency of the usual covariance matrix estimator. 
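A minimal sketch of the comparison described in the abstract, assuming a single regressor plus an intercept so that everything fits in 2x2 matrices. It computes the usual estimator s^2 (X'X)^{-1} and a heteroskedasticity-consistent estimator in the commonly written sandwich form (X'X)^{-1} (sum of e_i^2 x_i x_i') (X'X)^{-1}. The function names and the small data set are illustrative assumptions, not taken from the paper.

```python
def _matmul2(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def ols_covariances(x, y):
    """Fit y = b0 + b1*x by OLS and return (beta, usual_cov, robust_cov)."""
    n = len(x)
    # Entries of X'X for a design matrix whose rows are (1, x_i).
    sx = sum(x)
    sxx = sum(v * v for v in x)
    det = n * sxx - sx * sx
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]  # (X'X)^{-1}
    # OLS coefficients: (X'X)^{-1} X'y.
    sy = sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    beta = [inv[0][0] * sy + inv[0][1] * sxy,
            inv[1][0] * sy + inv[1][1] * sxy]
    resid = [b - beta[0] - beta[1] * a for a, b in zip(x, y)]
    # Usual estimator: s^2 (X'X)^{-1}, valid under homoskedasticity.
    s2 = sum(e * e for e in resid) / (n - 2)
    usual = [[s2 * inv[i][j] for j in range(2)] for i in range(2)]
    # "Meat" of the sandwich: sum of e_i^2 * x_i x_i'.
    m00 = sum(e * e for e in resid)
    m01 = sum(e * e * a for e, a in zip(resid, x))
    m11 = sum(e * e * a * a for e, a in zip(resid, x))
    robust = _matmul2(_matmul2(inv, [[m00, m01], [m01, m11]]), inv)
    return beta, usual, robust


# Illustrative data whose error spread grows with x (an assumption).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0 + 0.5 * v + ((-1) ** i) * 0.1 * v for i, v in enumerate(xs)]
beta, usual_cov, robust_cov = ols_covariances(xs, ys)
```

Comparing the entries of `usual_cov` and `robust_cov` is the informal version of the test the abstract describes: the two should be close without heteroskedasticity and generally diverge otherwise. The formal test statistic is not reproduced here.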
In this paper I resolve this difficulty by presenting a covariance matrix estimator which is consistent in the presence of heteroskedasticity, but does not rely on a (possibly incorrect) specific formal model of the structure of the heteroskedasticity. Thus, even when heteroskedasticity cannot be completely eliminated, proper inferences can be drawn. Under appropriate conditions, a natural test for heteroskedasticity can be obtained by comparing the consistent estimator to the usual covariance matrix estimator; in the absence of heteroskedasticity, both estimators will be about the same; otherwise, they will generally diverge. The test shares the advantage of the covariance estimator, in that no formal structure on
TL;DR: The lsmeans package (Lenth 2016) provides a simple way of obtaining least-squares means and contrasts thereof and supports many models fitted by R (R Core Team 2015) core packages that fit linear or mixed models.
Abstract: Least-squares means are predictions from a linear model, or averages thereof. They are useful in the analysis of experimental data for summarizing the effects of factors, and for testing linear contrasts among predictions. The lsmeans package (Lenth 2016) provides a simple way of obtaining least-squares means and contrasts thereof. It supports many models fitted by R (R Core Team 2015) core packages (as well as a few key contributed ones) that fit linear or mixed models, and provides a simple way of extending it to cover more model classes.
TL;DR: In this paper, the authors propose a non-linear minmax model to identify the weights such that the maximum absolute difference between the weight ratios and their corresponding comparisons is minimized, which may result in multiple optimal solutions.
Abstract: The Best Worst Method (BWM) is a multi-criteria decision-making method that uses two vectors of pairwise comparisons to determine the weights of criteria. First, the best (e.g. most desirable, most important), and the worst (e.g. least desirable, least important) criteria are identified by the decision-maker, after which the best criterion is compared to the other criteria, and the other criteria to the worst criterion. A non-linear minmax model is then used to identify the weights such that the maximum absolute difference between the weight ratios and their corresponding comparisons is minimized. The minmax model may result in multiple optimal solutions. Although, in some cases, decision-makers prefer to have multiple optimal solutions, in other cases they prefer to have a unique solution. The aim of this paper is twofold: firstly, we propose using interval analysis for the case of multiple optimal solutions, in which we show how the criteria can be weighed and ranked. Secondly, we propose a linear model for BWM, which is based on the same philosophy, but yields a unique solution.
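As a hedged illustration of the minmax objective described above, the sketch below evaluates, for a given weight vector, the maximum absolute difference between the weight ratios and the corresponding pairwise comparisons. It only scores candidate weights; it does not solve the optimization, and the function name, argument names, and example comparison vectors are assumptions made for illustration.

```python
def bwm_max_deviation(weights, best, worst, best_to_others, others_to_worst):
    """Largest gap between weight ratios and the stated comparisons.

    weights         -- candidate criterion weights (positive numbers)
    best, worst     -- indices of the best and worst criteria
    best_to_others  -- preference of the best criterion over each criterion
    others_to_worst -- preference of each criterion over the worst criterion
    """
    deviations = []
    for j, w_j in enumerate(weights):
        # Ratio w_best / w_j should match the best-to-others comparison.
        deviations.append(abs(weights[best] / w_j - best_to_others[j]))
        # Ratio w_j / w_worst should match the others-to-worst comparison.
        deviations.append(abs(w_j / weights[worst] - others_to_worst[j]))
    return max(deviations)


# Three criteria; criterion 0 is best and criterion 2 is worst (assumed data).
best_to_others = [1, 2, 8]
others_to_worst = [8, 4, 1]
consistent = [8 / 13, 4 / 13, 1 / 13]
print(bwm_max_deviation(consistent, 0, 2, best_to_others, others_to_worst))
```

With fully consistent comparisons, as in this example, the weights proportional to (8, 4, 1) reproduce every comparison, so the deviation is zero up to rounding. With inconsistent comparisons, several weight vectors can share the same minimal deviation, which is the multiple-solution situation the abstract discusses.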
TL;DR: In this article, two novel models using deep neural networks (DNNs) were proposed to automatically learn effective patterns from categorical feature interactions and make predictions of users' ad clicks.
Abstract: Predicting user responses, such as click-through rate and conversion rate, is critical in many web applications including web search, personalised recommendation, and online advertising. Unlike the continuous raw features usually found in the image and audio domains, the input features in web space are always multi-field and are mostly discrete and categorical, while their dependencies are little known. Major user response prediction models have to either limit themselves to linear models or require manually building up high-order combination features. The former lose the ability to explore feature interactions, while the latter result in heavy computation in the large feature space. To tackle the issue, we propose two novel models using deep neural networks (DNNs) to automatically learn effective patterns from categorical feature interactions and make predictions of users’ ad clicks. To make our DNNs work efficiently, we propose to leverage three feature transformation methods, i.e., factorisation machines (FMs), restricted Boltzmann machines (RBMs) and denoising auto-encoders (DAEs). This paper presents the structure of our models and their efficient training algorithms. The large-scale experiments with real-world data demonstrate that our methods work better than major state-of-the-art models.
TL;DR: In this paper, a universal, data-driven decomposition of chaos as an intermittently forced linear system is presented, which combines Takens' delay embedding with modern Koopman operator theory and sparse regression to obtain linear representations of strongly nonlinear dynamics.
Abstract: Understanding the interplay of order and disorder in chaotic systems is a central challenge in modern quantitative science. We present a universal, data-driven decomposition of chaos as an intermittently forced linear system. This work combines Takens' delay embedding with modern Koopman operator theory and sparse regression to obtain linear representations of strongly nonlinear dynamics. The result is a decomposition of chaotic dynamics into a linear model in the leading delay coordinates with forcing by low energy delay coordinates; we call this the Hankel alternative view of Koopman (HAVOK) analysis. This analysis is applied to the canonical Lorenz system, as well as to real-world examples such as the Earth's magnetic field reversal, and data from electrocardiogram, electroencephalogram, and measles outbreaks. In each case, the forcing statistics are non-Gaussian, with long tails corresponding to rare events that trigger intermittent switching and bursting phenomena; this forcing is highly predictive, providing a clear signature that precedes these events. Moreover, the activity of the forcing signal demarcates large coherent regions of phase space where the dynamics are approximately linear from those that are strongly nonlinear.
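The delay-embedding step mentioned above can be sketched as follows: a single measured time series is stacked into a Hankel matrix whose rows are time-shifted copies of the signal. This is only the first step; the extraction of leading delay coordinates and the sparse regression are not shown, and the function name and parameters are illustrative assumptions.

```python
def hankel_matrix(series, num_delays):
    """Stack `num_delays` time-shifted copies of `series` as rows.

    Row i holds series[i], series[i + 1], ..., so each column is one
    delay-coordinate vector of length `num_delays`.
    """
    num_cols = len(series) - num_delays + 1
    if num_cols < 1:
        raise ValueError("series is too short for the requested delays")
    return [list(series[i:i + num_cols]) for i in range(num_delays)]


# Small example: five samples embedded with three delays.
H = hankel_matrix([1, 2, 3, 4, 5], 3)
# H == [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
```

Each anti-diagonal of the result is constant, which is the defining property of a Hankel matrix.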
TL;DR: The MRMLM is a multi-locus model including markers selected from the RMLM method with a less stringent selection criterion and is more powerful and accurate than the EMMA in QTN detection and QTN effect estimation.
Abstract: Genome-wide association studies (GWAS) have been widely used in genetic dissection of complex traits. However, common methods are all based on a fixed-SNP-effect mixed linear model (MLM) and single marker analysis, such as efficient mixed model analysis (EMMA). These methods require Bonferroni correction for multiple tests, which often is too conservative when the number of markers is extremely large. To address this concern, we proposed a random-SNP-effect MLM (RMLM) and a multi-locus RMLM (MRMLM) for GWAS. The RMLM simply treats the SNP-effect as random, but it allows a modified Bonferroni correction to be used to calculate the threshold p value for significance tests. The MRMLM is a multi-locus model including markers selected from the RMLM method with a less stringent selection criterion. Due to the multi-locus nature, no multiple test correction is needed. Simulation studies show that the MRMLM is more powerful in QTN detection and more accurate in QTN effect estimation than the RMLM, which in turn is more powerful and accurate than the EMMA. To demonstrate the new methods, we analyzed six flowering time related traits in Arabidopsis thaliana and detected more genes than previously reported using the EMMA. Therefore, the MRMLM provides an alternative for multi-locus GWAS.
TL;DR: In this article, the authors proposed univariate models for short-term load forecasting based on linear regression and patterns of daily cycles of load time series, where the patterns used as input and output variables simplify the forecasting problem by filtering out the trend and seasonal variations of periods longer than the daily one.
TL;DR: Properties of statistics used with the general linear model (GLM) and their distributions are exploited to obtain accelerations irrespective of generic software or hardware improvements; method (iv) was found to be the best as long as symmetric errors can be assumed.
TL;DR: A novel expansion of Gene Expression Programming for the purpose of tensor modeling is described; it gives the algorithm freedom to produce a constraint-free model with its own functional form rather than a previously imposed one.
TL;DR: In this article, two one-step-ahead wind speed forecasting models were compared, one using a linear autoregressive integrated moving average (ARIMA) and the other using a nonlinear autoregressive exogenous artificial neural network (NARX).
Abstract: Two one-step-ahead wind speed forecasting models were compared. A univariate model was developed using a linear autoregressive integrated moving average (ARIMA). This method’s performance is well studied for a large number of prediction problems. The other is a multivariate model developed using a nonlinear autoregressive exogenous artificial neural network (NARX). This uses the variables barometric pressure, air temperature, wind direction and solar radiation or relative humidity, as well as delayed wind speed. Both models were developed from two databases from two sites: an hourly average measurements database from La Mata, Oaxaca, Mexico, and a ten minute average measurements database from Metepec, Hidalgo, Mexico. The main objective was to compare the impact of the various meteorological variables on the performance of the multivariate model of wind speed prediction with respect to the high performance univariate linear model. The NARX model gave better results with improvements on the ARIMA model of between 5.5% and 10.6% for the hourly database and of between 2.3% and 12.8% for the ten minute database for mean absolute error and mean squared error, respectively.
TL;DR: A simple strategy to cope with missing data in sequential inputs is demonstrated, addressing the task of multilabel classification of diagnoses given clinical time series, and it is shown that for some diseases, what tests are run can be as predictive as the results themselves.
Abstract: We demonstrate a simple strategy to cope with missing data in sequential inputs, addressing the task of multilabel classification of diagnoses given clinical time series. Collected from the pediatric intensive care unit (PICU) at Children's Hospital Los Angeles, our data consists of multivariate time series of observations. The measurements are irregularly spaced, leading to missingness patterns in temporally discretized sequences. While these artifacts are typically handled by imputation, we achieve superior predictive performance by treating the artifacts as features. Unlike linear models, recurrent neural networks can realize this improvement using only simple binary indicators of missingness. For linear models, we show an alternative strategy to capture this signal. Training models on missingness patterns only, we show that for some diseases, what tests are run can be as predictive as the results themselves.
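The idea of treating missingness as a feature can be sketched as follows: each time step carries both a filled-in value and a binary indicator that records whether the value was actually observed. The forward-fill rule, the default value, and the function name are assumptions chosen for illustration; the abstract does not specify the imputation used.

```python
def impute_with_indicators(sequence, default=0.0):
    """Return (filled_values, missing_indicators) for one variable.

    `sequence` uses None for time steps with no measurement. Missing
    steps are filled with the last observed value (or `default` if
    nothing has been observed yet), and the indicator is 1 for them.
    """
    values, indicators = [], []
    last = default
    for obs in sequence:
        if obs is None:
            indicators.append(1)   # no test was run at this step
        else:
            indicators.append(0)   # an actual measurement
            last = obs
        values.append(last)
    return values, indicators


values, missing = impute_with_indicators([None, 3.0, None, None, 5.0])
# values  == [0.0, 3.0, 3.0, 3.0, 5.0]
# missing == [1, 0, 1, 1, 0]
```

Feeding both lists to a model lets it use the pattern of which tests were run, which is the signal the abstract reports as predictive for some diseases. Training on the indicator list alone corresponds to the missingness-only experiment.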
TL;DR: The application of linear regression models in fNIRS and the modifications needed to generalize these models in order to deal with structured (colored) noise due to systemic physiology and noise heteroscedasticity due to motion artifacts are discussed.
Abstract: Functional near-infrared spectroscopy (fNIRS) is a noninvasive neuroimaging technique that uses low levels of light to measure changes in cerebral blood oxygenation levels. In the majority of NIRS functional brain studies, analysis of this data is based on a statistical comparison of hemodynamic levels between a baseline and task or between multiple task conditions by means of a linear regression model: the so-called general linear model. Although these methods are similar to their implementation in other fields, particularly for functional magnetic resonance imaging, the specific application of these methods in fNIRS research differs in several key ways related to the sources of noise and artifacts unique to fNIRS. In this brief communication, we discuss the application of linear regression models in fNIRS and the modifications needed to generalize these models in order to deal with structured (colored) noise due to systemic physiology and noise heteroscedasticity due to motion artifacts. The objective of this work is to present an overview of these noise properties in the context of the linear model as it applies to fNIRS data. This work is aimed at explaining these mathematical issues to the general fNIRS experimental researcher but is not intended to be a complete mathematical treatment of these concepts.
TL;DR: In this article, a new variant of Lasso called classifier-Lasso is proposed to shrink individual coefficients to the unknown group-specific coefficients, which achieves simultaneous classification and consistent estimation in a single step.
Abstract: This paper provides a novel mechanism for identifying and estimating latent group structures in panel data using penalized techniques. We consider both linear and nonlinear models where the regression coefficients are heterogeneous across groups but homogeneous within a group and the group membership is unknown. Two approaches are considered—penalized profile likelihood (PPL) estimation for the general nonlinear models without endogenous regressors, and penalized GMM (PGMM) estimation for linear models with endogeneity. In both cases, we develop a new variant of Lasso called classifier-Lasso (C-Lasso) that serves to shrink individual coefficients to the unknown group-specific coefficients. C-Lasso achieves simultaneous classification and consistent estimation in a single step and the classification exhibits the desirable property of uniform consistency. For PPL estimation, C-Lasso also achieves the oracle property so that group-specific parameter estimators are asymptotically equivalent to infeasible estimators that use individual group identity information. For PGMM estimation, the oracle property of C-Lasso is preserved in some special cases. Simulations demonstrate good finite-sample performance of the approach in both classification and estimation. Empirical applications to both linear and nonlinear models are presented.
TL;DR: This paper proposes online learning algorithms for estimating ARIMA models under relaxed assumptions on the noise terms; the algorithms are suitable for a wider range of applications and enjoy high computational efficiency.
Abstract: Autoregressive integrated moving average (ARIMA) is one of the most popular linear models for time series forecasting due to its nice statistical properties and great flexibility. However, its parameters are estimated in a batch manner and its noise terms are often assumed to be strictly bounded, which restricts its applications and makes it inefficient for handling large-scale real data. In this paper, we propose online learning algorithms for estimating ARIMA models under relaxed assumptions on the noise terms, which are suitable for a wider range of applications and enjoy high computational efficiency. The idea of our ARIMA method is to reformulate the ARIMA model into a task of full information online optimization (without random noise terms). As a consequence, we can estimate the parameters online in an efficient and scalable way. Furthermore, we analyze regret bounds of the proposed algorithms, which guarantee that our online ARIMA model is provably as good as the best ARIMA model in hindsight. Finally, our encouraging experimental results further validate the effectiveness and robustness of our method.
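To make the notion of online estimation concrete, here is a minimal sketch of online gradient descent for a purely autoregressive AR(p) predictor with squared loss. It is a simplified stand-in, not the paper's algorithm: the differencing and moving-average parts of ARIMA are omitted, and the function name, learning rate, and sinusoidal test series are assumptions.

```python
import math


def online_ar_forecast(series, order=2, learning_rate=0.05):
    """One-step-ahead AR(order) forecasting with online gradient descent.

    At each time step the current coefficients make a prediction, the
    squared error is recorded, and the coefficients take one gradient
    step. No past data beyond the last `order` values is revisited.
    """
    coeffs = [0.0] * order
    losses = []
    for t in range(order, len(series)):
        lags = [series[t - 1 - i] for i in range(order)]  # most recent first
        prediction = sum(c * v for c, v in zip(coeffs, lags))
        error = prediction - series[t]
        losses.append(error * error)
        # Gradient of (prediction - y)^2 with respect to each coefficient.
        coeffs = [c - learning_rate * 2.0 * error * v
                  for c, v in zip(coeffs, lags)]
    return coeffs, losses


# A noise-free sinusoid obeys an exact AR(2) recursion, so the loss
# should shrink as the coefficients adapt.
series = [math.sin(0.3 * t) for t in range(2000)]
coeffs, losses = online_ar_forecast(series)
early_loss = sum(losses[:100]) / 100
late_loss = sum(losses[-100:]) / 100
```

Comparing `early_loss` with `late_loss` gives a rough picture of the learner catching up with the best fixed predictor, which is what a regret bound formalizes.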
TL;DR: A deep feature selection (DFS) model is proposed that takes advantage of deep structures to model nonlinearity and conveniently selects a subset of features right at the input level for multiclass data.
Abstract: Sparse linear models approximate target variable(s) by a sparse linear combination of input variables. Since they are simple, fast, and able to select features, they are widely used in classification and regression. Essentially they are shallow feed-forward neural networks that have three limitations: (1) an inability to model nonlinearity of features, (2) an inability to learn high-level features, and (3) unnatural extensions to select features in a multiclass case. Deep neural networks are models structured by multiple hidden layers with nonlinear activation functions. Compared with linear models, they have two distinctive strengths: the capability to (1) model complex systems with nonlinear structures and (2) learn high-level representation of features. Deep learning has been applied in many large and complex systems where deep models significantly outperform shallow ones. However, feature selection at the input level, which is very helpful to understand the nature of a complex system, is still not well studied. In genome research, the cis-regulatory elements in noncoding DNA sequences play a key role in the expression of genes. Since the activity of regulatory elements involves highly interactive factors, a deep tool is strongly needed to discover informative features. In order to address the above limitations of shallow and deep models for selecting features of a complex system, we propose a deep feature selection (DFS) model that (1) takes advantage of deep structures to model nonlinearity and (2) conveniently selects a subset of features right at the input level for multiclass data. Simulation experiments convince us that this model is able to correctly identify both linear and nonlinear features. We applied this model to the identification of active enhancers and promoters by integrating multiple sources of genomic information. Results show that our model outperforms elastic net in terms of size of discriminative feature subset and classification accuracy.
TL;DR: A comprehensive study to model the recency effect using a big data approach and two interesting findings are presented: 1) the naive models are not useful for benchmark purposes in load forecasting at aggregated level due to their lack of accuracy; and 2) slicing the data into 24 pieces to develop one model for each hour is not necessarily better than building one interaction regression model using all 24 hours together.
TL;DR: In this article, linear regression and support vector regression models are compared on a training data set in order to choose the model that gives better prediction accuracy when forecasting from current or historical business data.
Abstract: In business, consumer interest, consumer behavior, and product profits are the insights required to predict the future of the business from current or historical data. These insights can be generated with statistical techniques for the purpose of forecasting. The statistical techniques can be evaluated for the predictive model based on the requirements of the data. Prediction and forecasting are widely done with time series data. Most applications, such as weather forecasting, finance, and the stock market, combine historical data with current streaming data for better accuracy. However, the time series data is analyzed with regression models. In this paper, linear regression and support vector regression models are compared using the training data set in order to select the correct model for better prediction and accuracy.
TL;DR: Although GUESS and DSA provided a marginally better balance between sensitivity and FDP, they did not outperform the other multivariate methods across all scenarios and properties examined, and computational complexity and flexibility should be considered when choosing between these methods.
Abstract: Background: The exposome constitutes a promising framework to improve understanding of the effects of environmental exposures on health by explicitly considering multiple testing and avoiding select...
TL;DR: This work proposes a new class of partially functional linear models to characterize the regression between a scalar response and covariates of both functional and scalar types, and establishes the consistency and oracle properties of the proposed method under mild conditions.
Abstract: In modern experiments, functional and nonfunctional data are often encountered simultaneously when observations are sampled from random processes and high-dimensional scalar covariates. It is difficult to apply existing methods for model selection and estimation. We propose a new class of partially functional linear models to characterize the regression between a scalar response and covariates of both functional and scalar types. The new approach provides a unified and flexible framework that simultaneously takes into account multiple functional and ultrahigh-dimensional scalar predictors, enables us to identify important features, and offers improved interpretability of the estimators. The underlying processes of the functional predictors are considered to be infinite-dimensional, and one of our contributions is to characterize the effects of regularization on the resulting estimators. We establish the consistency and oracle properties of the proposed method under mild conditions, demonstrate its performance with simulation studies, and illustrate its application using air pollution data.
TL;DR: In this paper, the authors demonstrate a simple strategy to cope with missing data in sequential inputs, addressing the task of multilabel classification of diagnoses given clinical time series, and evaluate LSTMs, MLPs, and linear models trained on missingness patterns only.
Abstract: We demonstrate a simple strategy to cope with missing data in sequential inputs, addressing the task of multilabel classification of diagnoses given clinical time series. Collected from the intensive care unit (ICU) of a major urban medical center, our data consists of multivariate time series of observations. The data is irregularly sampled, leading to missingness patterns in re-sampled sequences. In this work, we show the remarkable ability of RNNs to make effective use of binary indicators to directly model missing data, improving AUC and F1 significantly. However, while RNNs can learn arbitrary functions of the missing data and observations, linear models can only learn substitution values. For linear models and MLPs, we show an alternative strategy to capture this signal. Additionally, we evaluate LSTMs, MLPs, and linear models trained on missingness patterns only, showing that for several diseases, what tests are run can be more predictive than the results themselves.
TL;DR: When GLMs are thought to fit count data well, and when any necessary steps are taken to correct type I error rates, they should be used rather than LMs; tests based on models that better fit the data tend to have better power properties and in some instances have considerably higher power.
Abstract: Summary
The two most common approaches for analysing count data are to use a generalized linear model (GLM), or transform data, and use a linear model (LM). The latter has recently been advocated to more reliably maintain control of type I error rates in tests for no association, while seemingly losing little in power. We make three points on this issue.
Point 1 – Choice of statistical model should primarily be made on the grounds of data properties. Choice of testing procedure should be considered and addressed as a separate issue, after model choice. If models with the appropriate data properties nonetheless have statistical problems such as type I error control (i.e. type I error rate greatly exceeds the intended significance level), the best solution is to keep the model but fix the problems.
Point 2 – When a test has problems with type I error control, it can usually be corrected, but this may require departure from software default approaches. In particular, resampling is a good solution for small samples that can be easy to implement.
Point 3 – Tests based on models that better fit the data (e.g. a negative binomial for overdispersed count data) tend to have better power properties and in some instances have considerably higher power.
We illustrate these issues for a 2 × 2 experiment with a count response. This seemingly simple problem becomes hard when the experimental design is unbalanced, and software default procedures using LMs or GLMs can have difficulties, although in both cases the issues can be fixed.
We conclude that, when GLMs are thought to fit count data well, and when any necessary steps are taken to correct type I error rates, they should be used rather than LMs. Nonetheless, standard LM tests are often robust and can have good type I error control, so there is an argument for their use for counts when diagnostics are difficult and statistical models are complex, although at some risk of loss of power and interpretability.
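Point 2 notes that resampling can be easy to implement for small samples. As a hedged illustration, the sketch below runs a simple two-group permutation test on the difference in mean counts. This is a generic stand-in rather than the procedure used in the paper, it ignores the 2 × 2 design, and the function name, seed, and count data are assumptions.

```python
import random


def permutation_test(group_a, group_b, num_resamples=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)

    def mean(values):
        return sum(values) / len(values)

    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(num_resamples):
        # Reassign group labels at random and recompute the statistic.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed - 1e-12:
            extreme += 1
    # Add-one correction so the p-value is never exactly zero.
    return (extreme + 1) / (num_resamples + 1)


control = [0, 1, 0, 2, 1, 0, 1, 0]          # illustrative counts
treated = [9, 12, 10, 15, 11, 13, 9, 14]    # illustrative counts
p_value = permutation_test(control, treated)
```

Because the reference distribution comes from relabelling the data rather than from an asymptotic approximation, the test keeps its nominal type I error rate under exchangeability even in small samples.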
TL;DR: A fairly simple nonlinear regression model known as multivariate adaptive regression splines (MARS) is suggested, as an alternative to forecasting of solar power output, that maintains simplicity of the classical multiple linear regression (MLR) model while possessing the capability of handling nonlinearity.
TL;DR: In this article, a penalized estimation procedure for estimating the regression coefficients and for selecting variables under the linear constraints is developed, which provides valid confidence intervals of the regression coefficient and can be used to obtain the $p$-values, and the proposed methods are applied to a gut microbiome data set and identify four bacterial genera that are associated with the body mass index after adjusting for the total fat and caloric intakes.
Abstract: One important problem in microbiome analysis is to identify the bacterial taxa that are associated with a response, where the microbiome data are summarized as the composition of the bacterial taxa at different taxonomic levels. This paper considers regression analysis with such compositional data as covariates. In order to satisfy the subcompositional coherence of the results, linear models with a set of linear constraints on the regression coefficients are introduced. Such models allow regression analysis for subcompositions and include the log-contrast model for compositional covariates as a special case. A penalized estimation procedure for estimating the regression coefficients and for selecting variables under the linear constraints is developed. A method is also proposed to obtain debiased estimates of the regression coefficients that are asymptotically unbiased and have a joint asymptotic multivariate normal distribution. This provides valid confidence intervals of the regression coefficients and can be used to obtain the $p$-values. Simulation results show the validity of the confidence intervals and smaller variances of the debiased estimates when the linear constraints are imposed. The proposed methods are applied to a gut microbiome data set and identify four bacterial genera that are associated with the body mass index after adjusting for the total fat and caloric intakes.
TL;DR: This work shows the remarkable ability of RNNs to make effective use of binary indicators to directly model missing data, improving AUC and F1 significantly and evaluating LSTMs, MLPs, and linear models trained on missingness patterns only.
TL;DR: In this article, the authors present statistical methods for big data arising from online analytical processing, where large amounts of data arrive in streams and require fast analysis without storage/access to the historical data.
Abstract: We present statistical methods for big data arising from online analytical processing, where large amounts of data arrive in streams and require fast analysis without storage/access to the historical data. In particular, we develop iterative estimating algorithms and statistical inferences for linear models and estimating equations that update as new data arrive. These algorithms are computationally efficient, minimally storage-intensive, and allow for possible rank deficiencies in the subset design matrices due to rare-event covariates. Within the linear model setting, the proposed online-updating framework leads to predictive residual tests that can be used to assess the goodness of fit of the hypothesized model. We also propose a new online-updating estimator under the estimating equation setting. Theoretical properties of the goodness-of-fit tests and proposed estimators are examined in detail. In simulation studies and real data applications, our estimator compares favorably with competing approaches...
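The core idea of updating a linear model as data arrive, without storing the history, can be sketched with running sufficient statistics. The example below handles only a single predictor with an intercept and does not address rank deficiency or the predictive residual tests from the abstract; the class and method names are illustrative assumptions.

```python
class OnlineSimpleRegression:
    """Least-squares fit of y = b0 + b1*x updated one block at a time."""

    def __init__(self):
        # Running sums are all that is kept between blocks.
        self.n = 0
        self.sx = self.sy = self.sxx = self.sxy = 0.0

    def update(self, xs, ys):
        """Fold a new block of observations into the running sums."""
        for x, y in zip(xs, ys):
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.sxy += x * y

    def coefficients(self):
        """Return (intercept, slope) from the accumulated sums."""
        det = self.n * self.sxx - self.sx ** 2
        if det == 0:
            raise ValueError("not enough variation in x to fit a slope")
        slope = (self.n * self.sxy - self.sx * self.sy) / det
        intercept = (self.sy - slope * self.sx) / self.n
        return intercept, slope


model = OnlineSimpleRegression()
model.update([1.0, 2.0], [3.1, 4.9])        # first block arrives
model.update([3.0, 4.0, 5.0], [7.2, 8.8, 11.1])  # second block arrives
intercept, slope = model.coefficients()
```

The estimate after the last block equals the one obtained by fitting all observations at once, because the normal equations depend on the data only through these sums.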
TL;DR: A land use regression model for UFPs in Montreal, Canada is developed using mobile monitoring data collected from 414 road segments during the summer and winter months between 2011 and 2012 and suggests that the KRLS approach may offer modest improvements in predictive performance compared to standard multivariable linear regression models used to estimate spatial variations in ambient UFPs.
TL;DR: The authors showed that adding covariates that satisfy instrumental variables assumptions increases the amount of inconsistency in a linear model, and that regression adjustment using the propensity score based on instrumental variables actually maximizes the inconsistency among regression-type estimators.
TL;DR: A general model is proposed where it is only assumed that each observation y_i may depend on a_i only through ⟨a_i, x⟩, which leads to the intriguing conclusion that in the high noise regime, an unknown non-linearity in the observations does not significantly reduce one's ability to determine the signal, even when the non-linearity may be non-invertible.
Abstract: Author(s): Plan, Yaniv; Vershynin, Roman; Yudovina, Elena. Consider measuring an n-dimensional vector x through the inner product with several measurement vectors, a_1, a_2, ..., a_m. It is common in both signal processing and statistics to assume the linear response model y_i = ⟨a_i, x⟩ + e_i, where e_i is a noise term. However, in practice the precise relationship between the signal x and the observations y_i may not follow the linear model, and in some cases it may not even be known. To address this challenge, in this paper we propose a general model where it is only assumed that each observation y_i may depend on a_i only through ⟨a_i, x⟩. We do not assume that the dependence is known. This is a form of the semiparametric single index model, and it includes the linear model as well as many forms of the generalized linear model as special cases. We further assume that the signal x has some structure, and we formulate this as a general assumption that x belongs to some known (but arbitrary) feasible set K. We carefully detail the benefit of using the signal structure to improve estimation. The theory is based on the mean width of K, a geometric parameter which can be used to understand its effective dimension in estimation problems. We determine a simple, efficient two-step procedure for estimating the signal based on this model -- a linear estimation followed by metric projection onto K. We give general conditions under which the estimator is minimax optimal up to a constant. This leads to the intriguing conclusion that in the high noise regime, an unknown non-linearity in the observations does not significantly reduce one's ability to determine the signal, even when the non-linearity may be non-invertible. Our results may be specialized to understand the effect of non-linearities in compressed sensing.
TL;DR: This paper proposed fLDS, a general class of nonlinear generative models that permits the firing rate of each neuron to vary as an arbitrary smooth function of a latent, linear dynamical state, which allows the model to capture a richer set of neural variability than a purely linear model, but retains an easily visualizable low-dimensional latent space.
Abstract: A body of recent work in modeling neural activity focuses on recovering low-dimensional latent features that capture the statistical structure of large-scale neural populations. Most such approaches have focused on linear generative models, where inference is computationally tractable. Here, we propose fLDS, a general class of nonlinear generative models that permits the firing rate of each neuron to vary as an arbitrary smooth function of a latent, linear dynamical state. This extra flexibility allows the model to capture a richer set of neural variability than a purely linear model, but retains an easily visualizable low-dimensional latent space. To fit this class of non-conjugate models we propose a variational inference scheme, along with a novel approximate posterior capable of capturing rich temporal correlations. We show that our techniques permit inference in a wide class of generative models. We also show in application to two neural datasets that, compared to state-of-the-art neural population models, fLDS captures a much larger proportion of neural variability with a small number of latent dimensions, providing superior predictive performance and interpretability.
TL;DR: In this article, the marginal density of a "minimal" data set is typically available in closed form, regardless of the error distribution, and the conditions for the results to hold are explored in some detail for nonnormal linear models and various transformations thereof.
Abstract: In Bayesian analysis with a "minimal" data set and common noninformative priors, the (formal) marginal density of the data is surprisingly often independent of the error distribution. This results in great simplifications in certain model selection methodologies; for instance, the Intrinsic Bayes Factor for models with this property reduces simply to the Bayes factor with respect to the noninformative priors. The basic result holds for comparison of models which are invariant with respect to the same group structure. Indeed the condition reduces to a condition on the distributions of the common maximal invariant. In these situations, the marginal density of a "minimal" data set is typically available in closed form, regardless of the error distribution. This provides very useful expressions for computation of Intrinsic Bayes Factors in more general settings. The conditions for the results to hold are explored in some detail for nonnormal linear models and various transformations thereof.