TL;DR: A novel p-value-based multiple testing approach is introduced for generalized linear models, addressing FDR control amidst dependent test statistics, with efficient algorithms and theoretical analysis affirming its performance across diverse simulation settings.
Abstract: This study introduces a novel p-value-based multiple testing approach tailored for generalized linear models. Despite the crucial role of generalized linear models in statistics, existing methodologies face obstacles arising from the heterogeneous variance of response variables and complex dependencies among estimated parameters. Our aim is to address the challenge of controlling the false discovery rate (FDR) amidst arbitrarily dependent test statistics. Through the development of efficient computational algorithms, we present a versatile statistical framework for multiple testing. The proposed framework accommodates a range of tools developed for constructing a new model matrix in regression-type analysis, including random row permutations and Model-X knockoffs. We devise efficient computing techniques to solve the encountered non-trivial quadratic matrix equations, enabling the construction of paired p-values suitable for the two-step multiple testing procedure proposed by Sarkar and Tang (Biometrika 109(4): 1149–1155, 2022). Theoretical analysis affirms the properties of our approach, demonstrating its capability to control the FDR at a given level. Empirical evaluations further substantiate its promising performance across diverse simulation settings.
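For orientation only, the sketch below shows the classical Benjamini-Yekutieli step-up rule, a standard baseline that controls the FDR under arbitrary dependence among p-values. It is not the two-step procedure of Sarkar and Tang, nor the paired p-value construction described in this abstract; the function name `benjamini_yekutieli` and the example p-values are illustrative assumptions.

```python
import numpy as np

def benjamini_yekutieli(pvalues, alpha=0.05):
    """Return a boolean rejection mask from the Benjamini-Yekutieli step-up rule."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Harmonic correction c(m) = sum_{i=1}^m 1/i accounts for arbitrary dependence.
    c_m = np.sum(1.0 / np.arange(1, m + 1))
    thresholds = np.arange(1, m + 1) * alpha / (m * c_m)
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest rank whose p-value meets its threshold
        reject[order[: k + 1]] = True
    return reject

# Hypothetical p-values, chosen only to illustrate the rule.
pvals = [0.0001, 0.0004, 0.019, 0.03, 0.2, 0.5, 0.8, 0.9]
rejected = benjamini_yekutieli(pvals, alpha=0.05)
```

Dropping the factor `c_m` gives the Benjamini-Hochberg rule, which is less conservative but is only guaranteed under independence or positive dependence.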
TL;DR: Researchers propose a method to efficiently eliminate redundant candidate points for D-optimal exact design problems, reducing memory and runtime requirements, and enabling computation of optimal designs for large-scale problems via mixed-integer second-order cone programming.
Abstract: One of the most common problems in statistical experimentation is computing D-optimal designs for linear or locally linearized models on large finite candidate sets. While optimal approximate designs can be efficiently computed using convex methods, constructing optimal exact designs with a prespecified total number of trials is a substantially more difficult integer optimization problem. In this paper, we propose necessary conditions, based on approximate designs, that must be satisfied by any support point of a D-optimal exact design. These conditions enable rapid elimination of redundant candidate points without loss of optimality, thereby reducing memory requirements and runtime of subsequent exact-design algorithms. We also prove that, for a sufficiently large number of trials, the support of every D-optimal exact design is contained in a set that typically coincides with the support of a D-optimal approximate design. We demonstrate the approach on randomly generated benchmark models with candidate sets of up to 100 million points and on commonly used constrained mixture models with up to 1 million points. The proposed approach reduces the initial candidate sets by several orders of magnitude, thereby making it possible to compute D-optimal exact designs for these problems via mixed-integer second-order cone programming, which provides optimality guarantees.
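As a minimal sketch of the approximate-design side of this problem, the code below runs the classical multiplicative algorithm for a D-optimal approximate design on a finite candidate set and returns the variance function d(x) = f(x)^T M^{-1} f(x). It does not implement the paper's elimination conditions or the mixed-integer second-order cone formulation; the quadratic regression model, the 21-point grid, and the function names are assumptions chosen for illustration.

```python
import numpy as np

def variance_function(F, w):
    """d_i = f_i^T M(w)^{-1} f_i for every candidate row f_i of F."""
    M = F.T @ (w[:, None] * F)                      # information matrix M(w)
    return np.einsum("ij,jk,ik->i", F, np.linalg.inv(M), F)

def d_optimal_weights(F, iters=5000):
    """Multiplicative algorithm for an approximate D-optimal design.

    F is an (N, m) array whose rows are the regressors of the N candidates.
    """
    N, m = F.shape
    w = np.full(N, 1.0 / N)                         # start from the uniform design
    for _ in range(iters):
        w = w * variance_function(F, w) / m         # update keeps sum(w) == 1
    return w, variance_function(F, w)

# Assumed toy model: quadratic regression f(x) = (1, x, x^2) on a grid in [-1, 1].
x = np.linspace(-1.0, 1.0, 21)
F = np.column_stack([np.ones_like(x), x, x**2])
w, d = d_optimal_weights(F)
support = x[w > 1e-3]                               # candidates carrying non-negligible weight
```

By the Kiefer-Wolfowitz equivalence theorem, a design is D-optimal exactly when the maximum of d over the candidates equals the number of parameters m, so candidates with d well below m carry vanishing weight in the approximate design.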
TL;DR: This study proposes a novel Bayesian DLNM-Laplacian-P-splines approach to model nonlinear lagged relationships in spatially referenced data, incorporating spatial dependence using CAR priors and Laplace approximation for improved computational efficiency.
Abstract: Distributed lag non-linear models (DLNMs) have gained popularity for modeling nonlinear lagged relationships between exposures and outcomes. When applied to spatially referenced data, these models must account for spatial dependence, a challenge that has yet to be thoroughly explored within the penalized DLNM framework. This gap is mainly due to the complex model structure and high computational demands, particularly when dealing with large spatio-temporal datasets. To address this, we propose a novel Bayesian DLNM-Laplacian-P-splines (DLNM-LPS) approach that incorporates spatial dependence using conditional autoregressive (CAR) priors, a method commonly applied in disease mapping. Our approach offers a flexible framework for capturing nonlinear associations while accounting for spatial dependence. It uses the Laplace approximation to approximate the conditional posterior distribution of the regression parameters, eliminating the need for Markov chain Monte Carlo (MCMC) sampling, often used in Bayesian inference, thus improving computational efficiency. The methodology is evaluated through simulation studies and applied to analyze the relationship between temperature and mortality in London.
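To illustrate how a Laplace approximation replaces MCMC for regression parameters, the sketch below fits a Gaussian approximation to the posterior of a toy Poisson regression with a Gaussian prior. It is not the DLNM-LPS model: there is no spline basis, lag structure, or CAR prior here, and the prior precision `Q`, the simulated data, and the function name are assumptions.

```python
import numpy as np

def laplace_poisson(X, y, Q, iters=50):
    """Laplace approximation to p(beta | y) for y_i ~ Poisson(exp(x_i^T beta))
    with prior beta ~ N(0, Q^{-1}). Returns the posterior mode and covariance."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        mu = np.exp(X @ beta)
        grad = X.T @ (y - mu) - Q @ beta        # gradient of the log posterior
        H = X.T @ (mu[:, None] * X) + Q         # negative Hessian of the log posterior
        beta = beta + np.linalg.solve(H, grad)  # Newton step towards the mode
    mu = np.exp(X @ beta)
    H = X.T @ (mu[:, None] * X) + Q
    return beta, np.linalg.inv(H)               # mode and Gaussian covariance

# Simulated data with assumed true coefficients (0.5, -0.3).
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(500), rng.normal(size=500)])
y = rng.poisson(np.exp(X @ np.array([0.5, -0.3])))
Q = 0.01 * np.eye(2)                            # weak Gaussian prior precision
mode, cov = laplace_poisson(X, y, Q)
```

The approximation costs a handful of linear solves, where an MCMC sampler would need many thousands of likelihood evaluations.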
TL;DR: This study introduces a scalable conditional variational inference approach for multinomial probit models, using neural embeddings and a reparameterization trick to efficiently estimate model parameters in high-dimensional choice settings with large samples and choice sets.
Abstract: The multinomial probit (MNP) model is widely used to analyze categorical outcomes due to its ability to capture flexible substitution patterns among alternatives. Conventional likelihood-based and Markov chain Monte Carlo (MCMC) estimators become computationally prohibitive in high-dimensional choice settings. This study introduces a fast and accurate conditional variational inference (CVI) approach to calibrate MNP model parameters, which is scalable to large samples and large choice sets. A flexible variational distribution on correlated latent utilities is defined using neural embeddings, and a reparameterization trick is used to ensure the positive definiteness of the resulting covariance matrix. The resulting CVI estimator is similar to a variational autoencoder, with the variational model being the encoder and the MNP’s data generating process being the decoder. Straight-through estimation and the Gumbel-SoftMax approximation are adopted for the ‘argmax’ operation that selects the alternative with the highest latent utility. This eliminates the need to sample from high-dimensional truncated Gaussian distributions, significantly reducing computational costs as the number of alternatives grows. The point estimates from the proposed method align closely with the posterior mean estimates of MCMC. The method can calibrate MNP parameters with 20 alternatives and one million observations in approximately 28 minutes, roughly 36 times faster, while recovering point estimates with accuracy comparable to the existing benchmarks. Although the proposed approach is primarily designed for efficient point estimation, our experimental results confirm that valid statistical inference can be derived through bootstrapping.
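The two ingredients named above can be sketched in a few lines, under stated assumptions. The abstract does not specify which reparameterization is used, so the code assumes a common choice: a Cholesky factor with a softplus-transformed diagonal. The 'argmax' relaxation is shown as a plain tempered softmax over the sampled utilities; the function names, dimensions, and temperature are illustrative, and the gradient side of straight-through estimation needs an autodiff framework that is not shown here.

```python
import numpy as np

def build_cholesky(raw, J):
    """Map an unconstrained vector of length J(J+1)/2 to a lower-triangular L
    with strictly positive diagonal, so that L @ L.T is positive definite."""
    L = np.zeros((J, J))
    L[np.tril_indices(J)] = raw
    diag = np.diag_indices(J)
    L[diag] = np.logaddexp(0.0, L[diag]) + 1e-6   # softplus keeps the diagonal > 0
    return L

def relaxed_argmax(u, tau):
    """Tempered softmax: approaches the one-hot argmax of u as tau -> 0."""
    z = (u - u.max()) / tau                       # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
J = 4                                             # number of alternatives (assumed)
L = build_cholesky(rng.normal(size=J * (J + 1) // 2), J)
Sigma = L @ L.T                                   # covariance of the latent utilities
mean_utility = np.array([0.2, -1.0, 0.5, 0.1])    # hypothetical systematic utilities
u = mean_utility + L @ rng.normal(size=J)         # reparameterized draw of utilities
soft = relaxed_argmax(u, tau=0.05)                # differentiable surrogate for the choice
hard = np.eye(J)[np.argmax(u)]                    # forward value a straight-through estimator would use
```

Because the utilities are drawn as a deterministic transform of standard normal noise, no sampling from truncated Gaussian distributions is involved at any step.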
TL;DR: This paper shows that appropriately scaled Poisson mixtures approximate their mixing subordinator, derives a rate of convergence in $L^p$, and develops an approach for approximate simulation of the subordinator, with particular focus on tempered stable distributions.
Abstract: Subordinators are infinitely divisible distributions on the positive half-line. They are often used as mixing distributions in Poisson mixtures. We show that appropriately scaled Poisson mixtures can approximate the mixing subordinator and we derive a rate of convergence in $L^p$ for each $p\in [1,\infty ]$. This includes the Kolmogorov and Wasserstein metrics as special cases. As an application, we develop an approach for approximate simulation of the underlying subordinator. In the interest of generality, we present our results in the context of more general mixtures, specifically those that can be represented as differences of randomly stopped Lévy processes. Particular focus is given to the case where the subordinator belongs to the class of tempered stable distributions.
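A small simulation can make the approximation concrete. The sketch below assumes a gamma mixing distribution (one example of an infinitely divisible law on the positive half-line), draws N | X ~ Poisson(nX), and compares N/n with X through an empirical Kolmogorov distance. The choice of gamma parameters, sample size, and helper names are assumptions, and the code does not reproduce the paper's convergence rates or its tempered stable simulation method.

```python
import numpy as np

def scaled_poisson_mixture(X, n, rng):
    """Draw N | X ~ Poisson(n X) and return the scaled mixture N / n."""
    return rng.poisson(n * X) / n

def kolmogorov_distance(a, b):
    """Two-sample Kolmogorov distance between the empirical CDFs of a and b."""
    grid = np.sort(np.concatenate([a, b]))
    Fa = np.searchsorted(np.sort(a), grid, side="right") / a.size
    Fb = np.searchsorted(np.sort(b), grid, side="right") / b.size
    return np.max(np.abs(Fa - Fb))

rng = np.random.default_rng(1)
X = rng.gamma(shape=2.0, scale=1.0, size=100_000)   # assumed mixing distribution
dist = {n: kolmogorov_distance(scaled_poisson_mixture(X, n, rng), X)
        for n in (1, 10, 100, 1000)}
```

For n = 1 the mixture puts substantial mass at zero while the gamma law puts none, so the distance is large; as n grows the scaled counts track the mixing variable more closely.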
Abstract: Kuiper’s $V_n$ statistic, a measure of the discrepancy between the ideal distribution and the empirical distribution, is of great significance in goodness-of-fit testing. However, Kuiper’s formulae for computing the cumulative distribution function, false positive probability, and upper tail quantile of $V_n$ cannot be applied when the sample size $n$ is small, since the approximation error is $\mathcal{O}\left(n^{-1}\right)$.
In this work, our contributions are threefold: first, the approximation error is reduced to $\mathcal{O}\left(n^{-(k+1)/2}\right)$, where $k$ is the expansion order, by means of a high-order expansion for the exponent of the differential operator; second, a novel high-order formula with approximation error $\mathcal{O}\left(n^{-3}\right)$ is obtained through extensive calculation; third, fixed-point algorithms are designed for solving the Kuiper pair of critical values and upper tail quantiles based on the novel formula. The high-order expansion method for Kuiper’s $V_n$ statistic is applicable to various applications where there are more than five samples of data. The principles, algorithms, and code for the high-order expansion method are attractive for goodness-of-fit testing.
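For readers unfamiliar with the statistic itself, the sketch below computes Kuiper's $V_n = D^+ + D^-$ for a sample against a continuous reference CDF, using the standard order-statistic formulas. It covers only the statistic; the high-order expansion, the distribution formulae, and the fixed-point algorithms of the paper are not reproduced, and the function name and the three-point sample are illustrative assumptions.

```python
import numpy as np

def kuiper_statistic(sample, cdf):
    """Kuiper's V_n = D+ + D- for a sample against a continuous reference CDF."""
    u = np.sort(cdf(np.asarray(sample, dtype=float)))
    n = u.size
    i = np.arange(1, n + 1)
    d_plus = np.max(i / n - u)          # largest excess of the empirical CDF over the ideal CDF
    d_minus = np.max(u - (i - 1) / n)   # largest excess of the ideal CDF over the empirical CDF
    return d_plus + d_minus

# Toy sample tested against the uniform distribution on [0, 1].
uniform_cdf = lambda x: np.clip(x, 0.0, 1.0)
v = kuiper_statistic([0.1, 0.5, 0.9], cdf=uniform_cdf)
```

Here both one-sided deviations equal 7/30, so the statistic is 7/15. Unlike the Kolmogorov-Smirnov statistic, which takes the larger of the two deviations, Kuiper's statistic sums them.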