Valid post-selection inference
TL;DR: The paper reduces post-selection inference to a problem of simultaneous inference: confidence intervals are widened so that they hold simultaneously for all linear functions that arise as coefficient estimates in all submodels.
Abstract: It is common practice in statistical data analysis to perform data-driven variable selection and derive statistical inference from the resulting model. Such inference enjoys none of the guarantees that classical statistical theory provides for tests and confidence intervals when the model has been chosen a priori. We propose to produce valid "post-selection inference" by reducing the problem to one of simultaneous inference and hence suitably widening conventional confidence and retention intervals. Simultaneity is required for all linear functions that arise as coefficient estimates in all submodels. By purchasing "simultaneity insurance" for all possible submodels, the resulting post-selection inference is rendered universally valid under all possible model selection procedures. This inference is therefore generally conservative for particular selection procedures, but it is always less conservative than full Scheffé protection. Importantly, it does not depend on the truth of the selected submodel, and hence it produces valid inference even in wrong models. We describe the structure of the simultaneous inference problem and give some asymptotic results.
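The widening described in the abstract can be illustrated with a small Monte Carlo sketch. It is not the authors' implementation: it assumes a known error variance (so the statistics are z-type rather than t-type), enumerates every non-empty submodel of a small design, and the function names `posi_directions` and `simulate_constants` are hypothetical. It estimates three interval half-width multipliers: the naive one for a single pre-chosen coefficient, the one that is simultaneous over all coefficients in all submodels, and the Scheffé one that covers every linear function in the column space of the design.

```python
import itertools
import numpy as np


def posi_directions(X):
    """Stack the unit vectors l such that the z-statistic of coefficient j
    in submodel M equals l @ eps / sigma (known-sigma simplification)."""
    n, p = X.shape
    dirs = []
    for size in range(1, p + 1):
        for M in itertools.combinations(range(p), size):
            # Rows of the pseudo-inverse map y to the least-squares
            # coefficient estimates of submodel M.
            rows = np.linalg.pinv(X[:, M])  # shape (|M|, n)
            rows /= np.linalg.norm(rows, axis=1, keepdims=True)
            dirs.append(rows)
    return np.vstack(dirs)  # shape (p * 2**(p - 1), n)


def simulate_constants(X, alpha=0.05, n_draws=20000, seed=0):
    """Monte Carlo estimates of the naive, simultaneous, and Scheffe
    multipliers at level 1 - alpha for a fixed design X of full column rank."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    L = posi_directions(X)
    Q, _ = np.linalg.qr(X)  # orthonormal basis of the column space of X
    Z = rng.standard_normal((n_draws, n))  # standardized errors eps / sigma
    stats = np.abs(Z @ L.T)  # |z| for every (submodel, coefficient) pair
    # Naive: one fixed coefficient in one fixed model, no selection.
    naive = np.quantile(stats[:, 0], 1 - alpha)
    # Simultaneous over all coefficients in all submodels.
    posi = np.quantile(stats.max(axis=1), 1 - alpha)
    # Scheffe: simultaneous over every direction in the column space.
    scheffe = np.quantile(np.linalg.norm(Z @ Q, axis=1), 1 - alpha)
    return naive, posi, scheffe


if __name__ == "__main__":
    design = np.random.default_rng(1).standard_normal((30, 5))
    naive, posi, scheffe = simulate_constants(design)
    print(f"naive={naive:.2f}  all-submodels={posi:.2f}  Scheffe={scheffe:.2f}")
```

Because every submodel direction is a unit vector inside the column space of the design, the simulated all-submodels multiplier can never exceed the Scheffé multiplier computed from the same draws, which mirrors the abstract's statement that the proposed protection is less conservative than full Scheffé protection. Exhaustive enumeration is only feasible for a small number of predictors, since the number of directions grows as p times 2^(p-1).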
Citations
Statistical Learning with Sparsity: The Lasso and Generalizations
Trevor Hastie,Robert Tibshirani,Martin J. Wainwright +2 more
- 07 May 2015
TL;DR: Statistical Learning with Sparsity: The Lasso and Generalizations presents methods that exploit sparsity to help recover the underlying signal in a set of data and extract useful and reproducible patterns from big datasets.
3K
Estimation and Inference of Heterogeneous Treatment Effects using Random Forests
Stefan Wager,Susan Athey +1 more
TL;DR: This paper developed a non-parametric causal forest for estimating heterogeneous treatment effects that extends Breiman's widely used random forest algorithm, and showed that causal forests are pointwise consistent for the true treatment effect and have an asymptotically Gaussian and centered sampling distribution.
2.1K
On asymptotically optimal confidence regions and tests for high-dimensional models
TL;DR: A general method for constructing confidence intervals and statistical tests for single or low-dimensional components of a large parameter vector in a high-dimensional model and develops the corresponding theory which includes a careful analysis for Gaussian, sub-Gaussian and bounded correlated designs.
987
On asymptotically optimal confidence regions and tests for high-dimensional models
TL;DR: In this paper, a general method for constructing confidence intervals and statistical tests for single or low-dimensional components of a large parameter vector in a high-dimensional model is proposed, which can be easily adjusted for multiplicity taking dependence among tests into account.
•Posted Content
Estimation and Inference of Heterogeneous Treatment Effects using Random Forests
Stefan Wager,Susan Athey +1 more
TL;DR: This is the first set of results that allows any type of random forest, including classification and regression forests, to be used for provably valid statistical inference and is found to be substantially more powerful than classical methods based on nearest-neighbor matching.
816
References
•Book
The Elements of Statistical Learning
Trevor Hastie,Robert Tibshirani,Jerome H. Friedman +2 more
- 01 Jan 2001
29.4K
•Book
The Elements of Statistical Learning: Data Mining, Inference, and Prediction
Trevor Hastie,Robert Tibshirani,Jerome H. Friedman +2 more
- 28 Jul 2013
TL;DR: In this paper, the authors describe the important ideas in these areas in a common conceptual framework, and the emphasis is on concepts rather than mathematics, with a liberal use of color graphics.
21.3K
The Elements of Statistical Learning: Data Mining, Inference, and Prediction
TL;DR: The Elements of Statistical Learning: Data Mining, Inference, and Prediction as discussed by the authors is a popular book for data mining and machine learning, focusing on data mining, inference, and prediction.
15.4K
•Book
Experimental and Quasi-Experimental Designs for Generalized Causal Inference
William R. Shadish,Thomas D. Cook,Donald T. Campbell +2 more
- 01 Jan 2001
TL;DR: In this article, the authors present experiments and generalized causal inference methods for single and multiple studies, using both control groups and pretest observations on the outcome of the experiment, and a critical assessment of their assumptions.
15.3K
Estimating causal effects of treatments in randomized and nonrandomized studies.
TL;DR: A discussion of matching, randomization, random sampling, and other methods of controlling extraneous variation is presented in this paper, where the objective is to specify the benefits of randomization in estimating causal effects of treatments.
10.2K