Generalized Bayesian quantification learning

Open Access • Posted Content

- 15 Jan 2020

7

TL;DR: A generalized Bayesian quantification learning (GBQL) approach that uses the entire compositional predictions from probabilistic classifiers and allows for uncertainty in true class labels for the limited labeled test data is proposed.

Abstract: Quantification learning is the task of prevalence estimation for a test population using predictions from a classifier trained on a different population. Commonly used quantification methods either assume perfect sensitivity and specificity of the classifier, or use the training data both to train the classifier and to estimate its misclassification rates. These methods are inappropriate in the presence of dataset shift, when the misclassification rates in the training population are not representative of those for the test population. A recent Bayesian quantification model addresses dataset shift, but only allows for single-class (categorical) predictions, and assumes perfect knowledge of the true labels on a small number of instances from the test population. We propose a generalized Bayesian quantification learning (GBQL) approach that uses the entire compositional predictions from probabilistic classifiers and allows for uncertainty in true class labels for the limited labeled test data. We use a model-free Bayesian estimating equation approach to compositional data using Kullback-Leibler loss functions based only on a first-moment assumption. This estimating equation approach coherently links the loss functions for labeled and unlabeled test cases. We show how our method yields existing quantification approaches as special cases through different prior choices, thereby providing an inferential framework around these approaches. We discuss an extension to an ensemble GBQL that uses predictions from multiple classifiers and yields inference robust to the inclusion of a poor classifier. We outline a fast and efficient Gibbs sampler using a rounding and coarsening approximation to the loss functions. For large sample settings, we establish posterior consistency of GBQL. Empirical performance of GBQL is demonstrated through simulations and analysis of real data with evident dataset shift.
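
The first-moment idea in the abstract can be illustrated with a small point-estimate sketch. This is not the GBQL model or its Gibbs sampler: it is a plain least-squares analogue, written under the assumption that the mean compositional prediction for true class i is row i of a misclassification matrix M, so that the mean prediction over the unlabeled test cases is approximately M^T p, where p is the class prevalence vector. The function name `estimate_prevalence`, its arguments, and the toy numbers are all hypothetical and chosen only for illustration.

```python
import numpy as np


def estimate_prevalence(labeled_preds, labeled_true, unlabeled_preds, n_classes):
    """Moment-based prevalence estimate (illustrative only, not GBQL).

    Assumes every class appears at least once among the labeled test cases.
    """
    # Row i of M: average compositional prediction among labeled
    # test cases whose true class is i (estimated from test data,
    # so it reflects the test population's misclassification rates).
    M = np.vstack([
        labeled_preds[labeled_true == i].mean(axis=0)
        for i in range(n_classes)
    ])
    # Average compositional prediction over the unlabeled test cases.
    q = unlabeled_preds.mean(axis=0)
    # First-moment relation q ~ M^T p, solved by least squares.
    p, *_ = np.linalg.lstsq(M.T, q, rcond=None)
    # Crude projection back to the simplex: clip negatives, renormalize.
    p = np.clip(p, 0.0, None)
    return p / p.sum()


# Toy data (made up): two classes, noise-free predictions.
labeled_preds = np.array([[0.8, 0.2], [0.8, 0.2], [0.3, 0.7], [0.3, 0.7]])
labeled_true = np.array([0, 0, 1, 1])
# One unlabeled case from class 0 and three from class 1.
unlabeled_preds = np.array([[0.8, 0.2], [0.3, 0.7], [0.3, 0.7], [0.3, 0.7]])

p_hat = estimate_prevalence(labeled_preds, labeled_true, unlabeled_preds, 2)
print(p_hat)
```

Because the misclassification matrix is estimated from labeled test cases rather than from training data, the sketch reflects the dataset-shift setting the abstract describes. It yields only a point estimate and carries none of the uncertainty quantification that a Bayesian treatment provides.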

Citations

•Journal Article•10.1073/PNAS.2001238117

Methods for correcting inference based on outcomes predicted by machine learning.

Siruo Wang, +2 more

- 18 Nov 2020

- Proceedings of the National Academy of S...

TL;DR: The postprediction inference (postpi) approach can correct bias and improve variance estimation and subsequent statistical inference with predicted outcomes and can improve inference in two distinct fields: modeling predicted phenotypes in repurposed gene expression data and modeling predicted causes of death in verbal autopsy data.

57

Bayesian Estimation of Panel Data Fractional Response Models with Endogeneity: An Application to Standardized Test Rates.

Lawrence M. Kessler

- 01 Jan 2013

TL;DR: In this paper, the authors developed new Bayesian estimation procedures for a nonlinear panel data model with a fractional dependent variable which is bounded between zero and one, and applied the model empirically in order to examine the relationship between school spending and student achievement among Florida elementary schools.

11

•Posted Content

The openVA Toolkit for Verbal Autopsies

Zehang Richard Li, +4 more

- 16 Sep 2021

- arXiv: Applications

TL;DR: The openVA package provides a standardized framework for analyzing VA data that is compatible with all openly available methods and data structures, and provides an open-source R implementation of several of the most widely used VA methods.

5

•Posted Content•10.1101/2020.01.21.914002

Post-prediction inference

Siruo Wang, +2 more

- 22 Jan 2020

- bioRxiv

TL;DR: The postpi approach can correct bias and improve variance estimation (and thus subsequent statistical inference) with predicted outcome data and can improve inference in two totally distinct fields: modeling predicted phenotypes in re-purposed gene expression data and modeling predicted causes of death in verbal autopsy data.

5

•Journal Article•10.32614/rj-2023-020

The openVA Toolkit for Verbal Autopsies

- 25 Feb 2023

- R Journal

TL;DR: The openVA package provides a standardized framework for analyzing VA data that is compatible with all openly available methods and data structures, and the paper demonstrates the pipeline of model fitting, summary, comparison, and visualization in the R environment.
