Regularized Bayesian transfer learning for population level etiological distributions.

Open Access | Posted Content

- 24 Oct 2018

TL;DR: The paper presents a parsimonious hierarchical Bayesian transfer learning framework that directly estimates population-level class probabilities in a target domain, using any baseline classifier trained on source-domain data together with a small labeled target-domain dataset, and introduces a novel shrinkage prior for the transfer error rates.

Abstract: Computer-coded verbal autopsy (CCVA) algorithms predict cause of death from high-dimensional family questionnaire data (verbal autopsies) about a deceased individual. CCVA algorithms are typically trained on non-local data, then used to generate national and regional estimates of cause-specific mortality fractions. These estimates may be inaccurate if the non-local training data differ from the local population of interest. This problem is a special case of transfer learning. However, most transfer learning classification approaches are concerned with classifying individuals (e.g., a person) within a target domain (e.g., a particular population), with training performed on data from a source domain. Epidemiologists are often more interested in estimating population-level etiological distributions, using datasets much smaller than those used in common transfer learning applications. We present a parsimonious hierarchical Bayesian transfer learning framework to directly estimate population-level class probabilities in a target domain, using any baseline classifier trained on source-domain data and a small labeled target-domain dataset. To address small sample sizes, we introduce a novel shrinkage prior for the transfer error rates. This prior guarantees that, in the absence of any labeled target-domain data or when the baseline classifier has zero transfer error, the calibrated estimate of class probabilities coincides with the naive estimate from the baseline classifier, thereby subsuming the default practice as a special case. A novel Gibbs sampler using data augmentation enables fast implementation. We extend our approach to use not one but an ensemble of baseline classifiers. Theoretical and empirical results demonstrate how the ensemble model favors the most accurate baseline classifier. We present extensions allowing class probabilities to vary with covariates, and an EM-algorithm-based MAP estimation. An R package implementing this method has been developed.
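The calibration idea in the abstract can be illustrated with a minimal sketch. It is not the authors' hierarchical model, shrinkage prior, or Gibbs sampler. It is a simplified plug-in point estimate under stated assumptions: the transfer error matrix is estimated from labeled target-domain confusion counts with pseudo-counts that pull each row toward the identity, and the class probabilities are then fitted by a basic EM iteration on the classifier's predicted-class counts. The function names `shrunk_error_matrix` and `calibrate_class_probs`, and the shrinkage strength `eta`, are hypothetical and chosen only for illustration.

```python
import numpy as np


def shrunk_error_matrix(confusion_counts, eta=1.0):
    """Estimate M[i, j] = P(predicted class j | true class i).

    confusion_counts[i, j] counts labeled target-domain cases with true
    class i that the baseline classifier predicted as class j.
    Assumption: eta > 0 pseudo-counts on the diagonal shrink each row
    toward the identity (zero transfer error); rows with no labeled
    data become exactly the identity row.
    """
    T = np.asarray(confusion_counts, dtype=float)
    C = T.shape[0]
    return (T + eta * np.eye(C)) / (T.sum(axis=1, keepdims=True) + eta)


def calibrate_class_probs(pred_counts, M, n_iter=500):
    """EM point estimate of class probabilities p, where predicted-class
    counts are modeled as Multinomial(N, p @ M) with M held fixed."""
    v = np.asarray(pred_counts, dtype=float)
    N = v.sum()
    p = np.full(len(v), 1.0 / len(v))  # start from the uniform distribution
    for _ in range(n_iter):
        denom = p @ M  # implied probability of each predicted class
        # Guard against 0/0 for predicted classes with zero probability.
        ratio = np.divide(v, denom, out=np.zeros_like(v), where=denom > 0)
        p = p * (M @ ratio) / N
        p = p / p.sum()  # renormalize for numerical safety
    return p


# Usage: with no labeled target data, the estimate is the naive one.
pred_counts = [480, 300, 220]
M_identity = shrunk_error_matrix(np.zeros((3, 3)))
print(calibrate_class_probs(pred_counts, M_identity))
```

When the confusion counts are all zero, the shrunk matrix is the identity and the EM update returns the observed predicted-class fractions, which mirrors the property described in the abstract: without labeled target data, the calibrated estimate coincides with the naive estimate from the baseline classifier.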
