Bayesian Pseudocoresets

Open Access•Proceedings Article

- 01 Jan 2020

Vol. 33, pp 14950-14960

25

TL;DR: Real and synthetic experiments on high-dimensional data demonstrate that Bayesian pseudocoresets achieve significant improvements in posterior approximation error compared to traditional coresets, and that pseudodata provide privacy without a significant loss in approximation quality.

Abstract: Standard Bayesian inference algorithms are prohibitively expensive in the regime of modern large-scale data. Recent work has found that a small, weighted subset of data (a coreset) may be used in place of the full dataset during inference, taking advantage of data redundancy to reduce computational cost. However, this approach has limitations in the increasingly common setting of sensitive, high-dimensional data. Indeed, we prove that there are situations in which the Kullback-Leibler (KL) divergence between the optimal coreset and the true posterior grows with data dimension; and as coresets include a subset of the original data, they cannot be constructed in a manner that preserves individual privacy. We address both of these issues with a single unified solution, Bayesian pseudocoresets—a small weighted collection of synthetic “pseudodata”—along with a variational optimization method to select both pseudodata and weights. The use of pseudodata (as opposed to the original datapoints) enables both the summarization of high-dimensional data and the differentially private summarization of sensitive data. Real and synthetic experiments on high-dimensional data demonstrate that Bayesian pseudocoresets achieve significant improvements in posterior approximation error compared to traditional coresets, and that pseudocoresets provide privacy without a significant loss in approximation quality.
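The contrast the abstract draws between a coreset (weighted real datapoints) and a pseudocoreset (weighted synthetic points) can be illustrated with a toy conjugate model. The sketch below is not the paper's variational algorithm, and the single-point coreset is not the optimal coreset. It assumes a model x ~ N(theta, I) with prior theta ~ N(0, I), where the weighted posterior has a closed form. The function names `gaussian_posterior` and `kl_isotropic` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np

def gaussian_posterior(points, weights, prior_var=1.0):
    """Posterior over theta for x ~ N(theta, I), theta ~ N(0, prior_var * I),
    given weighted points. Returns (mean vector, scalar variance)."""
    points = np.asarray(points, dtype=float)
    weights = np.asarray(weights, dtype=float)
    precision = 1.0 / prior_var + weights.sum()
    mean = (weights[:, None] * points).sum(axis=0) / precision
    return mean, 1.0 / precision

def kl_isotropic(mean_q, var_q, mean_p, var_p):
    """KL(q || p) between isotropic Gaussians N(mean, var * I)."""
    d = mean_q.shape[0]
    ratio = var_q / var_p
    sq_dist = np.sum((mean_p - mean_q) ** 2)
    return 0.5 * (d * (ratio - 1.0 - np.log(ratio)) + sq_dist / var_p)

# Assumed toy data: N points in d dimensions (values chosen for illustration).
rng = np.random.default_rng(0)
N, d = 1000, 50
x = rng.normal(loc=1.0, size=(N, d))

# Full-data posterior: every datapoint has weight 1.
full = gaussian_posterior(x, np.ones(N))

# Size-1 coreset: one real datapoint, weighted to stand in for all N.
coreset = gaussian_posterior(x[:1], np.array([float(N)]))

# Size-1 pseudocoreset: one synthetic point (the data mean), same weight.
pseudo = gaussian_posterior(x.mean(axis=0, keepdims=True), np.array([float(N)]))

kl_coreset = kl_isotropic(*coreset, *full)
kl_pseudo = kl_isotropic(*pseudo, *full)
print(f"KL(coreset || full) = {kl_coreset:.3g}")
print(f"KL(pseudo  || full) = {kl_pseudo:.3g}")
```

In this toy setting a single synthetic point placed at the data mean reproduces the full posterior up to floating-point error, while a single real datapoint generally does not. The paper's method instead learns pseudodata and weights by variational optimization for general models.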

Citations

Proceedings Article•10.1145/3514221.3517836

Camel: Managing Data for Efficient Stream Learning

Yiming Li, +2 more

- 10 Jun 2022

TL;DR: Camel is a system for stream learning that addresses high training cost, low data effectiveness, and catastrophic forgetting, and can outperform state-of-the-art anti-forgetting methods on various data distributions.

18

•Journal Article•10.3390/ELECTRONICS10202482

Assembly of a Coreset of Earth Observation Images on a Small Quantum Computer

Soronzonbold Otgonbaatar, +1 more

- 12 Oct 2021

- Electronics

TL;DR: In this paper, a support vector machine (SVM) was trained on a D-Wave quantum annealer (D-Wave QA) and a conventional computer for Earth observation (EO) data.

15

Proceedings Article•10.48550/arXiv.2203.05723

Bayesian inference via sparse Hamiltonian flows

Na Chen, +2 more

- 11 Mar 2022

TL;DR: Real and synthetic experiments demonstrate that sparse Hamiltonian flows provide accurate posterior approximations with significantly reduced runtime compared with competing dynamical-system-based inference methods.

10

•Proceedings Article•10.1109/igarss46834.2022.9884273

Coreset of Hyperspectral Images on a Small Quantum Computer

- 17 Jul 2022

TL;DR: In this article, a support vector machine (SVM) is trained on a coreset of the given Earth observation (EO) data using a small D-Wave quantum annealer (D-Wave QA), which the authors report can solve the underlying quadratic programming (QP) problem more efficiently than a conventional computer.

7

Proceedings Article•10.48550/arXiv.2203.09675

Fast Bayesian Coresets via Subsampling and Quasi-Newton Refinement

Cian Naik, +2 more

- 18 Mar 2022

TL;DR: This paper proposes a Bayesian coreset construction algorithm that first selects a uniformly random subset of data and then optimizes the weights on those data points using a novel quasi-Newton method. The algorithm is simple to implement, black-box, and does not require the user to specify a low-cost posterior approximation.

7

References

•Book Chapter•10.1007/11681878_14

Calibrating noise to sensitivity in private data analysis

Cynthia Dwork, +3 more

- 04 Mar 2006

TL;DR: In this article, the authors show that for several particular applications substantially less noise is needed than was previously understood to be the case, and also present separation results showing the increased value of interactive sanitization mechanisms over non-interactive ones.

8.9K

•Book

The Algorithmic Foundations of Differential Privacy

Cynthia Dwork, +1 more

- 11 Aug 2014

TL;DR: The preponderance of this monograph is devoted to fundamental techniques for achieving differential privacy, and application of these techniques in creative combinations, using the query-release problem as an ongoing example.

7.2K

•Proceedings Article•10.1145/2976749.2978318

Deep Learning with Differential Privacy

Martín Abadi, +6 more

- 24 Oct 2016

TL;DR: In this paper, the authors develop new algorithmic techniques for learning and a refined analysis of privacy costs within the framework of differential privacy, and demonstrate that they can train deep neural networks with nonconvex objectives, under a modest privacy budget, and at a manageable cost in software complexity, training efficiency, and model quality.

4.6K

•Book•10.1201/B10905

Handbook of Markov Chain Monte Carlo

Steve Brooks, +3 more

- 10 May 2011

TL;DR: A Markov chain Monte Carlo based analysis of a multilevel model for functional MRI data and its applications in environmental epidemiology, educational research, and fisheries science are studied.

3.6K

•Journal Article

Calibrating noise to sensitivity in private data analysis

Cynthia Dwork, +3 more

- 01 Jan 2006

- Lecture Notes in Computer Science

TL;DR: The study is extended to general functions f, proving that privacy can be preserved by calibrating the standard deviation of the noise according to the sensitivity of the function f, which is the amount that any single argument to f can change its output.

3.6K