Open Access • Proceedings Article
Bayesian Pseudocoresets
Dionysis Manousakas, Zuheng Xu, Cecilia Mascolo, Trevor Campbell
- 01 Jan 2020
Vol. 33, pp 14950-14960
TL;DR: Real and synthetic experiments on high-dimensional data demonstrate that Bayesian pseudocoresets achieve significant improvements in posterior approximation error compared to traditional coresets, and that pseudodata provide privacy without a significant loss in approximation quality.
Abstract: Standard Bayesian inference algorithms are prohibitively expensive in the regime of modern large-scale data. Recent work has found that a small, weighted subset of data (a coreset) may be used in place of the full dataset during inference, taking advantage of data redundancy to reduce computational cost. However, this approach has limitations in the increasingly common setting of sensitive, high-dimensional data. Indeed, we prove that there are situations in which the Kullback-Leibler (KL) divergence between the optimal coreset and the true posterior grows with data dimension; and as coresets include a subset of the original data, they cannot be constructed in a manner that preserves individual privacy. We address both of these issues with a single unified solution, Bayesian pseudocoresets—a small weighted collection of synthetic “pseudodata”—along with a variational optimization method to select both pseudodata and weights. The use of pseudodata (as opposed to the original datapoints) enables both the summarization of high-dimensional data and the differentially private summarization of sensitive data. Real and synthetic experiments on high-dimensional data demonstrate that Bayesian pseudocoresets achieve significant improvements in posterior approximation error compared to traditional coresets, and that pseudocoresets provide privacy without a significant loss in approximation quality.
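To make the idea of a weighted collection of pseudodata concrete, here is a minimal sketch on a toy conjugate Gaussian model. It is not the variational optimization method from the paper. It assumes a prior theta ~ N(0, I) and likelihood x_i | theta ~ N(theta, I), so the weighted posterior is available in closed form. The function names (`gaussian_posterior`, `kl_isotropic`) and all parameter choices are hypothetical and chosen only for illustration. The sketch compares a single synthetic pseudopoint placed at the data mean against a deliberately crude coreset consisting of one real datapoint, both with weight N.

```python
import numpy as np


def gaussian_posterior(points, weights, prior_var=1.0, noise_var=1.0):
    """Posterior N(mean, var * I) for theta, assuming theta ~ N(0, prior_var * I)
    and x_i | theta ~ N(theta, noise_var * I), with the i-th log-likelihood
    term multiplied by weights[i]."""
    points = np.atleast_2d(points)
    weights = np.asarray(weights, dtype=float)
    precision = 1.0 / prior_var + weights.sum() / noise_var
    mean = (weights @ points) / noise_var / precision
    return mean, 1.0 / precision


def kl_isotropic(mean_q, var_q, mean_p, var_p):
    """KL( N(mean_q, var_q * I) || N(mean_p, var_p * I) )."""
    d = mean_q.shape[0]
    diff = mean_q - mean_p
    return 0.5 * (d * var_q / var_p + diff @ diff / var_p - d
                  + d * np.log(var_p / var_q))


rng = np.random.default_rng(0)
N, d = 1000, 50  # illustrative sizes, not taken from the paper
data = rng.normal(loc=1.0, size=(N, d))

# Exact posterior: every datapoint with weight 1.
full_mean, full_var = gaussian_posterior(data, np.ones(N))

# Pseudocoreset of size 1: a synthetic point at the data mean, weight N.
pseudo_mean, pseudo_var = gaussian_posterior(
    data.mean(axis=0, keepdims=True), [N])

# Crude coreset of size 1: the first real datapoint, weight N.
core_mean, core_var = gaussian_posterior(data[:1], [N])

kl_pseudo = kl_isotropic(pseudo_mean, pseudo_var, full_mean, full_var)
kl_core = kl_isotropic(core_mean, core_var, full_mean, full_var)
print(f"KL, one pseudopoint: {kl_pseudo:.3e}")
print(f"KL, one real point:  {kl_core:.3e}")
```

In this toy model the single pseudopoint reproduces the full-data posterior up to floating-point error, because the pseudopoint is free to sit at the data mean. A single real datapoint generally lies far from that mean in high dimension, so its posterior is a poor approximation. The pseudopoint here is also an average of many datapoints rather than a copy of any one of them, which loosely reflects the privacy motivation in the abstract. The sketch itself offers no differential privacy guarantee.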