Open Access • Proceedings Article
PATE-GAN: Generating Synthetic Data with Differential Privacy Guarantees
James Jordon, Jinsung Yoon, Mihaela van der Schaar
- 27 Sep 2018
TL;DR: This paper investigates a method for ensuring (differential) privacy of the generator in the Generative Adversarial Nets (GAN) framework by modifying the Private Aggregation of Teacher Ensembles (PATE) framework and applying it to GANs.
Abstract: Machine learning has the potential to assist many communities in using the large datasets that are becoming more and more available. Unfortunately, much of that potential is not being realized because it would require sharing data in a way that compromises privacy. In this paper, we investigate a method for ensuring (differential) privacy of the generator of the Generative Adversarial Nets (GAN) framework. The resulting model can be used for generating synthetic data on which algorithms can be trained and validated, and on which competitions can be conducted, without compromising the privacy of the original dataset. Our method modifies the Private Aggregation of Teacher Ensembles (PATE) framework and applies it to GANs. Our modified framework (which we call PATE-GAN) allows us to tightly bound the influence of any individual sample on the model, resulting in tight differential privacy guarantees and thus an improved performance over models with the same guarantees. We also look at measuring the quality of synthetic data from a new angle; we assert that for the synthetic data to be useful for machine learning researchers, the relative performance of two algorithms (trained and tested) on the synthetic dataset should be the same as their relative performance (when trained and tested) on the original dataset. Our experiments, on various datasets, demonstrate that PATE-GAN consistently outperforms the state-of-the-art method with respect to this and other notions of synthetic data quality.
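The quality notion in the abstract, that two algorithms should rank the same way on synthetic data as on the original data, can be illustrated with a small pairwise check. The sketch below is only one plausible reading of that idea: the function name `ranking_agreement`, the pairwise-sign formulation, and the example scores are assumptions for illustration, not the paper's exact metric or results.

```python
from itertools import combinations


def ranking_agreement(real_scores, synth_scores):
    """Fraction of algorithm pairs ordered the same way on both datasets.

    real_scores[i]  : score of algorithm i trained and tested on the original data
    synth_scores[i] : score of algorithm i trained and tested on the synthetic data
    (Hypothetical helper; not taken from the paper.)
    """
    if len(real_scores) != len(synth_scores):
        raise ValueError("score lists must have the same length")
    pairs = list(combinations(range(len(real_scores)), 2))
    if not pairs:
        raise ValueError("need at least two algorithms to compare")
    agree = 0
    for i, j in pairs:
        real_diff = real_scores[i] - real_scores[j]
        synth_diff = synth_scores[i] - synth_scores[j]
        # A pair agrees when both differences have the same sign
        # (a tie on both datasets also counts as agreement).
        same_positive = (real_diff > 0) == (synth_diff > 0)
        same_negative = (real_diff < 0) == (synth_diff < 0)
        if same_positive and same_negative:
            agree += 1
    return agree / len(pairs)


# Made-up scores for three algorithms, for illustration only.
real = [0.80, 0.75, 0.70]
synth = [0.78, 0.69, 0.72]
agreement = ranking_agreement(real, synth)
```

With these made-up scores, two of the three algorithm pairs keep their order on the synthetic data, so the agreement is 2/3. A value of 1.0 would mean every pairwise comparison made on the synthetic data matches the one made on the original data.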
Citations
Synthesizing Individual Consumers' Credit Historical Data Using Generative Adversarial Networks
TL;DR: This study is the first attempt to generate synthetic data from real-world credit data in practical use; it finds that synthetic consumer credit data generated with a GAN shows substantial utility without severely compromising privacy and would be a useful resource for big data training programs.
GRAIMATTER Green Paper: Recommendations for disclosure control of trained Machine Learning (ML) models from Trusted Research Environments (TREs)
Emily Jefferson, James Liley, Maeve E. Malone, Smarti Reel, Alba Crespi-Boixader, Xaroula Kerasidou, Francesco Tava, Andrew McCarthy, Richard J. Preen, Alberto Blanco-Justicia, Esma Mansouri Benssassi, Josep Domingo-Ferrer, Jillian Beggs, Antony Chuter, Chris Cole, Felix Ritchie, Angela Daly, Simon N. Rogers, Jim Smith
TL;DR: GRAIMATTER has developed a set of usable recommendations for TREs to guard against the additional risks of disclosing trained AI models from TREs; the recommendations were published at the end of the sprint research project in September 2022.
•Posted Content
Scaling up Differentially Private Deep Learning with Fast Per-Example Gradient Clipping
Jaewoo Lee, Daniel Kifer
TL;DR: Rényi Differential Privacy has shown the feasibility of applying differential privacy to deep learning tasks; by analyzing the back-propagation equations, this paper derives new methods for per-example gradient clipping that are compatible with auto-differentiation and provide better GPU utilization.
Differentially Private Generative Adversarial Networks with Model Inversion
Dongjie Chen, Susan Cheung, Chen-Nee Chuah, Sally Ozonoff
- 07 Dec 2021
TL;DR: Differentially private generative adversarial networks with model inversion outperform the standard DP-GAN method in terms of sample quality and network convergence.
Personalized Privacy-Preserving Framework for Cross-Silo Federated Learning
TL;DR: The authors propose a Personalized Privacy-Preserving Federated Learning (PPPFL) framework, with a focus on cross-silo FL, to overcome the challenges of non-independent and identically distributed (non-IID) data among clients.
References
Random Forests
Leo Breiman
- 01 Oct 2001
TL;DR: Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the forest, and are also applicable to regression.
Generative Adversarial Nets
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio
- 08 Dec 2014
TL;DR: A new framework for estimating generative models via an adversarial process, in which two models are simultaneously trained: a generative model G that captures the data distribution and a discriminative model D that estimates the probability that a sample came from the training data rather than G.
Latent Dirichlet Allocation
TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
Greedy function approximation: A gradient boosting machine.
TL;DR: A general gradient descent boosting paradigm is developed for additive expansions based on any fitting criterion, and specific algorithms are presented for least-squares, least absolute deviation, and Huber-M loss functions for regression, and multiclass logistic likelihood for classification.
•Proceedings Article
Latent Dirichlet Allocation
David M. Blei, Andrew Y. Ng, Michael I. Jordan
- 03 Jan 2001
TL;DR: This paper proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models, including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).