PATE-GAN: Generating Synthetic Data with Differential Privacy Guarantees

Open Access • Proceedings Article

- 27 Sep 2018

554

TL;DR: This paper investigates a method for ensuring (differential) privacy of the generator of the Generative Adversarial Nets (GAN) framework, and modifies the Private Aggregation of Teacher Ensembles (PATE) framework and applies it to GANs.

Abstract: Machine learning has the potential to assist many communities in using the large datasets that are becoming more and more available. Unfortunately, much of that potential is not being realized because it would require sharing data in a way that compromises privacy. In this paper, we investigate a method for ensuring (differential) privacy of the generator of the Generative Adversarial Nets (GAN) framework. The resulting model can be used for generating synthetic data on which algorithms can be trained and validated, and on which competitions can be conducted, without compromising the privacy of the original dataset. Our method modifies the Private Aggregation of Teacher Ensembles (PATE) framework and applies it to GANs. Our modified framework (which we call PATE-GAN) allows us to tightly bound the influence of any individual sample on the model, resulting in tight differential privacy guarantees and thus an improved performance over models with the same guarantees. We also look at measuring the quality of synthetic data from a new angle; we assert that for the synthetic data to be useful for machine learning researchers, the relative performance of two algorithms (trained and tested) on the synthetic dataset should be the same as their relative performance (when trained and tested) on the original dataset. Our experiments, on various datasets, demonstrate that PATE-GAN consistently outperforms the state-of-the-art method with respect to this and other notions of synthetic data quality.
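
The quality criterion in the abstract (two algorithms should rank the same way on synthetic data as on the original data) can be illustrated with a small pairwise check. The following is only a sketch under stated assumptions: the function name `ranking_agreement`, the algorithm names, and all scores are hypothetical, and the paper's exact metric definition is not given on this page.

```python
from itertools import combinations


def ranking_agreement(real_scores, synthetic_scores):
    """Fraction of algorithm pairs ordered the same way on both datasets.

    Both arguments map an algorithm name to a score (higher is better).
    Hypothetical helper for illustration; not the paper's own definition.
    """
    names = sorted(real_scores)
    pairs = list(combinations(names, 2))
    if not pairs:
        raise ValueError("need at least two algorithms to compare")

    agree = 0
    for a, b in pairs:
        real_diff = real_scores[a] - real_scores[b]
        synth_diff = synthetic_scores[a] - synthetic_scores[b]
        # A pair agrees when both differences have the same sign
        # (or both are exact ties).
        if real_diff * synth_diff > 0 or (real_diff == 0 and synth_diff == 0):
            agree += 1
    return agree / len(pairs)


# Made-up scores: each algorithm trained and tested on the original data,
# then trained and tested on the synthetic data.
real = {"logistic": 0.80, "forest": 0.85, "boosting": 0.88}
synthetic = {"logistic": 0.74, "forest": 0.72, "boosting": 0.79}

print(ranking_agreement(real, synthetic))
```

In this made-up example, two of the three pairs keep their order on the synthetic data, while the forest/logistic pair flips. A value of 1.0 would mean the synthetic data preserves every pairwise comparison.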

Citations

•Posted Content•10.22541/AU.158921777.79483839/V2

Synthetic Observational Health Data with GANs: from slow adoption to a boom in medical research and ultimately digital twins?

Jeremy Georges-Filteau, +1 more

- 16 Nov 2020

TL;DR: This paper reviews generative adversarial network (GAN) algorithms for Observational Health Data (OHD).

9

Journal Article•10.1109/access.2024.3354277

A Methodology and an Empirical Analysis to Determine the Most Suitable Synthetic Data Generator

A. Kiran, +1 more

- IEEE Access

TL;DR: After conducting experiments, analyzing metrics, and comparing ML scores across all 11 generators, it was determined that the CTGAN from SDV and PATECTGAN from the SN-synth package were the most effective in mimicking real data for all 13 datasets utilized in this research.

9

Journal Article•10.48550/arxiv.2312.05114

On the Inadequacy of Similarity-based Privacy Metrics: Reconstruction Attacks against "Truly Anonymous Synthetic Data"

Georgi Ganev, +1 more

- 08 Dec 2023

- arXiv.org

TL;DR: This work reviews the privacy metrics offered by leading companies in this space and sheds light on a few critical flaws in reasoning about privacy entirely via empirical evaluations and serves as a warning to practitioners not to deviate from established privacy-preserving mechanisms.

9

•Posted Content

Differentially Private Deep Learning with Smooth Sensitivity.

Lichao Sun, +3 more

- 01 Mar 2020

- arXiv: Learning

TL;DR: This work proposes a novel voting mechanism with smooth sensitivity, called Immutable Noisy ArgMax, that, under certain conditions, can bear very large random noise from the teacher without affecting the useful information transferred to the student.

9

Journal Article•10.3389/frai.2022.813842

Toward Sharing Brain Images: Differentially Private TOF-MRA Images With Segmentation Labels Using Generative Adversarial Networks

Tabea Kossen, +13 more

- 02 May 2022

- Frontiers in artificial intelligence

TL;DR: This study implemented a Wasserstein GAN with and without differential privacy guarantees to generate privacy-preserving labeled Time-of-Flight Magnetic Resonance Angiography (TOF-MRA) image patches for brain vessel segmentation to ensure the patient's privacy while maintaining the predictive properties of the data.

8

References

•Journal Article•10.1023/A:1010933404324

Random Forests

Leo Breiman

- 01 Oct 2001

TL;DR: Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the forest, and are also applicable to regression.

113.1K

•Journal Article•10.3156/JSOFT.29.5_177_2

Generative Adversarial Nets

Ian Goodfellow, +7 more

- 08 Dec 2014

TL;DR: A new framework for estimating generative models via an adversarial process, in which two models are simultaneously trained: a generative model G that captures the data distribution and a discriminative model D that estimates the probability that a sample came from the training data rather than G.

48.6K

•Journal Article•10.5555/944919.944937

Latent dirichlet allocation

David M. Blei, +2 more

- 01 Mar 2003

- Journal of Machine Learning Research

TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.

36.2K

•Journal Article•10.1214/AOS/1013203451

Greedy function approximation: A gradient boosting machine.

Jerome H. Friedman

- 01 Oct 2001

- Annals of Statistics

TL;DR: A general gradient descent boosting paradigm is developed for additive expansions based on any fitting criterion, and specific algorithms are presented for least-squares, least absolute deviation, and Huber-M loss functions for regression, and multiclass logistic likelihood for classification.

26.4K

•Proceedings Article

Latent Dirichlet Allocation

David M. Blei, +2 more

- 03 Jan 2001

TL;DR: This paper proposed a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).

25.5K

Related Papers (5)

Generative Adversarial Nets

The Algorithmic Foundations of Differential Privacy

Deep Learning with Differential Privacy

Calibrating noise to sensitivity in private data analysis

Membership Inference Attacks Against Machine Learning Models