DataLens: Scalable Privacy Preserving Training via Gradient Compression and Aggregation

doi:10.1145/3460120.3484579

Open Access • Proceedings Article

Boxin Wang, +5 more

- 12 Nov 2021

- pp 2146-2168

30

TL;DR: This paper proposes DataLens, a scalable privacy-preserving generative model that generates synthetic data in a differentially private (DP) way from sensitive input data.

Abstract: The recent success of deep neural networks (DNNs) hinges on the availability of large-scale datasets; however, training on such datasets often poses privacy risks for sensitive training information. In this paper, we explore the power of generative models and gradient sparsity and propose DataLens, a scalable privacy-preserving generative model that generates synthetic data in a differentially private (DP) way given sensitive input data. Models for different downstream tasks can then be trained on the generated data while the private information stays protected. In particular, we leverage generative adversarial networks (GANs) and the PATE framework to train multiple discriminators as "teacher" models, allowing them to vote with their gradient vectors to guarantee privacy. Compared with the standard PATE privacy-preserving framework, which allows teachers to vote on one-dimensional predictions, voting on high-dimensional gradient vectors is challenging in terms of privacy preservation. Because dimension reduction techniques are required, we need to navigate a delicate tradeoff between (1) the improvement of privacy preservation and (2) the slowdown of SGD convergence. To tackle this, we propose TopAgg, a novel dimension compression and aggregation approach that combines top-k dimension compression with a corresponding noise injection mechanism. We theoretically prove that the DataLens framework guarantees differential privacy for its generated data, and we provide a novel convergence analysis that illustrates this tradeoff between privacy and convergence rate; the analysis is non-trivial because it must jointly account for gradient compression, coordinate-wise gradient clipping, and the DP mechanism. To demonstrate the practical usage of DataLens, we conduct extensive experiments on diverse datasets including MNIST, Fashion-MNIST, and the high-dimensional CelebA and Places365 datasets. We show that DataLens significantly outperforms other baseline differentially private data generative models. Our code is publicly available at https://github.com/AI-secure/DataLens.
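
The idea of teachers voting with compressed gradient vectors, followed by noise injection, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (the linked repository is the reference for that). The function names `top_k_sign` and `noisy_aggregate`, the choice to reduce each kept coordinate to its sign, the Gaussian noise, and the parameters `k` and `sigma` are all assumptions made for illustration.

```python
import numpy as np


def top_k_sign(grad, k):
    """Keep the k largest-magnitude coordinates of grad, reduced to their signs.

    Assumption: reducing kept entries to +/-1 stands in for whatever
    clipping rule the paper uses; it bounds every coordinate to [-1, 1].
    """
    idx = np.argpartition(np.abs(grad), -k)[-k:]  # indices of the top-k magnitudes
    vote = np.zeros_like(grad, dtype=float)
    vote[idx] = np.sign(grad[idx])
    return vote


def noisy_aggregate(teacher_grads, k, sigma, rng):
    """Sum the teachers' sparse sign votes, add Gaussian noise, and average.

    Each vote has at most k non-zero entries of magnitude 1, so its
    L2 norm is at most sqrt(k), regardless of the gradient dimension.
    """
    votes = np.stack([top_k_sign(g, k) for g in teacher_grads])
    total = votes.sum(axis=0)
    noisy = total + rng.normal(0.0, sigma, size=total.shape)  # hypothetical noise scale
    return noisy / len(teacher_grads)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Four hypothetical teacher discriminators, each with a 10-dimensional gradient.
    grads = [rng.standard_normal(10) for _ in range(4)]
    print(noisy_aggregate(grads, k=3, sigma=1.0, rng=rng))
```

In this sketch, a smaller `k` lowers the norm of each teacher's vote, so less noise is needed for a given privacy level, but more gradient information is discarded. That is one way to picture the tradeoff between privacy preservation and SGD convergence that the abstract describes.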

Citations

•Journal Article•10.56553/popets-2023-0050

Distributed GAN-Based Privacy-Preserving Publication of Vertically-Partitioned Data

Xue Jiang, +3 more

- 01 Apr 2023

- Proceedings on Privacy Enhancing Technol...

TL;DR: The authors propose a framework based on a generative adversarial network (GAN) for publishing vertically partitioned data with privacy protection; it uses a GAN composed of one multi-output global generator and multiple local discriminators.

3

Journal Article•10.48550/arXiv.2210.00665

β-Stochastic Sign SGD: A Byzantine Resilient and Differentially Private Gradient Compressor for Federated Learning

Ming Xiang, +1 more

- arXiv.org

2

Journal Article•10.48550/arXiv.2209.10732

In Differential Privacy, There is Truth: On Vote Leakage in Ensemble Private Learning

Jiaqi Wang, +4 more

- 22 Sep 2022

- arXiv.org

TL;DR: This work observes that the noise PATE adds to its votes, which makes PATE predictions stochastic, enables new forms of leakage of sensitive information, and it encourages future work to consider privacy holistically rather than treat differential privacy as a panacea.

2

Journal Article•10.1109/tmc.2023.3268323

Energy Efficient and Differentially Private Federated Learning via a Piggyback Approach

Rui-tian Chen, +5 more

- 01 Jan 2023

- IEEE Transactions on Mobile Computing

TL;DR: This paper proposes a differentially private federated learning (FL) scheme that adds as little artificial noise as possible while minimizing the energy consumption of participating mobile devices. Gradient compression techniques (gradient quantization and sparsification) and additive white Gaussian noise (AWGN) in wireless channels are jointly leveraged to develop a piggyback DP approach for FL over mobile devices.

2

Journal Article•10.48550/arXiv.2209.04022

Privacy of Autonomous Vehicles: Risks, Protection Methods, and Future Directions

Chulin Xie, +5 more

- 08 Sep 2022

- arXiv.org

TL;DR: A new taxonomy of privacy risks and protection methods in autonomous vehicles (AVs) is provided, and privacy in AVs is categorized into three levels: individual, population, and proprietary.

2
