DataLens: Scalable Privacy Preserving Training via Gradient Compression and Aggregation
Boxin Wang, Fan Wu, Yunhui Long, Luka Rimanic, Ce Zhang, Bo Li
- 12 Nov 2021
- pp 2146-2168
TL;DR: This paper proposes DataLens, a scalable privacy-preserving generative model that generates synthetic data in a differentially private (DP) way given sensitive input data.
Abstract: The recent success of deep neural networks (DNNs) hinges on the availability of large-scale datasets; however, training on such datasets often poses privacy risks for sensitive training information. In this paper, we explore the power of generative models and gradient sparsity, and propose DataLens, a scalable privacy-preserving generative model that can generate synthetic data in a differentially private (DP) way given sensitive input data. Models for different downstream tasks can then be trained on the generated data while the private information stays protected. In particular, we leverage generative adversarial networks (GANs) and the PATE framework to train multiple discriminators as "teacher" models, allowing them to vote with their gradient vectors to guarantee privacy. Compared with the standard PATE privacy-preserving framework, which allows teachers to vote on one-dimensional predictions, voting on high-dimensional gradient vectors is challenging in terms of privacy preservation. Because dimension reduction techniques are required, we need to navigate a delicate tradeoff between (1) the improvement of privacy preservation and (2) the slowdown of SGD convergence. To tackle this, we propose TopAgg, a novel dimension compression and aggregation approach that combines top-k dimension compression with a corresponding noise injection mechanism. We theoretically prove that the DataLens framework guarantees differential privacy for its generated data, and provide a novel analysis of its convergence to illustrate this tradeoff between privacy and convergence rate; the analysis is non-trivial because it must jointly account for gradient compression, coordinate-wise gradient clipping, and the DP mechanism. To demonstrate the practical usage of DataLens, we conduct extensive experiments on diverse datasets including MNIST, Fashion-MNIST, and the high-dimensional CelebA and Place365 datasets. We show that DataLens significantly outperforms other baseline differentially private data generative models. Our code is publicly available at https://github.com/AI-secure/DataLens.
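The aggregation idea in the abstract (teachers voting with compressed gradient vectors, followed by noise injection) can be illustrated with a minimal sketch. This is not the authors' TopAgg implementation: the function names `top_k_sign` and `noisy_aggregate`, the sign quantization of the kept coordinates, the Gaussian noise, and the parameters `k`, `sigma`, and `clip` are all assumptions made for illustration. The abstract only states that top-k compression, coordinate-wise clipping, and a noise mechanism are combined.

```python
import numpy as np


def top_k_sign(grad, k):
    """Keep the k largest-magnitude coordinates and replace each by its sign.

    Sign quantization is an assumption of this sketch; the abstract only
    mentions top-k dimension compression.
    """
    compressed = np.zeros_like(grad, dtype=float)
    # Indices of the k entries with the largest absolute value.
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    compressed[idx] = np.sign(grad[idx])
    return compressed


def noisy_aggregate(teacher_grads, k, sigma, clip=1.0, rng=None):
    """Aggregate per-teacher gradients as noisy votes (hypothetical helper).

    Each teacher's gradient is clipped coordinate-wise, compressed to a
    sparse vote vector, summed with the other votes, perturbed with
    Gaussian noise (an assumed mechanism), and averaged.
    """
    rng = np.random.default_rng() if rng is None else rng
    votes = np.stack(
        [top_k_sign(np.clip(g, -clip, clip), k) for g in teacher_grads]
    )
    total = votes.sum(axis=0)
    noisy = total + rng.normal(0.0, sigma, size=total.shape)
    return noisy / len(teacher_grads)


if __name__ == "__main__":
    grads = [
        np.array([0.9, -0.1, 0.05, -2.0]),
        np.array([1.5, 0.2, -0.3, -0.7]),
    ]
    # With sigma > 0 the output is random; a fixed seed keeps runs repeatable.
    print(noisy_aggregate(grads, k=2, sigma=0.5, rng=np.random.default_rng(0)))
```

In this sketch, a smaller `k` means each teacher contributes fewer nonzero votes, which is one way to picture the tradeoff the abstract describes: less information per step for the generator, but a smaller quantity that the noise has to mask.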
Citations
Distributed GAN-Based Privacy-Preserving Publication of Vertically-Partitioned Data
TL;DR: In this paper, the authors propose a framework based on a generative adversarial network (GAN) for publishing vertically-partitioned data with privacy protection; it adopts a GAN model comprising one multi-output global generator and multiple local discriminators.
In Differential Privacy, There is Truth: On Vote Leakage in Ensemble Private Learning
TL;DR: This work observes that this use of noise, which makes PATE predictions stochastic, enables new forms of leakage of sensitive information, and encourages future work to consider privacy holistically rather than treat differential privacy as a panacea.
Energy Efficient and Differentially Private Federated Learning via a Piggyback Approach
TL;DR: In this paper, a differentially private federated learning (FL) scheme is proposed that adds the least artificial noise while minimizing the energy consumption of participating mobile devices; gradient compression techniques (i.e., gradient quantization and sparsification) and additive white Gaussian noise (AWGN) in wireless channels are jointly leveraged to develop a piggyback DP approach for FL over mobile devices.
Privacy of Autonomous Vehicles: Risks, Protection Methods, and Future Directions
TL;DR: A new taxonomy for privacy risks and protection methods in AVs is provided, and privacy in AVs is categorized into three levels: individual, population, and proprietary.
References
Deep Residual Learning for Image Recognition
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
- 27 Jun 2016
TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.
Generative Adversarial Nets
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio
- 08 Dec 2014
TL;DR: A new framework for estimating generative models via an adversarial process, in which two models are simultaneously trained: a generative model G that captures the data distribution and a discriminative model D that estimates the probability that a sample came from the training data rather than G.
Deep Learning Face Attributes in the Wild
Ziwei Liu, Ping Luo, Xiaogang Wang, Xiaoou Tang
- 07 Dec 2015
TL;DR: A novel deep learning framework for attribute prediction in the wild that cascades two CNNs, LNet and ANet, which are fine-tuned jointly with attribute tags, but pre-trained differently.
Deep Learning with Differential Privacy
Martín Abadi, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, Li Zhang
- 24 Oct 2016
TL;DR: In this paper, the authors develop new algorithmic techniques for learning and a refined analysis of privacy costs within the framework of differential privacy, and demonstrate that they can train deep neural networks with nonconvex objectives, under a modest privacy budget, and at a manageable cost in software complexity, training efficiency, and model quality.