Journal Article, DOI: 10.48550/arXiv.2302.05552
Algorithmically Effective Differentially Private Synthetic Data
Yi He, Roman Vershynin, Yizhe Zhu +2 more
5
TL;DR: In this article, the authors present a highly effective algorithm for generating differentially private synthetic data in a bounded metric space, with near-optimal utility guarantees under the 1-Wasserstein distance.
Abstract: We present a highly effective algorithmic approach for generating $\varepsilon$-differentially private synthetic data in a bounded metric space with near-optimal utility guarantees under the 1-Wasserstein distance. In particular, for a dataset $X$ in the hypercube $[0,1]^d$, our algorithm generates a synthetic dataset $Y$ such that the expected 1-Wasserstein distance between the empirical measures of $X$ and $Y$ is $O((\varepsilon n)^{-1/d})$ for $d\geq 2$, and is $O(\log^2(\varepsilon n)(\varepsilon n)^{-1})$ for $d=1$. The accuracy guarantee is optimal up to a constant factor for $d\geq 2$, and up to a logarithmic factor for $d=1$. Our algorithm has a fast running time of $O(\varepsilon dn)$ for all $d\geq 1$ and demonstrates improved accuracy compared to the method in (Boedihardjo et al., 2022) for $d\geq 2$.
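To make the utility metric concrete, the following is a minimal sketch for the case $d=1$. It is not the algorithm of the paper: it uses the standard Laplace-perturbed histogram as a simple $\varepsilon$-differentially private baseline, then measures the 1-Wasserstein distance between the original and synthetic samples. The function names, the bin count, and the choice of replace-one neighboring datasets (L1 sensitivity 2 for the histogram) are assumptions made for illustration only.

```python
import random


def laplace(scale, rng):
    # The difference of two i.i.d. Exp(1) variables, times scale, is Laplace(0, scale).
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def private_histogram_synthetic(data, epsilon, bins, m, rng):
    """Baseline sketch (NOT the paper's method): noisy histogram on [0, 1].

    Assumes every value in data lies in [0, 1] and that neighboring datasets
    differ by replacing one record, so the histogram has L1 sensitivity 2.
    """
    counts = [0] * bins
    for x in data:
        counts[min(int(x * bins), bins - 1)] += 1
    # Laplace mechanism; clipping at zero is post-processing and keeps epsilon-DP.
    noisy = [max(c + laplace(2.0 / epsilon, rng), 0.0) for c in counts]
    if sum(noisy) == 0.0:
        noisy = [1.0] * bins  # fall back to uniform weights
    # Sample m synthetic points: pick a bin by noisy weight, then a uniform offset.
    chosen = rng.choices(range(bins), weights=noisy, k=m)
    return [(i + rng.random()) / bins for i in chosen]


def wasserstein_1d(xs, ys):
    """1-Wasserstein distance between two empirical measures on the real line.

    Uses the one-dimensional identity W1 = integral of |F_X(t) - F_Y(t)| dt.
    """
    points = sorted([(x, 0) for x in xs] + [(y, 1) for y in ys])
    fx = fy = total = 0.0
    prev = points[0][0]
    for value, source in points:
        total += abs(fx - fy) * (value - prev)
        prev = value
        if source == 0:
            fx += 1.0 / len(xs)
        else:
            fy += 1.0 / len(ys)
    return total


rng = random.Random(0)
X = [rng.betavariate(2.0, 5.0) for _ in range(2000)]  # toy dataset in [0, 1]
Y = private_histogram_synthetic(X, epsilon=1.0, bins=50, m=2000, rng=rng)
w1 = wasserstein_1d(X, Y)  # utility of the synthetic data under W1
```

Repeating the last three lines for growing sample sizes gives an empirical feel for how the distance shrinks with $\varepsilon n$, although this baseline carries no guarantee of matching the rates stated in the abstract.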
Citations
Sample-efficient private data release for Lipschitz functions under sparsity assumptions
TL;DR: In this paper, the authors present a differentially private data release algorithm that achieves optimal rates of order $n^{-1/d}$, where $n$ is the size of the dataset and $d$ is the dimension, for the worst-case error over all Lipschitz continuous statistics.
4
Differentially private low-dimensional representation of high-dimensional data
TL;DR: In this article, a differentially private algorithm is proposed to efficiently generate low-dimensional synthetic data from a high-dimensional dataset, with a utility guarantee with respect to the Wasserstein distance.
Sample-efficient private data release for Lipschitz functions under sparsity assumptions
19 Feb 2023
TL;DR: In this article, the authors present a differentially private data release algorithm that achieves optimal rates of order $n^{-1/d}$, where $n$ is the size of the dataset and $d$ is the dimension, for the worst-case error over all Lipschitz continuous statistics.
Stability, Generalization and Privacy: Precise Analysis for Random and NTK Features
Simone Bombari, Marco Mondelli +1 more
TL;DR: In this article, the authors study the safety of ERM-trained models against a family of powerful black-box attacks and quantify this safety via two separate terms: (i) the model stability with respect to individual training samples, and (ii) the feature alignment between the attacker query and the original data.
Online Differentially Private Synthetic Data Generation
TL;DR: An online algorithm is developed that generates a differentially private synthetic dataset at each time $t$ and achieves a near-optimal accuracy bound of $O(t^{-1/d}\log(t))$ for $d\geq 2$ and $O(t^{-1}\log^{4.5}(t))$ for $d=1$ in the 1-Wasserstein distance.
References
•Book
Optimal Transport: Old and New
Cédric Villani
- 02 Jan 2013
TL;DR: In this paper, the authors provide a detailed description of the basic properties of optimal transport, including cyclical monotonicity and Kantorovich duality, and three examples of coupling techniques.
7.4K
•Book
The Algorithmic Foundations of Differential Privacy
Cynthia Dwork, Aaron Roth +1 more
- 11 Aug 2014
TL;DR: The preponderance of this monograph is devoted to fundamental techniques for achieving differential privacy, and application of these techniques in creative combinations, using the query-release problem as an ongoing example.
Deep Learning with Differential Privacy
TL;DR: This work develops new algorithmic techniques for learning and a refined analysis of privacy costs within the framework of differential privacy, and demonstrates that deep neural networks can be trained with non-convex objectives, under a modest privacy budget, and at a manageable cost in software complexity, training efficiency, and model quality.
4.3K
Rademacher and Gaussian complexities: risk bounds and structural results
Peter L. Bartlett, Shahar Mendelson +1 more
- 01 Mar 2003
TL;DR: In this paper, the authors investigate the use of data-dependent estimates of the complexity of a function class, called Rademacher and Gaussian complexities, in a decision theoretic setting and prove general risk bounds in terms of these complexities.
High-Dimensional Probability: An Introduction with Applications in Data Science
TL;DR: In this article, a random projection of a set $T$ in $\mathbb{R}^n$ onto an $m$-dimensional subspace is shown to preserve the geometry of $T$ if $m \gtrsim d(T)$.
1.8K