Journal Article10.48550/arXiv.2209.00617
Fair mapping
TL;DR: A novel pre-processing method based on the transformation of the distribution of protected groups onto a chosen target one, with additional privacy constraints whose objective is to prevent the inference of sensitive attributes is proposed, which preserves the interpretability of data and can be used without defining exactly the sensitive groups.
read more
Abstract: To mitigate the effects of undesired biases in models, several approaches propose to pre-process the input dataset to reduce the risks of discrimination by preventing the inference of sensitive attributes. Unfortunately, most of these pre-processing methods lead to the generation a new distribution that is very different from the original one, thus often leading to unrealistic data. As a side effect, this new data distribution implies that existing models need to be re-trained to be able to make accurate predictions. To address this issue, we propose a novel pre-processing method, that we coin as fair mapping, based on the transformation of the distribution of protected groups onto a chosen target one, with additional privacy constraints whose objective is to prevent the inference of sensitive attributes. More precisely, we leverage on the recent works of the Wasserstein GAN and AttGAN frameworks to achieve the optimal transport of data points coupled with a discriminator enforcing the protection against attribute inference. Our proposed approach, preserves the interpretability of data and can be used without defining exactly the sensitive groups. In addition, our approach can be specialized to model existing state-of-the-art approaches, thus proposing a unifying view on these methods. Finally, several experiments on real and synthetic datasets demonstrate that our approach is able to hide the sensitive attributes, while limiting the distortion of the data and improving the fairness on subsequent data analysis tasks.
read more
Chat with Paper
AI Agents for this Paper
Find similar papers on Google Scholar, PubMed and Arxiv
Write a critical review of this paper
Analyze citations of this paper to find unaddressed research gaps
References
Generative Adversarial Nets
Ian Goodfellow,Jean Pouget-Abadie,Mehdi Mirza,Bing Xu,David Warde-Farley,Sherjil Ozair,Aaron Courville,Yoshua Bengio +7 more
- 08 Dec 2014
TL;DR: A new framework for estimating generative models via an adversarial process, in which two models are simultaneously train: a generative model G that captures the data distribution and a discriminative model D that estimates the probability that a sample came from the training data rather than G.
•Proceedings Article
Wasserstein Generative Adversarial Networks
Martin Arjovsky,Soumith Chintala,Léon Bottou +2 more
- 17 Jul 2017
TL;DR: This work introduces a new algorithm named WGAN, an alternative to traditional GAN training that can improve the stability of learning, get rid of problems like mode collapse, and provide meaningful learning curves useful for debugging and hyperparameter searches.
•Proceedings Article
Sinkhorn Distances: Lightspeed Computation of Optimal Transport
Marco Cuturi
- 05 Dec 2013
TL;DR: This work smooths the classic optimal transport problem with an entropic regularization term, and shows that the resulting optimum is also a distance which can be computed through Sinkhorn's matrix scaling algorithm at a speed that is several orders of magnitude faster than that of transport solvers.
•Proceedings Article
Equality of opportunity in supervised learning
Moritz Hardt,Eric Price,Nathan Srebro +2 more
- 05 Dec 2016
TL;DR: This work proposes a criterion for discrimination against a specified sensitive attribute in supervised learning, where the goal is to predict some target based on available features and shows how to optimally adjust any learned predictor so as to remove discrimination according to this definition.
t-Closeness: Privacy Beyond k-Anonymity and l-Diversity
Ninghui Li,Tiancheng Li,Suresh Venkatasubramanian +2 more
- 15 Apr 2007
TL;DR: T-closeness as mentioned in this paper requires that the distribution of a sensitive attribute in any equivalence class is close to the distributions of the attribute in the overall table (i.e., the distance between the two distributions should be no more than a threshold t).