Open Access Proceedings Article
Fair Algorithms for Clustering
Suman K. Bera, Deeparnab Chakrabarty, Nicolas J. Flores, Maryam Negahbani
- 08 Jan 2019
Vol. 32, pp 4954-4965
TL;DR: This work significantly generalizes the seminal work of Chierichetti et al. and transforms any vanilla clustering solution into a fair one, incurring only a slight loss in quality.
Abstract: We study the problem of finding low-cost fair clusterings in data where each data point may belong to many protected groups. Our work significantly generalizes the seminal work of Chierichetti et al. (NIPS 2017) as follows.
- We allow the user to specify the parameters that define fair representation. More precisely, these parameters define the maximum over- and minimum under-representation of any group in any cluster.
- Our clustering algorithm works on any $\ell_p$-norm objective (e.g., $k$-means, $k$-median, and $k$-center). Indeed, our algorithm transforms any vanilla clustering solution into a fair one, incurring only a slight loss in quality.
- Our algorithm also allows individuals to lie in multiple protected groups. In other words, we do not need the protected groups to partition the data, and we can maintain fairness across different groups simultaneously.
Our experiments show that on established data sets, our algorithm performs much better in practice than what our theoretical results suggest.
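To make the abstract's notion of fair representation concrete, the following is a minimal sketch, not the authors' algorithm. It only checks whether a given clustering respects per-group upper and lower representation bounds, and evaluates an $\ell_p$-style cost. The names `alpha`, `beta`, `is_fair`, and `lp_cost` are illustrative assumptions, as is the use of Euclidean distance and the convention that $p=1$, $p=2$, and $p=\infty$ stand in for $k$-median, $k$-means, and $k$-center.

```python
import math
from collections import defaultdict


def is_fair(assignment, groups, alpha, beta):
    """Check representation bounds in every cluster (illustrative helper).

    assignment: cluster id for each point.
    groups: for each point, the set of protected groups it belongs to
            (sets may overlap, so groups need not partition the data).
    alpha[g]: assumed maximum fraction of group g allowed in any cluster.
    beta[g]:  assumed minimum fraction of group g required in any cluster.
    """
    sizes = defaultdict(int)    # cluster id -> number of points
    counts = defaultdict(int)   # (cluster id, group) -> number of members
    for cluster, point_groups in zip(assignment, groups):
        sizes[cluster] += 1
        for g in point_groups:
            counts[(cluster, g)] += 1

    for cluster, size in sizes.items():
        for g in alpha:
            fraction = counts[(cluster, g)] / size
            if fraction > alpha[g] or fraction < beta[g]:
                return False
    return True


def lp_cost(points, centers, assignment, p):
    """l_p-norm of the point-to-center distances (Euclidean, by assumption)."""
    dists = [math.dist(points[i], centers[assignment[i]])
             for i in range(len(points))]
    if p == float("inf"):
        return max(dists)       # k-center style objective
    return sum(d ** p for d in dists) ** (1.0 / p)


# Hypothetical toy data: four points on a line, two overlapping group labels.
points = [(0.0,), (2.0,), (10.0,), (14.0,)]
centers = [(1.0,), (12.0,)]
assignment = [0, 0, 1, 1]
groups = [{"a"}, {"b", "x"}, {"a", "x"}, {"b"}]
bounds_hi = {"a": 0.5, "b": 0.5, "x": 0.5}
bounds_lo = {"a": 0.5, "b": 0.5, "x": 0.5}

fair = is_fair(assignment, groups, bounds_hi, bounds_lo)
cost = lp_cost(points, centers, assignment, 1)
```

A checker like this only verifies a candidate solution. Turning an unfair clustering into a fair one at low extra cost is the harder problem the paper addresses.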
Citations
•Posted Content
Fairness in Machine Learning: A Survey.
Simon Caton, Christian Haas
TL;DR: This survey provides an overview of the different schools of thought and approaches to mitigating (social) biases and increasing fairness in the machine learning literature. It organises approaches into the widely accepted framework of pre-processing, in-processing, and post-processing methods, subcategorized into a further 11 method areas.
•Posted Content
Fair Generative Modeling via Weak Supervision
TL;DR: In this article, a weakly supervised algorithm for overcoming dataset bias for deep generative models is presented, which requires access to an additional small, unlabeled reference dataset as the supervision signal, thus sidestepping the need for explicit labels on the underlying bias factors.
Clustering without Over-Representation
Sara Ahmadian, Alessandro Epasto, Ravi Kumar, Mohammad Mahdian
- 25 Jul 2019
TL;DR: This paper obtains an algorithm that has provable guarantees of performance and a simpler combinatorial algorithm for the special case of the problem where no color has an absolute majority in any cluster.
Algorithmic fairness datasets: the story so far
TL;DR: In this article, the authors focus on data documentation debt by surveying over two hundred datasets employed in algorithmic fairness research and producing standardized and searchable documentation for each of them.
•Posted Content
Coresets for Clustering with Fairness Constraints
TL;DR: An approach to clustering with fairness constraints that involve multiple, non-disjoint types, which is also scalable and speeds up recent fair clustering algorithms by incorporating the first known coreset construction for the fair clustering problem with the k-median objective.
References
Certifying and Removing Disparate Impact
Michael Feldman, Sorelle A. Friedler, John Moeller, Carlos Scheidegger, Suresh Venkatasubramanian
- 10 Aug 2015
TL;DR: This work links disparate impact to a measure of classification accuracy that, while known, has received relatively little attention, and proposes a test for disparate impact based on how well the protected class can be predicted from the other attributes.
A Best Possible Heuristic for the k-Center Problem
TL;DR: A 2-approximation algorithm for the k-center problem with triangle inequality is presented; the key combinatorial object used is called a strong stable set, and the NP-completeness of the corresponding decision problem is proved.
The accuracy, fairness, and limits of predicting recidivism.
Julia Dressel, Hany Farid
TL;DR: It is shown that the widely used commercial risk assessment software COMPAS is no more accurate or fair than predictions made by people with little or no criminal justice expertise.
The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients
I-Cheng Yeh, Che-hui Lien
TL;DR: Among the six data mining techniques, artificial neural network is the only one that can accurately estimate the real probability of default, and its regression intercept is close to zero, and regression coefficient to one.
Fairness-aware classifier with prejudice remover regularizer
Toshihiro Kamishima, Shotaro Akaho, Hideki Asoh, Jun Sakuma
- 24 Sep 2012
TL;DR: A regularization approach is proposed that is applicable to any prediction algorithm with probabilistic discriminative models; it is applied to logistic regression, and its effectiveness and efficiency are shown empirically.
Related Papers (5)
Flavio Chierichetti, Ravi Kumar, Silvio Lattanzi, Sergei Vassilvitskii
- 01 Jan 2017
Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, Richard S. Zemel
- 08 Jan 2012
Sara Ahmadian, Alessandro Epasto, Ravi Kumar, Mohammad Mahdian
- 25 Jul 2019