Open Access, Posted Content
Fair Algorithms for Clustering
TL;DR: In this paper, the authors study the problem of finding low-cost fair clusterings in data where each data point may belong to many protected groups and propose a fair clustering algorithm that allows the user to specify the parameters that define fair representation.
Abstract: We study the problem of finding low-cost fair clusterings in data where each data point may belong to many protected groups. Our work significantly generalizes the seminal work of Chierichetti et al. (NIPS 2017) as follows.
- We allow the user to specify the parameters that define fair representation. More precisely, these parameters define the maximum over- and minimum under-representation of any group in any cluster.
- Our clustering algorithm works on any $\ell_p$-norm objective (e.g. $k$-means, $k$-median, and $k$-center). Indeed, our algorithm transforms any vanilla clustering solution into a fair one incurring only a slight loss in quality.
- Our algorithm also allows individuals to lie in multiple protected groups. In other words, we do not need the protected groups to partition the data and we can maintain fairness across different groups simultaneously.
Our experiments show that on established data sets, our algorithm performs much better in practice than what our theoretical results suggest.
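As an illustration of the representation bounds described in the abstract, the following is a minimal sketch of a checker that reports which (cluster, group) pairs violate user-specified bounds. It is not the authors' algorithm, which produces a fair clustering; it only tests an existing assignment. The function name `fairness_violations` and the parameter names `alpha` (maximum allowed fraction of a group in any cluster) and `beta` (minimum required fraction) are assumptions made for this example. Points may belong to several groups, or to none.

```python
from collections import defaultdict


def fairness_violations(assignment, memberships, alpha, beta):
    """Return (cluster, group, fraction) triples that break the bounds.

    assignment:  list mapping point index -> cluster id
    memberships: list mapping point index -> set of group names (may overlap)
    alpha[g]:    assumed upper bound on the fraction of group g in a cluster
    beta[g]:     assumed lower bound on the fraction of group g in a cluster
    """
    cluster_size = defaultdict(int)
    group_count = defaultdict(int)  # keyed by (cluster, group)
    for point, cluster in enumerate(assignment):
        cluster_size[cluster] += 1
        for group in memberships[point]:
            group_count[(cluster, group)] += 1

    violations = []
    for cluster, size in sorted(cluster_size.items()):
        for group in sorted(alpha):
            fraction = group_count[(cluster, group)] / size
            if fraction > alpha[group] or fraction < beta[group]:
                violations.append((cluster, group, fraction))
    return violations


# Hypothetical toy data: 8 points, 2 clusters, overlapping groups "A" and "B".
assignment = [0, 0, 0, 0, 1, 1, 1, 1]
memberships = [{"A"}, {"A", "B"}, {"B"}, set(), {"A"}, {"A"}, {"A"}, {"B"}]
alpha = {"A": 0.6, "B": 0.6}
beta = {"A": 0.2, "B": 0.2}

violations_found = fairness_violations(assignment, memberships, alpha, beta)
print(violations_found)  # [(1, 'A', 0.75)]
```

In this toy input, group A makes up three of the four points in cluster 1, which exceeds the assumed upper bound of 0.6, so that pair is the only one reported.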
Citations
A survey on datasets for fairness‐aware machine learning
TL;DR: In this paper, the authors focus on tabular data as the most common data representation for fairness-aware ML and identify relationships between the different attributes, particularly with respect to protected attributes and class attribute, using a Bayesian network.
•Proceedings Article
Explainable k-Means and k-Medians Clustering
Michal Moshkovitz, Sanjoy Dasgupta, Cyrus Rashtchian, Nave Frost
- 12 Jul 2020
TL;DR: It is shown that popular top-down decision tree algorithms may lead to clusterings with arbitrarily large cost, and that any tree-induced clustering must in general incur an $\Omega(\log k)$ approximation factor compared to the optimal clustering.
Socially Fair k-Means Clustering
Mehrdad Ghadiri, Samira Samadi, Santosh Vempala
- 03 Mar 2021
TL;DR: It is found that on benchmark datasets, Fair-Lloyd exhibits unbiased performance by ensuring that all groups have equal costs in the output k-clustering, while incurring a negligible increase in running time, thus making it a viable fair option wherever k-means is currently used.
Algorithmic fairness datasets: the story so far
TL;DR: In this article, the authors focus on data documentation debt by surveying over two hundred datasets employed in algorithmic fairness research, and producing standardized and searchable documentation for each of them.
•Proceedings Article
Scalable Fair Clustering
Arturs Backurs, Piotr Indyk, Krzysztof Onak, Baruch Schieber, Ali Vakilian, Tal Wagner
- 10 Feb 2019
TL;DR: In this article, the authors proposed an approximate fairlet decomposition algorithm that runs in nearly linear time for the fair variant of the classic $k$-median problem, where the points are colored and the goal is to minimize the same average distance objective while ensuring that all clusters have an "approximately equal" number of points of each color.
References
•Journal Article
Scikit-learn: Machine Learning in Python
Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, Edouard Duchesnay
TL;DR: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems, focusing on bringing machine learning to non-specialists using a general-purpose high-level language.
•Posted Content
Scikit-learn: Machine Learning in Python
Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Andreas Müller, Joel Nothman, Gilles Louppe, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, Edouard Duchesnay
TL;DR: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems.
k-means++: the advantages of careful seeding
David Arthur, Sergei Vassilvitskii
- 07 Jan 2007
TL;DR: By augmenting k-means with a very simple, randomized seeding technique, this work obtains an algorithm that is Θ(log k)-competitive with the optimal clustering.
Fairness through awareness
Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, Richard S. Zemel
- 08 Jan 2012
TL;DR: This work presents a framework for fair classification comprising a (hypothetical) task-specific metric for determining the degree to which individuals are similar with respect to the classification task at hand, and an algorithm for maximizing utility subject to the fairness constraint that similar individuals are treated similarly.