Fair Algorithms for Clustering

Open Access • Posted Content

- 08 Jan 2019

82

TL;DR: In this paper, the authors study the problem of finding low-cost fair clusterings in data where each data point may belong to many protected groups and propose a fair clustering algorithm that allows the user to specify the parameters that define fair representation.

Abstract: We study the problem of finding low-cost fair clusterings in data where each data point may belong to many protected groups. Our work significantly generalizes the seminal work of Chierichetti et al. (NIPS 2017) as follows:
- We allow the user to specify the parameters that define fair representation. More precisely, these parameters define the maximum over-representation and the minimum under-representation of any group in any cluster.
- Our clustering algorithm works on any $\ell_p$-norm objective (e.g. $k$-means, $k$-median, and $k$-center). Indeed, our algorithm transforms any vanilla clustering solution into a fair one, incurring only a slight loss in quality.
- Our algorithm also allows individuals to lie in multiple protected groups. In other words, we do not need the protected groups to partition the data, and we can maintain fairness across different groups simultaneously.
Our experiments show that on established data sets, our algorithm performs much better in practice than our theoretical results suggest.
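The fairness and objective notions in the abstract can be made concrete with a small sketch. The Python below assumes Euclidean points, a hard assignment of each point to one of the given centers, and per-group bounds named `alpha` (the maximum fraction of a cluster a group may occupy) and `beta` (the minimum fraction). These names and the helpers `lp_cost` and `fairness_violations` are hypothetical illustrations, not the authors' code. A point may carry several group labels, so the groups need not partition the data. The sketch only checks a given clustering; it does not implement the paper's procedure for turning a vanilla clustering into a fair one.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple

Point = Tuple[float, ...]


def lp_cost(points: Sequence[Point], centers: Sequence[Point],
            assignment: Sequence[int], p: float) -> float:
    """l_p-norm of the vector of point-to-assigned-center distances.

    p=1 gives the k-median cost, p=2 the square root of the k-means cost,
    and p=math.inf the k-center cost (the largest distance).
    """
    dists = [math.dist(points[v], centers[c]) for v, c in enumerate(assignment)]
    if math.isinf(p):
        return max(dists)
    return sum(d ** p for d in dists) ** (1.0 / p)


def fairness_violations(assignment: Sequence[int],
                        groups: Sequence[Set[str]],
                        alpha: Dict[str, float],
                        beta: Dict[str, float]) -> List[Tuple[int, str, int, int]]:
    """Return (cluster, group, group_count, cluster_size) for every bound broken.

    A cluster C respects group g when beta[g]*|C| <= |C ∩ g| <= alpha[g]*|C|.
    Hypothetical convention: alpha and beta share the same keys; labels not
    listed in alpha (unconstrained attributes) are ignored.
    """
    sizes = Counter(assignment)
    counts: Counter = Counter()
    for v, c in enumerate(assignment):
        for g in groups[v]:  # a point may belong to several groups
            counts[(c, g)] += 1
    violations = []
    for c, size in sizes.items():
        for g in alpha:
            n = counts[(c, g)]
            if n < beta[g] * size or n > alpha[g] * size:
                violations.append((c, g, n, size))
    return violations


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
    ctrs = [(0.5, 0.0), (10.5, 0.0)]
    asg = [0, 0, 1, 1]
    groups = [{"red"}, {"blue"}, {"red", "young"}, {"blue"}]
    alpha = {"red": 0.6, "blue": 0.6}
    beta = {"red": 0.4, "blue": 0.4}
    for p in (1, 2, math.inf):
        print(p, lp_cost(pts, ctrs, asg, p))
    print(fairness_violations(asg, groups, alpha, beta))  # → []
```

Using one `p` parameter mirrors the abstract's point that a single method covers $k$-median ($p=1$), $k$-means ($p=2$, whose minimizer matches the usual sum of squared distances), and $k$-center ($p=\infty$). Reassigning the toy points as `[0, 1, 0, 1]` puts both "red" points in one cluster and makes every bound fail.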

Citations

Journal Article • DOI: 10.1002/widm.1452

A survey on datasets for fairness‐aware machine learning

Tai Le Quy

- 03 Mar 2022

- Wiley Interdisciplinary Reviews-Data Min...

TL;DR: In this paper, the authors focus on tabular data as the most common data representation for fairness-aware ML and use a Bayesian network to identify relationships between the different attributes, particularly with respect to the protected attributes and the class attribute.

127

Proceedings Article

Explainable k-Means and k-Medians Clustering

Michal Moshkovitz, +3 more

- 12 Jul 2020

TL;DR: It is shown that popular top-down decision tree algorithms may lead to clusterings with arbitrarily large cost, and that any tree-induced clustering must in general incur an $\Omega(\log k)$ approximation factor compared to the optimal clustering.

122

Proceedings Article • DOI: 10.1145/3442188.3445906

Socially Fair k-Means Clustering

Mehrdad Ghadiri, +2 more

- 03 Mar 2021

TL;DR: It is found that on benchmark datasets, Fair-Lloyd exhibits unbiased performance by ensuring that all groups have equal costs in the output k-clustering, while incurring a negligible increase in running time, thus making it a viable fair option wherever k-means is currently used.

106

Journal Article • DOI: 10.1007/s10618-022-00854-z

Algorithmic fairness datasets: the story so far

Alessandro Fabris, +3 more

- 03 Feb 2022

- Data Mining and Knowledge Discovery

TL;DR: In this article, the authors address data documentation debt by surveying over two hundred datasets employed in algorithmic fairness research and producing standardized, searchable documentation for each of them.

91

Proceedings Article

Scalable Fair Clustering

Arturs Backurs, +5 more

- 10 Feb 2019

TL;DR: In this article, the authors propose an approximate fairlet decomposition algorithm that runs in nearly linear time for the fair variant of the classic $k$-median problem, in which the points are colored and the goal is to minimize the same average-distance objective while ensuring that every cluster has an "approximately equal" number of points of each color.

91

...

References

UCI Machine Learning Repository

A. Asuncion

- 01 Jan 2007

24.3K

Proceedings Article • DOI: 10.5555/1283383.1283494

k-means++: the advantages of careful seeding

David Arthur, +1 more

- 07 Jan 2007

TL;DR: By augmenting k-means with a very simple, randomized seeding technique, this work obtains an algorithm that is $\Theta(\log k)$-competitive with the optimal clustering.

9.5K

Proceedings Article • DOI: 10.1145/2090236.2090255

Fairness through awareness

Cynthia Dwork, +4 more

- 08 Jan 2012

TL;DR: Presents a framework for fair classification comprising a (hypothetical) task-specific metric for determining the degree to which individuals are similar with respect to the classification task at hand, and an algorithm for maximizing utility subject to the fairness constraint that similar individuals are treated similarly.

3.2K

...

Scikit-learn: Machine Learning in Python

Related Papers (5)

Fair Clustering Through Fairlets

Towards Fair Deep Clustering With Multi-State Protected Variables

The Price of Fair PCA: One Extra Dimension

A rigorous analysis of population stratification with limited data

Distance Metric Learning with Application to Clustering with Side-Information