Differentially private distributed logistic regression using private and public data

doi:10.1186/1755-8794-7-S1-S14

Open Access•Journal Article•10.1186/1755-8794-7-S1-S14

Zhanglong Ji, +4 more

- 08 May 2014

- BMC Medical Genomics

- Vol. 7, Iss: 1, pp 1-10

73

TL;DR: Logistic regression models built with a differentially private distributed algorithm that uses both private and public datasets demonstrate better utility than models trained on private or public datasets alone, without sacrificing the rigorous privacy guarantee.

Abstract: Privacy protection is an important issue in medical informatics, and differential privacy is a state-of-the-art framework for data privacy research. Differential privacy offers provable privacy against attackers who have auxiliary information and can be applied to data mining models (for example, logistic regression). However, differentially private methods sometimes introduce too much noise, which makes their outputs less useful. Given available public data in medical research (e.g., from patients who sign open-consent agreements), we can design algorithms that use both public and private data sets to decrease the amount of noise that is introduced. In this paper, we modify the update step of the Newton-Raphson method to propose a differentially private distributed logistic regression model based on both public and private data. We evaluate our algorithm on three different data sets and show its advantage over (1) a logistic regression model based solely on public data and (2) a differentially private distributed logistic regression model based on private data, under various scenarios. Logistic regression models built with our new algorithm from both private and public datasets demonstrate better utility than models trained on private or public datasets alone, without sacrificing the rigorous privacy guarantee.
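
The abstract does not spell out the modified Newton-Raphson update, so the following is only a minimal sketch of one way public and private data could be combined, not the authors' algorithm. It assumes that the curvature (Hessian) is estimated from public data alone, that each private site releases a gradient sum perturbed by Laplace noise, that feature rows are clipped to L1 norm at most 1 to bound sensitivity, and that the privacy budget is split evenly across iterations. All function and parameter names (`clip_l1`, `noisy_gradient_sum`, `public_hessian`, `hybrid_newton`, `ridge`) are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def clip_l1(X):
    """Scale each row so its L1 norm is at most 1 (bounds per-record gradients)."""
    norms = np.maximum(np.abs(X).sum(axis=1, keepdims=True), 1.0)
    return X / norms

def noisy_gradient_sum(X, y, beta, eps_iter, rng):
    """Site-local step: log-loss gradient sum plus Laplace noise.

    Assumption: rows of X have L1 norm <= 1, so replacing one record changes
    the sum by at most 2 in L1 norm. Laplace noise with scale 2 / eps_iter per
    coordinate then gives eps_iter-differential privacy for this release.
    """
    grad_sum = X.T @ (sigmoid(X @ beta) - y)
    return grad_sum + rng.laplace(scale=2.0 / eps_iter, size=beta.shape)

def public_hessian(X_pub, beta, ridge=1e-3):
    """Curvature from public data only, so it consumes no privacy budget."""
    p = sigmoid(X_pub @ beta)
    w = p * (1.0 - p)
    H = (X_pub * w[:, None]).T @ X_pub / len(X_pub)
    # Small ridge term (an assumption) keeps the linear solve well conditioned.
    return H + ridge * np.eye(X_pub.shape[1])

def hybrid_newton(sites, X_pub, epsilon, n_iter=5, seed=0):
    """Newton-style iterations: noisy private gradient, public Hessian.

    `sites` is a list of (X, y) pairs holding disjoint private records.
    """
    rng = np.random.default_rng(seed)
    X_pub = clip_l1(X_pub)
    sites = [(clip_l1(X), y) for X, y in sites]
    n_private = sum(len(y) for _, y in sites)
    eps_iter = epsilon / n_iter  # sequential composition over iterations
    beta = np.zeros(X_pub.shape[1])
    for _ in range(n_iter):
        # Each site perturbs its own sum; the aggregator only sees noisy sums.
        g = sum(noisy_gradient_sum(X, y, beta, eps_iter, rng)
                for X, y in sites) / n_private
        beta = beta - np.linalg.solve(public_hessian(X_pub, beta), g)
    return beta
```

In this sketch the noise enters only through the gradient, so its effect on the mean gradient shrinks as the total number of private records grows, while the Hessian stays noise-free because it never touches private records. A fuller treatment would also fold the public records' gradient into the update and choose the number of iterations with the privacy budget in mind.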

Citations

Proceedings Article•10.18653/V1/N19-5004

Transfer Learning in Natural Language Processing.

Sebastian Ruder, +3 more

- 01 Jun 2019

TL;DR: Transfer learning, as discussed by the authors, is a set of methods that extend the classical supervised machine learning paradigm by leveraging data from additional domains or tasks to train a model with better generalization properties, and it can be applied to NLP tasks.

508

Posted Content

Differential Privacy and Machine Learning: a Survey and Review.

Zhanglong Ji, +2 more

- 24 Dec 2014

- arXiv: Learning

TL;DR: This paper explores the interplay between machine learning and differential privacy, namely privacy-preserving machine learning algorithms and learning-based data release mechanisms, and describes some theoretical results that address what can be learned differentially privately and upper bounds of loss functions for differentially private algorithms.

280

Journal Article•10.1200/CCI.19.00047

Systematic Review of Privacy-Preserving Distributed Machine Learning From Federated Databases in Health Care.

Fadila Zerka, +9 more

- 05 Mar 2020

TL;DR: The purpose is to review the major implementations of distributed learning in health care and to offer an introduction to the privacy of patient data and to distributed learning as a potential solution for protecting these data.

128

Posted Content

Learning Privately from Multiparty Data

Jihun Hamm, +2 more

- 10 Feb 2016

- arXiv: Learning

TL;DR: This work proposes to transfer the 'knowledge' of the local classifier ensemble by first creating labeled data from auxiliary unlabeled data and then training a global $\epsilon$-differentially private classifier; it also shows that majority voting is too sensitive and proposes a new risk weighted by class probabilities estimated from the ensemble.

120

Proceedings Article•10.1109/INFOCOM.2018.8486352

InPrivate Digging: Enabling Tree-based Distributed Data Mining with Differential Privacy

Lingchen Zhao, +6 more

- 16 Apr 2018

TL;DR: This paper designs and implements a privacy-preserving system for gradient boosting decision trees (GBDT), in which regression trees trained by multiple data owners can be securely aggregated into an ensemble, and demonstrates that the system can provide strong privacy protection for individual data owners while maintaining the prediction accuracy of the original trained model.

111

References

Journal Article•10.2307/2348743

Applied Logistic Regression.

R. Iyer, +2 more

- 01 Dec 1991

- The Statistician

11.1K

Journal Article•10.1142/S0218488502001648

k-anonymity: a model for protecting privacy

Latanya Sweeney

- 01 Oct 2002

- International Journal of Uncertainty, Fu...

TL;DR: The solution provided in this paper includes a formal protection model named k-anonymity and a set of accompanying policies for deployment, and examines re-identification attacks that can be realized on releases that adhere to k-anonymity unless accompanying policies are respected.

9.2K

Book Chapter•10.1007/11681878_14

Calibrating noise to sensitivity in private data analysis

Cynthia Dwork, +3 more

- 04 Mar 2006

TL;DR: In this article, the authors show that for several particular applications substantially less noise is needed than was previously understood to be the case, and present separation results showing the increased value of interactive sanitization mechanisms over non-interactive ones.

8.9K

Proceedings Article•10.1109/ICDE.2006.1

L-diversity: privacy beyond k-anonymity

Ashwin Machanavajjhala, +3 more

- 03 Apr 2006

TL;DR: This paper shows with two simple attacks that a k-anonymized dataset has some subtle but severe privacy problems, and proposes a novel and powerful privacy definition called $\ell$-diversity, which is practical and can be implemented efficiently.

4.5K

Journal Article•10.1145/1217299.1217302

L-diversity: Privacy beyond k-anonymity

Ashwin Machanavajjhala, +3 more

- 01 Mar 2007

- ACM Transactions on Knowledge Discovery ...

TL;DR: This paper shows with two simple attacks that a k-anonymized dataset has some subtle but severe privacy problems, and proposes a novel and powerful privacy definition called $\ell$-diversity, which is practical and can be implemented efficiently.

4.3K

Related Papers (5)

Calibrating noise to sensitivity in private data analysis

k-anonymity: a model for protecting privacy

Privacy-Preserving Deep Learning

Differential privacy: a survey of results

Differentially Private Empirical Risk Minimization