Differentially private distributed logistic regression using private and public data
TL;DR: Logistic regression models built with a differentially private distributed algorithm that uses both private and public datasets demonstrate better utility than models trained on private or public datasets alone, without sacrificing the rigorous privacy guarantee.
Abstract: Privacy protection is an important issue in medical informatics, and differential privacy is a state-of-the-art framework for data privacy research. Differential privacy offers provable privacy against attackers who have auxiliary information and can be applied to data mining models (for example, logistic regression). However, differentially private methods sometimes introduce too much noise and make outputs less useful. Given available public data in medical research (e.g., from patients who sign open-consent agreements), we can design algorithms that use both public and private data sets to decrease the amount of noise that is introduced. In this paper, we modify the update step of the Newton-Raphson method to propose a differentially private distributed logistic regression model based on both public and private data. We evaluate our algorithm on three different data sets and show its advantage over (1) a logistic regression model based solely on public data and (2) a differentially private distributed logistic regression model based on private data, under various scenarios. Logistic regression models built with our new algorithm based on both private and public datasets demonstrate better utility than models trained on private or public datasets alone, without sacrificing the rigorous privacy guarantee.
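The abstract does not spell out how the Newton-Raphson update is modified, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes each site shares the gradient and Hessian of its local logistic loss, that the public site shares them exactly, and that private sites add Laplace noise before sharing. The names `site_statistics`, `newton_step`, `fit`, `noise_scale`, and `ridge` are hypothetical, and `noise_scale` is an uncalibrated placeholder, so this sketch on its own carries no formal privacy guarantee.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def site_statistics(X, y, beta):
    """Gradient and Hessian of the logistic negative log-likelihood at one site."""
    p = sigmoid(X @ beta)
    grad = X.T @ (p - y)
    hess = X.T @ (X * (p * (1.0 - p))[:, None])
    return grad, hess

def laplace_noise(rng, shape, scale):
    """Laplace noise; returns zeros when scale is 0 so the noiseless case is exact."""
    if scale == 0:
        return np.zeros(shape)
    return rng.laplace(0.0, scale, size=shape)

def newton_step(beta, public, private_sites, noise_scale, rng, ridge=1e-3):
    """One Newton-Raphson update from pooled site statistics.

    Assumption: noise_scale is a placeholder. A real mechanism would
    calibrate it to the sensitivity of the statistics and to epsilon.
    """
    d = beta.shape[0]
    # Public data contributes exact statistics (no noise needed).
    grad, hess = site_statistics(*public, beta)
    for X, y in private_sites:
        g, h = site_statistics(X, y, beta)
        noise_h = laplace_noise(rng, (d, d), noise_scale)
        grad = grad + g + laplace_noise(rng, d, noise_scale)
        # Symmetrize the Hessian noise so the pooled Hessian stays symmetric.
        hess = hess + h + (noise_h + noise_h.T) / 2.0
    # Small ridge term keeps the noisy Hessian invertible.
    hess = hess + ridge * np.eye(d)
    return beta - np.linalg.solve(hess, grad)

def fit(public, private_sites, noise_scale, rng, n_iter=10):
    """Run a fixed number of Newton-Raphson updates starting from zero."""
    beta = np.zeros(public[0].shape[1])
    for _ in range(n_iter):
        beta = newton_step(beta, public, private_sites, noise_scale, rng)
    return beta
```

With `noise_scale=0` this reduces to ordinary Newton-Raphson on the pooled data, which is a convenient sanity check. Because the public statistics enter without noise, a larger public share means less total perturbation in each update, which is the intuition the abstract describes.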
Citations
Transfer Learning in Natural Language Processing.
Sebastian Ruder,Matthew E. Peters,Swabha Swayamdipta,Thomas Wolf +3 more
- 01 Jun 2019
TL;DR: Transfer learning as discussed by the authors is a set of methods that extend the classical supervised machine learning paradigm by leveraging data from additional domains or tasks to train a model with better generalization properties, which can be used for NLP tasks.
508
•Posted Content
Differential Privacy and Machine Learning: a Survey and Review.
TL;DR: This paper explores the interplay between machine learning and differential privacy, namely privacy-preserving machine learning algorithms and learning-based data release mechanisms, and describes some theoretical results that address what can be learned differentially privately and upper bounds of loss functions for differentially private algorithms.
280
Systematic Review of Privacy-Preserving Distributed Machine Learning From Federated Databases in Health Care.
Fadila Zerka,Samir Barakat,Sean Walsh,Marta Bogowicz,Ralph T.H. Leijenaar,Arthur Jochems,Benjamin Miraglio,David Townend,Philippe Lambin +9 more
- 05 Mar 2020
TL;DR: The purpose is to review the major implementations of distributed learning in health care and to offer an introduction to the privacy of patient data and to distributed learning as a potential solution for preserving that privacy.
128
•Posted Content
Learning Privately from Multiparty Data
TL;DR: This work proposes to transfer the 'knowledge' of the local classifier ensemble by first creating labeled data from auxiliary unlabeled data and then training a global $\epsilon$-differentially private classifier; it shows that majority voting is too sensitive and proposes a new risk weighted by class probabilities estimated from the ensemble.
120
InPrivate Digging: Enabling Tree-based Distributed Data Mining with Differential Privacy
Lingchen Zhao,Lihao Ni,Shengshan Hu,Yanjiao Chen,Pan Zhou,Fu Xiao,Libing Wu +6 more
- 16 Apr 2018
TL;DR: This paper designs and implements a privacy-preserving system for gradient boosting decision tree (GBDT), where different regression trees trained by multiple data owners can be securely aggregated into an ensemble and demonstrates that the system can provide a strong privacy protection for individual data owners while maintaining the prediction accuracy of the original trained model.
111
References
k-anonymity: a model for protecting privacy
TL;DR: The solution provided in this paper includes a formal protection model named k-anonymity and a set of accompanying policies for deployment, and examines re-identification attacks that can be realized on releases that adhere to k-anonymity unless accompanying policies are respected.
9.2K
Calibrating noise to sensitivity in private data analysis
Cynthia Dwork,Frank McSherry,Kobbi Nissim,Adam Smith +3 more
- 04 Mar 2006
TL;DR: In this article, the authors show that for several particular applications substantially less noise is needed than was previously understood to be the case, and also present separation results showing the increased value of interactive sanitization mechanisms over non-interactive ones.
L-diversity: privacy beyond k-anonymity
Ashwin Machanavajjhala,Johannes Gehrke,Daniel Kifer,Muthuramakrishnan Venkitasubramaniam +3 more
- 03 Apr 2006
TL;DR: This paper shows with two simple attacks that a k-anonymized dataset has some subtle but severe privacy problems, and proposes a novel and powerful privacy definition called ℓ-diversity, which is practical and can be implemented efficiently.
L-diversity: Privacy beyond k-anonymity
TL;DR: This paper shows with two simple attacks that a k-anonymized dataset has some subtle but severe privacy problems, and proposes a novel and powerful privacy definition called ℓ-diversity, which is practical and can be implemented efficiently.