Journal Article, DOI: 10.1109/TC.2014.2360516
Proximity-Aware Local-Recoding Anonymization with MapReduce for Scalable Big Data Privacy Preservation in Cloud
TL;DR: This paper investigates the local-recoding problem for big data anonymization against proximity privacy breaches and seeks a scalable solution to it. It presents a proximity privacy model that allows for semantic proximity of sensitive values and multiple sensitive attributes, and models local recoding as a proximity-aware clustering problem.
Abstract: Cloud computing provides promising, scalable IT infrastructure to support the processing of a variety of big data applications in sectors such as healthcare and business. Data sets in such applications, like electronic health records, often contain privacy-sensitive information, which raises privacy concerns if the information is released to or shared with third parties in the cloud. A practical and widely adopted technique for data privacy preservation is to anonymize data via generalization so that it satisfies a given privacy model. However, most existing privacy-preserving approaches are tailored to small-scale data sets and often fall short when encountering big data, due to their insufficiency or poor scalability. In this paper, we investigate the local-recoding problem for big data anonymization against proximity privacy breaches and attempt to identify a scalable solution to this problem. Specifically, we present a proximity privacy model that allows for semantic proximity of sensitive values and multiple sensitive attributes, and we model the problem of local recoding as a proximity-aware clustering problem. To address this problem, we propose a scalable two-phase clustering approach consisting of a t-ancestors clustering algorithm (similar to k-means) and a proximity-aware agglomerative clustering algorithm. We design the algorithms with MapReduce to gain high scalability by performing data-parallel computation in the cloud. Extensive experiments on real-life data sets demonstrate that our approach significantly improves the capability of defending against proximity privacy breaches, as well as the scalability and time-efficiency of local-recoding anonymization, over existing approaches.
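To make the idea of local recoding as a clustering problem concrete, the following is a minimal single-machine sketch in Python. It is not the paper's t-ancestors or proximity-aware agglomerative algorithm and uses no MapReduce. The record schema (an age quasi-identifier and a numeric sensitive value), the function names, and the sort-and-cut grouping are all assumptions made for illustration. It groups records into clusters of at least k, generalizes each age to its cluster's interval, and reports how close together the sensitive values in a cluster are, which is the kind of narrow spread a proximity breach would exploit.

```python
from typing import List, Tuple

# Hypothetical schema: (age as quasi-identifier, numeric sensitive value).
Record = Tuple[int, float]


def cluster_by_age(records: List[Record], k: int) -> List[List[Record]]:
    """Greedy stand-in for clustering: sort by age, cut into groups of >= k."""
    if k < 1 or len(records) < k:
        raise ValueError("need k >= 1 and at least k records")
    ordered = sorted(records, key=lambda r: r[0])
    groups = [ordered[i:i + k] for i in range(0, len(ordered), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        # Fold an undersized tail into the previous group.
        tail = groups.pop()
        groups[-1].extend(tail)
    return groups


def recode(groups: List[List[Record]]) -> List[Tuple[Tuple[int, int], float]]:
    """Local recoding: replace each age with its own group's (min, max) interval."""
    released = []
    for group in groups:
        low = min(r[0] for r in group)
        high = max(r[0] for r in group)
        released.extend(((low, high), r[1]) for r in group)
    return released


def sensitive_spread(group: List[Record]) -> float:
    """Width of the sensitive values in a group; a small width hints at a proximity risk."""
    values = [r[1] for r in group]
    return max(values) - min(values)


records = [(25, 30.0), (27, 31.0), (31, 50.0), (45, 52.0), (52, 90.0)]
groups = cluster_by_age(records, k=2)
released = recode(groups)
spreads = [sensitive_spread(g) for g in groups]
```

In this toy data the first group is 2-anonymous on age, yet its sensitive values differ by only 1.0, so an observer still learns the value to within a narrow range. A proximity-aware method would take that spread into account when forming clusters, which this sketch does not attempt.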
Citations
Privacy-Preserving Energy Trading Using Consortium Blockchain in Smart Grid
TL;DR: The proposed approach mainly addresses the privacy of energy trading users in the smart grid and screens the distribution of sellers' energy sales, since energy trading volumes can be mined to detect their relationships with other information, such as physical location and energy usage.
567
Big data privacy: a technological perspective and review
TL;DR: This paper reviews existing privacy methods such as HybrEx, k-anonymity, t-closeness, and l-diversity and their implementation in business, and presents recent techniques for privacy preservation in big data.
359
A survey on security and privacy issues in big data
Duygu Sinanc Terzi, Ramazan Terzi, Seref Sagiroglu
- 01 Dec 2015
TL;DR: Big data, its ecosystem, concerns about big data, and a comparative view of big data privacy and security approaches in the literature are presented in terms of infrastructure, application, and data.
158
Handling big data: research challenges and future directions
TL;DR: A classification of some of the most important challenges when handling big data is presented and solutions that could address the identified challenges are recommended.
146
Host-Based Intrusion Detection System with System Calls: Review and Future Trends
TL;DR: A review of the development of system-call-based HIDS and future research trends is provided, namely, the reduction of the false-positive rate, the improvement of detection efficiency, and the enhancement of collaborative security.
References
Least Squares Quantization in PCM
S. P. Lloyd
- 01 Jan 1982
TL;DR: The corresponding result for any finite number of quanta is derived; that is, necessary conditions are found that the quanta and associated quantization intervals of an optimum finite quantization scheme must satisfy.
9.6K
k-anonymity: a model for protecting privacy
TL;DR: The solution provided in this paper includes a formal protection model named k-anonymity and a set of accompanying policies for deployment, and examines re-identification attacks that can be realized on releases that adhere to k-anonymity unless accompanying policies are respected.
9.2K
L-diversity: privacy beyond k-anonymity
Ashwin Machanavajjhala, Johannes Gehrke, Daniel Kifer, Muthuramakrishnan Venkitasubramaniam
- 03 Apr 2006
TL;DR: This paper shows with two simple attacks that a k-anonymized dataset has some subtle but severe privacy problems, and proposes a novel and powerful privacy definition called ℓ-diversity, which is practical and can be implemented efficiently.