Proximity-Aware Local-Recoding Anonymization with MapReduce for Scalable Big Data Privacy Preservation in Cloud

doi:10.1109/TC.2014.2360516

Journal Article•10.1109/TC.2014.2360516

Xuyun Zhang, +6 more

- 01 Aug 2015

- IEEE Transactions on Computers

- Vol. 64, Iss: 8, pp 2293-2307

109 citations

TL;DR: This paper investigates the local-recoding problem for big data anonymization against proximity privacy breaches and attempts to identify a scalable solution to this problem. It presents a proximity privacy model that allows semantic proximity of sensitive values and multiple sensitive attributes, and models the problem of local recoding as a proximity-aware clustering problem.
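
To make the idea of a proximity-aware cluster concrete, the check below tests whether one cluster (an equivalence class after local recoding) prevents semantically close sensitive values from dominating it. This is a hypothetical, simplified rule, not the paper's exact privacy model. The per-attribute tolerances `eps`, the bound `m`, the function names, and the assumption that sensitive attributes are numeric are all illustrative assumptions.

```python
from typing import Sequence

Record = Sequence[float]  # one record's (numeric) sensitive attribute values


def is_proximate(a: Record, b: Record, eps: Sequence[float]) -> bool:
    # Hypothetical rule: two records are "semantically close" when every
    # sensitive attribute differs by at most that attribute's tolerance.
    return all(abs(x - y) <= e for x, y, e in zip(a, b, eps))


def satisfies_proximity(cluster: Sequence[Record], eps: Sequence[float], m: int) -> bool:
    # For every record, at most len(cluster) / m records in the same cluster
    # (counting the record itself) may fall inside its proximity neighbourhood.
    n = len(cluster)
    for rec in cluster:
        close = sum(1 for other in cluster if is_proximate(rec, other, eps))
        if close * m > n:
            return False
    return True


# Usage: sensitive attributes are (salary, severity score); tolerances are made up.
eps = (5000.0, 1.0)
print(satisfies_proximity([(30000.0, 2.0), (52000.0, 5.0), (80000.0, 1.0)], eps, 3))  # True
print(satisfies_proximity([(30000.0, 2.0), (31000.0, 2.0), (80000.0, 1.0)], eps, 3))  # False
```

In the second cluster, the first two records are within tolerance on both attributes, so an adversary who locates either person in that cluster learns their salary and severity with probability 2/3, which exceeds the 1/m bound.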

Abstract: Cloud computing provides a promising, scalable IT infrastructure to support the processing of a variety of big data applications in sectors such as healthcare and business. Data sets such as electronic health records in these applications often contain privacy-sensitive information, which raises potential privacy concerns if the information is released or shared with third parties in the cloud. A practical and widely adopted technique for data privacy preservation is to anonymize data via generalization to satisfy a given privacy model. However, most existing privacy-preserving approaches tailored to small-scale data sets often fall short when encountering big data, due to their insufficiency or poor scalability. In this paper, we investigate the local-recoding problem for big data anonymization against proximity privacy breaches and attempt to identify a scalable solution to this problem. Specifically, we present a proximity privacy model that allows semantic proximity of sensitive values and multiple sensitive attributes, and we model the problem of local recoding as a proximity-aware clustering problem. A scalable two-phase clustering approach, consisting of a t-ancestors clustering algorithm (similar to k-means) and a proximity-aware agglomerative clustering algorithm, is proposed to address this problem. We design the algorithms with MapReduce to gain high scalability by performing data-parallel computation in the cloud. Extensive experiments on real-life data sets demonstrate that our approach significantly improves the capability of defending against proximity privacy breaches, as well as the scalability and time-efficiency of local-recoding anonymization, over existing approaches.
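
The MapReduce design described in the abstract can be illustrated with one round of k-means split into a map step and a reduce step. This is a minimal sketch under stated assumptions. It uses plain k-means with Euclidean distance over numeric quasi-identifier vectors as a stand-in for the paper's t-ancestors clustering, whose distance and center definitions are not given here. The function names are hypothetical, and the shuffle is simulated in memory rather than run through Hadoop.

```python
from collections import defaultdict
from math import dist
from typing import Dict, Iterator, List, Tuple

Point = Tuple[float, ...]


def map_assign(records: List[Point], centers: List[Point]) -> Iterator[Tuple[int, Point]]:
    # Map: for each record in this input split, emit (nearest center index, record).
    for r in records:
        nearest = min(range(len(centers)), key=lambda i: dist(r, centers[i]))
        yield nearest, r


def reduce_center(key: int, values: List[Point]) -> Tuple[int, Point]:
    # Reduce: the new center of a cluster is the coordinate-wise mean of its members.
    n = len(values)
    return key, tuple(sum(coord) / n for coord in zip(*values))


def kmeans_round(splits: List[List[Point]], centers: List[Point]) -> List[Point]:
    # Simulated shuffle: group all map output by key, as the framework would.
    groups: Dict[int, List[Point]] = defaultdict(list)
    for split in splits:  # each split would be processed by a separate mapper
        for key, rec in map_assign(split, centers):
            groups[key].append(rec)
    new_centers = list(centers)  # a center with no members keeps its old position
    for key, values in groups.items():
        k, c = reduce_center(key, values)
        new_centers[k] = c
    return new_centers


splits = [[(1.0, 1.0), (1.5, 2.0)], [(8.0, 8.0), (9.0, 8.5)]]
print(kmeans_round(splits, [(0.0, 0.0), (10.0, 10.0)]))  # [(1.25, 1.5), (8.5, 8.25)]
```

A driver would call `kmeans_round` repeatedly until the centers stop moving. Because each mapper needs only its own split plus the small list of centers, the per-record work parallelizes across machines, which is the property the abstract relies on for scalability.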

Citations

Journal Article•10.1109/TII.2019.2893433

Privacy-Preserving Energy Trading Using Consortium Blockchain in Smart Grid

Keke Gai, +4 more

- 16 Jan 2019

- IEEE Transactions on Industrial Informat...

TL;DR: The proposed approach mainly addresses the privacy of energy trading users in the smart grid and screens the distribution of sellers' energy sales, motivated by the fact that various energy trading volumes can be mined to detect their relationships with other information, such as physical location and energy usage.

567 citations

Journal Article•10.1186/S40537-016-0059-Y

Big data privacy: a technological perspective and review

Priyank Jain, +2 more

- 26 Nov 2016

- Journal of Big Data

TL;DR: This paper covers privacy by examining existing methods such as HybrEx, k-anonymity, t-closeness, and l-diversity and their implementation in business, and presents recent privacy-preserving techniques for big data.

359 citations

Proceedings Article•10.1109/ICITST.2015.7412089

A survey on security and privacy issues in big data

Duygu Sinanc Terzi, +2 more

- 01 Dec 2015

TL;DR: Big data, its ecosystem, concerns about big data, and a comparative view of big data privacy and security approaches in the literature are presented in terms of infrastructure, application, and data.

158 citations

Journal Article•10.1007/S11227-016-1677-Z

Handling big data: research challenges and future directions

Ioannis Anagnostopoulos, +2 more

- 01 Apr 2016

- The Journal of Supercomputing

TL;DR: A classification of some of the most important challenges when handling big data is presented and solutions that could address the identified challenges are recommended.

146 citations

Journal Article•10.1145/3214304

Host-Based Intrusion Detection System with System Calls: Review and Future Trends

Ming Liu, +4 more

- 19 Nov 2018

- ACM Computing Surveys

TL;DR: A review of the development of system-call-based HIDS and future research trends is provided, namely, the reduction of the false-positive rate, the improvement of detection efficiency, and the enhancement of collaborative security.

139 citations

...

References

Journal Article•10.1109/TIT.1982.1056489

Least squares quantization in PCM

S. P. Lloyd

- 01 Mar 1982

- IEEE Transactions on Information Theory

TL;DR: In this article, the author derives necessary conditions that the quanta and associated quantization intervals of an optimum finite quantization scheme, for any finite number of quanta, must satisfy to achieve minimum average quantization noise power.

16K citations
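
Lloyd's two necessary conditions (each decision boundary lies midway between adjacent quantization levels, and each level sits at the centroid of its interval) suggest an iteration, now known as Lloyd's algorithm, that is the one-dimensional ancestor of the k-means clustering the abstract compares t-ancestors clustering to. The sketch below alternates the two conditions on a finite set of samples rather than on a continuous probability density. That substitution, the function name, and the fixed iteration cap are assumptions.

```python
from bisect import bisect_right
from typing import List


def lloyd_quantizer(samples: List[float], levels: List[float], iters: int = 50) -> List[float]:
    levels = sorted(levels)
    for _ in range(iters):
        # Nearest-neighbour condition: boundaries sit midway between adjacent levels.
        bounds = [(a + b) / 2 for a, b in zip(levels, levels[1:])]
        cells: List[List[float]] = [[] for _ in levels]
        for x in samples:
            cells[bisect_right(bounds, x)].append(x)
        # Centroid condition: each level moves to the mean of the samples in its cell
        # (an empty cell keeps its previous level).
        new_levels = [sum(c) / len(c) if c else lv for c, lv in zip(cells, levels)]
        if new_levels == levels:  # both conditions hold: a fixed point is reached
            break
        levels = new_levels
    return levels


print(lloyd_quantizer([0.0, 1.0, 2.0, 10.0, 11.0, 12.0], [0.0, 5.0]))  # [1.0, 11.0]
```

Each pass can only lower (or keep) the mean squared quantization error, so the loop settles at a fixed point. As with k-means, that fixed point satisfies the necessary conditions but is not guaranteed to be the global optimum.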

Journal Article•10.1142/S0218488502001648

k-anonymity: a model for protecting privacy

Latanya Sweeney

- 01 Oct 2002

- International Journal of Uncertainty, Fu...

TL;DR: The solution provided in this paper includes a formal protection model named k-anonymity and a set of accompanying policies for deployment, and examines re-identification attacks that can be realized on releases that adhere to k-anonymity unless the accompanying policies are respected.

9.2K citations
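
Under k-anonymity, each released record must be indistinguishable from at least k - 1 others on its quasi-identifier attributes, which amounts to every quasi-identifier combination appearing at least k times. A minimal sketch of that check follows. The table, column names, and function name are illustrative assumptions, and the sketch does not cover the accompanying deployment policies the paper also discusses.

```python
from collections import Counter
from typing import Dict, List, Sequence

Row = Dict[str, str]


def is_k_anonymous(rows: List[Row], quasi_ids: Sequence[str], k: int) -> bool:
    # Count how many rows share each combination of quasi-identifier values.
    counts = Counter(tuple(row[q] for q in quasi_ids) for row in rows)
    # k-anonymity: every combination that appears must appear at least k times.
    return all(c >= k for c in counts.values())


table = [
    {"zip": "130**", "age": "<30", "disease": "flu"},
    {"zip": "130**", "age": "<30", "disease": "cancer"},
    {"zip": "148**", "age": ">=40", "disease": "flu"},
    {"zip": "148**", "age": ">=40", "disease": "heart"},
]
print(is_k_anonymous(table, ["zip", "age"], 2))  # True
print(is_k_anonymous(table, ["zip", "age"], 3))  # False
```

The generalized values (`130**`, `<30`) stand for the output of a recoding step; the check itself only inspects the released table.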

Proceedings Article•10.1109/ICDE.2006.1

L-diversity: privacy beyond k-anonymity

Ashwin Machanavajjhala, +3 more

- 03 Apr 2006

TL;DR: This paper shows with two simple attacks that a k-anonymized dataset has some subtle but severe privacy problems, and proposes a novel and powerful privacy definition called ℓ-diversity, which is practical and can be implemented efficiently.

4.5K citations

Journal Article•10.1145/1217299.1217302

L-diversity: Privacy beyond k-anonymity

Ashwin Machanavajjhala, +3 more

- 01 Mar 2007

- ACM Transactions on Knowledge Discovery ...

TL;DR: This paper shows with two simple attacks that a k-anonymized dataset has some subtle but severe privacy problems, and proposes a novel and powerful privacy definition called ℓ-diversity, which is practical and can be implemented efficiently.

4.3K citations
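
The ℓ-diversity requirement adds a condition on the sensitive attribute inside each equivalence class. The sketch below contrasts the simplest reading (at least ℓ distinct sensitive values per class) with entropy ℓ-diversity, one of the instantiations defined in this line of work. The table, column names, and function names are illustrative assumptions. The example shows a class that passes the distinct check but fails the entropy check because one value dominates it.

```python
from collections import Counter, defaultdict
from math import log
from typing import Dict, List, Sequence

Row = Dict[str, str]


def equivalence_classes(rows: List[Row], quasi_ids: Sequence[str], sensitive: str) -> Dict[tuple, List[str]]:
    # Group sensitive values by their (generalized) quasi-identifier combination.
    groups: Dict[tuple, List[str]] = defaultdict(list)
    for row in rows:
        groups[tuple(row[q] for q in quasi_ids)].append(row[sensitive])
    return groups


def is_distinct_l_diverse(rows: List[Row], quasi_ids: Sequence[str], sensitive: str, ell: int) -> bool:
    # Simplest reading: each class holds at least ell distinct sensitive values.
    classes = equivalence_classes(rows, quasi_ids, sensitive)
    return all(len(set(values)) >= ell for values in classes.values())


def is_entropy_l_diverse(rows: List[Row], quasi_ids: Sequence[str], sensitive: str, ell: int) -> bool:
    # Entropy l-diversity: each class's sensitive-value entropy is at least log(ell).
    for values in equivalence_classes(rows, quasi_ids, sensitive).values():
        n = len(values)
        entropy = -sum((c / n) * log(c / n) for c in Counter(values).values())
        if entropy < log(ell) - 1e-12:  # small tolerance for floating-point error
            return False
    return True


table = [
    {"zip": "130**", "age": "<30", "disease": "flu"},
    {"zip": "130**", "age": "<30", "disease": "flu"},
    {"zip": "130**", "age": "<30", "disease": "cancer"},
    {"zip": "148**", "age": ">=40", "disease": "flu"},
    {"zip": "148**", "age": ">=40", "disease": "heart"},
]
print(is_distinct_l_diverse(table, ["zip", "age"], "disease", 2))  # True
print(is_entropy_l_diverse(table, ["zip", "age"], "disease", 2))   # False
```

In the first class, "flu" covers two of three records, so its entropy (about 0.64 nats) falls below log 2 (about 0.69 nats), even though two distinct diseases are present.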