Statistical database

Topic Tools

Papers published on a yearly basis

Papers

Book Chapter•10.1007/978-3-540-79228-4_1•

Differential privacy: a survey of results

[...]

Cynthia Dwork¹•Institutions (1)

Microsoft¹

25 Apr 2008

TL;DR: This survey recalls the definition of differential privacy and two basic techniques for achieving it, and shows some interesting applications of these techniques, presenting algorithms for three specific tasks and three general results on differentially private learning.

...read moreread less

Abstract: Over the past five years a new approach to privacy-preserving data analysis has born fruit [13, 18, 7, 19, 5, 37, 35, 8, 32]. This approach differs from much (but not all!) of the related literature in the statistics, databases, theory, and cryptography communities, in that a formal and ad omnia privacy guarantee is defined, and the data analysis techniques presented are rigorously proved to satisfy the guarantee. The key privacy guarantee that has emerged is differential privacy. Roughly speaking, this ensures that (almost, and quantifiably) no risk is incurred by joining a statistical database. In this survey, we recall the definition of differential privacy and two basic techniques for achieving it. We then show some interesting applications of these techniques, presenting algorithms for three specific tasks and three general results on differentially private learning.

...read moreread less

4,225 citations

Proceedings Article•10.1145/773153.773173•

Revealing information while preserving privacy

[...]

Irit Dinur¹, Kobbi Nissim¹•Institutions (1)

Princeton University¹

9 Jun 2003

TL;DR: A polynomial reconstruction algorithm of data from noisy (perturbed) subset sums and shows that in order to achieve privacy one has to add perturbation of magnitude (Ω√n).

...read moreread less

Abstract: We examine the tradeoff between privacy and usability of statistical databases. We model a statistical database by an n-bit string d1,..,dn, with a query being a subset q ⊆ [n] to be answered by Σieqdi. Our main result is a polynomial reconstruction algorithm of data from noisy (perturbed) subset sums. Applying this reconstruction algorithm to statistical databases we show that in order to achieve privacy one has to add perturbation of magnitude (Ω√n). That is, smaller perturbation always results in a strong violation of privacy. We show that this result is tight by exemplifying access algorithms for statistical databases that preserve privacy while adding perturbation of magnitude O(√n).For time-T bounded adversaries we demonstrate a privacypreserving access algorithm whose perturbation magnitude is ≈ √T.

...read moreread less

1,278 citations

Journal Article•10.1145/76894.76895•

Security-control methods for statistical databases: a comparative study

[...]

Nabil R. Adam¹, John C. Worthmann²•Institutions (2)

Rutgers University¹, Eindhoven University of Technology²

01 Dec 1989-ACM Computing Surveys

TL;DR: This paper recommends directing future research efforts toward developing new methods that prevent exact disclosure and provide statistical-disclosure control, while at the same time do not suffer from the bias problem and the 0/1 query-set-size problem.

...read moreread less

Abstract: This paper considers the problem of providing security to statistical databases against disclosure of confidential information. Security-control methods suggested in the literature are classified into four general approaches: conceptual, query restriction, data perturbation, and output perturbation.Criteria for evaluating the performance of the various security-control methods are identified. Security-control methods that are based on each of the four approaches are discussed, together with their performance with respect to the identified evaluation criteria. A detailed comparative analysis of the most promising methods for protecting dynamic-online statistical databases is also presented.To date no single security-control method prevents both exact and partial disclosures. There are, however, a few perturbation-based methods that prevent exact disclosure and enable the database administrator to exercise "statistical disclosure control." Some of these methods, however introduce bias into query responses or suffer from the 0/1 query-set-size problem (i.e., partial disclosure is possible in case of null query set or a query set of size 1).We recommend directing future research efforts toward developing new methods that prevent exact disclosure and provide statistical-disclosure control, while at the same time do not suffer from the bias problem and the 0/1 query-set-size problem. Furthermore, efforts directed toward developing a bias-correction mechanism and solving the general problem of small query-set-size would help salvage a few of the current perturbation-based methods.

...read moreread less

1,145 citations

Proceedings Article•10.1145/1065167.1065184•

Practical privacy: the SuLQ framework

[...]

Avrim Blum¹, Cynthia Dwork², Frank McSherry², Kobbi Nissim³•Institutions (3)

Carnegie Mellon University¹, Microsoft², Ben-Gurion University of the Negev³

13 Jun 2005

TL;DR: This work considers a statistical database in which a trusted administrator introduces noise to the query responses with the goal of maintaining privacy of individual database entries, and modify the privacy analysis to real-valued functions f and arbitrary row types, greatly improving the bounds on noise required for privacy.

...read moreread less

Abstract: We consider a statistical database in which a trusted administrator introduces noise to the query responses with the goal of maintaining privacy of individual database entries. In such a database, a query consists of a pair (S, f) where S is a set of rows in the database and f is a function mapping database rows to {0, 1}. The true answer is ΣieSf(di), and a noisy version is released as the response to the query. Results of Dinur, Dwork, and Nissim show that a strong form of privacy can be maintained using a surprisingly small amount of noise -- much less than the sampling error -- provided the total number of queries is sublinear in the number of database rows. We call this query and (slightly) noisy reply the SuLQ (Sub-Linear Queries) primitive. The assumption of sublinearity becomes reasonable as databases grow increasingly large.We extend this work in two ways. First, we modify the privacy analysis to real-valued functions f and arbitrary row types, as a consequence greatly improving the bounds on noise required for privacy. Second, we examine the computational power of the SuLQ primitive. We show that it is very powerful indeed, in that slightly noisy versions of the following computations can be carried out with very few invocations of the primitive: principal component analysis, k means clustering, the Perceptron Algorithm, the ID3 algorithm, and (apparently!) all algorithms that operate in the in the statistical query learning model [11].

...read moreread less

1,062 citations

Book Chapter•10.1007/978-3-540-28628-8_32•

Privacy-Preserving Datamining on Vertically Partitioned Databases

[...]

Cynthia Dwork, Kobbi Nissim

15 Aug 2004

TL;DR: Under a rigorous definition of breach of privacy, Dinur and Nissim proved that unless the total number of queries is sub-linear in the size of the database, a substantial amount of noise is required to avoid a breach, rendering the database almost useless.

...read moreread less

Abstract: In a recent paper Dinur and Nissim considered a statistical database in which a trusted database administrator monitors queries and introduces noise to the responses with the goal of maintaining data privacy [5]. Under a rigorous definition of breach of privacy, Dinur and Nissim proved that unless the total number of queries is sub-linear in the size of the database, a substantial amount of noise is required to avoid a breach, rendering the database almost useless.

...read moreread less

456 citations

...

Expand

Year	Papers
2021	3
2020	1
2019	5
2018	7
2017	9
2016	4

Topic Tools

Papers published on a yearly basis

Papers

Differential privacy: a survey of results

Revealing information while preserving privacy

Security-control methods for statistical databases: a comparative study

Practical privacy: the SuLQ framework

Privacy-Preserving Datamining on Vertically Partitioned Databases

Related Topics (5)

Performance Metrics