Disentangling the Linguistic Competence of Privacy-Preserving BERT

doi:10.48550/arxiv.2310.11363

Journal Article

Stefan Arnold, +2 more

- 17 Oct 2023

- arXiv.org

- Vol. abs/2310.11363

TL;DR: Probing finds evidence that text-to-text privatization affects linguistic competence across several formalisms: the privatized model encodes localized properties of words but falls short at encoding the contextual relationships between spans of words.

Abstract: Differential Privacy (DP) has been tailored to address the unique challenges of text-to-text privatization. However, text-to-text privatization is known for degrading the performance of language models when trained on perturbed text. Employing a series of interpretation techniques on the internal representations extracted from BERT trained on perturbed pre-text, we intend to disentangle at the linguistic level the distortion induced by differential privacy. Experimental results from a representational similarity analysis indicate that the overall similarity of internal representations is substantially reduced. Using probing tasks to unpack this dissimilarity, we find evidence that text-to-text privatization affects the linguistic competence across several formalisms, encoding localized properties of words while falling short at encoding the contextual relationships between spans of words.
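To make the notion of text-to-text privatization concrete, the following is a minimal sketch of one common word-level approach: add calibrated noise to a word's embedding, then replace the word with its nearest neighbour in the vocabulary. The abstract does not specify the mechanism used in the paper, so the noise distribution, the toy vocabulary `EMBEDDINGS`, and the helper names `sample_noise`, `privatize_word`, and `privatize_text` are all assumptions made for illustration.

```python
import math
import random

# Hypothetical toy vocabulary with 2-d embeddings (illustrative values only).
EMBEDDINGS = {
    "paris": (0.9, 0.1),
    "london": (0.8, 0.2),
    "berlin": (0.85, 0.15),
    "apple": (0.1, 0.9),
    "banana": (0.15, 0.95),
}


def sample_noise(dim, epsilon, rng):
    """Draw a noise vector with density proportional to exp(-epsilon * ||z||)."""
    # Direction: uniform on the unit sphere (a normalized Gaussian vector).
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in direction))
    # Magnitude: Gamma distribution with shape=dim and scale=1/epsilon.
    magnitude = rng.gammavariate(dim, 1.0 / epsilon)
    return [magnitude * x / norm for x in direction]


def privatize_word(word, epsilon, rng):
    """Perturb the word's embedding and return the nearest vocabulary word."""
    vec = EMBEDDINGS[word]
    noise = sample_noise(len(vec), epsilon, rng)
    noisy = [v + n for v, n in zip(vec, noise)]
    # The nearest neighbour of the noisy vector becomes the surrogate word.
    return min(EMBEDDINGS, key=lambda w: math.dist(EMBEDDINGS[w], noisy))


def privatize_text(text, epsilon, seed=0):
    """Privatize each in-vocabulary token independently.

    Simplification (assumption): out-of-vocabulary tokens pass through
    unchanged, which a real mechanism would not allow.
    """
    rng = random.Random(seed)
    return " ".join(
        privatize_word(tok, epsilon, rng) if tok in EMBEDDINGS else tok
        for tok in text.split()
    )


if __name__ == "__main__":
    sentence = "paris london apple"
    for eps in (0.5, 5.0, 50.0):
        print(eps, privatize_text(sentence, epsilon=eps))
```

In this sketch a smaller privacy budget `epsilon` yields larger noise, so more words are replaced by surrogates; a very large `epsilon` leaves the text essentially unchanged.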

Figures

Figure 3: Layer-wise probing results for BERT under public (blue circles) and private (orange squares) training modalities. Surface properties according to Adi et al. (2016) are depicted in Figures 3(a), 3(b), and 3(c). Syntactic properties according to Tenney et al. (2019b) are depicted in Figures 3(d), 3(e), and 3(f). Semantic properties according to Tenney et al. (2019b) are depicted in Figures 3(g), 3(h), 3(i), and 3(j). Structural properties according to Hewitt and Manning (2019) are depicted in Figures 3(k) and 3(l).

Figure 1: Interval-wise learning progress of BERT from 26,903,298 chunks generated from Wikipedia.

Figure 2: Layer-wise representational similarity of BERT for 5,000 samples randomly drawn from WikiText.

Table 1: Example chunk (truncated) from Wikipedia privatized with different privacy budgets. Highlighted words represent a mismatch between the original word and the surrogate word after privatization.

Figure 4: Divergence-based clustering of attention maps extracted from 1,000 random samples of WikiText.

References

•Proceedings Article

Attention is All you Need

Ashish Vaswani, +7 more

- 12 Jun 2017

TL;DR: This paper proposed a simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely and achieved state-of-the-art performance on English-to-French translation.

94.2K

•Posted Content

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, +3 more

- 11 Oct 2018

- arXiv: Computation and Language

TL;DR: A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

81.7K

•Proceedings Article

Efficient Estimation of Word Representations in Vector Space

Tomas Mikolov, +3 more

- 16 Jan 2013

TL;DR: Two novel model architectures for computing continuous vector representations of words from very large data sets are proposed and it is shown that these vectors provide state-of-the-art performance on the authors' test set for measuring syntactic and semantic word similarities.

27.5K

•Posted Content

RoBERTa: A Robustly Optimized BERT Pretraining Approach

Yinhan Liu, +9 more

- 26 Jul 2019

- arXiv: Computation and Language

TL;DR: It is found that BERT was significantly undertrained, and can match or exceed the performance of every model published after it, and the best model achieves state-of-the-art results on GLUE, RACE and SQuAD.

26.2K

•Book Chapter•10.1007/11681878_14

Calibrating noise to sensitivity in private data analysis

Cynthia Dwork, +3 more

- 04 Mar 2006

TL;DR: In this article, the authors show that for several particular applications substantially less noise is needed than was previously understood to be the case, and also show the separation results showing the increased value of interactive sanitization mechanisms over non-interactive.

8.9K