Journal Article
DOI: 10.48550/arxiv.2310.11363
Disentangling the Linguistic Competence of Privacy-Preserving BERT
Stefan Arnold, Nils Kemmerzell, Annika Schreiner
TL;DR: Evidence is found that text-to-text privatization affects the linguistic competence across several formalisms, encoding localized properties of words while falling short at encoding the contextual relationships between spans of words.
Abstract: Differential Privacy (DP) has been tailored to address the unique challenges of text-to-text privatization. However, text-to-text privatization is known for degrading the performance of language models when trained on perturbed text. Employing a series of interpretation techniques on the internal representations extracted from BERT trained on perturbed pre-text, we intend to disentangle at the linguistic level the distortion induced by differential privacy. Experimental results from a representational similarity analysis indicate that the overall similarity of internal representations is substantially reduced. Using probing tasks to unpack this dissimilarity, we find evidence that text-to-text privatization affects the linguistic competence across several formalisms, encoding localized properties of words while falling short at encoding the contextual relationships between spans of words.
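As a rough illustration of what word-level text-to-text privatization can look like, the sketch below perturbs each word's embedding with noise calibrated to a privacy budget and then substitutes the nearest word in the vocabulary. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not name the mechanism, so the noise distribution (density proportional to exp(-epsilon * ||z||), a common choice for metric differential privacy), the toy `EMBEDDINGS` table, and all function names are hypothetical.

```python
import math
import random

# Toy 2-D embedding table (hypothetical values, for illustration only).
EMBEDDINGS = {
    "cat": (1.0, 0.9),
    "dog": (0.9, 1.0),
    "car": (-1.0, 0.8),
    "bus": (-0.9, 1.0),
}


def sample_noise(dim, epsilon, rng):
    """Draw a noise vector with density proportional to exp(-epsilon * ||z||).

    Assumed mechanism: a uniformly random direction scaled by a
    Gamma(dim, 1 / epsilon) distributed radius.
    """
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in direction))
    radius = rng.gammavariate(dim, 1.0 / epsilon)
    return [radius * x / norm for x in direction]


def privatize_word(word, epsilon, rng):
    """Replace a word by the vocabulary word nearest to its noisy embedding."""
    vector = EMBEDDINGS[word]
    noise = sample_noise(len(vector), epsilon, rng)
    noisy = [v + n for v, n in zip(vector, noise)]
    return min(EMBEDDINGS, key=lambda w: math.dist(EMBEDDINGS[w], noisy))


def privatize_text(words, epsilon, seed=0):
    """Privatize a token sequence word by word with a shared privacy budget."""
    rng = random.Random(seed)
    return [privatize_word(w, epsilon, rng) for w in words]
```

Under these assumptions, a small `epsilon` yields large noise and frequent surrogate words, while a large `epsilon` leaves most words unchanged. Each word is perturbed independently of its neighbors.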
Figures

Figure 3: Layer-wise probing results for BERT under public (blue circles) and private (orange squares) training modalities. Surface properties according to Adi et al. (2016) are depicted in Figures 3(a), 3(b), and 3(c). Syntactic properties according to Tenney et al. (2019b) are depicted in Figures 3(d), 3(e), and 3(f). Semantic properties according to Tenney et al. (2019b) are depicted in Figures 3(g), 3(h), 3(i), and 3(j). Structural properties according to Hewitt and Manning (2019) are depicted in Figures 3(k) and 3(l). 
Figure 1: Interval-wise learning progress of BERT from 26,903,298 chunks generated from Wikipedia.
Figure 2: Layer-wise representational similarity of BERT for 5,000 samples randomly drawn from WikiText.
Table 1: Example chunk (truncated) from Wikipedia privatized with different privacy budgets. Highlighted words represent a mismatch between the original word and the surrogate word after privatization. 
Figure 4: Divergence-based clustering of attention maps extracted from 1,000 random samples of WikiText.
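To make the layer-wise representational similarity of Figure 2 more concrete, the sketch below shows one classic form of representational similarity analysis: build a pairwise dissimilarity vector for the same samples under each model, then correlate the two vectors. This is a minimal sketch under stated assumptions. The page does not state which similarity measure the authors use, so cosine distance, Pearson correlation, and all function names here are illustrative choices.

```python
import math


def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def dissimilarity_vector(representations):
    """Flatten the upper triangle of the pairwise dissimilarity matrix."""
    n = len(representations)
    return [
        cosine_distance(representations[i], representations[j])
        for i in range(n)
        for j in range(i + 1, n)
    ]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rsa_score(reps_a, reps_b):
    """Compare two sets of representations of the same samples.

    reps_a[i] and reps_b[i] must describe the same input sample, e.g. the
    same sentence encoded by a publicly and a privately trained model.
    """
    return pearson(dissimilarity_vector(reps_a), dissimilarity_vector(reps_b))
```

Because only the pairwise dissimilarities are compared, the two representation spaces do not need to share a dimensionality or a coordinate system. A score close to 1 means both models arrange the samples in a similar relative geometry.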
References
•Proceedings Article
Attention is All you Need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
- 12 Jun 2017
TL;DR: This paper proposed a simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely and achieved state-of-the-art performance on English-to-French translation.
•Posted Content
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TL;DR: A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
•Proceedings Article
Efficient Estimation of Word Representations in Vector Space
Tomas Mikolov, Kai Chen, Greg S. Corrado, Jeffrey Dean
- 16 Jan 2013
TL;DR: Two novel model architectures for computing continuous vector representations of words from very large data sets are proposed and it is shown that these vectors provide state-of-the-art performance on the authors' test set for measuring syntactic and semantic word similarities.
•Posted Content
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Michael Lewis, Luke Zettlemoyer, Veselin Stoyanov
TL;DR: It is found that BERT was significantly undertrained, and can match or exceed the performance of every model published after it, and the best model achieves state-of-the-art results on GLUE, RACE and SQuAD.
Calibrating noise to sensitivity in private data analysis
Cynthia Dwork, Frank McSherry, Kobbi Nissim, Adam Smith
- 04 Mar 2006
TL;DR: In this article, the authors show that several particular applications need substantially less noise than was previously understood to be the case, and they present separation results showing the increased value of interactive sanitization mechanisms over non-interactive ones.