Differential Privacy (DP) has been tailored to address the unique challenges of text-to-text privatization. However, text-to-text privatization is known to degrade the performance of language models trained on perturbed text. Applying a series of interpretation techniques to the internal representations extracted from BERT trained on perturbed pre-text, we aim to disentangle, at the linguistic level, the distortion induced by differential privacy. Experimental results from a representational similarity analysis indicate that the overall similarity of internal representations is substantially reduced. Using probing tasks to unpack this dissimilarity, we find evidence that text-to-text privatization affects linguistic competence across several formalisms: the resulting representations encode localized properties of words but fall short at encoding the contextual relationships between spans of words.
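
To make the representational similarity analysis mentioned above concrete, the following is a minimal sketch of one common form of the technique, not necessarily the exact procedure used in this work. It assumes two sets of representations for the same inputs (for instance, from a model trained on original text and one trained on perturbed text), builds a cosine-dissimilarity profile for each set, and correlates the two profiles. The function names (`dissimilarity_vector`, `rsa_score`) and the choice of cosine distance with Pearson correlation are illustrative assumptions.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def dissimilarity_vector(reps):
    """Flatten the upper triangle of the pairwise dissimilarity matrix."""
    return [
        cosine_distance(reps[i], reps[j])
        for i, j in combinations(range(len(reps)), 2)
    ]


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def rsa_score(reps_a, reps_b):
    """Correlate the dissimilarity structure of two representation sets.

    reps_a[i] and reps_b[i] must be representations of the same input;
    the two sets may have different dimensionality.
    """
    if len(reps_a) != len(reps_b):
        raise ValueError("both sets must cover the same inputs")
    return pearson(dissimilarity_vector(reps_a), dissimilarity_vector(reps_b))


# Toy vectors standing in for sentence representations (hypothetical data).
baseline = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
privatized = [[0.9, 0.2], [0.1, 0.7], [0.5, 1.2], [1.5, 0.3]]
score = rsa_score(baseline, privatized)
```

A score near 1 would indicate that the two models organize the inputs similarly, and lower scores indicate diverging representational geometry. Because only pairwise dissimilarities are compared, the two models do not need to share a hidden size.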

Disentangling the Linguistic Competence of Privacy-Preserving BERT