With the rapid adoption of Federated Learning (FL) as the training and tuning protocol for applications built on Large Language Models (LLMs), recent research highlights the need for significant modifications to FL to accommodate the large scale of LLMs. While substantial adjustments to the protocol have been introduced in response, a comprehensive privacy analysis of the adapted FL protocol is currently lacking. To address this gap, we conduct an extensive privacy analysis of FL when used for training LLMs, from both theoretical and practical perspectives. In particular, we design two active membership inference attacks with guaranteed theoretical success rates to assess the privacy leakage of various adapted FL configurations. We translate our theoretical findings into practical attacks, which reveal substantial privacy vulnerabilities in popular LLMs, including BERT, RoBERTa, DistilBERT, and OpenAI's GPTs, across multiple real-world language datasets. Additionally, we conduct thorough experiments to evaluate the privacy leakage of these models when data is protected by state-of-the-art differential privacy (DP) mechanisms.

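Our work presents an extensive privacy analysis of federated learning when used to train large language models. From both theoretical and practical perspectives, we design two active membership inference attacks with theoretical success rates, reveal substantial privacy vulnerabilities in models including BERT, RoBERTa, DistilBERT, and OpenAI's GPTs across multiple real-world language datasets, and evaluate the privacy leakage of these models when data is protected by state-of-the-art differential privacy mechanisms.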

Analysis of Privacy Leakage in Federated Large Language Models