Natural language processing models have experienced a significant upsurge in recent years, with numerous applications being built upon them. Many of these applications require fine-tuning generic base models on customized, proprietary datasets. This fine-tuning data is especially likely to contain personal or sensitive information about individuals, resulting in increased privacy risk. Membership inference attacks are the most commonly employed attack to assess the privacy leakage of a machine learning model. However, limited research is available on the factors that affect the vulnerability of language models to this kind of attack, or on the applicability of different defense strategies in the language domain. We provide the first systematic review of the vulnerability of fine-tuned large language models to membership inference attacks, the various factors that come into play, and the effectiveness of different defense strategies. We find that some training methods provide significantly reduced privacy risk, with the combination of differential privacy and low-rank adaptors achieving the best privacy protection against these attacks.
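
To make the threat model concrete, the following is a minimal sketch of a generic loss-threshold membership inference attack, scored by AUC. It is an illustration under stated assumptions, not the evaluation protocol of this work: the abstract does not specify which attacks are used, the helper names `attack_auc` and `simulate_losses` are hypothetical, and the per-example losses are simulated with Gaussians rather than taken from a real fine-tuned model.

```python
import random


def attack_auc(member_losses, nonmember_losses):
    """AUC of the rule 'lower loss => more likely a training member'.

    Computed as the fraction of (member, non-member) pairs in which the
    member has the strictly lower loss; ties count as half a win.
    """
    wins = 0.0
    for m in member_losses:
        for n in nonmember_losses:
            if m < n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_losses) * len(nonmember_losses))


def simulate_losses(n, mean, rng):
    # Assumption: a Gaussian stand-in for per-example losses, clipped at 0.
    # A real study would compute these from the fine-tuned model itself.
    return [max(0.0, rng.gauss(mean, 0.5)) for _ in range(n)]


rng = random.Random(0)

# "Leaky" model: training members have visibly lower loss than held-out data.
auc_leaky = attack_auc(simulate_losses(500, 1.0, rng),
                       simulate_losses(500, 2.0, rng))

# "Protected" model: member and non-member losses are indistinguishable.
auc_private = attack_auc(simulate_losses(500, 2.0, rng),
                         simulate_losses(500, 2.0, rng))

print(f"AUC (leaky):     {auc_leaky:.3f}")
print(f"AUC (protected): {auc_private:.3f}")
```

An AUC near 0.5 means the attacker does no better than guessing, while values approaching 1.0 indicate that training membership is easy to infer. A defense reduces membership inference risk to the extent that it moves this number toward 0.5.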

SoK: Reducing the Vulnerability of Fine-tuned Language Models to Membership Inference Attacks