Large language models have revolutionized the field of NLP by achieving state-of-the-art performance on various tasks. However, there is a concern that these models may disclose information about their training data. In this study, we focus on the summarization task and investigate the membership inference (MI) attack: given a sample and black-box access to a model's API, is it possible to determine whether the sample was part of the training data? We exploit text similarity and the model's resistance to document modifications as potential MI signals and evaluate their effectiveness on widely used datasets. Our results demonstrate that summarization models are at risk of exposing data membership, even when the reference summary is not available. Furthermore, we discuss several safeguards for training summarization models to protect against MI attacks, as well as the inherent trade-off between privacy and utility.
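The two signals named in the abstract can be illustrated with a minimal sketch. It is not the attack evaluated in this study, and everything in it is an assumption made for illustration: `summarize` stands for any black-box callable that wraps the model's API, `unigram_f1` is a simple stand-in for a text similarity metric, `drop_words` is one hypothetical way to modify a document, and the decision threshold would have to be calibrated on data with known membership. The sketch also assumes that a higher score suggests membership.

```python
import random
from collections import Counter
from typing import Callable


def unigram_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1, a simple stand-in for a text similarity metric."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # multiset intersection
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def similarity_signal(summarize: Callable[[str], str],
                      document: str, reference: str) -> float:
    """Signal 1: similarity between the model's summary and the reference."""
    return unigram_f1(summarize(document), reference)


def drop_words(document: str, rate: float, rng: random.Random) -> str:
    """Hypothetical document modification: randomly drop a fraction of words."""
    kept = [w for w in document.split() if rng.random() >= rate]
    return " ".join(kept) if kept else document


def robustness_signal(summarize: Callable[[str], str], document: str,
                      n_variants: int = 5, rate: float = 0.1,
                      seed: int = 0) -> float:
    """Signal 2 (no reference needed): how stable the summary stays
    when the input document is modified."""
    rng = random.Random(seed)
    base = summarize(document)
    scores = [
        unigram_f1(summarize(drop_words(document, rate, rng)), base)
        for _ in range(n_variants)
    ]
    return sum(scores) / len(scores)


def predict_member(signal: float, threshold: float) -> bool:
    """Assumed decision rule: flag the sample as a training member
    when the signal reaches the calibrated threshold."""
    return signal >= threshold
```

The second signal needs only the document itself, which corresponds to the setting where the reference summary is not available.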


Assessing Privacy Risks in Language Models: A Case Study on Summarization Tasks