BriefGPT.xyz
Sep, 2024
Membership Inference Attacks Against In-Context Learning
Rui Wen, Zheng Li, Michael Backes, Yang Zhang
TL;DR
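This work addresses the vulnerability of in-context learning (ICL) to privacy attacks when it is used to adapt large language models (LLMs). We propose the first membership inference attack designed specifically for ICL. The results show that, compared with existing probability-based attacks, our attack accurately determines membership status in most cases, with accuracy reaching 95%. We also explore the possibility of combining defense strategies to strengthen privacy protection.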
Abstract
Adapting Large Language Models (LLMs) to specific tasks introduces concerns about computational efficiency, prompting an exploration of efficient methods such as In-Context Learning (ICL). However, the vulnerabil