BriefGPT.xyz
Feb, 2024
Pandora's White-Box: Increased Training Data Leakage in Open LLMs
Jeffrey G. Wang, Jason Wang, Marvin Li, Seth Neel
TL;DR
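This work presents a systematic study of privacy attacks on open-source large language models. It proposes membership inference attacks that threaten both pretrained and fine-tuned models and demonstrates near-perfect attack performance, underscoring the need for great caution before fine-tuning on highly sensitive data and deploying the resulting models.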
Abstract
In this paper we undertake a systematic study of privacy attacks against open source large language models (LLMs), where an adversary has …