BriefGPT.xyz
Sep, 2024
Unforgettable Generalization in Language Models
Eric Zhang, Leshem Choshen, Jacob Andreas
TL;DR
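This study examines how the behavior of language models changes after they "forget" a skill through fine-tuning on randomized labels, and shows that how far forgetting generalizes differs from task to task and depends on content. The effectiveness of forgetting correlates with the model's initial confidence in its task predictions on the training data and with the variability of its representations. Even after forgetting, linear probes can still perform the task reliably, which suggests that targeted forgetting of a specific skill is difficult and unpredictable.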
Abstract
When language models (LMs) are trained to forget (or "unlearn") a skill, how precisely does their behavior change? We study the behavior of Transformer LMs in which tasks have been forgotten via