Nov, 2022
Self-Destructing Models: Increasing the Costs of Harmful Dual Uses in Foundation Models
Eric Mitchell, Peter Henderson, Christopher D. Manning, Dan Jurafsky, Chelsea Finn
TL;DR
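This work proposes a new training paradigm called "task blocking", which uses meta-learning and adversarial learning techniques to train foundation models with a self-destruct mechanism that impedes adaptation to harmful tasks and thereby reduces their potential risk.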
Abstract
A growing ecosystem of large, open-source foundation models has reduced the labeled data and technical expertise necessary to apply machine learning to many new problems. Yet foundation models pose a clear …