October 2023
On the Safety of Open-Sourced Large Language Models: Does Alignment Really Prevent Them From Being Misused?
Hangfan Zhang, Zhimeng Guo, Huaisheng Zhu, Bochuan Cao, Lu Lin, et al.
TL;DR: By directly manipulating the generation process of open-sourced large language models, we demonstrate that they can easily be steered into producing undesired content, including harmful or biased information and even private data. This indicates the need for more advanced mitigation strategies for open-sourced LLMs.