BriefGPT.xyz
Oct, 2023
On the Safety of Open-Sourced Large Language Models: Does Alignment Really Prevent Them From Being Misused?
Hangfan Zhang, Zhimeng Guo, Huaisheng Zhu, Bochuan Cao, Lu Lin...
TL;DR
By directly manipulating the generation process of open-sourced large language models, we demonstrate that they can easily be steered to produce undesired content, including harmful or biased information and even private data, indicating the need for more advanced mitigation strategies for open-sourced LLMs.
Abstract
Large language models (LLMs) have achieved unprecedented performance in Natural Language Generation (NLG) tasks. However, many existing studies have shown that they could be misused to generate undesired content.