Mar, 2024
Exploring Safety Generalization Challenges of Large Language Models via Code
Qibing Ren, Chang Gao, Jing Shao, Junchi Yan, Xin Tan...
TL;DR
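By converting natural-language inputs into code inputs, the CodeAttack framework exposes a safety generalization problem in large language models and uncovers new safety risks in the code domain. More robust safety alignment algorithms are needed to match the code capabilities of large language models.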
Abstract
The rapid advancement of large language models (LLMs) has brought about remarkable capabilities in natural language processing but also raised concerns about their potential misuse. While strategies like supervis