Oct, 2023
Multilingual Jailbreak Challenges in Large Language Models
Yue Deng, Wenxuan Zhang, Sinno Jialin Pan, Lidong Bing
TL;DR
Large language models (LLMs) pose potential safety risks, making preventive measures necessary. This study reveals multilingual jailbreak challenges in LLMs and examines both unintentional and intentional (malicious) risk scenarios. Experimental results show that training with the Self-Defense framework significantly reduces the unsafe content LLMs generate in multilingual settings.
Abstract
While large language models (LLMs) exhibit remarkable capabilities across a wide range of tasks, they pose potential safety concerns, such as the ``jailbreak'' problem, wherein malicious instructions can manipula…