BriefGPT.xyz
Feb, 2024
Comprehensive Assessment of Jailbreak Attacks Against LLMs
Junjie Chu, Yugeng Liu, Ziqing Yang, Xinyue Shen, Michael Backes...
TL;DR
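This work studies the misuse of large language models (LLMs) and finds that jailbreak attacks can bypass the safeguards meant to keep models aligned with social ethics. It covers different jailbreak methods and violation categories, reports the attack effectiveness of jailbreak prompts, and examines how well jailbreak attacks transfer across models. The study underscores the need to evaluate different jailbreak methods, offers insights for future research, and provides a benchmark tool for practitioners to assess jailbreak attacks.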
Abstract
Misuse of the large language models (LLMs) has raised widespread concern. To address this issue, safeguards have been taken to ensure that LLMs align with social ethics. However, recent findings have revealed an ...