March 2024

JailbreakBench: A Benchmark for Evaluating the Jailbreak Robustness of Large Language Models

TL;DR: JailbreakBench is an open-source benchmark for evaluating jailbreak attacks on large language models. It addresses challenges in existing work such as the absence of a standard evaluation practice, cost and success-rate computations that are not comparable across studies, and a lack of reproducibility.