With the rapid development of Large Language Models (LLMs), increasing
attention has been paid to their safety concerns. Consequently, evaluating the
safety of LLMs has become an essential task for facilitating the broad
applications of LLMs. Nevertheless, the absence of comprehensive safety
evaluation benchmarks poses a significant impediment to effectively assess and
enhance the safety of LLMs. In this work, we present SafetyBench, a
comprehensive benchmark for evaluating the safety of LLMs, which comprises
11,435 diverse multiple choice questions spanning across 7 distinct categories
of safety concerns. Notably, SafetyBench also incorporates both Chinese and
English data, facilitating the evaluation in both languages. Our extensive
tests over 25 popular Chinese and English LLMs in both zero-shot and few-shot
settings reveal a substantial performance advantage for GPT-4 over its
counterparts, and there is still significant room for improving the safety of
current LLMs. We believe SafetyBench will enable fast and comprehensive
evaluation of LLMs' safety, and foster the development of safer LLMs. Data and
evaluation guidelines are available at this https URL
Submission entrance and leaderboard are available at
this https URL

以 SafetyBench 为基础，该研究设计了一个综合评估大型语言模型安全性的基准测试工具，包括 11435 个不同类别的问题，并能够提供两种语言（中文和英文）的评估结果，通过测试发现 GPT-4 在安全性方面相较于其他模型表现突出，但当前大型语言模型的安全性仍有提升空间。