With the rapid development of large language models (LLMs), their safety
concerns have garnered increasing attention. However, Chinese safety benchmarks
for LLMs remain scarce, and existing safety taxonomies are inadequate: they do
not support comprehensive safety detection in authentic Chinese scenarios. In
this work, we introduce CHiSafetyBench, a dedicated
safety benchmark for evaluating LLMs' capabilities in identifying risky content
and refusing to answer risky questions in Chinese contexts. CHiSafetyBench
incorporates a dataset that covers a hierarchical Chinese safety taxonomy
consisting of 5 risk areas and 31 categories. This dataset comprises two types
of tasks: multiple-choice questions and question-answering, evaluating LLMs
from the perspectives of risky content identification and the ability to refuse
to answer risky questions, respectively. Using this benchmark, we validate
the feasibility of automatic evaluation as a substitute for human evaluation
and conduct comprehensive automatic safety assessments on mainstream Chinese
LLMs. Our experiments reveal that performance varies across models and
safety domains, indicating that all models have considerable room for
improvement in Chinese safety capabilities. Our dataset is
publicly available at
this https URL

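This paper introduces CHiSafetyBench, a safety benchmark dedicated to evaluating the ability of large language models to identify risky content and refuse to answer risky questions in Chinese contexts. Using this benchmark, the authors validate the feasibility of automatic evaluation as a substitute for human evaluation and conduct comprehensive automatic safety assessments of mainstream Chinese LLMs. The experiments show that performance varies across models and safety domains, indicating that all models have considerable room for improvement in Chinese safety capabilities.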