This study presents NewsBench, a novel benchmark framework developed to evaluate the capability of Large Language Models (LLMs) in Chinese Journalistic Writing Proficiency (JWP) and their Safety Adherence (SA), addressing the gap between journalistic ethics and the risks associated with AI utilization. Comprising 1,267 tasks across 5 editorial applications, 7 aspects (including safety and journalistic writing with 4 detailed facets), and 24 news topic domains, NewsBench employs two GPT-4-based automatic evaluation protocols validated by human assessment. Our comprehensive analysis of 11 LLMs identified GPT-4 and ERNIE Bot as the top performers, yet revealed a relative deficiency in adherence to journalistic ethics during creative writing tasks. These findings underscore the need for enhanced ethical guidance in AI-generated journalistic content, marking a step forward in aligning AI capabilities with journalistic standards and safety considerations.

NewsBench: Systematic Evaluation of LLMs for Writing Proficiency and Safety Adherence in Chinese Journalistic Editorial Applications