The ability to follow instructions is crucial for Large Language Models (LLMs) to handle various real-world applications. Existing benchmarks primarily focus on evaluating superficial response quality, which does not necessarily indicate instruction-following capability. To fill this research gap, we propose FollowBench, a Multi-level Fine-grained Constraints Following Benchmark for LLMs. FollowBench covers five types of fine-grained constraints: Content, Scenario, Style, Format, and Example. To enable precise estimation of constraint following, we introduce a Multi-level mechanism that incrementally adds a single constraint to the initial instruction at each level. To evaluate whether LLMs' outputs satisfy every individual constraint, we prompt strong LLMs with constraint evolution paths to handle challenging semantic constraints. By evaluating nine popular closed-source and open-source LLMs on FollowBench, we highlight the weaknesses of LLMs in instruction following and point towards potential avenues for future work. The data and code are publicly available at https://github.com/YJiangcm/FollowBench.
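
As a rough illustration of the Multi-level mechanism, the sketch below builds one instruction per level by appending a single constraint at a time, then checks a response against every constraint active at a given level. It is not the FollowBench implementation: the names `Constraint`, `build_levels`, and `satisfied_per_constraint` are hypothetical, the example constraints are invented for illustration, and simple rule-based checks stand in for the LLM-based judging that the benchmark uses for semantic constraints.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Constraint:
    """Hypothetical container: a constraint's wording plus a rule-based check."""
    description: str
    check: Callable[[str], bool]


def build_levels(base_instruction: str, constraints: List[Constraint]) -> List[str]:
    """Return one instruction per level; level k carries the first k constraints."""
    levels = []
    for k in range(1, len(constraints) + 1):
        parts = [base_instruction] + [c.description for c in constraints[:k]]
        levels.append(" ".join(parts))
    return levels


def satisfied_per_constraint(
    response: str, constraints: List[Constraint], level: int
) -> List[bool]:
    """Check the response against each constraint active at the given level."""
    return [c.check(response) for c in constraints[:level]]


# Illustrative constraints (assumptions, not taken from the benchmark data).
constraints = [
    Constraint("Use at most 10 words.", lambda r: len(r.split()) <= 10),
    Constraint("End with an exclamation mark.", lambda r: r.strip().endswith("!")),
]
levels = build_levels("Write a sentence about cats.", constraints)
results = satisfied_per_constraint("Cats sleep most of the day.", constraints, level=2)
```

Here `levels[1]` contains both constraints, and `results` records separately that the length constraint holds while the punctuation constraint does not, which is the kind of per-constraint signal the Multi-level design is meant to expose.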

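This paper proposes FollowBench, a multi-level fine-grained constraints following benchmark for Large Language Models (LLMs). By evaluating how well LLMs follow different types of fine-grained constraints, it reveals their weaknesses in instruction following and points to potential directions for future research.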

FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models