BriefGPT.xyz
Jan, 2024
Benchmarking Large Language Models on Controllable Generation under Diversified Instructions
Yihan Chen, Benfeng Xu, Quan Wang, Yi Liu, Zhendong Mao
TL;DR
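We propose a new benchmark, CoDI-Eval, that systematically and comprehensively evaluates how LLMs respond to instructions carrying various constraints. It reveals their limitations in following instructions under specific constraints, as well as a significant gap between open-source and closed-source LLMs.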
Abstract
While large language models (LLMs) have exhibited impressive instruction-following capabilities, it is still unclear whether and to what extent they can respond to explicit constraints that might be entailed in v