Jul 2024

Benchmarking Complex Instruction Following with Multiple Constraints

TL;DR: ComplexBench, a new benchmark, evaluates how well LLMs follow complex instructions composed of multiple constraints, and it exposes deficiencies in existing models.