BriefGPT.xyz
Jun, 2024
The SIFo Benchmark: Investigating the Sequential Instruction Following Ability of Large Language Models
Xinyi Chen, Baohao Liao, Jirui Qi, Panagiotis Eustratiadis, Christof Monz...
TL;DR
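Evaluating the ability of large language models (LLMs) to follow multiple instructions poses many challenges. To address them, we introduce a benchmark that evaluates this ability through sequential instruction following tasks.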
Abstract
Following multiple instructions is a crucial ability for large language models (LLMs). Evaluating this ability comes with significant challenges: (i) limited coherence between...