BriefGPT.xyz
Feb, 2024
Lissard: Long and Simple Sequential Reasoning Datasets
Mirelle Bueno, Roberto Lotufo, Rodrigo Nogueira
TL;DR
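The paper introduces Lissard, a benchmark of seven tasks designed to evaluate how well models process and generate sequences of varying lengths that require repeated application of operations. The evaluation shows that both open-source models (Mistral-7B and Mixtral-8x7B) and proprietary models (GPT-3.5 and GPT-4) exhibit a consistent decline in performance as sequence complexity increases.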
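To make the idea of "repeated operations over sequences of varying length" concrete, here is a minimal sketch of a length-bucketed evaluation. It is not one of Lissard's seven tasks: the toy flip/keep rule, the prompt wording, and the names `make_example`, `accuracy_by_length`, and `oracle` are all assumptions made for illustration only.

```python
import random


def make_example(length, rng):
    """Build one toy instance (hypothetical task, not from the paper).

    The single rule is: "flip" toggles a bit, "keep" leaves it unchanged.
    The rule is applied once per step, so difficulty scales with `length`.
    """
    steps = [rng.choice(["flip", "keep"]) for _ in range(length)]
    state = 0
    for step in steps:
        if step == "flip":
            state = 1 - state
    prompt = "Start at 0. " + " ".join(steps) + ". Final value?"
    return prompt, str(state)


def accuracy_by_length(model_fn, lengths, n_per_length=20, seed=0):
    """Return {length: accuracy} for a callable mapping prompt -> answer string."""
    rng = random.Random(seed)
    results = {}
    for length in lengths:
        correct = 0
        for _ in range(n_per_length):
            prompt, answer = make_example(length, rng)
            if model_fn(prompt).strip() == answer:
                correct += 1
        results[length] = correct / n_per_length
    return results


def oracle(prompt):
    """Reference solver that parses the prompt; stands in for a real model."""
    steps = prompt.split(". ")[1].split()
    return str(steps.count("flip") % 2)


if __name__ == "__main__":
    # A real evaluation would replace `oracle` with a call to a language model.
    print(accuracy_by_length(oracle, lengths=[1, 10, 100], n_per_length=5))
```

Grouping accuracy by sequence length, as `accuracy_by_length` does, is one simple way to expose the kind of degradation the summary describes; swapping `oracle` for a model call would produce the per-length curve.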
Abstract
Language models are now capable of solving tasks that require dealing with long sequences consisting of hundreds of thousands of tokens. However, they often fail on tasks that require repetitive use of simple rules ...