Gaël Gendron, Qiming Bao, Michael Witbrock, Gillian Dobbie
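TL;DR: This paper evaluates the latest Large Language Models on abstract reasoning tasks and finds that their performance is very limited compared with their performance on other natural language processing tasks. The authors investigate the causes of this gap and propose a new benchmark for evaluating abstract reasoning tasks in natural language processing.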
Abstract
Large language models have shown tremendous performance on a large variety of natural language processing tasks, ranging from text comprehension to common sense reasoning. However, the mechanisms responsible for