We investigate the capabilities of transformer large language models (LLMs) on relational reasoning tasks involving abstract symbols. Such tasks have long been studied in the neuroscience literature as fundamental building blocks for more complex abilities in programming, mathematics, and verbal reasoning. For (i) regression tasks, we prove that transformers generalize when trained, but require astonishingly large quantities of training data. For (ii) next-token-prediction tasks with symbolic labels, we show an "inverse scaling law": transformers fail to generalize as their embedding dimension increases. For both settings (i) and (ii), we propose subtle transformer modifications that add two trainable parameters per head and can reduce the amount of data needed.

When can transformers reason with abstract symbols?