Today's best language models still struggle with hallucinations, factually incorrect generations that impede their ability to reliably retrieve information seen during training. The reversal curse, where models cannot recall information when probed in a different order than the one encountered during training, exemplifies this failure in information retrieval. We reframe the reversal curse as a factorization curse: a failure of models to learn the same joint distribution under different factorizations. Through a series of controlled experiments with increasing levels of realism, including WikiReversal, a setting we introduce to closely simulate a knowledge-intensive finetuning task, we find that the factorization curse is an inherent failure of the next-token prediction objective used in popular large language models. Moreover, we demonstrate that reliable information retrieval cannot be solved with scale, reversed tokens, or even naive bidirectional-attention training. Consequently, various approaches to finetuning on specialized data would necessarily provide mixed results on downstream tasks, unless the model has already seen the right sequence of tokens. Across five tasks of varying levels of complexity, our results uncover a promising path forward: factorization-agnostic objectives can significantly mitigate the reversal curse and hint at improved knowledge storage and planning capabilities.
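
As an illustrative sketch of what "the same joint distribution under different factorizations" means, consider a two-token sequence $(x_1, x_2)$, for example a name followed by a description. The notation below ($x_t$, $\sigma$, $n$) is assumed here for illustration and is not taken from the paper. The chain rule lets the same joint distribution be written in either order, and more generally in any ordering $\sigma$ of the $n$ positions:

```latex
% Two equivalent chain-rule factorizations of one joint distribution
p(x_1, x_2) \;=\; p(x_1)\, p(x_2 \mid x_1) \;=\; p(x_2)\, p(x_1 \mid x_2)

% General case: any ordering \sigma of the n positions gives the same joint
p(x_1, \dots, x_n) \;=\; \prod_{t=1}^{n} p\bigl(x_{\sigma(t)} \mid x_{\sigma(1)}, \dots, x_{\sigma(t-1)}\bigr)
```

Left-to-right next-token prediction fits only the identity ordering $\sigma(t) = t$. A model trained that way can estimate $p(x_2 \mid x_1)$ well while its implied $p(x_1 \mid x_2)$ remains poor. On this reading, the reversal curse is the case where the query asks for the swapped order, and a factorization-agnostic objective is, roughly, one that does not privilege a single ordering.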


The Factorization Curse: Which Tokens You Predict Underlie the Reversal Curse and More