In this paper, we demonstrate the benefits of using a memory-augmented Large Language Model (LLM) architecture to improve the recall of facts from a potentially long context. As a case study, we test LARIMAR, a recently proposed LLM architecture that augments an LLM decoder with an external associative memory, on several long-context recall tasks, including passkey and needle-in-the-haystack tests. We demonstrate that the external memory can be adapted at test time to handle contexts much longer than those seen during training, while keeping readouts from the memory recognizable to the trained decoder and without increasing the GPU memory footprint. Compared with alternative architectures of comparable parameter count on long-context recall tasks, LARIMAR maintains strong performance without any task-specific training.
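As a rough illustration of why a fixed-size external memory can absorb more context without growing its footprint, the sketch below uses a classic linear associative memory: a Hebbian outer-product write rule with orthonormal one-hot keys. This is a swapped-in textbook technique, not LARIMAR's own write/read procedure, and the names `LinearAssociativeMemory` and `one_hot` are hypothetical.

```python
"""Fixed-size linear associative memory (illustrative sketch only).

Assumption: keys are orthonormal and writes use a Hebbian outer-product
rule. This is NOT LARIMAR's actual memory write/read procedure.
"""
from typing import List

Vector = List[float]


class LinearAssociativeMemory:
    def __init__(self, key_dim: int, value_dim: int) -> None:
        # A value_dim x key_dim matrix allocated once; its size does not
        # change no matter how many (key, value) pairs are written.
        self.matrix = [[0.0] * key_dim for _ in range(value_dim)]

    def write(self, key: Vector, value: Vector) -> None:
        # Hebbian update: M <- M + value * key^T (in place, no allocation).
        for i, v in enumerate(value):
            row = self.matrix[i]
            for j, k in enumerate(key):
                row[j] += v * k

    def read(self, key: Vector) -> Vector:
        # Readout M @ key; exact recall only when stored keys are orthonormal.
        return [sum(m * k for m, k in zip(row, key)) for row in self.matrix]

    def num_cells(self) -> int:
        # Total number of stored scalars, used to show the footprint is fixed.
        return sum(len(row) for row in self.matrix)


def one_hot(index: int, dim: int) -> Vector:
    # Hypothetical helper: orthonormal key for slot `index`.
    vec = [0.0] * dim
    vec[index] = 1.0
    return vec


if __name__ == "__main__":
    memory = LinearAssociativeMemory(key_dim=4, value_dim=3)
    size_before = memory.num_cells()
    # Write three "segments" of a longer context under orthogonal keys.
    segments = [[1.0, 0.0, 2.0], [0.0, 5.0, 0.0], [3.0, 3.0, 3.0]]
    for idx, seg in enumerate(segments):
        memory.write(one_hot(idx, 4), seg)
    print(memory.read(one_hot(1, 4)))          # [0.0, 5.0, 0.0]
    print(memory.num_cells() == size_before)   # True
```

Once more segments are written than there are key dimensions, the keys can no longer all be orthogonal and readouts begin to interfere; the memory stays the same size, but recall quality degrades instead.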

Needle in the Haystack for Memory Based Large Language Models