Large Language Models (LLMs) have the capacity to store and recall facts.
Through experimentation with open-source models, we observe that this ability
to retrieve facts can be easily manipulated by changing contexts, even without
altering their factual meanings. These findings highlight that LLMs might
behave like an associative memory model where certain tokens in the contexts
serve as clues to retrieving facts. We mathematically explore this property by
studying how transformers, the building blocks of LLMs, can complete such
memory tasks. We study a simple latent concept association problem with a
one-layer transformer and we show theoretically and empirically that the
transformer gathers information using self-attention and uses the value matrix
for associative memory.
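
The division of labor described above (self-attention gathers information from the context, and the value matrix acts as an associative memory) can be illustrated with a toy sketch. This is not the paper's construction: the one-hot embeddings, the `QUERY` vector, the labels, and the outer-product value matrix are all assumptions chosen so that the example stays small and deterministic.

```python
import math

# Hypothetical toy vocabulary: one-hot token embeddings in 4 dimensions.
CLUE_FRANCE = [1.0, 0.0, 0.0, 0.0]
CLUE_ITALY = [0.0, 1.0, 0.0, 0.0]
FILLER_A = [0.0, 0.0, 1.0, 0.0]
FILLER_B = [0.0, 0.0, 0.0, 1.0]
LABELS = ["France", "Italy"]

# Assumed query vector: it scores clue tokens higher than filler tokens.
QUERY = [2.0, 2.0, 0.0, 0.0]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, context):
    """Dot-product attention: softmax-weighted average of context vectors."""
    scores = [sum(q * c for q, c in zip(query, tok)) for tok in context]
    weights = softmax(scores)
    dim = len(context[0])
    return [sum(w * tok[j] for w, tok in zip(weights, context)) for j in range(dim)]


def build_value_matrix(keys, values):
    """Linear associative memory: W = sum_i values[i] * keys[i]^T."""
    rows, cols = len(values[0]), len(keys[0])
    W = [[0.0] * cols for _ in range(rows)]
    for k, v in zip(keys, values):
        for r in range(rows):
            for c in range(cols):
                W[r][c] += v[r] * k[c]
    return W


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


# Store two associations: each clue token maps to a one-hot output label.
VALUE_MATRIX = build_value_matrix(
    [CLUE_FRANCE, CLUE_ITALY], [[1.0, 0.0], [0.0, 1.0]]
)


def recall(context):
    """Gather context with attention, then read the memory with the value matrix."""
    pooled = attend(QUERY, context)
    logits = matvec(VALUE_MATRIX, pooled)
    return LABELS[logits.index(max(logits))]
```

In this sketch, `recall([CLUE_FRANCE, FILLER_A, FILLER_B])` returns `"France"`, while `recall([CLUE_FRANCE, CLUE_ITALY, CLUE_ITALY])` returns `"Italy"`: adding competing clue tokens to the context changes what is retrieved, even though the stored associations are untouched.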

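Large language models can store and retrieve facts, and this retrieval ability can be manipulated by changing the context, which suggests that they may behave like associative memory models in which certain context tokens serve as clues for retrieving facts. We explore this property mathematically by studying how transformers complete such memory tasks: using a one-layer transformer on a simple latent concept association problem, we show theoretically and empirically that the transformer gathers information with self-attention and uses the value matrix for associative memory.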

Do LLMs dream of elephants (when told not to)? Latent concept association and associative memory in transformers

The acquisition and performance of arithmetic skills and basic operations
such as addition, subtraction, multiplication, and division are essential for
daily functioning, and reflect complex cognitive processes. This paper explores
the cognitive mechanisms powering arithmetic learning, presenting a
neurobiologically plausible cognitive architecture that simulates the
acquisition of these skills. I implement a number vectorization embedding
network and an associative memory model to investigate how an intelligent
system can learn and recall arithmetic equations in a manner analogous to the
human brain. I perform experiments that provide insights into the
generalization capabilities of connectionist models, neurological causes of
dyscalculia, and the influence of network architecture on cognitive
performance. Through this interdisciplinary investigation, I aim to contribute
to ongoing research into the neural correlates of mathematical cognition in
intelligent systems.

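By implementing a number vectorization embedding network and an associative memory model, this study explores the cognitive mechanisms that support arithmetic learning, simulating the acquisition of these skills with a neurobiologically plausible cognitive architecture. The experiments shed light on the generalization capabilities of connectionist models, the neurological causes of dyscalculia, and the influence of network architecture on cognitive performance. This interdisciplinary study aims to contribute to ongoing research into the neural correlates of mathematical cognition in intelligent systems.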

Exploring a Cognitive Architecture for Learning Arithmetic Equations

While Attention has come to be an important mechanism in deep learning, there
remains limited intuition for why it works so well. Here, we show that
Transformer Attention can be closely related under certain data conditions to
Kanerva's Sparse Distributed Memory (SDM), a biologically plausible associative
memory model. We confirm that these conditions are satisfied in pre-trained
GPT2 Transformer models. We discuss the implications of the Attention-SDM map
and provide new computational and biological interpretations of Attention.
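
One way to build intuition for why softmax attention can resemble a distance-based associative memory is sketched below. It covers only the attention side and does not implement SDM's neurons or read/write circles. The bipolar patterns, the `beta` value, and the function names are assumptions for illustration. For bipolar vectors of length n, the dot product equals n minus twice the Hamming distance, so softmax weights decay exponentially with Hamming distance from the query.

```python
import math


def hamming(a, b):
    """Number of positions where two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax_read(query, keys, values, beta):
    """Attention-style read: softmax(beta * q.k) weights over stored values."""
    scores = [beta * dot(query, k) for k in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return weights, out


# Hypothetical stored patterns (bipolar, length 8), used as both keys and values.
PATTERNS = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [-1, -1, -1, -1, 1, 1, 1, 1],
    [-1, -1, -1, -1, -1, -1, -1, -1],
]

# Noisy query: the first pattern with its first bit flipped.
NOISY_QUERY = [-1, 1, 1, 1, 1, 1, 1, 1]
BETA = 0.5  # assumed inverse temperature

WEIGHTS, OUTPUT = softmax_read(NOISY_QUERY, PATTERNS, PATTERNS, BETA)
DISTANCES = [hamming(NOISY_QUERY, p) for p in PATTERNS]
# Threshold the weighted sum back to a bipolar pattern.
RECOVERED = [1 if x > 0 else -1 for x in OUTPUT]
```

Here the ratio of any two attention weights equals `exp(-2 * BETA * (d_i - d_j))`, where `d_i` and `d_j` are the Hamming distances in `DISTANCES`, and thresholding `OUTPUT` recovers the first stored pattern from the noisy query.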

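This study finds that, under certain data conditions, Transformer Attention is closely related to Kanerva's Sparse Distributed Memory, which provides new computational and biological interpretations of Attention. It further confirms that pre-trained GPT2 Transformer models satisfy these conditions.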