An important open question in computational neuroscience is how spatially tuned neurons, such as place cells, support an animal's learning of reward-seeking behavior. Existing computational models either lack biological plausibility or lack behavioral flexibility when the environment changes. In this paper, we propose a computational theory that achieves behavioral flexibility with greater biological plausibility. We first train a mixture of Gaussian distributions to model the ensemble of place-cell firing fields. We then propose a Hebbian-like rule to learn the matrix of synaptic strengths among place cells. This matrix is interpreted as the transition rate matrix of a continuous-time Markov chain that generates the sequential replay of place cells. During replay, the synaptic strengths from place cells to medium spiny neurons (MSNs) are learned by a temporal-difference-like rule to store place-reward associations. After replay, MSN activation ramps up as the animal approaches the rewarding place, so the animal can find that place by moving in the direction in which MSN activation increases. We implement our theory in a high-fidelity virtual rat in the MuJoCo physics simulator. In a complex maze, this rat shows significantly better learning efficiency and behavioral flexibility than a rat that implements deep Q-network (DQN), a neuroscience-inspired reinforcement learning algorithm.
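The replay mechanism can be illustrated with a minimal sketch that rests on several assumptions:

- Isotropic Gaussian place fields sit on a fixed grid in a unit arena. These stand in for the fitted mixture components.
- A symmetric co-activation rule plays the role of the Hebbian-like rule. The paper's exact rule is not given here.
- The learned weights serve directly as the off-diagonal entries of the transition rate matrix.

Replay is sampled with the standard Gillespie algorithm. The holding time in cell i is exponential, with rate equal to the sum of its outgoing rates, and the next cell is chosen in proportion to those rates. All function names and parameter values are hypothetical.

```python
import numpy as np

def place_fields(x, centers, sigma):
    """Firing rates of all place cells at position x (isotropic Gaussian fields)."""
    d2 = np.sum((centers - x) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def hebbian_weights(positions, centers, sigma, eta=0.01):
    """Hypothetical Hebbian-like rule: accumulate pairwise co-activation."""
    n = len(centers)
    W = np.zeros((n, n))
    for x in positions:
        r = place_fields(x, centers, sigma)
        W += eta * np.outer(r, r)        # cells that fire together wire together
    np.fill_diagonal(W, 0.0)             # no self-synapses
    return W

def rate_matrix(W):
    """Use non-negative weights as off-diagonal CTMC rates; rows sum to zero."""
    Q = W.copy()
    np.fill_diagonal(Q, 0.0)
    np.fill_diagonal(Q, -Q.sum(axis=1))
    return Q

def replay(Q, start, n_jumps, rng):
    """Gillespie simulation of the CTMC: replayed cells and their onset times."""
    states, times = [start], [0.0]
    i, t = start, 0.0
    for _ in range(n_jumps):
        out_rate = -Q[i, i]
        if out_rate <= 0.0:              # cell with no outgoing synapses: stop
            break
        t += rng.exponential(1.0 / out_rate)   # holding time ~ Exp(out_rate)
        p = np.clip(Q[i], 0.0, None)     # off-diagonal rates (diagonal -> 0)
        i = int(rng.choice(len(p), p=p / p.sum()))
        states.append(i)
        times.append(t)
    return states, times

rng = np.random.default_rng(0)
grid = np.linspace(0.1, 0.9, 5)
centers = np.array([[a, b] for a in grid for b in grid])   # 25 place cells
positions = rng.uniform(0.0, 1.0, size=(2000, 2))          # stand-in for exploration
W = hebbian_weights(positions, centers, sigma=0.15)
Q = rate_matrix(W)
states, times = replay(Q, start=0, n_jumps=20, rng=rng)
print(states)
```

Co-activation of two cells decays with the distance between their field centers. As a result, replayed jumps in this sketch tend to go to cells with nearby fields, and the sampled sequence traces a path through the arena.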

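This paper proposes a computational theory that models place-cell activity with a mixture of Gaussian distributions and generates place-cell replay with a continuous-time Markov chain whose transition rate matrix is given by the synaptic strengths among place cells. By learning associations between places and rewards, the theory improves the flexibility of the rat's behavior. Experimental results show that, compared with deep Q-network, a neuroscience-inspired reinforcement learning algorithm, the theory significantly improves the rat's learning efficiency and behavioral flexibility.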
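The place-reward association and the navigation step can be sketched under the following assumptions:

- Six place cells lie on a linear track with nearest-neighbor transition rates.
- Each replayed cell is active alone (one-hot).
- A single MSN reads out the weights w.
- The reward is terminal at the goal cell.
- The value passed back from the next replayed cell is discounted by exp(-dt/tau), where dt is the holding time between the two cells.

The paper's exact temporal-difference-like rule may differ, and every name and parameter here is hypothetical. After replay, the simulated animal reads the MSN activation through Gaussian place fields at its current position. It then takes fixed-size steps in the direction in which that activation increases.

```python
import numpy as np

def track_rate_matrix(n, rate=1.0):
    """Nearest-neighbor CTMC generator for n place cells on a linear track."""
    Q = np.zeros((n, n))
    for i in range(n - 1):
        Q[i, i + 1] = Q[i + 1, i] = rate
    np.fill_diagonal(Q, -Q.sum(axis=1))
    return Q

def td_replay(Q, goal, episodes=3000, tau=5.0, reward=1.0, seed=0):
    """Learn place-cell -> MSN weights with a TD-like rule during replay.

    Each episode replays from a random cell until the rewarded cell fires;
    the discount between consecutive replayed cells is exp(-dt / tau),
    where dt is the CTMC holding time.
    """
    rng = np.random.default_rng(seed)
    n = len(Q)
    w = np.zeros(n)
    starts = [c for c in range(n) if c != goal]
    for ep in range(episodes):
        alpha = 0.1 / (1.0 + ep / 200.0)      # decaying learning rate
        i = int(rng.choice(starts))
        for _ in range(10_000):               # safety cap on replay length
            if i == goal:                     # terminal: store the reward
                w[i] += alpha * (reward - w[i])
                break
            dt = rng.exponential(1.0 / -Q[i, i])
            p = np.clip(Q[i], 0.0, None)      # diagonal is negative -> 0
            j = int(rng.choice(n, p=p / p.sum()))
            delta = np.exp(-dt / tau) * w[j] - w[i]   # TD error
            w[i] += alpha * delta
            i = j
    return w

def msn_activation_grad(x, w, centers, sigma):
    """d/dx of the readout v(x) = sum_i w_i * exp(-(x - c_i)^2 / (2 sigma^2))."""
    phi = np.exp(-(x - centers) ** 2 / (2 * sigma ** 2))
    return np.sum(w * phi * (centers - x) / sigma ** 2)

def navigate(x, w, centers, sigma, step=0.05, n_steps=200):
    """Step in the direction in which the MSN activation increases."""
    for _ in range(n_steps):
        x += step * np.sign(msn_activation_grad(x, w, centers, sigma))
    return x

Q = track_rate_matrix(6)
w = td_replay(Q, goal=4)
centers = np.arange(6, dtype=float)           # one field center per cell
x_final = navigate(0.0, w, centers, sigma=0.6)
print(np.round(w, 2), round(float(x_final), 2))
```

The learned weights peak at the rewarded cell and fall off with distance along the replayed paths, which produces a ramping MSN readout. Only the place-to-MSN weights depend on the reward location. Relocating the reward in this sketch therefore requires only another round of replay, not new place fields or new place-cell synapses.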

A Computational Theory of Learning Flexible Reward-Seeking Behavior with Place Cells