In reinforcement learning (RL), memory models such as RNNs and transformers address
Partially Observable Markov Decision Processes (POMDPs) by mapping trajectories to latent
Markov states. Neither model scales particularly well to long sequences,
especially compared to an emerging class of memory models sometimes called
linear recurrent models. We discover that the recurrent update of these models
is a monoid, leading us to formally define a novel memory monoid framework. We
revisit the traditional approach to batching in recurrent RL, highlighting both
theoretical and empirical deficiencies. Leveraging the properties of memory
monoids, we propose a new batching method that improves sample efficiency,
increases the return, and simplifies the implementation of recurrent loss
functions in RL.
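
To make the monoid claim concrete, the following is a minimal sketch and not the paper's implementation. It assumes a scalar linear recurrence h_t = a_t * h_{t-1} + b_t as a stand-in for a linear recurrent model; the names `combine`, `IDENTITY`, `sequential_states`, and `tree_reduce` are hypothetical. Each step is represented as a pair (a, b), composition of two steps is associative, and (1, 0) acts as the identity, so the steps can be combined in any grouping rather than strictly left to right.

```python
# Assumption: a scalar linear recurrence h_t = a_t * h_{t-1} + b_t.
# Each element (a, b) represents the affine map h -> a * h + b.
IDENTITY = (1.0, 0.0)  # the map h -> h


def combine(left, right):
    """Compose two updates: apply `left` first, then `right`."""
    a1, b1 = left
    a2, b2 = right
    # right(left(h)) = a2 * (a1 * h + b1) + b2
    return (a1 * a2, a2 * b1 + b2)


def sequential_states(elements, h0=0.0):
    """Reference implementation: step through the recurrence one element at a time."""
    states = []
    h = h0
    for a, b in elements:
        h = a * h + b
        states.append(h)
    return states


def tree_reduce(elements):
    """Combine elements in a balanced tree; valid because `combine` is associative."""
    if not elements:
        return IDENTITY
    if len(elements) == 1:
        return elements[0]
    mid = len(elements) // 2
    return combine(tree_reduce(elements[:mid]), tree_reduce(elements[mid:]))


elements = [(0.5, 1.0), (0.25, 2.0), (2.0, -1.0), (0.5, 4.0)]
h0 = 0.0

final_sequential = sequential_states(elements, h0)[-1]
a_total, b_total = tree_reduce(elements)
final_tree = a_total * h0 + b_total  # apply the combined map to the initial state
```

Because the grouping does not matter, the two halves of `tree_reduce` are independent of each other, which is the property that allows such a recurrence to be evaluated with a parallel scan instead of a strictly sequential loop.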
