In value-based deep reinforcement learning with replay memories, the batch
size parameter specifies how many transitions to sample for each gradient
update. Although critical to the learning process, this value is typically not
adjusted when proposing new algorithms. In this work we present a broad
empirical study that suggests reducing the batch size can result in a
number of significant performance gains; this is surprising, as the general
tendency when training neural networks is towards larger batch sizes for
improved performance. We complement our experimental findings with a set of
empirical analyses aimed at better understanding this phenomenon.
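
The minimal sketch below illustrates the batch size parameter described above. It shows a uniform replay memory whose `sample` method draws `batch_size` transitions for a single gradient update. The names `Transition` and `ReplayMemory`, the FIFO eviction policy, and uniform sampling without replacement are illustrative assumptions, not the setup used in the study.

```python
import random
from collections import deque
from typing import Deque, List, NamedTuple


class Transition(NamedTuple):
    """One environment step (s, a, r, s', done) kept in the replay memory."""
    state: tuple
    action: int
    reward: float
    next_state: tuple
    done: bool


class ReplayMemory:
    """Hypothetical fixed-capacity FIFO replay memory with uniform sampling."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        # deque(maxlen=...) silently evicts the oldest transition when full.
        self.buffer: Deque[Transition] = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, transition: Transition) -> None:
        self.buffer.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement: the batch mixes old and new
        # transitions, breaking the temporal correlation of consecutive steps.
        # Raises ValueError if batch_size exceeds the number stored.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self) -> int:
        return len(self.buffer)


if __name__ == "__main__":
    memory = ReplayMemory(capacity=10_000)
    for t in range(1_000):
        # Dummy transitions standing in for real environment steps.
        memory.add(Transition((float(t),), t % 4, 1.0, (float(t + 1),), False))
    # 32 is a common DQN-style default; 8 is an illustrative smaller value.
    for batch_size in (32, 8):
        batch = memory.sample(batch_size)
        print(f"batch_size={batch_size}: {len(batch)} transitions for this update")
```

In this sketch, switching to a smaller batch changes only the argument to `sample`. Each gradient update then averages its loss over fewer transitions.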

Small batch deep reinforcement learning

Deep reinforcement learning agents often suffer from catastrophic forgetting:
when training on new data, they forget previously found solutions in parts of
the input space. Replay memories are a common solution to this problem,
decorrelating and shuffling old and new training samples. However, they
naively store state transitions as they arrive, without regard for redundancy.
We introduce a novel, cognitively inspired replay memory approach based on the
Grow-When-Required (GWR) self-organizing network, which resembles a map-based
mental model of the world. Our approach organizes stored transitions into a
concise, environment-model-like network of state nodes and transition edges,
merging similar samples to reduce the memory size and increase the pairwise
distance among samples, which increases the relevance of each sample. Overall,
our paper shows that map-based experience replay allows for significant memory
reduction with only small performance decreases.

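In short, the study adopts a cognitively inspired replay memory strategy. It builds a map-based experience replay store that reduces memory size and increases the relevance of each sample. This mitigates the tendency of deep reinforcement learning agents to forget previously found solutions when training on new data.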
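
The following is a simplified, GWR-inspired sketch of a map-based replay memory of the kind described above. It is not the authors' implementation. Each incoming state either merges into its best-matching node or grows a new node. The growth test uses a node activity of exp(-distance), compared against a threshold. Transitions are stored as (source node, action, destination node) edges with a running-mean reward. The class `MapReplayMemory` and its parameters `activity_threshold` and `learning_rate` are hypothetical. Full GWR also uses habituation counters, edge aging, and inserts new nodes between the two best-matching nodes, all of which are omitted here.

```python
import math
from typing import Dict, List, Tuple

Vector = Tuple[float, ...]


def _dist(a: Vector, b: Vector) -> float:
    """Euclidean distance between two state vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class MapReplayMemory:
    """Hypothetical, simplified GWR-style map: nodes are states, edges are transitions."""

    def __init__(self, activity_threshold: float = 0.8, learning_rate: float = 0.1) -> None:
        self.activity_threshold = activity_threshold
        self.learning_rate = learning_rate
        self.nodes: List[List[float]] = []                    # node weight vectors
        self.edges: Dict[Tuple[int, int, int], float] = {}    # (src, action, dst) -> mean reward
        self.edge_counts: Dict[Tuple[int, int, int], int] = {}

    def _node_for(self, state: Vector) -> int:
        """Return the node representing `state`, merging into it or growing a new one."""
        if self.nodes:
            best = min(range(len(self.nodes)), key=lambda i: _dist(self.nodes[i], state))
            activity = math.exp(-_dist(self.nodes[best], state))
            if activity >= self.activity_threshold:
                # Merge: nudge the winning node towards the input instead of storing it.
                w = self.nodes[best]
                for k in range(len(w)):
                    w[k] += self.learning_rate * (state[k] - w[k])
                return best
        # Grow: no existing node represents this state well enough.
        self.nodes.append(list(state))
        return len(self.nodes) - 1

    def add(self, state: Vector, action: int, reward: float, next_state: Vector) -> None:
        src = self._node_for(state)
        dst = self._node_for(next_state)
        key = (src, action, dst)
        n = self.edge_counts.get(key, 0) + 1
        self.edge_counts[key] = n
        old = self.edges.get(key, 0.0)
        self.edges[key] = old + (reward - old) / n  # incremental mean of merged rewards

    def transitions(self) -> List[Tuple[Vector, int, float, Vector]]:
        """Materialize one (s, a, r, s') sample per edge, e.g. for replay sampling."""
        return [(tuple(self.nodes[s]), a, r, tuple(self.nodes[d]))
                for (s, a, d), r in self.edges.items()]


if __name__ == "__main__":
    mem = MapReplayMemory()
    for _ in range(10):
        mem.add((0.0, 0.0), 1, 1.0, (1.0, 0.0))  # repeated transition
    mem.add((0.05, 0.0), 1, 0.0, (1.0, 0.0))     # near-duplicate gets merged
    print(len(mem.nodes), "nodes,", len(mem.edges), "edge(s) for 11 transitions")
```

In this sketch, repeated or near-duplicate transitions collapse onto the same nodes and edges. Memory therefore grows with the number of distinct regions visited rather than the number of steps. The cost is the detail lost when samples are merged.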