In value-based deep reinforcement learning with replay memories, the batch
size parameter specifies how many transitions to sample for each gradient
update. Although critical to the learning process, this value is typically not
adjusted when proposing new algorithms. In this work we present a broad
empirical study that suggests {\em reducing} the batch size can result in a
number of significant performance gains; this is surprising, as the general
tendency when training neural networks is towards larger batch sizes for
improved performance. We complement our experimental findings with a set of
empirical analyses aimed at better understanding this phenomenon.
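The sketch below shows where the batch size enters the training loop of a replay-based, value-based agent. It is not the authors' code. `Transition`, `ReplayMemory`, `td_targets`, and `train` are hypothetical names, sampling is assumed to be uniform, and the Q-network is replaced by a constant stand-in. Each gradient update draws `batch_size` transitions from the replay memory, so for a fixed number of updates, a smaller batch size means fewer transitions are consumed per update.

```python
import random
from collections import deque
from typing import Callable, List, NamedTuple


class Transition(NamedTuple):
    state: int
    action: int
    reward: float
    next_state: int
    done: bool


class ReplayMemory:
    """Hypothetical fixed-capacity replay memory with uniform sampling."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.buffer: deque = deque(maxlen=capacity)  # evicts the oldest transition when full
        self.rng = random.Random(seed)

    def __len__(self) -> int:
        return len(self.buffer)

    def add(self, t: Transition) -> None:
        self.buffer.append(t)

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniformly sample `batch_size` distinct transitions (O(n) copy; fine for a sketch).
        return self.rng.sample(list(self.buffer), batch_size)


def td_targets(batch: List[Transition], max_q_next: Callable[[int], float],
               gamma: float = 0.99) -> List[float]:
    # One-step Q-learning targets: r + gamma * max_a' Q(s', a'), with no bootstrap at terminals.
    return [t.reward + (0.0 if t.done else gamma * max_q_next(t.next_state))
            for t in batch]


def train(memory: ReplayMemory, num_updates: int, batch_size: int) -> int:
    """Run `num_updates` placeholder updates and return the total transitions sampled."""
    sampled = 0
    for _ in range(num_updates):
        batch = memory.sample(batch_size)  # the batch size parameter enters here
        targets = td_targets(batch, max_q_next=lambda s: 0.0)  # stand-in for a Q-network
        # A real agent would take one gradient step on the TD loss over `batch` here.
        sampled += len(targets)
    return sampled


if __name__ == "__main__":
    memory = ReplayMemory(capacity=1000)
    for i in range(500):
        memory.add(Transition(i, i % 4, 1.0, i + 1, False))
    print(train(memory, num_updates=100, batch_size=32))  # 3200
    print(train(memory, num_updates=100, batch_size=8))   # 800
```

Only the `batch_size` argument changes between the two calls. The number of updates stays fixed, which is the kind of knob the study varies.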

在价值导向的深度强化学习中，回放记忆中的批大小参数指定了每次梯度更新要采样多少转换。尽管在提出新算法时通常不会调整此值，但它对于学习过程非常关键。在这项工作中，我们进行了一项广泛的实证研究，表明减小批大小可能导致许多显著的性能提升；这令人惊讶，因为训练神经网络时一般倾向于使用较大的批大小以获得改进的性能。我们通过一系列经验分析来补充我们的实验结果，以更好地理解这种现象。