Various data augmentation techniques have recently been proposed in image-based deep reinforcement learning (DRL). Although they empirically demonstrate the effectiveness of data augmentation for improving sample efficiency or generalization, it is not always clear which technique should be preferred. To address this question, we analyze existing methods to better understand them and to uncover how they are connected. Notably, by expressing the variance of the Q-targets and that of the empirical actor/critic losses of these methods, we can analyze the effects of their different components and compare them. We further explain how these methods may be affected by the choice of data augmentation transformations used to calculate the target Q-values. This analysis yields recommendations for exploiting data augmentation in a more principled way. In addition, we include a regularization term called tangent prop, previously proposed in computer vision, whose adaptation to DRL is, to the best of our knowledge, novel. We evaluate our proposed method and validate our analysis in several domains. Compared to relevant baselines, our method achieves state-of-the-art performance in most environments and shows higher sample efficiency and better generalization ability in some complex environments.
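The analysis centers on the variance that augmenting the next observation adds to the Q-target. One common way to reduce it, used for example in DrQ, is to average the target over K independently augmented copies of the next observation. The sketch below isolates only this augmentation-induced part of the target variance under simplifying assumptions: a toy value function stands in for the target critic (`make_toy_value`), the target is reduced to r + γV(s'), the augmentation is a pad-and-crop random shift, and all helper names are hypothetical. It is an illustration, not the paper's derivation, which also covers the empirical actor/critic losses.

```python
import numpy as np


def random_shift(img, pad, rng):
    """Pad-and-crop random shift of a (channels, height, width) image."""
    _, h, w = img.shape
    padded = np.pad(img, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
    dy, dx = rng.integers(0, 2 * pad + 1, size=2)
    return padded[:, dy:dy + h, dx:dx + w]


def make_toy_value(shape, seed=0):
    """Hypothetical stand-in for a target value network V(s')."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=shape)
    return lambda img: float(np.tanh(np.sum(w * img) / np.sqrt(w.size)))


def augmented_target(reward, next_obs, value_fn, k, gamma, pad, rng):
    """Simplified Q-target: r + gamma * mean of V over k shifted next observations."""
    values = [value_fn(random_shift(next_obs, pad, rng)) for _ in range(k)]
    return reward + gamma * float(np.mean(values))


def target_variance(k, n_draws=2000, seed=0):
    """Empirical target variance caused only by augmentation randomness.

    The transition (reward, next_obs) is fixed, so any spread in the target
    comes from which random shifts were drawn.
    """
    rng = np.random.default_rng(seed)
    next_obs = rng.random((3, 16, 16))
    value_fn = make_toy_value(next_obs.shape, seed=1)
    targets = [
        augmented_target(1.0, next_obs, value_fn, k, 0.99, 4, rng)
        for _ in range(n_draws)
    ]
    return float(np.var(targets))


# Compare the target variance for a single augmentation and for averaged ones.
for k in (1, 2, 4):
    print(k, target_variance(k))
```

Because the K shifts are drawn independently, the printed variance should shrink roughly as 1/K; the cost is K forward passes of the target network per sample.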
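Tangent prop penalizes the derivative of a network's output along the tangent direction of a transformation, evaluated at the identity, which encourages the output to be locally invariant to small transformations. The following is a minimal NumPy sketch of such a penalty for a critic, under stated assumptions: the transformation is a continuous image translation whose tangent vectors are approximated by image gradients, the critic is a hypothetical toy network (`make_toy_critic`), and the directional derivative is taken by central finite differences rather than the autodiff Jacobian-vector product a real implementation would more likely use. It illustrates the regularizer in general and is not the authors' implementation.

```python
import numpy as np


def image_tangents(obs):
    """Tangent vectors of horizontal and vertical image translation at identity.

    Shifting an image I by alpha pixels gives I(x - alpha), whose derivative
    w.r.t. alpha at alpha = 0 is -dI/dx. The spatial derivative is
    approximated with central differences (np.gradient).
    obs: array of shape (batch, channels, height, width).
    """
    d_dy, d_dx = np.gradient(obs, axis=(2, 3))
    return -d_dx, -d_dy


def make_toy_critic(obs_shape, act_dim, hidden=32, seed=0):
    """Hypothetical stand-in for a critic: Q(s, a) = w2 . tanh([s, a] @ W1)."""
    rng = np.random.default_rng(seed)
    in_dim = int(np.prod(obs_shape)) + act_dim
    w1 = rng.normal(scale=1.0 / np.sqrt(in_dim), size=(in_dim, hidden))
    w2 = rng.normal(scale=1.0 / np.sqrt(hidden), size=hidden)

    def q(obs, act):
        x = np.concatenate([obs.reshape(obs.shape[0], -1), act], axis=1)
        return np.tanh(x @ w1) @ w2  # shape (batch,)

    return q


def tangent_prop_penalty(q, obs, act, eps=1e-3):
    """Sum over tangents of the mean squared directional derivative of Q."""
    penalty = 0.0
    for tangent in image_tangents(obs):
        # Central finite difference of Q along the tangent direction.
        dq = (q(obs + eps * tangent, act) - q(obs - eps * tangent, act)) / (2 * eps)
        penalty += float(np.mean(dq ** 2))
    return penalty


# Usage: the weighted penalty would be added to the usual critic loss.
rng = np.random.default_rng(1)
obs = rng.random((8, 3, 16, 16))
act = rng.random((8, 2))
q = make_toy_critic(obs.shape[1:], act_dim=2)
tp_weight = 0.1  # hypothetical regularization weight
reg = tp_weight * tangent_prop_penalty(q, obs, act)
print(reg)
```

A critic that is exactly invariant to the translation, or an image with no spatial variation, yields a zero penalty; `tp_weight` controls how strongly this local invariance is traded off against the temporal-difference loss.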

Revisiting Data Augmentation in Deep Reinforcement Learning