Learning state representations has gained steady popularity in reinforcement learning (RL) due to its potential to improve both sample efficiency and returns in many environments. A straightforward and efficient method is to generate representations with a distinct neural network trained on an auxiliary task, i.e., a task that differs from the actual RL task. While a whole range of such auxiliary tasks has been proposed in the literature, a comparison on typical continuous control benchmark environments is computationally expensive and has, to the best of our knowledge, not been performed before. This paper presents such a comparison of common auxiliary tasks, based on hundreds of agents trained with state-of-the-art off-policy RL algorithms. We compare possible improvements in both sample efficiency and returns for environments ranging from a simple pendulum to a complex simulated robotics task. Our findings show that representation learning with auxiliary tasks is beneficial for environments of higher dimension and complexity, and that learning environment dynamics is preferable to predicting rewards. We believe these insights will enable other researchers to make more informed decisions on how to utilize representation learning for their specific problem.

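Generating representations has gained steady popularity in reinforcement learning due to its potential to improve sample efficiency and returns in many environments. This paper compares common auxiliary tasks, based on hundreds of agents trained with state-of-the-art off-policy reinforcement learning algorithms. The findings show that representation learning with auxiliary tasks is beneficial for environments of higher dimension and complexity, and that learning environment dynamics is preferable to predicting rewards. We believe these insights will enable other researchers to make more informed decisions on how to utilize representation learning for their specific problem.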

Comparing Auxiliary Tasks for Learning Representations for Reinforcement Learning