Learning state representations has gained steady popularity in reinforcement
learning (RL) due to its potential to improve both sample efficiency and
returns in many environments. A straightforward and efficient approach is to
generate representations with a distinct neural network trained on an auxiliary
task, i.e., a task that differs from the actual RL task. While a whole range of
such auxiliary tasks has been proposed in the literature, a systematic comparison on
typical continuous control benchmark environments is computationally expensive
and has, to the best of our knowledge, not been performed before. This paper
presents such a comparison of common auxiliary tasks, based on hundreds of
agents trained with state-of-the-art off-policy RL algorithms. We compare
possible improvements in both sample efficiency and returns for environments
ranging from a simple pendulum to a complex simulated robotics task. Our findings
show that representation learning with auxiliary tasks is beneficial for
environments of higher dimension and complexity, and that learning environment
dynamics is preferable to predicting rewards. We believe these insights will
enable other researchers to make more informed decisions on how to utilize
representation learning for their specific problem.
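
The auxiliary-task setup summarized above can be illustrated with a small, self-contained sketch. It is not the paper's implementation: it assumes a toy linear system, a linear encoder trained by plain gradient descent in NumPy, and hypothetical names (`train_encoder`, `agent_input`). The same encoder is trained either to predict the next observation from its latent code and the action (a dynamics task) or to predict the reward from its latent code (a reward task). The learned features are then concatenated with the raw state as the agent's input, which is one possible design choice, not necessarily the one used in the study.

```python
import numpy as np

# Hypothetical toy setup (an assumption, not from the paper): linear dynamics
# s' = A s + B a and reward r = w . s, with obs dim 8, action dim 2, latent dim 4.
rng = np.random.default_rng(0)
OBS_DIM, ACT_DIM, LATENT_DIM, N = 8, 2, 4, 512

A_true = 0.3 * rng.standard_normal((OBS_DIM, OBS_DIM))
B_true = rng.standard_normal((OBS_DIM, ACT_DIM))
w_true = rng.standard_normal(OBS_DIM)

S = rng.standard_normal((N, OBS_DIM))        # sampled states
U = rng.standard_normal((N, ACT_DIM))        # sampled actions
S_next = S @ A_true.T + U @ B_true.T         # next states
R = S @ w_true                               # rewards


def train_encoder(task, steps=500, lr=0.01, seed=1):
    """Train a hypothetical linear encoder z = E s on one auxiliary task.

    task="dynamics": predict s' from [z, a] with a linear head D.
    task="reward":   predict r from z with a linear head h.
    Returns the encoder matrix E and the loss recorded at each step.
    """
    g = np.random.default_rng(seed)
    E = 0.1 * g.standard_normal((LATENT_DIM, OBS_DIM))
    D = 0.1 * g.standard_normal((OBS_DIM, LATENT_DIM + ACT_DIM))
    h = 0.1 * g.standard_normal(LATENT_DIM)
    losses = []
    for _ in range(steps):
        Z = S @ E.T                                   # latent codes, (N, LATENT_DIM)
        if task == "dynamics":
            X = np.concatenate([Z, U], axis=1)        # head input [z, a]
            err = X @ D.T - S_next                    # prediction error, (N, OBS_DIM)
            losses.append(np.sum(err ** 2) / N)
            d_pred = 2.0 * err / N                    # dLoss/dPrediction
            d_Z = (d_pred @ D)[:, :LATENT_DIM]        # gradient w.r.t. latent codes
            D -= lr * (d_pred.T @ X)                  # update dynamics head
        elif task == "reward":
            err = Z @ h - R                           # prediction error, (N,)
            losses.append(np.sum(err ** 2) / N)
            d_Z = 2.0 * np.outer(err, h) / N          # gradient w.r.t. latent codes
            h -= lr * (2.0 * Z.T @ err / N)           # update reward head
        else:
            raise ValueError(f"unknown task: {task}")
        E -= lr * (d_Z.T @ S)                         # backpropagate into the encoder
    return E, losses


def agent_input(states, E):
    """Concatenate raw states with the learned features for the RL agent."""
    return np.concatenate([states, states @ E.T], axis=-1)


E_dyn, dyn_losses = train_encoder("dynamics")
E_rew, rew_losses = train_encoder("reward")
print(f"dynamics loss: {dyn_losses[0]:.3f} -> {dyn_losses[-1]:.3f}")
print(f"reward loss:   {rew_losses[0]:.3f} -> {rew_losses[-1]:.3f}")
features = agent_input(S[:5], E_dyn)  # shape (5, OBS_DIM + LATENT_DIM)
```

In a real setting, the encoder would be a deeper network trained on transitions from the agent's replay buffer, and an off-policy algorithm would consume `features` in place of the raw observation; the linear toy only shows where the two auxiliary objectives differ (predicting the next state versus predicting the reward).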
