Recently, various auxiliary tasks have been proposed to accelerate
representation learning and improve sample efficiency in deep reinforcement
learning (RL). However, existing auxiliary tasks are unsupervised and do not
take the characteristics of RL problems into account. By
leveraging returns, the most important feedback signals in RL, we propose a
novel auxiliary task that forces the learnt representations to discriminate
state-action pairs with different returns. We show theoretically that our
auxiliary loss learns representations that capture the structure of a new form
of state-action abstraction, under which state-action pairs with similar return
distributions are aggregated. In the low-data regime, our algorithm
outperforms strong baselines on complex tasks in Atari games and the DeepMind
Control suite, and achieves even better performance when combined with existing
auxiliary tasks.
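
As an illustration only, the idea of discriminating state-action pairs by their returns could be sketched as a pairwise classification loss. The abstract does not specify the form of the loss, so everything below is an assumption: the equal-width return binning, the linear classifier on the absolute embedding difference, and the names `bin_returns`, `return_discrimination_loss`, `w`, `b`, and `num_bins` are all hypothetical.

```python
import numpy as np


def bin_returns(returns, num_bins):
    """Assign each return to one of `num_bins` equal-width bins (assumed scheme)."""
    lo, hi = returns.min(), returns.max()
    if hi == lo:
        return np.zeros(len(returns), dtype=int)
    # Interior bin edges only, so bin indices fall in 0 .. num_bins - 1.
    edges = np.linspace(lo, hi, num_bins + 1)[1:-1]
    return np.digitize(returns, edges)


def return_discrimination_loss(z, returns, w, b, num_bins=4):
    """Binary cross-entropy for predicting whether two state-action
    embeddings come from different return bins (hypothetical loss form).

    z:       (N, d) embeddings of state-action pairs
    returns: (N,)   observed returns for those pairs
    w, b:    parameters of an assumed linear pair classifier
    """
    bins = bin_returns(returns, num_bins)
    i, j = np.triu_indices(len(z), k=1)           # all unordered pairs i < j
    labels = (bins[i] != bins[j]).astype(float)   # 1 if the returns differ
    logits = np.abs(z[i] - z[j]) @ w + b
    # Numerically stable binary cross-entropy computed from logits.
    loss = np.maximum(logits, 0) - logits * labels + np.log1p(np.exp(-np.abs(logits)))
    return loss.mean()


# Toy data: two pairs with low return, two with high return.
z = np.array([[0.0], [0.0], [1.0], [1.0]])
returns = np.array([0.0, 0.0, 10.0, 10.0])

untrained = return_discrimination_loss(z, returns, np.array([0.0]), 0.0, num_bins=2)
separating = return_discrimination_loss(z, returns, np.array([20.0]), -10.0, num_bins=2)
```

In this toy setting the embeddings already separate the two return bins, so a classifier that reads the embedding distance reaches a much lower loss than an uninformative one. In an actual agent, a loss of this kind would be minimized jointly with the RL objective so that the encoder producing `z` is pushed towards such separation.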

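This work proposes a new auxiliary task that uses return signals to make the learnt representations discriminate state-action pairs with different returns. This enables better learning on complex tasks such as Atari games and the DeepMind Control suite, and performance improves further when the task is combined with existing auxiliary tasks.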