Can we use reinforcement learning to learn general-purpose policies that can
perform a wide range of different tasks, resulting in flexible and reusable
skills? Contextual policies provide this capability in principle, but the
representation of the context determines the degree of generalization and
expressivity. Categorical contexts preclude generalization to entirely new
tasks. Goal-conditioned policies may enable some generalization, but cannot
capture all tasks that might be desired. In this paper, we propose goal
distributions as a general and broadly applicable task representation suitable
for contextual policies. Goal distributions are general in the sense that they
can represent any state-based reward function when equipped with an appropriate
distribution class, while the particular choice of distribution class allows us
to trade off expressivity and learnability. We develop an off-policy algorithm
called distribution-conditioned reinforcement learning (DisCo RL) to
efficiently learn these policies. We evaluate DisCo RL on a variety of robot
manipulation tasks and find that it significantly outperforms prior methods on
tasks that require generalization to new goal distributions.
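The generality claim above can be made concrete with a small sketch. It is our own illustration, not the authors' code, and it rests on one assumption the abstract does not state: the reward for a goal distribution p_g is its log-density, r(s) = log p_g(s). Under that assumption, any state-based reward r on a finite state space is recovered up to an additive constant by the categorical distribution p_g(s) ∝ exp(r(s)). A constant offset leaves the optimal policy unchanged when episodes have a fixed length. A more restricted class, here a Gaussian with a positive semi-definite precision matrix, is less expressive but can still encode "don't care" dimensions, which a single goal state cannot. The function names, the two-dimensional state layout `[gripper_x, object_x]`, and the dropped Gaussian normalizer are all hypothetical choices made for illustration.

```python
import numpy as np


def reward_to_goal_distribution(rewards):
    """Map a reward over a finite state set to categorical log-probabilities.

    Uses p(s) = exp(r(s)) / Z, so log p(s) = r(s) - log Z: the reward is
    recovered up to the constant log Z. (Illustrative assumption, not the
    paper's implementation.)
    """
    logits = np.asarray(rewards, dtype=float)
    # Numerically stable log-sum-exp for log Z.
    m = logits.max()
    log_z = m + np.log(np.sum(np.exp(logits - m)))
    return logits - log_z


def gaussian_goal_reward(state, mean, precision):
    """Hypothetical reward r(s) = log N(s; mean, precision^-1), minus its normalizer.

    `precision` is a (d, d) positive semi-definite matrix. A zero row/column
    means the corresponding state dimension is ignored (a degenerate Gaussian),
    so the normalizer is dropped; it does not depend on the state anyway.
    """
    diff = np.asarray(state, dtype=float) - np.asarray(mean, dtype=float)
    return float(-0.5 * diff @ precision @ diff)


if __name__ == "__main__":
    # Any finite-state reward becomes a goal distribution (up to a constant).
    r = np.array([1.0, -2.0, 0.5, 3.0])
    log_p = reward_to_goal_distribution(r)
    print("probabilities sum to", np.exp(log_p).sum())
    print("log p(s) - r(s):", log_p - r)  # the same constant for every state

    # Hypothetical state layout: [gripper_x, object_x].
    # Task: "put the object at x = 0.5, gripper position irrelevant".
    mean = np.array([0.0, 0.5])
    precision = np.diag([0.0, 1.0])  # zero precision: ignore gripper_x
    for s in ([0.9, 0.5], [-0.3, 0.5], [0.0, 0.0]):
        print(s, gaussian_goal_reward(s, mean, precision))
```

In the Gaussian example, the two states with the object at 0.5 receive the same maximal reward regardless of where the gripper is. A reward of the form -||s - g||² for a single goal state g would instead have to commit to a gripper position, which is one way to see why plain goal-conditioning cannot capture every desired task.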

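This paper proposes a general task representation based on goal distributions, which lets skills be flexibly reused across different tasks, and develops an off-policy algorithm (Distribution-Conditioned Reinforcement Learning, DisCo RL) to learn these policies efficiently. Experiments on a variety of robot manipulation tasks show that the method significantly outperforms prior methods, especially on tasks that require generalization to new goal distributions.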