Reinforcement learning has shown promise in learning policies that can solve
complex problems. However, manually specifying a good reward function can be
difficult, especially for intricate tasks. Inverse reinforcement learning
offers a useful paradigm to learn the underlying reward function directly from
expert demonstrations. Yet in reality, the corpus of demonstrations may contain
trajectories arising from a diverse set of underlying reward functions rather
than a single one. Thus, in inverse reinforcement learning, it is useful to
decompose the reward function into several components, each accounting for a
subset of the demonstrations. The options framework in reinforcement learning
is specifically designed to decompose policies in a similar way. We therefore
extend the options framework and propose a method to simultaneously recover
reward options in addition to policy options. We leverage adversarial methods
to learn joint reward-policy options using only observed expert states. We show
that this approach works well in both simple and complex continuous control
tasks and yields significant performance gains in one-shot transfer learning.

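This paper introduces a new method, based on generative adversarial networks,
that simultaneously recovers reward options and policy options in inverse
reinforcement learning, addressing the problem of learning reward functions for
complex tasks from expert demonstrations. The method performs well in both
simple and complex continuous control tasks and shows significant performance
gains in one-shot transfer learning.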