The design of a reward function often poses a major practical challenge to
real-world applications of reinforcement learning. Approaches such as inverse
reinforcement learning attempt to overcome this challenge, but require expert
demonstrations, which can be difficult or expensive to obtain in practice. We
propose variational inverse control with events (VICE), which generalizes
inverse reinforcement learning methods to cases where full demonstrations are
not needed, such as when only samples of desired goal states are available. Our
method is grounded in an alternative perspective on control and reinforcement
learning, where an agent's goal is to maximize the probability that one or more
events will happen at some point in the future, rather than maximizing
cumulative rewards. We demonstrate the effectiveness of our method on
continuous control tasks, with a focus on high-dimensional observations such as
images, where rewards are hard or even impossible to specify.
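
To make the contrast between the two objectives concrete, here is a minimal sketch. It is not the VICE algorithm itself. It assumes a fixed policy that induces a finite Markov chain with transition matrix `P`, and a binary event that fires whenever the chain is in one of `event_states`. The function names `event_probability` and `expected_event_count` are hypothetical. The first function computes the probability that the event happens at least once within the horizon. The second computes the cumulative-reward analogue, with a reward of 1 in every event state.

```python
from typing import Sequence, Set


def event_probability(P: Sequence[Sequence[float]], event_states: Set[int],
                      start: int, horizon: int) -> float:
    """Probability that the event occurs at least once in steps 0..horizon."""
    n = len(P)
    # hit[s]: probability of reaching an event state within k remaining steps
    # from state s; k = 0 initially, so only event states themselves count.
    hit = [1.0 if s in event_states else 0.0 for s in range(n)]
    for _ in range(horizon):
        # Event states count as "already happened"; otherwise look one step ahead.
        hit = [1.0 if s in event_states
               else sum(P[s][j] * hit[j] for j in range(n))
               for s in range(n)]
    return hit[start]


def expected_event_count(P: Sequence[Sequence[float]], event_states: Set[int],
                         start: int, horizon: int) -> float:
    """Expected cumulative reward when reward is 1 in every event state."""
    n = len(P)
    dist = [1.0 if s == start else 0.0 for s in range(n)]  # state distribution at t = 0
    total = 0.0
    for _ in range(horizon + 1):
        total += sum(dist[s] for s in event_states)
        # Propagate the state distribution one step: dist <- dist @ P
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
    return total


# Two-state chain: from state 0, reach the absorbing event state 1 w.p. 0.5 per step.
P = [[0.5, 0.5], [0.0, 1.0]]
print(event_probability(P, {1}, start=0, horizon=2))     # 0.75
print(expected_event_count(P, {1}, start=0, horizon=2))  # 1.25
```

The event probability is capped at 1 regardless of how long the agent lingers in the event states. The cumulative count, by contrast, keeps growing with the horizon. This is one way to see why the two formulations can rank behaviors differently.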
