The ability of a reinforcement learning (RL) agent to learn about many reward
functions at the same time has many potential benefits, such as the
decomposition of complex tasks into simpler ones, the exchange of information
between tasks, and the reuse of skills. We focus on one aspect in
particular: the scenario where the reward function changes between tasks.
We propose a transfer framework based on successor features and generalized
policy improvement that handles this scenario, allows the free exchange of
information across tasks, and provides performance guarantees for the
transferred policy. In a sequence of navigation tasks and in the control of
a simulated robotic arm, the method successfully promotes transfer,
significantly outperforming alternative methods.
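The two ideas named above can be illustrated with a toy sketch (not the paper's implementation; all sizes and arrays below are made up for illustration). With successor features, a policy's action-value on any task with reward weights w reduces to a dot product, Q_i(s, a) = psi_i(s, a) · w; generalized policy improvement (GPI) then acts greedily with respect to the maximum over a library of such policies:

```python
import numpy as np

rng = np.random.default_rng(0)
n_policies, n_states, n_actions, d = 2, 5, 3, 4  # toy sizes (assumed)

# psi[i] holds the successor features of stored policy i: shape (S, A, d)
psi = rng.standard_normal((n_policies, n_states, n_actions, d))
w_new = rng.standard_normal(d)  # reward weights of the new task

# Action-values of every stored policy on the new task, via dot products only
q = psi @ w_new  # shape (n_policies, S, A)

# GPI: in each state, act greedily w.r.t. the max over all stored policies
gpi_action = q.max(axis=0).argmax(axis=1)  # shape (S,)
print(gpi_action.shape)
```

The key point the sketch conveys is that evaluating every stored policy on a new task costs only one matrix product, which is what makes the free exchange of information across tasks cheap.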