In many real-world applications, reinforcement learning (RL) agents might
have to solve multiple tasks, each one typically modeled via a reward function.
If reward functions are linear in a shared set of features, and the agent has
previously learned a set of policies for different tasks, successor features
(SFs) can be exploited to combine such policies and identify reasonable
solutions for new problems. However, the identified solutions are not
guaranteed to be optimal.
We introduce a novel algorithm that addresses this limitation. It allows RL
agents to combine existing policies and directly identify optimal policies for
arbitrary new problems, without requiring any further interactions with the
environment. We first show (under mild assumptions) that the transfer learning
problem tackled by SFs is equivalent to the problem of learning to optimize
multiple objectives in RL. We then introduce an SF-based extension of the
Optimistic Linear Support algorithm to learn a set of policies whose SFs form a
convex coverage set. We prove that policies in this set can be combined via
generalized policy improvement to construct optimal behaviors for any new
linearly expressible task, without requiring any additional training samples.
We empirically show that our method outperforms state-of-the-art competing
algorithms in both discrete and continuous domains under value function
approximation.
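
To make the combination step concrete, the following is a minimal sketch of generalized policy improvement (GPI) over successor features at a single state. It assumes tabular SFs stored in a NumPy array and a reward of the form r = phi . w; the function name `gpi_action`, the array layout, and the toy numbers are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def gpi_action(psi, w):
    """Pick an action by generalized policy improvement at one state.

    psi: array of shape (n_policies, n_actions, d); psi[i, a] is the
         successor-feature vector of policy i for action a (assumed layout).
    w:   array of shape (d,), the reward weights of the new task.
    Returns the chosen action and the per-action GPI values.
    """
    q = psi @ w              # q[i, a] = psi[i, a] . w, value of policy i on task w
    q_max = q.max(axis=0)    # best value over the known policies, per action
    return int(np.argmax(q_max)), q_max

# Hypothetical SFs for two policies, two actions, two features.
psi = np.array([
    [[1.0, 0.0], [0.5, 0.5]],   # policy 0
    [[0.0, 1.0], [0.2, 0.9]],   # policy 1
])

action_a, q_a = gpi_action(psi, np.array([0.3, 0.7]))
action_b, q_b = gpi_action(psi, np.array([0.5, 0.5]))
```

No environment interaction is needed to evaluate a new `w`: changing the task only changes the dot product, which is why a well-chosen policy set can be reused across tasks.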

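Introduces a new SF-based algorithm that allows RL agents to combine existing policies and directly identify optimal policies for arbitrary new problems, without further interaction with the environment. The algorithm combines policies via generalized policy improvement to form optimal behaviors, and it outperforms existing competing algorithms.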

Optimistic Linear Support and Successor Features as a Basis for Optimal Policy Transfer

We propose Deep Optimistic Linear Support Learning (DOL) to solve
high-dimensional multi-objective decision problems where the relative
importance of the objectives is not known a priori. Using features from the
high-dimensional inputs, DOL computes the convex coverage set containing all
potentially optimal solutions for convex combinations of the objectives. To
our knowledge, this is the first time that deep reinforcement learning has
succeeded in learning multi-objective policies. In addition, we provide a
testbed with two experiments to be used as a benchmark for deep multi-objective
reinforcement learning.
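
As an illustration of what a convex coverage set contains, the sketch below filters a set of two-objective value vectors down to those that are optimal for at least one convex weighting. It is a brute-force sweep over a weight grid, used here only for clarity; it is not the corner-weight search performed by Optimistic Linear Support or DOL, and the function name `approximate_ccs` and the example values are assumptions.

```python
import numpy as np

def approximate_ccs(values, n_weights=101):
    """Approximate the convex coverage set for two objectives.

    values: sequence of 2-D value vectors, one per candidate policy.
    Returns the sorted indices of vectors that maximize the scalarized
    value w . v for at least one weight w on a grid over the simplex.
    """
    values = np.asarray(values, dtype=float)
    keep = set()
    for w0 in np.linspace(0.0, 1.0, n_weights):
        w = np.array([w0, 1.0 - w0])             # convex weights, sum to 1
        keep.add(int(np.argmax(values @ w)))     # best vector for this weighting
    return sorted(keep)

# Hypothetical value vectors: index 3 is Pareto-optimal but never the best
# under any convex weighting; index 4 is dominated by index 2.
values = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.6], [0.7, 0.2], [0.4, 0.4]]
ccs = approximate_ccs(values)
```

The example shows why the convex coverage set can be smaller than the Pareto front: a vector may be undominated and still lose to some other vector under every linear weighting.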

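The paper proposes DOL, which uses features from high-dimensional inputs to compute the convex coverage set containing all potentially optimal solutions, addressing high-dimensional multi-objective decision problems. It also provides a benchmark testbed with two experiments for deep multi-objective reinforcement learning.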