In multi-task reinforcement learning, the data efficiency of training agents
can be improved by transferring knowledge from other, different but related
tasks. Because the experiences from different tasks are usually biased toward
the specific task goals, traditional methods rely on Kullback-Leibler
regularization to stabilize the transfer of knowledge from one task to the
others. In this work, we explore replacing the
Kullback-Leibler divergence with a novel optimal transport-based
regularization. Using the Sinkhorn mapping, we approximate the optimal
transport distance between the state distributions of tasks. The distance is
then used as an amortized reward to regularize the amount of information
shared. We evaluate our framework on several grid-based multi-goal navigation
tasks to validate the effectiveness of the approach. The results show that
our added optimal transport-based rewards speed up the learning
process of agents and outperform several baselines on multi-task learning.
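
As a rough illustration of the idea above, the following sketch approximates
an entropy-regularized optimal transport cost between two sets of visited
grid states using log-domain Sinkhorn iterations. Everything here is an
assumption made for illustration and not the authors' implementation: the
function names, the squared Euclidean ground cost, the uniform weights over
visited states, the values of `epsilon` and `n_iters`, and in particular the
`shaped_reward` form, which is only one plausible way to turn the distance
into a reward term.

```python
import numpy as np


def _logsumexp(z, axis):
    """Numerically stable log(sum(exp(z))) along one axis."""
    m = z.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(z - m).sum(axis=axis, keepdims=True))).squeeze(axis)


def sinkhorn_distance(x, y, epsilon=0.1, n_iters=200):
    """Approximate OT cost between two point clouds of states.

    Assumptions (hypothetical, not from the paper): uniform weights on the
    samples and a squared Euclidean ground cost between state coordinates.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    log_a = np.full(len(x), -np.log(len(x)))  # log of uniform weights on x
    log_b = np.full(len(y), -np.log(len(y)))  # log of uniform weights on y

    # Pairwise squared Euclidean cost, shape (len(x), len(y)).
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)

    # Dual potentials, updated alternately so that the transport plan
    # matches the row marginals, then the column marginals.
    f = np.zeros(len(x))
    g = np.zeros(len(y))
    for _ in range(n_iters):
        f = epsilon * (log_a - _logsumexp((g[None, :] - cost) / epsilon, axis=1))
        g = epsilon * (log_b - _logsumexp((f[:, None] - cost) / epsilon, axis=0))

    # Recover the (approximate) transport plan and its transport cost.
    plan = np.exp((f[:, None] + g[None, :] - cost) / epsilon)
    return float((plan * cost).sum())


def shaped_reward(env_reward, ot_distance, beta=0.1):
    """Hypothetical shaping rule: penalize the task reward by the OT distance.

    The sign and the weight `beta` are illustrative assumptions.
    """
    return env_reward - beta * ot_distance


# States visited in two tasks on a grid; the second set is shifted 3 cells.
states_task_a = [[0, 0], [0, 1]]
states_task_b = [[3, 0], [3, 1]]
distance = sinkhorn_distance(states_task_a, states_task_b)
reward = shaped_reward(1.0, distance, beta=0.1)
```

The log-domain updates are used because a small `epsilon` combined with
large grid distances would otherwise underflow in the kernel
`exp(-cost / epsilon)`.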

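By using the Sinkhorn mapping to replace the Kullback-Leibler divergence, this work further improves the data efficiency of multi-task reinforcement learning, and experiments show that the added optimal transport-based rewards speed up the agents' learning process and outperform several baselines in multi-task learning.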