Multitask Reinforcement Learning (MTRL) approaches have gained increasing
attention for their wide applications in many important Reinforcement Learning
(RL) tasks. However, while recent advancements in MTRL theory have focused on
improving statistical efficiency by assuming a shared structure across
tasks, exploration--a crucial aspect of RL--has been largely overlooked. This
paper addresses this gap by showing that when an agent is trained on a
sufficiently diverse set of tasks, a generic policy-sharing algorithm with a
myopic exploration design such as $\epsilon$-greedy, which is inefficient in
general, can be sample-efficient for MTRL. To the best of our knowledge, this is
the first theoretical demonstration of the "exploration benefits" of MTRL. It
may also shed light on the enigmatic success of myopic exploration across its
many practical applications. To validate the role of diversity, we conduct
experiments on synthetic robotic control environments, where the diverse task
set aligns with the task selection made by automatic curriculum learning, which
has been empirically shown to improve sample efficiency.

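In summary: by training on multiple tasks, a generic policy-sharing algorithm for multitask reinforcement learning (MTRL) with a myopic exploration design can be shown to be sample-efficient, which is the first theoretical demonstration of the "exploration benefits" of MTRL. Validation experiments on synthetic robotic control environments show that the diverse task set aligns with the task selection made by automatic curriculum learning, which improves sample efficiency.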
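
To make the idea of policy sharing with myopic exploration concrete, the following is a minimal sketch rather than the algorithm analyzed in the paper. It assumes a tabular setting and one possible form of sharing: the agent borrows the greedy policy of a uniformly sampled task and then applies $\epsilon$-greedy on top of it. The function name `shared_epsilon_greedy_action` and the `q_tables` layout are hypothetical, introduced only for illustration.

```python
import random


def shared_epsilon_greedy_action(q_tables, state, epsilon, rng):
    """Choose an action for `state` using value estimates shared across tasks.

    q_tables: list with one dict per task, mapping state -> list of action values
              (a hypothetical layout, not taken from the paper).
    epsilon:  probability of taking a uniformly random action.
    rng:      a random.Random instance, passed in for reproducibility.
    """
    # Assumed form of policy sharing: borrow the value estimates of a
    # uniformly sampled task instead of always using the current task's own.
    source_q = rng.choice(q_tables)
    values = source_q[state]

    # Myopic exploration: with probability epsilon, act uniformly at random.
    if rng.random() < epsilon:
        return rng.randrange(len(values))

    # Otherwise act greedily with respect to the borrowed estimates.
    return max(range(len(values)), key=values.__getitem__)
```

With `epsilon = 0` the behavior is purely a mixture of the tasks' greedy policies, so any exploration comes only from differences between tasks. This is, loosely, where the diversity of the task set would matter in such a scheme.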