A core ambition of reinforcement learning (RL) is the creation of agents
capable of rapid learning in novel tasks. Meta-RL aims to achieve this by
directly learning such agents. One category of meta-RL methods, called black-box
methods, does so by training off-the-shelf sequence models end-to-end. In
contrast, another category of methods explicitly infers a posterior distribution
over the unknown task. These methods generally use distinct objectives and
sequence models designed to enable task inference, and so are known as task
inference methods. However, recent evidence suggests that
task inference objectives are unnecessary in practice. Nonetheless, it remains
unclear whether task inference sequence models are beneficial even when task
inference objectives are not. In this paper, we present strong evidence that
task inference sequence models are still beneficial. In particular, we
investigate sequence models with permutation-invariant aggregation, which
exploit the fact that, due to the Markov property, the task posterior does not
depend on the order of data. We empirically confirm the advantage of
permutation-invariant sequence models without the use of task inference
objectives. However, we also find, surprisingly, that there are multiple
conditions under which permutation variance remains useful. Therefore, we
propose SplAgger, which uses both permutation-variant and permutation-invariant
components to achieve the best of both worlds, outperforming all baselines on continuous
control and memory environments.
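To make the order-independence argument concrete: if m denotes the unknown task and the policy's action probabilities do not depend on m (so they cancel), the Markov property gives p(m | history) ∝ p(m) · ∏_i p(s_{i+1}, r_i | s_i, a_i, m). This is a product of per-transition factors, so it is unchanged when the transitions are reordered. The sketch below is a hypothetical toy in pure Python, not the SplAgger architecture from the paper. It contrasts three aggregators: a permutation-invariant one (elementwise max over per-transition embeddings), a permutation-variant recurrent one, and a simple concatenation of the two in the spirit of combining both kinds of components. The encoder `embed`, the `decay` parameter, and the scalar transition format are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

# (state, action, reward, next_state); scalars for simplicity (an assumption)
Transition = Tuple[float, float, float, float]


def embed(t: Transition, dim: int = 4) -> List[float]:
    """Hypothetical per-transition encoder; a fixed stand-in for a learned MLP."""
    s, a, r, s_next = t
    return [math.tanh((k + 1) * s + a - 0.5 * k * r + 0.1 * s_next) for k in range(dim)]


def invariant_aggregate(history: Sequence[Transition], dim: int = 4) -> List[float]:
    """Permutation-invariant: elementwise max over per-transition embeddings."""
    if not history:
        return [0.0] * dim
    embs = [embed(t, dim) for t in history]
    return [max(e[k] for e in embs) for k in range(dim)]


def variant_aggregate(history: Sequence[Transition], dim: int = 4, decay: float = 0.5) -> List[float]:
    """Permutation-variant: a toy recurrent update h <- tanh(decay * h + embed(x))."""
    h = [0.0] * dim
    for t in history:
        e = embed(t, dim)
        h = [math.tanh(decay * hk + ek) for hk, ek in zip(h, e)]
    return h


def split_aggregate(history: Sequence[Transition], dim: int = 4) -> List[float]:
    """Concatenate the order-sensitive and order-insensitive summaries into one context vector."""
    return variant_aggregate(history, dim) + invariant_aggregate(history, dim)


data = [(0.1, 1.0, 0.5, 0.2), (0.3, -1.0, 0.0, 0.4), (-0.2, 0.5, 1.0, 0.0)]
shuffled = list(reversed(data))
print(invariant_aggregate(data) == invariant_aggregate(shuffled))  # True: max ignores order
print(variant_aggregate(data) == variant_aggregate(shuffled))      # False for this data
print(len(split_aggregate(data)))                                  # 8
```

Because max is exactly order-independent, reordering the history cannot change the invariant half. The recurrent half, by contrast, is weighted toward the most recent transitions, which is one way order information can remain available. A learned model would replace `embed` and the recurrence with trained networks.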

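The study shows that task inference sequence models remain beneficial even without task inference objectives, and proposes SplAgger, which uses both permutation-variant and permutation-invariant components to outperform all baselines in continuous control and memory environments.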