Context, the embedding of previously collected trajectories, is a powerful
construct for Meta-Reinforcement Learning (Meta-RL) algorithms. By conditioning
on an effective context, Meta-RL policies can generalize to new tasks
within a few adaptation steps. We argue that improving the quality of context
involves answering two questions: (1) how to train a compact and sufficient
encoder that embeds the task-specific information contained in prior
trajectories, and (2) how to collect informative trajectories whose
corresponding context reflects the task specification. To this end, we
propose a novel Meta-RL framework called CCM (Contrastive learning augmented
Context-based Meta-RL). First, we exploit the contrastive nature of
different tasks to train a compact and sufficient context
encoder. Second, we train a separate exploration policy with a new,
theoretically derived information-gain-based objective that aims to collect
informative trajectories within a few steps. We evaluate our approach empirically
on common benchmarks as well as several complex sparse-reward environments. The
experimental results show that CCM outperforms state-of-the-art algorithms by
addressing each of these two problems.
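
The contrastive training of the context encoder can be illustrated with a minimal sketch of an InfoNCE-style loss. This assumes that two context embeddings computed from trajectories of the same task form a positive pair, while embeddings from other tasks in the batch act as negatives. InfoNCE, cosine similarity, and the temperature value are assumptions made for illustration; the abstract does not state CCM's exact loss.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against zero vectors
    return [x / norm for x in v]


def info_nce_loss(queries: Sequence[Vector], keys: Sequence[Vector],
                  temperature: float = 0.1) -> float:
    """InfoNCE loss over a batch of context embeddings (assumed formulation).

    queries[i] and keys[i] embed two trajectory segments from the same task
    (the positive pair); keys[j] for j != i come from other tasks (negatives).
    """
    q = [l2_normalize(v) for v in queries]
    k = [l2_normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        # Similarity of query i to every key, scaled by the temperature.
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k]
        # Numerically stable log-sum-exp over all keys.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        # Cross-entropy with the positive key i as the correct "class".
        total += log_sum_exp - logits[i]
    return total / len(q)
```

For example, `info_nce_loss([[1, 0], [0, 1]], [[1, 0], [0, 1]])` is close to zero because each query is most similar to its own positive key, while swapping the keys drives the loss up. Minimizing this loss pulls embeddings of the same task together and pushes embeddings of different tasks apart, which is one way to encourage a compact, task-discriminative encoder.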

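In summary, we propose CCM, a Meta-RL framework that trains a compact and effective context encoder by contrasting different tasks, and trains a separate exploration policy with a theoretically derived information-gain objective so that informative trajectories are collected within a few steps. Experiments show that CCM outperforms existing algorithms by addressing each of the two problems above.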
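
The idea of rewarding exploration by information gain can be illustrated with a generic stand-in. The abstract does not give CCM's derived objective, so the sketch below substitutes the standard Bayesian information gain: the KL divergence between the task belief after and before observing one transition. The scalar Gaussian task belief, the conjugate update, and the helper names `bayes_update` and `information_gain_reward` are hypothetical choices for illustration only.

```python
import math
from typing import Tuple

# Hypothetical task belief: (mean, variance) of a scalar task parameter.
Belief = Tuple[float, float]


def gaussian_kl(p: Belief, q: Belief) -> float:
    """KL(p || q) between two univariate Gaussians."""
    mu_p, var_p = p
    mu_q, var_q = q
    return 0.5 * (math.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)


def bayes_update(prior: Belief, observation: float, noise_var: float) -> Belief:
    """Conjugate Gaussian update of the task belief after one noisy observation."""
    mu, var = prior
    post_var = 1.0 / (1.0 / var + 1.0 / noise_var)
    post_mu = post_var * (mu / var + observation / noise_var)
    return post_mu, post_var


def information_gain_reward(prior: Belief, observation: float, noise_var: float) -> float:
    """Intrinsic reward: KL(posterior || prior), i.e. how much one transition
    moves the belief about the task (stand-in, not CCM's objective)."""
    posterior = bayes_update(prior, observation, noise_var)
    return gaussian_kl(posterior, prior)
```

Under this sketch, a low-noise observation such as `information_gain_reward((0.0, 1.0), 0.5, 0.1)` earns a much larger reward than the same observation with `noise_var=10.0`, so an exploration policy maximizing it would prefer transitions that reveal the task quickly.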