Offline meta reinforcement learning (OMRL) aims to learn transferable knowledge from offline datasets to facilitate learning on new target tasks. Context-based RL employs a context encoder to adapt the agent rapidly to new tasks by inferring a task representation and then adjusting the acting policy based on that representation. Here we consider context-based OMRL, in particular the issue of task representation learning for OMRL. We empirically demonstrate that a context encoder trained on offline datasets can suffer from distribution shift between the contexts used for training and those used for testing. To tackle this issue, we propose a hard-sampling-based strategy for learning a robust task context encoder. Experimental results on distinct continuous control tasks demonstrate that our technique yields more robust task representations and better testing performance, in terms of accumulated returns, than baseline methods. Our code is available at https://github.com/ZJLAB-AMMI/HS-OMRL.

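This paper addresses context-based offline meta reinforcement learning (OMRL), in particular the problem of task representation learning for OMRL. We propose a hard-sampling strategy for learning a robust task context encoder. Experimental results on several distinct continuous control tasks show that, compared with baseline methods, our technique yields more robust task representations and better testing performance.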

On Context Distribution Shift in Task Representation Learning for Offline Meta Reinforcement Learning