Offline meta-reinforcement learning (OMRL) utilizes pre-collected offline
datasets to enhance the agent's generalization ability on unseen tasks.
However, the context shift problem arises due to the distribution discrepancy
between the contexts used for training (from the behavior policy) and testing
(from the exploration policy). The context shift problem leads to incorrect
task inference and further deteriorates the generalization ability of the
meta-policy. Existing OMRL methods either overlook this problem or attempt to
mitigate it with additional information. In this paper, we propose a novel
approach called Context Shift Reduction for OMRL (CSRO) to address the context
shift problem with only offline datasets. The key insight of CSRO is to
minimize the policy's influence on the context during both the meta-training and
meta-test phases. During meta-training, we design a max-min mutual information
representation learning mechanism to diminish the impact of the behavior policy
on task representation. In the meta-test phase, we introduce a non-prior
context collection strategy to reduce the effect of the exploration policy.
Experimental results demonstrate that CSRO significantly reduces the context
shift and improves the generalization ability, surpassing previous methods
across various challenging domains.

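Using only offline datasets, the paper proposes a new method called CSRO to address the context shift problem; it significantly reduces context shift in both the meta-training and meta-test phases and improves generalization ability.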