Offline goal-conditioned RL (GCRL) offers a way to train general-purpose
agents from fully offline datasets. Beyond staying conservative with respect to
the dataset, generalizing to unseen goals is another fundamental challenge for
offline GCRL. However, to the best of our knowledge, this problem has not yet
been well studied. In this paper, we study
out-of-distribution (OOD) generalization of offline GCRL both theoretically and
empirically to identify the factors that matter. Across a number of experiments,
we observe that weighted imitation learning generalizes better than
pessimism-based offline RL methods. Based on this insight, we derive a theory
for OOD generalization, which characterizes several important design choices.
We then propose a new offline GCRL method, Generalizable Offline
goAl-condiTioned RL (GOAT), by combining the findings from our theoretical and
empirical studies. On a new benchmark containing 9 independent and identically
distributed (IID) tasks and 17 OOD tasks, GOAT outperforms current
state-of-the-art methods by a large margin.
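
As a rough illustration of what "weighted imitation learning" refers to, the
sketch below computes a generic advantage-weighted behavior-cloning loss: each
dataset action is imitated with a weight that grows exponentially with its
estimated advantage for the (possibly hindsight-relabeled) goal. This is an
assumption-laden sketch, not GOAT's objective. The function names
`imitation_weights` and `weighted_imitation_loss` and the parameters `beta`
(temperature) and `max_weight` (clipping bound) are hypothetical, and the
advantage estimates and policy log-probabilities are assumed to be computed
elsewhere, for example from a learned goal-conditioned value function.

```python
import math
from typing import List, Sequence


def imitation_weights(
    advantages: Sequence[float], beta: float = 1.0, max_weight: float = 10.0
) -> List[float]:
    """Exponential advantage weights exp(A / beta), clipped at max_weight.

    The exponent is clipped before exponentiating, so very large advantages
    cannot overflow math.exp.
    """
    if beta <= 0 or max_weight <= 0:
        raise ValueError("beta and max_weight must be positive")
    cap = math.log(max_weight)
    return [math.exp(min(a / beta, cap)) for a in advantages]


def weighted_imitation_loss(
    log_probs: Sequence[float],
    advantages: Sequence[float],
    beta: float = 1.0,
    max_weight: float = 10.0,
) -> float:
    """Weighted negative log-likelihood of the dataset actions.

    log_probs[i]:  log pi(a_i | s_i, g_i) of the dataset action a_i under the
                   current goal-conditioned policy (hypothetical input).
    advantages[i]: an estimate of A(s_i, a_i, g_i) (hypothetical input).
    """
    if len(log_probs) != len(advantages):
        raise ValueError("log_probs and advantages must have the same length")
    if not log_probs:
        raise ValueError("empty batch")
    weights = imitation_weights(advantages, beta, max_weight)
    # Minimizing this pushes up the likelihood of high-advantage actions most.
    return -sum(w * lp for w, lp in zip(weights, log_probs)) / len(log_probs)
```

With all advantages equal to zero every weight is 1, so the loss reduces to
plain behavior cloning; for example, `weighted_imitation_loss([-1.0, -3.0],
[0.0, 0.0])` returns 2.0. Clipping the weights is a common stabilizing choice
in advantage-weighted methods, since unclipped exponential weights can let a
few samples dominate a batch.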

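This paper studies the out-of-distribution generalization problem of offline goal-conditioned reinforcement learning and proposes an offline learning algorithm based on weighted imitation learning (GOAT), which significantly outperforms existing algorithms on 9 independent and identically distributed tasks and 17 out-of-distribution tasks.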