Everyday tasks of long-horizon and comprising a sequence of multiple implicit
subtasks still impose a major challenge in offline robot control. While a
number of prior methods aimed to address this setting with variants of
imitation and offline reinforcement learning, the learned behavior is typically
narrow and often struggles to reach configurable long-horizon goals. As both
paradigms have complementary strengths and weaknesses, we propose a novel
hierarchical approach that combines the strengths of both methods to learn
task-agnostic long-horizon policies from high-dimensional camera observations.
Concretely, we combine a low-level policy that learns latent skills via
imitation learning and a high-level policy learned from offline reinforcement
learning for skill-chaining the latent behavior priors. Experiments in various
simulated and real robot control tasks show that our formulation enables
producing previously unseen combinations of skills to reach temporally extended
goals by "stitching" together latent skills through goal chaining with an
order-of-magnitude improvement in performance upon state-of-the-art baselines.
We even learn one multi-task visuomotor policy for 25 distinct manipulation
tasks in the real world which outperforms both imitation learning and offline
reinforcement learning techniques.

本研究提出了一种层次化的方法，将模仿学习和离线强化学习的优点相结合，学习从高维相机观察中获得与任务无关的长时程策略，并通过技能链接来合并潜在的行为先验，以达到以前未见的技能组合，从而更好地控制机器人的实验结果。