Identifying statistical regularities in solutions to some tasks in multi-task
reinforcement learning can accelerate the learning of new tasks. Skill learning
offers one way of identifying these regularities by decomposing pre-collected
experiences into a sequence of skills. A popular approach to skill learning is
maximizing the likelihood of the pre-collected experience with latent variable
models, where the latent variables represent the skills. However, there are
often many solutions that maximize the likelihood equally well, including
degenerate solutions. To address this underspecification, we propose a new
objective that combines the maximum likelihood objective with a penalty on the
description length of the skills. This penalty incentivizes the skills to
maximally extract common structures from the experiences. Empirically, our
objective learns skills that solve downstream tasks in fewer samples compared
to skills learned from only maximizing likelihood. Further, while most prior
works in the offline multi-task setting focus on tasks with low-dimensional
observations, our objective can scale to challenging tasks with
high-dimensional image observations.

研究多任务强化学习中的统计规律对于新任务学习的加速是有效的，而技能学习是实现这一目标的一种方式，技能学习的热门方法是使用潜在变量模型来最大化预收集的经验的可能性，结合应用于描述技能的描述长度惩罚的新目标可以使技能更好地从经验中提取共同结构，并在具有高维图像观察的挑战性任务中进行了验证。