Self-supervised skill learning aims to acquire useful behaviors that leverage the underlying dynamics of the environment. Latent variable models, based on mutual information maximization, have been particularly successful in this task but still struggle in the context of robotic manipulation. As it requires impacting a possibly large set of degrees of freedom composing the environment, mutual information maximization fails alone in producing useful manipulation behaviors. To address this limitation, we introduce SLIM, a multi-critic learning approach for skill discovery with a particular focus on robotic manipulation. Our main insight is that utilizing multiple critics in an actor-critic framework to gracefully combine multiple reward functions leads to a significant improvement in latent-variable skill discovery for robotic manipulation while overcoming possible interference occurring among rewards which hinders convergence to useful skills. Furthermore, in the context of tabletop manipulation, we demonstrate the applicability of our novel skill discovery approach to acquire safe and efficient motor primitives in a hierarchical reinforcement learning fashion and leverage them through planning, surpassing the state-of-the-art approaches for skill discovery by a large margin.

我们提出了SLIM，一种多批评家学习方法，它通过在演员-评论家框架中优雅地结合多个奖励函数，显著提高了机器人操作的潜在变量技能发现，克服了可能干扰收敛到有用技能的奖励之间的干扰，并展示了在桌面操作中，我们方法在获得安全高效的运动基元方面的适用性，通过规划利用它们，大大超过了技能发现的现有方法。

多批评家技能学习