We study sim-to-real skill transfer and discovery for robot control from a representation-learning viewpoint. We draw inspiration from the spectral decomposition of Markov decision processes (MDPs). The spectral decomposition yields a representation that can linearly represent the state-action value function induced by any policy, so its components can be regarded as skills. These skill representations transfer across arbitrary tasks that share the same transition dynamics. To handle the sim-to-real gap in the dynamics, we further propose a skill discovery algorithm that learns, from real-world data, the new skills needed to account for this gap. We promote the discovery of new skills by enforcing orthogonality constraints between the skills being learned and the skills obtained from the simulator, and then synthesize the policy using the enlarged skill set. We demonstrate our methodology by transferring quadrotor controllers from simulators to Crazyflie 2.1 quadrotors. We show that skill representations learned from a single simulator task transfer to multiple real-world tasks, including hovering, takeoff, landing, and trajectory tracking. Our skill discovery approach helps narrow the sim-to-real gap and improves real-world controller performance by up to 30.2%.
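The linearity claim follows from a standard argument, assuming the reward is also linear in the features. If the transition kernel factorizes as P(s'|s,a) = φ(s,a)ᵀμ(s') and r(s,a) = φ(s,a)ᵀθ_r, then Q^π(s,a) = r(s,a) + γ Σ_{s'} P(s'|s,a) V^π(s') = φ(s,a)ᵀ(θ_r + γ Σ_{s'} μ(s') V^π(s')), which is linear in φ(s,a) for every policy π. The NumPy sketch below checks this on a small random finite MDP. It then mimics skill discovery on a hypothetical "real" MDP whose dynamics mix in k extra factors ψ. This is not the authors' algorithm. The features are given rather than learned, and the orthogonality constraint is replaced by an explicit projection followed by an SVD. All names (`random_rows`, `evaluate_q`, `fit_residual`, `orthogonalize`, `alpha`) are illustrative assumptions.

```python
import numpy as np


def random_rows(rng, n, m):
    """Return an n x m matrix whose rows are probability vectors."""
    x = rng.random((n, m))
    return x / x.sum(axis=1, keepdims=True)


def evaluate_q(P, r, pi, gamma):
    """Exact Q^pi of a finite MDP.

    P: (S*A, S) transition matrix with rows indexed by s*A + a.
    r: (S*A,) reward vector. pi: (S, A) stochastic policy.
    """
    n_states, n_actions = pi.shape
    # P_pi[(s,a), (s',a')] = P(s'|s,a) * pi(a'|s')
    P_pi = (P[:, :, None] * pi[None, :, :]).reshape(n_states * n_actions, -1)
    # Solve the Bellman equation Q = r + gamma * P_pi @ Q
    return np.linalg.solve(np.eye(n_states * n_actions) - gamma * P_pi, r)


def fit_residual(features, q):
    """Max abs error of the best least-squares fit q ~ features @ w."""
    w, *_ = np.linalg.lstsq(features, q, rcond=None)
    return float(np.max(np.abs(features @ w - q)))


def orthogonalize(new, base, tol=1e-10):
    """Orthonormal basis for the part of span(new) orthogonal to span(base).

    Stand-in (assumption) for an orthogonality constraint imposed during
    learning: project out span(base), then drop numerically null directions.
    """
    coef, *_ = np.linalg.lstsq(base, new, rcond=None)
    resid = new - base @ coef
    u, s, _ = np.linalg.svd(resid, full_matrices=False)
    return u[:, s > tol * s.max()]


rng = np.random.default_rng(0)
S, A, d, k, gamma, alpha = 6, 3, 4, 3, 0.9, 0.8

# Hypothetical "simulator": P_sim(s'|s,a) = phi(s,a)^T mu(s'), reward linear in phi.
phi = random_rows(rng, S * A, d)
mu = random_rows(rng, d, S)
r = phi @ rng.random(d)
P_sim = phi @ mu

# Hypothetical "real world": dynamics gain k extra factors psi (rows still sum to 1).
psi = random_rows(rng, S * A, k)
mu_extra = random_rows(rng, k, S)
P_real = alpha * P_sim + (1 - alpha) * psi @ mu_extra

pi = random_rows(rng, S, A)
q_sim = evaluate_q(P_sim, r, pi, gamma)
q_real = evaluate_q(P_real, r, pi, gamma)

# New skills orthogonal to the simulator skills. Rows of psi and phi both sum
# to one, so the all-ones direction is already covered and k - 1 new skills remain.
psi_perp = orthogonalize(psi, phi)
skills = np.hstack([phi, psi_perp])

err_sim = fit_residual(phi, q_sim)      # numerically zero: Q^pi is linear in phi
err_gap = fit_residual(phi, q_real)     # positive: simulator skills miss the gap
err_aug = fit_residual(skills, q_real)  # numerically zero with the enlarged set
print(err_sim, err_gap, err_aug)
```

In this toy setting the enlarged skill set spans the real-world value function exactly; with learned features and finite data one would expect only an approximate fit.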

Skill Transfer and Discovery for Sim-to-Real Learning: A Representation-Based Viewpoint