Deep reinforcement learning (RL) has proven a powerful technique in many sequential decision-making domains. However, robotics poses many challenges for RL. Most notably, training on a physical system can be expensive and dangerous, which has sparked significant interest in learning control policies using a physics simulator. While several recent works have shown promising results in transferring policies trained in simulation to the real world, they often do not fully utilize the advantage of working with a simulator. In this work, we exploit the full state observability in the simulator to train better policies that take as input only partial observations (RGBD images). We do this by employing an actor-critic training algorithm in which the critic is trained on full states while the actor (or policy) gets rendered images as input. We show experimentally on a range of simulated tasks that using these asymmetric inputs significantly improves performance. Finally, we combine this method with domain randomization and show real robot experiments for several tasks such as picking, pushing, and moving a block. We achieve this simulation-to-real-world transfer without training on any real-world data.
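
The asymmetric split described above (critic on full states, actor on rendered observations) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the abstract: the linear actor and critic, the deterministic-policy-gradient style update, the dimensions, and the `render` function, which simply hides part of the state instead of producing an RGBD image.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: the full simulator state is larger than what the actor sees.
STATE_DIM, OBS_DIM, ACT_DIM = 6, 4, 2


def render(state):
    """Stand-in for rendering: exposes only part of the full state (assumption)."""
    return state[..., :OBS_DIM]


class Actor:
    """Policy a = tanh(o W); its input is the partial observation only."""

    def __init__(self):
        self.W = rng.normal(scale=0.1, size=(OBS_DIM, ACT_DIM))

    def act(self, obs):
        return np.tanh(obs @ self.W)


class Critic:
    """Linear Q(s, a) = [s, a] . w; its input is the full simulator state."""

    def __init__(self):
        self.w = np.zeros(STATE_DIM + ACT_DIM)

    def q(self, state, action):
        return np.concatenate([state, action], axis=-1) @ self.w


def update(actor, critic, batch, gamma=0.98, lr=1e-2):
    s, a, r, s_next = batch
    # Critic: semi-gradient TD step, bootstrapping from the actor's next action.
    target = r + gamma * critic.q(s_next, actor.act(render(s_next)))
    x = np.concatenate([s, a], axis=-1)
    td_error = x @ critic.w - target
    critic.w -= lr * (td_error[:, None] * x).mean(axis=0)
    # Actor: ascend dQ/da * da/dW, where dQ/da comes from the full-state critic.
    obs = render(s)
    pi = actor.act(obs)
    dq_da = critic.w[STATE_DIM:]
    actor.W += lr * obs.T @ (dq_da * (1.0 - pi ** 2)) / len(obs)
    return float(np.mean(td_error ** 2))


# Usage on random transitions (synthetic data, for shape checking only).
actor, critic = Actor(), Critic()
s = rng.normal(size=(32, STATE_DIM))
a = rng.uniform(-1, 1, size=(32, ACT_DIM))
r = rng.normal(size=32)
s_next = rng.normal(size=(32, STATE_DIM))
loss = update(actor, critic, (s, a, r, s_next))
```

The point of the sketch is the data flow: the critic is only needed during training, so it can consume simulator state that will not exist on the real robot, while the actor never sees anything it could not observe at deployment time.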

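This work uses a deep reinforcement learning actor-critic algorithm that exploits the full state observability of a physics simulator to train policies for robotic manipulation from partial observations (RGBD images). Using asymmetric inputs significantly improves performance, and combining the method with domain randomization achieves simulation-to-real-world transfer on a real robot without any real-world data.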
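
Domain randomization, mentioned above, generally means varying the simulated appearance between episodes so the policy does not overfit to one rendering. The following is a rough sketch of that idea only. The text does not say which properties are randomized, so the parameter names (`light`, `tint`), their ranges, and the image-space perturbation (a stand-in for re-rendering the scene) are all hypothetical.

```python
import numpy as np


def sample_appearance(rng):
    """Draw one random appearance setting; names and ranges are hypothetical."""
    return {
        "light": rng.uniform(0.5, 1.5),          # global brightness scale
        "tint": rng.uniform(0.6, 1.0, size=3),   # per-channel color scale
    }


def randomize_rgbd(rgbd, params):
    """Perturb the RGB channels of an (H, W, 4) image; depth is left as-is."""
    out = rgbd.copy()
    out[..., :3] = np.clip(rgbd[..., :3] * params["tint"] * params["light"], 0.0, 1.0)
    return out


rng = np.random.default_rng(0)
rgbd = rng.uniform(0.0, 1.0, size=(8, 8, 4))  # synthetic stand-in for a rendered frame
episodes = [randomize_rgbd(rgbd, sample_appearance(rng)) for _ in range(3)]
```

Each entry in `episodes` shows the same scene under a different sampled appearance, which is the kind of variation the actor would be trained across.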

Asymmetric Actor-Critic Algorithm for Image-Based Robot Learning