We propose a model-free deep reinforcement learning method that leverages a small amount of demonstration data to assist a reinforcement learning agent. We apply this approach to robotic manipulation tasks and train end-to-end visuomotor policies that map directly from RGB camera inputs to joint velocities. We demonstrate that our approach can solve a wide variety of visuomotor tasks, for which engineering a scripted controller would be laborious. Our experiments indicate that our reinforcement and imitation agent achieves significantly better performances than agents trained with reinforcement learning or imitation learning alone. We also illustrate that these policies, trained with large visual and dynamics variations, can achieve preliminary successes in zero-shot sim2real transfer. A brief visual description of this work can be viewed in https://youtu.be/EDl8SQUNjj0

该研究提出了一种模型无关的深度强化学习方法，利用少量的演示数据来协助强化学习代理。作者将该方法应用于机器人操作任务并训练了端到端的视觉-动力学策略，直接从RGB相机输入到关节速度。实验结果表明，与仅使用强化学习或模仿学习训练代理的结果相比，作者的强化和模仿代理取得了显著的性能提高。此外，这些训练有素的策略在模拟到现实世界的零样本情况下也能获得初步的成功。

针对多样化视觉动作技能的强化学习和模仿学习