Intelligent control of robotic arms has huge potential in the coming years, but current controllers often fail to adapt when presented with new and unfamiliar environments. Recent work on this problem has shifted toward end-to-end solutions that use deep reinforcement learning to learn policies directly from visual input, rather than relying on a handcrafted, modular pipeline. Building upon the recent success of deep Q-networks, we present an approach that uses three-dimensional simulations to train a 7-DOF robotic arm in a control task without any prior knowledge. Policies accept images of the environment as input and output motor actions. However, the high dimensionality of the policies and the large state space make policy search difficult. We overcome this by using intermediate rewards that guide the policy towards higher-reward states, which ensures that interesting states are explored. Our results demonstrate that deep Q-networks can learn policies for a task that involves locating a cube, grasping it, and finally lifting it. When tested in simulation, the agent learns to handle a range of starting joint configurations and starting cube positions. Moreover, we show that policies trained in simulation have the potential to be applied directly to real-world equivalents without any further training. We believe that robot simulations can decrease the dependency on physical robots and ultimately improve the productivity of training robot control tasks.
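To make the deep Q-network idea concrete, the sketch below shows the two pieces of standard DQN that connect images to motor actions: the one-step Q-learning target used to train the network, and epsilon-greedy action selection over a discrete action set. It is a minimal, framework-free illustration, not the authors' implementation. It assumes the network has already mapped an image to one Q-value per discrete action (for example, a small positive or negative step for each joint, which is a hypothetical action set), and the function names `dqn_targets` and `epsilon_greedy` are hypothetical.

```python
import random


def dqn_targets(rewards, next_q_values, dones, gamma=0.99):
    """Compute one-step Q-learning targets for a minibatch.

    next_q_values[i] holds Q(s'_i, a) for every discrete action a, as
    estimated by a (target) network; this function assumes those values
    were already produced from the next-state images.
    """
    targets = []
    for r, q_next, done in zip(rewards, next_q_values, dones):
        if done:
            # Terminal transition: nothing to bootstrap from.
            targets.append(r)
        else:
            # Bootstrap from the best estimated action in the next state.
            targets.append(r + gamma * max(q_next))
    return targets


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

For example, `dqn_targets([0.0, 1.0], [[0.5, 2.0], [3.0, 1.0]], [False, True], gamma=0.9)` gives a target of about 1.8 for the first transition (0.0 + 0.9 × 2.0) and exactly 1.0 for the terminal one. The network is then regressed toward these targets for the actions actually taken.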

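This study uses deep reinforcement learning to train a robotic arm in a simulated environment to locate and grasp a cube, with the aim of transferring the learned control seamlessly to real-world scenarios. It also designs a structured reward function to improve training efficiency.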
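A structured reward function of this kind can be sketched as a staged signal that pays a small amount for reaching the cube, more for grasping it, and the most for lifting it. This gives the agent intermediate feedback instead of a single sparse success reward. The stage values, the reach radius, the lift height, and the `ArmState` fields below are all hypothetical placeholders chosen for illustration; they are not the reward values used in the study.

```python
import math
from dataclasses import dataclass


@dataclass
class ArmState:
    # Hypothetical state summary; positions are (x, y, z) in metres.
    gripper_pos: tuple
    cube_pos: tuple
    cube_grasped: bool
    table_height: float = 0.0


def staged_reward(state, reach_radius=0.05, lift_height=0.1):
    """Return an intermediate reward for the reach -> grasp -> lift task.

    All thresholds and reward magnitudes are illustrative assumptions.
    """
    if state.cube_grasped and state.cube_pos[2] - state.table_height >= lift_height:
        return 1.0   # final stage: cube grasped and lifted
    if state.cube_grasped:
        return 0.5   # intermediate stage: cube grasped but not yet lifted
    if math.dist(state.gripper_pos, state.cube_pos) < reach_radius:
        return 0.25  # intermediate stage: gripper is close to the cube
    return 0.0       # no progress yet
```

Because each stage pays more than the one before it, the reward increases monotonically along the intended task sequence. This is the property that lets intermediate rewards steer exploration toward higher-reward states.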

3D Simulation for Robot Arm Control with Deep Q-Learning