In this paper, we study the problem of learning vision-based dynamic manipulation skills using a scalable reinforcement learning approach. We study this problem in the context of grasping, a longstanding challenge in robotic manipulation. In contrast to static learning behaviors that choose a grasp point and then execute the desired grasp, our method enables closed-loop vision-based control, whereby the robot continuously updates its grasp strategy based on the most recent observations to optimize long-horizon grasp success. To that end, we introduce QT-Opt, a scalable self-supervised vision-based reinforcement learning framework that can leverage over 580k real-world grasp attempts to train a deep neural network Q-function with over 1.2M parameters to perform closed-loop, real-world grasping that generalizes to 96% grasp success on unseen objects. Aside from attaining a very high success rate, our method exhibits behaviors that are quite distinct from more standard grasping systems: using only RGB vision-based perception from an over-the-shoulder camera, our method automatically learns regrasping strategies, probes objects to find the most effective grasps, learns to reposition objects and perform other non-prehensile pre-grasp manipulations, and responds dynamically to disturbances and perturbations.

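This paper presents QT-Opt, a scalable self-supervised vision-based reinforcement learning framework that leverages over 580k real-world grasp attempts to train a deep neural network Q-function with over 1.2M parameters. The resulting closed-loop, real-world grasping generalizes to 96% grasp success on unseen objects. Beyond this very high success rate, the method uses only RGB vision-based perception to automatically learn regrasping strategies, respond dynamically to disturbances and perturbations, reposition objects, and perform other non-prehensile pre-grasp manipulations.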

QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation