Reflecting on the last few years, the biggest breakthroughs in deep reinforcement learning (RL) have been in the discrete action domain. Robotic manipulation, however, is inherently a continuous control problem, and continuous control RL algorithms often depend on actor-critic methods that are sample-inefficient and difficult to train because the actor and critic are optimised jointly. To that end, we explore how we can bring the stability of discrete action RL algorithms to the robot manipulation domain. We extend the recently released ARM algorithm by replacing the continuous next-best pose agent with a discrete next-best pose agent. Discretisation of rotation is trivial given its bounded nature, while translation is inherently unbounded, making discretisation difficult. We formulate translation prediction as a voxel prediction problem by discretising the 3D space; however, voxelising a large workspace is memory intensive and does not scale to the high voxel density needed to obtain the resolution that robotic manipulation requires. We therefore propose to apply this voxel prediction in a coarse-to-fine manner by gradually increasing the resolution. In each step, we extract the highest-valued voxel as the predicted location, which is then used as the centre of the higher-resolution voxelisation in the next step. This coarse-to-fine prediction is applied over several steps, giving a near-lossless prediction of the translation. We show that our new coarse-to-fine algorithm accomplishes RLBench tasks much more efficiently than the continuous control equivalent, and can even train some real-world tasks, tabula rasa, in less than 7 minutes, with only 3 demonstrations. Moreover, we show that by moving to a voxel representation, we are able to easily incorporate observations from multiple cameras.
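
The coarse-to-fine translation prediction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `q_fn` is a hypothetical stand-in for the learned Q-function (here a toy function that scores each voxel by its distance to a known target), and letting each finer grid span exactly one voxel of the previous grid is an assumption made for simplicity.

```python
import numpy as np


def voxel_centres(centre, extent, n):
    """Return centres of an n x n x n grid covering a cube of side `extent`
    around `centre`, together with the side length of one voxel."""
    size = extent / n
    # Per-axis offsets of voxel centres relative to the cube centre.
    offsets = (np.arange(n) + 0.5) * size - extent / 2.0
    gx, gy, gz = np.meshgrid(offsets, offsets, offsets, indexing="ij")
    return centre + np.stack([gx, gy, gz], axis=-1), size  # (n, n, n, 3)


def coarse_to_fine(q_fn, centre, extent, n=16, depth=3):
    """Repeatedly voxelise, pick the highest-valued voxel, and re-centre a
    finer grid on it. Returns the final predicted location and voxel size."""
    centre = np.asarray(centre, dtype=float)
    size = extent
    for _ in range(depth):
        centres, size = voxel_centres(centre, extent, n)
        q = q_fn(centres)  # hypothetical Q-values, shape (n, n, n)
        best = np.unravel_index(np.argmax(q), q.shape)
        centre = centres[best]  # centre of the next, finer voxelisation
        extent = size  # assumption: next grid spans one current voxel
    return centre, size


# Toy usage: the "Q-function" prefers voxels close to a known target point.
target = np.array([0.31, -0.12, 0.77])
toy_q = lambda c: -np.linalg.norm(c - target, axis=-1)
prediction, voxel_size = coarse_to_fine(toy_q, centre=[0.0, 0.0, 0.5], extent=2.0)
```

With the toy scoring function, each step selects the voxel containing the target, so the per-axis error of the final prediction is at most half of the final voxel size.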

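A coarse-to-fine discretisation method replaces the unstable and sample-inefficient actor-critic methods used in continuous robotics, enabling discrete reinforcement learning. The method builds on the recently released ARM algorithm, replacing the continuous next-best pose agent with a discrete one. It uses a coarse-to-fine Q-attention that learns which part of the scene to zoom into, achieving a near-lossless discretisation of the translation space and allowing the use of discrete actions and deep Q-learning. Experiments show that the new coarse-to-fine algorithm achieves state-of-the-art performance on several difficult vision-based robotics tasks and can train real-world policies in a matter of minutes.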
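
To make the "near-lossless" claim concrete, the following back-of-the-envelope calculation compares a coarse-to-fine scheme with a single dense grid of the same final resolution. It assumes a cubic workspace of side $s$, $v$ voxels per axis at every level, $N$ levels, and that each finer grid spans exactly one voxel of the previous level. These symbols and the example numbers are illustrative assumptions, not values taken from the paper.

```latex
\begin{align*}
r_N &= \frac{s}{v^{N}}
  && \text{(voxel side after } N \text{ levels)} \\
M_{\text{coarse-to-fine}} &= N\,v^{3}
  && \text{(voxels evaluated in total)} \\
M_{\text{dense}} &= \left(v^{N}\right)^{3} = v^{3N}
  && \text{(one grid at resolution } r_N\text{)} \\[4pt]
s = 1\,\text{m},\ v = 16,\ N = 3:\quad
r_3 &= \tfrac{1}{4096}\,\text{m} \approx 0.24\,\text{mm}, \\
M_{\text{coarse-to-fine}} &= 3 \cdot 4096 = 12\,288, \\
M_{\text{dense}} &= 4096^{3} \approx 6.9 \times 10^{10}.
\end{align*}
```

Under these assumptions, the resolution improves geometrically with the number of levels while the number of voxels evaluated grows only linearly, which is why a dense voxelisation of the same workspace would be impractical.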

Coarse-to-Fine Q-attention: Efficient Learning for Visual Robotic Manipulation via Discretisation