Vision-based robotic cloth unfolding has made great progress recently. However, prior work relies predominantly on value learning and has not fully explored policy-based techniques. Meanwhile, the success of reinforcement learning for large language models has shown that policy gradient algorithms can improve a policy over a very large action space. In this paper, we introduce ClothPPO, a framework that uses a policy gradient algorithm with an actor-critic architecture to enhance a pre-trained model whose action space, on the order of 10^6 actions, is aligned with the observation, for the task of unfolding clothes. To this end, we reformulate cloth manipulation as a partially observable Markov decision process. In the first stage, supervised pre-training produces a baseline model for our policy. In the second stage, Proximal Policy Optimization (PPO) is used to guide the supervised model within the observation-aligned action space. By optimizing and updating the policy, our method increases the garment's surface area during unfolding in this soft-body manipulation task. Experimental results show that our framework can further improve the unfolding performance of other state-of-the-art methods.
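
To make the idea of running PPO over an observation-aligned action space more concrete, the following is a minimal NumPy sketch, not the ClothPPO implementation. It assumes the policy emits one logit per discrete action, arranged as a tensor aligned with the image grid, and applies the standard PPO clipped surrogate loss to the sampled action. The shape `(16, 4, 128, 128)` (rotations, scales, rows, columns), the helper names, and the clipping value `0.2` are all hypothetical, chosen only so that the action count is roughly 10^6.

```python
import numpy as np

# Hypothetical factorization of the action space (an assumption, not from the
# paper): 16 rotations x 4 scales x 128 x 128 pixels = 1,048,576 actions.
ACTION_SHAPE = (16, 4, 128, 128)


def log_softmax(logits):
    """Numerically stable log-softmax over a flat vector of action logits."""
    shifted = logits - logits.max()
    return shifted - np.log(np.exp(shifted).sum())


def flat_to_action(index, shape=ACTION_SHAPE):
    """Map a flat action index back to (rotation, scale, row, col)."""
    return tuple(int(i) for i in np.unravel_index(index, shape))


def ppo_clip_loss(new_logp, old_logp, advantage, clip_eps=0.2):
    """Standard PPO clipped surrogate loss (to be minimized)."""
    ratio = np.exp(new_logp - old_logp)
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return -np.mean(np.minimum(ratio * advantage, clipped * advantage))


# Illustrative usage with random logits standing in for network outputs.
rng = np.random.default_rng(0)
old_logits = rng.normal(size=ACTION_SHAPE).ravel()
old_logp_all = log_softmax(old_logits)

# Sample one action from the old (behavior) policy.
action = int(rng.choice(old_logp_all.size, p=np.exp(old_logp_all)))
rotation, scale, row, col = flat_to_action(action)

# Pretend the policy was updated slightly, then evaluate the PPO loss
# for the sampled action with a made-up advantage estimate.
new_logits = old_logits + 0.01 * rng.normal(size=old_logits.shape)
new_logp_all = log_softmax(new_logits)
loss = ppo_clip_loss(
    np.array([new_logp_all[action]]),
    np.array([old_logp_all[action]]),
    np.array([1.0]),
)
```

Because each flat index maps back to a pixel location, the sampled action can be read directly off the observation. In an actor-critic setup the advantage would come from a learned critic rather than the constant used here.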

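This paper studies vision-based robotic cloth unfolding and introduces ClothPPO, a framework based on a policy gradient algorithm and an actor-critic architecture. By optimizing and updating the policy, it improves cloth unfolding performance in soft-body manipulation tasks. Experimental results show that our method can further improve the unfolding performance of other state-of-the-art methods.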

ClothPPO: A Proximal Policy Optimization Enhancing Framework for Robotic Cloth Manipulation with Observation-Aligned Action Spaces