We present a fully autonomous real-world RL framework for mobile manipulation that can learn policies without extensive instrumentation or human supervision. This is enabled by 1) task-relevant autonomy, which guides exploration towards object interactions and prevents stagnation near goal states, 2) efficient policy learning that leverages basic task knowledge in behavior priors, and 3) generic rewards that combine human-interpretable semantic information with low-level, fine-grained observations. We demonstrate that our approach allows Spot robots to continually improve their performance on a set of four challenging mobile manipulation tasks, obtaining an average success rate of 80% across tasks, a 3-4× improvement over existing approaches. Videos can be found at https://continual-mobile-manip.github.io/

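This work addresses the lack of an effective autonomous learning framework for mobile manipulation. By introducing task-relevant autonomy, task knowledge encoded in behavior priors, and generic rewards, we propose a new approach that lets robots improve on their own without extensive instrumentation or human supervision. Experiments show that the approach enables Spot robots to reach an average success rate of 80% on four challenging mobile manipulation tasks, a 3-4× improvement over existing approaches.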

Continuously Improving Mobile Manipulation with Autonomous Real-World RL