Coverage path planning is the problem of finding the shortest path that
covers the entire free space of a given confined area, with applications
ranging from robotic lawn mowing and vacuum cleaning to demining and
search-and-rescue tasks. While offline methods can find provably complete, and
in some cases optimal, paths for known environments, their value is limited in
online scenarios where the environment is not known beforehand, especially in
the presence of non-static obstacles. We propose an end-to-end reinforcement
learning-based approach, operating in continuous state and action spaces, for
the online coverage path planning problem in unknown environments. We
construct the observation space from both global maps and local sensory inputs,
allowing the agent to plan a long-term path, and simultaneously act on
short-term obstacle detections. To account for large-scale environments, we
use a multi-scale map input representation. Furthermore, we propose
a novel total variation reward term for eliminating thin strips of uncovered
space in the learned path. To validate the effectiveness of our approach, we
perform extensive experiments in simulation with a distance sensor, in which it
surpasses the performance of a recent reinforcement learning-based approach.
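
To illustrate why a total variation term discourages thin uncovered strips, here is a minimal sketch. It assumes the coverage map is a binary grid (1 = covered) and uses an anisotropic total variation, with the reward being the negative change in total variation between steps. The function names, the `weight` parameter, and this exact formulation are illustrative assumptions, not necessarily the definition used in the paper.

```python
import numpy as np

def total_variation(coverage: np.ndarray) -> float:
    """Anisotropic total variation: sum of absolute differences
    between horizontally and vertically adjacent cells."""
    c = coverage.astype(float)
    dx = np.abs(np.diff(c, axis=1)).sum()  # horizontal neighbours
    dy = np.abs(np.diff(c, axis=0)).sum()  # vertical neighbours
    return float(dx + dy)

def tv_reward(prev_coverage: np.ndarray, new_coverage: np.ndarray,
              weight: float = 1.0) -> float:
    """Hypothetical reward term: penalise any increase in the total
    variation of the coverage map (weight is an assumed scale factor)."""
    return -weight * (total_variation(new_coverage) - total_variation(prev_coverage))

# Two rows are already covered at the top of a 5x5 area.
covered = np.zeros((5, 5))
covered[:2] = 1

# Option A: skip a row and cover the bottom, leaving a thin strip.
with_strip = covered.copy()
with_strip[3:] = 1

# Option B: cover the adjacent row, leaving no strip.
adjacent = covered.copy()
adjacent[2] = 1

print(tv_reward(covered, with_strip))  # negative: a new boundary was created
print(tv_reward(covered, adjacent))    # zero: the boundary length is unchanged
```

Under these assumptions, leaving a one-cell strip adds boundary length to the coverage map and is penalised, whereas extending the covered region contiguously is not.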

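This study proposes a reinforcement learning-based method for online coverage path planning in continuous state and action spaces, aimed at large areas in unknown environments. It builds the observation space from global maps and local sensory inputs, together with a multi-scale map input representation, and uses the proposed total variation reward to achieve learned paths whose coverage is free of gaps.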
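
One way to picture a multi-scale map input is as a stack of agent-centred crops, each covering a larger area at a coarser resolution, so that the input size stays fixed as the environment grows. The sketch below assumes a 2D occupancy grid, a doubling of extent per scale, average pooling for downsampling, and out-of-bounds cells treated as obstacles. The function name `multi_scale_maps` and all of its parameters are hypothetical choices for illustration, not the paper's implementation.

```python
import numpy as np

def multi_scale_maps(grid: np.ndarray, row: int, col: int,
                     size: int = 8, num_scales: int = 3,
                     pad_value: float = 1.0) -> np.ndarray:
    """Return an array of shape (num_scales, size, size).

    Scale s covers a (size * 2**s)-cell square window around (row, col),
    average-pooled down to size x size. Cells outside the grid are
    filled with pad_value (assumed here to mean 'obstacle')."""
    if size % 2:
        raise ValueError("size must be even so windows are symmetric")
    maps = []
    for s in range(num_scales):
        k = 2 ** s                # pooling factor at this scale
        half = size * k // 2      # half the window width, in cells
        padded = np.pad(grid.astype(float), half, constant_values=pad_value)
        # Padding shifts the agent to (row + half, col + half), so this
        # slice is the window of width 2 * half centred on the agent.
        window = padded[row:row + 2 * half, col:col + 2 * half]
        # Average pooling: split into k x k blocks and take the mean.
        pooled = window.reshape(size, k, size, k).mean(axis=(1, 3))
        maps.append(pooled)
    return np.stack(maps)

grid = np.zeros((32, 32))          # empty 32x32 environment
obs = multi_scale_maps(grid, 16, 16)
print(obs.shape)                   # (3, 8, 8)
```

With these assumed defaults, the finest scale sees an 8x8 neighbourhood at full resolution, while the coarsest sees a 32x32 area at a quarter of the resolution, and all three share the same 8x8 shape.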