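TL;DR: This work proposes a reinforcement-learning-based method for cooperative multi-agent visual exploration. It introduces a novel multi-agent planning module, MSP, and a spatial transformer architecture, Spatial-TeamFormer. A meta-policy extracted through policy distillation substantially improves the generalization of the final policy, which outperforms classical planning-based methods in Habitat, a realistic 3D simulator.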
Abstract
We consider the task of visual indoor exploration with multiple agents, where the agents need to cooperatively explore the entire indoor region using as few steps as possible. Classical planning-based methods often suffer from particularly expensive computation at each inference step and limited expressiveness of the cooperation strategy. By contrast,