We consider an active visual exploration scenario, where an agent must intelligently select its camera motions to efficiently reconstruct the full environment from only a limited set of narrow field-of-view glimpses. While the agent has full observability of the environment during training, it has only partial observability once deployed, being constrained by what portions it has seen and what camera motions are permissible. We introduce sidekick policy learning to capitalize on this imbalance of observability. The main idea is a preparatory learning phase that attempts simplified versions of the eventual exploration task, then guides the agent via reward shaping or initial policy supervision. To support interpretation of the resulting policies, we also develop a novel policy visualization technique. Results on active visual exploration tasks with 360° scenes and 3D objects show that sidekicks consistently improve performance and convergence rates over existing methods. Code, data and demos are available.

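This paper presents an active visual exploration method based on sidekick policy learning. When the agent has access to only a limited set of narrow field-of-view glimpses, the method combines reward shaping and initial policy supervision to guide its choice of camera motions, so that it reconstructs the full environment more efficiently. Experiments on 360° scenes and 3D objects show that the method significantly improves the agent's performance and convergence rate.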
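
To make the reward-shaping idea above concrete, here is a minimal Python sketch of one possible way a sidekick's guidance could be blended into the agent's training reward. The text above does not say how the shaping term is computed or scheduled, so everything here is an assumption for illustration: the linear decay, the function names `shaping_weight` and `shaped_reward`, and the parameters `sidekick_bonus` and `decay_steps` are all hypothetical.

```python
def shaping_weight(step: int, decay_steps: int) -> float:
    """Hypothetical schedule: sidekick influence falls linearly from 1.0 to 0.0."""
    if decay_steps <= 0:
        return 0.0  # no shaping when no decay horizon is given
    return max(0.0, 1.0 - step / decay_steps)


def shaped_reward(task_reward: float, sidekick_bonus: float,
                  step: int, decay_steps: int) -> float:
    """Task reward plus a sidekick bonus that fades out as training progresses.

    sidekick_bonus is an assumed per-step score supplied by a sidekick that
    had full observability during training; it is not defined in the source.
    """
    return task_reward + shaping_weight(step, decay_steps) * sidekick_bonus


if __name__ == "__main__":
    # Early in training the bonus counts fully; after decay_steps it vanishes.
    for step in (0, 50, 100):
        print(step, shaped_reward(1.0, 2.0, step, decay_steps=100))
```

Fading the bonus to zero is one way to ensure that the final policy is optimized for the task reward alone, which matters because the sidekick's full-observability signal is unavailable once the agent is deployed.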

Sidekick Policy Learning for Active Visual Exploration