We present a modular approach for learning policies for navigation over long
planning horizons from language input. Our hierarchical policy operates at
multiple timescales, where the higher-level master policy proposes subgoals to
be executed by specialized sub-policies. Our choice of subgoals is
compositional and semantic, i.e. they can be sequentially combined in arbitrary
orderings, and assume human-interpretable descriptions (e.g. 'exit room', 'find
kitchen', 'find refrigerator', etc.).
We use imitation learning to warm-start policies at each level of the
hierarchy, dramatically increasing sample efficiency, followed by reinforcement
learning. Independent reinforcement learning at each level of hierarchy enables
sub-policies to adapt to consequences of their actions and recover from errors.
Subsequent joint hierarchical training enables the master policy to adapt to
the sub-policies.
On the challenging EQA (Das et al., 2018) benchmark in House3D (Wu et al.,
2018), requiring navigating diverse realistic indoor environments, our approach
outperforms prior work by a significant margin, both in terms of navigation and
question answering.

该研究提出了一种模块化的方法，利用语言输入学习长期规划的导航策略。他们的分层策略在多个时间尺度上运行，并使用模块化和语义子目标，通过模仿学习和强化学习相结合的方法在 EQA 基准上表现出色，无论是在导航还是问题回答方面均优于前人工作。