This paper describes an alignment-based model for interpreting natural
language instructions in context. We approach instruction following as a search
over plans, scoring sequences of actions conditioned on structured observations
of text and the environment. By explicitly modeling bot