vision-and-language navigation (VLN) has witnessed significant advancements
in recent years, largely attributed to meticulously curated datasets and
proficiently trained models. Nevertheless, when tested in diverse environments,
the trained models inevitably encounter significant shift