IJCAI, Jun 2024

Why Only Text: Empowering Vision-and-Language Navigation with Multi-modal Prompts

TL;DR: Vision-and-Language Navigation with Multi-modal Prompts (VLN-MP) combines natural language and images in navigation instructions, and the added visual prompts improve navigation performance over text-only instructions.