IJCAI, Jun 2024
Why Only Text: Empowering Vision-and-Language Navigation with Multi-modal Prompts
Haodong Hong, Sen Wang, Zi Huang, Qi Wu, Jiajun Liu
TL;DR: Vision-and-Language Navigation with Multi-modal Prompts (VLN-MP) combines natural language and images in navigation instructions, and shows that multi-modal and visual prompts improve navigation performance.