Hung Le, Chinnadhurai Sankar, Seungwhan Moon, Ahmad Beirami, Alborz Geramifard...
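TL;DR: This paper introduces DVD, a dataset explicitly designed to reduce the biases that models could exploit, with detailed annotations for different types of spatio-temporal reasoning. The authors use DVD to analyze existing methods, offering insights into the capabilities and limitations of video-grounded dialogue systems.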
Abstract
A video-grounded dialogue system is required to understand both dialogue, which contains semantic dependencies from turn to turn, and video, which contains visual cues of spatial and temporal scene variations. Bu