BriefGPT.xyz
May, 2023
VSTAR: A Video-grounded Dialogue Dataset for Situated Semantic Understanding with Scene and Topic Transitions
Yuxuan Wang, Zilong Zheng, Xueliang Zhao, Jinpeng Li, Yueqian Wang...
TL;DR
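This paper presents a benchmark for video-grounded dialogue understanding built on the VSTAR dataset. It comprises three tasks (scene segmentation, topic segmentation, and video-grounded dialogue generation) designed to verify the importance of multimodal information and segments in video-grounded dialogue understanding and generation.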
Abstract
Video-grounded dialogue understanding is a challenging problem that requires machines to perceive, parse, and reason over situated semantics extracted from weakly aligned video and dialogues. Most existing benchmar