BriefGPT.xyz
Oct, 2022
Collaborative Reasoning on Multi-Modal Semantic Graphs for Video-Grounded Dialogue Generation
Xueliang Zhao, Yuxuan Wang, Chongyang Tao, Chenshuo Wang, Dongyan Zhao
TL;DR
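This paper studies video-grounded dialogue generation and proposes a method that integrates video data into pre-trained language models. It uses multi-modal reasoning to exploit complementary information across modalities. Experimental results show that the model significantly outperforms existing state-of-the-art models in both automatic and human evaluations.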
Abstract
We study video-grounded dialogue generation, where a response is generated based on the dialogue context and the associated video. The primary challenges of this task lie in (1) the difficulty of integrating video data into