BriefGPT.xyz
Apr, 2024
Do You Remember? Dense Video Captioning with Cross-Modal Memory Retrieval
Minkuk Kim, Hyeon Bae Kim, Jinyoung Moon, Jinwoo Choi, Seong Tae Kim
TL;DR
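Using an external memory bank and cross-modal video-to-text matching, we propose a new framework for dense video captioning that automates both event localization and event captioning. Experiments on the ActivityNet Captions and YouCook2 datasets show that our model performs well without extensive pretraining on large video datasets.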
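The retrieval idea in the summary above can be illustrated with a minimal sketch. It assumes video segments and stored sentences are already embedded in a shared space and that matching uses cosine similarity; both are assumptions made here for illustration, not details confirmed by this summary. The names `l2_normalize` and `retrieve_top_k` and the toy 3-dimensional embeddings are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def retrieve_top_k(video_query, memory_bank, k=2):
    """Return the k memory sentences most similar to a video embedding.

    memory_bank is a list of (sentence, text_embedding) pairs.
    Similarity is cosine similarity (an assumption for this sketch).
    """
    query = l2_normalize(video_query)
    scored = []
    for sentence, embedding in memory_bank:
        unit = l2_normalize(embedding)
        score = sum(a * b for a, b in zip(query, unit))
        scored.append((sentence, score))
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy memory bank with made-up embeddings (hypothetical data).
memory = [
    ("a man slices an onion", [0.9, 0.1, 0.0]),
    ("a dog runs on the beach", [0.0, 0.2, 0.9]),
    ("someone stirs a pot", [0.7, 0.6, 0.1]),
]

# Hypothetical embedding of one video segment.
segment_embedding = [1.0, 0.2, 0.0]

retrieved = retrieve_top_k(segment_embedding, memory, k=2)
for sentence, score in retrieved:
    print(f"{score:.3f}  {sentence}")
```

In a captioning model, the retrieved sentences would serve as extra context for the caption decoder; how that context is fused with the video features is not specified in this summary.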
Abstract
There has been significant attention to the research on dense video captioning, which aims to automatically localize and caption all events within untrimmed video. Several studies introduce methods by designing dense vi…