BriefGPT.xyz
Jun, 2024
Too Many Frames, Not All Useful: Efficient Strategies for Long-Form Video QA
Jongwoo Park, Kanchana Ranasinghe, Kumara Kahatapitiya, Wonjeong Ryoo, Donghyun Kim...
TL;DR
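Keyframe selection and sequence-aware caption generation can significantly reduce information redundancy in long-form videos. Our proposed LVNet framework uses these two new methods to achieve state-of-the-art performance on LVQA benchmark datasets.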
Abstract
Long-form videos that span wide temporal intervals are highly information-redundant and contain multiple distinct events or entities that are often loosely related. Therefore, when performing long-form video question answering (