BriefGPT.xyz
Apr, 2024
PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning
Lin Xu, Yilin Zhao, Daquan Zhou, Zhijie Lin, See Kiong Ng...
TL;DR
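By introducing a simple yet effective pooling strategy, this paper adapts image-language pre-trained models to video understanding tasks and achieves new state-of-the-art results on benchmarks such as question answering and caption generation.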
Abstract
Vision-language pre-training has significantly elevated performance across a wide range of image-language applications. Yet, the pre-training process for video-related tasks demands exceptionally large computatio