BriefGPT.xyz
Jul, 2024
Meta-optimized Angular Margin Contrastive Framework for Video-Language Representation Learning
Thong Nguyen, Yi Bin, Xiaobao Wu, Xinshuai Dong, Zhiyuan Hu...
TL;DR
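By introducing a contrastive objective with an angular margin loss, a weighting function parameterized by a multi-layer perceptron, and video-text data generated by a large vision-language model, we improve video-language representations and achieve superior performance on commonly used video question answering and text-video retrieval datasets.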
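To make the idea of an angular margin in a contrastive objective concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function name `angular_margin_contrastive_loss`, the `margin` and `temperature` defaults, and the choice to subtract the margin from the positive-pair angle are all assumptions, since the summary above does not specify the exact form. The MLP-parameterized weighting function is not shown; it would presumably map per-sample losses such as these to sample weights.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def angular_margin_contrastive_loss(video_embs, text_embs, margin=0.1, temperature=0.07):
    """Per-sample video-to-text InfoNCE losses.

    Assumption: the margin is subtracted from the angle of each matched
    (video i, text i) pair, so imperfectly aligned positives are not
    forced all the way to a cosine similarity of 1.
    """
    losses = []
    for i, video in enumerate(video_embs):
        logits = []
        for j, text in enumerate(text_embs):
            # Clamp to guard acos against floating-point drift.
            cos_sim = max(-1.0, min(1.0, cosine(video, text)))
            if i == j:
                theta = math.acos(cos_sim)
                cos_sim = math.cos(max(0.0, theta - margin))
            logits.append(cos_sim / temperature)
        # Numerically stable log-sum-exp over all candidate texts.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        losses.append(log_denominator - logits[i])
    return losses
```

With `margin=0.0` this reduces to a standard InfoNCE loss over cosine similarities; a positive margin lowers the loss for matched pairs that are close but not perfectly aligned.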
Abstract
Data quality stands at the forefront of deciding the effectiveness of video-language representation learning. However, video-text pairs in previous data typically do not align perfectly with each other, which mig