BriefGPT.xyz
Dec, 2021
Distilled Dual-Encoder Model for Vision-Language Understanding
Zekun Wang, Wenhui Wang, Haichao Zhu, Ming Liu, Bing Qin...
TL;DR
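Proposes a cross-modal attention distillation framework for training dual-encoder models on vision-language understanding tasks such as visual reasoning and visual question answering, and shows that the framework achieves competitive performance while keeping inference faster than fusion-encoder models.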
Abstract
We propose a cross-modal attention distillation framework to train a dual-encoder model for vision-language understanding tasks, such as ...