BriefGPT.xyz
Jun, 2022
通过鼓励一致的基于梯度解释来改善视觉定位
Improving Visual Grounding by Encouraging Consistent Gradient-based Explanations
HTML
PDF
Ziyan Yang, Kushal Kafle, Franck Dernoncourt, Vicente Ordóñez Román
TL;DR
Attention Mask Consistency是一种基于边缘的损失函数,在视觉语言模型预训练中作用使得梯度基础的解释与区域级别注释保持一致,并且比依赖于明确训练对象检测器的区域级注释的模型产生更优秀的视觉定位性能。
Abstract
We propose a margin-based loss for
vision-language model
pretraining that encourages
gradient-based explanations
that are consistent with region-level annotations. We refer to this objective as
→