BriefGPT.xyz
Feb, 2023
Paparazzi: A Deep Dive into the Capabilities of Language and Vision Models for Grounding Viewpoint Descriptions
Henrik Voigt, Jan Hombeck, Monique Meuschke, Kai Lawonn, Sina Zarrieß
TL;DR
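This paper examines how well the CLIP model describes and recognizes object viewpoints in 3D environments, and studies fine-tuning with hard negative sampling and random contrasting when only a small amount of training data is available.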
Abstract
Existing language and vision models achieve impressive performance in image-text understanding. Yet, it is an open question to what extent they can be used for language understanding in
→