BriefGPT.xyz
Apr, 2024
SyncMask: Synchronized Attentional Masking for Fashion-centric Vision-Language Pretraining
Chull Hwan Song, Taebaek Hwang, Jooyoung Yoon, Shunghyun Choi, Yeong Hyeon Gu
TL;DR
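SyncMask generates masks that precisely locate the image patches and word tokens where information co-occurs in both the image and the text. This addresses the information mismatch between images and text in fashion datasets, and the method performs well on three downstream tasks on fashion datasets.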
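The TL;DR describes masks that target content present in both modalities. As a rough illustration only, the sketch below assumes a text-to-patch attention matrix is already available (for example, from a cross-attention layer) and applies a simple illustrative rule: mask the text tokens with the highest peak attention, together with the image patches those tokens attend to most. The function `sync_masks` and its parameters `token_ratio` and `patch_top_k` are hypothetical names. This is not the authors' implementation, and SyncMask's actual mask-generation procedure may differ.

```python
from typing import List, Tuple


def sync_masks(
    attn: List[List[float]],
    token_ratio: float = 0.15,
    patch_top_k: int = 1,
) -> Tuple[List[bool], List[bool]]:
    """Hypothetical synchronized masking (illustrative rule, not the paper's).

    attn[t][p] is an ASSUMED attention weight from text token t to image patch p.
    Returns (token_mask, patch_mask) as boolean lists.
    """
    num_tokens = len(attn)
    num_patches = len(attn[0]) if attn else 0

    # Grounding score per token: its strongest attention to any single patch.
    scores = [max(row) for row in attn]

    # Number of tokens to mask (at least one when there are any tokens).
    n_mask = max(1, round(token_ratio * num_tokens)) if num_tokens else 0

    # Pick the most strongly grounded tokens; ties go to the lower index.
    chosen = sorted(range(num_tokens), key=lambda t: (-scores[t], t))[:n_mask]

    token_mask = [False] * num_tokens
    patch_mask = [False] * num_patches
    for t in chosen:
        token_mask[t] = True
        # Mask the patches this token attends to most, so the same
        # information is hidden in both the text and the image.
        top_patches = sorted(
            range(num_patches), key=lambda p: (-attn[t][p], p)
        )[:patch_top_k]
        for p in top_patches:
            patch_mask[p] = True
    return token_mask, patch_mask


# Toy example: token 0 is grounded on patch 1, token 2 on patch 2,
# and token 1 is only weakly grounded anywhere.
attn = [
    [0.10, 0.80, 0.10],
    [0.30, 0.30, 0.40],
    [0.05, 0.05, 0.90],
]
print(sync_masks(attn, token_ratio=0.67))
```

Coupling the two masks means a masked word can only be recovered from the image when the image is also partially hidden, which is one way to encourage reasoning across modalities. The max-attention score is simply the easiest grounding signal to demonstrate here.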
Abstract
Vision-language models (VLMs) have made significant strides in cross-modal understanding through large-scale paired datasets. However, in the fashion domain, datasets often exhibit a disparity between the information