BriefGPT.xyz
Jul, 2022
FashionViL: Fashion-Focused Vision-and-Language Representation Learning
Xiao Han, Licheng Yu, Xiatian Zhu, Li Zhang, Yi-Zhe Song...
TL;DR
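This paper proposes FashionViL, a vision-and-language (V+L) representation learning framework for the fashion domain. It includes two carefully designed pre-training tasks, multi-view contrastive learning and pseudo-attribute classification, together with a flexible, versatile Transformer-based model architecture that makes it applicable to a wide range of V+L tasks. It achieves state-of-the-art results on five downstream tasks.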
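To make the contrastive idea named in the TL;DR concrete, here is a minimal sketch of a generic symmetric InfoNCE loss between two batches of embeddings. It is an illustration only, not the paper's implementation: the function names `l2_normalize` and `info_nce`, the temperature value of 0.07, and the use of plain Python lists are all assumptions made for this example.

```python
import math


def l2_normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(view_a, view_b, temperature=0.07):
    """Symmetric InfoNCE loss (hypothetical helper, not from the paper).

    view_a[i] and view_b[i] are assumed to be embeddings of two views of the
    same item; every other pairing in the batch is treated as a negative.
    """
    a = [l2_normalize(v) for v in view_a]
    b = [l2_normalize(v) for v in view_b]
    n = len(a)

    # Cosine similarities scaled by the temperature; logits[i][j] compares a[i] with b[j].
    logits = [
        [sum(x * y for x, y in zip(a[i], b[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def cross_entropy_diag(rows):
        # Mean cross-entropy where the correct class for row i is column i.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / len(rows)

    # Average the a-to-b and b-to-a directions.
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(transposed))
```

With this sketch, correctly matched pairs yield a loss near zero, while mismatched pairs are penalized heavily. That is the behavior a contrastive pre-training objective relies on.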
Abstract
Large-scale Vision-and-Language (V+L) pre-training for representation learning has proven to be effective in boosting various downstream V+L tasks. However, when it comes to the fashion domain, existing V+L metho