BriefGPT.xyz
Oct, 2022
MaPLe: Multi-modal Prompt Learning
Muhammad Uzair Khattak, Hanoona Rasheed, Muhammad Maaz, Salman Khan, Fahad Shahbaz Khan
TL;DR
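This work proposes Multi-modal Prompt Learning (MaPLe), which learns separate prompts for the vision and language branches at different early stages in order to progressively model stage-wise feature relationships. It also promotes strong coupling between the vision and language prompts, with the goal of improving CLIP's results on downstream tasks. The results indicate that the method performs well and is broadly applicable.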
Abstract
Pre-trained vision-language (V-L) models such as CLIP have shown excellent generalization ability to downstream tasks. However, they are sensitive to the choice of input text prompts and require careful selection