BriefGPT.xyz
May, 2024
Multi-modal Generation via Cross-Modal In-Context Learning
Amandeep Kumar, Muzammal Naseer, Sanath Narayan, Rao Muhammad Anwer, Salman Khan...
TL;DR
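This study proposes MGCC, a multimodal generation method that combines large language models (LLMs) with diffusion models. It explicitly learns cross-modal dependencies between text and images in the LLM embedding space and generates object bounding boxes for multi-object scenes. Together, these let it generate novel images from complex multimodal prompt sequences. The method is evaluated experimentally on two benchmark datasets.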
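The summary's central idea is to place text and images in a single LLM embedding space. A minimal sketch can illustrate that idea under stated assumptions. This is not MGCC's implementation. The toy vocabulary, the 3-dimensional embeddings, the linear projection `project_image`, and the helper `embed_prompt` are all hypothetical. They only show how an interleaved text/image prompt might become one sequence of equally sized vectors that a language model could consume.

```python
from typing import Dict, List, Sequence, Union

Vector = List[float]
# A prompt segment is either text or a raw image feature vector (hypothetical format).
Segment = Union[str, List[float]]

# Hypothetical toy vocabulary with 3-dimensional token embeddings (d = 3).
TEXT_EMBEDDINGS: Dict[str, Vector] = {
    "a": [0.1, 0.0, 0.2],
    "dog": [0.5, 0.3, 0.0],
    "on": [0.0, 0.1, 0.1],
    "grass": [0.2, 0.6, 0.1],
}


def project_image(features: Sequence[float], weight: Sequence[Sequence[float]]) -> Vector:
    """Map an image feature vector into the text embedding space.

    Assumption: a plain linear projection whose weight has one row per
    embedding dimension and one column per image feature.
    """
    if any(len(row) != len(features) for row in weight):
        raise ValueError("projection weight does not match the image feature size")
    return [sum(w * f for w, f in zip(row, features)) for row in weight]


def embed_prompt(segments: Sequence[Segment], weight: Sequence[Sequence[float]]) -> List[Vector]:
    """Flatten an interleaved text/image prompt into one embedding sequence."""
    sequence: List[Vector] = []
    for seg in segments:
        if isinstance(seg, str):
            # Text segment: whitespace-tokenize and copy each token's embedding.
            sequence.extend(list(TEXT_EMBEDDINGS[tok]) for tok in seg.split())
        else:
            # Image segment: a single projected vector stands in for the image.
            sequence.append(project_image(seg, weight))
    return sequence


if __name__ == "__main__":
    # Hypothetical 3 x 2 projection: 2 image features -> 3 embedding dimensions.
    W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    prompt = ["a dog", [0.4, 0.8], "on grass"]
    for vec in embed_prompt(prompt, W):
        print(vec)
```

For the example prompt, the sketch yields five vectors: four word embeddings and one projected image vector, all of the same length. A real system would learn the projection rather than fix it by hand.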
Abstract
In this work, we study the problem of generating novel images from complex multimodal prompt sequences. While existing methods achieve promising results for text-to-image generation, they often struggle to capture...