BriefGPT.xyz
Feb, 2024
MuLan: Multimodal-LLM Agent for Progressive Multi-Object Diffusion
Sen Li, Ruochen Wang, Cho-Jui Hsieh, Minhao Cheng, Tianyi Zhou
TL;DR
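Using progressive multi-object generation, planning, and feedback control, we develop MuLan, a training-free multimodal-LLM agent that addresses the difficulties existing text-to-image models have with multiple objects, their spatial positions, relative sizes, overlap, and attribute binding.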
Abstract
Existing text-to-image models still struggle to generate images of multiple objects, especially in handling their spatial positions, relative siz