Fashionable image generation aims to synthesize images of diverse fashion prevalent around the globe, helping fashion designers in real-time visualization by giving them a basic customized structure of how a specific design preference would look in real life and what further improvements can be made for enhanced customer satisfaction. Moreover, users can alone interact and generate fashionable images by just giving a few simple prompts. Recently, diffusion models have gained popularity as generative models owing to their flexibility and generation of realistic images from Gaussian noise. Latent diffusion models are a type of generative model that use diffusion processes to model the generation of complex data, such as images, audio, or text. They are called "latent" because they learn a hidden representation, or latent variable, of the data that captures its underlying structure. We propose a method exploiting the equivalence between diffusion models and energy-based models (EBMs) and suggesting ways to compose multiple probability distributions. We describe a pipeline on how our method can be used specifically for new fashionable outfit generation and virtual try-on using LLM-guided text-to-image generation. Our results indicate that using an LLM to refine the prompts to the latent diffusion model assists in generating globally creative and culturally diversified fashion styles and reducing bias.

该研究提出了使用基于能量的模型（EBMs）的扩散模型的等价性，建议组成多个概率分布的方法，并提出了一个通过 LLM 引导文本生成图像生成新时尚装配和虚拟试穿的流程，其结果表明使用 LLM 生成具有全球创意和文化多样性的时尚风格并减少偏见。

使用LLMs和潜在扩散模型进行互动式时尚内容生成