Manuel Brack, Felix Friedrich, Dominik Hintersdorf, Lukas Struppek, Patrick Schramowski...
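TL;DR: This paper introduces SEGA, a text-to-image generation method that lets users steer generation along semantic directions to produce diverse, high-fidelity images, and demonstrates its effectiveness and flexibility across a variety of tasks.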
Abstract
Text-to-image diffusion models have recently received a lot of interest for
their astonishing ability to produce high-fidelity images from text only.
However, achieving one-shot generation that aligns with the us