Text-conditioned image generation models are a prevalent use of AI image synthesis, yet intuitively controlling their output under an artist's guidance remains challenging. Current methods require multiple images and a textual prompt for each object, specifying it as a concept, to generate a single customized image. In contrast, our work, \verb|DiffMorph|, introduces a novel approach that synthesizes images mixing concepts without the use of textual prompts. It integrates a sketch-to-image module to incorporate user sketches as input: \verb|DiffMorph| takes an initial image together with conditioning artist-drawn sketches and generates a morphed image. We employ a pre-trained text-to-image diffusion model and fine-tune it to reconstruct each image faithfully. We then merge the images and the concepts from the sketches into a cohesive composition. We demonstrate the image generation capability of our work through our results and a comparison with prompt-based image generation.

DiffMorph: Text-less Image Morphing with Diffusion Models