Generative text-to-image models have gained great popularity among the public for their powerful capability to generate high-quality images based on natural language prompts. However, developing effective prompts for desired images can be challenging due to the complexity and ambiguity of natural language. This research proposes PromptMagician, a visual analysis system that helps users explore the image results and refine the input prompts. The backbone of our system is a prompt recommendation model that takes user prompts as input, retrieves similar prompt-image pairs from DiffusionDB, and identifies special (important and relevant) prompt keywords. To facilitate interactive prompt refinement, PromptMagician introduces a multi-level visualization for the cross-modal embedding of the retrieved images and recommended keywords, and supports users in specifying multiple criteria for personalized exploration. Two usage scenarios, a user study, and expert interviews demonstrate the effectiveness and usability of our system, suggesting it facilitates prompt engineering and improves the creativity support of the generative text-to-image model.
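The retrieve-then-recommend idea behind the prompt recommendation model can be illustrated with a minimal sketch. The abstract does not specify the embedding model or how keyword importance and relevance are scored, so everything here is an assumption made for illustration: a bag-of-words cosine similarity stands in for the cross-modal embedding, a raw frequency count stands in for the keyword scoring, and the tiny `CORPUS` list, its file names, and the functions `retrieve` and `recommend_keywords` are hypothetical stand-ins rather than DiffusionDB or the system's actual API.

```python
import math
from collections import Counter

# Hypothetical toy corpus standing in for DiffusionDB prompt-image pairs.
CORPUS = [
    ("a castle on a hill, oil painting, dramatic lighting", "img_001.png"),
    ("a castle at sunset, oil painting, highly detailed", "img_002.png"),
    ("portrait of a cat, studio photo", "img_003.png"),
    ("a dragon over a castle, dramatic lighting, concept art", "img_004.png"),
]


def tokenize(text):
    """Lowercase the text and split it into words, treating commas as spaces."""
    return text.lower().replace(",", " ").split()


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(prompt, corpus, k=3):
    """Return the k (score, prompt, image) triples most similar to the prompt.

    Assumption: bag-of-words similarity replaces the learned embedding.
    """
    query = Counter(tokenize(prompt))
    scored = [(cosine(query, Counter(tokenize(p))), p, img) for p, img in corpus]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


def recommend_keywords(prompt, corpus, k=3, top_n=5):
    """Suggest comma-separated phrases that recur in the retrieved prompts.

    Assumption: frequency among retrieved prompts approximates "importance";
    phrases made up entirely of the user's own words are skipped.
    """
    user_terms = set(tokenize(prompt))
    counts = Counter()
    for _, retrieved_prompt, _ in retrieve(prompt, corpus, k):
        phrases = [s.strip().lower() for s in retrieved_prompt.split(",")]
        for phrase in dict.fromkeys(phrases):  # de-duplicate, keep order
            if phrase and not set(tokenize(phrase)) <= user_terms:
                counts[phrase] += 1
    return [phrase for phrase, _ in counts.most_common(top_n)]


if __name__ == "__main__":
    for score, text, image in retrieve("a castle", CORPUS):
        print(f"{score:.2f}  {image}  {text}")
    print(recommend_keywords("a castle", CORPUS, top_n=2))
```

In this toy setting, style phrases that recur across the retrieved neighbours rise to the top of the suggestions, which is the kind of refinement hint the system is described as surfacing to the user.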

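Because writing effective prompts for high-quality image generation is challenging, this research proposes PromptMagician, a visual analysis system that uses a recommendation model and a multi-level visualization to help users explore generated images and refine their input prompts. A user study and expert interviews demonstrate the system's effectiveness and usability, suggesting that it improves the creativity support of generative text-to-image models.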

PromptMagician: Interactive Prompt Engineering for Text-to-Image Creation