This paper presents a LoRA-free method for stylized image generation. It takes a text prompt and style reference images as inputs and produces an output image in a single pass. Existing methods train a separate LoRA for each style; our method instead adapts to various styles with a single unified model. However, a unified model poses two challenges: 1) the prompt loses control over the generated content, and 2) the output image inherits both the semantic and the style features of the style reference images, which compromises its content fidelity. To address these challenges, we introduce StyleAdapter, a model with two components: a two-path cross-attention module (TPCA) and three decoupling strategies. These components let our model process the prompt features and the style reference features separately. They also reduce the strong coupling between semantic and style information in the style references. StyleAdapter generates high-quality images that match the content of the prompts and adopt the style of the references (even for unseen styles) in a single pass, which is more flexible and efficient than previous methods. Experiments demonstrate the superiority of our method over previous works.
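The abstract describes TPCA only at a high level. The following is a minimal sketch of what a two-path cross-attention step could look like. It assumes that image query features attend to prompt tokens and to style-reference tokens in two separate scaled dot-product attention paths, and that the two outputs are summed with a scalar weight `lam`. The function names, the additive fusion, the weight `lam`, and the use of already-projected keys and values are all illustrative assumptions, not the paper's implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all key/value pairs."""
    d = len(queries[0])
    dim_v = len(values[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_v)])
    return outputs


def two_path_cross_attention(queries, prompt_kv, style_kv, lam=1.0):
    """Hypothetical TPCA sketch: separate attention paths for prompt and style tokens.

    prompt_kv and style_kv are (keys, values) tuples assumed to be already
    projected; lam is an assumed scalar weight on the style path.
    """
    prompt_out = attention(queries, *prompt_kv)  # content path, driven by the text prompt
    style_out = attention(queries, *style_kv)    # style path, driven by the reference images
    # Additive fusion: the two paths stay independent until this final weighted sum.
    return [[p + lam * s for p, s in zip(p_row, s_row)]
            for p_row, s_row in zip(prompt_out, style_out)]


# Usage: one image query, two prompt tokens, one style token.
queries = [[1.0, 0.0]]
prompt_kv = ([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
style_kv = ([[0.3, 0.7]], [[2.0, 4.0]])
print(two_path_cross_attention(queries, prompt_kv, style_kv, lam=0.5))
```

Keeping the paths separate until the final sum is one plausible reading of how TPCA lets the prompt retain control over content. With `lam=0`, this sketch reduces to ordinary prompt-only cross-attention.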

StyleAdapter: A Single-Pass LoRA-Free Model for Stylized Image Generation