We address text-driven style transfer with text-to-image (T2I) diffusion models. The main challenge is to preserve the structure of the content consistently while still producing an effective style transfer. Prior approaches concatenate the content and style prompts directly, injecting style at the prompt level, which inevitably distorts the structure. We propose a novel solution, Adaptive Style Incorporation~(ASI), which incorporates style at the fine-grained feature level. ASI consists of two components: Siamese Cross-Attention~(SiCA), which decouples the single-track cross-attention into a dual-track structure to obtain separate content and style features, and the Adaptive Content-Style Blending (AdaBlending) module, which couples the content and style information in a structure-consistent manner. Experiments show that our method substantially improves both structure preservation and stylization quality.
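
The dual-track idea described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the shared key/value projections (suggested only by the word "Siamese"), and the convex-combination blend with a per-position `style_weight` are all assumptions made for illustration, since the abstract does not specify how SiCA or AdaBlending are computed.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, text_emb, w_k, w_v):
    """Single-track cross-attention: image queries attend to one text prompt.

    queries:  (N, d) image-side query features
    text_emb: (T, d_text) token embeddings of one prompt
    w_k, w_v: (d_text, d) key and value projections
    """
    keys = text_emb @ w_k
    values = text_emb @ w_v
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    return softmax(scores, axis=-1) @ values


def dual_track_blend(queries, content_emb, style_emb, w_k, w_v, style_weight):
    """Hypothetical dual-track attention followed by a placeholder blend.

    The content and style prompts are attended to separately, with the same
    queries and (by assumption) the same projections, so that two feature
    maps are produced instead of one.
    """
    content_feat = cross_attention(queries, content_emb, w_k, w_v)
    style_feat = cross_attention(queries, style_emb, w_k, w_v)

    # Placeholder blend (assumption): per-position convex combination.
    # The actual AdaBlending rule is not given in the abstract.
    w = np.clip(style_weight, 0.0, 1.0)[:, None]
    blended = (1.0 - w) * content_feat + w * style_feat
    return blended, content_feat, style_feat
```

With `style_weight` set to zero everywhere, the blended output reduces to the content features, and with one everywhere it reduces to the style features. This is the sense in which feature-level incorporation lets style strength vary per spatial position, which prompt concatenation cannot do.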

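This study proposes a novel solution to the text-driven style transfer task, named Adaptive Style Incorporation (ASI). It achieves fine-grained, feature-level style incorporation through the Siamese Cross-Attention (SiCA) and Adaptive Content-Style Blending (AdaBlending) modules, and shows better performance in both structure preservation and stylization quality.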

Adaptive Style Incorporation for Structure-Consistent Text-Driven Style Transfer