Text-guided image inpainting has attracted significant research attention in recent years. However, the task remains challenging because of several constraints, such as ensuring alignment between the image and the text and maintaining a consistent distribution between the corrupted and uncorrupted regions. In this paper, we therefore propose a dual affine transformation generative adversarial network (DAFT-GAN) that maintains semantic consistency for text-guided inpainting. DAFT-GAN integrates two affine transformation networks that gradually combine text and image features in each decoding block. Moreover, we minimize the leakage of uncorrupted-feature information, enabling fine-grained image generation, by encoding the corrupted and uncorrupted regions of the masked image separately. Our proposed model outperforms existing GAN-based models in both qualitative and quantitative evaluations on three benchmark datasets for text-guided image inpainting (MS-COCO, CUB, and Oxford).
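The abstract does not specify how each affine transformation network is parameterized. As a point of reference only, here is a minimal sketch of a FiLM-style channel-wise affine modulation, in which a conditioning vector predicts a per-channel scale and shift that are applied to a decoder feature map. Stacking two such modulations in one block is just one hypothetical reading of "dual". The names `affine_modulate`, `make_params`, and `dual_affine_block`, the single linear layers standing in for MLPs, and the choice to condition the second transform on an image-context vector are all assumptions, not DAFT-GAN's actual design.

```python
import numpy as np


def affine_modulate(h, cond, w_gamma, b_gamma, w_beta, b_beta):
    """Channel-wise affine transform of a feature map conditioned on a vector.

    h: (C, H, W) feature map; cond: (D,) conditioning vector.
    Single linear layers stand in for the MLPs a real model would use (assumption).
    """
    gamma = w_gamma @ cond + b_gamma  # (C,) per-channel scale
    beta = w_beta @ cond + b_beta     # (C,) per-channel shift
    return gamma[:, None, None] * h + beta[:, None, None]


def make_params(rng, c, d):
    """Random parameters for one affine branch (illustration only).

    Scale bias starts at 1 and shift bias at 0, so zero weights give the identity.
    """
    return (rng.standard_normal((c, d)) * 0.1, np.ones(c),
            rng.standard_normal((c, d)) * 0.1, np.zeros(c))


def dual_affine_block(h, text_emb, img_emb, text_params, img_params):
    """Hypothetical decoding step: modulate by text, apply ReLU, then modulate by image context."""
    h = affine_modulate(h, text_emb, *text_params)
    h = np.maximum(h, 0.0)  # ReLU between the two modulations
    return affine_modulate(h, img_emb, *img_params)


rng = np.random.default_rng(0)
C, H, W = 8, 4, 4
h = rng.standard_normal((C, H, W))
text_emb = rng.standard_normal(16)  # hypothetical sentence embedding
img_emb = rng.standard_normal(32)   # hypothetical image-context vector
out = dual_affine_block(h, text_emb, img_emb,
                        make_params(rng, C, 16), make_params(rng, C, 32))
print(out.shape)  # (8, 4, 4)
```

Because the scale bias starts at one and the shift bias at zero, a modulation with zero weights leaves the feature map unchanged. The conditioning signal therefore only perturbs the decoder features as the weights move away from zero.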

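This study addresses two problems in text-guided image inpainting: alignment between the image and the text, and distribution consistency between the corrupted and uncorrupted regions. The proposed dual affine transformation generative adversarial network (DAFT-GAN) maintains semantic consistency by progressively combining text and image features, and it minimizes information leakage by encoding the corrupted and uncorrupted regions separately. On three benchmark datasets (MS-COCO, CUB, and Oxford), the model outperforms existing GAN-based models in both qualitative and quantitative evaluations.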
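Here is a minimal sketch of what encoding the two regions separately can mean, assuming a binary mask in which 1 marks the pixels of a region. Each branch aggregates only the pixels inside its own region. A partial-convolution-style masked average stands in for a learned encoder, so changing values in one region cannot alter the other branch's features. The functions `masked_box_filter` and `encode_regions_separately` are hypothetical illustrations, because the abstract does not describe DAFT-GAN's encoders at this level. In a real inpainting input, the hole would hold a placeholder rather than true pixels. Here, values are kept in the hole only to make the isolation visible.

```python
import numpy as np


def masked_box_filter(x, m, k=3):
    """Average x over a k x k window using only pixels where m == 1.

    x, m: (H, W) arrays; m is binary. Out-of-region pixels are zeroed before
    summing and excluded from the count, so the result inside the region depends
    only on in-region values. Outputs outside the region are set to 0.
    """
    pad = k // 2
    m = m.astype(float)
    xp = np.pad(x * m, pad)  # zero padding around the masked image
    mp = np.pad(m, pad)      # zero padding around the mask
    h, w = x.shape
    num = np.zeros((h, w))
    den = np.zeros((h, w))
    for dy in range(k):
        for dx in range(k):
            num += xp[dy:dy + h, dx:dx + w]  # sum of in-region values in window
            den += mp[dy:dy + h, dx:dx + w]  # count of in-region pixels in window
    return np.where(den > 0, num / np.maximum(den, 1.0), 0.0) * m


def encode_regions_separately(x, known):
    """Hypothetical split encoder: one branch per region, no cross-region mixing."""
    f_known = masked_box_filter(x, known)     # uncorrupted region only
    f_hole = masked_box_filter(x, 1 - known)  # corrupted region only
    return f_known, f_hole


x = np.arange(16, dtype=float).reshape(4, 4)
known = np.ones((4, 4))
known[1:3, 1:3] = 0  # 2 x 2 hole in the centre
f_known, f_hole = encode_regions_separately(x, known)
print(f_hole[1:3, 1:3])  # every entry is 7.5, the mean of the four hole pixels
```

A single shared convolution over the whole masked image would instead mix pixels across the mask boundary. Keeping one branch per region is what prevents uncorrupted features from leaking into the corrupted-region representation in this toy setting.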

Dual Affine Transformation Generative Adversarial Network for Text-Guided Image Inpainting