We describe a new approach that improves the training of generative adversarial nets (GANs) for synthesizing diverse images from a text input. Our approach is based on the conditional version of GANs and expands on previous work leveraging an auxiliary task in the discriminator. Our generated images are not limited to certain classes and do not suffer from mode collapse while semantically matching the text input. A key to our training methods is how to form positive and negative training examples with respect to the class label of a given image. Instead of selecting random training examples, we perform negative sampling based on the semantic distance from a positive example in the class. We evaluate our approach using the Oxford-102 flower dataset, adopting the inception score and multi-scale structural similarity index (MS-SSIM) metrics to assess discriminability and diversity of the generated images. The empirical results indicate greater diversity in the generated images, especially when we gradually select more negative training examples closer to a positive example in the semantic space.

本研究提出了一种新的方法，改进了生成对抗网络（GANs）训练的能力，可以根据文本输入合成多样的图像，这种方法基于条件版本的GANs，扩展了前人利用判别器中的辅助任务，通过负样本采样来构造积极和消极的训练样例，通过牛津102花卉数据集的实验结果表明，生成的图像更具多样性，特别是当负样本逐渐靠近语义空间中的积极样本时。

文本到图像合成中的语义关联对抗学习