Non-autoregressive generative transformers have recently demonstrated impressive image generation performance, with sampling that is orders of magnitude faster than that of their autoregressive counterparts. However, optimal parallel sampling from the true joint distribution of visual tokens remains an open challenge. In this paper, we introduce Token-Critic, an auxiliary model that guides the sampling of a non-autoregressive generative transformer. Given a masked-and-reconstructed real image, the Token-Critic model is trained to distinguish which visual tokens belong to the original image and which were sampled by the generative transformer. During non-autoregressive iterative sampling, Token-Critic is used to select which tokens to accept and which to reject and resample. Coupled with Token-Critic, a state-of-the-art generative transformer improves significantly and outperforms recent diffusion models and GANs in the trade-off between generated image quality and diversity on the challenging task of class-conditional ImageNet generation.
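The accept/reject loop described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: `toy_generator` and `toy_critic` are hypothetical stand-ins that return random tokens and random scores, the `MASK` sentinel is invented for illustration, and the linear re-masking schedule is an assumption, since the abstract does not specify one.

```python
import random

MASK = -1  # hypothetical sentinel marking a masked token position


def toy_generator(tokens, vocab_size, rng):
    """Stand-in for the generative transformer: fill each masked slot with a sampled token."""
    return [rng.randrange(vocab_size) if t == MASK else t for t in tokens]


def toy_critic(tokens, rng):
    """Stand-in for Token-Critic: one score per token, higher meaning 'looks real'."""
    return [rng.random() for _ in tokens]


def critic_guided_sampling(num_tokens, vocab_size, steps, generator, critic, seed=0):
    """Iteratively sample all tokens in parallel, re-masking those the critic rejects."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    rng = random.Random(seed)
    tokens = [MASK] * num_tokens  # start from a fully masked image
    for step in range(1, steps + 1):
        proposal = generator(tokens, vocab_size, rng)
        if step == steps:
            return proposal  # final step: accept everything
        scores = critic(proposal, rng)
        # Assumed linear schedule: reject fewer tokens as sampling progresses.
        num_reject = int(num_tokens * (1 - step / steps))
        ranked = sorted(range(num_tokens), key=lambda i: scores[i])
        rejected = set(ranked[:num_reject])  # lowest-scoring tokens are resampled
        tokens = [MASK if i in rejected else proposal[i] for i in range(num_tokens)]
    return tokens


sample = critic_guided_sampling(16, 32, 8, toy_generator, toy_critic)
print(sample)
```

Note that the critic in this sketch scores every token in the proposal, so a token accepted in an earlier step can still be rejected and resampled later. In a real system the two stand-ins would be replaced by the trained transformer and the trained critic.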

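This paper introduces Token-Critic, an auxiliary model that guides sampling from a non-autoregressive generative transformer by selecting which tokens to accept and which to reject and resample. Combined with Token-Critic, the generative transformer outperforms recent diffusion models and GANs on ImageNet generation and achieves a good balance between generated image quality and diversity.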

Improved Masked Image Generation with Token-Critic