Both generative adversarial networks (GANs) in unsupervised learning and actor-critic methods in reinforcement learning (RL) have gained a reputation for being difficult to optimize. Practitioners in both fields have amassed a large number of strategies to mitigate these instabilities and improve training. Here we show that GANs can be viewed as actor-critic methods in an environment where the actor cannot affect the reward. We review the strategies for stabilizing training for each class of models, covering both those that generalize between the two and those that are specific to one class. We also review a number of extensions to GANs and RL algorithms with even more complicated information flow. We hope that by highlighting this formal connection we will encourage the GAN and RL communities to develop general, scalable, and stable algorithms for multilevel optimization with deep networks, and to draw inspiration from each other.
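The correspondence can be illustrated with a minimal sketch. It rests on assumptions of our own, none of which come from the abstract:

- a one-dimensional "dataset" N(2, 1);
- a location-only generator `a = theta + z` acting as the actor;
- a logistic-regression discriminator acting as the critic;
- hypothetical names (`REAL_MEAN`, `env_step`, `train`) and hyperparameter values.

Each episode lasts one step. A fair coin decides whether the critic is shown a real sample (reward 1) or the actor's sample (reward 0), so the reward never depends on the action. The critic regresses onto that reward, which is the usual discriminator cross-entropy loss. The actor follows the critic's gradient with respect to its action, in the style of a deterministic policy gradient, using the "non-saturating" objective log D(a). The L2 penalty on the critic is one simple stabilizer chosen for this sketch.

```python
import numpy as np

# Hypothetical toy "dataset": real samples come from N(REAL_MEAN, 1).
REAL_MEAN = 2.0


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def env_step(rng, actions):
    """Batch of one-step episodes. A fair coin decides whether the critic
    observes a real sample (reward 1) or the actor's action (reward 0).
    The reward depends only on the coin, never on the action's value."""
    n = actions.shape[0]
    real = rng.normal(REAL_MEAN, 1.0, size=n)
    coin = rng.random(n) < 0.5
    observations = np.where(coin, real, actions)
    rewards = coin.astype(float)
    return observations, rewards


def train(steps=2000, batch=128, critic_steps=5, lr_critic=0.1,
          lr_actor=0.05, l2=0.01, seed=0):
    rng = np.random.default_rng(seed)
    theta = -2.0      # actor (generator): a = theta + z, z ~ N(0, 1)
    w, b = 0.0, 0.0   # critic (discriminator): D(x) = sigmoid(w * x + b)
    for _ in range(steps):
        # Critic: logistic regression onto the observed reward, i.e. an
        # estimate of P(reward = 1 | observation). This is the usual
        # discriminator cross-entropy loss, plus an L2 penalty on w.
        for _ in range(critic_steps):
            actions = theta + rng.normal(size=batch)
            obs, rewards = env_step(rng, actions)
            err = sigmoid(w * obs + b) - rewards  # dLoss/dLogit
            w -= lr_critic * (np.mean(err * obs) + l2 * w)
            b -= lr_critic * np.mean(err)
        # Actor: follow the critic's gradient w.r.t. the action, as in a
        # deterministic policy gradient, ascending mean log D(a)
        # (the "non-saturating" generator objective).
        a = theta + rng.normal(size=batch)
        d = sigmoid(w * a + b)
        theta += lr_actor * np.mean((1.0 - d) * w)  # d log D(a) / d theta
    return theta, w, b


theta, w, b = train()
print(f"theta={theta:.2f}  w={w:.3f}  b={b:.3f}")
```

With these settings, `theta` should drift from -2 toward `REAL_MEAN`, and the critic weight `w` should shrink toward 0 as real and generated samples become indistinguishable. The actor only ever improves through the critic's gradient, because the environment's reward for its samples is always 0.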

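This paper examines why generative adversarial networks and reinforcement learning algorithms are difficult to optimize. It describes the instabilities both classes of algorithms exhibit during training and the strategies used to mitigate them, and it casts a GAN as an actor-critic method in which the actor cannot affect the reward. The authors hope this formal connection will encourage the GAN and RL communities to develop general, scalable, and stable algorithms for deep networks, and to draw inspiration from each other.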

Connecting Generative Adversarial Networks and Actor-Critic Methods