In recent years, various powerful policy gradient algorithms have been proposed in deep reinforcement learning. While all these algorithms build on the Policy Gradient Theorem, the specific design choices differ significantly across algorithms. We provide a holistic overview of on-policy policy gradient algorithms to facilitate the understanding of both their theoretical foundations and their practical implementations. In this overview, we include a detailed proof of the continuous version of the Policy Gradient Theorem, convergence results and a comprehensive discussion of practical algorithms. We compare the most prominent algorithms on continuous control environments and provide insights on the benefits of regularization. All code is available at https://github.com/Matt00n/PolicyGradientsJax.

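Building on the Policy Gradient Theorem, a variety of powerful policy gradient algorithms have been proposed in deep reinforcement learning. This paper provides a holistic overview of policy gradient algorithms to facilitate the understanding of both their theoretical foundations and their practical implementations. It includes a detailed proof of the continuous version of the Policy Gradient Theorem, convergence results, and a comprehensive discussion of practical algorithms. It also compares the most prominent algorithms on continuous control environments and offers insights on the benefits of regularization.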

A Comprehensive Guide to Policy Gradients in Deep Reinforcement Learning: Theory, Algorithms, and Implementations