We introduce a novel policy learning method that integrates analytical gradients from differentiable environments with the Proximal Policy Optimization (PPO) algorithm. To incorporate analytical gradients into the PPO framework, we introduce the concept of an α-policy, which serves as a locally superior policy. By adaptively modifying the value of α, we can effectively manage the influence of analytical policy gradients during learning. To this end, we propose metrics for assessing the variance and bias of analytical gradients, and we reduce our dependence on these gradients when high variance or bias is detected. Our approach outperforms baseline algorithms across a range of scenarios, including function optimization, physics simulations, and traffic control environments. Our code is available online: https://github.com/SonSang/gippo.
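The adaptive use of α can be illustrated with a small sketch. It assumes, as one plausible reading of the abstract, that the α-policy shifts each sampled action a along the analytical gradient g of the objective with respect to that action (giving a + αg), and that α is shrunk when the sampled gradients show high variance. The helper names (`alpha_policy_actions`, `gradient_variance`, `update_alpha`), the threshold `max_variance`, the multiplicative `factor`, and the clamping bounds are all hypothetical. The bias metric is omitted, and this is not the authors' implementation from the linked repository.

```python
from statistics import pvariance


def alpha_policy_actions(actions, grads, alpha):
    """Shift each sampled action along its analytical gradient: a + alpha * g.

    Hypothetical simplification of the alpha-policy idea; not the paper's exact definition.
    """
    return [[a_i + alpha * g_i for a_i, g_i in zip(a, g)] for a, g in zip(actions, grads)]


def gradient_variance(grads):
    """Mean per-dimension population variance of the analytical gradient samples.

    Hypothetical variance metric used only for this illustration.
    """
    dims = len(grads[0])
    return sum(pvariance([g[d] for g in grads]) for d in range(dims)) / dims


def update_alpha(alpha, variance, max_variance, factor=1.5, alpha_min=1e-4, alpha_max=1.0):
    """Rely less on analytical gradients when they look noisy, more when they look reliable.

    Hypothetical multiplicative rule: divide alpha by `factor` if the variance exceeds
    `max_variance`, otherwise multiply by `factor`, clamping to [alpha_min, alpha_max].
    """
    if variance > max_variance:
        return max(alpha / factor, alpha_min)
    return min(alpha * factor, alpha_max)


if __name__ == "__main__":
    actions = [[0.0, 1.0], [0.5, -0.5]]  # two sampled 2-D actions
    grads = [[1.0, 0.0], [0.0, 2.0]]     # analytical gradients at those actions
    alpha = 0.1
    print(alpha_policy_actions(actions, grads, alpha))
    var = gradient_variance(grads)
    print(var)
    print(update_alpha(alpha, var, max_variance=0.5))  # noisy: alpha shrinks
```

With these toy inputs the mean gradient variance is 0.625, so a threshold of 0.5 causes α to shrink from 0.1 to about 0.067, reducing how far the α-policy moves actions along the analytical gradient. A multiplicative rule is used here only because it keeps α positive and is easy to read; the paper's actual adaptation scheme may differ.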

Gradient Informed Proximal Policy Optimization