Trust region methods are a popular tool in reinforcement learning as they
yield robust policy updates in continuous and discrete action spaces. However,
enforcing such trust regions in deep reinforcement learning is difficult.
Hence, many approaches, such as Trust Region Policy Optimization (TRPO) and
Proximal Policy Optimization (PPO), are based on approximations. Due to those
approximations, they violate the constraints or fail to find the optimal
solution within the trust region. Moreover, they are difficult to implement,
often lack sufficient exploration, and have been shown to depend on seemingly
unrelated implementation choices. In this work, we propose differentiable
neural network layers to enforce trust regions for deep Gaussian policies via
closed-form projections. Unlike existing methods, these layers formalize trust
regions for each state individually and can complement existing reinforcement
learning algorithms. We derive trust region projections based on the
Kullback-Leibler divergence, the Wasserstein L2 distance, and the Frobenius
norm for Gaussian distributions. We empirically demonstrate that these
projection layers achieve similar or better results than existing methods while
being almost agnostic to specific implementation choices. The code is available
at this https URL

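This work proposes differentiable neural network layers that enforce trust regions for deep Gaussian policies via closed-form projections, and derives trust region projections for Gaussian distributions based on the KL divergence, the Wasserstein L2 distance, and the Frobenius norm. Experiments show that these projection layers achieve similar or better results while being almost insensitive to specific implementation choices.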
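
To make the three dissimilarity measures concrete, the following is a minimal sketch for Gaussians with diagonal covariances, written in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function names (`kl_diag`, `w2_sq_diag`, `frob_sq_diag`, `project_mean`) are hypothetical, the Frobenius-style measure is assumed to be the squared Euclidean mean distance plus the squared Frobenius norm of the covariance difference, and `project_mean` only shows the general idea of a closed-form projection (rescaling the mean back onto a ball of squared radius `eps`), which may differ from the projections derived in the paper.

```python
import math


def kl_diag(mu_p, var_p, mu_q, var_q):
    """KL(p || q) for two Gaussians with diagonal covariances.

    Means and variances are given as equal-length lists of floats.
    """
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        total += vp / vq + (mq - mp) ** 2 / vq - 1.0 + math.log(vq / vp)
    return 0.5 * total


def w2_sq_diag(mu_p, var_p, mu_q, var_q):
    """Squared Wasserstein-2 distance for diagonal Gaussians.

    For commuting (here: diagonal) covariances the covariance term reduces
    to the squared difference of the standard deviations.
    """
    mean_part = sum((mp - mq) ** 2 for mp, mq in zip(mu_p, mu_q))
    cov_part = sum(
        (math.sqrt(vp) - math.sqrt(vq)) ** 2 for vp, vq in zip(var_p, var_q)
    )
    return mean_part + cov_part


def frob_sq_diag(mu_p, var_p, mu_q, var_q):
    """Assumed Frobenius-style measure: squared mean distance plus the
    squared Frobenius norm of the (diagonal) covariance difference."""
    mean_part = sum((mp - mq) ** 2 for mp, mq in zip(mu_p, mu_q))
    cov_part = sum((vp - vq) ** 2 for vp, vq in zip(var_p, var_q))
    return mean_part + cov_part


def project_mean(mu, mu_old, eps):
    """Illustrative closed-form projection of a mean onto a Euclidean ball.

    If the squared distance to mu_old exceeds eps, the new mean is pulled
    back along the straight line to mu_old until the bound holds exactly.
    """
    dist_sq = sum((m - mo) ** 2 for m, mo in zip(mu, mu_old))
    if dist_sq <= eps:
        return list(mu)
    scale = math.sqrt(eps / dist_sq)
    return [mo + scale * (m - mo) for m, mo in zip(mu, mu_old)]
```

As a usage example, `project_mean([3.0, 4.0], [0.0, 0.0], eps=1.0)` shrinks the proposed mean toward the old one so that its squared distance equals the bound, while a mean already inside the bound is returned unchanged. Because the projection is a simple rescaling, it is differentiable almost everywhere, which is the property that lets such a step act as a network layer.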