Proximal policy optimization (PPO) is one of the most successful deep
reinforcement-learning methods, achieving state-of-the-art performance across a
wide range of challenging tasks. However, its optimization behavior is still
far from being fully understood. In this paper, we show tha