Novel advanced policy gradient (APG) algorithms, such as proximal policy
optimization (PPO), trust region policy optimization (TRPO), and their variants,
have become the dominant reinforcement learning (RL) algorithms because of
their ease of implementation and good practical performance. T