Abstract

Policy gradient (PG) methods are successful approaches to deal with
continuous reinforcement learning (RL) problems. They learn stochastic parametric (hyper)policies by exploring either in the space of actions or in the space of parameters.