There exist a number of reinforcement learning algorithms which learnby
climbing the gradient of expected reward. Their long-runconvergence has been
proved, even in partially observableenvironments with non-deterministic
actions, and without the need fora system model. However, the var