Dec, 2018
Relative Entropy Regularized Policy Iteration
Abbas Abdolmaleki, Jost Tobias Springenberg, Jonas Degrave, Steven Bohez, Yuval Tassa...
TL;DR
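We propose an off-policy actor-critic algorithm that combines gradient-free optimization via stochastic search with a learned action-value function. Each iteration has three steps: evaluating a parametric action-value function, estimating a local non-parametric policy, and fitting a parametric policy. The method is compared and evaluated on 31 continuous control tasks and achieves good results.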
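To make the three steps concrete, here is a minimal sketch on a one-dimensional toy problem. It is not the paper's implementation. The fixed function `q_value` stands in for the learned action-value function (step 1), the exponential reweighting of sampled actions with a fixed `temperature` is an assumed form of the non-parametric policy (step 2), and the weighted Gaussian fit is an assumed form of the parametric fitting step (step 3). All names and parameter values are hypothetical.

```python
import math
import random


def q_value(state, action):
    # Hypothetical stand-in for a learned action-value function (step 1).
    # It peaks when the action equals the state.
    return -(action - state) ** 2


def improve_policy(state, mean, std, rng, n_samples=64, temperature=0.5):
    """One improvement step for a 1-D Gaussian policy (illustrative only)."""
    # Step 2 (assumed form): sample actions from the current policy and
    # reweight them by exp(Q / temperature), giving a sample-based,
    # non-parametric improved policy.
    actions = [rng.gauss(mean, std) for _ in range(n_samples)]
    qs = [q_value(state, a) for a in actions]
    q_max = max(qs)  # subtract the max for numerical stability
    weights = [math.exp((q - q_max) / temperature) for q in qs]
    total = sum(weights)
    weights = [w / total for w in weights]

    # Step 3 (assumed form): fit the parametric policy to the weighted
    # samples by weighted maximum likelihood for a Gaussian.
    new_mean = sum(w * a for w, a in zip(weights, actions))
    new_var = sum(w * (a - new_mean) ** 2 for w, a in zip(weights, actions))
    # Small floor keeps the standard deviation strictly positive.
    return new_mean, math.sqrt(new_var) + 1e-6


rng = random.Random(0)
mean, std = improve_policy(state=2.0, mean=0.0, std=1.0, rng=rng)
print(mean, std)
```

In this toy setting the fitted mean moves from 0 toward the action that `q_value` prefers (2.0). A full method would also retrain the action-value function between improvement steps and constrain how far the policy may move, neither of which this sketch attempts.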
Abstract
We present an off-policy actor-critic algorithm for reinforcement learning (RL) that combines ideas from gradient-free optimization via stochastic search with a learned action-value function. The result is a simple ...