BriefGPT.xyz
May, 2020
Mirror Descent Policy Optimization
Manan Tomar, Lior Shani, Yonathan Efroni, Mohammad Ghavamzadeh
TL;DR
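Proposes Mirror Descent Policy Optimization (MDPO), an efficient reinforcement learning algorithm. MDPO updates the policy iteratively. Its objective function consists of a linearization of the standard RL objective plus a proximity term that keeps consecutive policies close to each other. The objective is derived from the mirror descent (MD) principle, and each update is solved approximately by taking multiple gradient steps.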
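To make the "linearized objective plus proximity term" idea concrete, here is a minimal sketch of a single mirror descent step for one state with a small discrete action set. It is not the authors' implementation: it assumes the proximity term is a KL divergence, assumes the advantages are known exactly, and uses the closed-form solution rather than the multiple gradient steps the summary describes. The names `md_policy_step`, `md_objective`, `advantages`, and `step_size` are hypothetical.

```python
import numpy as np


def md_policy_step(policy, advantages, step_size):
    """One exact mirror descent step on the probability simplex.

    Assumption (not from the source): the proximity term is KL(p || policy).
    Under that assumption, the maximizer of
        <p, advantages> - (1 / step_size) * KL(p || policy)
    is the multiplicative update p proportional to policy * exp(step_size * advantages).
    `policy` must have strictly positive entries because its log is taken.
    """
    logits = np.log(policy) + step_size * advantages
    logits -= logits.max()  # shift for numerical stability; cancels on normalization
    new_policy = np.exp(logits)
    return new_policy / new_policy.sum()


def md_objective(p, policy, advantages, step_size):
    """Linearized objective minus the scaled KL proximity term."""
    kl = np.sum(p * (np.log(p) - np.log(policy)))
    return p @ advantages - kl / step_size


# Hypothetical three-action example.
policy = np.array([0.25, 0.25, 0.5])
advantages = np.array([1.0, 0.0, -1.0])
new_policy = md_policy_step(policy, advantages, step_size=0.5)
print(new_policy)
```

A smaller `step_size` weights the proximity term more heavily, so the new policy stays closer to the old one. With function approximation the closed form is generally unavailable, which is presumably why the update is approximated with several gradient steps on this objective.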
Abstract
We propose deep reinforcement learning (RL) algorithms inspired by mirror descent, a well-known first-order trust region optimization method for solving constrained convex problems. Our approach, which we call …