Jan, 2022
Block Policy Mirror Descent
Guanghui Lan, Yan Li, Tuo Zhao
TL;DR
This paper proposes a new class of policy gradient methods, the block policy mirror descent (BPMD) method, for solving a class of reinforcement learning (RL) problems with (strongly) convex regularizers. By performing policy updates only at sampled states through a partial update rule, BPMD reduces the per-iteration computational cost, and the analysis establishes fast linear convergence under several sampling schemes.
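The partial update rule described above can be illustrated with a minimal sketch: rather than updating the policy at every state, only the sampled state's action distribution takes a KL mirror-descent step. The toy dimensions, step size `eta`, and the stand-in Q-values below are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 4, 3
Q = rng.normal(size=(n_states, n_actions))    # stand-in for estimated Q-values
policy = np.full((n_states, n_actions), 1.0 / n_actions)  # uniform start

def bpmd_step(policy, Q, s, eta=0.5):
    """Block update: a KL mirror-descent step at one sampled state s only."""
    logits = np.log(policy[s]) + eta * Q[s]   # KL-regularized ascent on Q(s, .)
    logits -= logits.max()                    # shift for numerical stability
    new_row = np.exp(logits)
    updated = policy.copy()
    updated[s] = new_row / new_row.sum()      # only row s changes: O(|A|) cost
    return updated

s = rng.integers(n_states)                    # sampled state
policy = bpmd_step(policy, Q, s)
print(np.allclose(policy.sum(axis=1), 1.0))   # rows remain valid distributions
```

Because each iteration touches a single state's action distribution, the per-iteration cost scales with the number of actions rather than the full state space, which is the source of the computational savings the TL;DR refers to.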
Abstract
In this paper, we present a new class of policy gradient (PG) methods, namely the block policy mirror descent (BPMD) methods, for solving a class of regularized reinforcement learning (RL) problems with (strongly) convex regularizers.