May 2021
Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence
Wenhao Zhan, Shicong Cen, Baihe Huang, Yuxin Chen, Jason D. Lee...
TL;DR
Proposes a generalized policy mirror descent (GPMD) algorithm for regularized reinforcement learning that achieves linear convergence, accommodates a general class of convex regularizers, and is corroborated by numerical experiments.
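To make the idea concrete, below is a minimal sketch of a regularized policy mirror descent iteration on a tiny tabular MDP, assuming a negative-entropy regularizer and a KL Bregman divergence (which gives a closed-form multiplicative update). It is an illustrative reconstruction of the general technique, not the paper's exact GPMD procedure; the MDP, step size eta, and regularization weight tau are made up for demonstration.

```python
# Illustrative sketch: entropy-regularized policy mirror descent on a random tabular MDP.
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma, tau, eta = 4, 3, 0.9, 0.1, 1.0  # state/action counts, discount, reg. weight, step size

# Random transition kernel P[s, a, s'] and reward r[s, a] (for demonstration only).
P = rng.dirichlet(np.ones(S), size=(S, A))
r = rng.uniform(size=(S, A))

def evaluate(pi):
    """Exact evaluation of the entropy-regularized value and Q-function of policy pi."""
    # Per-state regularized reward under pi: E_a[ r(s,a) - tau * log pi(a|s) ].
    r_pi = np.einsum("sa,sa->s", pi, r - tau * np.log(pi + 1e-12))
    P_pi = np.einsum("sa,sat->st", pi, P)
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    Q = r + gamma * P @ V  # Q[s,a] = r[s,a] + gamma * sum_s' P[s,a,s'] V[s']
    return V, Q

pi = np.full((S, A), 1.0 / A)  # start from the uniform policy
for t in range(200):
    V, Q = evaluate(pi)
    # Mirror-descent step with KL divergence and entropy regularizer, solved in closed form:
    #   pi_{t+1}(.|s) ∝ pi_t(.|s)^{1/(1+eta*tau)} * exp(eta * Q(s,.) / (1+eta*tau))
    logits = (np.log(pi + 1e-12) + eta * Q) / (1.0 + eta * tau)
    pi = np.exp(logits - logits.max(axis=1, keepdims=True))
    pi /= pi.sum(axis=1, keepdims=True)

print("regularized value per state:", np.round(V, 3))
```

Each iteration evaluates the current policy exactly and then maximizes ⟨Q, p⟩ − tau·h(p) − (1/eta)·KL(p ‖ pi_t) state by state over the simplex; with other convex regularizers h the update generally has no closed form and must be solved numerically.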
Abstract
Policy optimization, which learns the policy of interest by maximizing the value function via large-scale optimization techniques, lies at the heart of modern reinforcement learning (RL). In addition to value maximization, …