BriefGPT.xyz
Dec 2017
Deep Primal-Dual Reinforcement Learning: Accelerating Actor-Critic using Bellman Duality
Woon Sang Cho, Mengdi Wang
TL;DR
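A parameterized Primal-Dual $\pi$ Learning method based on deep neural networks, aimed at Markov decision processes with large state spaces and at off-policy learning. It performs primal-dual updates of the value and policy functions based on the basic linear Bellman duality, which makes value and policy updates more efficient. In comparative tests against similar methods, it clearly outperforms the most relevant baseline.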
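The primal-dual idea behind the method can be illustrated in the tabular case. This is a minimal sketch under stated assumptions, not the authors' deep, parameterized algorithm: it uses a hypothetical 2-state, 2-action MDP and the Lagrangian of the Bellman linear program, $L(v,\mu) = (1-\gamma)\, q^\top v + \sum_{s,a} \mu(s,a)\,[r(s,a) + \gamma P_{s,a}^\top v - v(s)]$, where $v$ is the value (primal) variable and $\mu$ is the state-action occupancy (dual, policy-carrying) variable. One step is projected gradient descent on $v$ plus exponentiated-gradient ascent on $\mu$. The step sizes `alpha`/`beta`, the box `[0, V_MAX]`, and every name in the code are illustrative assumptions, and the exact update used in the paper may differ.

```python
import math

# Hypothetical toy MDP (an assumption, not from the paper).
GAMMA = 0.9
S, A = 2, 2
P = [  # P[s][a][t] = Pr(next state t | state s, action a)
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.0, 1.0]],
]
R = [[0.0, 0.0], [0.0, 1.0]]  # R[s][a] = reward
Q0 = [0.5, 0.5]  # initial-state distribution q
V_MAX = max(max(row) for row in R) / (1 - GAMMA)  # box for the primal variable v


def backup(v, s, a):
    """Bellman backup r(s,a) + gamma * sum_t P(t|s,a) v(t)."""
    return R[s][a] + GAMMA * sum(P[s][a][t] * v[t] for t in range(S))


def value_iteration(iters=1000):
    """Reference optimal values v*, used only to check the sketch."""
    v = [0.0] * S
    for _ in range(iters):
        v = [max(backup(v, s, a) for a in range(A)) for s in range(S)]
    return v


def occupancy(policy, iters=1000):
    """Normalized occupancy mu(s,a) = (1-gamma) sum_k gamma^k Pr(s_k=s, a_k=a)."""
    d = list(Q0)
    for _ in range(iters):
        d = [(1 - GAMMA) * Q0[t]
             + GAMMA * sum(d[s] * P[s][policy[s]][t] for s in range(S))
             for t in range(S)]
    return [[d[s] if a == policy[s] else 0.0 for a in range(A)] for s in range(S)]


def lagrangian_grads(v, mu):
    """Gradients of L(v, mu) with respect to v and mu."""
    grad_v = [(1 - GAMMA) * Q0[t]
              + GAMMA * sum(mu[s][a] * P[s][a][t] for s in range(S) for a in range(A))
              - sum(mu[t][a] for a in range(A))
              for t in range(S)]
    grad_mu = [[backup(v, s, a) - v[s] for a in range(A)] for s in range(S)]
    return grad_v, grad_mu


def primal_dual_step(v, mu, alpha=0.1, beta=0.1):
    """Projected descent on v; exponentiated (mirror) ascent on mu over the simplex."""
    grad_v, grad_mu = lagrangian_grads(v, mu)
    v_new = [min(max(v[s] - alpha * grad_v[s], 0.0), V_MAX) for s in range(S)]
    w = [[mu[s][a] * math.exp(beta * grad_mu[s][a]) for a in range(A)] for s in range(S)]
    z = sum(sum(row) for row in w)
    mu_new = [[x / z for x in row] for row in w]
    return v_new, mu_new


v_star = value_iteration()
policy = [max(range(A), key=lambda a: backup(v_star, s, a)) for s in range(S)]
mu_star = occupancy(policy)

# Linear Bellman duality: primal objective (1-gamma) q.v* equals dual objective mu*.r.
primal = (1 - GAMMA) * sum(Q0[s] * v_star[s] for s in range(S))
dual = sum(mu_star[s][a] * R[s][a] for s in range(S) for a in range(A))
print(policy, round(primal, 6), round(dual, 6))  # prints: [1, 1] 0.95 0.95
```

The check only confirms strong duality and that the saddle point $(v^*, \mu^*)$ is left unchanged by `primal_dual_step`; it does not show convergence from an arbitrary start. The multiplicative update on $\mu$ is chosen because it keeps $\mu$ nonnegative and normalized without an explicit projection.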
Abstract
We develop a parameterized Primal-Dual $\pi$ Learning method based on deep neural networks for Markov decision processes with large state spaces and off-policy…