BriefGPT.xyz
Mar, 2018
软-鲁棒的演员-评论家策略梯度算法
Soft-Robust Actor-Critic Policy-Gradient
HTML
PDF
Esther Derman, Daniel J. Mankowitz, Timothy A. Mann, Shie Mannor
TL;DR
本文提出了一种基于 Soft-Robust Actor-Critic 算法的 Robust Reinforcement Learning 方法,能够学习针对不确定性模型的最优策略且避免过于保守,实验证明其收敛性和高效性。
Abstract
robust reinforcement learning
aims to derive an optimal behavior that accounts for
model uncertainty
in dynamical systems. However, previous studies have shown that by considering the worst case scenario, robust
→