Jul, 2020
Variational Policy Gradient Method for Reinforcement Learning with General Utilities
Junyu Zhang, Alec Koppel, Amrit Singh Bedi, Csaba Szepesvari, Mengdi Wang
TL;DR
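This work proposes a new policy gradient algorithm, based on a variational approach, for policy optimization in Markov decision problems with general utility functions, and proves its global convergence along with its convergence rate.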
Abstract
In recent years, reinforcement learning (RL) systems with general goals beyond a cumulative sum of rewards have gained traction, such as in constrained problems, exploration, and acting upon prior experiences. In this paper, we consider
→