Oct, 2021
Optimistic Policy Optimization is Provably Efficient in Non-stationary MDPs
Han Zhong, Zhuoran Yang, Zhaoran Wang, Csaba Szepesvári
TL;DR
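This paper studies episodic reinforcement learning in non-stationary linear kernel Markov decision processes and proposes an algorithm named PROPO, the first provably efficient policy optimization algorithm that handles non-stationarity.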
Abstract
We study episodic reinforcement learning (RL) in non-stationary linear kernel Markov decision processes (MDPs). In this setting, both the