BriefGPT.xyz
May, 2019
Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies
Yonathan Efroni, Nadav Merlis, Mohammad Ghavamzadeh, Shie Mannor
TL;DR
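This paper studies model-based RL in finite-state, finite-horizon Markov decision processes. It proves that exploring with greedy policies achieves tight minimax performance, so full-planning can be avoided altogether while the computational complexity drops by a factor of S. The result rests on a novel analysis of real-time dynamic programming.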
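To make the factor-of-S claim concrete, the following is a minimal sketch contrasting a full-planning sweep with a single episode of greedy, real-time-dynamic-programming-style updates. It is not the paper's algorithm: it assumes a known model given as plain nested lists `P[s][a][t]` (transition probabilities) and `R[s][a]` (rewards in [0, 1]), and it omits the empirical model and exploration bonuses a learning agent would need. The function names and the toy layout are hypothetical.

```python
import random


def full_planning(P, R, H):
    """Backward induction over every state: O(H * S * A * S) work."""
    S, A = len(R), len(R[0])
    V = [[0.0] * S for _ in range(H + 1)]  # V[H] is the terminal layer (all zeros)
    for h in range(H - 1, -1, -1):
        for s in range(S):
            V[h][s] = max(
                R[s][a] + sum(P[s][a][t] * V[h + 1][t] for t in range(S))
                for a in range(A)
            )
    return V


def greedy_episode(P, R, H, V, s0, rng):
    """One episode that backs up only the state visited at each step.

    Work per episode is O(H * A * S), a factor of S less than full_planning.
    V is updated in place and is assumed to start as an optimistic upper
    bound, e.g. V[h][s] = H - h for rewards in [0, 1].
    """
    S, A = len(R), len(R[0])
    s = s0
    for h in range(H):
        # One-step lookahead from the current state only.
        q = [
            R[s][a] + sum(P[s][a][t] * V[h + 1][t] for t in range(S))
            for a in range(A)
        ]
        a = max(range(A), key=lambda i: q[i])  # greedy action (first index on ties)
        V[h][s] = min(V[h][s], q[a])           # values never increase
        s = rng.choices(range(S), weights=P[s][a])[0]
    return V
```

Under the optimistic initialization assumed above, each backup keeps `V[h][s]` at or above the optimal value, so the greedy episode tightens the bound only along the trajectory it actually visits rather than sweeping all S states.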
Abstract
State-of-the-art efficient model-based reinforcement learning (RL) algorithms typically act by iteratively solving empirical models, i.e., by performing full-planning on