BriefGPT.xyz
Jan, 2019
Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds
Andrea Zanette, Emma Brunskill
TL;DR
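This work studies finite-horizon discrete Markov decision processes. It proposes an algorithm and analyzes its performance upper bound, obtaining state-of-the-art bounds that become tighter when the environmental norm is small. The algorithm requires no prior knowledge of the environmental norm, and it addresses limitations commonly encountered in empirical learning.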
Abstract
Strong worst-case performance bounds for episodic reinforcement learning exist but fortunately in practice RL algorithms perform much better than such bounds would predict. Algorithms and theory that provide stro