BriefGPT.xyz
Aug, 2021
Beyond No Regret: Instance-Dependent PAC Reinforcement Learning
Andrew Wagenmaker, Max Simchowitz, Kevin Jamieson
TL;DR
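The paper proposes a new measure of instance-dependent sample complexity for PAC tabular reinforcement learning and designs a planning-based algorithm that attains this sample complexity. The algorithm is nearly minimax-optimal and, on several instances, improves significantly over worst-case bounds.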
Abstract
The theory of reinforcement learning has focused on two fundamental problems: achieving low regret, and identifying $\epsilon$-optimal policies. While a simple reduction allows one to apply a low-regret algorithm
→