BriefGPT.xyz
Jan, 2023
Sharp Variance-Dependent Bounds in Reinforcement Learning: Best of Both Worlds in Stochastic and Deterministic Environments
Runlong Zhou, Zihan Zhang, Simon S. Du
TL;DR
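Studies variance-dependent regret bounds for Markov decision processes, proposes two new environmental norms, and designs a model-based algorithm (MVP) and a model-free algorithm based on reference functions, establishing variance-dependent upper and lower bounds on regret.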
Abstract
We study variance-dependent regret bounds for Markov decision processes (MDPs). Algorithms with variance-dependent regret guarantees can automatically exploit environments with low variance (e.g., enjoying consta…
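To illustrate what "low variance" can mean, here is a minimal sketch of one standard per-step quantity in this line of work: the conditional variance of a value function over the next state, Var_{s' ~ P(·|s,a)}[V(s')]. It is zero for every (s, a) when transitions are deterministic. This is only an assumed illustration; it does not reproduce the two environmental norms proposed in the paper. The function name `next_state_value_variance` and the tabular array layout are hypothetical choices made for this example.

```python
import numpy as np


def next_state_value_variance(P, V):
    """Hypothetical helper: Var_{s' ~ P(.|s,a)}[V(s')] for every (s, a).

    Assumes a tabular MDP where
      P has shape (S, A, S), with P[s, a, s'] the transition probability,
      V has shape (S,), a value function over states.
    """
    mean = P @ V             # E[V(s')] for each (s, a), shape (S, A)
    second = P @ (V ** 2)    # E[V(s')^2] for each (s, a), shape (S, A)
    # Clip tiny negative values caused by floating-point rounding.
    return np.maximum(second - mean ** 2, 0.0)


# Two states, one action, value function V = [0, 1].
V = np.array([0.0, 1.0])

# Deterministic transitions: every (s, a) moves to state 1 with probability 1.
P_det = np.zeros((2, 1, 2))
P_det[:, 0, 1] = 1.0

# Stochastic transitions: every (s, a) moves to either state with probability 1/2.
P_sto = np.full((2, 1, 2), 0.5)

var_det = next_state_value_variance(P_det, V)  # all zeros
var_sto = next_state_value_variance(P_sto, V)  # all 0.25
```

In the deterministic case every entry of `var_det` is 0, which is the regime where a variance-dependent bound can become much smaller than a worst-case bound. In the stochastic case each entry of `var_sto` equals 0.25.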