BriefGPT.xyz
Jun, 2020
Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs
Chung-Wei Lee, Haipeng Luo, Chen-Yu Wei, Mengxiao Zhang
TL;DR
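The paper develops a new approach to obtaining high-probability regret bounds that uses standard unbiased estimators and relies on a simple increasing learning rate schedule, together with logarithmically homogeneous self-concordant barriers and a strengthened Freedman's inequality.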
Abstract
We develop a new approach to obtaining high-probability regret bounds for online learning with bandit feedback against an ...