BriefGPT.xyz
Dec, 2020
Regret Bound Balancing and Elimination for Model Selection in Bandits and RL
Aldo Pacchiano, Christoph Dann, Claudio Gentile, Peter Bartlett
TL;DR
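This paper proposes a simple model selection approach for stochastic bandit and reinforcement learning problems. The approach balances the candidate regret bounds of the base algorithms and eliminates any algorithm that violates its candidate bound. The authors prove that the total regret of the approach is bounded by the best candidate regret bound times a multiplicative factor.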
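The balancing-and-elimination idea summarized in the TL;DR can be illustrated with a minimal Python sketch. This is not the authors' algorithm as published: the function name `balance_and_eliminate`, the confidence width in `width`, the constants `c` and `delta`, and the two toy base learners are all assumptions made for illustration. Balancing is approximated by always playing the active learner whose candidate bound, evaluated at its own play count, is smallest. A learner is eliminated when even the most optimistic average reward consistent with its candidate bound falls below another learner's pessimistic average reward.

```python
import math
import random


def balance_and_eliminate(learners, bounds, pull, horizon, delta=0.05, c=1.0):
    """Simplified sketch of regret bound balancing with elimination.

    learners: list of callables, each returning an arm to play (hypothetical interface)
    bounds:   list of functions n -> candidate regret bound after n plays
    pull:     function arm -> reward in [0, 1]
    """
    m = len(learners)
    active = set(range(m))
    plays = [0] * m
    total = [0.0] * m

    def width(k):
        # Assumed Hoeffding-style confidence width; the paper's exact form differs.
        return c * math.sqrt(math.log(m * plays[k] / delta) / plays[k])

    for _ in range(horizon):
        # Balancing: play the active learner with the smallest candidate bound so far.
        i = min(sorted(active), key=lambda k: bounds[k](plays[k]))
        total[i] += pull(learners[i]())
        plays[i] += 1

        # Elimination: drop learners whose optimistic value (average reward plus
        # per-round candidate regret plus width) is below the best pessimistic value.
        played = [k for k in active if plays[k] > 0]
        best_lower = max(total[k] / plays[k] - width(k) for k in played)
        active -= {
            k for k in played
            if (total[k] + bounds[k](plays[k])) / plays[k] + width(k) < best_lower
        }
    return active, plays


# Toy setup (assumed): learner 0 always plays a bad arm but claims a sqrt(n) bound,
# learner 1 always plays the good arm and claims a looser 3*sqrt(n) bound.
rng = random.Random(0)
means = [0.2, 0.8]
learners = [lambda: 0, lambda: 1]
bounds = [lambda n: math.sqrt(n), lambda n: 3 * math.sqrt(n)]
pull = lambda arm: 1.0 if rng.random() < means[arm] else 0.0

active, plays = balance_and_eliminate(learners, bounds, pull, horizon=3000)
print(active, plays)
```

In this toy setup the first learner's candidate bound is misspecified, since its true regret grows linearly, so the elimination test is what removes it once enough rounds have been played. A learner attaining the best pessimistic value can never be eliminated, so the active set stays non-empty.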
Abstract
We propose a simple model selection approach for algorithms in stochastic bandit and reinforcement learning problems. As opposed to prior …