BriefGPT.xyz
Oct, 2013
带切换成本的赌博机:T ^ {2/3} 遗憾
Bandits with Switching Costs: T^{2/3} Regret
HTML
PDF
Ofer Dekel, Jian Ding, Tomer Koren, Yuval Peres
TL;DR
本文研究的是带有动作切换代价的敌对多臂赌博机问题,证明了在该问题下玩家T回合的最小極大后悔度为~Θ(T^2/3),并研究了其他在线学习领域的开放问题,结果得到了一个多尺度随机游走的新随机化结构,该结构对如此困难的学习问题证明可能会有所帮助。
Abstract
We study the
adversarial multi-armed bandit
problem in a setting where the player incurs a unit cost each time he switches actions. We prove that the player's $T$-round
minimax regret
in this setting is $\t\theta
→