Jun, 2016
Improved Regret Bounds for Oracle-Based Adversarial Contextual Bandits
Vasilis Syrgkanis, Haipeng Luo, Akshay Krishnamurthy, Robert E. Schapire
TL;DR
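Proposes an oracle-based algorithm for the adversarial contextual bandit problem. Given access to an offline optimization oracle, the algorithm is computationally efficient and enjoys a regret of $O((KT)^{\frac{2}{3}}(\log N)^{\frac{1}{3}})$, where K is the number of actions, T is the number of iterations, and N is the number of baseline policies.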
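To make the rate concrete, the small helper below evaluates the expression inside the $O(\cdot)$. It is a sketch under two assumptions: all constants are dropped, so only ratios between its values are meaningful, and the natural logarithm is used, which changes the value only by a constant factor. The name `regret_rate` is hypothetical. Dividing by T gives the average regret per round, $K^{2/3}(\log N)^{1/3}T^{-1/3}$, which shrinks like $T^{-1/3}$.

```python
import math


def regret_rate(K: int, T: int, N: int) -> float:
    """Hypothetical helper: (K*T)^(2/3) * (log N)^(1/3), i.e. the rate inside
    the O(.) bound with constants omitted and the natural log assumed."""
    return (K * T) ** (2 / 3) * math.log(N) ** (1 / 3)


if __name__ == "__main__":
    K, N = 10, 1000  # illustrative numbers of actions and policies
    for T in (10**6, 8 * 10**6):
        # Average (per-round) rate decays like T^(-1/3).
        print(T, regret_rate(K, T, N) / T)
```

For example, with K and N held fixed, an eightfold increase in T halves the per-round rate, since $8^{-1/3} = 1/2$.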
Abstract
We give an oracle-based algorithm for the adversarial contextual bandit problem, where either contexts are drawn i.i.d. or the sequence of contexts is known a priori, but where the losses are picked adversarially.
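The setting described in the abstract can be made concrete with a small simulation. This is an illustration under stated assumptions, not the paper's algorithm. The policy class is an explicit finite list of functions from context ids to actions. The hypothetical `offline_oracle` solves the offline problem by brute force, returning the policy with the smallest total loss on a batch of (context, loss-vector) pairs. A naive epsilon-greedy learner, swapped in here purely for simplicity, calls that oracle on importance-weighted loss estimates. Contexts may be drawn i.i.d. or fixed in advance, the loss vectors are arbitrary, and the learner observes only the loss of the action it plays.

```python
import random
from typing import Callable, List, Sequence, Tuple

# A policy maps a context id to an action index in range(K).
Policy = Callable[[int], int]


def offline_oracle(policies: Sequence[Policy],
                   data: Sequence[Tuple[int, List[float]]]) -> Policy:
    """Hypothetical offline optimization oracle: return the policy with the
    smallest total loss on the given (context, loss-vector) pairs.
    Implemented by brute-force enumeration of a finite policy class."""
    return min(policies, key=lambda pi: sum(loss[pi(x)] for x, loss in data))


def run_protocol(policies: Sequence[Policy], contexts: Sequence[int],
                 losses: Sequence[List[float]], K: int, eps: float,
                 seed: int = 0) -> float:
    """Simulate the adversarial contextual bandit protocol with a naive
    epsilon-greedy learner (NOT the paper's algorithm; assumes 0 < eps <= 1)
    and return its realized regret against the best policy in hindsight."""
    rng = random.Random(seed)
    history: List[Tuple[int, List[float]]] = []  # (context, loss estimate)
    learner_loss = 0.0
    for x, loss in zip(contexts, losses):
        # Exploit: ask the oracle for the best policy on the estimates so far.
        greedy = offline_oracle(policies, history)(x)
        # Explore: mix in the uniform distribution over the K actions.
        probs = [eps / K] * K
        probs[greedy] += 1.0 - eps
        a = rng.choices(range(K), weights=probs)[0]
        # Bandit feedback: only loss[a] is revealed to the learner.
        learner_loss += loss[a]
        # Importance-weighted estimate, unbiased for the full loss vector.
        est = [0.0] * K
        est[a] = loss[a] / probs[a]
        history.append((x, est))
    best = min(sum(loss[pi(x)] for x, loss in zip(contexts, losses))
               for pi in policies)
    return learner_loss - best


if __name__ == "__main__":
    ctx_rng = random.Random(1)
    policies = [lambda x: 0, lambda x: 1, lambda x: x % 2]
    contexts = [ctx_rng.randrange(4) for _ in range(200)]  # i.i.d. contexts
    # An arbitrary (adversarially chosen) loss vector for each round.
    losses = [[float(t % 3 == 0), float(t % 3 != 0)] for t in range(200)]
    print(run_protocol(policies, contexts, losses, K=2, eps=0.2))
```

The design point this sketch is meant to show is that the learner touches the policy class only through `offline_oracle`; in practice such an oracle would typically be a cost-sensitive classification solver rather than an enumeration of all N policies.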