BriefGPT.xyz
May, 2014
Lipschitz Bandits: 遗憾下限和最优算法
Lipschitz Bandits: Regret Lower Bounds and Optimal Algorithms
HTML
PDF
Stefan Magureanu, Richard Combes, Alexandre Proutiere
TL;DR
研究了随机多臂赌博问题中期望奖励是武器的Lipschitz函数的情况,提出了两种算法OSLB和CKL-UCB,并衍生出上限,针对连续武器集合的情况建议首先离散化行动空间再应用算法,同时也考虑到了具有类似性质的背景下文本字形赌博。
Abstract
We consider
stochastic multi-armed bandit
problems where the expected reward is a
lipschitz function
of the arm, and where the set of arms is either discrete or continuous. For discrete Lipschitz bandits, we deri
→