BriefGPT.xyz
Oct, 2016
Combinatorial Multi-Armed Bandit with General Reward Functions
Wei Chen, Wei Hu, Fu Li, Jian Li, Yu Liu...
TL;DR
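This paper studies the stochastic combinatorial multi-armed bandit (CMAB) framework and proposes a new algorithm called SDCB (stochastically dominant confidence bound), which estimates the distributions of the underlying random variables together with their stochastically dominant confidence bounds. The authors prove that SDCB achieves $O(\log T)$ distribution-dependent regret and $\tilde{O}(\sqrt{T})$ distribution-independent regret, and they apply the results to the $K$-MAX problem.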
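To make the idea of a stochastically dominant confidence bound concrete, here is a minimal Python sketch, not the authors' implementation. It assumes outcomes lie in [0, 1], builds the empirical CDF of one arm's samples, and shifts it down by a confidence radius so that the resulting distribution puts more mass on large outcomes. The function names `optimistic_cdf` and `mean_from_cdf` are hypothetical, and the summary does not state the radius SDCB uses, so `radius` is left as a free parameter.

```python
def optimistic_cdf(samples, radius):
    """Return (support, cdf) of a distribution that first-order
    stochastically dominates the empirical distribution of `samples`.

    Assumption: every sample lies in [0, 1]. `radius` is a hypothetical
    confidence radius chosen by the caller.
    """
    n = len(samples)
    # Evaluate the CDF at each observed value, plus the upper endpoint 1.0.
    support = sorted(set(samples) | {1.0})
    cdf = []
    for x in support:
        empirical = sum(s <= x for s in samples) / n
        # Lower the empirical CDF by the radius, clip at 0, and force F(1) = 1
        # so the result is still a valid CDF on [0, 1].
        cdf.append(1.0 if x >= 1.0 else max(empirical - radius, 0.0))
    return support, cdf


def mean_from_cdf(support, cdf):
    """Mean of a discrete distribution given its CDF on sorted support points."""
    mean, prev = 0.0, 0.0
    for x, f in zip(support, cdf):
        mean += x * (f - prev)  # probability mass at x is the CDF jump
        prev = f
    return mean


# Illustrative data only; these numbers are not from the paper.
samples = [0.2, 0.4, 0.4, 0.9]
support, cdf = optimistic_cdf(samples, radius=0.25)
empirical_mean = sum(samples) / len(samples)
optimistic_mean = mean_from_cdf(support, cdf)
```

Because the shifted CDF lies at or below the empirical one everywhere, its mean is at least the empirical mean. Working with a whole dominating distribution rather than an upper bound on the mean is what lets the approach handle rewards such as the maximum in $K$-MAX, which are not determined by the means alone.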
Abstract
In this paper, we study the stochastic combinatorial multi-armed bandit (CMAB) framework that allows a general nonlinear reward function, whose expected value may not depend only on the means of the input random...