TL;DR研究了如何在随机赌博机游戏中选择期望回报最高的 K 个赌臂问题,提出了一种基于概率近似正确算法,并引入了难度参数来量化问题难度。通过研究两种算法的采样复杂度,得出了更优的上界,并证明了该上界在某些情况下是紧的。同时得出了引入难度参数的实例相关算法需要额外的对数因子作为代价的下界。
Abstract
We study the problem of selecting $K$ arms with the highest expected rewards
in a stochastic $n$-armed bandit game. This problem has a wide range of
applications, e.g., A/B testing, crowdsourcing, simulation optimization. Our
goal is to develop a pac algorithm, which, with probability