Dec, 2020

Upper Confidence Bounds for Combining Stochastic Bandits

Ashok Cutkosky, Abhimanyu Das, Manish Purohit

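TL;DR: Proposes a simple method, based on a meta-UCB algorithm, for combining stochastic bandit algorithms and improving performance in unfavorable environments. The results match lower bounds in several settings, and the method is validated on linear bandit and model selection problems.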

Abstract

We provide a simple method to combine stochastic bandit algorithms. Our approach is based on a "meta-UCB" procedure that treats each of $N$ individual bandit algorithms as arms in a higher-level $N$-armed bandit problem that we solve with a variant of the classic UCB algorithm. Our fin