Feb 2018

Regional Multi-Armed Bandits

Zhiyang Wang, Ruida Zhou, Cong Shen

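TL;DR: This paper studies a variant of the multi-armed bandit problem in which the expected reward of each arm is a function of an unknown parameter and the arms are divided into groups. We propose an efficient algorithm, UCB-g, to solve the problem, prove its optimality, and propose an extension, SW-UCB-g, for non-stationary environments.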
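The grouped model can be illustrated with a small simulation. This is a minimal sketch of the problem setting only, not of UCB-g or SW-UCB-g. It assumes hypothetical linear, invertible reward functions mu(theta) = slope * theta + offset and Gaussian reward noise; the names `Arm`, `RegionalBandit`, and `estimate_group_theta` are illustrative. It shows the key structural point: because all arms in a group share one parameter, samples from one arm yield an estimate of that parameter, which in turn predicts the mean reward of every other arm in the same group.

```python
import random
from dataclasses import dataclass


@dataclass
class Arm:
    # Hypothetical reward function mu(theta) = slope * theta + offset
    # (assumed linear and invertible, i.e. slope != 0).
    slope: float
    offset: float

    def mean(self, theta: float) -> float:
        return self.slope * theta + self.offset

    def invert(self, reward_mean: float) -> float:
        # Map an estimated mean reward back to an estimate of the group parameter.
        return (reward_mean - self.offset) / self.slope


class RegionalBandit:
    """Arms split into groups; every arm in group k shares the hidden parameter thetas[k]."""

    def __init__(self, groups, thetas, noise=0.1, seed=0):
        self.groups = groups    # list of lists of Arm
        self.thetas = thetas    # true (unknown to the player) parameter per group
        self.noise = noise      # std of the assumed Gaussian reward noise
        self.rng = random.Random(seed)

    def pull(self, k, m):
        # Noisy reward for arm m in group k.
        return self.groups[k][m].mean(self.thetas[k]) + self.rng.gauss(0.0, self.noise)

    def best_arm(self):
        # Oracle: the (group, arm) pair with the highest true expected reward.
        pairs = [(k, m) for k, g in enumerate(self.groups) for m in range(len(g))]
        return max(pairs, key=lambda km: self.groups[km[0]][km[1]].mean(self.thetas[km[0]]))


def estimate_group_theta(bandit, k, m, n):
    # Pull one arm n times, average the rewards, and invert its reward function.
    rewards = [bandit.pull(k, m) for _ in range(n)]
    return bandit.groups[k][m].invert(sum(rewards) / n)
```

For example, with `groups = [[Arm(1.0, 0.0), Arm(-1.0, 1.0)], [Arm(0.5, 0.2), Arm(2.0, -0.5)]]` and `thetas = [0.3, 0.8]`, pulling only arm `(0, 0)` gives an estimate of theta for group 0 that is close to 0.3, and `groups[0][1].mean(theta_hat)` then predicts roughly 0.7 for arm `(0, 1)` without ever pulling it. This cross-arm information sharing within a group is what distinguishes the setting from a classic bandit with independent arms.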

Abstract

We consider a variant of the classic multi-armed bandit problem where the expected reward of each arm is a function of an unknown parameter. The arms are divided into different groups, each of which has a common parameter. Therefore, when the player selects an arm at each time slot, in