Distributed Exploration in Multi-Armed Bandits

Nov, 2013

Eshcar Hillel, Zohar Karnin, Tomer Koren, Ronny Lempel, Oren Somekh

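TL;DR: This paper studies exploration in multi-armed bandits where $k$ players collaborate to identify an $\epsilon$-optimal arm. It shows that collaboration and communication enable faster learning: the best scheme achieves a $k$-fold speed-up in learning performance while using only $O(\log(1/\epsilon))$ rounds of communication.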

Abstract

We study exploration in multi-armed bandits in a setting where $k$ players collaborate in order to identify an $\epsilon$-optimal arm. Our motivation comes from recent employment of bandit algorithms in computati