Concurrent bandits and cognitive radio networks

Apr, 2014

Orly Avner, Shie Mannor

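TL;DR: Proposes an algorithm that combines an epsilon-greedy learning rule with a collision avoidance mechanism to solve the multi-armed bandit problem when multiple users share the same arms, with application to cognitive radio networks. Experiments show that it significantly outperforms other algorithms in this setting, and it attains sublinear regret.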
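
To make the idea concrete, the following is a minimal toy simulation of epsilon-greedy users sharing Bernoulli arms with a simple collision back-off. It is a sketch under stated assumptions, not the authors' algorithm: the function name `simulate`, the `persist` parameter, the zero-reward collision model, and the "avoid the arm until the next exploration step" rule are all hypothetical choices made for illustration.

```python
import random

def simulate(means, n_users, horizon, eps=0.1, persist=0.5, seed=0):
    """Toy model (assumed, not from the paper): epsilon-greedy users with back-off."""
    rng = random.Random(seed)
    k = len(means)
    counts = [[0] * k for _ in range(n_users)]   # collision-free pulls per user and arm
    sums = [[0.0] * k for _ in range(n_users)]   # reward sums per user and arm
    blocked = [set() for _ in range(n_users)]    # arms a user avoids after colliding
    total_reward = 0.0
    collisions = 0  # counted once per colliding user per round

    for _ in range(horizon):
        choices = []
        for u in range(n_users):
            if rng.random() < eps:
                # Explore: pick a uniformly random arm and forget old back-offs.
                blocked[u].clear()
                arm = rng.randrange(k)
            else:
                # Exploit: best empirical mean among arms not currently avoided.
                # Untried arms score +inf so that each one is sampled at least once.
                allowed = [a for a in range(k) if a not in blocked[u]] or list(range(k))
                arm = max(
                    allowed,
                    key=lambda a: sums[u][a] / counts[u][a] if counts[u][a] else float("inf"),
                )
            choices.append(arm)

        for u, arm in enumerate(choices):
            if choices.count(arm) > 1:
                # Collision (assumption): no reward, no statistics update.
                collisions += 1
                if rng.random() > persist:
                    # Back off: avoid this arm until the next exploration step.
                    blocked[u].add(arm)
            else:
                reward = 1.0 if rng.random() < means[arm] else 0.0
                counts[u][arm] += 1
                sums[u][arm] += reward
                total_reward += reward
    return total_reward, collisions

if __name__ == "__main__":
    reward, collided = simulate([0.9, 0.5, 0.2], n_users=2, horizon=5000)
    print(f"total reward: {reward:.0f}, colliding user-rounds: {collided}")
```

The randomized back-off is what lets users that target the same arm separate without communicating: after a collision, each user independently decides whether to persist or to yield the arm.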

Abstract

We consider the problem of multiple users targeting the arms of a single multi-armed stochastic bandit. The motivation for this problem comes from cognitive radio networks, where selfish users need to coexist wit