February 2024

Fairness of Exposure in Online Restless Multi-armed Bandits

Archit Sood, Shweta Jain, Sujit Gujar

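TL;DR: The paper builds a fairness framework for restless multi-armed bandits that addresses unfairness in both the offline and online settings. It proves that the algorithm achieves sublinear fairness regret in the single-pull case and shows empirically that it performs well in the multi-pull case.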

Abstract

Restless multi-armed bandits (RMABs) generalize multi-armed bandits to the setting where each arm exhibits Markovian behavior and transitions according to its own transition dynamics. Solutions to RMABs exist for both the offline and online cases. However, they do not consider the distribution of pulls
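The following minimal Python sketch makes the RMAB setting concrete. It rests on several assumptions. Each arm is a two-state Markov chain with separate transition matrices for the active (pulled) and passive actions. Every arm earns a state-dependent reward each round. A budget of k pulls is allowed per round. The names `RestlessArm`, `random_policy`, `myopic_policy`, `simulate` and all numbers are hypothetical illustrations, not the authors' algorithm. The returned pull counts only show what "the distribution of pulls" refers to; no fairness constraint is applied.

```python
import random


class RestlessArm:
    """Hypothetical RMAB arm: a Markov chain whose transition matrix depends on
    whether the arm is pulled (action 1) or left passive (action 0)."""

    def __init__(self, transitions, rewards, state=0):
        self.transitions = transitions  # transitions[action][s] = row of P(s' | s, action)
        self.rewards = rewards          # assumption: reward earned in state s every round
        self.state = state

    def step(self, action, rng):
        """Collect the current state's reward, then move to the next state.
        Passive arms (action 0) also transition, which is what makes them restless."""
        reward = self.rewards[self.state]
        row = self.transitions[action][self.state]
        self.state = rng.choices(range(len(row)), weights=row)[0]
        return reward


def random_policy(arms, k, rng):
    """Hypothetical baseline: pull k distinct arms uniformly at random."""
    return rng.sample(range(len(arms)), k)


def myopic_policy(arms, k, rng):
    """Hypothetical greedy rule: pull the k arms whose current state pays the most.
    Python's sort is stable, so ties go to lower-indexed arms."""
    order = sorted(range(len(arms)),
                   key=lambda i: arms[i].rewards[arms[i].state], reverse=True)
    return order[:k]


def simulate(arms, k, horizon, policy, seed=0):
    """Run `horizon` rounds pulling k arms per round; return the total reward
    and how many times each arm was pulled (the distribution of pulls)."""
    rng = random.Random(seed)
    pulls = [0] * len(arms)
    total_reward = 0.0
    for _ in range(horizon):
        chosen = set(policy(arms, k, rng))
        for i, arm in enumerate(arms):
            action = 1 if i in chosen else 0
            total_reward += arm.step(action, rng)
            pulls[i] += action
    return total_reward, pulls


# Two-state arms: state 0 = "bad" (reward 0), state 1 = "good" (reward 1).
# Illustrative numbers only; they are not taken from the paper.
TRANSITIONS = {
    0: [[0.9, 0.1], [0.3, 0.7]],  # passive
    1: [[0.4, 0.6], [0.1, 0.9]],  # active (pulled)
}

if __name__ == "__main__":
    for policy in (random_policy, myopic_policy):
        arms = [RestlessArm(TRANSITIONS, [0.0, 1.0]) for _ in range(5)]
        reward, pulls = simulate(arms, k=2, horizon=1000, policy=policy)
        print(policy.__name__, reward, pulls)
```

Comparing the pull counts of the two policies shows one way a purely reward-driven rule can spread pulls (exposure) unevenly across arms, which is the gap the abstract points to. Fresh arms are created per run because `RestlessArm` is stateful.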