We study fairness within the stochastic \emph{multi-armed bandit} (MAB) decision-making framework. We adapt the fairness framework of "treating similar individuals similarly" to this setting. Here, an `individual' corresponds to an arm, and two arms are `similar' if they have similar quality distributions. First, we adopt a \emph{smoothness constraint}: if two arms have similar quality distributions, then the probabilities of selecting them should be similar. In addition, we define the \emph{fairness regret}, which measures the degree to which an algorithm is not calibrated, where perfect calibration requires that the probability of selecting an arm equal the probability with which that arm has the best quality realization. We show that a variation on Thompson sampling satisfies smooth fairness for total variation distance, and give an $\tilde{O}((kT)^{2/3})$ bound on fairness regret. This complements prior work, which protects an on-average better arm from being less favored. We also explain how to extend our algorithm to the dueling bandit setting.
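
To make the calibration idea concrete, here is a minimal illustrative sketch of standard Thompson sampling for Bernoulli arms with Beta(1, 1) priors. It is not the variation analyzed in the paper, and the function names (`thompson_step`, `run_thompson`, `selection_probabilities`) and the Bernoulli reward model are assumptions made for this example only. Under the posterior, Thompson sampling selects each arm with the probability that its sampled mean is the largest, so two arms with identical posteriors are selected with (approximately, under Monte Carlo estimation) equal probability.

```python
import random


def thompson_step(successes, failures, rng):
    """Sample a mean for each arm from its Beta posterior; return the argmax arm."""
    # Beta(s + 1, f + 1) is the posterior under a uniform Beta(1, 1) prior.
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)


def run_thompson(true_means, horizon, seed=0):
    """Run Thompson sampling on Bernoulli arms (hypothetical reward model)."""
    rng = random.Random(seed)
    k = len(true_means)
    successes, failures, pulls = [0] * k, [0] * k, [0] * k
    for _ in range(horizon):
        arm = thompson_step(successes, failures, rng)
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return successes, failures, pulls


def selection_probabilities(successes, failures, n_draws=20000, seed=1):
    """Monte Carlo estimate of how often each arm is selected under the posterior."""
    rng = random.Random(seed)
    wins = [0] * len(successes)
    for _ in range(n_draws):
        wins[thompson_step(successes, failures, rng)] += 1
    return [w / n_draws for w in wins]


if __name__ == "__main__":
    s, f, pulls = run_thompson([0.5, 0.5, 0.9], horizon=500)
    print("pulls per arm:", pulls)
    print("selection probabilities:", selection_probabilities(s, f))
```

Estimating the selection probabilities by repeated posterior sampling keeps the sketch dependency-free; a closed-form or numerical integration over the Beta posteriors would serve equally well.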

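This work studies fairness in the stochastic multi-armed bandit decision-making framework. It adopts the fairness framework of "treating similar individuals similarly" and formalizes fairness through a smoothness constraint and a fairness-regret measure. The results show that algorithms such as Thompson sampling can achieve smooth fairness, with an $\tilde{O}((kT)^{2/3})$ upper bound on fairness regret.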

Calibrated Fairness in Bandits