Achieving Fairness in the Stochastic Multi-armed Bandit Problem

May, 2019

Stochastic Multi-armed Bandits with Arm-specific Fairness Guarantees

Vishakha Patil, Ganesh Ghalme, Vineet Nair, Y. Narahari

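TL;DR: Studies the interplay between learning and fairness in the fair multi-armed bandit problem. Fairness constraints are expressed through a specified vector, a fairness-aware notion of regret is defined, a class of Fair-SMAB algorithms is characterized by two parameters, and a fairness guarantee is given that holds uniformly over time regardless of which learning algorithm is chosen.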
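To make the fairness constraint concrete, the following is a minimal sketch of one way a per-arm quota could be enforced on top of an arbitrary learning rule. It is an illustration under stated assumptions, not the paper's algorithm as written: the names `fair_pull_schedule`, `fractions`, `tolerance`, and `choose_arm` are hypothetical, and reading the summary's "two parameters" as a shortfall tolerance plus a pluggable learning rule is an assumption.

```python
import random


def fair_pull_schedule(fractions, horizon, tolerance, choose_arm, seed=0):
    """Return per-arm pull counts after `horizon` rounds.

    fractions[i]  -- assumed minimum fraction of rounds owed to arm i
    tolerance     -- assumed allowed shortfall (in pulls) before fairness overrides
    choose_arm    -- any learning rule: (round, counts, rng) -> arm index
    """
    rng = random.Random(seed)
    counts = [0] * len(fractions)
    for t in range(1, horizon + 1):
        # Shortfall of each arm against its quota over the first t - 1 rounds.
        deficits = [r * (t - 1) - n for r, n in zip(fractions, counts)]
        worst = max(range(len(fractions)), key=lambda i: deficits[i])
        if deficits[worst] > tolerance:
            # Fairness step: serve the arm that is furthest behind its quota.
            arm = worst
        else:
            # Learning step: defer to the plugged-in rule.
            arm = choose_arm(t, counts, rng)
        counts[arm] += 1
    return counts


def always_arm_zero(t, counts, rng):
    """Placeholder learning rule that never explores; stands in for any bandit rule."""
    return 0


counts = fair_pull_schedule([0.0, 0.2, 0.3], 1000, 1.0, always_arm_zero)
print(counts)
```

Even with a learning rule that always prefers arm 0, the quota check forces the other two arms to receive close to their required share, which illustrates how a fairness guarantee can be independent of the learning rule that is plugged in.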

Abstract

We study an interesting variant of the stochastic multi-armed bandit problem in which each arm is required to be pulled for at least a given fraction of the total available rounds. We investigate the interplay between l