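TL;DR: This paper studies privacy-preserving learning algorithms for the contextual linear bandit problem. It adopts the definition of joint differential privacy to convert the classic linear UCB algorithm into a jointly differentially private algorithm, using either Gaussian noise or Wishart noise, and bounds the regret of the resulting algorithms. It also gives the first lower bound on the additional regret that any private algorithm for the MAB problem must incur.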
Abstract
We study the contextual linear bandit problem, a version of the standard
stochastic multi-armed bandit (MAB) problem where a learner sequentially
selects actions to maximize a reward which depends also on a user-provided
per-round context. Though the context is chosen arbitrarily or ad