This paper studies federated linear contextual bandits under the notion of user-level differential privacy (DP). We first introduce a unified federated bandits framework that can accommodate various definitions of DP in the sequential decision-making setting. We then formally introduce user-level central DP (CDP) and local DP (LDP) in this framework, and investigate the fundamental trade-offs between the learning regret and the corresponding DP guarantees in a federated linear contextual bandits model. For CDP, we propose a federated algorithm termed \robin and show that it is near-optimal in terms of the number of clients $M$ and the privacy budget $\varepsilon$ by deriving nearly matching upper and lower regret bounds when user-level DP is satisfied. For LDP, we obtain several lower bounds, indicating that learning under user-level $(\varepsilon,\delta)$-LDP must suffer a regret blow-up factor of at least $\min\{1/\varepsilon,M\}$ or $\min\{1/\sqrt{\varepsilon},\sqrt{M}\}$ under different conditions.

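Summary: This paper studies federated linear contextual bandits under user-level differential privacy. It introduces user-level central DP (CDP) and local DP (LDP), and studies the fundamental trade-offs between the learning regret and the corresponding DP guarantees. For CDP, it proposes a federated algorithm, Robin, and proves that it is near-optimal when user-level DP is satisfied. For LDP, it obtains several lower bounds, indicating that learning under user-level $(\varepsilon,\delta)$-LDP must suffer a regret blow-up factor of at least $\min\{1/\varepsilon,M\}$ or $\min\{1/\sqrt{\varepsilon},\sqrt{M}\}$ under different conditions.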

Federated Linear Contextual Bandits with User-Level Differential Privacy