This paper presents a novel federated linear contextual bandits model, where individual clients face different $K$-armed stochastic bandits coupled through common global parameters. By leveraging the geometric structure of the linear rewards, a collaborative algorithm called Fed-PE is proposed to cope with the heterogeneity across clients without exchanging local feature vectors or raw data. Fed-PE relies on a novel multi-client G-optimal design, and achieves near-optimal regrets for both disjoint and shared parameter cases with logarithmic communication costs. In addition, a new concept called collinearly-dependent policies is introduced, based on which a tight minimax regret lower bound for the disjoint parameter case is derived. Experiments demonstrate the effectiveness of the proposed algorithms on both synthetic and real-world datasets.

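This paper proposes a new federated linear contextual bandits model, together with a collaborative algorithm, Fed-PE, that handles heterogeneity across clients. Fed-PE builds on a novel multi-client G-optimal design and achieves near-optimal regret in both the disjoint and shared parameter cases with logarithmic communication cost. The paper also introduces a new concept, collinearly-dependent policies, and uses it to derive a tight minimax regret lower bound for the disjoint parameter case. Experiments on synthetic and real-world datasets demonstrate the effectiveness of the proposed algorithms.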