As federated learning is increasingly adopted for learning from sensitive data local to user devices, it is natural to ask whether learning can rely on implicit signals generated as users interact with the applications of interest, rather than on explicit labels, which are difficult to acquire in many tasks. We approach such problems with the framework of federated contextual bandits, and develop variants of prominent contextual bandit algorithms from the centralized setting for the federated setting. We carefully evaluate these algorithms in a range of scenarios simulated using publicly available datasets. Our simulations model typical setups encountered in the real world, such as various misalignments between an initial pre-trained model and the subsequent user interactions due to non-stationarity in the data and/or heterogeneity across clients. Our experiments reveal the surprising effectiveness of the simple and commonly used softmax heuristic in balancing the well-known exploration-exploitation tradeoff across the breadth of our settings.
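To make the softmax heuristic concrete, the following is a minimal sketch of softmax exploration for a single decision, not the implementation evaluated in this work. It assumes the model outputs one score per action; the names `softmax_probs`, `choose_action`, and the `temperature` parameter are hypothetical and introduced only for illustration.

```python
import math
import random


def softmax_probs(scores, temperature=1.0):
    """Convert per-action scores into sampling probabilities.

    `temperature` is an assumed knob: smaller values concentrate
    probability on the top-scoring action (more exploitation).
    """
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def choose_action(scores, rng, temperature=1.0):
    """Sample an action and return it with its sampling probability."""
    probs = softmax_probs(scores, temperature)
    action = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return action, probs[action]


rng = random.Random(0)
scores = [2.0, 1.0, 0.1]  # hypothetical model scores for three actions
probs = softmax_probs(scores)
action, prob = choose_action(scores, rng)
```

Higher-scoring actions are chosen more often, yet every action keeps a nonzero chance of being tried, which is how this heuristic trades off exploitation against exploration.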

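In federated learning, we study learning from implicit signals generated as users interact with the applications of interest, rather than requiring access to explicit labels that are difficult to obtain. We adopt the framework of federated contextual bandits to develop variants of prominent contextual bandit algorithms from the centralized setting, and carefully evaluate these algorithms across a range of scenarios built on publicly available datasets. Our experiments show that the simple and commonly used softmax heuristic can balance the well-known exploration-exploitation tradeoff across many settings.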

An Empirical Evaluation of Federated Multi-Armed Bandit Algorithms