We tackle the communication efficiency challenge of learning kernelized contextual bandits in a distributed setting. Despite recent advances in communication-efficient distributed bandit learning, existing solutions are restricted to simple models such as multi-armed bandits and linear bandits, which hampers their practical utility. In this paper, instead of assuming a linear mapping from the features to the expected rewards, we consider non-linear reward mappings by letting agents collaboratively search in a reproducing kernel Hilbert space (RKHS). This introduces significant challenges in communication efficiency, as distributed kernel learning requires the transfer of raw data, leading to a communication cost that grows linearly w.r.t. the time horizon $T$. We address this issue by having all agents communicate via a common Nystr\"{o}m embedding that is updated adaptively as more data points are collected. We rigorously prove that our algorithm attains sub-linear rates in both regret and communication cost.
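To illustrate why a shared Nystr\"{o}m embedding can replace the transfer of raw data, the following is a minimal sketch and not the algorithm analyzed in the paper. It assumes an RBF kernel, a fixed landmark set agreed on by all agents (the paper updates the embedding adaptively), and ridge-regression-style sufficient statistics; the function names `rbf_kernel`, `nystrom_embedding`, and `local_statistics` are hypothetical. Each agent maps its contexts into an $m$-dimensional space and would only need to share an $m \times m$ matrix and an $m$-vector, whose sizes do not depend on how many data points the agent has collected.

```python
import numpy as np


def rbf_kernel(X, Y, gamma=1.0):
    """RBF kernel matrix k(x, y) = exp(-gamma * ||x - y||^2) (assumed kernel)."""
    sq = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def nystrom_embedding(X, landmarks, gamma=1.0, jitter=1e-10):
    """Map rows of X to m-dimensional features z(x) = K_mm^{-1/2} k_m(x).

    `landmarks` is the shared dictionary of m points (an assumption of this
    sketch: every agent holds the same landmark set).
    """
    K_mm = rbf_kernel(landmarks, landmarks, gamma)
    K_nm = rbf_kernel(X, landmarks, gamma)
    # Symmetric inverse square root via eigendecomposition; eigenvalues are
    # clamped from below for numerical stability.
    w, V = np.linalg.eigh(K_mm)
    w = np.maximum(w, jitter)
    inv_sqrt = V @ np.diag(w ** -0.5) @ V.T
    return K_nm @ inv_sqrt


def local_statistics(Z, y):
    """Sufficient statistics an agent could upload instead of raw data."""
    return Z.T @ Z, Z.T @ y
```

Because the statistics are sums over data points, a server could aggregate them across agents by simple addition, which gives the same result as computing them on the pooled data under the same embedding.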

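This paper addresses the communication efficiency challenge of learning kernelized contextual bandits in a distributed setting. It proposes a solution based on a Nyström embedding that learns non-linear reward mappings while keeping communication efficient. Rigorous proofs show that the algorithm attains sub-linear rates in both regret and communication cost.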

Communication-Efficient Distributed Learning for Kernelized Contextual Bandits