This paper introduces a federated learning framework tailored for online combinatorial optimization with bandit feedback. In this setting, agents select subsets of arms, observe noisy rewards for these subsets without accessing individual arm information, and can cooperate and share information at specific intervals. Our framework transforms any offline resilient single-agent $(\alpha-\epsilon)$-approximation algorithm, having a complexity of $\tilde{\mathcal{O}}(\frac{\psi}{\epsilon^\beta})$, where the logarithm is omitted, for some function $\psi$ and constant $\beta$, into an online multi-agent algorithm with $m$ communicating agents and an $\alpha$-regret of no more than $\tilde{\mathcal{O}}(m^{-\frac{1}{3+\beta}} \psi^\frac{1}{3+\beta} T^\frac{2+\beta}{3+\beta})$. This approach not only eliminates the $\epsilon$ approximation error but also ensures sublinear growth with respect to the time horizon $T$ and demonstrates a linear speedup with an increasing number of communicating agents. Additionally, the algorithm is notably communication-efficient, requiring only a sublinear number of communication rounds, quantified as $\tilde{\mathcal{O}}\left(\psi T^\frac{\beta}{\beta+1}\right)$. Furthermore, the framework has been successfully applied to online stochastic submodular maximization using various offline algorithms, yielding the first results for both single-agent and multi-agent settings and recovering specialized single-agent theoretical guarantees. We empirically validate our approach to a stochastic data summarization problem, illustrating the effectiveness of the proposed framework, even in single-agent scenarios.

该论文介绍了一个用于在线组合优化和有限带反馈的联邦学习框架，该框架将任何具有复杂度为O(psi/epsilon^beta)（其中省略了对数计算，psi是一个函数，beta是常数）的离线单代理（alpha-epsilon）逼近算法转化为具有m个通信代理和alpha遗憾度的在线多代理算法，并保证了与时间跨度T的次线性增长，且随着通信代理数量的增加而线性加速。此外，该算法还具有高效的通信特性，只需要亚线性数量的通信轮次，通过将该框架成功应用于在线随机子模块最大化，并实现了第一个单代理和多代理设置的结果，以及恢复了专门的单代理理论保证。我们还通过对随机数据摘要问题的实证验证来展示所提出的框架的有效性，即使在单代理场景中也是如此。

联邦组合多智能体多臂赌博机