To enhance the efficiency and practicality of federated bandit learning,
recent advances have introduced incentives to motivate communication among
clients, where a client participates only when the incentive offered by the
server outweighs its participation cost. However, existing incentive mechanisms
naively assume that clients are truthful, i.e., that they all report their true
costs, so the higher the cost a participating client claims, the more the
server has to pay. Such mechanisms are therefore vulnerable to strategic
clients who misreport to improve their own utility. To address this issue, we
propose an incentive-compatible (i.e., truthful) communication protocol, named
Truth-FedBan, where the incentive for each participant is independent of its
self-reported cost, and reporting the true cost is the only way to achieve the
best utility. More importantly, Truth-FedBan still guarantees sub-linear
regret and communication cost without any overhead. In other words, the core
conceptual contribution of this paper is to demonstrate, for the first time,
that incentive compatibility and nearly optimal regret can be achieved
simultaneously in federated bandit learning. Extensive numerical studies
further validate the effectiveness of our proposed solution.
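
The property that a participant's payment does not depend on its own reported cost can be illustrated with a textbook reverse auction. The following is a minimal sketch, not the Truth-FedBan mechanism itself: it assumes the server wants exactly k participants and pays each selected client the (k+1)-th lowest reported cost. The function names `kth_price_procurement` and `utility`, and the fixed-k selection rule, are hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple


def kth_price_procurement(
    reported_costs: Dict[str, float], k: int
) -> Tuple[List[str], float]:
    """Select the k cheapest clients and pay each the (k+1)-th lowest report.

    Illustrative assumption: the server needs exactly k participants.
    A winner's payment is set by a losing client's report, never by its own.
    """
    if not 0 < k < len(reported_costs):
        raise ValueError("k must satisfy 0 < k < number of clients")
    # Sort by reported cost; break ties by client name for determinism.
    ranked = sorted(reported_costs, key=lambda c: (reported_costs[c], c))
    winners = ranked[:k]
    payment = reported_costs[ranked[k]]
    return winners, payment


def utility(
    client: str, true_cost: float, reported_costs: Dict[str, float], k: int
) -> float:
    """Utility of `client`: payment minus true cost if selected, else zero."""
    winners, payment = kth_price_procurement(reported_costs, k)
    return payment - true_cost if client in winners else 0.0
```

In this sketch, over-reporting can only cause a client to lose a profitable selection, and under-reporting can only win a selection that pays less than the true cost. Truthful reporting is therefore a best response here, although not necessarily the unique one, which is weaker than the property stated for Truth-FedBan.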

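By proposing Truth-FedBan, an incentive-compatible (i.e., truthful) communication protocol, this paper demonstrates for the first time that incentive compatibility and nearly optimal regret can be achieved simultaneously in federated bandit learning. Extensive numerical studies further validate the effectiveness of the proposed solution.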

Incentivizing Truthful Communication in Federated Bandits

Incentivized Truthful Communication for Federated Bandits

Motivated by practical needs such as large-scale learning, we study the
impact of adaptivity constraints on linear contextual bandits, a central
problem in online active learning. We consider two popular limited adaptivity
models in the literature: batch learning and rare policy switches. We show that,
when the context vectors are adversarially chosen in $d$-dimensional linear
contextual bandits, the learner needs $O(d \log d \log T)$ policy switches to
achieve the minimax-optimal regret, and this is optimal up to
$\mathrm{poly}(\log d, \log \log T)$ factors; for stochastic context vectors,
even in the more restricted batch learning model, only $O(\log \log T)$ batches
are needed to achieve the optimal regret. Together with the known results in
the literature, our results present a complete picture of the adaptivity
constraints in linear contextual bandits. Along the way, we propose the
distributional optimal design, a natural extension of the optimal experiment
design, and provide a learning algorithm for the problem that is both
statistically and computationally efficient, which may be of independent interest.
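
To give a feel for how a doubly logarithmic number of batches can cover a horizon of length $T$, the sketch below builds the grid $t_i = \lceil T^{1 - 2^{-i}} \rceil$ that is commonly used in the batched bandit literature. Whether the paper uses exactly this schedule is an assumption; the function name `loglog_batch_grid` is hypothetical.

```python
import math
from typing import List


def loglog_batch_grid(T: int) -> List[int]:
    """Return batch end points t_i = ceil(T ** (1 - 2 ** -i)), ending at T.

    Illustrative assumption: this geometric-in-the-exponent grid is a common
    choice for batched bandits, not necessarily the paper's schedule.
    """
    if T < 2:
        raise ValueError("T must be at least 2")
    # After M = ceil(log2(log2(T))) points the grid is within a factor 2 of T.
    M = max(1, math.ceil(math.log2(math.log2(T))))
    ends: List[int] = []
    for i in range(1, M + 1):
        t = min(T, math.ceil(T ** (1.0 - 2.0 ** (-i))))
        if not ends or t > ends[-1]:  # keep end points strictly increasing
            ends.append(t)
    if ends[-1] < T:
        ends.append(T)  # the final batch runs to the end of the horizon
    return ends
```

Each batch end point is roughly the square root of $T$ times the square root of the previous one, so the number of batches grows like $\log \log T$ rather than $\log T$.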

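This work studies linear contextual bandits under limited adaptivity models and the optimal regret. It finds that with stochastic contexts only $O(\log \log T)$ batches are needed in the batch learning model, whereas with adversarial contexts $O(d \log d \log T)$ policy switches are needed under the policy-switch constraint to achieve the optimal regret.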

Linear Bandits with Limited Adaptivity and Learned Distributional Optimal Design

Linear Bandits with Limited Adaptivity and Learning Distributional Optimal Design

In this paper, we study the behavior of the Hedge algorithm in the online
stochastic setting. We prove that anytime Hedge with decreasing learning rate,
which is one of the simplest algorithms for the problem of prediction with
expert advice, is surprisingly both worst-case optimal and adaptive to the
easier stochastic and adversarial-with-a-gap problems. This shows that, in
spite of its small, non-adaptive learning rate, Hedge possesses the same
optimal regret guarantee in the stochastic case as recently introduced adaptive
algorithms. Moreover, our analysis exhibits qualitative differences from other
variants of the Hedge algorithm, such as the fixed-horizon version (with
constant learning rate) and the one based on the so-called "doubling trick",
both of which fail to adapt to the easier stochastic setting. Finally, we
discuss the limitations of anytime Hedge and the improvements provided by
second-order regret bounds in the stochastic case.
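
The anytime variant discussed above can be sketched as exponential weights with a learning rate of the form $\eta_t = c \sqrt{\log K / t}$, where $K$ is the number of experts. This is a minimal sketch under stated assumptions: the constant `c = 2.0`, losses in $[0, 1]$, and the names `decreasing_hedge` and `bernoulli_losses` are illustrative choices rather than details taken from the paper.

```python
import math
import random
from typing import List, Tuple


def decreasing_hedge(loss_rounds: List[List[float]], c: float = 2.0) -> Tuple[float, float]:
    """Run Hedge with eta_t = c * sqrt(log(K) / t) on losses in [0, 1].

    Returns (cumulative expected loss of the learner, loss of the best expert).
    The constant c is an illustrative assumption.
    """
    if not loss_rounds:
        raise ValueError("need at least one round of losses")
    K = len(loss_rounds[0])
    cum = [0.0] * K  # cumulative loss of each expert so far
    learner_loss = 0.0
    for t, losses in enumerate(loss_rounds, start=1):
        eta = c * math.sqrt(math.log(K) / t)
        # Subtract the minimum before exponentiating for numerical stability.
        base = min(cum)
        weights = [math.exp(-eta * (L - base)) for L in cum]
        total = sum(weights)
        probs = [w / total for w in weights]
        # The learner suffers the expected loss under its current distribution.
        learner_loss += sum(p * l for p, l in zip(probs, losses))
        for i, l in enumerate(losses):
            cum[i] += l
    return learner_loss, min(cum)


def bernoulli_losses(means: List[float], T: int, seed: int = 0) -> List[List[float]]:
    """Hypothetical helper: i.i.d. Bernoulli losses, one row per round."""
    rng = random.Random(seed)
    return [[1.0 if rng.random() < m else 0.0 for m in means] for _ in range(T)]
```

Calling `decreasing_hedge(bernoulli_losses([0.2, 0.5, 0.5], 2000))` runs the sketch on a stochastic instance with a gap between the best expert and the others; the difference between the two returned values is the regret. The fixed-horizon and doubling-trick variants mentioned in the abstract are not shown.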

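This work studies the behavior of the Hedge algorithm in the online stochastic setting. It proves that the anytime version with decreasing learning rate adapts to both the easier stochastic and adversarial-with-a-gap problems, and that it differs qualitatively from other variants of the algorithm. Finally, it discusses the limitations of the algorithm and the improvements provided by second-order regret bounds in the stochastic case.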