This paper studies distributed online learning under Byzantine attacks. The
performance of an online learning algorithm is often characterized by
(adversarial) regret, which evaluates the quality of one-step-ahead
decision-making when an environment provides adversarial losses, and a
sublinear bound is preferred. However, we prove that, even with a class of
state-of-the-art robust aggregation rules, in an adversarial environment and in
the presence of Byzantine participants, distributed online gradient descent can
only achieve a linear adversarial regret bound, which is tight. This is the
inevitable consequence of Byzantine attacks, even though we can control the
constant of the linear adversarial regret to a reasonable level. Interestingly,
when the environment is not fully adversarial so that the losses of the honest
participants are i.i.d. (independent and identically distributed), we show that
sublinear stochastic regret, in contrast to the aforementioned adversarial
regret, is possible. We develop a Byzantine-robust distributed online momentum
algorithm to attain such a sublinear stochastic regret bound. Extensive
numerical experiments corroborate our theoretical analysis.
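
To make the setting above concrete, here is a minimal sketch of distributed online gradient descent in which a server combines the participants' gradients with a robust aggregation rule. It is not the paper's algorithm: the coordinate-wise median is only one common robust aggregator, chosen here as an assumption, and the quadratic loss, the constant Byzantine message, the step-size schedule, and all function names are hypothetical. Honest losses are i.i.d., which corresponds to the stochastic case the abstract describes rather than the fully adversarial one.

```python
import math
import random
from statistics import median


def coordinate_median(vectors):
    """Coordinate-wise median of equal-length vectors (one robust aggregator)."""
    return [median(column) for column in zip(*vectors)]


def coordinate_mean(vectors):
    """Plain averaging, shown only as a non-robust baseline."""
    return [sum(column) / len(column) for column in zip(*vectors)]


def distributed_ogd(aggregate, rounds=500, num_honest=7, num_byzantine=3, seed=0):
    """Online gradient descent with server-side aggregation (illustrative).

    Each honest participant draws an i.i.d. sample z around `optimum` and
    reports the gradient of 0.5 * ||model - z||^2, which is model - z.
    Each Byzantine participant reports the same large vector every round
    (an assumed, very simple attack).
    """
    rng = random.Random(seed)
    optimum = [1.0, -2.0]
    model = [0.0, 0.0]
    for t in range(1, rounds + 1):
        messages = []
        for _ in range(num_honest):
            sample = [rng.gauss(mu, 0.1) for mu in optimum]
            messages.append([m - s for m, s in zip(model, sample)])
        for _ in range(num_byzantine):
            messages.append([1000.0] * len(model))
        direction = aggregate(messages)
        step = 0.5 / math.sqrt(t)  # decaying step size, an assumed schedule
        model = [m - step * d for m, d in zip(model, direction)]
    return model


if __name__ == "__main__":
    print("median aggregation:", distributed_ogd(coordinate_median))
    print("mean aggregation:  ", distributed_ogd(coordinate_mean))
```

With the median, the model stays close to the honest optimum, although the Byzantine messages still shift it slightly. With plain averaging, the same attack drags the model far away.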

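This work studies distributed online learning under Byzantine attacks. We prove that, even with a class of state-of-the-art robust aggregation rules, distributed online gradient descent can only achieve a linear adversarial regret bound in an adversarial environment with Byzantine participants, although the constant of the linear adversarial regret can be controlled to a reasonable level. Interestingly, when the environment is not fully adversarial, we show that sublinear stochastic regret is possible, and we attain it with a Byzantine-robust distributed online momentum algorithm.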

Byzantine-Robust Distributed Online Learning: Taming Adversaries in a Malicious Environment

Byzantine-Robust Distributed Online Learning: Taming Adversarial Participants in An Adversarial Environment

We establish a connection between the stability of mirror descent and the
information ratio of Russo and Van Roy [2014]. Our analysis shows that mirror
descent with suitable loss estimators and exploratory distributions enjoys the
same bound on the adversarial regret as the bounds on the Bayesian regret for
information-directed sampling. Along the way, we develop the theory for
information-directed sampling and provide an efficient algorithm for
adversarial bandits for which the regret upper bound matches exactly the best
known information-theoretic upper bound.
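
As a point of reference for the mirror descent setup described above, the classical Exp3 algorithm can be read as mirror descent on the probability simplex with the negative-entropy regularizer and importance-weighted loss estimators. The sketch below is that textbook instance, not the algorithm proposed in the paper. The function name `exp3`, the `loss_fn(t, arm)` callback, and the learning rate used in the example are illustrative assumptions.

```python
import math
import random


def exp3(loss_fn, num_arms, rounds, learning_rate, seed=0):
    """Exp3 for adversarial bandits with losses in [0, 1] (illustrative).

    With the negative-entropy regularizer, the mirror descent update has a
    closed form: play arm i with probability proportional to
    exp(-learning_rate * cumulative_estimated_loss[i]).
    """
    rng = random.Random(seed)
    cumulative_estimates = [0.0] * num_arms
    total_loss = 0.0
    probs = [1.0 / num_arms] * num_arms
    for t in range(rounds):
        # Subtract the minimum before exponentiating for numerical stability.
        lowest = min(cumulative_estimates)
        weights = [math.exp(-learning_rate * (c - lowest))
                   for c in cumulative_estimates]
        normalizer = sum(weights)
        probs = [w / normalizer for w in weights]

        # Sample an arm and observe only that arm's loss (bandit feedback).
        arm = rng.choices(range(num_arms), weights=probs)[0]
        loss = loss_fn(t, arm)
        total_loss += loss

        # Importance-weighted estimator: unbiased for the full loss vector.
        cumulative_estimates[arm] += loss / probs[arm]
    return total_loss, probs


if __name__ == "__main__":
    # Toy environment (an assumption): arm 2 always has zero loss.
    total, final_probs = exp3(lambda t, arm: 0.0 if arm == 2 else 1.0,
                              num_arms=3, rounds=2000, learning_rate=0.02)
    print(total, final_probs)
```

The loss estimator and the sampling distribution are the two design choices the abstract refers to. Here they are the simplest standard ones: plain importance weighting and no extra exploration.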

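This paper studies the relationship between mirror descent and the information ratio. It shows that mirror descent with suitable loss estimators and exploratory distributions enjoys the same bound on the adversarial regret as the Bayesian regret bound for information-directed sampling. The paper also provides an efficient algorithm for adversarial bandits whose regret upper bound exactly matches the best known information-theoretic upper bound.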

Mirror Descent and the Information Ratio

Mirror Descent and the Information Ratio

We revisit the problem of solving two-player zero-sum games in the
decentralized setting. We propose a simple algorithmic framework that
simultaneously achieves the best rates for honest regret as well as adversarial
regret, and in addition resolves the open problem of removing the logarithmic
terms in convergence to the value of the game. We achieve this goal in three
steps. First, we provide a novel analysis of optimistic mirror descent
(OMD), showing that it can be modified to guarantee fast convergence for both
honest regret and the value of the game when the players play
collaboratively. Second, we propose a new algorithm, dubbed robust
optimistic mirror descent (ROMD), which attains optimal adversarial regret
without knowing the time horizon beforehand. Finally, we propose a simple
signaling scheme, which enables us to bridge OMD and ROMD to achieve the best
of both worlds. Numerical examples are presented to support our theoretical
claims and show that our non-adaptive ROMD algorithm can be competitive with OMD
with adaptive step-size selection.
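
For readers unfamiliar with optimistic mirror descent, the following sketch shows its standard entropy-regularized form (optimistic multiplicative weights) when both players of a zero-sum matrix game run it against each other. It is neither the modified OMD nor the ROMD algorithm of the paper, and it omits the signaling scheme. The 2x2 payoff matrix, the step size, and all function names are assumptions chosen for illustration.

```python
import math


def matvec(matrix, vector):
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def multiplicative_step(strategy, direction, step_size):
    """Entropic mirror step: reweight by exp(step_size * direction), renormalize."""
    weights = [s * math.exp(step_size * d) for s, d in zip(strategy, direction)]
    total = sum(weights)
    return [w / total for w in weights]


def optimistic_mwu(payoff, rounds, step_size):
    """Row player minimizes x^T A y, column player maximizes it.

    Each player moves along 2 * g_t - g_{t-1}: the current gradient plus a
    prediction that the next gradient will resemble the current one.
    """
    payoff_t = transpose(payoff)
    x = [1.0 / len(payoff)] * len(payoff)
    y = [1.0 / len(payoff_t)] * len(payoff_t)
    prev_gx, prev_gy = matvec(payoff, y), matvec(payoff_t, x)
    sum_x, sum_y = [0.0] * len(x), [0.0] * len(y)
    for _ in range(rounds):
        sum_x = [s + v for s, v in zip(sum_x, x)]
        sum_y = [s + v for s, v in zip(sum_y, y)]
        gx, gy = matvec(payoff, y), matvec(payoff_t, x)
        x = multiplicative_step(
            x, [-(2 * g - p) for g, p in zip(gx, prev_gx)], step_size)
        y = multiplicative_step(
            y, [2 * g - p for g, p in zip(gy, prev_gy)], step_size)
        prev_gx, prev_gy = gx, gy
    avg_x = [s / rounds for s in sum_x]
    avg_y = [s / rounds for s in sum_y]
    return x, y, avg_x, avg_y


def duality_gap(payoff, x, y):
    """Best-response gain of the maximizer minus that of the minimizer."""
    return max(matvec(transpose(payoff), x)) - min(matvec(payoff, y))


if __name__ == "__main__":
    game = [[2.0, -1.0], [-1.0, 1.0]]  # equilibrium: both players mix (0.4, 0.6)
    x, y, avg_x, avg_y = optimistic_mwu(game, rounds=2000, step_size=0.1)
    print(x, y, duality_gap(game, avg_x, avg_y))
```

The step size here is fixed in advance. The abstract's comparison with adaptive step-size selection concerns a choice this sketch does not explore.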

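This paper proposes an algorithmic framework for two-player zero-sum games in the decentralized setting. It achieves the best rates for both honest regret and adversarial regret, resolves the open problem of removing the logarithmic terms in convergence to the value of the game, and attains the best of both worlds by bridging optimistic mirror descent (OMD) and robust optimistic mirror descent (ROMD) through a signaling scheme.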