This work considers the problem of decentralized online learning, where the
goal is to track the optimum of the sum of time-varying functions, distributed
across several nodes in a network. The local availability of the functions and
their gradients necessitates coordination and consensus among the nodes. We put
forth the Generalized Gradient Tracking (GGT) framework that unifies a number
of existing approaches, including the state-of-the-art ones. The performance of
the proposed GGT algorithm is theoretically analyzed using a novel semidefinite
programming-based analysis that yields the desired regret bounds under very
general conditions and without requiring the gradient boundedness assumption.
The results are applicable to the special cases of GGT, which include various
state-of-the-art algorithms as well as new dynamic versions of various
classical decentralized algorithms. To further minimize the regret, we consider
a condensed version of GGT with only four free parameters. A procedure for
offline tuning of these parameters using only the problem parameters is also
detailed. The resulting optimized GGT (oGGT) algorithm not only achieves
improved dynamic regret bounds, but also outperforms all state-of-the-art
algorithms on both synthetic and real-world datasets.
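
The coordination the abstract describes can be made concrete with a minimal NumPy sketch of a classical gradient-tracking recursion, the kind of update that GGT is described as generalizing. It is not the GGT or oGGT update, whose parameterization the abstract does not give. Each node keeps an iterate x_i and a tracker y_i of the network-average gradient. It mixes both with its neighbours through a doubly stochastic matrix W and corrects y_i with the change in its own local gradient. The ring topology, the hypothetical quadratic losses f_{i,t}(x) = 0.5 ||x - c_{i,t}||^2 with a shared sinusoidal drift, the step size alpha = 0.2, and the function names `ring_mixing_matrix` and `gradient_tracking` are all illustrative assumptions.

```python
import numpy as np


def ring_mixing_matrix(n: int) -> np.ndarray:
    """Symmetric, doubly stochastic mixing matrix for a ring of n >= 3 nodes:
    each node averages itself and its two neighbours with weight 1/3."""
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] = 1.0 / 3.0
    return W


def gradient_tracking(W, base, alpha=0.2, T=500, omega=0.01):
    """Online gradient tracking on hypothetical losses
    f_{i,t}(x) = 0.5 * ||x - (base[i] + sin(omega * t))||^2.

    Returns the final iterates (one row per node) and, for each step, the
    largest distance from any node's iterate to the current minimizer of
    sum_i f_{i,t}.
    """
    n, d = base.shape

    def local_grads(x, t):
        # Row i is the gradient of f_{i,t} evaluated at node i's iterate x[i].
        return x - (base + np.sin(omega * t))

    x = np.zeros((n, d))
    g = local_grads(x, 0)
    y = g.copy()  # trackers start at the local gradients
    errors = []
    for t in range(T):
        # Consensus step on the iterates, then descend along the tracked gradient.
        x_new = W @ x - alpha * y
        g_new = local_grads(x_new, t + 1)
        # Mix the trackers and add the change in local gradients, so the
        # network average of y always equals the average of the latest gradients.
        y = W @ y + g_new - g
        x, g = x_new, g_new
        # Minimizer of sum_i f_{i,t+1}: the average of the local centres.
        x_star = base.mean(axis=0) + np.sin(omega * (t + 1))
        errors.append(np.linalg.norm(x - x_star, axis=1).max())
    return x, np.array(errors)


if __name__ == "__main__":
    W = ring_mixing_matrix(5)
    base = np.arange(10, dtype=float).reshape(5, 2)
    x, errors = gradient_tracking(W, base)
    print(f"tracking error after step 1: {errors[0]:.3f}, "
          f"after step {len(errors)}: {errors[-1]:.4f}")
```

The returned `errors` array is a per-step tracking error against the moving optimum, closely related to what a dynamic-regret bound controls in aggregate. In this toy setting, the error stays small but nonzero because the iterates lag behind the drifting minimizer.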

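This paper proposes a decentralized online learning algorithm based on the Generalized Gradient Tracking (GGT) framework, analyzes and optimizes its performance theoretically using a novel semidefinite programming-based analysis, and thereby achieves excellent performance on real-world datasets.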

Optimized Gradient Tracking for Decentralized Online Learning

We introduce data-driven decision-making algorithms that achieve
state-of-the-art \emph{dynamic regret} bounds for non-stationary bandit
settings. These settings capture applications such as advertisement allocation,
dynamic pricing, and traffic network routing in changing environments. We show
how the difficulty posed by the (unknown \emph{a priori} and possibly
adversarial) non-stationarity can be overcome by an unconventional marriage
between stochastic and adversarial bandit learning algorithms. Our main
contribution is a general algorithmic recipe for a wide variety of
non-stationary bandit problems. Specifically, we design and analyze the sliding-window
upper confidence bound (SW-UCB) algorithm that achieves the optimal dynamic
regret bound for each of the settings when we know the respective underlying
\emph{variation budget}, which quantifies the total amount of temporal
variation of the latent environments. Boosted by the novel bandit-over-bandit
framework that adapts to the latent changes, we can further enjoy the (nearly)
optimal dynamic regret bounds in a (surprisingly) parameter-free manner. In
addition to the classical exploration-exploitation trade-off, our algorithms
leverage the power of the "forgetting principle" in the learning processes,
which is vital in changing environments. Our extensive numerical experiments on
both synthetic and real-world online auto-loan datasets show that our proposed
algorithms achieve superior empirical performance compared to existing
algorithms.
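
The sliding-window idea, which is how the "forgetting principle" enters these algorithms, can be sketched for the simplest setting: a K-armed bandit with rewards in [0, 1]. This is a minimal Python illustration under stated assumptions, not the authors' algorithm. The abstract's SW-UCB covers more general settings and needs the variation budget to reach its optimal bound, and the bandit-over-bandit layer is not shown. The window length `window`, the bonus constant `bonus_scale`, the class name `SlidingWindowUCB`, and the two-armed Bernoulli environment in `simulate` are hypothetical choices. At each round, only the most recent `window` observations enter the empirical means and confidence bonuses, so evidence from before a change expires automatically.

```python
import math
import random
from collections import deque


class SlidingWindowUCB:
    """Hypothetical K-armed UCB whose statistics use only the last `window` rounds."""

    def __init__(self, n_arms: int, window: int, bonus_scale: float = 1.0):
        self.n_arms = n_arms
        self.window = window
        self.bonus_scale = bonus_scale
        # (arm, reward) pairs; the deque drops the oldest pair once `window` is exceeded.
        self.history = deque(maxlen=window)
        self.t = 0  # total rounds played so far

    def select_arm(self) -> int:
        counts = [0] * self.n_arms
        sums = [0.0] * self.n_arms
        for arm, reward in self.history:
            counts[arm] += 1
            sums[arm] += reward
        # An arm with no observations inside the window is played immediately.
        for arm in range(self.n_arms):
            if counts[arm] == 0:
                return arm
        log_term = math.log(min(self.t, self.window))
        # Windowed empirical mean plus a bonus that shrinks with the windowed count.
        indices = [
            sums[a] / counts[a] + self.bonus_scale * math.sqrt(log_term / counts[a])
            for a in range(self.n_arms)
        ]
        return max(range(self.n_arms), key=indices.__getitem__)

    def update(self, arm: int, reward: float) -> None:
        self.history.append((arm, reward))
        self.t += 1


def simulate(T=4000, switch=2000, window=200, seed=0):
    """Two Bernoulli arms whose means (0.8, 0.2) swap at round `switch`."""
    rng = random.Random(seed)
    agent = SlidingWindowUCB(n_arms=2, window=window)
    choices = []
    for t in range(T):
        means = (0.8, 0.2) if t < switch else (0.2, 0.8)
        arm = agent.select_arm()
        reward = 1.0 if rng.random() < means[arm] else 0.0
        agent.update(arm, reward)
        choices.append(arm)
    return choices


if __name__ == "__main__":
    choices = simulate()
    print("share of arm 1 before the switch:", sum(choices[1000:2000]) / 1000)
    print("share of arm 1 well after the switch:", sum(choices[3000:]) / 1000)
```

Rebuilding the counts from the deque each round costs O(window) time per decision. That keeps the forgetting logic easy to read, though a longer-running version would maintain per-arm running sums and subtract the observation that leaves the window.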

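This work introduces state-of-the-art data-driven decision-making algorithms for non-stationary bandit environments. It combines stochastic and adversarial bandit learning algorithms in an unconventional way, uses a sliding-window upper confidence bound algorithm to achieve optimal dynamic regret bounds for a variety of non-stationary bandit problems (when the variation budget is known), and verifies the algorithms' superior performance through numerical experiments.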