For min-max optimization and variational inequality problems (VIPs)
encountered in diverse machine learning tasks, Stochastic Extragradient (SEG)
and Stochastic Gradient Descent Ascent (SGDA) have emerged as preeminent
algorithms. Constant step-size variants of SEG/SGDA have gained popularity,
with appealing benefits such as easy tuning and rapid forgetting of initial
conditions, but their convergence behavior is more complicated, even in
simple bilinear models. Our work aims to elucidate and quantify the
probabilistic structures intrinsic to these algorithms. By recasting the
constant step-size SEG/SGDA as time-homogeneous Markov chains, we establish a
first-of-its-kind Law of Large Numbers and Central Limit Theorem,
demonstrating that the chain admits a unique invariant distribution and that
the average iterate is asymptotically normal for a broad range of monotone and
non-monotone VIPs. Specializing to convex-concave min-max optimization, we
characterize the
relationship between the step-size and the induced bias with respect to the
von Neumann value. Finally, we establish that Richardson-Romberg
extrapolation can bring the average iterate closer to the global
solution for VIPs. Our probabilistic analysis, underpinned by experiments
corroborating our theoretical discoveries, harnesses techniques from
optimization, Markov chains, and operator theory.
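
The bias and extrapolation ideas in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy objective f(x, y) = exp(x) - x + x*y - y^2/2 (saddle point at the origin), the additive Gaussian noise, the step sizes, and the function names. The extrapolation uses the common form 2 * avg(step) - avg(2 * step), which cancels the leading bias term only if the bias of the average iterate is linear in the step size to first order.

```python
import math
import numpy as np

def operator(x, y):
    """F(z) = (df/dx, -df/dy) for the toy f(x, y) = exp(x) - x + x*y - y**2 / 2."""
    return math.exp(x) - 1.0 + y, y - x

def averaged_sgda(step, n_steps, seed, sigma=1.0):
    """Constant step-size SGDA with additive Gaussian noise; returns the average iterate.

    With a fixed step, the iterates form a time-homogeneous Markov chain:
    the next state depends only on the current state and fresh noise.
    """
    rng = np.random.default_rng(seed)
    noise = sigma * rng.standard_normal((n_steps, 2))
    x, y = 1.0, -1.0  # arbitrary starting point (assumption)
    sum_x = sum_y = 0.0
    for k in range(n_steps):
        gx, gy = operator(x, y)
        # Descent in x and ascent in y, written as one step against F.
        x, y = x - step * (gx + noise[k, 0]), y - step * (gy + noise[k, 1])
        sum_x += x
        sum_y += y
    return np.array([sum_x / n_steps, sum_y / n_steps])

def richardson_romberg(avg_step, avg_double_step):
    """Combine averages obtained with steps gamma and 2*gamma."""
    return 2.0 * avg_step - avg_double_step

avg_small = averaged_sgda(step=0.05, n_steps=20_000, seed=0)
avg_large = averaged_sgda(step=0.10, n_steps=20_000, seed=1)
extrapolated = richardson_romberg(avg_small, avg_large)
print("average, step 0.05:", avg_small)
print("average, step 0.10:", avg_large)
print("extrapolated      :", extrapolated)
```

On a run this short, Monte Carlo error can be comparable to the bias itself, so the printed values only indicate the mechanics; longer runs or several seeds would be needed to see the effect of the extrapolation clearly.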

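This work aims to clarify and quantify the probabilistic structure intrinsic to constant step-size Stochastic Extragradient (SEG) and Stochastic Gradient Descent Ascent (SGDA) by recasting them as time-homogeneous Markov chains. It proves that, for a broad range of monotone and non-monotone VIPs, the average iterate is asymptotically normal with a unique invariant distribution, which leads to refinements for VIPs, and it reports experiments that corroborate the theoretical findings.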

Stochastic Methods in Variational Inequalities: Ergodicity, Bias and Refinements

To lower the communication complexity of federated min-max learning, a
natural approach is to use infrequent communication (through
multiple local updates), as in conventional federated learning. However,
due to the more complicated inner-outer problem structure in federated min-max
learning, theoretical understanding of communication complexity for federated
min-max learning with infrequent communication remains very limited in the
literature. This is particularly true for settings with non-i.i.d. datasets and
partial client participation. To address this challenge, in this paper, we
propose a new algorithmic framework called stochastic sampling averaging
gradient descent ascent (SAGDA), which (i) assembles stochastic gradient
estimators from randomly sampled clients as control variates and (ii) uses
two learning rates, one on the server side and one on the client side. We
show that SAGDA achieves
a linear speedup in terms of both the number of clients and local update steps,
which yields an $\mathcal{O}(\epsilon^{-2})$ communication complexity that is
orders of magnitude lower than the state of the art. Interestingly, by noting
that the standard federated stochastic gradient descent ascent (FSGDA) is in
fact a control-variate-free special case of SAGDA, we immediately arrive at
an $\mathcal{O}(\epsilon^{-2})$ communication complexity result for FSGDA.
Therefore, through the lens of SAGDA, we also advance the current understanding
of the communication complexity of the standard FSGDA method for federated min-max
learning.
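
The ingredients named in this abstract (client sampling, multiple local updates, a control variate built from the sampled clients, and separate server and client learning rates) can be sketched as follows. This is a simplified illustration under stated assumptions, not the SAGDA algorithm from the paper: the quadratic client objectives, the specific correction `g - g_i(z_server) + anchor`, the use of exact client gradients in place of stochastic estimators, and all names and constants are hypothetical. Setting `use_control_variate=False` gives a plain federated GDA loop with local steps.

```python
import numpy as np

# Hypothetical heterogeneous clients:
# f_i(x, y) = a_i/2 * (x - c_i)**2 + x*y - b_i/2 * (y - d_i)**2
A = np.array([0.5, 1.0, 1.5, 2.0])
B = np.array([2.0, 1.5, 1.0, 0.5])
C = np.array([-1.0, 0.0, 1.0, 2.0])
D = np.array([1.0, -1.0, 2.0, 0.0])

def client_operator(i, z):
    """Return (df_i/dx, -df_i/dy); stepping against it is descent in x, ascent in y."""
    x, y = z
    return np.array([A[i] * (x - C[i]) + y, B[i] * (y - D[i]) - x])

def federated_gda(rounds, n_sampled, local_steps=5, client_lr=0.05,
                  server_lr=1.0, use_control_variate=True, seed=0):
    rng = np.random.default_rng(seed)
    z = np.zeros(2)  # server iterate (x, y)
    for _round in range(rounds):
        sampled = rng.choice(len(A), size=n_sampled, replace=False)
        # Control variate: average operator of the sampled clients at the server iterate.
        anchor = np.mean([client_operator(i, z) for i in sampled], axis=0)
        deltas = []
        for i in sampled:
            w = z.copy()
            for _step in range(local_steps):
                g = client_operator(i, w)
                if use_control_variate:
                    # Replace the client's own drift with the sampled average.
                    g = g - client_operator(i, z) + anchor
                w = w - client_lr * g
            deltas.append(w - z)
        # Server step with its own learning rate.
        z = z + server_lr * np.mean(deltas, axis=0)
    return z

# Saddle point of the averaged objective, from the linear system M z = q.
M = np.array([[A.mean(), 1.0], [-1.0, B.mean()]])
q = np.array([(A * C).mean(), (B * D).mean()])
z_star = np.linalg.solve(M, q)

z_full = federated_gda(rounds=200, n_sampled=4)
z_partial = federated_gda(rounds=200, n_sampled=2)
z_plain = federated_gda(rounds=200, n_sampled=4, use_control_variate=False)
print("saddle point            :", z_star)
print("full participation      :", z_full)
print("partial participation   :", z_partial)
print("no control variate      :", z_plain)
```

With full participation and exact gradients, the correction vanishes at the saddle point of the averaged objective, so that point is a fixed point of the loop. With partial participation the sampled average is itself noisy, and the iterate generally only hovers near the solution.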

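This paper proposes a new algorithmic framework, SAGDA, that lowers the communication complexity of federated min-max learning and, building on it, advances the understanding of the communication complexity of the standard FSGDA method.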

SAGDA: Achieving $\mathcal{O}(\epsilon^{-2})$ Communication Complexity in Federated Min-Max Learning

Local SGD is a promising approach to overcome the communication overhead in
distributed learning by reducing the synchronization frequency among worker
nodes. Despite the recent theoretical advances of local SGD in empirical risk
minimization, the efficiency of its counterpart in minimax optimization remains
unexplored. Motivated by large-scale minimax learning problems, such as
adversarially robust learning and training generative adversarial networks
(GANs), we propose local Stochastic Gradient Descent Ascent (local SGDA), where
the primal and dual variables can be trained locally and averaged periodically
to significantly reduce the number of communications. We show that local SGDA
can provably optimize distributed minimax problems on both homogeneous and
heterogeneous data with a reduced number of communications and establish
convergence rates under strongly-convex-strongly-concave and
nonconvex-strongly-concave settings. In addition, we propose a novel variant,
local SGDA+, to solve nonconvex-nonconcave problems. We give corroborating
empirical evidence on different distributed minimax problems.
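
The local-update-then-average pattern described above can be sketched on a toy problem. The sketch assumes homogeneous workers that all see the strongly-convex-strongly-concave objective f(x, y) = x^2/2 + x*y - y^2/2 (saddle point at the origin) with additive Gaussian gradient noise. The objective, noise model, hyperparameters, and the name `local_sgda` are illustrative choices, not the paper's experimental setup.

```python
import numpy as np

def local_sgda(n_workers=8, n_steps=2000, sync_every=10, step=0.05,
               sigma=1.0, seed=0):
    """Each worker runs SGDA locally; primal and dual variables are averaged
    across workers every `sync_every` steps (one communication round)."""
    rng = np.random.default_rng(seed)
    # Row k holds worker k's (x, y); all workers start from the same point.
    z = np.tile(np.array([2.0, -2.0]), (n_workers, 1))
    communications = 0
    for t in range(1, n_steps + 1):
        x, y = z[:, 0], z[:, 1]
        grad_x = x + y  # df/dx
        grad_y = x - y  # df/dy
        noise = sigma * rng.standard_normal(z.shape)
        z = np.column_stack([
            x - step * (grad_x + noise[:, 0]),  # descent on the primal variable
            y + step * (grad_y + noise[:, 1]),  # ascent on the dual variable
        ])
        if t % sync_every == 0:
            z[:] = z.mean(axis=0)  # periodic averaging of both variables
            communications += 1
    return z.mean(axis=0), communications

z_avg, n_comm = local_sgda()
print("averaged iterate:", z_avg)
print("communication rounds:", n_comm, "out of 2000 local steps")
```

With `sync_every=1` the loop reduces to fully synchronized SGDA, so the parameter directly trades local computation against communication rounds.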

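This paper proposes an algorithm called local SGDA to reduce the communication overhead in distributed learning, achieving provable convergence with fewer communication rounds across a broad range of distributed minimax optimization problems.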