To address the communication bottleneck challenge in distributed learning,
our work introduces a novel two-stage quantization strategy designed to enhance
the communication efficiency of distributed Stochastic Gradient Descent (SGD).
The proposed method initially employs truncation to mitigate the impact of
long-tail noise, followed by a non-uniform quantization of the post-truncation
gradients based on their statistical characteristics. We provide a
comprehensive convergence analysis of the quantized distributed SGD,
establishing theoretical guarantees for its performance. Furthermore, by
minimizing the convergence error, we derive optimal closed-form solutions for
the truncation threshold and non-uniform quantization levels under given
communication constraints. Both theoretical insights and extensive experimental
evaluations demonstrate that our proposed algorithm outperforms existing
quantization schemes, striking a superior balance between communication
efficiency and convergence performance.
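
The two stages described above can be illustrated with a minimal NumPy sketch. It is not the paper's method: the closed-form threshold and quantization levels derived in the paper are not reproduced here. Instead, the threshold is passed in by hand, and the level placement (a power-law spacing that is denser near zero, controlled by a hypothetical `power` parameter) is an assumption made purely for illustration. The function names `nonuniform_levels`, `compress`, and `decompress` are likewise hypothetical.

```python
import numpy as np

def nonuniform_levels(threshold, num_levels, power=2.0):
    # Assumed level placement: symmetric, denser near zero.
    # The paper derives optimal levels; this spacing is only a stand-in.
    u = np.linspace(-1.0, 1.0, num_levels)
    return threshold * np.sign(u) * np.abs(u) ** power

def compress(grad, threshold, num_levels, rng, power=2.0):
    # Stage 1: truncate to [-threshold, threshold] to limit long-tail values.
    clipped = np.clip(grad, -threshold, threshold)
    # Stage 2: stochastic rounding to the two neighbouring non-uniform levels,
    # so the decoded value equals the truncated value in expectation.
    levels = nonuniform_levels(threshold, num_levels, power)
    idx = np.searchsorted(levels, clipped, side="right") - 1
    idx = np.clip(idx, 0, num_levels - 2)
    lo, hi = levels[idx], levels[idx + 1]
    p_up = (clipped - lo) / (hi - lo)
    round_up = rng.random(clipped.shape) < p_up
    codes = idx + round_up  # integer index of the chosen level
    return codes, levels

def decompress(codes, levels):
    # The receiver only needs the integer codes and the shared level table.
    return levels[codes]
```

With `num_levels` levels, each coordinate can be sent as an index of about `log2(num_levels)` bits rather than a 32-bit float, which is where the communication saving in this sketch comes from.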

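To address the communication bottleneck in distributed learning, this work introduces a new two-stage quantization strategy that improves the communication efficiency of distributed stochastic gradient descent (SGD). Truncation first reduces the impact of long-tail noise, and the gradients are then quantized non-uniformly according to their statistical characteristics. We provide a comprehensive convergence analysis of the quantized distributed SGD, giving theoretical guarantees for its performance. In addition, by minimizing the convergence error, we derive optimal closed-form solutions for the truncation threshold and the non-uniform quantization levels under a given communication constraint. Theoretical analysis and extensive experiments show that our algorithm outperforms existing quantization schemes, achieving a better balance between communication efficiency and convergence performance.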

Truncated Non-Uniform Quantization for Distributed SGD

We study the problem of machine unlearning and identify a notion of
algorithmic stability, Total Variation (TV) stability, which, we argue, is
suitable for the goal of exact unlearning. For convex risk minimization
problems, we design TV-stable algorithms based on noisy Stochastic Gradient
Descent (SGD). Our key contribution is the design of corresponding efficient
unlearning algorithms, which are based on constructing a (maximal) coupling of
Markov chains for the noisy SGD procedure. To understand the trade-offs between
accuracy and unlearning efficiency, we give upper and lower bounds on the excess
empirical and population risk of TV-stable algorithms for convex risk
minimization. Our techniques generalize to arbitrary non-convex functions, and
our algorithms are differentially private as well.
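
To make the coupling idea concrete, here is a minimal sketch of the standard rejection-sampling construction of a maximal coupling, applied to two Gaussians with a common standard deviation. It is not the paper's unlearning algorithm. The connection is only an assumed analogy: a noisy SGD step on the original dataset and on the edited dataset can be viewed as two Gaussians with slightly different means, and a maximal coupling makes the two draws coincide as often as their total variation distance allows. The names `gaussian_pdf` and `sample_maximal_coupling` are hypothetical.

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    # Density of N(mean, std^2) at x.
    z = (x - mean) / std
    return np.exp(-0.5 * z * z) / (std * np.sqrt(2.0 * np.pi))

def sample_maximal_coupling(mean_p, mean_q, std, rng):
    """Draw (x, y) with x ~ N(mean_p, std^2) and y ~ N(mean_q, std^2),
    such that x == y with probability 1 - TV(p, q)."""
    x = rng.normal(mean_p, std)
    # Accept x as a draw from q as well whenever it lies under q's density.
    if rng.random() * gaussian_pdf(x, mean_p, std) <= gaussian_pdf(x, mean_q, std):
        return x, x
    # Otherwise sample y from the part of q that is not covered by p.
    while True:
        y = rng.normal(mean_q, std)
        if rng.random() * gaussian_pdf(y, mean_q, std) > gaussian_pdf(y, mean_p, std):
            return x, y
```

When the two means are close, the pair coincides on most draws. In the analogy above, that corresponds to an unlearning request that leaves the existing iterate untouched and needs no recomputation.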

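This paper studies the machine unlearning problem and identifies a notion of algorithmic stability, total variation (TV) stability. It designs efficient unlearning algorithms based on TV-stable algorithms built from noisy stochastic gradient descent (SGD). To understand the trade-off between accuracy and unlearning efficiency, the paper gives upper and lower bounds for TV-stable algorithms for convex risk minimization. The techniques generalize to arbitrary non-convex functions, and the algorithms are also differentially private.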

Machine Unlearning via Algorithmic Stability

With the increase in the amount of data and the expansion of model scale,
distributed parallel training has become an important and successful technique
for addressing the resulting optimization challenges. Nevertheless, although distributed
stochastic gradient descent (SGD) algorithms can achieve a linear iteration
speedup, they are limited significantly in practice by the communication cost,
making it difficult to achieve a linear time speedup. In this paper, we propose
a computation and communication decoupled stochastic gradient descent
(CoCoD-SGD) algorithm to run computation and communication in parallel to
reduce the communication cost. We prove that CoCoD-SGD has a linear iteration
speedup with respect to the total computation capability of the hardware
resources. In addition, it has a lower communication complexity and a better time
speedup than traditional distributed SGD algorithms. Experiments on
deep neural network training demonstrate the significant improvements of
CoCoD-SGD: when training ResNet18 and VGG16 with 16 GeForce GTX 1080Ti GPUs,
CoCoD-SGD is up to 2-3$\times$ faster than traditional synchronous SGD.
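
The decoupling of computation and communication can be sketched as a single-process NumPy simulation. The abstract does not spell out the update rule, so the rule used here is an assumption: each round, the workers' current models are averaged (in a real system, a non-blocking all-reduce) while every worker takes local SGD steps from its own snapshot, and the local progress is then added on top of the averaged model. The toy objective (worker `i` minimizes `0.5 * ||x - centers[i]||^2`), the function name `cocod_sgd_simulation`, and all of its parameters are hypothetical.

```python
import numpy as np

def cocod_sgd_simulation(centers, rounds=50, local_steps=5, lr=0.1,
                         noise_std=0.0, seed=0):
    rng = np.random.default_rng(seed)
    models = np.zeros_like(centers, dtype=float)  # one model row per worker
    for _ in range(rounds):
        snapshot = models.copy()
        # Communication: average the snapshot. On real hardware this would
        # run in the background while the local steps below execute.
        averaged = snapshot.mean(axis=0)
        # Computation: local SGD steps that do not wait for the average.
        local = snapshot.copy()
        for _ in range(local_steps):
            noise = noise_std * rng.standard_normal(local.shape)
            grads = (local - centers) + noise  # gradient of the toy objective
            local -= lr * grads
        # Combine: averaged model plus each worker's own local progress.
        models = averaged + (local - snapshot)
    return models
```

In this toy setting with `noise_std=0.0`, the mean of the worker models contracts toward the mean of `centers` by a factor of `(1 - lr) ** local_steps` per round. The sequential simulation shows only the update rule; any time saving depends on the averaging actually overlapping with the local steps.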

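This paper proposes the Computation and Communication Decoupling Stochastic Gradient Descent (CoCoD-SGD) algorithm, which runs computation and communication in parallel and effectively reduces communication overhead. Compared with traditional distributed SGD algorithms, it achieves a better time speedup, with a 2-3x speedup when training the ResNet18 and VGG16 deep neural networks on 16 GPUs.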