Modern deep neural networks are often so large that training must be distributed across many workers. As the number of workers grows, communication overhead becomes the main bottleneck in data-parallel minibatch stochastic gradient methods that synchronize gradients at every iteration. Local gradient methods such as Local SGD reduce communication by synchronizing only after several local steps. Although their convergence is well understood in both i.i.d. and heterogeneous settings, and batch sizes are known to matter for efficiency and generalization, optimal local batch sizes remain difficult to determine. We introduce adaptive batch size strategies for local gradient methods that increase batch sizes adaptively to reduce minibatch gradient variance. We provide convergence guarantees under homogeneous data conditions and support our claims with image classification experiments, which demonstrate the effectiveness of our strategies in both training and generalization.

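Modern deep neural networks usually require distributed training because of their enormous size, but as the number of workers grows, communication overhead becomes the main bottleneck in data-parallel minibatch stochastic gradient methods that synchronize gradients at every iteration. This paper introduces adaptive batch size strategies for local gradient methods that adaptively increase the batch size to reduce the variance of minibatch gradients, provides convergence guarantees under homogeneous data conditions, and supports its claims with image classification experiments that demonstrate the effectiveness of the strategies in training and generalization.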
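
To make the idea concrete, the following is a minimal Python sketch of Local SGD with a batch size that grows when the minibatch gradient looks too noisy. It is an illustration under stated assumptions, not the paper's algorithm: the growth rule is a generic norm-test-style check (keep the batch size b while var / b <= eta^2 * g_bar^2), the problem is a toy scalar regression with target slope 3, and all names and parameters (`local_sgd`, `norm_test_batch_size`, `eta`, `max_batch`) are hypothetical. For simplicity the variance is estimated from per-sample gradients pooled across workers at the end of each round.

```python
import math
import random
import statistics


def per_sample_grads(w, batch):
    """Gradients of 0.5 * (w*x - y)**2 with respect to w, one per sample."""
    return [(w * x - y) * x for x, y in batch]


def norm_test_batch_size(grads, batch_size, eta, max_batch):
    """Norm-test-style rule (an assumption, not taken from the paper).

    Keeps batch_size while var / batch_size <= (eta * g_bar)**2,
    otherwise returns the smallest size passing the test, capped at max_batch.
    """
    g_bar = statistics.fmean(grads)
    var = statistics.variance(grads)  # sample variance, needs >= 2 gradients
    threshold = (eta * g_bar) ** 2
    if var <= batch_size * threshold:
        return batch_size  # gradient is accurate enough: keep current size
    if var >= max_batch * threshold:
        return max_batch  # also covers threshold == 0
    return math.ceil(var / threshold)


def local_sgd(num_workers=4, local_steps=8, rounds=30, lr=0.05,
              batch_size=4, max_batch=256, eta=0.5, seed=0):
    rng = random.Random(seed)

    def sample(n):
        # Homogeneous setting: every worker draws from the same distribution.
        batch = []
        for _ in range(n):
            x = rng.gauss(0.0, 1.0)
            batch.append((x, 3.0 * x + rng.gauss(0.0, 0.5)))
        return batch

    w = 0.0
    history = []  # batch size used in each communication round
    for _ in range(rounds):
        local_models, pooled_grads = [], []
        for _ in range(num_workers):
            w_local = w
            for _ in range(local_steps):  # local steps, no communication
                grads = per_sample_grads(w_local, sample(batch_size))
                w_local -= lr * statistics.fmean(grads)
            local_models.append(w_local)
            pooled_grads.extend(grads)  # gradients from the last local step
        w = statistics.fmean(local_models)  # the only synchronization point
        history.append(batch_size)
        batch_size = norm_test_batch_size(pooled_grads, batch_size, eta, max_batch)
    return w, history


if __name__ == "__main__":
    w, history = local_sgd()
    print("final w:", w)
    print("batch sizes per round:", history)
```

In this sketch the batch size never decreases, so later rounds spend more computation per local step in exchange for lower gradient variance, while communication still happens only once per round.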

Communication-Efficient Adaptive Batch Size Strategies for Distributed Local Gradient Methods