Optimal Transport has attracted considerable interest in recent years, in particular thanks to the Wasserstein distance, which provides a geometrically sensible and intuitive way of comparing probability measures. For computational reasons, the Sliced Wasserstein (SW) distance was introduced as an alternative to the Wasserstein distance, and it has been used to train generative Neural Networks (NNs). While convergence of Stochastic Gradient Descent (SGD) has been observed in practice in this setting, there is, to our knowledge, no theoretical guarantee supporting this observation. Leveraging recent work on the convergence of SGD on non-smooth and non-convex functions by Bianchi et al. (2022), we aim to bridge this knowledge gap and provide a realistic context under which fixed-step SGD trajectories for the SW loss on NN parameters converge. More precisely, we show that the trajectories approach the set of solutions of the (sub)gradient flow equation as the step size decreases. Under stricter assumptions, we show a much stronger convergence result for noised and projected SGD schemes, namely that the long-run limits of the trajectories approach a set of generalised critical points of the loss function.
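
To make the computational appeal of the SW distance concrete, the following is a minimal sketch of a Monte Carlo estimate of the squared SW distance of order 2 between two point clouds of equal size. It is an illustration under stated assumptions, not the implementation studied in this work: the function name `sliced_wasserstein_sq`, its parameters, and the choice of uniform weights with equal sample counts are all hypothetical choices made for this example.

```python
import numpy as np

def sliced_wasserstein_sq(x, y, n_projections=200, seed=0):
    """Monte Carlo estimate of the squared SW_2 distance between two
    uniform empirical measures supported on the rows of x and y.

    Illustrative sketch: assumes both clouds have the same number of points.
    """
    rng = np.random.default_rng(seed)
    n, d = x.shape
    assert y.shape == (n, d), "this sketch assumes equal-size point clouds"

    # Draw directions uniformly on the unit sphere by normalising Gaussians.
    theta = rng.standard_normal((n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)

    # Project both clouds onto every direction: shape (n_projections, n).
    px = theta @ x.T
    py = theta @ y.T

    # In 1D, optimal transport between equal-size uniform samples
    # pairs the sorted values, so sorting replaces solving a transport problem.
    px.sort(axis=1)
    py.sort(axis=1)

    # Average the squared 1D transport cost over points and directions.
    return float(np.mean((px - py) ** 2))
```

The sorting step is also the source of the non-smoothness mentioned above: when `y` is the output of a network, the loss is only piecewise smooth in the parameters, since the sorting permutation changes as the projected points cross one another.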

Convergence of SGD for Training Neural Networks with Sliced Wasserstein Losses