This paper approaches the unsupervised learning problem by minimizing the second-order Wasserstein loss (the $W_2$ loss). The minimization is characterized by a distribution-dependent ordinary differential equation (ODE), whose dynamics involve the Kantorovich potential between the current estimated distribution and the true data distribution. A main result shows that the time-marginal law of the ODE converges exponentially to the true data distribution. To prove that the ODE has a unique solution, we first explicitly construct a solution to the associated nonlinear Fokker-Planck equation and show that it coincides with the unique gradient flow for the $W_2$ loss. Based on this, a unique solution to the ODE is built from Trevisan's superposition principle and the exponential convergence results. An Euler scheme is proposed for the distribution-dependent ODE and is shown to correctly recover the gradient flow for the $W_2$ loss in the limit. An algorithm is designed by following the scheme and applying persistent training, which is natural in our gradient-flow framework. In both low- and high-dimensional experiments, our algorithm converges much faster than Wasserstein generative adversarial networks and outperforms them, provided the level of persistent training is increased appropriately.
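
To make the Euler scheme concrete, the following is a minimal one-dimensional sketch, not the algorithm of the paper. It assumes the convention that the optimal transport map $T$ from the current distribution to the data distribution satisfies $T(x) = x - \nabla\phi(x)$ for the Kantorovich potential $\phi$, so that one Euler step reads $Y_{n+1} = Y_n - \varepsilon\,(Y_n - T(Y_n))$. In one dimension, with equally many particles and data points, $T$ between the two empirical distributions can be computed by matching sorted samples, so no potential needs to be estimated; this sorting shortcut is a substitute chosen for illustration and does not extend to high dimensions. The function name `euler_w2_flow_1d` and its parameters are hypothetical.

```python
import math
import random


def euler_w2_flow_1d(particles, data, step=0.1, n_steps=50):
    """Toy Euler scheme Y <- Y - step * (Y - T(Y)) for 1D empirical laws.

    T is the monotone (rank-matching) transport map from the particles to
    the data, which is W_2-optimal in one dimension. This closed form is an
    assumption of the sketch, not a general-purpose estimator.
    """
    if len(particles) != len(data):
        raise ValueError("this sketch needs equally many particles and data points")
    if not 0.0 < step <= 1.0:
        raise ValueError("step must lie in (0, 1]")

    y = [float(v) for v in particles]
    target_sorted = sorted(float(v) for v in data)
    n = len(y)
    w2_history = []

    for _ in range(n_steps):
        # Indices of the particles from smallest to largest.
        order = sorted(range(n), key=lambda i: y[i])
        # T(y_i) is the data point with the same rank as y_i.
        transport = [0.0] * n
        for rank, i in enumerate(order):
            transport[i] = target_sorted[rank]
        # W_2 distance between the two empirical laws before this step.
        w2 = math.sqrt(sum((a - b) ** 2 for a, b in zip(y, transport)) / n)
        w2_history.append(w2)
        # Euler step: move each particle a fraction `step` toward T(y_i).
        y = [a - step * (a - b) for a, b in zip(y, transport)]

    return y, w2_history


if __name__ == "__main__":
    rng = random.Random(0)
    start = [rng.gauss(0.0, 1.0) for _ in range(500)]
    samples = [rng.gauss(3.0, 0.5) for _ in range(500)]
    _, history = euler_w2_flow_1d(start, samples, step=0.1, n_steps=40)
    print("W_2 at first and last recorded step:", history[0], history[-1])
```

In this toy setting each step preserves the ordering of the particles, so the recorded $W_2$ distance shrinks by the factor $(1 - \varepsilon)$ per step. With $t = n\varepsilon$ this behaves like $e^{-t}$ for small $\varepsilon$, which is consistent with the exponential convergence described above.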

Generative Modeling by Minimizing the Wasserstein-2 Loss