In this paper, we propose Wasserstein proximals of $\alpha$-divergences as suitable objective functionals for learning heavy-tailed distributions in a stable manner. First, we provide sufficient, and in some cases necessary, relations among the data dimension, $\alpha$, and the decay rate of the data distributions for the Wasserstein-proximal-regularized divergence to be finite. We then provide finite-sample convergence rates for estimating the Wasserstein-1 proximal divergences under certain tail conditions. Numerical experiments demonstrate stable learning of heavy-tailed distributions (even those without a first or second moment) without any explicit knowledge of the tail behavior, using suitable generative models such as GANs and flow-based models related to our proposed Wasserstein-proximal-regularized $\alpha$-divergences. Heuristically, $\alpha$-divergences handle the heavy tails, while Wasserstein proximals allow non-absolute continuity between distributions and control the velocities of flow-based algorithms as they learn the target distribution deep into the tails.

Learning heavy-tailed distributions with Wasserstein-proximal-regularized $\alpha$-divergences