A central question in statistical learning is how to design algorithms that not only perform well on training data, but also generalize to new and unseen data. In this paper, we tackle this question by formulating a distributionally robust stochastic optimization (DRSO) problem, which seeks a solution that minimizes the worst-case expected loss over a family of distributions that are close to the empirical distribution in Wasserstein distance. We establish a connection between such Wasserstein DRSO and regularization. More precisely, we identify a broad class of loss functions for which the Wasserstein DRSO is asymptotically equivalent to a regularization problem with a gradient-norm penalty. This relation provides new interpretations for problems involving regularization, including a large number of statistical learning problems and discrete choice models (e.g., multinomial logit). The connection also suggests a principled way to regularize high-dimensional, non-convex problems. We demonstrate this through two applications: the training of Wasserstein generative adversarial networks (WGANs) in deep learning, and the learning of heterogeneous consumer preferences with a mixed logit choice model.
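As a rough illustration of the equivalence described above, the relation can be sketched as follows. The notation is an assumption made here for illustration and does not appear in the abstract: $P_n$ is the empirical distribution of the data $\xi$, $\ell_\theta$ is the loss at parameter $\theta$, $W_p$ is the order-$p$ Wasserstein distance, $\rho$ is the radius of the ambiguity set, $\|\cdot\|_*$ is the dual norm, and $q$ satisfies $1/p + 1/q = 1$. The precise form of the penalty and the conditions on the loss are those stated in the body of the paper, not this sketch.

```latex
% Sketch under assumed notation: worst-case expected loss over a Wasserstein ball
% versus empirical loss plus a gradient-norm penalty (valid asymptotically as \rho \to 0).
\min_{\theta} \; \sup_{Q :\, W_p(Q, P_n) \le \rho} \mathbb{E}_{Q}\big[\ell_\theta(\xi)\big]
\;\approx\;
\min_{\theta} \; \Big\{ \mathbb{E}_{P_n}\big[\ell_\theta(\xi)\big]
  + \rho \, \Big( \mathbb{E}_{P_n}\big[ \|\nabla_{\xi} \ell_\theta(\xi)\|_*^{\,q} \big] \Big)^{1/q} \Big\}
```

Read this way, the radius $\rho$ plays the role of a regularization weight, and the penalty measures how sensitive the loss is to perturbations of the data rather than the size of the parameters.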

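This paper develops a general theory of variation regularization for Wasserstein DRO (an approach within distributionally robust optimization). Variation regularization is a new form of regularization that can handle losses that are possibly non-convex and non-smooth, as well as losses on non-Euclidean spaces. By applying the variation regularization from our theory, we provide new generalization guarantees for adversarially robust learning.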

Wasserstein Distributionally Robust Optimization and Variation Regularization