Recently, learning a model that generalizes well to out-of-distribution (OOD) data has attracted considerable attention in the machine learning community. In this paper, after defining OOD generalization via the Wasserstein distance, we theoretically show that a model robust to input perturbations generalizes well to OOD data. Inspired by previous findings that adversarial training improves input robustness, we theoretically show that the excess risk of adversarially trained models on OOD data converges, and we verify this empirically on both image classification and natural language understanding tasks. Furthermore, in the paradigm of pre-training followed by fine-tuning, we theoretically show that a pre-trained model that is more robust to input perturbations provides a better initialization for generalization to downstream OOD data. Empirically, after fine-tuning, this better-initialized model obtained by adversarial pre-training also achieves better OOD generalization.
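To make the Wasserstein-based view of OOD data concrete, here is a minimal sketch under simplifying assumptions that are not taken from the paper: inputs are 1-D, both samples are equal-size empirical distributions, and the distance is Wasserstein-1 (the abstract does not state which order or cost the paper uses). The helper `in_wasserstein_ball` is hypothetical. The sketch also shows, via Kantorovich-Rubinstein duality, why input robustness matters: for an L-Lipschitz function, the gap between its averages on the two distributions is at most L times their W1 distance.

```python
def wasserstein1_1d(xs, ys):
    """W1 distance between two equal-size 1-D empirical distributions.

    With uniform weights on n points each, the optimal coupling in 1-D
    matches sorted samples, so W1 is the mean absolute difference of the
    order statistics.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("expects two non-empty samples of equal size")
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) for a, b in pairs) / len(xs)


def in_wasserstein_ball(train, test, radius):
    """Hypothetical helper: treat `test` as an admissible OOD shift of
    `train` when their empirical W1 distance is at most `radius`."""
    return wasserstein1_1d(train, test) <= radius


def mean(f, xs):
    """Empirical expectation of f over the sample xs."""
    return sum(f(x) for x in xs) / len(xs)
```

For example, shifting `[0.0, 1.0, 2.0, 3.0]` by 0.5 gives a W1 distance of 0.5; for the 0.5-Lipschitz function `f(x) = 0.5 * x`, the change in `mean(f, ·)` is 0.25, which meets the bound `0.5 * 0.5`. A smoother (more input-robust) `f` would tighten this gap.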

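This paper defines out-of-distribution (OOD) generalization using the Wasserstein distance and theoretically proves that a model robust to input perturbations generalizes to OOD data; this is verified empirically on image classification and natural language understanding tasks. It further proves that, in the pre-training and fine-tuning paradigm, a pre-trained model that is more robust to input perturbations provides a better initialization for generalization to downstream OOD data, and experiments show that, after fine-tuning, such a better-initialized model obtained by adversarial pre-training also has better OOD generalization.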
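The following is a minimal sketch of adversarial training, not the paper's training setup: it assumes a linear logistic-regression model, labels in {-1, +1}, and an L-infinity perturbation ball of radius `eps`, with a single FGSM step as the inner maximization (for a linear model this step is exactly the worst case in the ball). The optional `init` argument is a hypothetical stand-in for the pre-train-then-fine-tune setting: passing adversarially trained weights warm-starts a later fine-tuning run.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(x, y, w, eps):
    """L-inf FGSM step for a linear logistic model.

    The input gradient of log(1 + exp(-y (w.x + b))) is -y * sigmoid(-m) * w,
    whose sign is -y * sign(w), so the step moves x by -eps * y * sign(w).
    """
    return [xi - eps * y * sign(wi) for xi, wi in zip(x, w)]


def adversarial_train(data, eps, lr=0.1, epochs=200, seed=0, init=None):
    """SGD on the loss at FGSM-perturbed inputs; `init` = (w, b) warm start."""
    rng = random.Random(seed)
    data = list(data)  # copy so shuffling does not mutate the caller's list
    dim = len(data[0][0])
    w, b = (list(init[0]), init[1]) if init else ([0.0] * dim, 0.0)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            x_adv = fgsm(x, y, w, eps)  # inner maximization
            m = y * (sum(wi * xi for wi, xi in zip(w, x_adv)) + b)
            g = -y * sigmoid(-m)  # d loss / d (w.x_adv + b)
            w = [wi - lr * g * xi for wi, xi in zip(w, x_adv)]
            b -= lr * g
    return w, b


def robust_accuracy(data, w, b, eps):
    """Fraction of points still correct under the worst-case L-inf
    perturbation of radius eps (closed form for linear models)."""
    l1 = sum(abs(wi) for wi in w)
    ok = sum(1 for x, y in data
             if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) - eps * l1 > 0)
    return ok / len(data)
```

For a linear model the worst-case margin is `y * (w.x + b) - eps * ||w||_1`, which is why `robust_accuracy` needs no attack loop; for deep networks one would instead run an iterative attack such as PGD, which this sketch does not cover.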

Improved OOD Generalization via Adversarial Training and Pre-training