The reusability of state-of-the-art Pre-trained Language Models (PLMs) is
often limited by their generalization problem, where their performance
drastically decreases when evaluated on examples that differ from the training
dataset, known as Out-of-Distribution (OOD)/unseen examples. This limitation
arises from PLMs' reliance on spurious correlations, which work well for
frequent example types but not for general examples. To address this issue, we
propose a training approach called Mask-tuning, which integrates Masked
Language Modeling (MLM) training objectives into the fine-tuning process to
enhance PLMs' generalization. Comprehensive experiments demonstrate that
Mask-tuning surpasses current state-of-the-art techniques and enhances PLMs'
generalization on OOD datasets while improving their performance on
in-distribution datasets. The findings suggest that Mask-tuning improves the
reusability of PLMs on unseen data, making them more practical and effective
for real-world applications.

预训练语言模型 (PLMs) 的可重用性常受到其泛化问题的限制，该问题表现为在评估与训练数据集不同的示例时，性能显著下降，被称为离群 / 未知示例。本文提出了一种名为 Mask-tuning 的训练方法，通过将掩码语言建模 (MLM) 训练目标整合到微调过程中，提高了 PLMs 的泛化能力。全面的实验证明，Mask-tuning 超越了当前最先进的技术，并增强了 PLMs 在离群数据集上的泛化能力，同时提高了它们在分布数据集上的性能。研究结果表明，Mask-tuning 改善了 PLMs 在未知数据上的可重用性，使其在实际应用中更加实用和有效。