While pretrained models such as BERT have shown large gains across natural language understanding tasks, their performance can be improved by further training the model on a data-rich intermediate task, before fine-tuning it on a target task. However, it is still poorly understood when and why intermediate-task training is beneficial for a given target task. To investigate this, we perform a large-scale study on the pretrained RoBERTa model with 110 intermediate-target task combinations. We further evaluate all trained models with 25 probing tasks meant to reveal the specific skills that drive transfer. We observe that intermediate tasks requiring high-level inference and reasoning abilities tend to work best. We also observe that target task performance is strongly correlated with higher-level abilities such as coreference resolution. However, we fail to observe more granular correlations between probing and target task performance, highlighting the need for further work on broad-coverage probing benchmarks. We also observe evidence that the forgetting of knowledge learned during pretraining may limit our analysis, highlighting the need for further work on transfer learning methods in these settings.

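A large-scale study of the pretrained RoBERTa model across 110 intermediate-target task combinations finds that intermediate tasks requiring high-level inference and reasoning abilities tend to work best. Target task performance is strongly correlated with higher-level abilities such as coreference resolution, but more granular correlations between probing and target task performance are not observed, which calls for further work on broad-coverage probing benchmarks. There is also evidence that the forgetting of knowledge learned during pretraining may limit the analysis, which calls for further work on transfer learning methods in these settings.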

Intermediate-Task Transfer Learning with Pretrained Models for Natural Language Understanding: When and Why Does It Work?