Transfer learning from large language models (LLMs) has emerged as a powerful
technique for knowledge-based fine-tuning on a range of tasks and for adapting
models to different domains and even languages. However, it remains an open
question if and when transfer learning will work, i.e., whether it leads to
positive or negative transfer. In this paper, we analyze knowledge transfer
across three natural language processing (NLP) tasks (text classification,
sentiment analysis, and sentence similarity) using three LLMs (BERT, RoBERTa,
and XLNet). We analyze their performance when fine-tuned on target datasets for
domain and cross-lingual adaptation, with and without intermediate task
training on a larger dataset. Our experiments showed that fine-tuning without
intermediate task training can lead to better performance for most tasks,
while more generalized tasks might necessitate a preceding intermediate task
training step. We hope that this work will serve as a guide to transfer
learning for NLP practitioners.

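This paper analyzes transfer learning for domain and cross-lingual adaptation using three large language models (BERT, RoBERTa, and XLNet) on three natural language processing tasks: text classification, sentiment analysis, and sentence similarity. It finds that for most tasks, fine-tuning directly without intermediate task training yields better performance, while more generalized tasks may require intermediate task training first. The work is intended to serve as a guide to transfer learning for NLP practitioners.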
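The central comparison in this study, direct fine-tuning versus intermediate task training followed by fine-tuning, can be illustrated in miniature. The sketch below is a toy and not the authors' setup: a logistic-regression classifier stands in for an LLM, an all-zero weight vector plays the role of the "pretrained" model, and the synthetic tasks, the helper names (`make_task`, `transfer_gain`), and all hyperparameters are hypothetical. A positive gain indicates positive transfer from the intermediate task; a negative gain indicates negative transfer.

```python
import math
import random

# Toy stand-in for an LLM: a linear logistic-regression classifier.
# All names, tasks, and hyperparameters here are hypothetical illustrations.

def sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train(weights, data, epochs=20, lr=0.1):
    """SGD on the logistic loss; returns new weights and leaves the input untouched."""
    w = list(weights)
    for _ in range(epochs):
        for x, y in data:
            err = sigmoid(dot(w, x)) - y
            for i, xi in enumerate(x):
                w[i] -= lr * err * xi
    return w

def accuracy(w, data):
    """Fraction of examples whose predicted label (score > 0) matches the gold label."""
    return sum((dot(w, x) > 0) == (y == 1) for x, y in data) / len(data)

def make_task(true_w, n, rng):
    """Hypothetical synthetic binary task labelled by a hidden linear rule; x[0] is a bias feature."""
    data = []
    for _ in range(n):
        x = [1.0] + [rng.gauss(0.0, 1.0) for _ in range(len(true_w) - 1)]
        data.append((x, 1 if dot(true_w, x) > 0 else 0))
    return data

def transfer_gain(init, intermediate, target_train, target_test):
    """Target test accuracy with intermediate training minus accuracy without it."""
    direct = train(init, target_train)                       # fine-tune on target only
    staged = train(train(init, intermediate), target_train)  # intermediate task, then target
    return accuracy(staged, target_test) - accuracy(direct, target_test)

if __name__ == "__main__":
    rng = random.Random(0)
    init = [0.0, 0.0, 0.0, 0.0]  # hypothetical "pretrained" starting weights
    intermediate = make_task([0.2, 1.0, -1.0, 0.5], 500, rng)  # larger, related task
    target_rule = [0.3, 1.0, -0.8, 0.6]
    target_train = make_task(target_rule, 20, rng)  # small target training set
    target_test = make_task(target_rule, 500, rng)
    gain = transfer_gain(init, intermediate, target_train, target_test)
    label = "positive" if gain > 0 else ("negative" if gain < 0 else "no")
    print(f"transfer gain: {gain:+.3f} ({label} transfer)")
```

Making the intermediate rule conflict with the target rule (for example, negating it) is one way to explore negative transfer in this toy; whether the gain actually turns negative depends on the data sizes and the number of fine-tuning epochs.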

The (In)Effectiveness of Intermediate Task Training For Domain Adaptation and Cross-Lingual Transfer Learning

While pretrained models such as BERT have shown large gains across natural
language understanding tasks, their performance can be improved by further
training the model on a data-rich intermediate task, before fine-tuning it on a
target task. However, it is still poorly understood when and why
intermediate-task training is beneficial for a given target task. To
investigate this, we perform a large-scale study on the pretrained RoBERTa
model with 110 intermediate-target task combinations. We further evaluate all
trained models with 25 probing tasks meant to reveal the specific skills that
drive transfer. We observe that intermediate tasks requiring high-level
inference and reasoning abilities tend to work best. We also observe that
target task performance is strongly correlated with higher-level abilities such
as coreference resolution. However, we fail to observe more granular
correlations between probing and target task performance, highlighting the need
for further work on broad-coverage probing benchmarks. We also observe evidence
that the forgetting of knowledge learned during pretraining may limit our
analysis, highlighting the need for further work on transfer learning methods
in these settings.

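A large-scale study of 110 intermediate-target task combinations on the pretrained RoBERTa model finds that intermediate tasks requiring high-level inference and reasoning abilities work best. Target task performance is strongly correlated with higher-level abilities such as coreference resolution, and further work on broad-coverage probing benchmarks is needed. There is evidence that forgetting of knowledge learned during pretraining may limit the analysis, which calls for further work on transfer learning methods in these settings.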
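The abstract reports correlations between probing-task and target-task performance across the trained models but does not name the statistic in this passage. A minimal sketch of such an analysis follows, assuming Spearman rank correlation (one common choice; the study's actual measure may differ). The scores are made up for five hypothetical intermediate-task models and are not results from the paper.

```python
import math

def ranks(values):
    """1-based ranks; tied values receive the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a block of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(vx * vy)

def spearman(x, y):
    """Spearman's rho: the Pearson correlation computed on ranks."""
    return pearson(ranks(x), ranks(y))

if __name__ == "__main__":
    # Made-up scores, one entry per hypothetical intermediate-task model.
    probing_acc = [0.71, 0.65, 0.80, 0.58, 0.74]
    target_acc = [0.82, 0.79, 0.86, 0.75, 0.80]
    print(f"Spearman rho = {spearman(probing_acc, target_acc):.2f}")  # prints 0.90
```

Rank correlation only asks whether models that score higher on a probing task also score higher on the target task, which makes it less sensitive than Pearson's r to differences in scale between benchmarks.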