Recent work suggests that interpolating between the weights of two specialized language models can transfer knowledge between tasks in a way that multi-task learning cannot. However, few studies have explored interpolation between more than two models, each with a distinct knowledge base. In this paper, we introduce Derivative Free Weight-space Ensembling (DFWE), a new few-sample task transfer approach for open-domain dialogue. Our framework first creates a set of diverse expert language models, each trained on one of a predefined set of source tasks. Next, we finetune each expert model on the target task, so that the target task is approached from several distinct knowledge bases. Finally, we linearly interpolate between the weights of the finetuned models, using a gradient-free optimization algorithm to efficiently find a good interpolation weighting. We demonstrate the effectiveness of the method on FETA-Friends, where it outperforms the standard pretrain-finetune approach.
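The final interpolation step can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: each expert is represented as a dictionary of NumPy arrays, the interpolation coefficients are constrained to be non-negative and sum to one via a softmax, and a simple accept-if-better random search stands in for the gradient-free optimizer. The names `interpolate`, `random_search`, and `loss_fn` are hypothetical.

```python
import numpy as np


def softmax(z):
    """Map unconstrained logits to non-negative coefficients that sum to 1."""
    z = z - np.max(z)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


def interpolate(experts, coeffs):
    """Linearly combine expert parameters, tensor by tensor."""
    return {
        name: sum(c * expert[name] for c, expert in zip(coeffs, experts))
        for name in experts[0]
    }


def random_search(experts, loss_fn, n_iters=200, sigma=0.5, seed=0):
    """Gradient-free search over interpolation coefficients.

    Assumption: a basic random search, used here only as a stand-in for
    whichever derivative-free optimizer is actually employed.
    """
    rng = np.random.default_rng(seed)
    logits = np.zeros(len(experts))  # start from the uniform average
    best_loss = loss_fn(interpolate(experts, softmax(logits)))
    for _ in range(n_iters):
        # Perturb the current best logits and keep the candidate if it helps.
        candidate = logits + sigma * rng.standard_normal(len(experts))
        loss = loss_fn(interpolate(experts, softmax(candidate)))
        if loss < best_loss:
            logits, best_loss = candidate, loss
    return softmax(logits), best_loss


# Toy stand-ins for three finetuned experts and a validation loss.
experts = [
    {"w": np.array([1.0, 0.0])},
    {"w": np.array([0.0, 1.0])},
    {"w": np.array([1.0, 1.0])},
]
target = np.array([0.2, 0.9])


def loss_fn(params):
    """Hypothetical validation loss: squared distance to a target vector."""
    return float(np.sum((params["w"] - target) ** 2))


coeffs, best_loss = random_search(experts, loss_fn)
print("coefficients:", coeffs, "loss:", best_loss)
```

In a real setting, `loss_fn` would load the interpolated weights into the model and evaluate it on held-out target-task data. Only forward passes are required, which is what makes a gradient-free search over a handful of coefficients practical.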

Derivative Free Weight-space Ensembling