The Trade-offs of Domain Adaptation for Neural Language Models

Sep, 2021

Dan Iter, David Grangier

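TL;DR: This paper connects language model adaptation with machine learning theory. It examines the trade-offs between training on a large out-of-domain set and training on a small in-domain set, shows that out-of-domain pre-training followed by in-domain fine-tuning generalizes better than either approach applied alone, and presents a common framework for adaptation techniques based on data selection.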

Abstract

In this paper, we connect language model adaptation with concepts of machine learning theory. We consider a training setup with a large out-of-domain set and a small in-domain set. As a first contribution, we der