The practical success of much of NLP depends on the availability of training data. However, in real-world scenarios, training data is often scarce, not least because many application domains are restricted and specific. In this work, we compare different methods to handle this problem and provide guidelines for building NLP applications when there is only a small amount of labeled training data available for a specific domain. While transfer learning with pre-trained language models outperforms other methods across tasks, alternatives do not perform much worse while requiring much less computational effort, thus significantly reducing monetary and environmental cost. We examine the performance tradeoffs of several such alternatives, including models that can be trained up to 175K times faster and do not require a single GPU.

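Summary: This paper compares methods for dealing with insufficient data in NLP and offers guidelines for building NLP applications from a small amount of labeled training data. Although transfer learning with pre-trained language models performs best across tasks, other methods are not much worse and need far less computation, which significantly reduces monetary and environmental cost. The authors examine the performance trade-offs of several such alternatives, including models that can be trained up to 175K times faster and do not require a single GPU.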

Domain Adaptation in Sparse-Data Scenarios: What Do We Gain by Not Using BERT?