Large pre-trained models have achieved great success in many natural language processing tasks. However, when they are applied in specific domains, these models suffer from domain shift and bring challenges in fine-tuning and online serving for latency and capacity constraints. In this paper, we present a general approach to developing small, fast and effective pre-trained models for specific domains. This is achieved by adapting the off-the-shelf general pre-trained models and performing task-agnostic knowledge distillation in target domains. Specifically, we propose domain-specific vocabulary expansion in the adaptation stage and employ corpus level occurrence probability to choose the size of incremental vocabulary automatically. Then we systematically explore different strategies to compress the large pre-trained models for specific domains. We conduct our experiments in the biomedical and computer science domain. The experimental results demonstrate that our approach achieves better performance over the BERT BASE model in domain-specific tasks while 3.3x smaller and 5.1x faster than BERT BASE. The code and pre-trained models are available at https://aka.ms/adalm.

本文介绍了一种开发特定领域小型、快速和有效的预训练模型的通用方法，该方法通过对通用预训练模型进行调整，以及在目标领域进行任务无关的知识蒸馏来实现。具体而言，在适应阶段，我们提出了领域特定词汇扩展，并使用语料库级别出现概率自动选择增量词汇表的大小。然后，我们系统地探索了压缩特定领域的大型预训练模型的不同策略。实验结果表明，我们的方法在生物医学和计算机科学领域的特定任务中表现优于BERT BASE模型，同时比BERT BASE小3.3倍，快5.1倍。

适应并蒸馏：为特定领域开发小型、快速且高效的预训练语言模型