TL;DR: This paper proposes TADA, a novel approach to domain adaptation that is modular, parameter-efficient, and data-efficient, and matches or exceeds the performance of full domain-adaptive pre-training and adapters without introducing additional parameters or complex training steps.
Abstract
Intermediate training of pre-trained transformer-based language models on domain-specific data leads to substantial gains for downstream tasks. To increase efficiency and prevent catastrophic forgetting