March 2020

Distill, Adapt, Distill: Training Small, In-Domain Models for Neural Machine Translation

Mitchell A. Gordon, Kevin Duh

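TL;DR This work explores best practices for training small, memory-efficient machine translation models with sequence-level knowledge distillation in the domain adaptation setting. Large-scale empirical results in machine translation (on three language pairs, with three domains each) suggest that the best performance comes from distilling twice: once with general-domain data, and again with in-domain data using an adapted teacher.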
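The "distill, adapt, distill" ordering from the TL;DR can be sketched as follows. This is a minimal sketch that rests on several assumptions. `ToyModel`, `distill`, and `distill_adapt_distill` are hypothetical names. The toy "model" only memorizes sentence pairs, which keeps the order of the steps and the data each step sees easy to follow. A real system would train neural (e.g. Transformer) models and produce teacher outputs with beam search.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


@dataclass
class ToyModel:
    """Hypothetical stand-in for an NMT model: it just memorizes pairs.
    Calling train() again acts like fine-tuning: newer pairs override older ones."""
    name: str
    table: Dict[str, str] = field(default_factory=dict)

    def train(self, pairs: List[Pair]) -> "ToyModel":
        for src, tgt in pairs:
            self.table[src] = tgt
        return self

    def translate(self, src: str) -> str:
        # Toy behaviour for unseen input: echo the source unchanged.
        return self.table.get(src, src)


def distill(teacher: ToyModel, sources: List[str]) -> List[Pair]:
    """Sequence-level knowledge distillation: the student's training targets
    are the teacher's own translations, not the reference translations."""
    return [(src, teacher.translate(src)) for src in sources]


def distill_adapt_distill(general: List[Pair], in_domain: List[Pair]) -> Tuple[ToyModel, ToyModel]:
    # 1. Train the (large) teacher on general-domain parallel data.
    teacher = ToyModel("teacher").train(general)
    # 2. Distill: train the small student on the teacher's general-domain outputs.
    student = ToyModel("student").train(distill(teacher, [s for s, _ in general]))
    # 3. Adapt: fine-tune the teacher on in-domain parallel data.
    teacher.train(in_domain)
    # 4. Distill again: continue training the student on in-domain sources
    #    translated by the adapted teacher.
    student.train(distill(teacher, [s for s, _ in in_domain]))
    return teacher, student


if __name__ == "__main__":
    general = [("hallo", "hello"), ("welt", "world")]
    in_domain = [("dosis", "dose")]
    teacher, student = distill_adapt_distill(general, in_domain)
    print(student.translate("dosis"), student.translate("hallo"))
```

The central idea of sequence-level distillation lives in `distill`: the student never sees reference targets, only the teacher's translations. The final step reuses the same `student` object, so the student continues training from its general-domain state instead of starting over. That is this sketch's reading of "distilling twice", not necessarily the authors' exact training setup.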

Abstract

We explore best practices for training small, memory efficient machine translation models with sequence-level knowledge distillation in the domai