Recent research in decoding methods for Natural Language Generation (NLG) tasks has shown that the traditional beam search and greedy decoding algorithms are not optimal, because model probabilities do not always align with human preferences. Stronger decoding methods, including Quality Estimation (QE) reranking and Minimum Bayes' Risk (MBR) decoding, have since been proposed to mitigate the model-perplexity-vs-quality mismatch. While these decoding methods achieve state-of-the-art performance, they are prohibitively expensive to compute. In this work, we propose MBR finetuning and QE finetuning which distill the quality gains from these decoding methods at training time, while using an efficient decoding algorithm at inference time. Using the canonical NLG task of Neural Machine Translation (NMT), we show that even with self-training, these finetuning methods significantly outperform the base model. Moreover, when using an external LLM as a teacher model, these finetuning methods outperform finetuning on human-generated references. These findings suggest new ways to leverage monolingual data to achieve improvements in model quality that are on par with, or even exceed, improvements from human-curated data, while maintaining maximum efficiency during decoding.
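As an illustration of the two selection rules named above, the following is a minimal sketch, not the authors' implementation. It assumes a pool of sampled candidate translations is already available. The helper names (`qe_rerank`, `mbr_decode`, `build_finetuning_pairs`) and the toy `token_overlap` utility are hypothetical; in practice a learned metric would take the place of the toy utility. The last function reflects one plausible reading of "distilling at training time": the selected outputs become finetuning targets, so the expensive selection is not needed at inference.

```python
from typing import Callable, List, Sequence, Tuple


def qe_rerank(source: str, candidates: Sequence[str],
              qe_score: Callable[[str, str], float]) -> str:
    """Pick the candidate with the highest reference-free quality score."""
    return max(candidates, key=lambda hyp: qe_score(source, hyp))


def mbr_decode(candidates: Sequence[str],
               utility: Callable[[str, str], float]) -> str:
    """Pick the candidate with the highest average utility against the
    other candidates, which serve as pseudo-references."""
    def expected_utility(i: int) -> float:
        others = [c for j, c in enumerate(candidates) if j != i]
        if not others:
            return 0.0
        return sum(utility(candidates[i], ref) for ref in others) / len(others)

    best = max(range(len(candidates)), key=expected_utility)
    return candidates[best]


def token_overlap(hyp: str, ref: str) -> float:
    """Toy utility (assumption): Jaccard overlap of the token sets."""
    h, r = set(hyp.split()), set(ref.split())
    union = h | r
    return len(h & r) / len(union) if union else 0.0


def build_finetuning_pairs(
    sources: Sequence[str],
    sample_candidates: Callable[[str], List[str]],
    select: Callable[[str, List[str]], str],
) -> List[Tuple[str, str]]:
    """Pair each monolingual source with the selected teacher output.

    `sample_candidates` stands in for the teacher (the model itself in
    self-training, or an external LLM); `select` is QE reranking or MBR.
    """
    return [(src, select(src, sample_candidates(src))) for src in sources]
```

MBR scores every candidate against every other one, so its cost grows quadratically with the number of candidates, while QE reranking grows linearly. This cost is what the abstract describes as prohibitive at inference time.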

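The paper proposes MBR (Minimum Bayes' Risk) finetuning and QE (Quality Estimation) finetuning, which distill the quality gains of these decoding methods at training time while using an efficient decoding algorithm at inference time. With self-training, the finetuned models outperform the base model; with an external LLM as the teacher model, they outperform finetuning on human-generated references on the NLG task of machine translation, and inference remains efficient.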

MBR and QE Finetuning: Training-time Distillation of the Best and Most Expensive Decoding Methods