Moderate-sized large language models (LLMs) -- those with 7B or 13B parameters -- exhibit promising machine translation (MT) performance. However, even the top-performing 13B LLM-based translation models, like ALMA, does not match the performance of state-of-the-art conventional encoder-decoder translation models or larger-scale LLMs such as GPT-4. In this study, we bridge this performance gap. We first assess the shortcomings of supervised fine-tuning for LLMs in the MT task, emphasizing the quality issues present in the reference data, despite being human-generated. Then, in contrast to SFT which mimics reference translations, we introduce Contrastive Preference Optimization (CPO), a novel approach that trains models to avoid generating adequate but not perfect translations. Applying CPO to ALMA models with only 22K parallel sentences and 12M parameters yields significant improvements. The resulting model, called ALMA-R, can match or exceed the performance of the WMT competition winners and GPT-4 on WMT'21, WMT'22 and WMT'23 test datasets.

大型语言模型在机器翻译方面表现出良好的性能，但是使用监督微调的方式仍存在一些问题，本研究引入了对比优选优化 (CPO) 方法来改进性能。通过将 CPO 应用于 ALMA 模型，可以在限定的数据和参数规模下达到与竞赛获胜者及 GPT-4 相当甚至超过其性能的 ALMA-R 模型。

对比型偏好优化：推动机器翻译中LLM性能的边界