We demonstrate that Contrastive Decoding, a simple, computationally light, and training-free text generation method proposed by Li et al. (2022), achieves large out-of-the-box improvements over greedy decoding on a variety of reasoning tasks. Originally shown to improve the perceived quality of long-form text generation, Contrastive Decoding searches for strings that maximize a weighted difference in likelihood between a strong model and a weak model. We show that Contrastive Decoding leads LLaMA-65B to outperform LLaMA 2, GPT-3.5 and PaLM 2-L on the HellaSwag commonsense reasoning benchmark, and to outperform LLaMA 2, GPT-3.5 and PaLM-540B on the GSM8K math word reasoning benchmark, in addition to improvements on a collection of other tasks. Analysis suggests that Contrastive Decoding improves over existing methods by preventing some abstract reasoning errors, as well as by avoiding simpler failure modes such as copying sections of the input during chain-of-thought. Overall, Contrastive Decoding outperforms nucleus sampling for long-form generation and greedy decoding for reasoning tasks, making it a powerful general-purpose method for generating text from language models.
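
To make the "weighted difference in likelihood" concrete, the following is a minimal single-step sketch, not the authors' implementation. It assumes next-token logits from a strong and a weak model are already available as plain lists. The function name `contrastive_pick`, the parameters `weight` and `cutoff`, and their default values are hypothetical, and the plausibility filter (keeping only tokens the strong model itself considers reasonably likely) is an assumption rather than something stated in the abstract.

```python
import math


def log_softmax(logits):
    """Convert raw logits to log-probabilities in a numerically stable way."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def contrastive_pick(strong_logits, weak_logits, weight=0.5, cutoff=0.1):
    """Pick the next token index by a weighted log-likelihood difference.

    weight: hypothetical strength of the penalty from the weak model.
    cutoff: hypothetical plausibility threshold (assumption): a token is
            considered only if its strong-model probability is at least
            cutoff times the probability of the strong model's top token.
    """
    strong_lp = log_softmax(strong_logits)
    weak_lp = log_softmax(weak_logits)

    # Assumed plausibility filter, expressed in log space.
    threshold = max(strong_lp) + math.log(cutoff)

    best_token, best_score = None, -math.inf
    for token, (s, w) in enumerate(zip(strong_lp, weak_lp)):
        if s < threshold:
            continue  # the strong model finds this token implausible
        score = s - weight * w  # reward strong model, penalize weak model
        if score > best_score:
            best_token, best_score = token, score
    return best_token


# Toy example: the strong model slightly prefers token 0, but the weak
# model prefers it far more strongly, so the contrast favors token 1.
strong = [2.0, 1.9, -5.0]
weak = [3.0, 0.0, 0.0]
greedy_choice = contrastive_pick(strong, weak, weight=0.0)
contrastive_choice = contrastive_pick(strong, weak)
```

With `weight=0.0` the sketch reduces to greedy decoding on the strong model, which is the baseline the abstract compares against. A full decoder would repeat this step for each generated token.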

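Text generated with Contrastive Decoding shows significant improvements over greedy decoding across a variety of reasoning tasks. It surpasses LLaMA 2, GPT-3.5 and PaLM 2-L on the HellaSwag commonsense reasoning benchmark and LLaMA 2, GPT-3.5 and PaLM-540B on the GSM8K math word reasoning benchmark, with gains on other tasks as well. Analysis suggests that Contrastive Decoding improves over existing methods by preventing some abstract reasoning errors and by avoiding simple copying of parts of the input. It thereby outperforms nucleus sampling for long-form generation and greedy decoding for reasoning tasks, making it a powerful general-purpose method for generating text from language models.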

Contrastive Decoding Improves Reasoning in Large Language Models