Despite their increasing performance, large language models still tend to reproduce training data, generate repetitive text, and focus on the most common grammatical structures and words. A possible cause is the decoding strategy: the most common ones either consider only the most probable tokens, which reduces output diversity, or increase the likelihood of unlikely tokens at the cost of output accuracy and correctness. In this paper, we propose a family of three new decoding methods based on a mathematical analysis of the token probability distribution. In particular, the difference between consecutive, sorted probabilities can be used to avoid incorrect tokens and increase the chance of low-probability but accurate words. Experiments on math problem solving, extreme summarization, and the divergent association task show that our approach consistently performs at least as well as current alternatives in terms of quality and diversity.
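To make the central idea concrete, the following is a minimal sketch of one way the difference between consecutive sorted probabilities could drive truncation: sort the next-token probabilities, find the steepest drop between neighbors, keep only the tokens ranked above that drop, and renormalize before sampling. The abstract does not specify the three methods, so this cut rule is an assumption made for illustration, and the names `truncate_at_largest_drop` and `sample` are hypothetical.

```python
import random


def truncate_at_largest_drop(probs):
    """Keep the tokens ranked above the steepest drop between
    consecutive sorted probabilities, then renormalize.

    Assumption: this cut rule is illustrative only; the abstract does
    not define the exact rule used by the proposed methods.
    """
    # Token indices sorted by descending probability.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    sorted_p = [probs[i] for i in order]
    if len(sorted_p) < 2:
        return {order[0]: 1.0} if order else {}
    # Difference between each probability and its successor (all <= 0).
    diffs = [sorted_p[i + 1] - sorted_p[i] for i in range(len(sorted_p) - 1)]
    # Position of the most negative difference, i.e. the steepest drop.
    cut = min(range(len(diffs)), key=lambda i: diffs[i])
    kept = order[: cut + 1]
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


def sample(probs, rng=random):
    """Sample one token index from the truncated, renormalized distribution."""
    dist = truncate_at_largest_drop(probs)
    tokens = list(dist)
    weights = [dist[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Toy distribution over five tokens: the steepest drop is 0.35 -> 0.15,
# so only tokens 1 and 2 survive the cut.
toy_probs = [0.05, 0.40, 0.35, 0.15, 0.05]
kept_dist = truncate_at_largest_drop(toy_probs)
next_token = sample(toy_probs)
```

Unlike a fixed top-k or top-p threshold, the number of tokens kept here depends on the shape of the distribution: a flat head followed by a sharp drop keeps several candidates, while a single dominant token keeps only itself.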

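This work addresses the tendency of large language models to reproduce training data and to lack diversity in text generation, and proposes improved decoding methods based on a mathematical analysis. By exploiting the difference between consecutive sorted probabilities, the approach increases the chance of generating low-probability but accurate words, and across several tasks it achieves generation quality and diversity at least as good as those of existing methods.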

DiffSampling: Enhancing Diversity and Accuracy in Neural Text Generation