Quantifying uncertainty in automatically generated text is important for letting humans check potential hallucinations and for making systems more reliable. Conformal prediction is an attractive framework for producing predictions with statistical guarantees; however, applying it to text generation is challenging because i.i.d. assumptions are not realistic in this setting. In this paper, we bridge this gap by leveraging recent results on non-exchangeable conformal prediction, which still ensures bounds on coverage. The result, non-exchangeable conformal nucleus sampling, is a novel extension of the conformal prediction framework to generation based on nearest neighbors. Our method can be applied post-hoc to an arbitrary model without extra training and supplies token-level, calibrated prediction sets equipped with statistical guarantees. Experiments in machine translation and language modeling show encouraging results in generation quality. Because the method also produces tighter prediction sets with good coverage, it offers a more theoretically principled way to perform sampling with conformal guarantees.

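By leveraging recent results on non-exchangeable conformal prediction, we propose a novel extension of the conformal prediction framework, called non-exchangeable conformal nucleus sampling, for generation based on nearest neighbors. Our method can be applied post-hoc to an arbitrary model, supplies token-level prediction sets with statistical guarantees, and shows encouraging generation quality in machine translation and language modeling experiments. By producing tighter prediction sets with good coverage, we give a more theoretically principled way to perform sampling with conformal guarantees.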
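
To make the idea of a token-level prediction set built from nearest neighbors more concrete, the following is a minimal sketch and not the authors' implementation. It assumes several details the abstract does not state: that stored calibration entries are retrieved by brute-force Euclidean distance, that neighbor weights take the form exp(-d^2 / tau), that the non-conformity score is the cumulative probability mass needed to include the reference token, and that the threshold is a weighted quantile with one unit of weight reserved for the test point. All function and parameter names (`weighted_conformal_quantile`, `nucleus_set`, `conformal_nucleus_set`, `tau`, `alpha`) are hypothetical.

```python
import numpy as np


def weighted_conformal_quantile(scores, weights, alpha):
    """Smallest stored score q whose normalized weighted mass reaches 1 - alpha.

    Assumption: one extra unit of weight is reserved for the test point
    (conceptually placed at +inf), so the result is inf when the neighbors
    alone cannot reach the target mass.
    """
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(scores)
    cum = np.cumsum(weights[order]) / (weights.sum() + 1.0)
    idx = int(np.searchsorted(cum, 1.0 - alpha, side="left"))
    if idx >= len(scores):
        return np.inf
    return float(scores[order][idx])


def nucleus_set(probs, q):
    """Indices of the most probable tokens, kept until cumulative mass reaches q."""
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(-probs)
    cum = np.cumsum(probs[order])
    k = int(np.searchsorted(cum, q, side="left")) + 1
    return order[: min(k, len(probs))]


def conformal_nucleus_set(probs, query, store_keys, store_scores,
                          k=3, tau=1.0, alpha=0.1):
    """Retrieve k nearest calibration entries, weight them, and build the set."""
    store_keys = np.asarray(store_keys, dtype=float)
    store_scores = np.asarray(store_scores, dtype=float)
    d2 = ((store_keys - np.asarray(query, dtype=float)) ** 2).sum(axis=1)
    nn = np.argsort(d2)[:k]
    w = np.exp(-d2[nn] / tau)  # assumed distance-based weighting
    q = weighted_conformal_quantile(store_scores[nn], w, alpha)
    return nucleus_set(probs, q)
```

In this sketch, sampling would then draw the next token from `probs` renormalized over the returned indices. When the neighbors carry too little weight to reach the target mass, the threshold is infinite and the set falls back to the full vocabulary, which is a conservative choice rather than something the abstract specifies.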

Non-Exchangeable Conformal Language Generation with Nearest Neighbors