Recent improvements in text generation have leveraged human feedback to
improve the quality of the generated output. However, human feedback is not
always available, especially during inference. In this work, we propose FITO,
an inference-time optimization method that uses fine-grained, actionable
feedback for iterative refinement. The feedback takes the form of error type,
error location, and severity level, as predicted by a learned error pinpoint
model. FITO starts with an
initial output, then iteratively incorporates the feedback via a refinement
model that generates an improved output conditioned on the feedback. Because
refined samples are not guaranteed to improve consistently from one iteration
to the next, we formulate iterative refinement as a local search problem and
develop a simulated annealing based algorithm that balances exploration of the
search space with optimization of output quality. We conduct experiments on three text
generation tasks, including machine translation, long-form question answering
(QA), and topical summarization. With a single iteration of refinement, we
observe MetricX gains of 0.8 and 0.7 on Chinese-English and English-German
translation, and ROUGE-L gains of 4.5 and 1.8 on long-form QA and topical
summarization, respectively. With our simulated annealing algorithm, we see
further quality improvements, including a MetricX improvement of up to 1.7
over the baseline approach.

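Using fine-grained, actionable feedback in the form of error type, error location, and severity level predicted by a learned error pinpoint model, we propose FITO (an inference-time optimization method) for iterative refinement, in which a refinement model that generates an improved output iteratively incorporates the feedback. We conduct experiments on three text generation tasks: machine translation, long-form question answering (QA), and topical summarization. With a single iteration of refinement, we observe MetricX gains of 0.8 and 0.7 on Chinese-English and English-German translation, and ROUGE-L gains of 4.5 and 1.8 on QA and topical summarization, respectively. With our simulated annealing algorithm, we see further quality improvements, including a MetricX improvement of up to 1.7 over the baseline approach.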
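To make the local search formulation concrete, the following is a minimal sketch of a simulated annealing refinement loop. It is an illustration under stated assumptions, not the algorithm as implemented in this work: the function names (`simulated_annealing_refine`, `pinpoint`, `refine`, `score`), the feedback dictionary layout, the severity weights, the geometric cooling schedule, and the Metropolis-style acceptance rule are all hypothetical choices. The toy pinpoint and refinement functions stand in for the learned error pinpoint model and the refinement model.

```python
import math
import random
from typing import Callable, List, Optional, Tuple

# Hypothetical severity weights; the actual scoring is not specified in the text.
SEVERITY_WEIGHT = {"minor": 1, "major": 5}


def simulated_annealing_refine(
    initial: str,
    pinpoint: Callable[[str], List[dict]],
    refine: Callable[[str, List[dict]], str],
    score: Callable[[List[dict]], float],
    steps: int = 10,
    t0: float = 1.0,
    cooling: float = 0.8,
    rng: Optional[random.Random] = None,
) -> Tuple[str, float]:
    """Iteratively refine `initial`, occasionally accepting worse candidates."""
    rng = rng or random.Random(0)
    current, cur_fb = initial, pinpoint(initial)
    cur_score = score(cur_fb)
    best, best_score = current, cur_score
    for step in range(steps):
        if not cur_fb:  # no predicted errors left, nothing to refine
            break
        temp = t0 * (cooling ** step)  # assumed geometric cooling schedule
        candidate = refine(current, cur_fb)
        cand_fb = pinpoint(candidate)
        cand_score = score(cand_fb)
        delta = cand_score - cur_score
        # Always accept improvements; accept a worse candidate with
        # probability exp(delta / temp), which shrinks as temp decreases.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, cur_fb, cur_score = candidate, cand_fb, cand_score
        if cur_score > best_score:
            best, best_score = current, cur_score
    return best, best_score


# Toy stand-ins for the learned models (purely illustrative).
def toy_pinpoint(text: str) -> List[dict]:
    # Flags every token "teh" as a minor spelling error at its token index.
    return [
        {"type": "spelling", "location": i, "severity": "minor"}
        for i, tok in enumerate(text.split())
        if tok == "teh"
    ]


def toy_refine(text: str, feedback: List[dict]) -> str:
    # Fixes only the first flagged error, so several iterations are needed.
    tokens = text.split()
    tokens[feedback[0]["location"]] = "the"
    return " ".join(tokens)


def toy_score(feedback: List[dict]) -> float:
    # Higher is better: negative severity-weighted error count.
    return float(-sum(SEVERITY_WEIGHT[e["severity"]] for e in feedback))


best, best_score = simulated_annealing_refine(
    "teh cat sat on teh mat", toy_pinpoint, toy_refine, toy_score
)
print(best, best_score)
```

In this toy setting every refinement step removes one error, so each candidate is accepted. The probabilistic acceptance branch only matters when a refinement model can make an output worse, which is the case the search formulation is meant to handle.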