Self-correction has emerged as a promising solution to boost the reasoning
performance of large language models (LLMs), where LLMs refine their solutions
using self-generated critiques that pinpoint the errors. This work explores
whether smaller-size (<= 13B) language models (LMs) have the ability of
self-correction on reasoning tasks with minimal inputs from stronger LMs. We
propose a novel pipeline that prompts smaller LMs to collect self-correction
data that supports the training of self-refinement abilities. First, we
leverage correct solutions to guide the model in critiquing their incorrect
responses. Second, the generated critiques, after filtering, are used for
supervised fine-tuning of the self-correcting reasoner through solution
refinement. Our experimental results show improved self-correction abilities of
two models on five datasets spanning math and commonsense reasoning, with
notable performance gains when paired with a strong GPT-4-based verifier,
though limitations are identified when using a weak self-verifier for
determining when to correct.

在小型语言模型上进行自我纠正训练以提高推理能力，通过使用正确解决方案引导模型对不正确的回答进行批判，并使用生成的批评经过筛选后进行自我纠正理由的监督微调，实验证明在数学和常识推理方面的五个数据集上两种模型的自我纠正能力得到了提升，与 GPT-4 基于验证器的强配对时取得了显著的性能提升，但使用弱自验证器来确定何时进行更正存在一定的限制。