Large language models (LLMs) have demonstrated outstanding performance across
a wide range of tasks, yet they still exhibit limitations such as hallucination,
unfaithful reasoning, and the generation of toxic content. One potential approach
to mitigating these issues is learning from human or external feedback (e.g.,
from tools). In this paper, we introduce an intrinsic self-correcting reasoning
framework for LLMs that eliminates the need for human feedback, external tools,
and handcrafted prompts.
The proposed framework, based on a multi-step reasoning paradigm called
\textbf{Le}arning from \textbf{Co}rrectness (\textsc{LeCo}), improves reasoning
performance without needing to learn from errors. The paradigm prioritizes
learning from correct reasoning steps and introduces a method for measuring the
confidence of each reasoning step based on generation logits. Experimental
results across various multi-step reasoning tasks demonstrate the effectiveness
of the framework in improving reasoning performance with reduced token
consumption.

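By combining a multi-step reasoning approach with a confidence measure derived from generation probabilities, we propose an intrinsic self-correcting reasoning framework that requires no human feedback, external tools, or handcrafted prompts and improves the reasoning performance of large language models without learning from errors. Experiments confirm that the framework improves reasoning performance on a variety of multi-step reasoning tasks while reducing token usage.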
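The idea of scoring each reasoning step from generation logits and keeping the trusted steps can be illustrated with a minimal Python sketch. It assumes the model exposes per-token log-probabilities for every step and scores a step by the mean probability of its tokens. This scoring rule, the function names (`step_confidence`, `split_at_least_confident`, `build_retry_prompt`) and the prompt format are hypothetical illustrations, not \textsc{LeCo}'s actual confidence measure, which this abstract does not specify. The sketch treats the steps before the least-confident one as "correct" and feeds them back as context for the next attempt.

```python
import math
from typing import List, Sequence, Tuple


def step_confidence(token_logprobs: Sequence[float]) -> float:
    """Score one reasoning step as the mean probability of its tokens.

    Hypothetical scoring rule: each log-probability is mapped back to a
    probability with exp() and the results are averaged.
    """
    if not token_logprobs:
        raise ValueError("a reasoning step must contain at least one token")
    return sum(math.exp(lp) for lp in token_logprobs) / len(token_logprobs)


def split_at_least_confident(
    steps: List[str], step_logprobs: List[Sequence[float]]
) -> Tuple[List[str], int]:
    """Return the steps before the least-confident one, plus that step's index.

    The returned prefix is the part of the chain treated as "correct" and kept.
    """
    if not steps or len(steps) != len(step_logprobs):
        raise ValueError("need one non-empty logprob list per step")
    scores = [step_confidence(lps) for lps in step_logprobs]
    # min() returns the first index on ties, i.e. the earliest weakest step.
    worst = min(range(len(scores)), key=scores.__getitem__)
    return steps[:worst], worst


def build_retry_prompt(question: str, trusted_steps: List[str]) -> str:
    """Hypothetical prompt format: the question followed by the kept steps."""
    return "\n".join([question, *trusted_steps])
```

For example, with three steps whose token log-probabilities are `[[-0.01, -0.02], [-0.05, -1.2, -0.9], [-0.1, -0.2]]`, the step scores are roughly 0.99, 0.55 and 0.86, so the second step is flagged and only the first step is carried into the retry prompt. A real system would likely combine several logit-based signals rather than a single average.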