Recent progress in large language models (LLMs) has shown that chain-of-thought prompting strategies improve the reasoning ability of LLMs by encouraging problem solving through multiple steps. Subsequent research therefore aimed to integrate the multi-step reasoning process into the LLM itself, using process rewards as feedback, and achieved improvements over prompting strategies. Because step-level annotation is costly, some work turns to outcome rewards as feedback instead. Aside from these training-based approaches, training-free techniques leverage frozen LLMs or external tools for feedback at each step to enhance the reasoning process. Given the abundance of work in mathematics, owing to its logical nature, we present a survey of strategies that use feedback at the step and outcome levels to enhance multi-step math reasoning for LLMs. As multi-step reasoning emerges as a crucial component in scaling LLMs, we hope to establish its foundation for easier understanding and to empower further research.

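This study addresses the multi-step process of large language models (LLMs) in mathematical reasoning, primarily filling a research gap in feedback integration. By surveying different feedback strategies, the paper proposes new approaches for enhancing LLM reasoning ability, including the combination of step-level and outcome-level feedback. The findings indicate that effective use of feedback significantly improves the multi-step reasoning ability of LLMs, advancing further research in the field.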

A Survey on Feedback-Based Multi-Step Reasoning for Large Language Models on Mathematics