Large Language Models have demonstrated remarkable capabilities in code generation, yet they often struggle with complex programming tasks that require deep algorithmic reasoning. Process supervision through learned process reward models (PRMs) shows promise for guiding reasoning steps, but it requires expensive training data and suffers from unreliable evaluation. We propose Outcome-Refining Process Supervision (ORPS), a novel paradigm that treats outcome refinement itself as the process to be supervised. Our framework uses concrete execution signals to ground the supervision of reasoning steps, and uses tree-structured exploration to maintain multiple solution trajectories simultaneously. Experiments show that our approach enables even smaller models to achieve high success rates and strong performance metrics on competitive programming tasks, and that it provides more reliable verification than traditional reward models without requiring PRM training. Our approach achieves significant improvements across 5 models and 3 datasets: an average increase of 26.9% in correctness and 42.2% in efficiency. The results suggest that providing a structured reasoning space with concrete verification signals is crucial for solving complex programming tasks. We open-source all our code and data at: https://github.com/zhuohaoyu/ORPS
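The abstract names two ingredients only at a high level: execution signals that score candidate solutions, and tree-structured exploration that keeps several solution trajectories alive at once. The following is a minimal, generic sketch of how those two ideas can fit together; it is not the ORPS implementation. It assumes that correctness is the fraction of test cases passed, that efficiency is wall-clock runtime, and that exploration is a simple beam search over refinement steps. The names `execute`, `tree_search`, `propose`, the scoring rule, and the hard-coded `REFINEMENTS` table (a hypothetical stand-in for a language model proposing refined programs) are all illustrative assumptions.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple

Test = Tuple[int, int]  # (input, expected output)


@dataclass
class Node:
    code: str         # candidate program source; must define solve(x)
    depth: int        # refinement step at which this candidate was produced
    pass_rate: float  # fraction of tests passed (correctness signal)
    runtime: float    # wall-clock seconds over all tests (efficiency signal)


def execute(code: str, tests: List[Test]) -> Tuple[float, float]:
    """Run `code` on the tests and return (pass_rate, runtime).

    WARNING: exec() runs the code with no sandbox and no timeout;
    a real system would isolate execution.
    """
    namespace: dict = {}
    try:
        exec(code, namespace)
        solve = namespace["solve"]
    except Exception:
        return 0.0, float("inf")  # code that fails to load scores worst
    passed = 0
    start = time.perf_counter()
    for arg, expected in tests:
        try:
            if solve(arg) == expected:
                passed += 1
        except Exception:
            pass  # a crash counts as a failed test
    runtime = time.perf_counter() - start
    return passed / max(len(tests), 1), runtime


def make_node(code: str, depth: int, tests: List[Test]) -> Node:
    rate, runtime = execute(code, tests)
    return Node(code, depth, rate, runtime)


def score(node: Node) -> Tuple[float, float]:
    # Correctness dominates; lower runtime only breaks ties.
    return (node.pass_rate, -node.runtime)


def tree_search(root_code: str, tests: List[Test],
                propose: Callable[[str], List[str]],
                beam_width: int = 2, max_depth: int = 3) -> Node:
    """Beam search over refinements, ranked purely by execution feedback."""
    beam = [make_node(root_code, 0, tests)]
    best = beam[0]
    for depth in range(1, max_depth + 1):
        # Expand every surviving trajectory, not only the current best one.
        children = [make_node(c, depth, tests)
                    for parent in beam for c in propose(parent.code)]
        if not children:
            break
        beam = sorted(children, key=score, reverse=True)[:beam_width]
        best = max([best, beam[0]], key=score)
    return best


# Toy task: return 1 + 2 + ... + n.
BUGGY = "def solve(n):\n    return sum(range(n))\n"  # off by one
IDENTITY = "def solve(n):\n    return n\n"
LOOP = ("def solve(n):\n    total = 0\n    for i in range(n + 1):\n"
        "        total += i\n    return total\n")
FORMULA = "def solve(n):\n    return n * (n + 1) // 2\n"

# Hypothetical stand-in for a model that proposes refined programs.
REFINEMENTS = {BUGGY: [LOOP, IDENTITY], LOOP: [FORMULA]}

tests = [(0, 0), (1, 1), (10, 55), (100_000, 5_000_050_000)]
best = tree_search(BUGGY, tests, lambda code: REFINEMENTS.get(code, []))
print(best.pass_rate, best.depth, best.code, sep="\n")
```

Ranking on the pair (pass rate, negative runtime) lets correctness dominate while efficiency decides among programs that pass every test, so in this toy run the search should typically move from the buggy program to the loop and then to the closed-form formula. Keeping `beam_width` candidates per step is what lets a weaker intermediate candidate survive long enough to be refined.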

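To address the poor performance of large language models on complex programming tasks, this work proposes a novel outcome-refining process supervision framework that treats outcome refinement as the process to be supervised, uses concrete execution signals to guide reasoning steps, and maintains multiple solution trajectories simultaneously through tree-structured exploration. Experiments show that the method substantially improves model accuracy and efficiency and yields significant gains on competitive programming tasks, underscoring the importance of providing a structured reasoning space together with concrete verification signals.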

Outcome-Refining Process Supervision for Code Generation