Large language models (LLMs) exhibit impressive language understanding and
in-context learning abilities across natural language processing (NLP) tasks,
including challenging mathematical reasoning. However, due to the lack of
process supervision, pretrained language models (PLMs) applied to mathematical
reasoning tasks often fail to generate correct reasoning steps and final
answers, even when the generated solutions are assigned high probabilities. To
unleash the mathematical reasoning ability of finetuned LLMs
without any further finetuning steps, we propose a method that endows LLMs with
an immediate-reaction and delicate-reasoning system via Monte Carlo Tree
Search (MCTS) and a lightweight energy function that ranks the decision steps. In
particular, we first reformulate the finetuned LLM as a Residual-based Energy
Model (Residual-EBM) and apply noise contrastive estimation to estimate the
parameters of the energy function. We then use MCTS, with the energy function
as a path verifier, to search the output space and evaluate reasoning paths. Through
extensive experiments on two mathematical reasoning benchmarks, GSM8k
and MATH, we show that our method improves the pass@1 of the finetuned model
by a substantial margin, without further finetuning or RLHF alignment.
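
The residual energy-based formulation and noise contrastive estimation (NCE) mentioned above can be illustrated with a minimal sketch. It assumes the common residual-EBM form, in which a candidate is scored by the base model's log-probability minus a learned energy, and an NCE objective that treats reference solutions as positives and samples from the base model as negatives. The function names and toy numbers are hypothetical and are not taken from the paper; in particular, the energy values here are plain floats standing in for the output of a trained energy network.

```python
import math

def log_sigmoid(z):
    # Numerically stable log(sigmoid(z)).
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

def nce_loss(pos_energies, neg_energies):
    # Assumed NCE objective with the base LM as the noise distribution:
    # push energies of reference solutions down and energies of
    # model-sampled solutions up.
    pos = [log_sigmoid(-e) for e in pos_energies]
    neg = [log_sigmoid(e) for e in neg_energies]
    return -(sum(pos) / len(pos) + sum(neg) / len(neg))

def residual_score(lm_logprob, energy):
    # Unnormalized log-score under the assumed residual form:
    # log p(x) = log p_LM(x) - E(x) + const.
    return lm_logprob - energy

def rank_steps(candidates):
    # candidates: list of (text, lm_logprob, energy) tuples (hypothetical layout).
    # Returns candidates sorted from best to worst residual score.
    return sorted(candidates, key=lambda c: residual_score(c[1], c[2]), reverse=True)
```

In a tree search, a ranking function of this kind could serve as the path verifier that decides which partial reasoning path to expand next. Note that a candidate with a slightly lower language-model probability can still rank first if its energy is low.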

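By using Monte Carlo Tree Search and a lightweight energy function, we improve finetuned large language models, raising the correctness of their mathematical reasoning and of the individual reasoning steps, and thereby substantially increase the pass@1 of the finetuned model without further finetuning or RLHF alignment.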