Large language models are typically fine-tuned to align with human preferences, but tuning large models is computationally intensive and complex. In this work, we introduce $\textit{Integrated Value Guidance}$ (IVG), a method that uses implicit and explicit value functions to guide language model decoding at the token level and the chunk level, respectively, efficiently aligning large language models purely at inference time. This approach circumvents the complexities of direct fine-tuning and outperforms traditional methods. Empirically, we demonstrate the versatility of IVG across various tasks. In controlled sentiment generation and summarization tasks, our method significantly improves the alignment of large models using inference-time guidance from $\texttt{gpt2}$-based value functions. Moreover, on the more challenging instruction-following benchmark AlpacaEval 2.0, we show that both specifically tuned and off-the-shelf value functions greatly improve the length-controlled win rates of large models against $\texttt{gpt-4-turbo}$ (e.g., $19.51\% \rightarrow 26.51\%$ for $\texttt{Mistral-7B-Instruct-v0.2}$ and $25.58\% \rightarrow 33.75\%$ for $\texttt{Mixtral-8x7B-Instruct-v0.1}$ with Tulu guidance).
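To make the two levels of guidance concrete, the following toy sketch shows one way token-level and chunk-level guidance could be combined during decoding. It is an illustration under stated assumptions, not the implementation behind the reported results: the abstract does not specify how the value functions enter decoding, so the sketch assumes the implicit value function is added to the base model's logits with a weight `beta`, and the explicit value function picks the best of several sampled chunks. The vocabulary and the functions `base_logits`, `implicit_value_logits`, `explicit_value`, and `guided_decode` are hypothetical stand-ins.

```python
import math
import random

# Toy vocabulary; a real system would use the base model's tokenizer.
VOCAB = ["good", "bad", "fine", "<eos>"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def base_logits(prefix):
    # Hypothetical stand-in for the large base model: uniform over VOCAB.
    return [0.0] * len(VOCAB)


def implicit_value_logits(prefix):
    # Hypothetical token-level guidance signal (assumed, not from the source):
    # a fixed per-token preference that favors "good" over "bad".
    return [1.0, -1.0, 0.5, 0.0]


def explicit_value(sequence):
    # Hypothetical chunk-level scorer: +1 per "good", -1 per "bad".
    return sum(1.0 if t == "good" else -1.0 if t == "bad" else 0.0
               for t in sequence)


def sample_chunk(prefix, chunk_len, beta, rng):
    """Sample one chunk token by token from the value-adjusted distribution."""
    chunk = []
    for _ in range(chunk_len):
        ctx = prefix + chunk
        # Assumed combination rule: base logits plus beta times the guidance.
        logits = [b + beta * v
                  for b, v in zip(base_logits(ctx), implicit_value_logits(ctx))]
        token = rng.choices(VOCAB, weights=softmax(logits), k=1)[0]
        chunk.append(token)
        if token == "<eos>":
            break
    return chunk


def guided_decode(prompt, num_chunks=3, chunk_len=2, num_candidates=4,
                  beta=1.0, seed=0):
    """Sample candidate chunks, then keep the one the explicit scorer prefers."""
    rng = random.Random(seed)
    sequence = list(prompt)
    for _ in range(num_chunks):
        candidates = [sample_chunk(sequence, chunk_len, beta, rng)
                      for _ in range(num_candidates)]
        best = max(candidates, key=lambda c: explicit_value(sequence + c))
        sequence += best
        if best[-1] == "<eos>":
            break
    return sequence


if __name__ == "__main__":
    print(guided_decode(["the", "movie", "was"]))
```

In this sketch the base model is never updated; only its sampling distribution and the choice among sampled chunks are steered, which is what allows the guidance to come from much smaller models than the one being aligned.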

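This work addresses the computational complexity of aligning large language models with human preferences and proposes a new method, Integrated Value Guidance (IVG). The method uses implicit and explicit value functions to guide language model decoding at inference time, achieving efficient alignment. It significantly improves model performance on sentiment generation and summarization tasks, and its effectiveness is further validated on an instruction-following benchmark.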

Inference-Time Language Model Alignment via Integrated Value Guidance