Aligning Large Language Models (LLMs) with human intentions and values is
crucial yet challenging. Current methods primarily rely on human preferences,
which are costly to obtain and insufficient for capturing nuanced feedback expressed in
natural language. In this paper, we present Self-Refinement Tuning (SRT), a
method that leverages model feedback for alignment, thereby reducing reliance
on human annotations. SRT uses a base language model (e.g., Tulu2) to generate
initial responses, which are critiqued and refined by a more advanced model
(e.g., GPT-4-Turbo). This process enables the base model to learn to self-evaluate and
improve its outputs, facilitating continuous learning. SRT further optimizes
the model by learning from its self-generated feedback and refinements,
creating a feedback loop that promotes model improvement. Our empirical
evaluations demonstrate that SRT significantly outperforms strong baselines
across diverse tasks and model sizes. When applied to a 70B parameter model,
SRT increases the win rate from 9.6% to 25.8% on the AlpacaEval 2.0
benchmark, surpassing well-established systems such as GPT-4-0314, Claude 2,
and Gemini. Our analysis highlights the crucial role of language feedback in
the success of SRT, suggesting potential for further exploration in this
direction.
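The abstract describes the training signal only at a high level. The sketch below assumes that each training instance pairs the base model's initial response with the stronger model's critique and refinement. Under that assumption, one way to serialize an instance into a prompt/target pair for supervised fine-tuning is shown here. The `SRTExample` fields, the prompt template, and `build_training_text` are hypothetical illustrations, not the paper's exact data format.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class SRTExample:
    """One hypothetical SRT training instance (field names are assumptions)."""
    instruction: str
    initial_response: str  # generated by the base model (e.g., Tulu2)
    critique: str          # natural-language feedback from the stronger model
    refinement: str        # improved response written by the stronger model


def build_training_text(ex: SRTExample) -> Tuple[str, str]:
    """Return a (prompt, target) pair for supervised fine-tuning.

    Hypothetical template: the base model is conditioned on the instruction
    and its own initial response. It is trained to emit the critique followed
    by the refined response, so that it learns to self-evaluate and then improve.
    """
    prompt = (
        f"Instruction:\n{ex.instruction}\n\n"
        f"Response:\n{ex.initial_response}\n\n"
        "Feedback:\n"
    )
    # Only the target would typically contribute to the training loss.
    target = f"{ex.critique}\n\nRefined response:\n{ex.refinement}"
    return prompt, target


if __name__ == "__main__":
    ex = SRTExample(
        instruction="Name the capital of France.",
        initial_response="Paris is a city in France.",
        critique="The answer is indirect; state plainly that Paris is the capital.",
        refinement="The capital of France is Paris.",
    )
    prompt, target = build_training_text(ex)
    print(prompt + target)
```

Placing the critique before the refinement in the target mirrors the critique-then-refine order described above. At inference time the model can then generate its own feedback before revising.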

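This paper proposes Self-Refinement Tuning (SRT), a method that uses model feedback to align large language models (LLMs). SRT reduces reliance on human annotations and enables the base model to self-evaluate and improve its outputs, thereby promoting continuous learning. Empirically, SRT clearly outperforms strong baselines across different tasks and model sizes. In particular, on the AlpacaEval 2.0 benchmark, SRT raises the win rate of a 70B parameter model from 9.6% to 25.8%, surpassing established systems such as GPT-4-0314, Claude 2, and Gemini. Language feedback plays a key role in the success of SRT.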