The burgeoning capabilities of large language models (LLMs) have underscored the need for alignment to ensure these models act in accordance with human values and intentions. Existing alignment frameworks are constrained by either expensive human effort or high computational cost. This paper explores a promising middle ground: we employ a weak LLM that is significantly less resource-intensive than top-tier models, yet offers more automation than purely human feedback. We present a systematic study to evaluate and understand a weak LLM's ability to generate feedback for alignment. Our empirical findings demonstrate that weak LLMs can provide feedback that rivals or even exceeds that of fully human-annotated data. Our study also indicates that model size has minimal impact on feedback efficacy, shedding light on a scalable and sustainable alignment strategy. To deepen our understanding of alignment under weak LLM feedback, we conduct a series of qualitative and quantitative analyses, offering novel insights into the quality discrepancies between human feedback and weak LLM feedback.

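This study addresses the challenge of aligning large language models (LLMs) by proposing an innovative approach that uses a weak LLM. Experimental results show that a weak LLM can generate feedback that rivals or even exceeds fully human-annotated data, indicating that model size has minimal impact on feedback efficacy and offering a new perspective on scalable and sustainable alignment strategies.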

Your Weak LLM Is Secretly a Strong Teacher for Alignment