Large Language Models (LLMs) exhibit remarkably powerful capabilities. A crucial factor in their success is aligning the LLM's output with human preferences. This alignment process often requires only a small amount of data to efficiently enhance the LLM's performance. Although alignment is effective, research on it spans multiple domains, and the methods involved are relatively complex to understand. The relationships between different methods remain under-explored, which limits the development of preference alignment. In light of this, we break down popular existing alignment strategies into their components and provide a unified framework for studying them, thereby establishing connections among them. In this survey, we decompose all preference learning strategies into four components: model, data, feedback, and algorithm. This unified view offers an in-depth understanding of existing alignment algorithms and opens up possibilities for combining the strengths of different strategies. Furthermore, we present detailed working examples of prevalent algorithms to help readers understand them thoroughly. Finally, based on our unified perspective, we explore the challenges and future research directions for aligning large language models with human preferences.

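This work addresses the methodological complexity and fragmentation of research on aligning large language models (LLMs) with human preferences. It proposes a unified framework that decomposes existing preference learning strategies into four components (model, data, feedback, and algorithm) and uses this decomposition to analyze existing alignment algorithms in depth. The study improves understanding of the relationships among different strategies, points to new directions for future research, and encourages methods to complement one another's strengths.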

Towards a Unified View of Preference Learning for Large Language Models: A Survey