Efforts to align Large Language Models (LLMs) are mainly conducted via Reinforcement Learning from Human Feedback (RLHF) methods. However, RLHF encounters major challenges including training reward models, actor-critic engineering, and importantly, it requires access to LLM parameters. Here we introduce Aligner, a new efficient alignment paradigm that bypasses the whole RLHF process by learning the correctional residuals between the aligned and the unaligned answers. Our Aligner offers several key advantages. Firstly, it is an autoregressive seq2seq model that is trained on the query-answer-correction dataset via supervised learning; this offers a parameter-efficient alignment solution with minimal resources. Secondly, the Aligner facilitates weak-to-strong generalization; finetuning large pretrained models by Aligner's supervisory signals demonstrates strong performance boost. Thirdly, Aligner functions as a model-agnostic plug-and-play module, allowing for its direct application on different open-source and API-based models. Remarkably, Aligner-7B improves 11 different LLMs by 18% in helpfulness and 23% in harmlessness on average (GPT-4 by 26.9% and 17.5%). When finetuning (strong) Llama2-70B with (weak) Aligner-13B's supervision, we can improve Llama2 by 8.2% in helpfulness and 61.6% in harmlessness. See our dataset and code at https://aligner2024.github.io

通过强化学习从人类反馈中对齐大型语言模型的努力，介绍了一种新的高效对齐方式Aligner，通过学习对齐与未对齐答案之间的校正残差，绕过了强化学习过程，通过有监督学习在查询-答案-校正数据集上训练的自回归seq2seq模型实现了参数高效的对齐解决方案，可以将强大的预训练模型通过Aligner的监督信号进行微调，进而应用于不同的开源和API-based模型。此外，Aligner提供了很大的性能提升，如对11种不同的LLMs平均提升18％的有用性和23％的无害性（GPT-4提升26.9％和17.5％），对Llama2-70B使用Aligner-7B的监督进行微调，可以提高Llama2的有用性8.2％和无害性61.6％。

对齐器：通过弱到强的校正实现高效对齐