BriefGPT.xyz
May, 2023
Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
Zhiqing Sun, Yikang Shen, Qinhong Zhou, Hongxin Zhang, Zhenfang Chen...
TL;DR
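The study proposes SELF-ALIGN, a method that combines principle-driven reasoning with the generative capabilities of LLMs to self-align AI assistants using only a small amount of human supervision. It reduces the dependence on human supervision, achieves better performance, and was used to develop the Dromedary AI assistant.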
Abstract
Recent AI-assistant agents, such as ChatGPT, predominantly rely on supervised fine-tuning (SFT) with human annotations and reinforcement learning from human feedback (RLHF) to align the output of large language models (