BriefGPT.xyz
Oct, 2023
CycleAlign: Iterative Distillation from Black-box LLM to White-box Models for Better Human Alignment
Jixiang Hong, Quan Tu, Changyu Chen, Xing Gao, Ji Zhang...
TL;DR
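The CycleAlign framework effectively aligns a white-box model with a black-box model in low-resource settings. Over multiple rounds of iterative interaction, it dynamically updates the in-context demonstrations, which improves the black-box model's preference-ranking ability and achieves state-of-the-art alignment with human values.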
Abstract
Language models trained on large-scale corpora often generate content that is harmful, toxic, or contrary to human preferences, making their alignment with human values a critical concern.