Chanwoo Park, Mingyang Liu, Kaiqing Zhang, Asuman Ozdaglar
TL;DR: Two frameworks, personalization and aggregation, address reinforcement learning with heterogeneous human feedback while ensuring high sample efficiency.
Abstract
Reinforcement learning from human feedback (RLHF) has been an effective
technique for aligning AI systems with human values, with remarkable recent
successes in fine-tuning large language models. Most existing