BriefGPT.xyz
Jan, 2023
On The Fragility of Learned Reward Functions
Lev McKinney, Yawen Duan, David Krueger, Adam Gleave
TL;DR
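This paper studies why relearning becomes difficult in optimization based on reward learning, whether because of changes in the training dataset or because of flaws in the reward model's design, and stresses the need for more retraining-based evaluation methods in the literature.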
Abstract
Reward functions are notoriously difficult to specify, especially for tasks with complex goals. Reward learning approaches attempt to infer reward functions from human feedback and preferences. Prior works on