Jacek Karwowski, Oliver Hayman, Xingjian Bai, Klaus Kiendlhofer, Charlie Griffin...
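TL;DR: Reward functions, Goodhart's law, optimization, early stopping methods, and reinforcement learning are the key terms and topics of this work.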
Abstract
Implementing a reward function that perfectly captures a complex task in the real world is impractical. As a result, it is often appropriate to think of the reward function as a proxy for the true objective rathe