Dec, 2022
Settling the Reward Hypothesis
Michael Bowling, John D. Martin, David Abel, Will Dabney
TL;DR
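Starting from the reward hypothesis, this work examines how goals and purposes relate to maximizing the expected value of a cumulative reward signal, and identifies the implicit requirements under which the hypothesis holds.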
Abstract
The reward hypothesis posits that, "all of what we mean by goals and purposes can be well thought of as maximization of the expected value …