December 2022

Settling the Reward Hypothesis

Michael Bowling, John D. Martin, David Abel, Will Dabney

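TL;DR: Starting from the reward hypothesis, this work examines how goals and purposes relate to the maximization of the expected value of a cumulative reward signal, and identifies the implicit requirements under which the hypothesis holds.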

Abstract

The reward hypothesis posits that, "all of what we mean by goals and purposes can be well thought of as maximization of the expected value